Summary: | ASTERISK-24983: IAX deadlock between hangup and scheduled actions (e.g. lagrq) | ||
Reporter: | Y Ateya (yateya) | Labels: | |
Date Opened: | 2015-04-20 15:16:23 | Date Closed: | 2015-06-10 12:21:24 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_iax2 |
Versions: | 13.3.2 | Frequency of Occurrence | Occasional |
Related Issues: | |||
Environment: | Ubuntu | Attachments: | ( 0) 0001-ASTERISK-24983-Prevent-deadlock-between-hanup-and-se.patch ( 1) iax_hangup_deadlock.diff |
Description: | Some of my Asterisk servers (SIP-to-IAX) randomly _freeze_. After some investigation I found that this is caused by a deadlock between {{iax2_hangup}} and {{send_lagrq}} (it can happen with {{send_ping}} too).
Here is the sequence of _unfortunate_ events that leads to this deadlock:
- When a call starts, {{send_lagrq}} is scheduled to run after some time.
- {{iax2_hangup}} is called.
- It locks the call number lock {{ast_mutex_lock(&iaxsl\[callno\])}}. Note that later in the hangup procedure, it will try to delete the scheduled {{send_lagrq}}.
- Before it deletes {{send_lagrq}}, a context switch happens and the scheduler finds that it is time to run the scheduled {{send_lagrq}}.
- {{send_lagrq}} is called and tries to acquire the call number lock {{ast_mutex_lock(&iaxsl\[callno\])}}, so {{send_lagrq}} waits for the hangup to finish.
- Some time later, {{iax2_hangup}} reaches the point where it deletes the scheduled lagrq and ping events. This happens in {{iax2_destroy_helper}} by calling {{AST_SCHED_DEL_SPINLOCK(sched, pvt->lagid, &iaxsl\[pvt->callno\])}}, which calls {{ast_sched_del}}, which finds that {{send_lagrq}} is still being served ({{else if (con->currently_executing && (id == con->currently_executing->id))}}), so it **waits indefinitely** while still holding the call lock.
- *Scheduler is blocked*: all other scheduled events wait for this event to finish.
- *IAX call is blocked*: every thread that tries to take the call lock blocks too.
After several minutes I ended up with hundreds of blocked threads.

I don't know which is better:
- Fixing chan_iax2 to prevent this deadlock.
- Fixing the scheduler to prevent this deadlock.
Changing the scheduler's behavior would affect many users, so I decided to change chan_iax2 to fix the problem AND change the scheduler to report when this deadlock happens. Patch attached; a Gerrit review was added too (https://gerrit.asterisk.org/#/c/169/). | ||
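The cycle described above can be reproduced and broken in a small pthreads model. In that cycle, hangup holds the call lock and waits for the executing event, while the event waits for the call lock. The sketch below is hypothetical: {{struct call}}, {{struct sched}}, {{destroy_initiated}}, {{lagrq_callback}}, {{scheduler_thread}} and {{run_hangup_race}} are illustrative stand-ins, not Asterisk's API, and the code does not reproduce the attached patch. It shows one way to break the cycle, assuming the hangup path is allowed to drop the call lock before waiting on an executing event. Hangup marks the call as being destroyed, releases the lock, waits for the callback to finish, and then re-takes the lock. The callback checks the flag after locking and backs off.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for chan_iax2 and scheduler state. */
struct call {
    pthread_mutex_t lock;      /* plays the role of iaxsl[callno] */
    bool destroy_initiated;    /* hypothetical flag: hangup has started */
    int lagrq_sent;            /* lag requests the callback actually sent */
};

struct sched {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool executing;            /* like con->currently_executing for our one event */
    struct call *call;
};

/* Scheduled callback, analogous to send_lagrq(). */
static void lagrq_callback(struct call *c)
{
    pthread_mutex_lock(&c->lock);
    if (c->destroy_initiated) {
        /* Hangup owns this call now: back off instead of touching it. */
        pthread_mutex_unlock(&c->lock);
        return;
    }
    c->lagrq_sent++;
    pthread_mutex_unlock(&c->lock);
}

/* Scheduler thread: mark the event as executing, run it, then signal completion. */
static void *scheduler_thread(void *arg)
{
    struct sched *s = arg;

    pthread_mutex_lock(&s->lock);
    s->executing = true;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);

    lagrq_callback(s->call);

    pthread_mutex_lock(&s->lock);
    s->executing = false;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Drives the unlucky interleaving from the report; returns lag requests sent. */
static int run_hangup_race(void)
{
    struct call c = { .destroy_initiated = false, .lagrq_sent = 0 };
    struct sched s = { .executing = false, .call = &c };
    pthread_t tid;

    pthread_mutex_init(&c.lock, NULL);
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);

    /* Hangup: take the call lock first and mark the call as dying. */
    pthread_mutex_lock(&c.lock);
    c.destroy_initiated = true;

    /* The scheduler fires the lagrq event now; its callback blocks on c.lock. */
    pthread_create(&tid, NULL, scheduler_thread, &s);
    pthread_mutex_lock(&s.lock);
    while (!s.executing)
        pthread_cond_wait(&s.cond, &s.lock);
    pthread_mutex_unlock(&s.lock);

    /* Deleting an executing event must wait for it. Waiting while still holding
     * c.lock is the reported deadlock, so release the call lock first. */
    pthread_mutex_unlock(&c.lock);
    pthread_mutex_lock(&s.lock);
    while (s.executing)
        pthread_cond_wait(&s.cond, &s.lock);
    pthread_mutex_unlock(&s.lock);

    /* Re-take the call lock to finish tearing the call down. */
    pthread_mutex_lock(&c.lock);
    int sent = c.lagrq_sent;
    pthread_mutex_unlock(&c.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    pthread_mutex_destroy(&c.lock);
    return sent;
}
```

With this ordering, {{run_hangup_race()}} returns 0. The callback sees the flag and sends no lag request, and the function returns instead of hanging. Moving the {{pthread_mutex_unlock(&c.lock)}} call after the second wait loop reproduces the reported hang, because hangup would then wait on the executing event while the event waits on the call lock.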
Comments: | By: Y Ateya (yateya) 2015-04-20 17:01:25.639-0500 Patch against git master. By: Y Ateya (yateya) 2015-04-22 15:33:09.674-0500 Updated patch. |