Summary: | ASTERISK-16593: ast_add_hint deadlock in pbx.c MUTEX ast_hint_state_changed | ||
Reporter: | Alan Graham (zerohalo) | Labels: | |
Date Opened: | 2010-08-19 13:33:20 | Date Closed: | 2010-11-02 09:27:57 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/PBX |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) bt_scrubbed.txt ( 1) csl.txt | |
Description: | Getting a deadlock at random; call processing stops. It appears to happen when the hint state changes as a call is hung up. Running 1.4.34 with the patch from ASTERISK-16365. 'core show locks' output and a backtrace from ast_grab_core are attached. | ||
Comments: |

By: Alan Graham (zerohalo) 2010-08-23 12:37:46
getting these multiple times a day, sometimes during an extensions reload, sometimes not. I don't see any real pattern to when these happen.

By: Alan Graham (zerohalo) 2010-08-31 09:59:13
is there anything else I can provide that would help with this?

By: Digium Subversion (svnbot) 2010-09-10 15:03:52
Repository: asterisk
Revision: 286070

U branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r286070 | dvossel | 2010-09-10 15:03:51 -0500 (Fri, 10 Sep 2010) | 32 lines

Fixes sip extension state update DEADLOCK

PROBLEM:
In chan_sip, and all the other channel drivers, it is common for us to hold the tech_pvt lock while we ask the Asterisk core about an extension and context. Every time we do this the locking order becomes, (1. tech_pvt lock ---> 2. global context lock). In chan_sip when a dialog subscribes to a hint, that locking order is reversed in the extensionstate callback which will occur outside of the channel_driver's monitor loop. So, on an extension state update we have (1. global context lock ----> 2. tech_pvt lock).

Typically when we have to do a reversed locking order like this we'd just do some sort of deadlock avoidance to fix the problem... That will not work here. There are more locks involved here than just the context and tech_pvt. Those are the two that are colliding, but it is impossible to give up the context lock because the global hints list lock MUST be held as well and we can not give that lock up during the extensionstate callback traversal... The locking order for the context and hints are (1. global context lock ----> 2. hints list lock). Deadlock avoidance is not an option here.

SOLUTION:
The solution this patch implements is to queue the extension state updates into a list and send the NOTIFY messages out during the do_monitor pvt traversal. This clears out the problem of having to hold the context lock before the tech_pvt lock entirely.

(closes issue ASTERISK-16593)
Reported by: zerohalo
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=286070

By: Digium Subversion (svnbot) 2010-09-10 15:04:43
Repository: asterisk
Revision: 286071

_U branches/1.6.2/

------------------------------------------------------------------------
r286071 | dvossel | 2010-09-10 15:04:43 -0500 (Fri, 10 Sep 2010) | 37 lines

Blocked revisions 286070 via svnmerge

........
r286070 | dvossel | 2010-09-10 15:03:50 -0500 (Fri, 10 Sep 2010) | 32 lines

Fixes sip extension state update DEADLOCK

(closes issue ASTERISK-16593)
Reported by: zerohalo
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=286071 |
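
For readers following the r286070 commit message, here is a minimal sketch of the queue-and-drain idea it describes. This is not the actual chan_sip patch. Every name in it (`state_update`, `queue_lock`, `pvt_lock`, `queue_state_update`, `drain_state_updates`, `notifies_sent`) is hypothetical, and plain pthread mutexes stand in for Asterisk's own lock wrappers. The point it illustrates is that the state callback only touches a small queue lock, so the context and hints locks are never held while a tech_pvt lock is taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queued update; not the real chan_sip structure. */
struct state_update {
    char exten[32];
    int state;
    struct state_update *next;
};

/* Protects only the queue. Never held while taking any other lock. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static struct state_update *queue_head, *queue_tail;

/* Stand-in for a dialog's tech_pvt lock (assumption: one dialog only). */
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER;
static int notifies_sent;

/*
 * Would be called from the extension state callback, where the core
 * already holds the context and hints list locks. It takes queue_lock
 * only, so the "context lock -> tech_pvt lock" order never occurs.
 */
static int queue_state_update(const char *exten, int state)
{
    struct state_update *u;

    assert(exten != NULL);
    u = calloc(1, sizeof(*u));
    if (!u) {
        return -1;
    }
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(u->exten, exten, sizeof(u->exten) - 1);
    u->state = state;

    pthread_mutex_lock(&queue_lock);
    if (queue_tail) {
        queue_tail->next = u;
    } else {
        queue_head = u;
    }
    queue_tail = u;
    pthread_mutex_unlock(&queue_lock);
    return 0;
}

/*
 * Would be called from the monitor thread, which holds no context
 * lock. The whole list is detached first, then each entry is handled
 * under pvt_lock. Returns the number of updates processed.
 */
static int drain_state_updates(void)
{
    struct state_update *list, *next;
    int sent = 0;

    pthread_mutex_lock(&queue_lock);
    list = queue_head;
    queue_head = queue_tail = NULL;
    pthread_mutex_unlock(&queue_lock);

    for (; list; list = next) {
        next = list->next;
        pthread_mutex_lock(&pvt_lock);
        notifies_sent++;    /* placeholder for sending a SIP NOTIFY */
        pthread_mutex_unlock(&pvt_lock);
        free(list);
        sent++;
    }
    return sent;
}
```

In this sketch the callback side would call `queue_state_update("100", 1)` and return immediately, while the monitor loop would call `drain_state_updates()` once per pass. Detaching the list before processing keeps `queue_lock` hold times short and means `queue_lock` and `pvt_lock` are never held together.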