Summary: ASTERISK-16593: ast_add_hint deadlock in pbx.c MUTEX ast_hint_state_changed
Reporter: Alan Graham (zerohalo)
Labels:
Date Opened: 2010-08-19 13:33:20
Date Closed: 2010-11-02 09:27:57
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/PBX
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) bt_scrubbed.txt
             (1) csl.txt
Description: Getting deadlocks randomly; call processing stops. It appears to happen when the hint state changes as a call is hung up.

Running 1.4.34 with the patch from ASTERISK-16365.

'core show locks' and backtrace from ast_grab_core attached
Comments:

By: Alan Graham (zerohalo) 2010-08-23 12:37:46

Getting these multiple times a day, sometimes during an extensions reload, sometimes not. I don't see any real pattern to when they happen.

By: Alan Graham (zerohalo) 2010-08-31 09:59:13

Is there anything else I can provide that would help with this?

By: Digium Subversion (svnbot) 2010-09-10 15:03:52

Repository: asterisk
Revision: 286070

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r286070 | dvossel | 2010-09-10 15:03:51 -0500 (Fri, 10 Sep 2010) | 32 lines

Fixes sip extension state update DEADLOCK

PROBLEM:
In chan_sip, and all the other channel drivers, it is common for
us to hold the tech_pvt lock while we ask the Asterisk core about
an extension and context.  Every time we do this the locking
order becomes, (1. tech_pvt lock ---> 2. global context lock). In
chan_sip when a dialog subscribes to a hint, that locking order
is reversed in the extensionstate callback which will occur outside
of the channel_driver's monitor loop.  So, on an extension state
update we have (1. global context lock ----> 2. tech_pvt lock).

Typically when we have to do a reversed locking order like this
we'd just do some sort of deadlock avoidance to fix the problem...
That will not work here.  There are more locks involved here than
just the context and tech_pvt.  Those are the two that are colliding,
but it is impossible to give up the context lock because the global
hints list lock MUST be held as well and we can not give that lock
up during the extensionstate callback traversal... The locking order
for the context and hints are (1. global context lock ----> 2.
hints list lock).  Deadlock avoidance is not an option here.

SOLUTION:
The solution this patch implements is to queue the extension state updates
into a list and send the NOTIFY messages out during the do_monitor pvt
traversal.  This clears out the problem of having to hold the context
lock before the tech_pvt lock entirely.

(closes issue ASTERISK-16593)
Reported by: zerohalo


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=286070
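
For readers who want the shape of the fix described in r286070, here is a minimal sketch of the queue-and-drain pattern using plain pthreads. It is not the code from chan_sip.c: every name in it (context_lock, pvt_lock, queue_lock, extensionstate_cb, drain_updates, core_state_changed) is a hypothetical stand-in, and the printf only marks where a real driver would send its NOTIFY. The point it illustrates is that the callback, which runs with the context lock held, never touches the pvt lock; the pvt lock is taken later, from the monitor side, with no context lock held.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the names used in chan_sip.c. */
struct state_update {
    char exten[32];
    int state;
    struct state_update *next;
};

static pthread_mutex_t context_lock = PTHREAD_MUTEX_INITIALIZER; /* "global context lock" */
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER;     /* "tech_pvt lock" */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;   /* guards only the pending list */
static struct state_update *head, *tail;

/* Runs with context_lock held. It only queues the update and never takes
 * pvt_lock, so the order context lock -> tech_pvt lock can no longer occur. */
int extensionstate_cb(const char *exten, int state)
{
    struct state_update *u = calloc(1, sizeof(*u));
    if (!u) {
        return -1;
    }
    snprintf(u->exten, sizeof(u->exten), "%s", exten);
    u->state = state;

    pthread_mutex_lock(&queue_lock);
    if (tail) {
        tail->next = u;
    } else {
        head = u;
    }
    tail = u;
    pthread_mutex_unlock(&queue_lock);
    return 0;
}

/* Runs from the monitor loop with no context lock held. Detaches the whole
 * pending list, then takes pvt_lock once per update. Returns the number sent. */
int drain_updates(void)
{
    struct state_update *list;
    int sent = 0;

    pthread_mutex_lock(&queue_lock);
    list = head;
    head = tail = NULL;
    pthread_mutex_unlock(&queue_lock);

    while (list) {
        struct state_update *next = list->next;

        pthread_mutex_lock(&pvt_lock);
        /* A real driver would build and send the NOTIFY here. */
        printf("NOTIFY %s state=%d\n", list->exten, list->state);
        pthread_mutex_unlock(&pvt_lock);

        free(list);
        list = next;
        sent++;
    }
    return sent;
}

/* Simulates the core reporting a hint state change under the context lock. */
void core_state_changed(const char *exten, int state)
{
    pthread_mutex_lock(&context_lock);
    extensionstate_cb(exten, state);
    pthread_mutex_unlock(&context_lock);
}
```

In this sketch any thread may call core_state_changed(), while only the monitor thread calls drain_updates(). The extra queue_lock is held only for pointer updates and never while another lock is being acquired, so it does not introduce a new ordering constraint.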

By: Digium Subversion (svnbot) 2010-09-10 15:04:43

Repository: asterisk
Revision: 286071

_U  branches/1.6.2/

------------------------------------------------------------------------
r286071 | dvossel | 2010-09-10 15:04:43 -0500 (Fri, 10 Sep 2010) | 37 lines

Blocked revisions 286070 via svnmerge

........
 r286070 | dvossel | 2010-09-10 15:03:50 -0500 (Fri, 10 Sep 2010) | 32 lines
 
 Fixes sip extension state update DEADLOCK
 
 PROBLEM:
 In chan_sip, and all the other channel drivers, it is common for
 us to hold the tech_pvt lock while we ask the Asterisk core about
 an extension and context.  Every time we do this the locking
 order becomes, (1. tech_pvt lock ---> 2. global context lock). In
 chan_sip when a dialog subscribes to a hint, that locking order
 is reversed in the extensionstate callback which will occur outside
 of the channel_driver's monitor loop.  So, on an extension state
 update we have (1. global context lock ----> 2. tech_pvt lock).
 
 Typically when we have to do a reversed locking order like this
 we'd just do some sort of deadlock avoidance to fix the problem...
 That will not work here.  There are more locks involved here than
 just the context and tech_pvt.  Those are the two that are colliding,
 but it is impossible to give up the context lock because the global
 hints list lock MUST be held as well and we can not give that lock
 up during the extensionstate callback traversal... The locking order
 for the context and hints are (1. global context lock ----> 2.
 hints list lock).  Deadlock avoidance is not an option here.
 
 SOLUTION:
 The solution this patch implements is to queue the extension state updates
 into a list and send the NOTIFY messages out during the do_monitor pvt
 traversal.  This clears out the problem of having to hold the context
 lock before the tech_pvt lock entirely.
 
 (closes issue ASTERISK-16593)
 Reported by: zerohalo
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=286071