[Home]

Summary:ASTERISK-15072: Revision 202007 Introduces Deadlock
Reporter:paul-tg (paul-tg)Labels:
Date Opened:2009-11-03 19:23:35.000-0600Date Closed:2009-11-10 16:24:28.000-0600
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_sip/CodecHandling
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:After upgrading from 1.6.0-beta9 to 1.6.2-rc2, we are experiencing deadlocks in try_suggested_sip_codec().  "core show locks" points to code introduced in revision 202007, which ironically was intended to protect against deadlocks.  The Asterisk build is completely stock except for a modified SIP response code in chan_sip.c:handle_request_invite().

We have had two deadlocks so far in as many weeks, both times during periods of lower than average call load.

****** ADDITIONAL INFORMATION ******

# sudo /usr/sbin/asterisk -rvvvx "core show locks"
Asterisk 1.6.2.0-rc2, Copyright (C) 1999 - 2009 Digium, Inc. and others.
Created by Mark Spencer <markster@digium.com>
Asterisk comes with ABSOLUTELY NO WARRANTY; type 'core show warranty' for details.
This is free software, with components licensed under the GNU General Public
License version 2 and other licenses; you are welcome to redistribute it under
certain conditions. Type 'core show license' for details.
=========================================================================
 == Parsing '/etc/asterisk/asterisk.conf':   == Found
 == Parsing '/etc/asterisk/extconfig.conf':   == Found
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 1078311232 (do_monitor           started at [21909] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 5932 try_suggested_sip_codec p 0x2aaad0129d60 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee]
       /usr/sbin/asterisk(_ao2_lock+0x290) [0x43d280]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac416f17d]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac4174284]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac417828b]
       /usr/sbin/asterisk(ast_sched_runq+0x1e3) [0x55ac13]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac41ce643]
       /usr/sbin/asterisk [0x57de33]
       /lib64/libpthread.so.0 [0x3abfa062f7]
       /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d]
=== ---> Lock #1 (chan_sip.c): MUTEX 21881 do_monitor &monlock 0x2aaac43f6f20 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac41ced16]
       /usr/sbin/asterisk [0x57de33]
       /lib64/libpthread.so.0 [0x3abfa062f7]
       /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d]
=== ---> Tried and failed to get Lock #2 (chan_sip.c): MUTEX 3566 retrans_pkt &pkt->owner->owner->lock_dont_use 0x2aaad00fa1e0 (0)                                              
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac417dc51]
       /usr/sbin/asterisk(ast_sched_runq+0x1e3) [0x55ac13]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac41ce643]
       /usr/sbin/asterisk [0x57de33]
       /lib64/libpthread.so.0 [0x3abfa062f7]
       /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d]
=== -------------------------------------------------------------------
===
=== Thread ID: 1099135296 (pbx_thread           started at [ 4546] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1709 ast_hangup &chan->lock_dont_use 0x2aaad00fa1e0 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee]
       /usr/sbin/asterisk [0x46189e]
       /usr/sbin/asterisk(ast_hangup+0x322) [0x4743e2]
       /usr/sbin/asterisk [0x5296c7]
       /usr/sbin/asterisk [0x52ab3b]
       /usr/sbin/asterisk [0x57de33]
       /lib64/libpthread.so.0 [0x3abfa062f7]
       /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d]
=== ---> Waiting for Lock #1 (chan_sip.c): MUTEX 5767 sip_hangup p 0x2aaad0129d60 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee]
       /usr/sbin/asterisk(_ao2_lock+0x290) [0x43d280]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac417dd0f]
       /usr/sbin/asterisk(ast_sched_runq+0x1e3) [0x55ac13]
       /usr/lib/asterisk/modules/chan_sip.so [0x2aaac41ce643]
       /usr/sbin/asterisk [0x57de33]
       /lib64/libpthread.so.0 [0x3abfa062f7]
       /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d]
=== --- ---> Locked Here: chan_sip.c line 5932 (try_suggested_sip_codec)
=== -------------------------------------------------------------------
===
=======================================================================
Comments:By: Leif Madsen (lmadsen) 2009-11-04 08:46:42.000-0600

Just wanted to verify that you're still seeing this change and issue in something newer than 1.6.2.0-rc2?

By: Leif Madsen (lmadsen) 2009-11-04 08:52:23.000-0600

Additionally, this is marked as crash; do you have a backtrace, or is this not actually crashing?

By: paul-tg (paul-tg) 2009-11-04 17:45:00.000-0600

Apologies for that -- I marked it as a crash because that seemed to be the most accurate description.  The symptom is a deadlock; Asterisk stops responding, but there is no crash so unfortunately I haven't a core dump for you.  I was mulling over instructing Operations to force a crash via SIGSEGV, but decided that a core wouldn't be useful; would it be?

I would verify this in 1.6.2.0-rc3, but the issue occurs on a production server and its deadlocking costs us money :-).  I believe that try_suggested_sip_codec() has not changed in the interim though (although it has changed in trunk), so unless the other "side" of the deadlock has been fixed since, this issue is unlikely to have been resolved.

In order to avoid this in future, I have reverted try_suggested_sip_codec() to how it was before 1.6.2.0-beta4, and expect the problem not to recur.

By: Leif Madsen (lmadsen) 2009-11-05 09:29:00.000-0600

When the deadlock happens, you could attach to the existing Asterisk process and gather a backtrace that way. Once attached, you could run:

thread apply all bt
bt
bt full

And that information may be useful for a developer in tracking down the issue.

By: Matthew Nicholson (mnicholson) 2009-11-10 09:22:23.000-0600

The deadlock is caused because my modifications to try_suggested_sip_codec expect the sip_pvt to be locked.  In some cases it is not locked when try_suggested_sip_codec is called.  In these cases, during the deadlock protection code the pvt is locked and never unlocked causing your deadlock.  I am going to revert this change as I do not think it is necessary in this branch.

By: Digium Subversion (svnbot) 2009-11-10 09:22:43.000-0600

Repository: asterisk
Revision: 229088

_U  branches/1.6.2/UPGRADE-1.6.txt
_U  branches/1.6.2/UPGRADE.txt
U   branches/1.6.2/channels/chan_sip.c
_U  branches/1.6.2/main/editline/makelist.in

------------------------------------------------------------------------
r229088 | mnicholson | 2009-11-10 09:22:42 -0600 (Tue, 10 Nov 2009) | 6 lines

Reverted revision 202007.

(closes issue ASTERISK-15072)
Reported by: paul-tg
Tested by: paul-tg

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=229088

By: Digium Subversion (svnbot) 2009-11-10 09:28:08.000-0600

Repository: asterisk
Revision: 229091

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r229091 | mnicholson | 2009-11-10 09:28:07 -0600 (Tue, 10 Nov 2009) | 5 lines

Reverted revision 202022.

(closes issue ASTERISK-15072)
Reported by: paul-tg

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=229091

By: Digium Subversion (svnbot) 2009-11-10 09:44:14.000-0600

Repository: asterisk
Revision: 229099

U   branches/1.6.1/channels/chan_sip.c

------------------------------------------------------------------------
r229099 | mnicholson | 2009-11-10 09:44:14 -0600 (Tue, 10 Nov 2009) | 5 lines

Reverted revision 202008.

(closes issue ASTERISK-15072)
Reported by: paul-tg

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=229099

By: Digium Subversion (svnbot) 2009-11-10 09:47:40.000-0600

Repository: asterisk
Revision: 229100

U   branches/1.6.0/channels/chan_sip.c

------------------------------------------------------------------------
r229100 | mnicholson | 2009-11-10 09:47:40 -0600 (Tue, 10 Nov 2009) | 5 lines

Reverted revision 202006.

(closes issue ASTERISK-15072)
Reported by: paul-tg

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=229100