Summary: | ASTERISK-15072: Revision 202007 Introduces Deadlock | ||
Reporter: | paul-tg (paul-tg) | Labels: | |
Date Opened: | 2009-11-03 19:23:35.000-0600 | Date Closed: | 2009-11-10 16:24:28.000-0600 |
Priority: | Major | Regression? | Yes |
Status: | Closed/Complete | Components: | Channels/chan_sip/CodecHandling |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | After upgrading from 1.6.0-beta9 to 1.6.2-rc2, we are experiencing deadlocks in try_suggested_sip_codec(). "core show locks" points to code introduced in revision 202007, which ironically was intended to protect against deadlocks. The Asterisk build is completely stock except for a modified SIP response code in chan_sip.c:handle_request_invite(). We have had two deadlocks so far in as many weeks, both times during periods of lower than average call load. ****** ADDITIONAL INFORMATION ****** # sudo /usr/sbin/asterisk -rvvvx "core show locks" Asterisk 1.6.2.0-rc2, Copyright (C) 1999 - 2009 Digium, Inc. and others. Created by Mark Spencer <markster@digium.com> Asterisk comes with ABSOLUTELY NO WARRANTY; type 'core show warranty' for details. This is free software, with components licensed under the GNU General Public License version 2 and other licenses; you are welcome to redistribute it under certain conditions. Type 'core show license' for details. ========================================================================= == Parsing '/etc/asterisk/asterisk.conf': == Found == Parsing '/etc/asterisk/extconfig.conf': == Found ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: 1078311232 (do_monitor started at [21909] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 5932 try_suggested_sip_codec p 0x2aaad0129d60 (1) /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee] /usr/sbin/asterisk(_ao2_lock+0x290) [0x43d280] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac416f17d] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac4174284] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac417828b] /usr/sbin/asterisk(ast_sched_runq+0x1e3) [0x55ac13] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac41ce643] /usr/sbin/asterisk [0x57de33] /lib64/libpthread.so.0 [0x3abfa062f7] /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d] === ---> Lock #1 (chan_sip.c): MUTEX 21881 do_monitor &monlock 0x2aaac43f6f20 (1) /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac41ced16] /usr/sbin/asterisk [0x57de33] /lib64/libpthread.so.0 [0x3abfa062f7] /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d] === ---> Tried and failed to get Lock #2 (chan_sip.c): MUTEX 3566 retrans_pkt &pkt->owner->owner->lock_dont_use 0x2aaad00fa1e0 (0) /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac417dc51] /usr/sbin/asterisk(ast_sched_runq+0x1e3) [0x55ac13] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac41ce643] /usr/sbin/asterisk [0x57de33] /lib64/libpthread.so.0 [0x3abfa062f7] /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d] === ------------------------------------------------------------------- === === Thread ID: 1099135296 (pbx_thread started at [ 4546] pbx.c ast_pbx_start()) === ---> Lock #0 (channel.c): MUTEX 1709 ast_hangup &chan->lock_dont_use 0x2aaad00fa1e0 (1) /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee] /usr/sbin/asterisk [0x46189e] /usr/sbin/asterisk(ast_hangup+0x322) [0x4743e2] /usr/sbin/asterisk [0x5296c7] /usr/sbin/asterisk [0x52ab3b] /usr/sbin/asterisk [0x57de33] /lib64/libpthread.so.0 [0x3abfa062f7] /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d] === ---> Waiting for Lock #1 (chan_sip.c): MUTEX 5767 sip_hangup p 0x2aaad0129d60 (1) /usr/sbin/asterisk(ast_bt_get_addresses+0xe) [0x4edaee] /usr/sbin/asterisk(_ao2_lock+0x290) [0x43d280] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac417dd0f] /usr/sbin/asterisk(ast_sched_runq+0x1e3) [0x55ac13] /usr/lib/asterisk/modules/chan_sip.so [0x2aaac41ce643] /usr/sbin/asterisk [0x57de33] /lib64/libpthread.so.0 [0x3abfa062f7] /lib64/libc.so.6(clone+0x6d) [0x3abf2d1b6d] === --- ---> Locked Here: chan_sip.c line 5932 (try_suggested_sip_codec) === ------------------------------------------------------------------- === ======================================================================= | ||
Comments: | By: Leif Madsen (lmadsen) 2009-11-04 08:46:42.000-0600 Just wanted to verify that you're still seeing this change and issue in something newer than 1.6.2.0-rc2? By: Leif Madsen (lmadsen) 2009-11-04 08:52:23.000-0600 Additionally, this is marked as crash; do you have a backtrace, or is this not actually crashing? By: paul-tg (paul-tg) 2009-11-04 17:45:00.000-0600 Apologies for that -- I marked it as a crash because that seemed to be the most accurate description. The symptom is a deadlock; Asterisk stops responding, but there is no crash so unfortunately I haven't a core dump for you. I was mulling over instructing Operations to force a crash via SIGSEGV, but decided that a core wouldn't be useful; would it be? I would verify this in 1.6.2.0-rc3, but the issue occurs on a production server and its deadlocking costs us money :-). I believe that try_suggested_sip_codec() has not changed in the interim though (although it has changed in trunk), so unless the other "side" of the deadlock has been fixed since, this issue is unlikely to have been resolved. In order to avoid this in future, I have reverted try_suggested_sip_codec() to how it was before 1.6.2.0-beta4, and expect the problem not to recur. By: Leif Madsen (lmadsen) 2009-11-05 09:29:00.000-0600 When the deadlock happens, you could attach to the existing Asterisk process and gather a backtrace that way. Once attached, you could run: thread apply all bt bt bt full And that information may be useful for a developer in tracking down the issue. By: Matthew Nicholson (mnicholson) 2009-11-10 09:22:23.000-0600 The deadlock is caused because my modifications to try_suggested_sip_codec expect the sip_pvt to be locked. In some cases it is not locked when try_suggested_sip_codec is called. In these cases, during the deadlock protection code the pvt is locked and never unlocked causing your deadlock. I am going to revert this change as I do not think it is necessary in this branch. By: Digium Subversion (svnbot) 2009-11-10 09:22:43.000-0600 Repository: asterisk Revision: 229088 _U branches/1.6.2/UPGRADE-1.6.txt _U branches/1.6.2/UPGRADE.txt U branches/1.6.2/channels/chan_sip.c _U branches/1.6.2/main/editline/makelist.in ------------------------------------------------------------------------ r229088 | mnicholson | 2009-11-10 09:22:42 -0600 (Tue, 10 Nov 2009) | 6 lines Reverted revision 202007. (closes issue ASTERISK-15072) Reported by: paul-tg Tested by: paul-tg ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=229088 By: Digium Subversion (svnbot) 2009-11-10 09:28:08.000-0600 Repository: asterisk Revision: 229091 U branches/1.4/channels/chan_sip.c ------------------------------------------------------------------------ r229091 | mnicholson | 2009-11-10 09:28:07 -0600 (Tue, 10 Nov 2009) | 5 lines Reverted revision 202022. (closes issue ASTERISK-15072) Reported by: paul-tg ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=229091 By: Digium Subversion (svnbot) 2009-11-10 09:44:14.000-0600 Repository: asterisk Revision: 229099 U branches/1.6.1/channels/chan_sip.c ------------------------------------------------------------------------ r229099 | mnicholson | 2009-11-10 09:44:14 -0600 (Tue, 10 Nov 2009) | 5 lines Reverted revision 202008. (closes issue ASTERISK-15072) Reported by: paul-tg ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=229099 By: Digium Subversion (svnbot) 2009-11-10 09:47:40.000-0600 Repository: asterisk Revision: 229100 U branches/1.6.0/channels/chan_sip.c ------------------------------------------------------------------------ r229100 | mnicholson | 2009-11-10 09:47:40 -0600 (Tue, 10 Nov 2009) | 5 lines Reverted revision 202006. (closes issue ASTERISK-15072) Reported by: paul-tg ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=229100 |