Summary: | ASTERISK-12250: Deadlock | ||
Reporter: | callguy (callguy) | Labels: | |
Date Opened: | 2008-06-23 14:37:29 | Date Closed: | 2011-06-07 14:00:19 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | We're seeing 1.4.20.1 deadlock periodically, but not permanently at first. Most of the time the output of sip show peers or core show locks will not return anything, then after a lengthy delay they will scroll across the buffer. call processing also stops completely during this time, then will resume. after a few cycles the box stops responding in any meaningful way. output of two core show locks captures attached. ****** ADDITIONAL INFORMATION ****** ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <file> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: 3345312 (do_monitor started at [15834] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 15778 do_monitor &monlock 0x791720 (1) === ---> Waiting for Lock #1 (chan_sip.c): MUTEX 15799 do_monitor &(&peerl)->_lock 0x7939a0 (1) === --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers) === ------------------------------------------------------------------- === === Thread ID: 103148448 (pbx_thread started at [ 2660] pbx.c ast_pbx_start()) === ---> Lock #0 (channel.c): MUTEX 1424 ast_hangup &chan->lock 0x9475b20 (1) === ---> Lock #1 (chan_sip.c): MUTEX 3530 sip_hangup &p->lock 0x884df68 (1) === ---> Waiting for Lock #2 (chan_sip.c): MUTEX 2669 find_peer &(&peerl)->_lock 0x7939a0 (1) === --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers) === ------------------------------------------------------------------- === === Thread ID: 91798432 (session_do started at [ 2399] manager.c accept_thread()) === ---> Lock #0 (manager.c): RDLOCK 2159 process_message &actionlock 0x8172f00 (1) === ---> Lock #1 (chan_sip.c): MUTEX 10075 _sip_show_peers &(&peerl)->_lock 0x7939a0 (1) === ---> Lock #2 (chan_sip.c): MUTEX 10075 _sip_show_peers &(iterator)->_lock 0x8513bbc (1) === ---> Lock #3 (manager.c): MUTEX 483 astman_append &s->__lock 0x843853c (1) === ------------------------------------------------------------------- === === Thread ID: 15997856 (netconsole started at [ 1011] asterisk.c listener()) === ---> Waiting for Lock #0 (chan_sip.c): MUTEX 10075 _sip_show_peers &(&peerl)->_lock 0x7939a0 (1) === --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers) --------------------- ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <file> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: 3345312 (do_monitor started at [15834] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 15778 do_monitor &monlock 0x791720 (1) === ---> Waiting for Lock #1 (chan_sip.c): MUTEX 15799 do_monitor &(&peerl)->_lock 0x7939a0 (1) === --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers) === ------------------------------------------------------------------- === === Thread ID: 3085618080 (netconsole started at [ 1011] asterisk.c listener()) === ---> Waiting for Lock #0 (chan_sip.c): MUTEX 10075 _sip_show_peers &(&peerl)->_lock 0x7939a0 (1) === --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers) === ------------------------------------------------------------------- === === Thread ID: 116587424 (session_do started at [ 2399] manager.c accept_thread()) === ---> Lock #0 (manager.c): RDLOCK 2159 process_message &actionlock 0x8172f00 (1) === ---> Lock #1 (chan_sip.c): MUTEX 10075 _sip_show_peers &(&peerl)->_lock 0x7939a0 (1) === ---> Lock #2 (chan_sip.c): MUTEX 10075 _sip_show_peers &(iterator)->_lock 0xb7ae978c (1) === ---> Lock #3 (manager.c): MUTEX 483 astman_append &s->__lock 0x843853c (1) === ------------------------------------------------------------------- === | ||
Comments: | By: Mark Michelson (mmichelson) 2008-06-23 15:22:53 Was this "core show locks" output taken during one of the temporary freezes or during the final for-real lockup? By: callguy (callguy) 2008-06-23 15:26:09 putnopvut: the 2nd one was closer to the final lockup, at this point the machine is up but there is no output from core show locks at all, just this: ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <file> <line num> <function> <lock name> <lock addr> (times locked) === ======================================================================= By: Tilghman Lesher (tilghman) 2008-07-02 15:12:00 It is highly likely that this was fixed with the recent release of 1.4.21.1. Could you update to that version and check, please? By: callguy (callguy) 2008-07-02 15:31:17 Corydon76: I've been working with putnopvut on this, we can check that version but this wasn't determined to be an actual deadlock, but a resource starvation issue in certain situations related to manager connections. By: agjohnson_tv (agjohnson_tv) 2008-07-22 11:25:19 I have experienced a very similair problem. I am using 1.4.21.1. At the time this was taken, the system was using 50% cpu when there were no actual active calls. 'show channels' displayed 3 calls, though no calls were actually active. ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <file> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: 3033103248 (pbx_thread started at [ 2660] pbx.c ast_pbx_start()) === ---> Lock #0 (channel.c): MUTEX 2580 ast_write (channel lock) 0x9d65a08 (1))->lock 0x9a17f10 (1) === ------------------------------------------------------------------- === === Thread ID: 3064003472 (ast_httpd_helper_thread started at [ 548] http.c http_root()) === ---> Lock #0 (http.c): RDLOCK 342 handle_uri &uris_lock 0x8187000 (1) === ---> Lock #1 (manager.c): RDLOCK 2200 process_message &actionlock 0x81871e0(1) === ------------------------------------------------------------------- === === Thread ID: 3072277392 (ast_httpd_helper_thread started at [ 548] http.c http_root()) === ---> Lock #0 (http.c): RDLOCK 342 handle_uri &uris_lock 0x8187000 (1) === ---> Lock #1 (manager.c): RDLOCK 2200 process_message &actionlock 0x81871e0(1) === ------------------------------------------------------------------- === ======================================================================= Manually doing a soft-hangup on the active calls resulted in the locks going away. I don't know if it is related, but I hope it helps. By: Russell Bryant (russell) 2008-10-05 17:20:09 This is not actually a bug. The code is doing exactly what we would expect. The good is news is that I would expect this type of usage to work _much_ better in 1.6.1 or later. The way the data structures are managed in chan_sip in versions prior to 1.6.1, access to the list of all SIP peers will be completely blocked while any sort of "sip show peers" type operation is performed. |