[Home]

Summary:ASTERISK-12250: Deadlock
Reporter:callguy (callguy)Labels:
Date Opened:2008-06-23 14:37:29Date Closed:2011-06-07 14:00:19
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:We're seeing 1.4.20.1 deadlock periodically, but not permanently at first. Most of the time the output of sip show peers or core show locks will not return anything, then after a lengthy delay they will scroll across the buffer.

call processing also stops completely during this time, then will resume. after a few cycles the box stops responding in any meaningful way. output of two core show locks captures attached.

****** ADDITIONAL INFORMATION ******

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3345312 (do_monitor           started at [15834] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 15778 do_monitor &monlock 0x791720 (1)
=== ---> Waiting for Lock #1 (chan_sip.c): MUTEX 15799 do_monitor &(&peerl)->_lock 0x7939a0 (1)
=== --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers)
=== -------------------------------------------------------------------
===
=== Thread ID: 103148448 (pbx_thread           started at [ 2660] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1424 ast_hangup &chan->lock 0x9475b20 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 3530 sip_hangup &p->lock 0x884df68 (1)
=== ---> Waiting for Lock #2 (chan_sip.c): MUTEX 2669 find_peer &(&peerl)->_lock 0x7939a0 (1)
=== --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers)
=== -------------------------------------------------------------------
===
=== Thread ID: 91798432 (session_do           started at [ 2399] manager.c accept_thread())
=== ---> Lock #0 (manager.c): RDLOCK 2159 process_message &actionlock 0x8172f00 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 10075 _sip_show_peers &(&peerl)->_lock 0x7939a0 (1)
=== ---> Lock #2 (chan_sip.c): MUTEX 10075 _sip_show_peers &(iterator)->_lock 0x8513bbc (1)
=== ---> Lock #3 (manager.c): MUTEX 483 astman_append &s->__lock 0x843853c (1)
=== -------------------------------------------------------------------
===
=== Thread ID: 15997856 (netconsole           started at [ 1011] asterisk.c listener())
=== ---> Waiting for Lock #0 (chan_sip.c): MUTEX 10075 _sip_show_peers &(&peerl)->_lock 0x7939a0 (1)
=== --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers)

---------------------

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked) ===
=== Thread ID: 3345312 (do_monitor           started at [15834] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 15778 do_monitor &monlock 0x791720 (1) === ---> Waiting for Lock #1 (chan_sip.c): MUTEX 15799 do_monitor &(&peerl)->_lock 0x7939a0 (1) === --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers) === -------------------------------------------------------------------
===
=== Thread ID: 3085618080 (netconsole           started at [ 1011] asterisk.c listener())
=== ---> Waiting for Lock #0 (chan_sip.c): MUTEX 10075 _sip_show_peers &(&peerl)->_lock 0x7939a0 (1) === --- ---> Locked Here: chan_sip.c line 10075 (_sip_show_peers) === -------------------------------------------------------------------
===
=== Thread ID: 116587424 (session_do           started at [ 2399] manager.c accept_thread())
=== ---> Lock #0 (manager.c): RDLOCK 2159 process_message &actionlock 0x8172f00 (1) === ---> Lock #1 (chan_sip.c): MUTEX 10075 _sip_show_peers &(&peerl)->_lock 0x7939a0 (1) === ---> Lock #2 (chan_sip.c): MUTEX 10075 _sip_show_peers &(iterator)->_lock 0xb7ae978c (1) === ---> Lock #3 (manager.c): MUTEX 483 astman_append &s->__lock 0x843853c (1) === -------------------------------------------------------------------
===
Comments:By: Mark Michelson (mmichelson) 2008-06-23 15:22:53

Was this "core show locks" output taken during one of the temporary freezes or during the final for-real lockup?

By: callguy (callguy) 2008-06-23 15:26:09

putnopvut: the 2nd one was closer to the final lockup, at this point the machine is up but there is no output from core show locks at all, just this:

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=======================================================================

By: Tilghman Lesher (tilghman) 2008-07-02 15:12:00

It is highly likely that this was fixed with the recent release of 1.4.21.1.  Could you update to that version and check, please?

By: callguy (callguy) 2008-07-02 15:31:17

Corydon76: I've been working with putnopvut on this, we can check that version but this wasn't determined to be an actual deadlock, but a resource starvation issue in certain situations related to manager connections.

By: agjohnson_tv (agjohnson_tv) 2008-07-22 11:25:19

I have experienced a very similair problem. I am using 1.4.21.1.  At the time this was taken, the system was using 50% cpu when there were no actual active calls.  'show channels' displayed 3 calls, though no calls were actually active.

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3033103248 (pbx_thread           started at [ 2660] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2580 ast_write (channel lock) 0x9d65a08 (1))->lock 0x9a17f10 (1)
=== -------------------------------------------------------------------
===
=== Thread ID: 3064003472 (ast_httpd_helper_thread started at [  548] http.c http_root())
=== ---> Lock #0 (http.c): RDLOCK 342 handle_uri &uris_lock 0x8187000 (1)
=== ---> Lock #1 (manager.c): RDLOCK 2200 process_message &actionlock 0x81871e0(1)
=== -------------------------------------------------------------------
===
=== Thread ID: 3072277392 (ast_httpd_helper_thread started at [  548] http.c http_root())
=== ---> Lock #0 (http.c): RDLOCK 342 handle_uri &uris_lock 0x8187000 (1)
=== ---> Lock #1 (manager.c): RDLOCK 2200 process_message &actionlock 0x81871e0(1)
=== -------------------------------------------------------------------
===
=======================================================================

Manually doing a soft-hangup on the active calls resulted in the locks going away.
I don't know if it is related, but I hope it helps.



By: Russell Bryant (russell) 2008-10-05 17:20:09

This is not actually a bug.  The code is doing exactly what we would expect.  The good is news is that I would expect this type of usage to work _much_ better in 1.6.1 or later.

The way the data structures are managed in chan_sip in versions prior to 1.6.1, access to the list of all SIP peers will be completely blocked while any sort of "sip show peers" type operation is performed.