[Home]

Summary:ASTERISK-17950: [patch] Deadlock in queue handling
Reporter:Byron Clark (byronclark)Labels:
Date Opened:2011-06-01 13:11:25Date Closed:2011-12-19 11:58:50.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Applications/app_queue
Versions:1.8.4 Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) ASTERISK-17950.patch
( 1) backtrace.full.txt
( 2) bug19397.diff
Description:Hit a deadlock on queue handling on a lightly loaded system. Output of "core show locks" in the Additional Information section.

It looks like BETTER_BACKTRACES doesn't really work when the debug information lives in a separate (installed) debuginfo package. Backtrace attached to the ticket.

Here's a quick analysis (thread nubmers are gdb/core show locks):
Thread 20/1081149760
====================
Locks Held
----------
hints (pbx.c:4269)
hint (pbx.c:4270)

Blocking On
-----------
queues (app_queue.c:1563)


Thread 15/1088944448
====================
Locks Held
----------
dialogs (astobj2.c:657)
dialog [sip_pvt] (chan_sip.c:16147)

Blocking On
-----------
hints (pbx.c:4425)


Thread 11/1081657664
====================
Locks Held
----------

Blocking On
-----------
queues (app_queue.c:1463)


Thread 5/1102928192
===================
Locks Held
----------
queues (app_queue.c:4427)

Blocking On
-----------
dialogs (chan_sip.c:7358)


Thread 3/1109186880
===================
Locks Held
----------

Blocking On
-----------
queues (app_queue:2337)

****** ADDITIONAL INFORMATION ******

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 1081149760 (tps_processing_function started at [  451] taskprocessor.c ast_taskprocessor_get())
=== ---> Lock #0 (pbx.c): MUTEX 4269 handle_statechange hints 0x8ade080 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
asterisk <unknown>()
asterisk <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== ---> Lock #1 (pbx.c): MUTEX 4270 handle_statechange hint 0xa84bb50 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
asterisk <unknown>()
asterisk <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== ---> Waiting for Lock #2 (astobj2.c): MUTEX 842 internal_ao2_iterator_next a->c 0xa07d070 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
asterisk <unknown>()
asterisk __ao2_callback() (0x43d930+31)
app_queue.so <unknown>()
app_queue.so <unknown>()
asterisk pbx_exec() (0x4d91e0+CF)
res_agi.so <unknown>()
res_agi.so <unknown>()
res_agi.so <unknown>()
asterisk pbx_exec() (0x4d91e0+CF)
asterisk <unknown>()
asterisk ast_spawn_extension() (0x4e3760+30)
asterisk <unknown>()
asterisk <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== --- ---> Locked Here: app_queue.c line 4427 (try_calling)
=== -------------------------------------------------------------------
===
=== Thread ID: 1088944448 (do_monitor           started at [24734] chan_sip.c restart_monitor())
=== ---> Lock #0 (astobj2.c): MUTEX 657 internal_ao2_callback c 0x2aaaac1b3f40 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
asterisk <unknown>()
asterisk __ao2_callback() (0x43d930+31)
chan_sip.so <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== ---> Waiting for Lock #1 (astobj2.c): MUTEX 657 internal_ao2_callback c 0x8ade080 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
asterisk <unknown>()
asterisk __ao2_callback() (0x43d930+31)
asterisk ast_extension_state_del() (0x4d2160+AB)
chan_sip.so <unknown>()
chan_sip.so <unknown>()
asterisk <unknown>()
asterisk __ao2_callback() (0x43d930+31)
chan_sip.so <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== --- ---> Locked Here: pbx.c line 4269 (handle_statechange)
=== -------------------------------------------------------------------
===
=== Thread ID: 1081657664 (tps_processing_function started at [  451] taskprocessor.c ast_taskprocessor_get())
=== ---> Waiting for Lock #0 (astobj2.c): MUTEX 842 internal_ao2_iterator_next a->c 0xa07d070 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
asterisk <unknown>()
asterisk __ao2_callback() (0x43d930+31)
app_queue.so <unknown>()
app_queue.so <unknown>()
asterisk pbx_exec() (0x4d91e0+CF)
res_agi.so <unknown>()
res_agi.so <unknown>()
res_agi.so <unknown>()
asterisk pbx_exec() (0x4d91e0+CF)
asterisk <unknown>()
asterisk ast_spawn_extension() (0x4e3760+30)
asterisk <unknown>()
asterisk <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== --- ---> Locked Here: app_queue.c line 4427 (try_calling)
=== -------------------------------------------------------------------
===
=== Thread ID: 1102928192 (pbx_thread           started at [ 5038] pbx.c ast_pbx_start())
=== ---> Lock #0 (app_queue.c): MUTEX 4427 try_calling queues 0xa07d070 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
app_queue.so <unknown>()
app_queue.so <unknown>()
asterisk pbx_exec() (0x4d91e0+CF)
res_agi.so <unknown>()
res_agi.so <unknown>()
res_agi.so <unknown>()
asterisk pbx_exec() (0x4d91e0+CF)
asterisk <unknown>()
asterisk ast_spawn_extension() (0x4e3760+30)
asterisk <unknown>()
asterisk <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== ---> Waiting for Lock #1 (astobj2.c): MUTEX 493 internal_ao2_link c 0x2aaaac1b3f40 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
asterisk <unknown>()
asterisk __ao2_callback() (0x43d930+31)
asterisk __ao2_unlink() (0x43da00+58)
chan_sip.so <unknown>()
chan_sip.so <unknown>()
asterisk <unknown>()
asterisk __ao2_callback() (0x43d930+31)
chan_sip.so <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== --- ---> Locked Here: astobj2.c line 657 (internal_ao2_callback)
=== -------------------------------------------------------------------
===
=== Thread ID: 1109186880 (pbx_thread           started at [ 5038] pbx.c ast_pbx_start())
=== ---> Waiting for Lock #0 (astobj2.c): MUTEX 657 internal_ao2_callback c 0xa07d070 (1)
asterisk ast_bt_get_addresses() (0x4b7440+E)
asterisk __ast_pthread_mutex_lock() (0x4b3d20+BB)
asterisk <unknown>()
asterisk __ao2_callback() (0x43d930+31)
app_queue.so <unknown>()
app_queue.so <unknown>()
asterisk pbx_exec() (0x4d91e0+CF)
res_agi.so <unknown>()
res_agi.so <unknown>()
res_agi.so <unknown>()
asterisk pbx_exec() (0x4d91e0+CF)
asterisk <unknown>()
asterisk ast_spawn_extension() (0x4e3760+30)
asterisk <unknown>()
asterisk <unknown>()
asterisk <unknown>()
libpthread.so.0 <unknown>()
libc.so.6 clone() (0x37e12d3bc0+6D)
=== --- ---> Locked Here: app_queue.c line 4427 (try_calling)
=== -------------------------------------------------------------------
===
=======================================================================
Comments:By: Gregory Hinton Nietsky (irroot) 2011-06-01 14:18:59

https://reviewboard.asterisk.org/r/1166/diff/#index_header

look at pbx.c patch this makes it go away but not fixes it ... the patch is a mess looking at it again ... but the idea is there im one for DL avoidance to make the problem not affect production while its fixed

By: Byron Clark (byronclark) 2011-06-01 17:25:57

irroot - That patch isn't really going to help here because, unfortunately, this deadlock isn't caused by any of the paths that your patch attempts to fix.

By: Gregory Hinton Nietsky (irroot) 2011-06-02 02:41:00

Locking the hints container in ast_extension_state_del will ensure proper locking
order and prevent this lock by allowing the state change to complete before running how often this happen i suspect its a rare random ??

there is lots optomised out so not not 100% sure



By: Byron Clark (byronclark) 2011-06-02 12:11:33

irroot - The attached patch (bug19397.diff) doesn't fix the issue. The problem, in this case, is that the caller of ast_extension_state_del already holds the dialogs lock when it calls ao2_callback(hints, ...).  The call to ao2_callback results in the hints lock being taken anyways.  We get into the deadlock because that path attempts to take the hints lock while the dialogs lock is held.

Admittedly, the deadlock is pretty rare so far.

By: Byron Clark (byronclark) 2011-07-11 10:06:28.476-0500

[^ASTERISK-17950.patch] works around the deadlock by separating the removal of a dialog from the {{dialogs}} container from the actual clearing and deleting of the dialog.

By: Matt Jordan (mjordan) 2011-12-19 09:14:19.009-0600

Byron - the solution for ASTERISK-18747 in the 1.8 branch may have resolved this issue.  Can you retest with either the latest from the 1.8 branch or 1.8.8?

Thanks

Matt

By: Byron Clark (byronclark) 2011-12-19 09:23:39.564-0600

It looks like I dropped this patch after moving to 1.8.5.0 (and adding the patch from ASTERISK-17760). I haven't seen this deadlock since then so I consider the issue resolved; sorry I forgot to update it.