Summary:ASTERISK-18205: Deadlock in app_queue when loading real-time queues and handling state change.
Reporter:Steven Wheeler (swheeler)
Labels:
Date Opened:2011-07-28 14:11:34
Date Closed:2011-07-28 19:04:11
Status:Closed/Complete
Components:Applications/app_queue PBX/General
Versions:Frequency of
duplicates ASTERISK-17760 [patch] deadlock in chan_sip
Environment:CentOS Linux 2.6.18-238.9.1.el5PAE #1 SMP Tue Apr 12 18:52:55 EDT 2011 i686 i686 i386 GNU/Linux
Attachments:( 0) core-show-locks.2011-07-28.txt
( 1) gdb.txt
Description:We are seeing deadlocks when a queue call loads the queue and its members from the real-time database at the same time that another thread is updating the state of a different agent.  This happens a few times a week when the queues are under higher-than-average load.  I have gathered the 'core show locks' output taken while the deadlock was occurring.  It indicates that the locks are being acquired out of order in one of the threads, but I don't know the Asterisk source well enough to know which order is correct.

tps_processing_function acquires in this order:
Lock #0 &conlock(0x8216240) in ast_rdlock_contexts(pbx.c:9367)
Lock #1 &(&hints)->lock(0x8217508) in handle_statechange(pbx.c:3861)
Waiting for Lock #2 &p->priv_data.lock(0xb7587f58) in ao2_lock(astobj2.c:164)  This lock is already held as #0 in pbx_thread thread.

pbx_thread acquires in this order:
Lock #0 queues(0xb7587f58) in load_realtime_queue(app_queue.c:1956)
Lock #1 q(0x90d2858) in find_queue_by_name_rt(app_queue.c:1803)
Lock #2 &conlock(0x8216240) in ast_rdlock_contexts(pbx.c:9367)
Waiting for Lock #3 &(&hints)->lock(0x8217508) in ast_add_hint(pbx.c:4076)  This lock is already held as #1 in tps_processing_function thread.
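
The two acquisition orders above can be compared mechanically. The following is a minimal sketch, not part of Asterisk: the short lock names and the helper find_inversions are hypothetical, chosen here for illustration, and the orders are typed in by hand from the listing above. The lock at 0xb7587f58 is called "queues" in both threads because the address is the same. Treating conlock as a shared lock is an assumption based on both threads holding it through ast_rdlock_contexts at the same time; it ignores any writer that might queue between the two readers.

```python
from itertools import combinations

# Acquisition order per thread, copied by hand from the report.
# The last entry in each list is the lock the thread is blocked on.
ORDERS = {
    "tps_processing_function": ["conlock", "hints", "queues"],
    "pbx_thread": ["queues", "q", "conlock", "hints"],
}

# Assumption: conlock is held in read (shared) mode by both threads,
# so two holders do not block each other.
SHARED = frozenset({"conlock"})


def ordered_pairs(order):
    """Return every (earlier, later) lock pair implied by one thread."""
    return set(combinations(order, 2))


def find_inversions(orders, shared=frozenset()):
    """Return lock pairs that two threads acquire in opposite order.

    Pairs that involve a lock listed in `shared` are skipped, since a
    shared lock cannot complete a two-thread cycle on its own.
    """
    inversions = set()
    for (_, first), (_, second) in combinations(orders.items(), 2):
        reverse = ordered_pairs(second)
        for a, b in ordered_pairs(first):
            if a in shared or b in shared:
                continue
            if (b, a) in reverse:
                inversions.add(frozenset((a, b)))
    return inversions


if __name__ == "__main__":
    for pair in find_inversions(ORDERS, SHARED):
        print("acquired in opposite order:", sorted(pair))
```

Under these assumptions the only remaining inversion is between the hints lock and the queues container, which matches the two "already held" notes in the listing: one thread holds hints and waits for queues, while the other holds queues and waits for hints.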

I will upload the full output as well as the core file.  Please let me know if there is any more information you need to debug this, and I will try to get it the next time the deadlock pops up.
Comments:
By: Steven Wheeler (swheeler) 2011-07-28 14:12:35.727-0500

Output of asterisk -rx 'core show locks' while the system was deadlocked.

By: Steven Wheeler (swheeler) 2011-07-28 14:13:32.750-0500

The core file is 136 MB so I can't upload it here.  I would be happy to perform any actions on it and upload the output.

By: Steven Wheeler (swheeler) 2011-07-28 14:16:06.269-0500

Output of gdb thread apply all bt.

By: Richard Mudgett (rmudgett) 2011-07-28 19:04:11.783-0500

Thanks for the report.  This is a duplicate of a deadlock that has already been fixed.  See ASTERISK-17760.