Summary: ASTERISK-11321: Possible deadlock on realtime queues.
Reporter: Fernando Lujan (flujan)
Labels:
Date Opened: 2008-01-29 07:07:32.000-0600
Date Closed: 2008-02-08 12:52:00.000-0600
Priority: Blocker
Regression?: No
Status: Closed/Complete
Components: Applications/app_queue
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) 11862.patch
(1) locks.txt
Description: After running for some time, the queues stop working. I can still place calls, and I can even add members to a queue, but the callers are not delivered to the agents.

I recompiled Asterisk with DONT_OPTIMIZE and DEBUG_THREADS, and the "core show locks" output for the problem follows.

****** ADDITIONAL INFORMATION ******

Scenario:
15 different queues.
One queue with a high traffic volume and 60 members.


show queue "queue name" and other commands also do not respond.
Comments:

By: Fernando Lujan (flujan) 2008-01-29 07:23:20.000-0600

I also checked the debug log, and Asterisk stops making queries to the database. Since it continues to write the CDR, the problem is probably in app_queue, not the pgsql connector.



By: Mark Michelson (mmichelson) 2008-01-29 11:41:35.000-0600

There is a three-way deadlock here, and like most deadlocks it is caused by an invalid locking order in certain cases. Two of the three threads involved in the deadlock (threads 3070786448 and 3033901968) are doing what seems logical. The other thread (thread 3036162960), however, is where the real problem seems to be. The add_to_interfaces function locks the interfaces lock, and this lock is all that is needed. However, prior to being called, in some cases it has the queue list locked, and in other cases it has a queue locked. What needs to happen here is that no locks need to be held prior to calling add_to_interfaces.

By: Norman Franke (norman) 2008-01-29 12:28:18.000-0600

I'm getting the same (or similar) behavior with tons of failed locks in do_devstate_changes. However, my "core show locks" doesn't seem to show an obvious deadlock as this one does.

BTW, can these "Tried and failed to get Lock" cause general instability? If so, that could explain two issues I'm having.

By: Mark Michelson (mmichelson) 2008-01-29 13:11:04.000-0600

norman: the "tried and failed to get lock" messages indicate that a trylock failed. I've seen some cases (issue ASTERISK-11181, for instance) where there wasn't an obvious deadlock like there is here, but the number of "tried and failed to get lock" messages for some of the threads was in the millions, indicating there was some sort of loop that could not progress because it was unable to grab a lock. Most of the time, though, the "tried and failed to get lock" messages do not indicate a deeper problem, since trylocks are expected to fail every now and again.
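
To illustrate the trylock behavior described above, here is a minimal sketch using plain POSIX mutexes rather than Asterisk's own lock wrappers. The helper name count_failed_trylocks is hypothetical and does not exist in Asterisk. It counts how many non-blocking attempts fail before the lock is obtained or the attempts run out: a small count is harmless, while a count that only ever grows corresponds to the kind of stuck loop mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not an Asterisk function): try to take `lock`
 * without blocking, up to `attempts` times. Returns the number of
 * attempts that failed before the lock was obtained or we gave up. */
static int count_failed_trylocks(pthread_mutex_t *lock, int attempts)
{
    int failed = 0;

    assert(attempts >= 0);
    for (int i = 0; i < attempts; i++) {
        int rc = pthread_mutex_trylock(lock);
        if (rc == 0) {
            /* Got the lock: the protected work would go here. */
            pthread_mutex_unlock(lock);
            break;
        }
        if (rc == EBUSY) {
            /* The lock is already held; an occasional failure is normal. */
            failed++;
        }
    }
    return failed;
}
```

If the mutex is free, the helper returns 0. If the mutex stays held for the whole loop, every attempt fails and the return value equals `attempts`.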

By: Mark Michelson (mmichelson) 2008-01-29 16:41:58.000-0600

After looking at the code more carefully, I found that add_to_interfaces was in fact very consistent in the order used for locking (always queue list, then individual queue, then interface list). What I found was inconsistent, however, was the mess of locking done in remove_from_interfaces. I moved a statement in remove_from_interfaces so that the locking would hopefully be less messy. I would appreciate your testing 11862.patch to see if the deadlock still happens with it. Thanks!
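
As an illustration of the locking order named above (queue list, then individual queue, then interface list), the following is a rough sketch of a rank check built on plain POSIX mutexes. It is not the code from 11862.patch or from app_queue.c. The names ranked_lock, ranked_unlock, lock_ctx, and the three example locks are all hypothetical, chosen only to show how an out-of-order acquisition can be refused instead of risking a deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical ranks mirroring the order described in the comment:
 * queue list first, then an individual queue, then the interface list. */
enum lock_rank { RANK_QUEUE_LIST = 1, RANK_QUEUE = 2, RANK_INTERFACE_LIST = 3 };

struct ranked_lock {
    pthread_mutex_t mutex;
    enum lock_rank rank;
};

/* Per-thread record of the ranks currently held, innermost last.
 * Ranks must strictly increase, so at most three can be held at once. */
struct lock_ctx {
    int held[3];
    int depth;
};

/* Example locks (stand-ins, not the real app_queue locks). */
static struct ranked_lock queue_list_lock = { PTHREAD_MUTEX_INITIALIZER, RANK_QUEUE_LIST };
static struct ranked_lock queue_lock = { PTHREAD_MUTEX_INITIALIZER, RANK_QUEUE };
static struct ranked_lock interface_list_lock = { PTHREAD_MUTEX_INITIALIZER, RANK_INTERFACE_LIST };

/* Take the lock only if its rank is higher than every rank already held.
 * Returns false, without locking, when the order would be violated. */
static bool ranked_lock(struct lock_ctx *ctx, struct ranked_lock *l)
{
    if (ctx->depth > 0 && (int)l->rank <= ctx->held[ctx->depth - 1]) {
        return false;
    }
    pthread_mutex_lock(&l->mutex);
    ctx->held[ctx->depth++] = (int)l->rank;
    return true;
}

/* Release a lock; this sketch assumes locks are released in the
 * reverse of the order in which they were taken. */
static void ranked_unlock(struct lock_ctx *ctx, struct ranked_lock *l)
{
    assert(ctx->depth > 0);
    assert(ctx->held[ctx->depth - 1] == (int)l->rank);
    ctx->depth--;
    pthread_mutex_unlock(&l->mutex);
}
```

With this check, a thread that already holds interface_list_lock and then asks for queue_list_lock gets false back rather than blocking, which makes an inconsistent path visible during testing.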

By: Mark Michelson (mmichelson) 2008-02-06 14:27:44.000-0600

I haven't heard anything with regards to this for a week now. Is it safe to assume that this patch is working correctly?

By: Mark Michelson (mmichelson) 2008-02-08 12:41:34.000-0600

flujan reported to me on IRC that the patch is working properly for him. I am going to close this. If necessary, I can reopen it later.

By: Digium Subversion (svnbot) 2008-02-08 12:47:33.000-0600

Repository: asterisk
Revision: 103120

U   branches/1.4/apps/app_queue.c

------------------------------------------------------------------------
r103120 | mmichelson | 2008-02-08 12:47:31 -0600 (Fri, 08 Feb 2008) | 10 lines

Prevent a potential three-thread deadlock. Also added a comment block
to explicitly state the locking order necessary inside app_queue.

(closes issue ASTERISK-11321)
Reported by: flujan
Patches:
     11862.patch uploaded by putnopvut (license 60)
 Tested by: flujan


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=103120

By: Digium Subversion (svnbot) 2008-02-08 12:51:59.000-0600

Repository: asterisk
Revision: 103121

_U  trunk/
U   trunk/apps/app_queue.c

------------------------------------------------------------------------
r103121 | mmichelson | 2008-02-08 12:51:49 -0600 (Fri, 08 Feb 2008) | 18 lines

Merged revisions 103120 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r103120 | mmichelson | 2008-02-08 12:48:17 -0600 (Fri, 08 Feb 2008) | 10 lines

Prevent a potential three-thread deadlock. Also added a comment block
to explicitly state the locking order necessary inside app_queue.

(closes issue ASTERISK-11321)
Reported by: flujan
Patches:
     11862.patch uploaded by putnopvut (license 60)
 Tested by: flujan


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=103121