Summary:ASTERISK-03707: [patch] deadlock in app_queue compare_weight()
Reporter:k3v (k3v)Labels:
Date Opened:2005-03-19 10:03:30.000-0600Date Closed:2008-01-15 15:28:12.000-0600
Versions:Frequency of
Environment:Attachments:( 0) queue-weight-deadlock-fix.patch.txt
( 1) queue-weight-deadlock-fix2.patch.txt
( 2) queue-weight-deadlock-fix3.patch.txt
( 3) queue-weight-deadlock-fix4.patch.txt
Description:When I submitted the compare_weight patch in December, I was unable to produce this deadlock, but it can happen under a certain condition.  If another queue-walking function, such as set_member_paused() or manager_queues_status(), happens to hold &qlock after try_calling() locks the current &q->lock, then compare_weight() will block on &qlock, causing &q->lock to never release.

I'm not sure unlocking &q->lock briefly is the best idea or not.  Could this cause some unexpected change if a perfectly timed reload moved the queue lists around?  This does accurately and consistently avoid the deadlock, however.  I threw 20/sec AddQueueMember()/RemoveQueueMember() and 50/sec UnpauseQueueMember() at this for 20 minutes with 6 calls in each queue, one weight 50, one weight 0, with no problem whatsoever.

The alternative is a trylock loop, but in testing that affected accuracy and let a tiny number of lower-weight calls leak through.  

In order to reproduce this consistently, I was calling set_member_paused() about 50 times/sec.  Eventually one would run at "the same time" as compare_weight().


disclaimer on file

built on CVS 03/11/2005, but patched against 3/19/2005
Comments:By: k3v (k3v) 2005-03-19 10:12:03.000-0600

this also has some code cleanups

By: k3v (k3v) 2005-03-19 10:35:18.000-0600

fix2 patch moves the &q->lock in the walking loop to the right place.  brain malfunction when applying to cvs.

By: Mark Spencer (markster) 2005-03-19 11:43:31.000-0600

Added to CVS, thanks!  I think you're safe with your unlock/lock, but we can certainly address it when we AST_OBJ'ify queues.

By: k3v (k3v) 2005-03-21 12:48:35.000-0600

this didn't solve the deadlock.  it occured again under brutal testing.  will post update later today.

By: k3v (k3v) 2005-03-21 14:57:37.000-0600

fix3 is applied against the already committed patch fix2.  This locks &qlock in try_calling() before wandering off into ring_one().  There just didn't seem to be any better way.  I'm going to do some rwlock development when I find time, but this will fix the block until then.

By: Mark Spencer (markster) 2005-03-21 15:37:40.000-0600

How about a compromise and we only hold qlock for try_calling *if* use_weight is enabled?

By: k3v (k3v) 2005-03-21 16:05:53.000-0600

Done.  I've been trying to think of a better place to compare weights, like during metric recalc, but I'm boggling.  I'm open to suggestions, or we can chat in IRC sometime.

edited on: 03-21-05 16:07

By: Mark Spencer (markster) 2005-03-21 16:13:56.000-0600

Fixed in CVS head.  Thanks!

By: Digium Subversion (svnbot) 2008-01-15 15:27:54.000-0600

Repository: asterisk
Revision: 5205

U   trunk/apps/app_queue.c

r5205 | markster | 2008-01-15 15:27:53 -0600 (Tue, 15 Jan 2008) | 2 lines

Fix queue stuff (bug ASTERISK-3707)



By: Digium Subversion (svnbot) 2008-01-15 15:28:12.000-0600

Repository: asterisk
Revision: 5225

U   trunk/apps/app_queue.c

r5225 | markster | 2008-01-15 15:28:11 -0600 (Tue, 15 Jan 2008) | 2 lines

Fix queue weight issue (bug ASTERISK-3707, take 2)