Summary: | ASTERISK-03707: [patch] deadlock in app_queue compare_weight() | ||
Reporter: | k3v (k3v) | Labels: | |
Date Opened: | 2005-03-19 10:03:30.000-0600 | Date Closed: | 2008-01-15 15:28:12.000-0600 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Applications/app_queue |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) queue-weight-deadlock-fix.patch.txt ( 1) queue-weight-deadlock-fix2.patch.txt ( 2) queue-weight-deadlock-fix3.patch.txt ( 3) queue-weight-deadlock-fix4.patch.txt | |
Description: | When I submitted the compare_weight patch in December, I was unable to produce this deadlock, but it can happen under a certain condition. If another queue-walking function, such as set_member_paused() or manager_queues_status(), happens to hold &qlock after try_calling() locks the current &q->lock, then compare_weight() will block on &qlock, causing &q->lock to never release. I'm not sure unlocking &q->lock briefly is the best idea or not. Could this cause some unexpected change if a perfectly timed reload moved the queue lists around? This does accurately and consistently avoid the deadlock, however. I threw 20/sec AddQueueMember()/RemoveQueueMember() and 50/sec UnpauseQueueMember() at this for 20 minutes with 6 calls in each queue, one weight 50, one weight 0, with no problem whatsoever. The alternative is a trylock loop, but in testing that affected accuracy and let a tiny number of lower-weight calls leak through. In order to reproduce this consistently, I was calling set_member_paused() about 50 times/sec. Eventually one would run at "the same time" as compare_weight(). ****** ADDITIONAL INFORMATION ****** disclaimer on file built on CVS 03/11/2005, but patched against 3/19/2005 | ||
Comments: | By: k3v (k3v) 2005-03-19 10:12:03.000-0600 this also has some code cleanups By: k3v (k3v) 2005-03-19 10:35:18.000-0600 fix2 patch moves the &q->lock in the walking loop to the right place. brain malfunction when applying to cvs. By: Mark Spencer (markster) 2005-03-19 11:43:31.000-0600 Added to CVS, thanks! I think you're safe with your unlock/lock, but we can certainly address it when we AST_OBJ'ify queues. By: k3v (k3v) 2005-03-21 12:48:35.000-0600 this didn't solve the deadlock. it occured again under brutal testing. will post update later today. By: k3v (k3v) 2005-03-21 14:57:37.000-0600 fix3 is applied against the already committed patch fix2. This locks &qlock in try_calling() before wandering off into ring_one(). There just didn't seem to be any better way. I'm going to do some rwlock development when I find time, but this will fix the block until then. By: Mark Spencer (markster) 2005-03-21 15:37:40.000-0600 How about a compromise and we only hold qlock for try_calling *if* use_weight is enabled? By: k3v (k3v) 2005-03-21 16:05:53.000-0600 Done. I've been trying to think of a better place to compare weights, like during metric recalc, but I'm boggling. I'm open to suggestions, or we can chat in IRC sometime. edited on: 03-21-05 16:07 By: Mark Spencer (markster) 2005-03-21 16:13:56.000-0600 Fixed in CVS head. Thanks! By: Digium Subversion (svnbot) 2008-01-15 15:27:54.000-0600 Repository: asterisk Revision: 5205 U trunk/apps/app_queue.c ------------------------------------------------------------------------ r5205 | markster | 2008-01-15 15:27:53 -0600 (Tue, 15 Jan 2008) | 2 lines Fix queue stuff (bug ASTERISK-3707) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=5205 By: Digium Subversion (svnbot) 2008-01-15 15:28:12.000-0600 Repository: asterisk Revision: 5225 U trunk/apps/app_queue.c ------------------------------------------------------------------------ r5225 | markster | 2008-01-15 15:28:11 -0600 (Tue, 15 Jan 2008) | 2 lines Fix queue weight issue (bug ASTERISK-3707, take 2) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=5225 |