Summary: | ASTERISK-14623: [patch] app_queue crashes randomly, it seems to be during call-transfers | ||
Reporter: | Raimund Sacherer (hatrix) | Labels: | |
Date Opened: | 2009-08-10 02:16:57 | Date Closed: | 2009-11-30 10:45:00.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Applications/app_queue |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) backtrace.txt ( 1) queue_ao2.diff ( 2) v2_queue_ao2.diff | |
Description: | We have had lots of crashes in app_queue on our system. As the system was never really stable, it received software upgrades as well as completely new (IBM) hardware. app_queue crashes once or twice a week, sometimes more often, and mostly we have NO indication in the Asterisk log files. Lately (and this may be because of a debug recompile) we get lines in the error log like these:
[Aug 7 19:07:23] ERROR[27115] /usr/local/src/asterisk-1.4.26/include/asterisk/lock.h: app_queue.c line 2559 (update_queue): Error obtaining mutex: Invalid argument
Funny thing is, the system crashed on August 6th (I do not have a core dump) and on August 7th at nearly the same time:
Aug 7 19:07:23 logitravel-voip2 kernel: [7905412.479435] asterisk[27115]: segfault at d5d69fd0 ip b798757e sp b5258ef0 error 5 in app_queue.so[b7984000+1c000]
Aug 6 19:00:57 logitravel-voip2 kernel: [7720142.069484] asterisk[7274]: segfault at bb0f1be8 ip b796e57e sp b55f8ef0 error 4 in app_queue.so[b796b000+1c000]
I have attached a backtrace, with bt, bt full and threads applied. I hope it is of some help, because our client (a medium-sized call center) is waiting for a solution. We are currently considering downgrading to Debian stable (1.4.21.2). | ||
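Editor's note on the two kernel segfault lines in the description: subtracting the load base of app_queue.so (the value in square brackets) from the instruction pointer gives the offset of the faulting instruction inside the module. The sketch below does that arithmetic for both lines and decodes the low three bits of the x86 page-fault error code. The helper names (`fault_offset`, `describe_error`, `same_fault_site`) are hypothetical and exist only for this illustration; the addresses are copied from the log lines above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Offset of the faulting instruction relative to the module's load base. */
static uint32_t fault_offset(uint32_t ip, uint32_t base)
{
    return ip - base;
}

/* Decode the low three bits of an x86 page-fault error code:
 * bit 0: 0 = page not present, 1 = protection violation
 * bit 1: 0 = read access,      1 = write access
 * bit 2: 0 = kernel mode,      1 = user mode */
static void describe_error(unsigned code, char *buf, size_t len)
{
    snprintf(buf, len, "%s, %s, %s mode",
             (code & 1u) ? "protection violation" : "page not present",
             (code & 2u) ? "write" : "read",
             (code & 4u) ? "user" : "kernel");
}

/* Returns 1 if both crashes from the report hit the same instruction. */
static int same_fault_site(void)
{
    uint32_t aug7 = fault_offset(0xb798757eu, 0xb7984000u);
    uint32_t aug6 = fault_offset(0xb796e57eu, 0xb796b000u);
    char why[64];

    describe_error(5, why, sizeof(why));
    printf("Aug 7: offset 0x%x (%s)\n", (unsigned)aug7, why);
    describe_error(4, why, sizeof(why));
    printf("Aug 6: offset 0x%x (%s)\n", (unsigned)aug6, why);

    return aug7 == aug6;
}
```

Calling `same_fault_site()` from a `main` shows that both crashes land on offset 0x357e, so the same instruction in app_queue.so faulted on a read from user mode both times. With a matching unstripped build of the module, something like `addr2line -e app_queue.so 0x357e` might map that offset to a source line, although the attached backtrace is the more direct source for that.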
Comments: |
By: David Vossel (dvossel) 2009-11-12 16:42:00.000-0600
The patch I just uploaded should resolve this issue. If you can, please test it and report your results.

By: David Vossel (dvossel) 2009-11-12 16:53:05.000-0600
The patch for this is also on reviewboard, https://reviewboard.asterisk.org/r/427/

By: David Brillert (aragon) 2009-11-23 15:25:50.000-0600
dvossel: Can you upload the diff2 patch version to the bug report so I can test it in my lab?

By: David Vossel (dvossel) 2009-11-24 10:07:19.000-0600
I uploaded the new patch

By: David Brillert (aragon) 2009-11-24 10:28:01.000-0600
I'm running tests now. I'll update the bug notes if I see a crash or some other weirdness; otherwise I will let the test run for two days and then update the bug notes.

By: David Vossel (dvossel) 2009-11-24 10:43:26.000-0600
sounds great, thanks!

By: David Brillert (aragon) 2009-11-25 09:56:28.000-0600
dvossel: There were no crashes prior to testing your patch. There are no crashes as a result of using your patch. However, the number of the warnings shown below has hugely increased since installing the patch. I think this makes sense, since the patch appears to address some issues with hangups and the warnings only appear during hangups.
In 18 hours of testing, my /var/log/asterisk/messages file went from 0 bytes to 24MB, and 99.9% of it consists of warnings like these:
[Nov 25 00:17:17] WARNING[31107] channel.c: Exceptionally long voice queue length queuing to Local/1637@default-agent-0ce0,1
[Nov 25 00:17:17] WARNING[31402] channel.c: Exceptionally long voice queue length queuing to Local/1614@default-agent-0002,1
[Nov 25 00:17:17] WARNING[31436] channel.c: Exceptionally long voice queue length queuing to Local/1638@default-agent-35b1,1
[Nov 25 00:17:17] WARNING[31107] channel.c: Exceptionally long voice queue length queuing to Local/1637@default-agent-0ce0,1
[Nov 25 00:17:17] WARNING[31402] channel.c: Exceptionally long voice queue length queuing to Local/1614@default-agent-0002,1
So, to be the bearer of bad news, there seems to be some relation between this issue and bug ASTERISK-14558.

By: David Brillert (aragon) 2009-11-27 12:15:32.000-0600
dvossel: Recent developments in ASTERISK-14558 have removed the channel.c warnings. Your patch was in service for 24 hours, processing 24 calls per second, with no crashes.

By: David Vossel (dvossel) 2009-11-30 10:31:41.000-0600
Thanks for the update aragon! I should have it committed soon. Note that I found a small error in the patch you tested that could cause a deadlock, so make sure to update to the patch I commit.

By: Digium Subversion (svnbot) 2009-11-30 10:40:25.000-0600
Repository: asterisk
Revision: 231437

U branches/1.4/apps/app_queue.c

------------------------------------------------------------------------
r231437 | dvossel | 2009-11-30 10:40:23 -0600 (Mon, 30 Nov 2009) | 18 lines

app_queue crashes randomly, often during call-transfers

In app_queue, it is possible for a call_queue to be destroyed while another object still holds a pointer to it. This patch converts call_queue objects to ao2 objects allowing them to be ref counted. This makes it safe for the queue_ent object in queue_exec() to reference its parent call_queue even after it has left the queue.

(closes issue ASTERISK-14623)
Reported by: Hatrix
Patches: v2_queue_ao2.diff uploaded by dvossel (license 671)
Tested by: dvossel, aragon
Review: https://reviewboard.asterisk.org/r/427/
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=231437

By: Digium Subversion (svnbot) 2009-11-30 10:44:59.000-0600
Repository: asterisk
Revision: 231438

_U trunk/

------------------------------------------------------------------------
r231438 | dvossel | 2009-11-30 10:44:59 -0600 (Mon, 30 Nov 2009) | 23 lines

Blocked revisions 231437 via svnmerge

........
r231437 | dvossel | 2009-11-30 10:32:58 -0600 (Mon, 30 Nov 2009) | 18 lines

app_queue crashes randomly, often during call-transfers

In app_queue, it is possible for a call_queue to be destroyed while another object still holds a pointer to it. This patch converts call_queue objects to ao2 objects allowing them to be ref counted. This makes it safe for the queue_ent object in queue_exec() to reference its parent call_queue even after it has left the queue.

(closes issue ASTERISK-14623)
Reported by: Hatrix
Patches: v2_queue_ao2.diff uploaded by dvossel (license 671)
Tested by: dvossel, aragon
Review: https://reviewboard.asterisk.org/r/427/
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=231438 |
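Editor's note on the r231437 commit message: the fix it describes is reference counting, so that a call_queue cannot be freed while a queue_ent still points at it. The following is a minimal standalone sketch of that idea, not the committed patch and not the astobj2 implementation. The struct layouts and the helpers `queue_alloc`, `queue_ref`, `queue_unref` and `queue_demo` are hypothetical stand-ins; in Asterisk the corresponding astobj2 calls are `ao2_alloc()` and `ao2_ref()`, which unlike this sketch are safe to use from multiple threads.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a ref-counted call_queue. */
struct call_queue {
    int refcount;
    char name[80];
};

/* Hypothetical stand-in for the per-caller queue entry. */
struct queue_ent {
    struct call_queue *parent;
};

static int destroyed; /* counts how many times the destructor path ran */

/* Allocate a queue with one reference, owned by the caller. */
static struct call_queue *queue_alloc(const char *name)
{
    struct call_queue *q = calloc(1, sizeof(*q));
    if (!q) {
        return NULL;
    }
    q->refcount = 1;
    strncpy(q->name, name, sizeof(q->name) - 1);
    return q;
}

/* Take an additional reference (not thread-safe in this sketch). */
static struct call_queue *queue_ref(struct call_queue *q)
{
    q->refcount++;
    return q;
}

/* Drop a reference; the queue is freed only when the last one goes away. */
static void queue_unref(struct call_queue *q)
{
    if (--q->refcount == 0) {
        destroyed++;
        free(q);
    }
}

/* Walks through the scenario from the commit message.
 * Returns the number of destructor runs, or -1 on allocation failure. */
static int queue_demo(void)
{
    struct call_queue *q = queue_alloc("support"); /* reference held by the queue list */
    struct queue_ent qe;

    if (!q) {
        return -1;
    }
    qe.parent = queue_ref(q);  /* the caller's queue_ent takes its own reference */

    queue_unref(q);            /* queue is dropped from the list while the call is active */
    assert(destroyed == 0);    /* still alive because qe holds a reference */
    printf("queue '%s' is still valid for the caller\n", qe.parent->name);

    queue_unref(qe.parent);    /* caller is done; last reference dropped, queue freed */
    return destroyed;
}
```

Calling `queue_demo()` from a `main` exercises the case the ticket is about: without the extra reference taken for `qe.parent`, the first `queue_unref()` would free the queue and the later access to `qe.parent->name` would be a use-after-free, which is consistent with the random segfaults reported above.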