Summary: ASTERISK-14623: [patch] app_queue crashes randomly, it seems to be during call-transfers
Reporter: Raimund Sacherer (hatrix)
Labels:
Date Opened: 2009-08-10 02:16:57
Date Closed: 2009-11-30 10:45:00.000-0600
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Applications/app_queue
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) backtrace.txt
( 1) queue_ao2.diff
( 2) v2_queue_ao2.diff
Description: We have had lots of crashes in app_queue on our system. As the system was never really stable, it received software upgrades as well as totally new (IBM) hardware.

app_queue crashes once or twice a week, sometimes more often, and mostly we have NO indication in the Asterisk log files. Lately (and this may be because of a debug recompile) we get lines in the error log like these:

[Aug  7 19:07:23] ERROR[27115] /usr/local/src/asterisk-1.4.26/include/asterisk/lock.h: app_queue.c line 2559 (update_queue): Error obtaining mutex: Invalid argument

Funny thing is the system crashed on August the 6th (I do not have a coredump) and on August the 7th at nearly the same time:

Aug  7 19:07:23 logitravel-voip2 kernel: [7905412.479435] asterisk[27115]: segfault at d5d69fd0 ip b798757e sp b5258ef0 error 5 in app_queue.so[b7984000+1c000]
Aug  6 19:00:57 logitravel-voip2 kernel: [7720142.069484] asterisk[7274]: segfault at bb0f1be8 ip b796e57e sp b55f8ef0 error 4 in app_queue.so[b796b000+1c000]


I have attached a backtrace, with bt, bt full and threads applied. I hope it is of some help, because our client (a medium-sized call center) is waiting for a solution.

We are currently considering downgrading to Debian stable (1.4.21.2).
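
Note: the two kernel segfault lines in the description follow the usual Linux layout `ip <address> ... in <module>[<base>+<size>]`, so the position of the faulting instruction inside app_queue.so can be estimated as ip minus base. The following is a minimal sketch of that arithmetic in C. The function name `fault_offset` is hypothetical, and the values in the comments are copied from the two log lines.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of the faulting instruction within the mapped module.
 * ip   : instruction pointer from the kernel segfault line
 * base : load address printed inside the square brackets
 * size : mapping size printed after the '+'
 * (hypothetical helper, not part of Asterisk)
 */
static uint32_t fault_offset(uint32_t ip, uint32_t base, uint32_t size)
{
    /* The ip must lie inside the mapping for the subtraction to be meaningful. */
    assert(ip >= base && ip - base < size);
    return ip - base;
}

/*
 * Aug 7: ip b798757e in app_queue.so[b7984000+1c000]
 *        fault_offset(0xb798757eu, 0xb7984000u, 0x1c000u)
 * Aug 6: ip b796e57e in app_queue.so[b796b000+1c000]
 *        fault_offset(0xb796e57eu, 0xb796b000u, 0x1c000u)
 */
```

Both calls return 0x357e, which suggests that the two crashes hit the same instruction in app_queue.so even though the module was loaded at a different address each time.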
Comments:

By: David Vossel (dvossel) 2009-11-12 16:42:00.000-0600

The patch I just uploaded should resolve this issue.   If you can, please test it and report your results.

By: David Vossel (dvossel) 2009-11-12 16:53:05.000-0600

The patch for this is also on reviewboard, https://reviewboard.asterisk.org/r/427/

By: David Brillert (aragon) 2009-11-23 15:25:50.000-0600

dvossel: Can you upload the diff2 patch version to the bug report so I can test it in my lab?

By: David Vossel (dvossel) 2009-11-24 10:07:19.000-0600

I uploaded the new patch

By: David Brillert (aragon) 2009-11-24 10:28:01.000-0600

I'm running tests now. I'll update the bug notes if I see a crash or some other weirdness; otherwise I will let the test run for two days and then update the bug notes if there is no crash.

By: David Vossel (dvossel) 2009-11-24 10:43:26.000-0600

sounds great, thanks!

By: David Brillert (aragon) 2009-11-25 09:56:28.000-0600

dvossel:
There were no crashes prior to testing your patch.
There are no crashes as a result of using your patch.
However, the number of the warnings shown below has hugely increased since installing the patch.  I think this makes sense, since the patch appears to address some issues with hangups and the warnings only appear during hangups.
After 18 hours of testing, my /var/log/asterisk/messages file went from 0 bytes to 24MB, and 99.9% of it consists of these types of warnings:
[Nov 25 00:17:17] WARNING[31107] channel.c: Exceptionally long voice queue length queuing to Local/1637@default-agent-0ce0,1
[Nov 25 00:17:17] WARNING[31402] channel.c: Exceptionally long voice queue length queuing to Local/1614@default-agent-0002,1
[Nov 25 00:17:17] WARNING[31436] channel.c: Exceptionally long voice queue length queuing to Local/1638@default-agent-35b1,1
[Nov 25 00:17:17] WARNING[31107] channel.c: Exceptionally long voice queue length queuing to Local/1637@default-agent-0ce0,1
[Nov 25 00:17:17] WARNING[31402] channel.c: Exceptionally long voice queue length queuing to Local/1614@default-agent-0002,1

So, to be the bearer of bad news, this issue seems to be related to bug ASTERISK-14558.

By: David Brillert (aragon) 2009-11-27 12:15:32.000-0600

dvossel: Recent developments in ASTERISK-14558 have removed the channel.c warnings.
Your patch was in service for 24 hours, processing 24 calls per second, with no crashes.

By: David Vossel (dvossel) 2009-11-30 10:31:41.000-0600

Thanks for the update aragon!  I should have it committed soon.  Note that I found a small error in the patch you tested that could cause a deadlock, so make sure to update to the patch I commit.

By: Digium Subversion (svnbot) 2009-11-30 10:40:25.000-0600

Repository: asterisk
Revision: 231437

U   branches/1.4/apps/app_queue.c

------------------------------------------------------------------------
r231437 | dvossel | 2009-11-30 10:40:23 -0600 (Mon, 30 Nov 2009) | 18 lines

app_queue crashes randomly, often during call-transfers

In app_queue, it is possible for a call_queue to be destroyed
while another object still holds a pointer to it.  This patch
converts call_queue objects to ao2 objects allowing them to be
ref counted.  This makes it safe for the queue_ent object in
queue_exec() to reference its parent call_queue even after it
has left the queue.

(closes issue ASTERISK-14623)
Reported by: Hatrix
Patches:
     v2_queue_ao2.diff uploaded by dvossel (license 671)
Tested by: dvossel, aragon

Review: https://reviewboard.asterisk.org/r/427/


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=231437
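
Note: the commit message above describes making call_queue reference counted so that a queue_ent can keep using its parent after the queue has been removed elsewhere. The following is a minimal, single-threaded sketch of that idea in plain C. It does not use Asterisk's astobj2 API and is not the code from the patch; every name beginning with `demo_` is hypothetical, and the `destroyed` flag exists only so the behavior can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for call_queue; not the Asterisk struct. */
struct demo_queue {
    int refcount;    /* number of holders; NOT thread-safe in this sketch */
    int *destroyed;  /* set to 1 when the object is freed (demo only) */
};

/* Hypothetical stand-in for queue_ent: it holds its own reference. */
struct demo_queue_ent {
    struct demo_queue *parent;
};

/* Allocate a queue with one reference, owned by the caller. */
static struct demo_queue *demo_queue_alloc(int *destroyed_flag)
{
    struct demo_queue *q = calloc(1, sizeof(*q));
    if (!q) {
        return NULL;
    }
    q->refcount = 1;
    q->destroyed = destroyed_flag;
    return q;
}

/* Take an additional reference and return the same pointer. */
static struct demo_queue *demo_queue_ref(struct demo_queue *q)
{
    assert(q->refcount > 0);
    q->refcount++;
    return q;
}

/* Drop one reference; the object is freed only when the last one goes. */
static void demo_queue_unref(struct demo_queue *q)
{
    assert(q->refcount > 0);
    if (--q->refcount == 0) {
        *q->destroyed = 1;
        free(q);
    }
}
```

With this scheme, a `demo_queue_ent` that stored `demo_queue_ref(q)` in `parent` can still read the queue after the original owner calls `demo_queue_unref(q)`; the memory is released only when the entry drops its own reference. A real implementation would also need atomic or locked counter updates, which this sketch leaves out.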

By: Digium Subversion (svnbot) 2009-11-30 10:44:59.000-0600

Repository: asterisk
Revision: 231438

_U  trunk/

------------------------------------------------------------------------
r231438 | dvossel | 2009-11-30 10:44:59 -0600 (Mon, 30 Nov 2009) | 23 lines

Blocked revisions 231437 via svnmerge

........
 r231437 | dvossel | 2009-11-30 10:32:58 -0600 (Mon, 30 Nov 2009) | 18 lines
 
 app_queue crashes randomly, often during call-transfers
 
 In app_queue, it is possible for a call_queue to be destroyed
 while another object still holds a pointer to it.  This patch
 converts call_queue objects to ao2 objects allowing them to be
 ref counted.  This makes it safe for the queue_ent object in
 queue_exec() to reference its parent call_queue even after it
 has left the queue.
 
 (closes issue ASTERISK-14623)
 Reported by: Hatrix
 Patches:
       v2_queue_ao2.diff uploaded by dvossel (license 671)
 Tested by: dvossel, aragon
 
 Review: https://reviewboard.asterisk.org/r/427/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=231438