Summary:ASTERISK-12100: AGI 100% CPU utilization (deadlock?)
Reporter:Chris Coleman (reallost1)
Labels:
Date Opened:2008-05-28 14:56:32
Date Closed:2011-06-07 14:02:37
Versions:
Frequency of
Environment:
Attachments:( 0) another_bt_full
( 1) bt_full_thread
( 2) different_bt_full
( 3) valgrind.txt.pid77655
Description:Asterisk threads use 100% CPU when trying to spawn an AGI process.
It happens with multiple different AGI scripts.
It happens several times per hour and leaves multiple threads that need to be killed manually.

>core show channels
SIP/67.55..xxx-08 4352941645@default:9 Up      AGI(agirunner)                
SIP/xxx..xxx-0894f000  8@podcast-menu:1     Up      AGI(bored)  

When the threads are killed, the channels hang up.
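
A minimal sketch for spotting channels that are currently sitting in an AGI application, as in the listing above. It assumes the Asterisk CLI is reachable through `asterisk -rx`; the helper name `list_agi_channels` is hypothetical and not part of Asterisk.

```shell
# Hypothetical helper: print only the channels whose current application is AGI.
# Assumes a running Asterisk instance that accepts remote CLI commands via -rx.
list_agi_channels() {
    # -F treats 'AGI(' as a fixed string rather than a regular expression
    asterisk -rx "core show channels" | grep -F 'AGI('
}
```

Running `list_agi_channels` a few times, a minute or so apart, may help tell a channel that is genuinely stuck from one that is simply running a long script.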



>bt full

#0  0x283154f7 in pthread_testcancel () from /lib/libpthread.so.2
No symbol table info available.
#1  0x28316525 in __error () from /lib/libpthread.so.2
No symbol table info available.
#2  0x2830270e in sigaction () from /lib/libpthread.so.2
No symbol table info available.
#3  0x2837c8d2 in signal () from /lib/libc.so.6
No symbol table info available.
#4  0x28431829 in agi_exec_full (chan=0x8903800, data=0xbf528f80, enhanced=0, dead=0) at res_agi.c:694
       sighup = 0x16 <Error reading address 0x16: Bad address>
       buf = "agirunner", '\0' <repeats 2038 times>
       fds = {81, 84}
       efd = -1
       pid = 57625
       args = {argc = 1, argv = 0xbf5290a4, arg = {0xbf5292b0 "agirunner", 0x0 <repeats 127 times>}}
       agi = {fd = 0, audio = 0, ctrl = 0, fast = 0, speech = 0x0}
       __PRETTY_FUNCTION__ = "agi_exec_full"
#5  0x080d0bb0 in pbx_extension_helper (c=0x8903800, con=0x0, context=0x8903970 "default", exten=0x89039c0 "4352941645", priority=9, label=0x0, callerid=0x845db70 "6755159140", action=E_SPAWN,
   found=0x81b55e0, combined_find_spawn=1) at strings.h:33
       e = (struct ast_exten *) 0x0
       res = 136009184
       q = {incstack = {0x0 <repeats 128 times>}, stacklen = 0, status = 5, swo = 0x0, data = 0x0, foundcontext = 0x8903970 "default"}
       passdata = "agirunner\000wait\00045\000,weird\000\00093e660f194443b2ad@\r\nCSeq: 102 BYE\r\nUser-Agent: Asterisk PBX 1.6.0-beta9\r\nContent-Length: 0\r\n\r\n", '\0' <repeats 1212 times>, "CDR(userfield)\000)", '\0' <repeats 2012 times>, "?c1(", '\0' <repeats 12 times>, "\001\000\000\000?\2071(\000?\216\bH?R?\005\236"...
       matching_action = 0
       __PRETTY_FUNCTION__ = "pbx_extension_helper"
#6  0x080d4f02 in __ast_pbx_run (c=0x8903800) at pbx.c:3395
       dst_exten = "?\2071(?\2071(??R??\2041(", '\0' <repeats 12 times>, "?\2041(?y+\b\000\000\000\000??R?\032\0230(?\2071(\000\000\000\000\000\000\000\000\200\235>(\210?\025\b\200\235>(h?R??]7(@\t?(\000\000\000\000h?R?Ca7(\200?\027\bH?\224\b\017\000\000\000\001\000\000\000`?\030\b\000?\224\b(?R?\005\2360(\200?\027\bH?\224\b\000\000\000\000\020\000\000\000\000?\224\b`?\030\b\200?\027\b?c1(\200?\027\bH?\224\b8?R?\001\000\000\000\000?\224\b`?\030\bX?R???0(\200?\027\bH?\224\bX?"...
       pos = 0
       digit = 0
       found = 1
       res = 0
       error = 0
       __PRETTY_FUNCTION__ = "__ast_pbx_run"
#7  0x080d6e5a in pbx_thread (data=0x16) at pbx.c:3760
No locals.
#8  0x08105b1b in dummy_start (data=0x816e400) at utils.c:872
       ret = (void *) 0x283dedd3
       a = {start_routine = 0x80d6e4c <pbx_thread>, data = 0x8903800, name = 0x870fc00 "pbx_thread", ' ' <repeats 11 times>, "started at [ 3781] pbx.c ast_pbx_start()"}
#9  0x283063a5 in pthread_create () from /lib/libpthread.so.2
No symbol table info available.
#10 0x283c3137 in _ctx_start () from /lib/libc.so.6

Comments:By: Tilghman Lesher (tilghman) 2008-05-30 09:19:16

Can you give me a stack backtrace from an unoptimized build?  There are several values which are optimized out in the attached backtrace.

By: Chris Coleman (reallost1) 2008-05-30 11:50:52

I recompiled asterisk with DONTOPTIMIZE and here is a new backtrace.

[moved bt to uploaded files]

By: Tilghman Lesher (tilghman) 2008-05-30 15:19:43

Is this backtrace from an actual crash or are you attaching to the process with gdb to get a backtrace?

Could you also provide (in gdb) a 'thread apply all bt full' and _UPLOAD_ the resulting output (i.e. not in a bugnote)?
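
One way to capture that output straight into a file suitable for uploading is sketched below. It assumes a gdb build that supports the `-p`, `-batch`, and `-ex` options; the helper name `dump_all_threads` and the default output file name are hypothetical.

```shell
# Hypothetical helper: attach to a running process, dump a full backtrace of
# every thread into a file, then detach. Usage: dump_all_threads PID [OUTFILE]
dump_all_threads() {
    pid="$1"
    out="${2:-bt_full_all_threads.txt}"
    # -batch makes gdb exit after running the commands; turning pagination off
    # explicitly is harmless and keeps gdb from pausing on long output
    gdb -p "$pid" -batch \
        -ex "set pagination off" \
        -ex "thread apply all bt full" > "$out" 2>&1
}
```

For example, `dump_all_threads "$(pidof asterisk)"` on a system that provides `pidof`; elsewhere, pass the process ID by hand.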

By: Chris Coleman (reallost1) 2008-05-30 15:58:20

I have been attaching gdb directly to the stuck thread to get the backtrace.  This time I attached to the master process and ran 'thread apply all bt full' as requested.  It is uploaded as "bt_full_thread".

By: Tilghman Lesher (tilghman) 2008-05-30 16:20:28

The file appears to be missing some of the output.  It's only showing 5 threads of the total 55 threads.

By: Chris Coleman (reallost1) 2008-05-30 17:09:21

Hmm..  I tried it again and got the same thing.  Am I missing something?

By: Chris Coleman (reallost1) 2008-06-02 00:26:40

I recompiled part of the system with debug symbols and uploaded a new bt full.

By: Chris Coleman (reallost1) 2008-06-02 02:43:50

I just uploaded a different bt full.

By: Tilghman Lesher (tilghman) 2008-06-02 16:04:47

It would appear that you are experiencing memory corruption of some internal libc locks.  Please read doc/valgrind.txt.
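
As a rough sketch only, a valgrind run of Asterisk might look like the following. doc/valgrind.txt remains the authoritative procedure and may call for additional options (for example a suppressions file or specific build flags); the helper name `run_under_valgrind` and the default log file name are assumptions made for illustration.

```shell
# Hypothetical helper: start Asterisk in the foreground under valgrind and
# write valgrind's report to a log file. Usage: run_under_valgrind [LOGFILE]
# Assumption: these options are illustrative; follow doc/valgrind.txt if it differs.
run_under_valgrind() {
    log="${1:-valgrind.txt}"
    # -c: console mode, -g: dump core on crash, -vvvv: verbose output
    valgrind --log-file="$log" asterisk -vvvvcg
}
```

Expect Asterisk to run far more slowly under valgrind, so the problem may take longer to reproduce than it does in normal operation.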

By: Tilghman Lesher (tilghman) 2008-06-16 10:58:21

reallost1:  have you had a chance yet to run valgrind against this?

By: Chris Coleman (reallost1) 2008-06-16 17:53:14

I'll be doing that this week, I hope.  I've been sandbagging levees.

By: Tilghman Lesher (tilghman) 2008-07-14 16:50:15

Well, I'm puzzled.  Are you running any special hardware here?  Anything out of the ordinary?

By: Tilghman Lesher (tilghman) 2008-09-11 18:11:05


By: Chris Coleman (reallost1) 2008-09-11 22:34:52

I believe this to be an OS issue in the threading libraries.  I have upgraded the OS and am not having this trouble anymore.

By: Tilghman Lesher (tilghman) 2008-09-12 00:08:48

Reporter resolved issue with external libraries