Summary: ASTERISK-11092: Repeatedly calling Page with a LOCAL channel crashes asterisk
Reporter: dtyoo (dtyoo)
Labels:
Date Opened: 2007-12-20 13:41:41.000-0600
Date Closed: 2007-12-21 10:57:10.000-0600
Versions:
Frequency of
Environment:
Attachments:
( 0) bt-btfull-malloc-debug.txt
( 1) page-crash.txt
( 2) valgrind.txt
( 3) valgrind2.txt
Description: We recently had a dialplan programming error that caused an infinite loop.  Placing a call into this piece of dialplan logic reproducibly crashes asterisk.  While the dialplan logic is certainly in error, I thought that someone could get something out of looking at these backtraces.  It looks like some sort of memory corruption is going on that ultimately takes down asterisk.

The dialplan that causes the problem looks like this:

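; context names below are inferred from the Goto target and the Local channel
[page-crash]
exten => s,1,Set(TIMEOUT(absolute)=60)
exten => s,2,Page(LOCAL/1234@tpn-dev,,)

[tpn-dev]
exten => 1234,1,Goto(page-crash,s,1)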

Calling 1234@tpn-dev triggers the loop which ultimately crashes asterisk.  I don't expect good things to happen with a dialplan like this, but I'm wondering if this scenario is exposing an underlying memory problem somewhere.  I will upload backtrace info as well.


Re-produced with asterisk
Comments:By: Tilghman Lesher (tilghman) 2007-12-20 14:13:22.000-0600

Please read and follow the instructions in doc/valgrind.txt.

By: dtyoo (dtyoo) 2007-12-20 22:09:56.000-0600


With MALLOC_DEBUG enabled I get different results under the same scenario.  Without running under valgrind, asterisk still crashes, but it seems to take longer to do so, and the bt looks different even though there is still corruption.  I've uploaded it so you can see.

Under valgrind, asterisk doesn't crash; it hangs.  After an initial burst of looping activity, the console fills with the message "chan_iax2.c:6699 socket_read: Out of idle IAX2 threads for I/O, pausing!" over and over.  Asterisk isn't taking any calls at this point.  valgrind.txt shows a problem similar to the original bt I posted.

One other thing worth mentioning is that we run "ulimit -n 63536" before starting asterisk on our servers.  It seems that this is required in order for the problem to happen.  If I leave this at the default (1024) I cannot reproduce the issue.

By: Mark Michelson (mmichelson) 2007-12-21 10:57:10.000-0600

The issue here was that list elements were being freed during a traversal of the list. I have fixed this in revision 94468 of 1.4 and in revision 94477 of trunk. Thanks for your help in getting this solved.
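
For anyone hitting this class of bug: the sketch below illustrates the general pattern of removing and freeing elements while walking a singly linked list. It is not the code from the revisions above. The struct node type and the push, remove_odd, count, and free_all helpers are hypothetical names made up for this example. The key point is that each node is unlinked and its successor captured before free() is called, so the traversal never reads freed memory.

```c
#include <stdlib.h>

/* Hypothetical node type for illustration; not an Asterisk structure. */
struct node {
    int value;
    struct node *next;
};

/* Prepend a node. Returns the new head, or the old head if malloc fails. */
static struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));
    if (!n) {
        return head;
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Remove and free every node holding an odd value.
 * 'link' always points at the pointer that leads to the current node,
 * so a node can be unlinked before it is freed and is never touched again. */
static struct node *remove_odd(struct node *head)
{
    struct node **link = &head;

    while (*link) {
        struct node *cur = *link;
        if (cur->value % 2 != 0) {
            *link = cur->next;  /* unlink first; this also saves the successor */
            free(cur);          /* safe: nothing dereferences cur after this */
        } else {
            link = &cur->next;  /* keep the node and advance */
        }
    }
    return head;
}

/* Count the nodes in the list. */
static int count(const struct node *head)
{
    int n = 0;
    for (; head; head = head->next) {
        n++;
    }
    return n;
}

/* Free the whole list, reading 'next' before each free(). */
static void free_all(struct node *head)
{
    while (head) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

The unsafe version of the same loop would call free(cur) and then advance with cur = cur->next, which reads freed memory and can corrupt the heap in ways that only show up later, consistent with the differing backtraces described in this report.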