Summary: ASTERISK-07573: Crash due to running out of stack space -- nested macros
Reporter: Philip Walls (malverian)
Labels:
Date Opened: 2006-08-21 15:11:53
Date Closed: 2006-10-04 11:17:38
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/Configuration
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) bt-full.txt
Description: Regression not present in 1.2.7.1 causes Asterisk to crash occasionally on incoming calls that are traversing dial plan logic. We've been having these crashes approximately once per week.

I've tried calling in and dialing "000" from an external line, but I'm thinking that it's actually receiving NULL (\0\0\0...) extension rather than actually 000. The server did not crash as many times as I dialed 000.

****** ADDITIONAL INFORMATION ******

This crash began occurring very shortly after an upgrade from Asterisk 1.2.7.1 to 1.2.9, and continues to occur in 1.2.10.

My log file seems to correlate with the exten = '\0' <...> in __ast_pbx_run at frame #27. I have multiple backtraces of these crashes, and they are all nearly identical.

In /var/log/asterisk/messages I see this just before the crash:

Aug 21 14:09:47 WARNING[21833] chan_sip.c: No such host: 000
Aug 21 14:09:47 NOTICE[21833] app_dial.c: Unable to create channel of type 'SIP' (cause 3 - No route to destination)
Aug 21 14:09:47 WARNING[21833] app_voicemail.c: No entry in voicemail config file for '000'
Comments:

By: Serge Vecher (serge-v) 2006-08-21 15:13:28

Compiled with 'make dont-optimize'?

By: Tilghman Lesher (tilghman) 2006-08-21 16:08:20

This is not what you suspect.  This is a crash caused by running out of stack space, mostly because you are using macros within macros within macros.

I would suggest that you find a way to combine some of your macros into a single macro, instead of doing multiple levels the way that you've done here.

You might also want to consider using Gosub()/Return() as those subroutines do not suffer the same fundamental problem of using a lot of stack space.

By: Philip Walls (malverian) 2006-08-21 17:23:06

Yes, sorry. I compiled Asterisk without dont-optimize. I will recompile after hours as this is our production server. It's odd that this problem just began to surface with the new Asterisk release, and also odd how intermittent the problem is.

By: Philip Walls (malverian) 2006-08-21 17:25:58

Also, I will audit my use of macros, but the majority of the time I'm using macros, I actually am passing arguments. This is much more convenient than using channel vars to pass data around and seems to be the intended functionality of macros.

By: Steve Murphy (murf) 2006-09-06 22:58:44

malverian-- take a look at 7780, try applying it to your 1.2 source, and try replacing all Macro app calls with matching Gosub calls. Make sure to take the macro- stuff off the context names. And ensure that Return() app calls are called at appropriate moments in all your macros. See if this makes your dialplan more robust. If not, we have something else to track down!


By: Philip Walls (malverian) 2006-09-13 10:15:41

I'm in the process of migrating my dialplan from Macro() to the 'GoSub() with arguments' patch in ASTERISK-7502 to see if it solves my problem.

If this is truly an issue with running out of stack space, could I stop these crashes (or at least cut down on them) in the meantime by increasing my limits? My stack size limit (ulimit -s) is 8192 kbytes. It does seem odd to me that I would be hitting this limit considering Macros have an artificial depth cap of seven.

On a separate note, why couldn't Macro() be rewritten to be less stack hungry, similar to the way app_stack was done, rather than hacking argument support onto GoSub? Looking briefly through the code, I don't see a reason why this couldn't be done. Am I missing something else that makes GoSub() superior to macros?

By: Tilghman Lesher (tilghman) 2006-09-13 13:35:21

Macro cannot be rewritten to the way that Gosub works, because they work in completely different ways.  Macro implements its own dialplan loop internally, which terminates Macro only when the macro returns.  That is, if Macro calls another instance of Macro, you have two applications executing, one within the other, each taking up a certain amount of stack space.  Gosub, on the other hand, simply alters the next context/extension/priority and returns immediately.  This is the reason Gosub doesn't take so many system resources.

On the other hand, Gosub always needs an explicit Return, whereas Macro returns as soon as it reaches the end of available priorities.
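
The difference described above can be sketched with a toy model. This is not Asterisk code: the `DIALPLAN` table, the `call`/`say`/`return` steps, and both function names are invented for illustration. The Macro-style runner re-enters itself for every nested call, so each level holds a native call frame until the inner one finishes. The Gosub-style runner stays in a single loop and only saves a small return address per level.

```python
# Toy model only; all names here are hypothetical and not part of Asterisk.
DIALPLAN = {
    "outer":  [("call", "middle"), ("say", "outer done"), ("return", None)],
    "middle": [("call", "inner"), ("say", "middle done"), ("return", None)],
    "inner":  [("say", "inner body"), ("return", None)],
}


def run_macro_style(context, log, depth=1):
    """Nested-loop model: a call re-enters this function.

    Returns the deepest level of native nesting that was reached.
    """
    deepest = depth
    for op, arg in DIALPLAN[context]:
        if op == "call":
            # The caller's loop stays alive on the native stack while
            # the callee runs its own loop.
            deepest = max(deepest, run_macro_style(arg, log, depth + 1))
        elif op == "say":
            log.append(arg)
        elif op == "return":
            break
    return deepest


def run_gosub_style(context, log):
    """Single-loop model: a call saves a return address and jumps.

    Returns the largest number of return addresses saved at once.
    """
    return_stack = []
    ctx, idx = context, 0
    most_saved = 0
    while True:
        op, arg = DIALPLAN[ctx][idx]
        if op == "call":
            return_stack.append((ctx, idx + 1))
            most_saved = max(most_saved, len(return_stack))
            ctx, idx = arg, 0
        elif op == "say":
            log.append(arg)
            idx += 1
        elif op == "return":
            if not return_stack:
                return most_saved
            ctx, idx = return_stack.pop()


macro_log, gosub_log = [], []
macro_depth = run_macro_style("outer", macro_log)
gosub_saved = run_gosub_style("outer", gosub_log)
print(macro_log == gosub_log, macro_depth, gosub_saved)
```

Both runners produce the same log, but only the first one grows the native call stack with nesting depth. The second one also shows why an explicit return step is required: without it, the single loop has no signal to pop the saved position.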

By: Philip Walls (malverian) 2006-09-13 14:16:01

Would increasing the size of my stack with ulimit -s help me avoid crashes for the time being?

By: Tilghman Lesher (tilghman) 2006-09-13 15:01:06

No, it will not.  Asterisk starts all threads with an identical stack size, regardless of the parameter set by ulimit.
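
As a rough analogy only, the following Python sketch shows how a program can pin the stack size of the threads it creates, so that the shell's `ulimit -s` setting no longer decides it. Asterisk itself is written in C and this is not its code; the 256 KiB figure and the `worker` function are assumptions chosen for the example, not values taken from this ticket.

```python
import threading

# Illustrative value only; not taken from the ticket or the Asterisk source.
FIXED_STACK_BYTES = 256 * 1024


def worker(result):
    # Runs on a thread whose stack size was chosen by the program itself.
    result.append("ran")


# Applies to threads created after this call; returns the previous setting.
previous = threading.stack_size(FIXED_STACK_BYTES)

result = []
thread = threading.Thread(target=worker, args=(result,))
thread.start()
thread.join()

configured = threading.stack_size()  # the size now used for new threads
threading.stack_size(previous)       # restore the earlier setting
print(result, configured)
```

When a program sets the size explicitly like this, raising the shell limit changes nothing for those threads, which matches the answer given above.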

By: Serge Vecher (serge-v) 2006-10-03 16:22:16

alright, anything else that needs to be done in this bug?

By: Steve Murphy (murf) 2006-10-04 07:56:10

Malverian-- in regards to issue 7776, how goes the battle?


By: Philip Walls (malverian) 2006-10-04 08:33:22

From looking at a list of backtraces of the crashes we were having, I was able to track down and fix the part of our dialplan that was causing the stack overflow. I currently have testers slamming our development server with the new change on it to make sure I didn't create regressions in the meantime.

I was able to find the part of the dialplan and reproduce the crash 100% of the time on 1.2.10. The crash, however, did not occur in version 1.2.12.1 when I tested it. The point where the crash was occurring had a macro depth of 6, which I brought down to 2 with my changes.

Thanks to Corydon and everyone else for their assistance with this matter. The problem is solved for me, but I wonder if some sort of checking should be added to prevent problems like this from happening to others before this bug is closed? Apparently the very simple macro depth check of 7 isn't good enough in some cases.

By: Serge Vecher (serge-v) 2006-10-04 08:42:47

malverian: is it possible for you to make some abstract dialplan example that could crash a stock Asterisk? Maybe we could use it as a scarecrow with a respective notice in extensions.conf or UPGRADE.txt?

By: Philip Walls (malverian) 2006-10-04 09:04:52

serge-v,

I will attempt to do this for you as soon as testing is done and my dev server is available for me to toy with again (sometime today). Keep in mind though that the crash is no longer reproducible in 1.2.12.1 as it is in 1.2.10.

By: Serge Vecher (serge-v) 2006-10-04 09:25:10

oh, that's good to know. I'm not sure if we need the "example snippet" then, if you can no longer reproduce it -- we are always expecting people to run the latest stable release....

By: Steve Murphy (murf) 2006-10-04 11:07:27

Malverian-- Since you can't recreate the bug in 1.2.12.1, I don't see the need to supply a test case. There was at least one macro-related bug fixed in the past month or so, that may or may not have some sort of impact here. Any optimization of stack space usage in any of the applications involved in your macro nesting may have also been responsible.

I propose we close this bug; there will be other battles to fight along this issue. I've modified AEL in trunk to use Gosub instead of Macro when generating macro calls, and I have added a warning about nesting limits for macros in the doc strings for the Macro apps (see revs. 44343, (44337 & 44338), and 44336 for updates to 1.2, (1.4), and trunk, respectively).

By: Steve Murphy (murf) 2006-10-04 11:10:58

See preceding note.

By: Steve Murphy (murf) 2006-10-04 11:17:37

Not much more to do.

Symptoms: In any 1.4, 1.2, or earlier release, random crashes of Asterisk in deeply nested Macro calls (say, 4 or more levels).

Workaround: Use Corydon's cool patch, and use Gosubs with args instead of Macro() calls. Make sure you add explicit Return calls, or you'll be having problems for sure.