[Home]

Summary:ASTERISK-14971: [patch] Crash on deeply nested while/if statements in AEL
Reporter:dillec (dillec)Labels:
Date Opened:2009-10-11 17:49:57Date Closed:2010-06-21 15:48:11
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:PBX/pbx_ael
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 7720_1.ael
( 1) bt.txt
( 2) bt-full.txt
( 3) bug16053.patch
( 4) cpuinfo.txt
( 5) extensions.ael-2010-05-04
( 6) extensions.ael-2010-05-04_depth13
( 7) extensions.ael-2010-05-04_depth13-backtrace
( 8) extensions.ael-2010-05-04_depth13-backtrace_no-optimize
Description:While reloading ael via ael reload the asterisk process runs into SIGSEGV.

This bug is curios since Asterisk starts and loads the ael very well. Also if ael reload is typed in at the cli when starting Asterisk with -c is working fine.

The issue only shows up when using asterisk manager or asterisk -r.
Comments:By: Steve Murphy (murf) 2009-10-11 20:24:26

I'll be looking into this. Interesing. 20 levels deep in if's, about 24 levels deep in match_pval... At the outset, it could be a stack overflow issue (Asterisk has hard stack size limits), or it could just be a bug in the match_pval stuff. I'll try and see. Give me some time, I'm up to my neck right now in other issues.

By: Steve Murphy (murf) 2009-11-09 11:18:08.000-0600

I am not able to reproduce this bug! Neither on 1.6.1 or trunk. Neither on 32 or 64 bit test machines.

However, there are a lot of missing items that might affect the outcome here.
There are tons of references to macros and contexts that are not included in your test dataset.

If you want to pursue this, then give me the definitions for context 7720_1_end;
the macros Play, DesitnationNumberCall, Increment, PlayNDDig, and any others that might be involved.

It's crashing while trying to find a lable in the same extension... but it's difficult to make out exactly where this label is being called/referenced.

By: michael yu (smartcrab) 2009-11-18 23:15:14.000-0600

hi murf

I have meet the same bug in the 1.4.23.1 version,i have deeply nested 7 layer if-else statements .and then execute "ael reload" .Aserisk will crash.

By: Steve Murphy (murf) 2009-11-19 00:11:41.000-0600

smartcrab, diLLec, help me reproduce this bug. I can't fix what I can't see. Give me a good, solid test case I can make crash on a test machine. Neither of you are telling me what your OS/hardware is, it might have a bearing.

By: michael yu (smartcrab) 2009-11-25 21:05:13.000-0600

hi murf

sorry for reply you so late because of being on a business trip, I only tell you
my OS/cpuinfo now.  to give you the test case in several days later.
the result via "uanme -a" is "Linux cngufipt01.wrigley.com 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686 i686 i386 GNU/Linux".
the cpuinfo pls lookfor attached files

By: Leif Madsen (lmadsen) 2010-01-25 09:32:42.000-0600

This issue needs to be retested on the latest 1.6.x branch(es) in order to determine if this is still an issue.

By: Leif Madsen (lmadsen) 2010-03-23 10:26:38

Suspending this issue due to lack of feedback from the reporter, who is welcome to reopen the issue if they are able to provide the requested information. Thanks!

By: dillec (dillec) 2010-05-04 12:46:43

I'd like to reopen the issue since it still occurs with 1.6.1.18 and I'm now able to provide murf with the information he needs to reproduce the issue.

I've created a short test with deeply nested if clauses and saw that asterisk crashs at the 13th level. If you use the extensions.ael-2010-05-04 I've already attached to the case you can easily reproduce the issue.

One could say that nobody needs to have 13 levels of if clauses. But our software solution build on top of asterisk creates them upon a dialplan which the user is able to build with a GUI tool. So, for them it's no problem to keep track of the complexity.

The system I use is based on VMware Workstation 6.5.3, VM uses CentOs 5.3., the asterisk version is build with standard configure params from source. The uname -a command prints the following:

Linux ivr-redesign 2.6.18-164.11.1.el5 #1 SMP Wed Jan 20 07:32:21 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

If you need more information I'm very interested that you receive it fast since the issue generates some problems with our software solution.

By: dillec (dillec) 2010-05-04 12:58:43

Sorry, just noted that I've commented the 13th level out in extensions.ael-2010-05-04. I've attached extensions.ael-2010-05-04_depth13 which does the job.

Further please note that you can't reproduce the issue if you are using

asterisk -cvvvvvvvv

with a following 'ael reload'. The way I've been able to reproduce it is when asterisk is already started and you connect on a seperate thread with

asterisk -rvvvvvv

and then issue 'ael reload'. Please remember that starting asterisk with that configuration is no problem!

Following that, getting a backtrace from the process after SIGSEG is thrown needs to be done by executing:

gdb /usr/sbin/asterisk
(gdb) run -cvvvvvvvvv
...

on the first terminal session. After that you connect to asterisk via

asterisk -rvvvvvvvvvv

and start 'ael reload'.

You should then see the SIGSEGV in gdb and be able to call the BT for it.

By: dillec (dillec) 2010-05-04 13:00:27

please also find extensions.ael-2010-05-04_depth13-backtrace attached which shows a GDB backtrace after SISEGV was thrown by asterisk

By: Leif Madsen (lmadsen) 2010-05-04 16:20:24

Your backtrace seems to have a lot of <value optimized out> in it.

Please follow the instructions at http://svn.asterisk.org/svn/asterisk/trunk/doc/backtrace.txt for how to correctly submit a backtrace.

By: dillec (dillec) 2010-05-04 17:39:46

you're right. Actually I've figured something out while trying to get a valid stacktrace:

The uname I've pasted earlier is, as mentioned, from a VM which is installed as a x64 system. Anyway the RPM packages I've used to install asterisk were built on a i686 machine.

I've then tried to compile asterisk with DONT_OPTIMIZE on the x64 system and got no problem while reloading the dialplan. So, it seams to be a 32bit problem.

To verify that I've started to compile (in turn with DONT_OPTIMIZE) on the 32bit system where the RPM packages are built and after install tried to load the dialplan there. The result was that I'm now able to load up to a depth of 14 but still got the same issue when I try to use a depth of 15.

I've created a BT/BT Full on that and attached it: extensions.ael-2010-05-04_depth13-backtrace_no-optimize

The uname of the system is:

2.6.18-128.2.1.el5PAE

free says:

# free -m
            total       used       free     shared    buffers     cached
Mem:          4051       1134       2916          0        200        739
-/+ buffers/cache:        194       3856
Swap:         1983          0       1983

cpuinfo:
# cat /proc/cpuinfo  | grep "model name"
model name      : Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz
model name      : Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz
model name      : Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz
model name      : Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz
model name      : Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz
model name      : Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz
model name      : Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz
model name      : Intel(R) Xeon(R) CPU           X5450  @ 3.00GHz

By: dillec (dillec) 2010-05-04 17:42:53

Sorry, please don't get confused. The problem persists on 32bit + DONT_OPTIMIZE on a depth of 13. extensions.ael-2010-05-04_depth13 is still valid.

By: Leif Madsen (lmadsen) 2010-05-10 10:47:18

It sounds like it's running out of stack space to me... not sure if that is valid or not.

By: dillec (dillec) 2010-05-31 15:17:23

hey there - any news on that issue ?

By: Steve Murphy (murf) 2010-06-07 13:03:51

OK, I think lmadsen could be correct, and I think I've seen this to, on reloads.
Perhaps, a reload is a little more demanding of memory capacity than a normal start.

You have some options here.

First option:  Is it really a memory issue? Remember that Asterisk is heavily threaded. Also remember that each thread is allocated a certain amount of stack space. You can tweek this amount when you compile Asterisk.

In utils/utils.c (and asterisk/utils.h) the thread stacksize is set to 240K for 32 bit machines (512K for 64bit machines). If you change the definition in utils.h, and recompile, every thread will take up more stack space.

As long as you don't max out total stack allocation, you'll be OK. Double it and see if this solves the problem.



Option 2.

You know, you don't have to recompile your extensions.ael file every time you fire up asterisk. You can compile it into extensions.conf format, and just
include that in your normal extensions.conf file. Use "aelparse -d -w" while parked in the same dir as your extensions.ael file. A file called "extensions.conf.aeldump" will be created. If you put a #include extensions.conf.aeldump in your extensions.conf file, then the end affect is the same as starting up with the extensions.ael being present. Make sure to move/remove the extensions.ael file before starting asterisk.

The advantage of option #2 is that the ael compiler is run outside asterisk and not limited by the thread stack size.

So, experiment and see if the stack size is the problem. Or, if the AEL compiler embedded in Asterisk isn't working in the thin slice of stack available to it, run the standalone.

By: Leif Madsen (lmadsen) 2010-06-08 09:53:40

Setting to Feedback while we await information from the reporter.

murf: thanks for the feedback!

By: dillec (dillec) 2010-06-10 01:40:51

I'm very glad to hear from you murf.

Option 1 solved the problem. I've simply changed the AST_STACKSIZE line to

#define AST_STACKSIZE 1536 * 1024

in asterisk/utils.h. Are there any implications I have to be aware of when using such a high stack size?

Many thanks again for looking over my problem and your detailed explanations.

By: Steve Murphy (murf) 2010-06-10 13:03:14

Yes, there are big implications! Asterisk, when idle, runs several threads, each for some separate purpose. During call processing, new threads are created. You have to make sure that all these threads added together do not exhaust the supply of virtual memory, with each thread getting so much larger a space... It might be better to increase the number from the original value slightly until the problem goes away. Then add in a percentage for safety, and try running that way.

By: Steve Murphy (murf) 2010-06-16 10:10:59

I'm closing this bug as "won't fix" because the fix is increase the stack space for the thread that runs the AEL compiler (in this case). There is no mech. in Asterisk to create one thread with a different stack size than the others, and bumping up the size for all threads could end up limiting the total number of threads that can be created under Asterisk to an unacceptably low number.

Workarounds are to go ahead and increase the default stack size; or to run 'aelparse -w -d -q'  outside Asterisk, and load the resulting extensions.conf format file via an include in your extensions.conf file.

A more permanent fix would be to allow  modules to be loaded with different stack sizes, but this would be a lot of work, and would really only benefit a very small number of users!

By: Jeff Peeler (jpeeler) 2010-06-18 09:51:59

Hey murf, changing the character arrays in gen_prios to be malloced seems to fix the problem. I'll check to make sure there's nothing wrong with this approach.

By: Digium Subversion (svnbot) 2010-06-18 14:28:24

Repository: asterisk
Revision: 271399

U   branches/1.4/pbx/pbx_ael.c

------------------------------------------------------------------------
r271399 | jpeeler | 2010-06-18 14:28:23 -0500 (Fri, 18 Jun 2010) | 11 lines

Fix crash when parsing some heavily nested statements in AEL on reload.

Due to the recursion used when compiling AEL in gen_prios, all the stack space
was being consumed when parsing some AEL that contained nesting 13 levels deep.
Changing a few large buffers to be heap allocated fixed the crash, although I
did not test how many more levels can now be safely used.

(closes issue ASTERISK-14971)
Reported by: diLLec
Tested by: jpeeler

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=271399

By: Digium Subversion (svnbot) 2010-06-18 16:32:09

Repository: asterisk
Revision: 271483

_U  trunk/
U   trunk/include/asterisk/pval.h
U   trunk/pbx/pbx_ael.c
U   trunk/res/ael/pval.c

------------------------------------------------------------------------
r271483 | jpeeler | 2010-06-18 16:32:09 -0500 (Fri, 18 Jun 2010) | 18 lines

Merged revisions 271399 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r271399 | jpeeler | 2010-06-18 14:28:24 -0500 (Fri, 18 Jun 2010) | 11 lines
 
 Fix crash when parsing some heavily nested statements in AEL on reload.
 
 Due to the recursion used when compiling AEL in gen_prios, all the stack space
 was being consumed when parsing some AEL that contained nesting 13 levels deep.
 Changing a few large buffers to be heap allocated fixed the crash, although I
 did not test how many more levels can now be safely used.
 
 (closes issue ASTERISK-14971)
 Reported by: diLLec
 Tested by: jpeeler
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=271483

By: Digium Subversion (svnbot) 2010-06-18 16:33:57

Repository: asterisk
Revision: 271484

_U  branches/1.6.2/
U   branches/1.6.2/include/asterisk/pval.h
U   branches/1.6.2/pbx/pbx_ael.c
U   branches/1.6.2/res/ael/pval.c

------------------------------------------------------------------------
r271484 | jpeeler | 2010-06-18 16:33:57 -0500 (Fri, 18 Jun 2010) | 25 lines

Merged revisions 271483 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r271483 | jpeeler | 2010-06-18 16:32:09 -0500 (Fri, 18 Jun 2010) | 18 lines
 
 Merged revisions 271399 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r271399 | jpeeler | 2010-06-18 14:28:24 -0500 (Fri, 18 Jun 2010) | 11 lines
   
   Fix crash when parsing some heavily nested statements in AEL on reload.
   
   Due to the recursion used when compiling AEL in gen_prios, all the stack space
   was being consumed when parsing some AEL that contained nesting 13 levels deep.
   Changing a few large buffers to be heap allocated fixed the crash, although I
   did not test how many more levels can now be safely used.
   
   (closes issue ASTERISK-14971)
   Reported by: diLLec
   Tested by: jpeeler
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=271484

By: Michiel van Baak (mvanbaak) 2010-06-20 08:02:17

This one seriously breaks asterisk.

I have this in my extensions.ael:
if ( "${CALLERID(num):0:2}" = "31" ) {
   Set(CALLERID(num)=0${CALLERID(num):2});
}

and it gets loaded as:

4. GotoIf($[ "${C?5:6)                        [pbx_ael]
5. Set(CALLERID(num)=0${CALLERID(num):2})     [pbx_ael]
6. NoOp(Finish )                              [pbx_ael]

By: Elazar Broad (ebroad) 2010-06-21 08:46:35

Ditto on that, my macro names are getting truncated:


[Jun 21 09:44:32] NOTICE[15916] pbx.c: No such label 'interna' in extension '18005551212' in context 'from-trunk'
[Jun 21 09:44:32] WARNING[15916] pbx.c: Priority 'interna' must be a number > 0, or valid label
[Jun 21 09:44:32] ERROR[15916] app_stack.c: Gosub address is invalid: 'interna(ringgroup1,internal)'
[Jun 21 09:44:34] NOTICE[15917] pbx.c: No such label 'interna' in extension '18005551212' in context 'from-trunk'
[Jun 21 09:44:34] WARNING[15917] pbx.c: Priority 'interna' must be a number > 0, or valid label
[Jun 21 09:44:34] ERROR[15917] app_stack.c: Gosub address is invalid: 'interna(ringgroup1,internal)'


interna should be internal-dial()...



By: Jeff Peeler (jpeeler) 2010-06-21 12:10:23

I know the attached patch resolves the issue, but since you both seem to be actively using AEL I wanted to make sure you don't notice anything else wrong.

By: Elazar Broad (ebroad) 2010-06-21 12:39:40

Shouldn't the attached patch target pval.c, not pbx_ael.c?

By: Jeff Peeler (jpeeler) 2010-06-21 14:28:38

Well, the attached patch is for 1.4. To assist testing efforts (since the patch won't apply to the other branches) I'm just going to go ahead and commit it.

By: Digium Subversion (svnbot) 2010-06-21 15:37:47

Repository: asterisk
Revision: 271552

U   branches/1.4/pbx/pbx_ael.c

------------------------------------------------------------------------
r271552 | jpeeler | 2010-06-21 15:37:46 -0500 (Mon, 21 Jun 2010) | 7 lines

Do not use sizeof to calculate size of a heap allocated character array.

Change left out from 271399.

(closes issue ASTERISK-14971)
Reported by: diLLec

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=271552

By: Digium Subversion (svnbot) 2010-06-21 15:46:53

Repository: asterisk
Revision: 271554

_U  trunk/
U   trunk/res/ael/pval.c

------------------------------------------------------------------------
r271554 | jpeeler | 2010-06-21 15:46:52 -0500 (Mon, 21 Jun 2010) | 14 lines

Merged revisions 271552 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r271552 | jpeeler | 2010-06-21 15:37:47 -0500 (Mon, 21 Jun 2010) | 7 lines
 
 Do not use sizeof to calculate size of a heap allocated character array.
 
 Change left out from 271399.
 
 (closes issue ASTERISK-14971)
 Reported by: diLLec
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=271554

By: Digium Subversion (svnbot) 2010-06-21 15:48:09

Repository: asterisk
Revision: 271555

_U  branches/1.6.2/
U   branches/1.6.2/res/ael/pval.c

------------------------------------------------------------------------
r271555 | jpeeler | 2010-06-21 15:48:09 -0500 (Mon, 21 Jun 2010) | 21 lines

Merged revisions 271554 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r271554 | jpeeler | 2010-06-21 15:46:53 -0500 (Mon, 21 Jun 2010) | 14 lines
 
 Merged revisions 271552 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r271552 | jpeeler | 2010-06-21 15:37:47 -0500 (Mon, 21 Jun 2010) | 7 lines
   
   Do not use sizeof to calculate size of a heap allocated character array.
   
   Change left out from 271399.
   
   (closes issue ASTERISK-14971)
   Reported by: diLLec
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=271555