[Home]

Summary:ASTERISK-02039: [FreeBSD Only] Coredump in var resolving/substituting
Reporter:Konstantin Prokazoff (oryx)Labels:
Date Opened:2004-07-17 01:34:11Date Closed:2008-01-15 15:04:57.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) diff.autoservice.missing.txt
( 1) diffs.txt
( 2) diffs-w-alloca-check.txt
( 3) diff-vs-cvshead.txt
( 4) diff-vs-diff.txt
( 5) revised_pthread_stacksize_diff.txt
( 6) thread_stack_diff.txt
Description:If you w'll try to resolve or substitute unavail. variable in extensions.conf processing, asterisk coredumps. Still finding out to post .diff, 100% on fbsd-5.2.1, not tested on linux.


****** ADDITIONAL INFORMATION ******

Seems to have workaround in previous CVS's, but in CVS of 16/07 still present.
Comments:By: Rob Gagnon (rgagnon) 2004-07-17 02:10:54

can you post any cli logging output prior to your crash?  or maybe a gdb backtrace.

Also, example portion of your extensions.conf file could help

By: Olle Johansson (oej) 2004-07-17 02:14:34

Had the same problem on FreeBSD 4.9, seems gone now. pbx_substitute_vars_helper crashed and core dump.

By: Olle Johansson (oej) 2004-07-17 02:15:42

...and could not replicate it on Linux.

By: Rob Gagnon (rgagnon) 2004-07-17 02:19:33

OK.  So if we can have Oryx report back with his pbx.c CVS version, we can confirm it has been eradicated?

Oryx, please do a "cvs status pbx.c" and let us know the version of the file you have

By: Konstantin Prokazoff (oryx) 2004-07-17 02:33:31

Thnx for quick answer, have too much working servers on fbsd. Still trying to replicate on rh-7.3.
gdb backtrace are:
0x... in pbx_substitute_variables_helper(c=0x...., cp1=0x... "${LEN(${FILENAME})} = 0 ", cp2=0x... "", count=4095) at pbx.c:1079,
where row 1079 is memset(var, 0, sizeof(var)).
Variable FILENAME undeclared.
Here's info from cvs (I have cvs'd yesterday):
File: pbx.c Status: Needs Patch
Working revision: 1.137
Repository revision: 1.138
Sticky Tag/Date/Options: (none)
That's all. Haven't ideas yet, maybe have to sleep some hours ;)

By: Konstantin Prokazoff (oryx) 2004-07-17 05:15:41

Maybe stack size/allocation problem?

By: Konstantin Prokazoff (oryx) 2004-07-17 09:27:13

So, all works if I change my current script as:
s,1,GotoIf($[${LEN(${FILENAME})} = 0]?s,2:s,7)
s,2,...
s,7,...
to
s,1,SetVar(FNLEN = LEN(${FILENAME}))
s,2,GotoIf($[${FNLEN} = 0]?s,3:s,8)
s,3,...
s,8,...
Trying to find why pbx_substitute_variables_helper can't resolve function name in evaluation.

By: Konstantin Prokazoff (oryx) 2004-07-17 09:41:20

This is stack/memory allocation problem.
We have minimized allocation of:
workspace, ltmp and var from 4096 to 1024.
Recursion in pbx_substitute_variables & temp has ran normally.
No coredumps.

By: Rob Gagnon (rgagnon) 2004-07-17 11:04:12

I've confirmed this does NOT happen on Linux.  I am running Fedora Core 1 (2.4.25), and tried it in these ways:

- variable not defined at all
- variable defined but with no length
- variable defined and length>0

Does BSD give any compiler warnings on pbx.c when you make asterisk?

By: Konstantin Prokazoff (oryx) 2004-07-17 11:50:03

No warnings at all. I have tried different cflags such -fstack-check, -fno-stack-limit, so on. Have compiled with profiler, tested on SMP and UP. No changes. I think this belong to platform differences. I can't now test on Linux, because this are "central" server, where lives all my services now. I'll test this on Monday, afternoon.
But, minimization of local function variables (as described in previous my note), in pbx_substitute_*, ast_log, ast_verbose (so on) gives chances on fbsd to take up to 4-5 levels of recursive calling (haven't tested for diving height ;) ).

By: Mark Spencer (markster) 2004-07-17 12:08:50

What is the simplest scenario in which this happens?  Maybe I can try running under valgrind on Linux and see if it finds anything suspicious.

By: Olle Johansson (oej) 2004-07-19 02:48:51

Happened again:
(gdb) bt
#0  0x806e1e9 in pbx_substitute_variables_helper (c=0x86c5000,
   cp1=0x85c2dc0 "Callcounter/${ACCOUNTCODE}-lastnumber=${ARG1}", cp2=0xbfa49cec "", count=8191)
   at pbx.c:1016

Seems to happen when we are using two variable substitutions in the same exten. Have to test a bit more.

By: Konstantin Prokazoff (oryx) 2004-07-19 08:06:12

oej & markster, I still using CVS of 16/07 and trying to catch, where the bug, etc. Simplest code in exten, which a way to coredump on BSD I have wrote as bugnote at 07-17-04, 09:27. So, I suppose, that BSD is more stable & reliable platform than Linux (this is not flame, and not topic to discuss), but I think there are available bug in code (maybe in different part of asterisk core), which is a reason of stack/memory chain corruption, and if asterisk now works fine on Linux, it willn't guarantee, that it will work stable on high load or script height. We need some time, and I think, we can find problem. Now working with local variable sizes set from 4096 to 1024. Having coredumps on high load ;).

By: Olle Johansson (oej) 2004-07-19 08:12:53

Still coredumps in the same spot. I am able to use two variables in a setvar or noop without coredumps. This is strange. Can only provoke it to happen in db apps.

By: Olle Johansson (oej) 2004-07-19 09:27:20

This construct makes my Asterisk crash. Please test. Seems like calling a macro from a macro eats memory and makes asterisk crash when substituting two variables in the same exten.
--------------
#0  0x806e28d in pbx_substitute_variables_helper (c=0x80fa000, cp1=0x83dbf70 "test2: ${ARG1}",
   cp2=0xbfa38cec "", count=8191) at pbx.c:1024
-----------------


exten => 20034,1,macro(test, HEJ)

[macro-test]
exten => s,1, noop
exten => s,2, noop
exten => s,3, noop
exten => s,4, noop
exten => s,5, noop(${ARG1})
exten => s,6, noop(${ARG1} ${ACCOUNTCODE})
exten => s,7, answer  
exten => s,8, playback(beep)
exten => s,9, noop
exten => s,10,macro(test2,${EXTEN})     ; count this call
exten => s,11, noop
exten => s,12, hangup  


[macro-test2]
exten => s,1, noop
exten => s,2, noop
exten => s,3, noop
exten => s,4, noop
exten => s,5, noop(test2: ${ARG1})
exten => s,6, noop(test2: ${ARG1} ${ACCOUNTCODE})
exten => s,7, noop
exten => s,8, playback(beep)
exten => s,9, noop

By: Konstantin Prokazoff (oryx) 2004-07-19 12:43:40

Coredump on (practically) ANY variable substitution on extension processing while active call, try set this in pbx.c, in void pbx_substitute_variables_helper(...):
char workspace[32768];
char ltmp[32768], var[32768];
why?

By: Konstantin Prokazoff (oryx) 2004-07-19 13:24:41

So, how I understanding, coredump -> MAJOR difference in VM+scheduling+libs(libc_r) in Linux and FreeBSD. I have added into ast_pbx_start(...) such code, after pthread_attr_setdetachstate(...) /because many of threads in asterisk created in detached state/:
pthread_attr_setstacksize(&attr, 1048576);
All works fine now, with recursive up to 10 and local variables size up to 32k.
I think this is a problem, we are overflowing stack at BSD. Can anyone test?

By: Olle Johansson (oej) 2004-07-19 13:31:17

Continued testing, changing buffer size to 44096 in pbx_substitute_vars_helper made Asterisk crash more often. It crashed on calling the first macro and the second macro(test2).

With 4096 (standard in pbx.c) the second macro (test2) called directly runs ok, but called from the first macro, the second one crashes Asterisk.

By: Mark Spencer (markster) 2004-07-19 19:04:40

Does Linux not care about the stack size or what?  Why is this FreeBSD specific?

By: Konstantin Prokazoff (oryx) 2004-07-20 01:42:53

Mark, Linux haven't native threads. Threads in linux emulated by fork() in library, and fork() provides all resources owned by process (for ex. ulimit -u ;) for childs. So, coredump of asterisk in Linux takes (in my cases) up to 100Mb, in BSD it takes 5-6Mb. BSD by default limits stack size for childs, and we must carefully useses any of child stack (thread) areas in functions (not only incore, but in channels, applications, so on). This is my opinion. Anyone?
BR,
Oryx.

By: Olle Johansson (oej) 2004-07-20 03:57:23

Oryx, you are right. The attached patch solved my problem. This maybe has to be implemented in other places as well.

By: Olle Johansson (oej) 2004-07-20 07:59:07

pthread_attr_setstacksize is part of Posix 1.0, but I can't find it on my Linux system.

By: Konstantin Prokazoff (oryx) 2004-07-20 08:11:13

Have checked on RH7.3 in /lib/libpthread-0.9.so available and check yours /usr/include/pthread.h, here is must be extern pthread_attr_setstacksize().
So, I finished .diffs for fbsd. Today I recheck & recompile all and will post it here.

By: Mark Spencer (markster) 2004-07-20 09:32:34

Why dont' we only do that if it's FreeBSD then?

By: Konstantin Prokazoff (oryx) 2004-07-20 09:46:36

I can't understand what your mean.
Linux default stack size is: 2093056, fbsd default stack size: 64k (and maybe in all compat. *bsd).

By: Olle Johansson (oej) 2004-07-20 09:49:41

Ok, if the default stacksize is 64K it seems that we need to make it higher.
But in how many places in the source?

And yes, markster, only for FreeBSD so far. However, we will need to be prepared for Mac OS X, NetBSD and OpenBSD joining the crowd. I'll e-mail the BSD list.

By: rich (rich) 2004-07-20 11:01:50

Some other ports do it this way on FreeBSD:

#ifdef THREAD_STACK_SIZE
pthread_attr_setstacksize(&attrs, THREAD_STACK_SIZE);
#endif

which could be enabled elsewhere using:

#include <sys/param.h>
#ifdef BSD
#define THREAD_STACK_SIZE
#endif

Great work Oryx!  I'm sure this is the root cause for some of the other memory corruption thats been observed.

Rich

By: Konstantin Prokazoff (oryx) 2004-07-20 13:02:00

Plz. check, one more files.
We have patched pbx.c (there are some problems), and file.c (in bsd problem with udp reception).
In res_adsi, problem with dial() application, which provides coredump with different equipment, because adsi_available() returns true in 2 cases: ADSI_AVAILABLE & ADSI_UNKNOWN.
Plz. check who can.

edited on: 07-20-04 12:50

By: rich (rich) 2004-07-20 17:30:13

diffs.txt incorporates pthread-patch-bsd-0.1.tar.gz,
lock.h.diff and module.h.diff and the following:

Surrounded pthread_attr_setstacksize() calls
with #ifdef PTHREAD_ATTR_STACKSIZE.

Reduced whitespace changes per existing
coding style.

changed char var[4096] -> char *var in
pbx_substitute_variables_helper() becuase the
patch uses malloc() to allocate it.

Moved variable definition before function
call in autoservice.c for FreeBSD 4.9.

Add LIBS+=-pthread -lcrypto in Makefile for
OpenBSD.

Moved res_parking.c patches to res_features.c.

Tested on FreeBSD 5.2.1, 4.9, OpenBSD 3.5 and
Redhat 9.

Cheers,
Rich

By: rich (rich) 2004-07-20 21:23:50

I've successfully tested the code with
#define PTHREAD_ATTR_STACKSIZE 1048576
on Redhat 9.

pthread_attr_setstacksize() is provided in NTPL
and LinuxThreads 0.8 and up.

I wonder whether it would enhance effectiveness
of testing if the various platforms used the same
code paths.  Consistent code paths makes it
easier to reason about bugs.  

I would propose that the linux platforms
set it as well; however, I don't
know how to construct an #if conditional that
would distinguish older versions, and I don't
know whether this might affect redhat 7 or 8.

Rich

By: Konstantin Prokazoff (oryx) 2004-07-21 04:41:42

in autoservice.diff there are no #ifdef for preprocessor.

By: () 2004-07-21 05:07:58

Been testing diffs.txt on FreeBSD this morning and it is far more stable.

Chris

By: rich (rich) 2004-07-21 10:04:43

diff.autoservice.missing.txt adds the missing #ifdef that Oryx mentioned.

By: Olle Johansson (oej) 2004-07-23 02:57:12

These diffs make my Asterisk stable. Time for CVS?

By: Konstantin Prokazoff (oryx) 2004-07-23 04:30:12

I think yes, for beginning. ;)
Next patches I w'll create in bug note 2112, which will stable scheduler.

By: Mark Spencer (markster) 2004-07-23 12:32:43

I do not like using malloc, especially when alloca will work just fine and is substantially more efficient.  Why do these patches change alloca calls to malloc?

By: rich (rich) 2004-07-23 15:13:55

Per Mark's request, changed two occurrences
of 'var = malloc...'  to var = alloca...' in
pbx_substitute_variables_helper().

There should be plenty of stack for it, and
it will core immediately after if allocation
fails, so it should be obvious in cases where
allocation fails.

diff-vs-diff.txt is these changes alone.

diff-vs-cvshead.txt is the complete set of
revised patches.

I'll post another note after retesting the
various platforms.

By: Rob Gagnon (rgagnon) 2004-07-23 16:01:49

Wouldn't it be nicer to put an

if (!var) {
  ast_log(LOG_ERROR, "Out of memory\n");
  // exit as appropriate
}

piece of code there, rather than just letting it core dump?

By: rich (rich) 2004-07-23 16:23:05

I was mistaken in thinking that it was consistent.  I'll do what much of the other code does.

By: rich (rich) 2004-07-23 19:00:10

diffs-w-alloca-check.txt reports 'Out of memory' if alloca fails, in the same way that much of the other code does.

Tested on FreeBSD 5.2.1, 4.9, OpenBSD 3.5 and Redhat 9.

By: Mark Spencer (markster) 2004-07-23 23:56:31

Improvement, but....

Why are we setting the stacksize attribute in asterisk.c without checking if it's defined -- and why are we doing it on linux at all?

By: rich (rich) 2004-07-24 22:41:04

in thread_stack_diff.txt,

lock.h defines ast_pthread_attr_setstacksize to
be pthread_attr_setstacksize() on BSD and empty
on others so that the thread stacksize
attribute is set only on BSD systems (not
Linux).

This covers the missing check Mark mentioned.

I'll be out of town sunday night through the
next sunday, so please feel free to alter
anything to suit your preferences, esp. while
I'm away!

And please credit Oryx for this work!

By: () 2004-07-29 07:29:59

Any reason why this patch has not been added yet?

By: () 2004-07-29 08:22:06

Just tried thread_stack_diff on SUSE 9.0 and had to make

#define PTHREAD_ATTR_STACKSIZE 1048576

global to compile and run OK.

By: Olle Johansson (oej) 2004-07-29 09:46:04

Can someone tell me which files we can delete and which to keep in this bug report?

By: Mark Spencer (markster) 2004-07-29 14:43:17

I just realized my comments from last week never made it in.  So, here they are again:

1) We're still setting stack size on linux in asterisk.c.  Is it reasonable to just have an ast_thread_new() or something instead of all this stuff?

2) Why use pthread_attr's at all on Linux?

3) What's with the return 0 vs. return -1 in file.c?  Is there some kind of a real bug here?

4) In pbx.c, I don't think that the nature of the length argument in ast_get_hint should be changed in such a way as to place the requirement upon the caller to set the size.  Why don't we just leave it as is where the function itself interprets the length properly (snprintf model) rather than asking the user to say "-1" all the time (strncpy model)?

5) snprintf does *not* need a -1 on the size

6) I just wish I had a better feel for the variable changes.

By: jacs (jacs) 2004-07-30 08:51:23

thread_stack_diff is the current file the rest can go

By: rich (rich) 2004-08-08 02:01:51

revised_pthread_stacksize_diff.txt

.. contains only the changes for setting
thread stacksize on non-linux platforms.  All
other issues have been separated out so that
we can address them individually in
subsequent posts.

Tested on RH 9, FreeBSD 5.2.1 and OpenBSD 3.5.

By: Mark Spencer (markster) 2004-08-08 13:29:20

Fixed in CVS (including changing everything to ast_pthread_create).  Please feel free to open a new bug with any other issues.

By: Digium Subversion (svnbot) 2008-01-15 15:04:57.000-0600

Repository: asterisk
Revision: 3596

U   trunk/apps/app_qcall.c
U   trunk/apps/app_rpt.c
U   trunk/asterisk.c
U   trunk/autoservice.c
U   trunk/channels/chan_alsa.c
U   trunk/channels/chan_h323.c
U   trunk/channels/chan_iax.c
U   trunk/channels/chan_iax2.c
U   trunk/channels/chan_mgcp.c
U   trunk/channels/chan_modem.c
U   trunk/channels/chan_oss.c
U   trunk/channels/chan_phone.c
U   trunk/channels/chan_sip.c
U   trunk/channels/chan_skinny.c
U   trunk/channels/chan_vofr.c
U   trunk/channels/chan_vpb.c
U   trunk/channels/chan_zap.c
U   trunk/include/asterisk/agi.h
U   trunk/include/asterisk/lock.h
U   trunk/include/asterisk/utils.h
U   trunk/manager.c
U   trunk/pbx/pbx_gtkconsole.c
U   trunk/pbx/pbx_spool.c
U   trunk/pbx/pbx_wilcalu.c
U   trunk/pbx.c
U   trunk/res/res_agi.c
U   trunk/res/res_features.c
U   trunk/res/res_musiconhold.c
U   trunk/utils.c

------------------------------------------------------------------------
r3596 | markster | 2008-01-15 15:04:57 -0600 (Tue, 15 Jan 2008) | 2 lines

Merge BSD stack size work (bug ASTERISK-2039)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=3596