Summary: | ASTERISK-02039: [FreeBSD Only] Coredump in var resolving/substituting | ||
Reporter: | Konstantin Prokazoff (oryx) | Labels: | |
Date Opened: | 2004-07-17 01:34:11 | Date Closed: | 2008-01-15 15:04:57.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) diff.autoservice.missing.txt ( 1) diffs.txt ( 2) diffs-w-alloca-check.txt ( 3) diff-vs-cvshead.txt ( 4) diff-vs-diff.txt ( 5) revised_pthread_stacksize_diff.txt ( 6) thread_stack_diff.txt | |
Description: | If you w'll try to resolve or substitute unavail. variable in extensions.conf processing, asterisk coredumps. Still finding out to post .diff, 100% on fbsd-5.2.1, not tested on linux. ****** ADDITIONAL INFORMATION ****** Seems to have workaround in previous CVS's, but in CVS of 16/07 still present. | ||
Comments: | By: Rob Gagnon (rgagnon) 2004-07-17 02:10:54 can you post any cli logging output prior to your crash? or maybe a gdb backtrace. Also, example portion of your extensions.conf file could help By: Olle Johansson (oej) 2004-07-17 02:14:34 Had the same problem on FreeBSD 4.9, seems gone now. pbx_substitute_vars_helper crashed and core dump. By: Olle Johansson (oej) 2004-07-17 02:15:42 ...and could not replicate it on Linux. By: Rob Gagnon (rgagnon) 2004-07-17 02:19:33 OK. So if we can have Oryx report back with his pbx.c CVS version, we can confirm it has been eradicated? Oryx, please do a "cvs status pbx.c" and let us know the version of the file you have By: Konstantin Prokazoff (oryx) 2004-07-17 02:33:31 Thnx for quick answer, have too much working servers on fbsd. Still trying to replicate on rh-7.3. gdb backtrace are: 0x... in pbx_substitute_variables_helper(c=0x...., cp1=0x... "${LEN(${FILENAME})} = 0 ", cp2=0x... "", count=4095) at pbx.c:1079, where row 1079 is memset(var, 0, sizeof(var)). Variable FILENAME undeclared. Here's info from cvs (I have cvs'd yesterday): File: pbx.c Status: Needs Patch Working revision: 1.137 Repository revision: 1.138 Sticky Tag/Date/Options: (none) That's all. Haven't ideas yet, maybe have to sleep some hours ;) By: Konstantin Prokazoff (oryx) 2004-07-17 05:15:41 Maybe stack size/allocation problem? By: Konstantin Prokazoff (oryx) 2004-07-17 09:27:13 So, all works if I change my current script as: s,1,GotoIf($[${LEN(${FILENAME})} = 0]?s,2:s,7) s,2,... s,7,... to s,1,SetVar(FNLEN = LEN(${FILENAME})) s,2,GotoIf($[${FNLEN} = 0]?s,3:s,8) s,3,... s,8,... Trying to find why pbx_substitute_variables_helper can't resolve function name in evaluation. By: Konstantin Prokazoff (oryx) 2004-07-17 09:41:20 This is stack/memory allocation problem. We have minimized allocation of: workspace, ltmp and var from 4096 to 1024. Recursion in pbx_substitute_variables & temp has ran normally. No coredumps. By: Rob Gagnon (rgagnon) 2004-07-17 11:04:12 I've confirmed this does NOT happen on Linux. I am running Fedora Core 1 (2.4.25), and tried it in these ways: - variable not defined at all - variable defined but with no length - variable defined and length>0 Does BSD give any compiler warnings on pbx.c when you make asterisk? By: Konstantin Prokazoff (oryx) 2004-07-17 11:50:03 No warnings at all. I have tried different cflags such -fstack-check, -fno-stack-limit, so on. Have compiled with profiler, tested on SMP and UP. No changes. I think this belong to platform differences. I can't now test on Linux, because this are "central" server, where lives all my services now. I'll test this on Monday, afternoon. But, minimization of local function variables (as described in previous my note), in pbx_substitute_*, ast_log, ast_verbose (so on) gives chances on fbsd to take up to 4-5 levels of recursive calling (haven't tested for diving height ;) ). By: Mark Spencer (markster) 2004-07-17 12:08:50 What is the simplest scenario in which this happens? Maybe I can try running under valgrind on Linux and see if it finds anything suspicious. By: Olle Johansson (oej) 2004-07-19 02:48:51 Happened again: (gdb) bt #0 0x806e1e9 in pbx_substitute_variables_helper (c=0x86c5000, cp1=0x85c2dc0 "Callcounter/${ACCOUNTCODE}-lastnumber=${ARG1}", cp2=0xbfa49cec "", count=8191) at pbx.c:1016 Seems to happen when we are using two variable substitutions in the same exten. Have to test a bit more. By: Konstantin Prokazoff (oryx) 2004-07-19 08:06:12 oej & markster, I still using CVS of 16/07 and trying to catch, where the bug, etc. Simplest code in exten, which a way to coredump on BSD I have wrote as bugnote at 07-17-04, 09:27. So, I suppose, that BSD is more stable & reliable platform than Linux (this is not flame, and not topic to discuss), but I think there are available bug in code (maybe in different part of asterisk core), which is a reason of stack/memory chain corruption, and if asterisk now works fine on Linux, it willn't guarantee, that it will work stable on high load or script height. We need some time, and I think, we can find problem. Now working with local variable sizes set from 4096 to 1024. Having coredumps on high load ;). By: Olle Johansson (oej) 2004-07-19 08:12:53 Still coredumps in the same spot. I am able to use two variables in a setvar or noop without coredumps. This is strange. Can only provoke it to happen in db apps. By: Olle Johansson (oej) 2004-07-19 09:27:20 This construct makes my Asterisk crash. Please test. Seems like calling a macro from a macro eats memory and makes asterisk crash when substituting two variables in the same exten. -------------- #0 0x806e28d in pbx_substitute_variables_helper (c=0x80fa000, cp1=0x83dbf70 "test2: ${ARG1}", cp2=0xbfa38cec "", count=8191) at pbx.c:1024 ----------------- exten => 20034,1,macro(test, HEJ) [macro-test] exten => s,1, noop exten => s,2, noop exten => s,3, noop exten => s,4, noop exten => s,5, noop(${ARG1}) exten => s,6, noop(${ARG1} ${ACCOUNTCODE}) exten => s,7, answer exten => s,8, playback(beep) exten => s,9, noop exten => s,10,macro(test2,${EXTEN}) ; count this call exten => s,11, noop exten => s,12, hangup [macro-test2] exten => s,1, noop exten => s,2, noop exten => s,3, noop exten => s,4, noop exten => s,5, noop(test2: ${ARG1}) exten => s,6, noop(test2: ${ARG1} ${ACCOUNTCODE}) exten => s,7, noop exten => s,8, playback(beep) exten => s,9, noop By: Konstantin Prokazoff (oryx) 2004-07-19 12:43:40 Coredump on (practically) ANY variable substitution on extension processing while active call, try set this in pbx.c, in void pbx_substitute_variables_helper(...): char workspace[32768]; char ltmp[32768], var[32768]; why? By: Konstantin Prokazoff (oryx) 2004-07-19 13:24:41 So, how I understanding, coredump -> MAJOR difference in VM+scheduling+libs(libc_r) in Linux and FreeBSD. I have added into ast_pbx_start(...) such code, after pthread_attr_setdetachstate(...) /because many of threads in asterisk created in detached state/: pthread_attr_setstacksize(&attr, 1048576); All works fine now, with recursive up to 10 and local variables size up to 32k. I think this is a problem, we are overflowing stack at BSD. Can anyone test? By: Olle Johansson (oej) 2004-07-19 13:31:17 Continued testing, changing buffer size to 44096 in pbx_substitute_vars_helper made Asterisk crash more often. It crashed on calling the first macro and the second macro(test2). With 4096 (standard in pbx.c) the second macro (test2) called directly runs ok, but called from the first macro, the second one crashes Asterisk. By: Mark Spencer (markster) 2004-07-19 19:04:40 Does Linux not care about the stack size or what? Why is this FreeBSD specific? By: Konstantin Prokazoff (oryx) 2004-07-20 01:42:53 Mark, Linux haven't native threads. Threads in linux emulated by fork() in library, and fork() provides all resources owned by process (for ex. ulimit -u ;) for childs. So, coredump of asterisk in Linux takes (in my cases) up to 100Mb, in BSD it takes 5-6Mb. BSD by default limits stack size for childs, and we must carefully useses any of child stack (thread) areas in functions (not only incore, but in channels, applications, so on). This is my opinion. Anyone? BR, Oryx. By: Olle Johansson (oej) 2004-07-20 03:57:23 Oryx, you are right. The attached patch solved my problem. This maybe has to be implemented in other places as well. By: Olle Johansson (oej) 2004-07-20 07:59:07 pthread_attr_setstacksize is part of Posix 1.0, but I can't find it on my Linux system. By: Konstantin Prokazoff (oryx) 2004-07-20 08:11:13 Have checked on RH7.3 in /lib/libpthread-0.9.so available and check yours /usr/include/pthread.h, here is must be extern pthread_attr_setstacksize(). So, I finished .diffs for fbsd. Today I recheck & recompile all and will post it here. By: Mark Spencer (markster) 2004-07-20 09:32:34 Why dont' we only do that if it's FreeBSD then? By: Konstantin Prokazoff (oryx) 2004-07-20 09:46:36 I can't understand what your mean. Linux default stack size is: 2093056, fbsd default stack size: 64k (and maybe in all compat. *bsd). By: Olle Johansson (oej) 2004-07-20 09:49:41 Ok, if the default stacksize is 64K it seems that we need to make it higher. But in how many places in the source? And yes, markster, only for FreeBSD so far. However, we will need to be prepared for Mac OS X, NetBSD and OpenBSD joining the crowd. I'll e-mail the BSD list. By: rich (rich) 2004-07-20 11:01:50 Some other ports do it this way on FreeBSD: #ifdef THREAD_STACK_SIZE pthread_attr_setstacksize(&attrs, THREAD_STACK_SIZE); #endif which could be enabled elsewhere using: #include <sys/param.h> #ifdef BSD #define THREAD_STACK_SIZE #endif Great work Oryx! I'm sure this is the root cause for some of the other memory corruption thats been observed. Rich By: Konstantin Prokazoff (oryx) 2004-07-20 13:02:00 Plz. check, one more files. We have patched pbx.c (there are some problems), and file.c (in bsd problem with udp reception). In res_adsi, problem with dial() application, which provides coredump with different equipment, because adsi_available() returns true in 2 cases: ADSI_AVAILABLE & ADSI_UNKNOWN. Plz. check who can. edited on: 07-20-04 12:50 By: rich (rich) 2004-07-20 17:30:13 diffs.txt incorporates pthread-patch-bsd-0.1.tar.gz, lock.h.diff and module.h.diff and the following: Surrounded pthread_attr_setstacksize() calls with #ifdef PTHREAD_ATTR_STACKSIZE. Reduced whitespace changes per existing coding style. changed char var[4096] -> char *var in pbx_substitute_variables_helper() becuase the patch uses malloc() to allocate it. Moved variable definition before function call in autoservice.c for FreeBSD 4.9. Add LIBS+=-pthread -lcrypto in Makefile for OpenBSD. Moved res_parking.c patches to res_features.c. Tested on FreeBSD 5.2.1, 4.9, OpenBSD 3.5 and Redhat 9. Cheers, Rich By: rich (rich) 2004-07-20 21:23:50 I've successfully tested the code with #define PTHREAD_ATTR_STACKSIZE 1048576 on Redhat 9. pthread_attr_setstacksize() is provided in NTPL and LinuxThreads 0.8 and up. I wonder whether it would enhance effectiveness of testing if the various platforms used the same code paths. Consistent code paths makes it easier to reason about bugs. I would propose that the linux platforms set it as well; however, I don't know how to construct an #if conditional that would distinguish older versions, and I don't know whether this might affect redhat 7 or 8. Rich By: Konstantin Prokazoff (oryx) 2004-07-21 04:41:42 in autoservice.diff there are no #ifdef for preprocessor. By: () 2004-07-21 05:07:58 Been testing diffs.txt on FreeBSD this morning and it is far more stable. Chris By: rich (rich) 2004-07-21 10:04:43 diff.autoservice.missing.txt adds the missing #ifdef that Oryx mentioned. By: Olle Johansson (oej) 2004-07-23 02:57:12 These diffs make my Asterisk stable. Time for CVS? By: Konstantin Prokazoff (oryx) 2004-07-23 04:30:12 I think yes, for beginning. ;) Next patches I w'll create in bug note 2112, which will stable scheduler. By: Mark Spencer (markster) 2004-07-23 12:32:43 I do not like using malloc, especially when alloca will work just fine and is substantially more efficient. Why do these patches change alloca calls to malloc? By: rich (rich) 2004-07-23 15:13:55 Per Mark's request, changed two occurrences of 'var = malloc...' to var = alloca...' in pbx_substitute_variables_helper(). There should be plenty of stack for it, and it will core immediately after if allocation fails, so it should be obvious in cases where allocation fails. diff-vs-diff.txt is these changes alone. diff-vs-cvshead.txt is the complete set of revised patches. I'll post another note after retesting the various platforms. By: Rob Gagnon (rgagnon) 2004-07-23 16:01:49 Wouldn't it be nicer to put an if (!var) { ast_log(LOG_ERROR, "Out of memory\n"); // exit as appropriate } piece of code there, rather than just letting it core dump? By: rich (rich) 2004-07-23 16:23:05 I was mistaken in thinking that it was consistent. I'll do what much of the other code does. By: rich (rich) 2004-07-23 19:00:10 diffs-w-alloca-check.txt reports 'Out of memory' if alloca fails, in the same way that much of the other code does. Tested on FreeBSD 5.2.1, 4.9, OpenBSD 3.5 and Redhat 9. By: Mark Spencer (markster) 2004-07-23 23:56:31 Improvement, but.... Why are we setting the stacksize attribute in asterisk.c without checking if it's defined -- and why are we doing it on linux at all? By: rich (rich) 2004-07-24 22:41:04 in thread_stack_diff.txt, lock.h defines ast_pthread_attr_setstacksize to be pthread_attr_setstacksize() on BSD and empty on others so that the thread stacksize attribute is set only on BSD systems (not Linux). This covers the missing check Mark mentioned. I'll be out of town sunday night through the next sunday, so please feel free to alter anything to suit your preferences, esp. while I'm away! And please credit Oryx for this work! By: () 2004-07-29 07:29:59 Any reason why this patch has not been added yet? By: () 2004-07-29 08:22:06 Just tried thread_stack_diff on SUSE 9.0 and had to make #define PTHREAD_ATTR_STACKSIZE 1048576 global to compile and run OK. By: Olle Johansson (oej) 2004-07-29 09:46:04 Can someone tell me which files we can delete and which to keep in this bug report? By: Mark Spencer (markster) 2004-07-29 14:43:17 I just realized my comments from last week never made it in. So, here they are again: 1) We're still setting stack size on linux in asterisk.c. Is it reasonable to just have an ast_thread_new() or something instead of all this stuff? 2) Why use pthread_attr's at all on Linux? 3) What's with the return 0 vs. return -1 in file.c? Is there some kind of a real bug here? 4) In pbx.c, I don't think that the nature of the length argument in ast_get_hint should be changed in such a way as to place the requirement upon the caller to set the size. Why don't we just leave it as is where the function itself interprets the length properly (snprintf model) rather than asking the user to say "-1" all the time (strncpy model)? 5) snprintf does *not* need a -1 on the size 6) I just wish I had a better feel for the variable changes. By: jacs (jacs) 2004-07-30 08:51:23 thread_stack_diff is the current file the rest can go By: rich (rich) 2004-08-08 02:01:51 revised_pthread_stacksize_diff.txt .. contains only the changes for setting thread stacksize on non-linux platforms. All other issues have been separated out so that we can address them individually in subsequent posts. Tested on RH 9, FreeBSD 5.2.1 and OpenBSD 3.5. By: Mark Spencer (markster) 2004-08-08 13:29:20 Fixed in CVS (including changing everything to ast_pthread_create). Please feel free to open a new bug with any other issues. By: Digium Subversion (svnbot) 2008-01-15 15:04:57.000-0600 Repository: asterisk Revision: 3596 U trunk/apps/app_qcall.c U trunk/apps/app_rpt.c U trunk/asterisk.c U trunk/autoservice.c U trunk/channels/chan_alsa.c U trunk/channels/chan_h323.c U trunk/channels/chan_iax.c U trunk/channels/chan_iax2.c U trunk/channels/chan_mgcp.c U trunk/channels/chan_modem.c U trunk/channels/chan_oss.c U trunk/channels/chan_phone.c U trunk/channels/chan_sip.c U trunk/channels/chan_skinny.c U trunk/channels/chan_vofr.c U trunk/channels/chan_vpb.c U trunk/channels/chan_zap.c U trunk/include/asterisk/agi.h U trunk/include/asterisk/lock.h U trunk/include/asterisk/utils.h U trunk/manager.c U trunk/pbx/pbx_gtkconsole.c U trunk/pbx/pbx_spool.c U trunk/pbx/pbx_wilcalu.c U trunk/pbx.c U trunk/res/res_agi.c U trunk/res/res_features.c U trunk/res/res_musiconhold.c U trunk/utils.c ------------------------------------------------------------------------ r3596 | markster | 2008-01-15 15:04:57 -0600 (Tue, 15 Jan 2008) | 2 lines Merge BSD stack size work (bug ASTERISK-2039) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=3596 |