Summary: | ASTERISK-15782: [patch] [regression] Segfault when hanging up phone after launching app_confbridge on Solaris 10 x86 | ||
Reporter: | Robert McGilvray (rmcgilvr) | Labels: | |
Date Opened: | 2010-03-09 14:42:18.000-0600 | Date Closed: | 2010-06-09 15:51:11 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Applications/app_confbridge |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 20100526__issue17000.diff.txt ( 1) backtrace.txt ( 2) gdb.txt ( 3) mmlog.txt | |
Description: | Dead air when dialing into an app_confbridge extension. CLI output reports playing the "conf-placeintoconf" file but you don't hear the audio. Upon hanging up the phone Asterisk segfaults *every time*.
*CLI>
-- Executing [3505@default:1] Goto("SIP/us.sip.globeop.com-00000000", "gocfb-main,s,1") in new stack
-- Goto (gocfb-main,s,1)
-- Executing [s@gocfb-main:1] Answer("SIP/us.sip.globeop.com-00000000", "") in new stack
-- Executing [s@gocfb-main:2] NoOp("SIP/us.sip.globeop.com-00000000", ""Call was answered\n"") in new stack
-- Executing [s@gocfb-main:3] ConfBridge("SIP/us.sip.globeop.com-00000000", "1000,aAcM,614084") in new stack
-- <Bridge/8555ed0-input> Playing 'conf-placeintoconf.slin16' (language 'en')
Segmentation Fault (core dumped)
Dialplan is basic.
context default {
    3505 => { goto gocfb-main|s|1; }
}
context gocfb-main {
    s => {
        Answer();
        NoOp("Call was answered\n");
        ConfBridge(1000,aAcM,614084);
    }
}
****** STEPS TO REPRODUCE ******
Install 1.6.2.X on Solaris 10 x86.
Create a simple dialplan that launches ConfBridge.
Call into bridge extension.
Hang up phone.
Segfault.
****** ADDITIONAL INFORMATION ******
Asterisk 1.6.2.4 - I tried the latest SVN (Revision: 251408) but that appears to have a chan_sip bug. CLI output:
*CLI>
[Mar 9 19:53:31] WARNING[10486]: chan_sip.c:20343 handle_request_invite: Don't know how to handle INVITE in state 15
[Mar 9 19:53:38] WARNING[10486]: channel.c:1066 __ast_queue_frame: Unable to write to alert pipe on SIP/172.30.30.18-00000000 (qlen = 0): Bad file number!
Asterisk launched with: /d1/asterisk/asterisk-1.6.2.4/sbin/asterisk -U root -G root -I -C /d1/asterisk/asterisk-1.6.2.4/etc/asterisk.conf -vvvvvvvg -c -dfgn
GCC info:
Configured with: /builds/sfw10-gate/usr/src/cmd/gcc/gcc-3.4.3/configure --prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as --with-ld=/usr/ccs/bin/ld --without-gnu-ld --enable-languages=c,c++ --enable-shared
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
Phone used to initiate call is registered to Kamailio, not directly to Asterisk. SIP trace looks normal before the segfault. | ||
Comments: | By: Robert McGilvray (rmcgilvr) 2010-03-09 15:22:28.000-0600
asterisk-1.6.2.5 has the same app_confbridge segfault. asterisk-1.6.2.6-rc2 suffers from the same SIP INVITE errors that I experienced with svn revision 251408.
*CLI> [Mar 9 21:12:56] WARNING[7433]: chan_sip.c:20343 handle_request_invite: Don't know how to handle INVITE in state 15
*CLI> [Mar 9 21:13:05] WARNING[7433]: channel.c:1066 __ast_queue_frame: Unable to write to alert pipe on SIP/us.sip.globeop.com-00000000 (qlen = 0): Bad file number!
Since I can't get past this error to get a call into app_confbridge I am unable to test anything newer than 1.6.2.5.
By: Tilghman Lesher (tilghman) 2010-04-27 15:18:35
Please check out http://svn.digium.com/svn/asterisk/team/tilghman/malloc_hold/1.6.2 , enable Compiler Options --> MALLOC_DEBUG, compile, and install. After reproducing the situation that causes the crash in 1.6.2, please upload the mmlog file in your Asterisk logs directory. If this also crashed, please upload the backtrace.
By: Robert McGilvray (rmcgilvr) 2010-04-27 15:55:08
Getting a compiler error on the above
<snip>
[CC] alaw.c -> alaw.o
[CC] app.c -> app.o
[CC] ast_expr2.c -> ast_expr2.o
[CC] ast_expr2f.c -> ast_expr2f.o
[CC] asterisk.c -> asterisk.o
[CC] astfd.c -> astfd.o
[CC] astmm.c -> astmm.o
In file included from astmm.c:38:
/usr/include/malloc.h:46: error: syntax error before string constant
/usr/include/malloc.h:47: error: syntax error before string constant
/usr/include/malloc.h:48: error: syntax error before string constant
/usr/include/malloc.h:51: error: syntax error before string constant
gmake[1]: *** [astmm.o] Error 1
gmake: *** [main] Error 2
The lines in malloc.h it's complaining about contain:
void *malloc(size_t);
void free(void *);
void *realloc(void *, size_t);
and
void *calloc(size_t, size_t);
By: Tilghman Lesher (tilghman) 2010-04-28 02:43:12
Update your checkout and try again. This has been fixed in the latest branch.
By: Robert McGilvray (rmcgilvr) 2010-04-28 07:19:41
Same crash, backtrace and mmlog uploaded. The mmlog is empty though. It was compiled with MALLOC_DEBUG as you requested and when Asterisk starts I see it starts the debugger with the mmlog file.
By: Robert McGilvray (rmcgilvr) 2010-05-20 15:33:05
Any update on this issue? Thanks!
By: Tilghman Lesher (tilghman) 2010-05-24 13:07:48
You'll need to provide more information on how to reproduce this issue:
*CLI>
-- Executing [8175@digium:1] Answer("SIP/gadolinium-00000000", "") in new stack
-- Executing [8175@digium:2] ConfBridge("SIP/gadolinium-00000000", "1234,aAcM") in new stack
-- <Bridge/97195c0-input> Playing 'conf-placeintoconf.slin16' (language 'en')
[May 14 17:16:20] WARNING[7264]: cdr.c:891 ast_cdr_end: CDR on channel 'Bridge/97195c0-output' has no answer time but is 'ANSWERED'
[May 14 17:16:20] WARNING[7264]: cdr.c:891 ast_cdr_end: CDR on channel 'Bridge/97195c0-input' has no answer time but is 'ANSWERED'
*CLI> !uname -a
SunOS dot100.jeffandtilghman.foo 5.10 Generic_141445-09 i86pc i386 i86pc
*CLI>
By: Robert McGilvray (rmcgilvr) 2010-05-26 13:08:17
Interesting. Can you provide some guidance on the types of things you're looking for? The scenario under which it fails for me is as basic as you can get. I have a default installation with the dialplan listed above and regardless of what device I use to dial in it always segfaults. Are there any specific debugs, compiler options or library versions that may be helpful? I just discovered that during the dead air time if I dial in with another device the conference DOES work; the audio path is established across both devices. Hanging up one phone doesn't cause a crash but the second one does. I never hear the joined-into-conf file though.
By: Tilghman Lesher (tilghman) 2010-05-26 13:33:48
To be fair, I'm using the most recent branch of 1.6.2.
It's possible you're running into a problem that has already been fixed, so I'd encourage you to update to the 1.6.2 branch from SVN and try to reproduce your problem.
By: Robert McGilvray (rmcgilvr) 2010-05-26 13:37:46
I just did that today hoping it was. No luck. I'll try to reproduce what you have on your system that's working. Can you give me the gcc version and anything else relevant to the build?
By: Tilghman Lesher (tilghman) 2010-05-26 14:18:14
I just remembered something after I looked back at your backtrace, and I believe the crash you're seeing is related to how Solaris processes signals. So I changed Asterisk from using signal(2) to using sigaction(2). This may fix the original crash that you were having.
By: Robert McGilvray (rmcgilvr) 2010-05-26 14:54:59
What branch did you make the change in? I tried malloc_hold from above and it crashes at the same point. I'll upload a new backtrace but I want to make sure I'm using the right code.
By: Robert McGilvray (rmcgilvr) 2010-05-26 14:57:34
Oops, didn't see the patch. I'll give it a shot.
By: Robert McGilvray (rmcgilvr) 2010-05-26 15:38:33
Your patch fixed the crash. There is still a problem with the audio path from ConfBridge though. To make sure it wasn't an IP problem I switched my dialplan from ConfBridge(1000,aAcM) to Playback(conf-placeintoconf); and I heard the file fine using Playback. Watching snoop on the interface I don't see normal RTP leaving the box. During Playback I see significant amounts of 180 byte packets but with ConfBridge I see just a few with a 52 so it doesn't look like Asterisk is even sending the audio. Joining a second phone into the conference starts the RTP stream.
By: Digium Subversion (svnbot) 2010-05-26 16:11:44
Repository: asterisk
Revision: 266142
U branches/1.4/main/asterisk.c
U branches/1.4/main/logger.c
------------------------------------------------------------------------
r266142 | tilghman | 2010-05-26 16:11:43 -0500 (Wed, 26 May 2010) | 14 lines
Use sigaction for signals which should persist past the initial trigger, not signal.
If you call signal() in a Solaris signal handler, instead of just resetting the signal handler, it causes the signal to refire, because the signal is not marked as handled prior to the signal handler being called. This effectively causes Solaris to immediately exceed the threadstack in recursive signal handlers and crash.
(closes issue ASTERISK-15782)
Reported by: rmcgilvr
Patches: 20100526__issue17000.diff.txt uploaded by tilghman (license 14)
Tested by: rmcgilvr
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=266142
By: Digium Subversion (svnbot) 2010-05-26 16:17:46
Repository: asterisk
Revision: 266146
_U trunk/
U trunk/main/asterisk.c
U trunk/main/logger.c
U trunk/utils/extconf.c
------------------------------------------------------------------------
r266146 | tilghman | 2010-05-26 16:17:46 -0500 (Wed, 26 May 2010) | 21 lines
Merged revisions 266142 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r266142 | tilghman | 2010-05-26 16:11:44 -0500 (Wed, 26 May 2010) | 14 lines
Use sigaction for signals which should persist past the initial trigger, not signal.
If you call signal() in a Solaris signal handler, instead of just resetting the signal handler, it causes the signal to refire, because the signal is not marked as handled prior to the signal handler being called. This effectively causes Solaris to immediately exceed the threadstack in recursive signal handlers and crash.
(closes issue ASTERISK-15782)
Reported by: rmcgilvr
Patches: 20100526__issue17000.diff.txt uploaded by tilghman (license 14)
Tested by: rmcgilvr
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=266146
By: Digium Subversion (svnbot) 2010-05-26 16:19:49
Repository: asterisk
Revision: 266154
_U branches/1.6.2/
U branches/1.6.2/main/asterisk.c
U branches/1.6.2/main/logger.c
U branches/1.6.2/utils/extconf.c
------------------------------------------------------------------------
r266154 | tilghman | 2010-05-26 16:19:49 -0500 (Wed, 26 May 2010) | 28 lines
Merged revisions 266146 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk
................
r266146 | tilghman | 2010-05-26 16:17:46 -0500 (Wed, 26 May 2010) | 21 lines
Merged revisions 266142 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r266142 | tilghman | 2010-05-26 16:11:44 -0500 (Wed, 26 May 2010) | 14 lines
Use sigaction for signals which should persist past the initial trigger, not signal.
If you call signal() in a Solaris signal handler, instead of just resetting the signal handler, it causes the signal to refire, because the signal is not marked as handled prior to the signal handler being called. This effectively causes Solaris to immediately exceed the threadstack in recursive signal handlers and crash.
(closes issue ASTERISK-15782)
Reported by: rmcgilvr
Patches: 20100526__issue17000.diff.txt uploaded by tilghman (license 14)
Tested by: rmcgilvr
........
................
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=266154
By: Digium Subversion (svnbot) 2010-05-26 16:20:19
Repository: menuselect
Revision: 766
U trunk/menuselect.c
U trunk/menuselect_curses.c
------------------------------------------------------------------------
r766 | tilghman | 2010-05-26 16:20:18 -0500 (Wed, 26 May 2010) | 28 lines
Merged revisions 266146 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk
................
r266146 | tilghman | 2010-05-26 16:17:46 -0500 (Wed, 26 May 2010) | 21 lines
Merged revisions 266142 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r266142 | tilghman | 2010-05-26 16:11:44 -0500 (Wed, 26 May 2010) | 14 lines
Use sigaction for signals which should persist past the initial trigger, not signal.
If you call signal() in a Solaris signal handler, instead of just resetting the signal handler, it causes the signal to refire, because the signal is not marked as handled prior to the signal handler being called. This effectively causes Solaris to immediately exceed the threadstack in recursive signal handlers and crash.
(closes issue ASTERISK-15782)
Reported by: rmcgilvr
Patches: 20100526__issue17000.diff.txt uploaded by tilghman (license 14)
Tested by: rmcgilvr
........
................
------------------------------------------------------------------------
http://svn.digium.com/view/menuselect?view=rev&revision=766 |
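The approach named in the r266142 commit message (install persistent handlers with sigaction(2) rather than re-arming them with signal() from inside the handler) can be sketched in C as follows. This is only an illustration under stated assumptions, not the code from 20100526__issue17000.diff.txt: the names on_signal, install_persistent_handler, and demo_deliveries are hypothetical, and SIGUSR1 is used purely for demonstration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <string.h>

/* Delivery counter; sig_atomic_t is the type that is safe to write from a handler. */
static volatile sig_atomic_t hits = 0;

/* Hypothetical handler. It deliberately does NOT call signal() to re-arm
 * itself, which is the pattern the commit message says misbehaves on Solaris. */
void on_signal(int signo)
{
    (void) signo;
    hits++;
}

/* Install a handler whose disposition persists after it fires.
 * Returns 0 on success, -1 on failure (as sigaction(2) does). */
int install_persistent_handler(int signo, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    /* SA_RESETHAND is left unset, so the handler stays installed;
     * SA_RESTART restarts interrupted system calls where supported. */
    sa.sa_flags = SA_RESTART;

    return sigaction(signo, &sa, NULL);
}

/* Install the handler once, deliver the signal twice, and report how
 * many times the handler ran. */
int demo_deliveries(void)
{
    int rc;

    hits = 0;
    rc = install_persistent_handler(SIGUSR1, on_signal);
    assert(rc == 0);
    (void) rc;

    raise(SIGUSR1);
    raise(SIGUSR1);

    return (int) hits;
}
```

On a POSIX system, demo_deliveries() should return 2: both raise() calls reach the same handler even though the handler never re-registers itself.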