Summary: ASTERISK-15782: [patch] [regression] Segfault when hanging up phone after launching app_confbridge on Solaris 10 x86
Reporter: Robert McGilvray (rmcgilvr)
Labels:
Date Opened: 2010-03-09 14:42:18.000-0600
Date Closed: 2010-06-09 15:51:11
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Applications/app_confbridge
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 20100526__issue17000.diff.txt
( 1) backtrace.txt
( 2) gdb.txt
( 3) mmlog.txt
Description: Dead air when dialing into an app_confbridge extension. The CLI output reports playing the "conf-placeintoconf" file, but you don't hear the audio. Upon hanging up the phone, Asterisk segfaults *every time*.

*CLI>     -- Executing [3505@default:1] Goto("SIP/us.sip.globeop.com-00000000", "gocfb-main,s,1") in new stack
   -- Goto (gocfb-main,s,1)
   -- Executing [s@gocfb-main:1] Answer("SIP/us.sip.globeop.com-00000000", "") in new stack
   -- Executing [s@gocfb-main:2] NoOp("SIP/us.sip.globeop.com-00000000", ""Call was answered\n"") in new stack
   -- Executing [s@gocfb-main:3] ConfBridge("SIP/us.sip.globeop.com-00000000", "1000,aAcM,614084") in new stack
   -- <Bridge/8555ed0-input> Playing 'conf-placeintoconf.slin16' (language 'en')
Segmentation Fault (core dumped)


Dialplan is basic.

context default {

       3505 => {
               goto gocfb-main|s|1;
       }
}

context gocfb-main {

       s => {

               Answer();
               NoOp("Call was answered\n");
               ConfBridge(1000,aAcM,614084);
       }
}



****** STEPS TO REPRODUCE ******

Install 1.6.2.X on Solaris 10 x86.
Create a simple dialplan that launches ConfBridge.
Call into bridge extension.
Hang up the phone.
Segfault.


****** ADDITIONAL INFORMATION ******

Asterisk 1.6.2.4 - I tried the latest SVN (Revision: 251408) but that appears to have a chan_sip bug. CLI output:
*CLI> [Mar  9 19:53:31] WARNING[10486]: chan_sip.c:20343 handle_request_invite: Don't know how to handle INVITE in state 15
[Mar  9 19:53:38] WARNING[10486]: channel.c:1066 __ast_queue_frame: Unable to write to alert pipe on SIP/172.30.30.18-00000000 (qlen = 0): Bad file number!

Asterisk launched with: /d1/asterisk/asterisk-1.6.2.4/sbin/asterisk -U root -G root -I -C /d1/asterisk/asterisk-1.6.2.4/etc/asterisk.conf -vvvvvvvg -c -dfgn

GCC info:
Configured with: /builds/sfw10-gate/usr/src/cmd/gcc/gcc-3.4.3/configure --prefix=/usr/sfw --with-as=/usr/sfw/bin/gas --with-gnu-as --with-ld=/usr/ccs/bin/ld --without-gnu-ld --enable-languages=c,c++ --enable-shared
Thread model: posix
gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath)

Phone used to initiate call is registered to Kamailio, not directly to Asterisk. SIP trace looks normal before the segfault.
Comments:

By: Robert McGilvray (rmcgilvr) 2010-03-09 15:22:28.000-0600

asterisk-1.6.2.5 has the same app_confbridge segfault.

asterisk-1.6.2.6-rc2 suffers from the same SIP INVITE errors that I experienced with SVN revision 251408.

*CLI> [Mar  9 21:12:56] WARNING[7433]: chan_sip.c:20343 handle_request_invite: Don't know how to handle INVITE in state 15
*CLI> [Mar  9 21:13:05] WARNING[7433]: channel.c:1066 __ast_queue_frame: Unable to write to alert pipe on SIP/us.sip.globeop.com-00000000 (qlen = 0): Bad file number!

Since I can't get past this error to get a call into app_confbridge I am unable to test anything newer than 1.6.2.5.

By: Tilghman Lesher (tilghman) 2010-04-27 15:18:35

Please check out http://svn.digium.com/svn/asterisk/team/tilghman/malloc_hold/1.6.2 , enable Compiler Options --> MALLOC_DEBUG, compile, and install.  After reproducing the situation that causes the crash in 1.6.2, please upload the mmlog file in your Asterisk logs directory.  If this also crashes, please upload the backtrace.

By: Robert McGilvray (rmcgilvr) 2010-04-27 15:55:08

Getting a compiler error on the above:

<snip>

  [CC] alaw.c -> alaw.o
  [CC] app.c -> app.o
  [CC] ast_expr2.c -> ast_expr2.o
  [CC] ast_expr2f.c -> ast_expr2f.o
  [CC] asterisk.c -> asterisk.o
  [CC] astfd.c -> astfd.o
  [CC] astmm.c -> astmm.o
In file included from astmm.c:38:
/usr/include/malloc.h:46: error: syntax error before string constant
/usr/include/malloc.h:47: error: syntax error before string constant
/usr/include/malloc.h:48: error: syntax error before string constant
/usr/include/malloc.h:51: error: syntax error before string constant
gmake[1]: *** [astmm.o] Error 1
gmake: *** [main] Error 2

The lines in malloc.h it's complaining about contain:
void *malloc(size_t);
void free(void *);
void *realloc(void *, size_t);
and
void *calloc(size_t, size_t);

By: Tilghman Lesher (tilghman) 2010-04-28 02:43:12

Update your checkout and try again.  This has been fixed in the latest branch.

By: Robert McGilvray (rmcgilvr) 2010-04-28 07:19:41

Same crash, backtrace and mmlog uploaded. The mmlog is empty though. It was compiled with MALLOC_DEBUG as you requested and when Asterisk starts I see it starts the debugger with the mmlog file.

By: Robert McGilvray (rmcgilvr) 2010-05-20 15:33:05

Any update on this issue? Thanks!

By: Tilghman Lesher (tilghman) 2010-05-24 13:07:48

You'll need to provide more information on how to reproduce this issue:

*CLI>     -- Executing [8175@digium:1] Answer("SIP/gadolinium-00000000", "") in new stack
   -- Executing [8175@digium:2] ConfBridge("SIP/gadolinium-00000000", "1234,aAcM") in new stack
   -- <Bridge/97195c0-input> Playing 'conf-placeintoconf.slin16' (language 'en')
[May 14 17:16:20] WARNING[7264]: cdr.c:891 ast_cdr_end: CDR on channel 'Bridge/97195c0-output' has no answer time but is 'ANSWERED'
[May 14 17:16:20] WARNING[7264]: cdr.c:891 ast_cdr_end: CDR on channel 'Bridge/97195c0-input' has no answer time but is 'ANSWERED'

*CLI> !uname -a
SunOS dot100.jeffandtilghman.foo 5.10 Generic_141445-09 i86pc i386 i86pc
*CLI>

By: Robert McGilvray (rmcgilvr) 2010-05-26 13:08:17

Interesting. Can you provide some guidance on the types of things you're looking for? The scenario under which it fails for me is as basic as you can get. I have a default installation with the dialplan listed above, and regardless of what device I use to dial in, it always segfaults. Are there any specific debugs, compiler options or library versions that may be helpful?

I just discovered that if I dial in with another device during the dead-air time, the conference DOES work: the audio path is established across both devices. Hanging up one phone doesn't cause a crash, but hanging up the second one does. I never hear the joined-into-conf file though.

By: Tilghman Lesher (tilghman) 2010-05-26 13:33:48

To be fair, I'm using the most recent branch of 1.6.2.  It's possible you're running into a problem that has already been fixed, so I'd encourage you to update to the 1.6.2 branch from SVN and try to reproduce your problem.

By: Robert McGilvray (rmcgilvr) 2010-05-26 13:37:46

I just did that today hoping it was. No luck.

I'll try to reproduce what you have on your system that's working. Can you give me the gcc version and anything else relevant to the build?

By: Tilghman Lesher (tilghman) 2010-05-26 14:18:14

I just remembered something after I looked back at your backtrace, and I believe the crash you're seeing is related to how Solaris processes signals.  So I changed Asterisk from using signal(2) to using sigaction(2).  This may fix the original crash that you were having.
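
For readers unfamiliar with the difference, here is a minimal sketch of the signal(2) to sigaction(2) change described above. It is not the attached patch: the names install_persistent_handler, count_signal and signal_count are hypothetical, and SIGUSR1 is used purely for illustration. The idea is that a handler installed once with sigaction() (without SA_RESETHAND) stays installed across deliveries, so the handler never needs to call signal() on itself to re-arm.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose sigaction() under strict -std= modes */
#endif

#include <signal.h>
#include <string.h>

/* Hypothetical counter, bumped once per delivered signal. */
static volatile sig_atomic_t signal_count = 0;

/* Hypothetical handler. Note that it does NOT call signal() to
 * reinstall itself; the disposition set by sigaction() persists. */
static void count_signal(int signum)
{
    (void) signum;
    signal_count++;
}

/* Install `handler` for `signum` so that it survives repeated
 * deliveries. Returns 0 on success, -1 on failure (as sigaction does). */
static int install_persistent_handler(int signum, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);   /* block no additional signals in the handler */
    sa.sa_flags = SA_RESTART;   /* restart interrupted syscalls; no SA_RESETHAND,
                                   so the handler is kept after each delivery */

    return sigaction(signum, &sa, NULL);
}
```

With this in place, calling install_persistent_handler(SIGUSR1, count_signal) once and then raising SIGUSR1 twice should run count_signal both times. While the handler runs, the signal being handled is also blocked by default (SA_NODEFER is not set), which avoids the recursive re-entry described in this comment.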

By: Robert McGilvray (rmcgilvr) 2010-05-26 14:54:59

What branch did you make the change in? I tried malloc_hold from above and it crashes at the same point. I'll upload a new backtrace but I want to make sure I'm using the right code.

By: Robert McGilvray (rmcgilvr) 2010-05-26 14:57:34

Oops, I didn't see the patch. I'll give it a shot.

By: Robert McGilvray (rmcgilvr) 2010-05-26 15:38:33

Your patch fixed the crash. There is still a problem with the audio path from ConfBridge though. To make sure it wasn't an IP problem I switched my dialplan from ConfBridge(1000,aAcM) to Playback(conf-placeintoconf); and I heard the file fine using Playback.

Watching snoop on the interface, I don't see normal RTP leaving the box. During Playback I see significant amounts of 180-byte packets, but with ConfBridge I see just a few with a 52, so it doesn't look like Asterisk is even sending the audio. Joining a second phone into the conference starts the RTP stream.

By: Digium Subversion (svnbot) 2010-05-26 16:11:44

Repository: asterisk
Revision: 266142

U   branches/1.4/main/asterisk.c
U   branches/1.4/main/logger.c

------------------------------------------------------------------------
r266142 | tilghman | 2010-05-26 16:11:43 -0500 (Wed, 26 May 2010) | 14 lines

Use sigaction for signals which should persist past the initial trigger, not signal.

If you call signal() in a Solaris signal handler, instead of just resetting
the signal handler, it causes the signal to refire, because the signal is not
marked as handled prior to the signal handler being called.  This effectively
causes Solaris to immediately exceed the threadstack in recursive signal
handlers and crash.

(closes issue ASTERISK-15782)
Reported by: rmcgilvr
Patches:
      20100526__issue17000.diff.txt uploaded by tilghman (license 14)
Tested by: rmcgilvr

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=266142

By: Digium Subversion (svnbot) 2010-05-26 16:17:46

Repository: asterisk
Revision: 266146

_U  trunk/
U   trunk/main/asterisk.c
U   trunk/main/logger.c
U   trunk/utils/extconf.c

------------------------------------------------------------------------
r266146 | tilghman | 2010-05-26 16:17:46 -0500 (Wed, 26 May 2010) | 21 lines

Merged revisions 266142 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r266142 | tilghman | 2010-05-26 16:11:44 -0500 (Wed, 26 May 2010) | 14 lines
 
 Use sigaction for signals which should persist past the initial trigger, not signal.
 
 If you call signal() in a Solaris signal handler, instead of just resetting
 the signal handler, it causes the signal to refire, because the signal is not
 marked as handled prior to the signal handler being called.  This effectively
 causes Solaris to immediately exceed the threadstack in recursive signal
 handlers and crash.
 
 (closes issue ASTERISK-15782)
  Reported by: rmcgilvr
  Patches:
        20100526__issue17000.diff.txt uploaded by tilghman (license 14)
  Tested by: rmcgilvr
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=266146

By: Digium Subversion (svnbot) 2010-05-26 16:19:49

Repository: asterisk
Revision: 266154

_U  branches/1.6.2/
U   branches/1.6.2/main/asterisk.c
U   branches/1.6.2/main/logger.c
U   branches/1.6.2/utils/extconf.c

------------------------------------------------------------------------
r266154 | tilghman | 2010-05-26 16:19:49 -0500 (Wed, 26 May 2010) | 28 lines

Merged revisions 266146 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r266146 | tilghman | 2010-05-26 16:17:46 -0500 (Wed, 26 May 2010) | 21 lines
 
 Merged revisions 266142 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r266142 | tilghman | 2010-05-26 16:11:44 -0500 (Wed, 26 May 2010) | 14 lines
   
   Use sigaction for signals which should persist past the initial trigger, not signal.
   
   If you call signal() in a Solaris signal handler, instead of just resetting
   the signal handler, it causes the signal to refire, because the signal is not
   marked as handled prior to the signal handler being called.  This effectively
   causes Solaris to immediately exceed the threadstack in recursive signal
   handlers and crash.
   
   (closes issue ASTERISK-15782)
    Reported by: rmcgilvr
    Patches:
          20100526__issue17000.diff.txt uploaded by tilghman (license 14)
    Tested by: rmcgilvr
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=266154

By: Digium Subversion (svnbot) 2010-05-26 16:20:19

Repository: menuselect
Revision: 766

U   trunk/menuselect.c
U   trunk/menuselect_curses.c

------------------------------------------------------------------------
r766 | tilghman | 2010-05-26 16:20:18 -0500 (Wed, 26 May 2010) | 28 lines

Merged revisions 266146 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r266146 | tilghman | 2010-05-26 16:17:46 -0500 (Wed, 26 May 2010) | 21 lines
 
 Merged revisions 266142 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r266142 | tilghman | 2010-05-26 16:11:44 -0500 (Wed, 26 May 2010) | 14 lines
   
   Use sigaction for signals which should persist past the initial trigger, not signal.
   
   If you call signal() in a Solaris signal handler, instead of just resetting
   the signal handler, it causes the signal to refire, because the signal is not
   marked as handled prior to the signal handler being called.  This effectively
   causes Solaris to immediately exceed the threadstack in recursive signal
   handlers and crash.
   
   (closes issue ASTERISK-15782)
    Reported by: rmcgilvr
    Patches:
          20100526__issue17000.diff.txt uploaded by tilghman (license 14)
    Tested by: rmcgilvr
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/menuselect?view=rev&revision=766