ASTERISK-23248: Asterisk 1.8.23.0 Crashes Signal 11/Segfault

Summary: ASTERISK-23248: Asterisk 1.8.23.0 Crashes Signal 11/Segfault

Reporter: Jamuel Starkey (jamuel)
Labels:
Date Opened: 2014-02-03 14:21:10.000-0600
Date Closed: 2014-02-12 22:48:16.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components:
Versions: 1.8.23.0
Frequency of Occurrence: One Time
Related Issues:
Environment: Linux 32-bit Cent OS 5.9 VM on XenServer 5.6 SP2
Attachments:
( 0) ASTERISK-23248_backtrace.txt
( 1) cdr_adaptive_odbc.conf
( 2) cdr.conf
( 3) sip_config.txt

Description: Saw this crash. Have backtrace with thread debugging enabled. Guessing that it occurred during a hangup or CDR manipulation as that's all I see in the backtrace.

Comments:

By: Jamuel Starkey (jamuel) 2014-02-03 14:23:06.130-0600

Backtrace including thread debugging.
By: Jamuel Starkey (jamuel) 2014-02-03 14:23:23.038-0600

Backtrace attached.
By: Matt Jordan (mjordan) 2014-02-03 14:57:27.047-0600

That's exceedingly odd. Can you attach the relevant portion of your dialplan - in particular, the hangupcall macro - as well as whatever calls it?
By: Jamuel Starkey (jamuel) 2014-02-03 15:39:20.855-0600

Here is the log just before the crash:
{code}
[Feb 3 11:42:32] VERBOSE[1632] res_musiconhold.c: -- Stopped music on hold on SIP/XXXXXXXX-0000347b
[Feb 3 11:42:32] VERBOSE[1632] features.c: == SIP/XXXXXXXX-0000347b got tired of being parked
[Feb 3 11:42:32] VERBOSE[31045] pbx.c: == Starting SIP/XXXXXXXX-0000347e at from-internal,*664,1 failed so falling back to exten 's'
[Feb 3 11:42:32] VERBOSE[31045] pbx.c: -- Executing [s@from-internal:1] Macro("SIP/XXXXXXXX-0000347e", "hangupcall") in new stack
[Feb 3 11:42:32] VERBOSE[31045] pbx.c: -- Executing [s@macro-hangupcall:1] Set("SIP/XXXXXXXX-0000347e", "CDR(userfield)=") in new stack
[Feb 3 11:42:32] VERBOSE[31045] pbx.c: -- Executing [s@macro-hangupcall:2] ResetCDR("SIP/XXXXXXXX-0000347e", "w") in new stack
{code}

And from-internal s extension:
{code}
exten => s,1,Macro(hangupcall)
{code}

And here's the hangupcall macro:
{code}
[macro-hangupcall]
exten => s,1,Set(CDR(userfield)=${FROM_DID})
exten => s,n,ResetCDR(w)
exten => s,n,NoCDR()
{code}
By: Jamuel Starkey (jamuel) 2014-02-03 15:40:17.473-0600

Comment added with relevant snippets of dialplan and log.
By: Rusty Newton (rnewton) 2014-02-10 09:35:42.044-0600

Jamuel, I can't reproduce this after placing calls a few different ways and running them through the dialplan shown. Are you able to reproduce the issue, or did you only see the crash that one time?
By: Rusty Newton (rnewton) 2014-02-10 09:37:27.935-0600

Can you also post your cdr.conf and a scrubbed sip.conf? Attach them as .txt files to the issue.
By: Jamuel Starkey (jamuel) 2014-02-11 17:50:23.874-0600

sip_config.txt has been culled from the various file-based SIP configuration files present in FreePBX.

We use CDR Adaptive ODBC for CDRs so both it and cdr.conf have been attached as well.
By: Jamuel Starkey (jamuel) 2014-02-11 17:51:43.004-0600

We have only seen this issue occur once. From the Asterisk log, it looks like the AMI process transferred a call to a non-existent portion of the dialplan, which was immediately fixed.
By: Matt Jordan (mjordan) 2014-02-12 22:48:08.992-0600

So, I just noticed this in the backtrace:

{noformat}
Thread 2 (Thread 31029):
#0 0x0060e402 in __kernel_vsyscall ()
#1 0x00306653 in fts_read () from /lib/i686/nosegneg/libc.so.6
#2 0x080ae17b in ast_waitfor_nandfds (c=0x24baf10, n=1, fds=0x0, nfds=0, exception=0x0, outfd=0x0, ms=0x24baf14) at channel.c:3271
#3 0x080ae5bc in ast_waitfor (c=0xfe5e3c8, ms=29948) at channel.c:3536
#4 0x080b4e85 in __ast_request_and_dial (type=0x127a437c "SIP", format=64, requestor=0x0, data=0x127a4382, timeout=30000, outstate=0x24bb2ec,
cid_num=0x127a4392 "4152000401", cid_name=0x127a43a0 "Calling", oh=0x24bb184) at channel.c:5494
#5 0x08157080 in ast_pbx_outgoing_exten (type=0x127a437c "SIP", format=64, data=0x127a4382, timeout=30000, context=0x127a43aa "from-internal",
exten=0x127a43ba "*664", priority=1, reason=0x24bb2ec, synchronous=1, cid_num=0x127a4392 "4152000401", cid_name=0x127a43a0 "Calling", vars=0x1240fc60,
account=0x82386e6 "", channel=0x24bb2e8) at pbx.c:9065
#6 0x0812ddae in fast_originate (data=0xf728b58) at manager.c:3833
#7 0x081a0617 in dummy_start (data=0x11102e78) at utils.c:1075
#8 0x003e7939 in start_thread () from /lib/i686/nosegneg/libpthread.so.0
#9 0x003108ae in __init_misc () from /lib/i686/nosegneg/libc.so.6
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
{noformat}

Note the {{Backtrace stopped: previous frame inner to this frame (corrupt stack?)}}. This is noted on all of the stack frames.

There are a number of ways it could have gotten into this state, but none are trivial to ascertain, even with the core file - which would be needed. It's also highly unlikely that anyone will be able to reproduce this.

See [http://stackoverflow.com/questions/9809810/gdb-corrupted-stack-frame-how-to-debug] for some information on pulling data out of the {{core}} file for this.

Although it's hard to point a finger at anything in particular, I'm suspicious of the Macros you were using here. The fact that you have what appears to be a corrupted stack - and that the memory was smashed - makes me wonder if you didn't run afoul of the Macro stack smashing problem. You certainly have some nested Macros here:

{noformat}
#12 0x0462c405 in _macro_exec (chan=0x12aa0468, data=0x2693eb8 "dial,15,TtrWwI,4154944930", exclusive=0) at app_macro.c:413
#13 0x0462d65e in macro_exec (chan=0x12aa0468, data=0x2693eb8 "dial,15,TtrWwI,4154944930") at app_macro.c:586
#14 0x0813e52f in pbx_exec (c=0x12aa0468, app=0xf8fd8a8, data=0x2693eb8 "dial,15,TtrWwI,4154944930") at pbx.c:1446
#15 0x08147cf5 in pbx_extension_helper (c=0x12aa0468, con=0x0, context=0x12aa07d4 "macro-dial", exten=0x12aa0824 "s", priority=11, label=0x0,
callerid=0xff6b1d8 "+17202047063", action=E_SPAWN, found=0x269627c, combined_find_spawn=1) at pbx.c:4489
#16 0x0814998b in ast_spawn_extension (c=0x12aa0468, context=0x12aa07d4 "macro-dial", exten=0x12aa0824 "s", priority=11, callerid=0xff6b1d8 "+17202047063",
found=0x269627c, combined_find_spawn=1) at pbx.c:5127
#17 0x0462c405 in _macro_exec (chan=0x12aa0468, data=0x2698d28 "exten-vm,4154944930,4154944930", exclusive=0) at app_macro.c:413
#18 0x0462d65e in macro_exec (chan=0x12aa0468, data=0x2698d28 "exten-vm,4154944930,4154944930") at app_macro.c:586
#19 0x0813e52f in pbx_exec (c=0x12aa0468, app=0xf8fd8a8, data=0x2698d28 "exten-vm,4154944930,4154944930") at pbx.c:1446
{noformat}

I'd change your Macros to Gosubs and see if that prevents the problem from cropping back up.
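
As a rough sketch of that conversion, the {{hangupcall}} macro posted earlier could become a Gosub subroutine along these lines. The context name {{sub-hangupcall}} is an assumption made for illustration and does not come from the reporter's configuration. The explicit {{Return()}} is added because a Gosub subroutine, unlike a Macro, does not return to its caller automatically when it runs out of priorities.

{code}
; Caller: replaces "exten => s,1,Macro(hangupcall)" in from-internal
[from-internal]
exten => s,1,Gosub(sub-hangupcall,s,1)

; Hypothetical subroutine context (name is illustrative only)
[sub-hangupcall]
exten => s,1,Set(CDR(userfield)=${FROM_DID})
exten => s,n,ResetCDR(w)
exten => s,n,NoCDR()
exten => s,n,Return()   ; hand control back to the calling priority
{code}

The nested macros visible in the backtrace ({{macro-dial}}, {{macro-exten-vm}}) would need the same treatment to take Macro out of the call path. Arguments can be passed as {{Gosub(context,s,1(arg1,arg2))}} and are still read as {{$\{ARG1\}}}, {{$\{ARG2\}}}, but the {{MACRO_*}} channel variables are not set under Gosub. If these contexts are generated by FreePBX, hand edits may be overwritten.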

Your other option would be to run under valgrind.

Since it is nearly impossible for a bug marshal to reproduce this problem, I'm going to go ahead and close this out as "Can't Reproduce". If you're able to get some more meaningful data - either from valgrind or from the core file - let a bug marshal know and we can reopen this issue.