Summary:ASTERISK-12691: Asterisk crash ast_do_masquerade (segfault at 00000000000000d8 rip 0000003ad54082f9 rsp 0000000040523fb0 error)
Reporter: Geoff Mina (geoff2010)
Labels:
Date Opened: 2008-09-06 16:20:30
Date Closed: 2008-09-08 15:53:13
Versions:
Frequency of Occurrence:
Environment:
Attachments: (0) backtrace.txt
Description: I currently have 4 Asterisk systems, all of which are experiencing this issue. I was running and have recently upgraded to the and the bug still exists.

I am currently using FastAGI and AMI to control the calls on our system. This bug seems to be caused by an AMI Redirect to a new context.

I see the following as the last entry in my Asterisk log:

[Sep  6 16:23:27] WARNING[4122] channel.c: SIP/bw-13d384c0 is already going to masquerade as Local/OFFHOOK@acd_dial_offhook-bd26,1

What I believe leads to the crash is the following:

1 - Issue a Manager Originate to a Local channel into an "outdial" context.

2 - The "outdial" context dials via SIP through our gateway provider.

3 - Upon ANSWER, a FastAGI script is executed that notifies a higher-level application on another server, via TCP, that the call was answered. The higher-level application sends a message back to the Java server (on another thread), which issues an AMI Redirect to a new context to play "on hold" audio to the freshly generated outbound call.


Java Thread1 - The FastAGI script sends a TCP message to the server to notify it of the answer.

Java Thread2 - Receives the TCP response from the server telling it to play a specific "hold" audio file to the newly answered call. This issues an AMI Redirect to Asterisk, which causes the FastAGI script in Thread1 to terminate; the call is then moved to the new "hold" context.

At some point in the redirect process, the system crashes with a segfault. This happens roughly every 2 hours when the system is doing a good bit of dialing.



Program terminated with signal 11, Segmentation fault.
#0  0x0000003ad54082f9 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x0000003ad54082f9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x000000000043bed9 in ast_mutex_lock (pmutex=0xc8) at /usr/src/asterisk-
#2  0x000000000044657f in ast_do_masquerade (original=0x13d15b10) at channel.c:3388
#3  0x000000000048af9b in ast_async_goto (chan=0x13d22a20, context=0x40526709 "acd_call_hold", exten=0x40526737 "HOLD", priority=1) at pbx.c:4631
#4  0x000000000047990f in action_redirect (s=0x13d1eb20, m=0x40526c00) at manager.c:1665
#5  0x000000000047bc32 in process_message (s=0x13d1eb20, m=0x40526c00) at manager.c:2205
#6  0x000000000047c14b in do_message (s=0x13d1eb20) at manager.c:2301
#7  0x000000000047c27f in session_do (data=0x13d1eb20) at manager.c:2317
#8  0x00000000004c61ca in dummy_start (data=0x13ceb060) at utils.c:895
#9  0x0000003ad54062f7 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003ad48ce85d in clone () from /lib64/libc.so.6
Comments:
By: Geoff Mina (geoff2010) 2008-09-07 06:18:45

After looking through some application logs and analyzing the timing of events, I believe this is a race condition, although I am not sure where. To clarify, the sequence is:

1 - Originate to Local/dial@outbound
2 - [outbound] does a Dial(SIP/)
3 - [outbound] upon answer does a FastAGI()
4 - FastAGI script sends message to higher level system
5 - higher level system sends message back to Java
6 - A Manager Redirect is issued to move the new SIP session away from the first FastAGI script to another FastAGI script, which plays some specific hold music.

There is less than 10 ms between steps 4 and 6. When the first AGI script is called, the channel ID supplied is still the Local/ channel, so I suspect the rename hasn't completed yet. The Manager Redirect is issued on the SIP/ channel, not the Local/ channel.

It appears that the masquerade is most likely already in progress when I issue the redirect, and the redirect triggers a second masquerade, which causes the crash.

I may be totally off base, as I am certainly no Asterisk expert; this is just my guess.


By: Digium Subversion (svnbot) 2008-09-08 15:53:10

Repository: asterisk
Revision: 141806

U   branches/1.4/main/pbx.c

r141806 | russell | 2008-09-08 15:53:10 -0500 (Mon, 08 Sep 2008) | 7 lines

When doing an async goto, detect if the channel is already in the middle of a
masquerade.  This can happen when chan_local is trying to optimize itself out.
If this happens, fail the async goto instead of bursting into flames.

(closes issue ASTERISK-12691)
Reported by: geoff2010