Summary:ASTERISK-12611: Crash in chanspy mutex
Reporter:Ruddy G (ruddy)Labels:
Date Opened:2008-08-18 15:18:33Date Closed:2008-08-20 17:05:47
Versions:Frequency of
Environment:Attachments:( 0) valgrind.txt
Description:I'm having a crash on chanspy. And it seems to be in the mutex.
It was specific to a groug. None of the channels was in that group.

Here is the dialplan :
exten => 0000,1,ChanSpy(|g(XXXXXXXX_SPY)q)

Here is the backtrace :

(gdb) bt full
#0  0x08081fa0 in ast_mutex_lock (pmutex=0x8213558)
at /asterisk-
No locals.
#1  0x0808584a in ast_waitfor_nandfds (c=0xb6eac1b0, n=1,
fds=0xb6eac1bc, nfds=0, exception=0x0, outfd=0xb6eac1a0, ms=0xb6eac1b4)
at channel.c:1654
start = {tv_sec = 0, tv_usec = 0}
pfds = (struct pollfd *) 0xb6eac090
res = -1226129032
rms = -1228921824
x = 0
y = -1209229324
max = 135670632
sz = 8
now = 0
whentohangup = 0
diff = 135670632
winner = (struct ast_channel *) 0x0
fdmap = (struct fdmap *) 0xb6eac040
__PRETTY_FUNCTION__ = "ast_waitfor_nandfds"
#2  0x08086107 in ast_waitfordigit_full (c=0x82134e0, ms=100,
audiofd=-1, cmdfd=-1) at channel.c:1828
rchan = (struct ast_channel *) 0xb6c02420
outfd = -99999
__PRETTY_FUNCTION__ = "ast_waitfordigit_full"
#3  0x08085f9c in ast_waitfordigit (c=0x82134e0, ms=100) at
No locals.
#4  0xb7a53956 in common_exec (chan=0x82134e0, flags=0xb6ee0e04,
volfactor=0, fd=0, mygroup=0xb6ee0d83 "XXXXXXXX_SPY", spec=0x0,
exten=0x0, context=0x0) at app_chanspy.c:527
peer_chanspy_ds = (struct chanspy_ds *) 0x0
next_chanspy_ds = (struct chanspy_ds *) 0x0
prev = (struct ast_channel *) 0x0
peer = (struct ast_channel *) 0x0
nameprefix = "?\f???t\033\b\000\000\000\0008\f?U\v\b\200\027\037\b\001
peer_name = '\0' <repeats 13 times>, "\213\215??\237??`???L \000\000?
\v????", '\0' <repeats 28 times>, "h\020!\b????\000\000\000\000\220???
\000\000\000\000P \000\000k\000\000\000\bV!\b\002\000\000\000S=???Z\005
\220???Y@???g???&\000\000?\237??`???L \000\000\020V!\b\005??\f???\000
\000\000\000L \000\000\001\000\000\000\bV!\bL \000\000tv???6!\b"...
zero_volume = 0 '\0'
waitms = 100
res = 0
ptr = 0x0
num = -1209224820
num_spyed_upon = 0
chanspy_ds = {chan = 0x0, unique_id = "0\000\000\000\000\020??0g\000
\000\000\000\000\000\004\000\000", lock = {__data = {__lock = 0, __count
= 0, __owner = 0, __kind = 1, __nusers = 0, {
       __spins = 0, __list = {__next = 0x0}}}, __size = '\0' <repeats
12 times>, "\001\000\000\000\000\000\000\000\000\000\000", __align = 0}}
ASTERISK-1  0xb7a54749 in chanspy_exec (chan=0x82134e0, data=0xb6ee0d80) at
u = (struct ast_module_user *) 0x8211d68
options = 0xb6ee0d81 "g(XXXXXXXX_SPY"
spec = 0x0
argv = {0xb6ee0d80 "", 0xb6ee0d81 "g(XXXXXXXX_SPY"}
mygroup = 0xb6ee0d83 "XXXXXXXX_SPY"
recbase = 0x0
fd = 0
flags = {flags = 9}
oldwf = 4
argc = 2
volfactor = 0
---Type <return> to continue, or q <return> to quit---
res = 0
__PRETTY_FUNCTION__ = "chanspy_exec"
ASTERISK-2  0x080c3d14 in pbx_exec (c=0x82134e0, app=0x81b6a18, data=0xb6ee40f8)
at pbx.c:537
res = -16121856
saved_c_appl = 0x0
saved_c_data = 0x0
ASTERISK-3  0x080c72d4 in pbx_extension_helper (c=0x82134e0, con=0x0,
context=0x8213660 "xxxxxx", exten=0x82136b0 "xxxxxxxxspy", priority=1,
label=0x0, callerid=0x82113d0 "140", action=E_SPAWN) at pbx.c:1862
e = (struct ast_exten *) 0x81ffe68
app = (struct ast_app *) 0x81b6a18
res = 0
q = {incstack = {0x81fcfc4 "xxxxxx", 0x81fd1ec "papers", 0x81ff9d4
"qwebec", 0x0 <repeats 125 times>}, stacklen = 3, status = 5, swo = 0x0,
data = 0x0, foundcontext = 0x81fd18d "allopass"}
passdata = "|g(ALLOPASS_SPY)q", '\0' <repeats 8174 times>
matching_action = 0
__PRETTY_FUNCTION__ = "pbx_extension_helper"
ASTERISK-4  0x080c8443 in ast_spawn_extension (c=0x82134e0, context=0x8213660
"xxxxxx", exten=0x82136b0 "xxxxxxxxspy", priority=1, callerid=0x82113d0
"140") at pbx.c:2317
No locals.
ASTERISK-5  0x080c89ef in __ast_pbx_run (c=0x82134e0) at pbx.c:2419
dst_exten = '\0' <repeats 44 times>, "\020", '\0' <repeats 35 times>,
"?8!\b", '\0' <repeats 32 times>, "\220\211?\f\000\000\000???\000?
\f???\000\000\000\000\000\000\000\000\001", '\0' <repeats 11 times>,
pos = 0
digit = 0
found = 1
res = 0
autoloopflag = 0
error = 0
__PRETTY_FUNCTION__ = "__ast_pbx_run"
ASTERISK-6 0x080c99dc in pbx_thread (data=0x82134e0) at pbx.c:2636
c = (struct ast_channel *) 0x82134e0
ASTERISK-7 0x0810bae8 in dummy_start (data=0x82138d8) at utils.c:895
__cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {-1208696844,
0, -1225888880, -1225890856, -926490477, 896889326}, __mask_was_saved =
0}}, __pad = {0xb6ee6490, 0x0, 0x0, 0x0}}
__cancel_routine = (void (*)(void *)) 0x806c970 <ast_unregister_thread>
__cancel_arg = (void *) 0xb6ee6b90
not_first_call = 0
ret = (void *) 0x0
a = {start_routine = 0x80c99c5 <pbx_thread>, data = 0x82134e0, name =
0x8211560 "pbx_thread", ' ' <repeats 11 times>, "started at [ 2660]
pbx.c ast_pbx_start()"}
ASTERISK-8 0xb7f3c4fb in start_thread ()
from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
ASTERISK-9 0xb7e55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
Comments:By: Mark Michelson (mmichelson) 2008-08-19 14:24:37

I'm curious to know a bit more information about this. At what point of the call did the crash occur? Was it immediately when ChanSpy was called, was it after a period of time on the phone, or was it when you hung up? I see that you marked the reproducibility as "have not tried," but had you been able to make successful calls to ChanSpy before? I'm trying to get a feel for how rare this crash is.


By: Ruddy G (ruddy) 2008-08-19 14:38:40

Well, the server crashes, let say 5 minutes after I place a call to spy some channels. And if I restart it, and call back, the server crash again after a certain time. So, I can reproduce it at will.
Also, for your information, each time, on the server I have something like 10-15 channels but none of them are in the spy group. So, when I spy, I hear nothing and after a few minutes, the server just crash

By: Mark Michelson (mmichelson) 2008-08-19 14:41:35

Thanks very much for the quick feedback!

By: Mark Michelson (mmichelson) 2008-08-19 15:22:22

I wasn't able to reproduce it here in my test setup. I'm suspecting that there's some sort of memory corruption occurring, but it's very difficult to tell where or how it is happening just by looking at the code.

Is it possible for you to run Asterisk under valgrind? If you could do so and upload the output here, that could be very helpful. There are instructions for how to do this in doc/valgrind.txt in the Asterisk source directory.


By: Ruddy G (ruddy) 2008-08-19 16:30:14

I was able to reproduce it on another server.
Here is the step I followed.
Maybe all informations here are not relevent.
1. The first step was to generate some calls in asterisk.
So, I put and extension in the dialplan as follow :

exten => 10000,1,Noop
exten => 10000,n,Answer
exten => 10000,n,Set(SPYGROUP=SOMETHINGELSE)
exten => 10000,n,Wait(5000)
exten => 10000,n,Hangup

Then, I have a call file local.call with the following content

Channel: Local/10000@default
Application: Wait
Data: 10000

2. Then,  I copy the local.call file in /var/spool/asterisk/outgoing something like 15 times (with different names). So, I have 30 channels in asterisk.

3. Then, I call the spy extension

exten => 675,1,ChanSpy(|g(XXXXXXXX_SPY)q)
exten => 675,n,Hangup

4. Then, I copy others local.call files to generate more calls. So, I have somethign like 40 channels in asterisk.

After 3/4 minutes, asterisk crashes ...
Please see the valgrind.txt file (the malloc_debug.txt file is empty)....
Here is the trace of the new crash :

(gdb) bt
#0  0x040007f2 in ?? () from /lib/ld-linux.so.2
#1  0x0417cc07 in poll () from /lib/tls/i686/cmov/libc.so.6
#2  0x08087c78 in ast_waitfor_nandfds (c=0x59b2e78, n=1, fds=0x0, nfds=0, exception=0x0, outfd=0x0, ms=0x59b2e74) at channel.c:1713
#3  0x0808a2a5 in ast_safe_sleep_conditional (chan=0x4476ff8, ms=5000000, cond=0, data=0x0) at channel.c:1782
#4  0x0808a318 in ast_safe_sleep (chan=0x4476ff8, ms=5000000) at channel.c:1162
ASTERISK-1  0x080cc370 in pbx_extension_helper (c=0x4476ff8, con=0x0, context=0x4477178 "default", exten=0x44771c8 "10000", priority=4, label=0x0, callerid=0x4477a70 "", action=E_SPAWN) at pbx.c:537
ASTERISK-2  0x080ce535 in __ast_pbx_run (c=0x4476ff8) at pbx.c:2317
ASTERISK-3  0x080cf40e in pbx_thread (data=0x4476ff8) at pbx.c:2636
ASTERISK-4  0x081005bb in dummy_start (data=0x4477b18) at utils.c:895
ASTERISK-5  0x040344fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
ASTERISK-6 0x04186e5e in clone () from /lib/tls/i686/cmov/libc.so.6

By: Mark Michelson (mmichelson) 2008-08-20 14:13:21

Using your set of steps, I am able to consistently get Asterisk to crash. Between you and I, I have seen crashes occur in three different places. In your original report, the crash occurred in the ast_mutex_lock call. In your latest note, the crash occurred within the call to poll(). Every time I've had the crash occur, it has stemmed from the snprintf call inside of next_channel() in app_chanspy. I'll keep looking into it to see if I can figure out what's happening.

By: Mark Michelson (mmichelson) 2008-08-20 14:16:40

An update:

I can get the crashes to occur in multiple different places (although all stem from app_chanspy) if I vary the number of active channels in Asterisk.

By: Digium Subversion (svnbot) 2008-08-20 17:05:45

Repository: asterisk
Revision: 139213

U   branches/1.4/apps/app_chanspy.c

r139213 | russell | 2008-08-20 17:05:44 -0500 (Wed, 20 Aug 2008) | 11 lines

Fix a crash in the ChanSpy application.  The issue here is that if you call
ChanSpy and specify a spy group, and sit in the application long enough looping
through the channel list, you will eventually run out of stack space and the
application with exit with a seg fault.  The backtrace was always inside of
a harmless snprintf() call, so it was tricky to track down.  However, it turned
out that the call to snprintf() was just the biggest stack consumer in this
code path, so it would always be the first one to hit the boundary.

(closes issue ASTERISK-12611)
Reported by: ruddy