Summary: ASTERISK-15966: ast->tech_pvt->rtp contains garbage yielding SEGFAULT
Reporter: Walter Doekes (wdoekes)
Labels:
Date Opened: 2010-04-16 06:12:28
Date Closed: 2011-06-07 14:01:06
Versions:
Frequency of
Environment:
Attachments:
( 0) ast1430-17193-btfull.txt
( 1) ast1430-17193-threadbt.txt
( 2) ast1430-17193-variables.txt

Somehow I can get asterisk-1.4.30-rc3 and earlier (1.4.24) to SEGFAULT. I haven't tried 1.4.31-rc1 yet, but I don't think anything related was fixed there.

Backtrace info:

root@voip-test:/usr/src/asterisk-1.4.30-rc3# asterisk -V
Asterisk 1.4.30-rc3
root@voip-test:/usr/src/asterisk-1.4.30-rc3# gdb `which asterisk` /root/asterisk-crash.core
#0  0x00007f92aedf2b66 in poll () from /lib/libc.so.6
(gdb) thread 6
[Switching to thread 6 (process 26472)]#0  ast_rtp_write (rtp=0x31202f2045544956, _f=0x42532f00) at rtp.c:2875
2875 if (!rtp->them.sin_addr.s_addr)
(gdb) bt
#0  ast_rtp_write (rtp=0x31202f2045544956, _f=0x42532f00) at rtp.c:2875
#1  0x00007f9297996952 in sip_write (ast=0x14e6360, frame=0x42532f00) at chan_sip.c:3922
#2  0x00007f9297996c55 in sip_rtp_prod (data=<value optimized out>) at chan_sip.c:4650
#3  0x00000000004a92db in ast_sched_runq (con=<value optimized out>) at sched.c:363
#4  0x00007f92979800fa in do_monitor (data=<value optimized out>) at chan_sip.c:17082
#5  0x00000000004b6c7c in dummy_start (data=<value optimized out>) at utils.c:856
#6  0x00007f92af75bfc7 in start_thread () from /lib/libpthread.so.0
#7  0x00007f92aedfb5ad in clone () from /lib/libc.so.6
#8  0x0000000000000000 in ?? ()
(gdb) list
2874 /* If we have no peer, return immediately */
2875 if (!rtp->them.sin_addr.s_addr)
2876 return 0;

That rtp value is garbage and therefore the process is killed when trying to access it.
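
As a quick sanity check on that claim, here is a minimal Python sketch (not part of the original gdb session) that reinterprets the rtp value from frame #0 as raw bytes. It assumes a little-endian x86-64 host with 48-bit virtual addresses, which the 0x00007f92... addresses in the backtrace suggest but do not prove. The helper names are made up for illustration.

```python
# Reinterpret the rtp pointer from frame #0 as the bytes it occupies in memory.
# Assumption: little-endian x86-64 with 48-bit virtual addresses.
RTP_PTR = 0x31202F2045544956  # value of `rtp` passed to ast_rtp_write()


def pointer_as_text(ptr: int, width: int = 8) -> str:
    """Return the pointer's in-memory bytes as text, '.' for non-printables."""
    raw = ptr.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


def is_canonical_x86_64(ptr: int) -> bool:
    """True if bits 47..63 are all equal, i.e. the address can be mapped."""
    top = ptr >> 47
    return top == 0 or top == 0x1FFFF


print(repr(pointer_as_text(RTP_PTR)))  # prints 'VITE / 1'
print(is_canonical_x86_64(RTP_PTR))    # prints False
```

All eight bytes are printable ASCII, which suggests the memory holding the pointer was overwritten with, or reused for, string data. Under the stated assumption the value is also not a canonical address, so any dereference of it faults.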

****** STEPS TO REPRODUCE ******

I'm not really sure what causes this. But these crashes started to happen when I was playing with INVITE+replaces style call-pickup.

My setup is as follows:

phone -> opensips -> asterisk

The NOTIFYs that hold the call-pickup info contain the Asterisk server IP:

<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info"
   version="41" state="full" entity="511599201@sip1.sig.gntel.nl">
 <dialog id="15b4840f122793461992f9a20734a144@"
     <identity display="pascal211">sip:211@</identity>
     <target uri="sip:211@"/>
     <target uri="sip:511599201@sip1.sig.gntel.nl"/>

When doing call pickup, my NATed phone bypasses OpenSIPS and sends the INVITE+replaces to the Asterisk machine directly. Call pickup then does not work because (for starters) the Contact/Via contain RFC1918 IPs. This is no problem, because I'm only trying out which call-pickup methods work and which do not. But after a while I get crashes.

Unfortunately, the crash only happens after a while and (in the case of the attached core dump) not on such a replacing INVITE.

So, I cannot be entirely sure that it's the INVITE+replaces that causes the breakage, but it's my best guess at this time.


If I find more info, I'll add that. I've marked it as private as I don't know how serious this might be.

Walter Doekes
Comments:

By: Walter Doekes (wdoekes) 2010-04-16 07:05:00

The core dump:

I hope it can tell someone something.

(Thanks leif for flagging this private.)

By: Walter Doekes (wdoekes) 2010-04-16 07:31:51

Okay, the good news is that it crashed without any INVITE+replaces this time. The bad news is that now the reproducibility is completely random to me.

I shall now check if the machine (memory, motherboard) is at fault first.

By: Walter Doekes (wdoekes) 2010-04-16 08:50:29


It was my own broken patch for bug 17012 that caused it. Note the sip_rtp_prod in the backtrace, which shouldn't be there.

Phew. Sorry :-)

I hope I haven't wasted too much of your time.

By: Leif Madsen (lmadsen) 2010-04-16 08:52:21

Closed based on feedback from reporter. Issue was in a local patch.