Summary: ASTERISK-20856: Segmentation fault in res_rtp_asterisk.so caused by NULL data pointer in frame from sig_analog
Reporter: Roberto Casas (rcasas)
Labels:
Date Opened: 2013-01-03 06:40:19.000-0600
Date Closed: 2017-12-29 05:08:50.000-0600
Priority: Critical
Regression?
Status: Closed/Complete
Components: Resources/res_rtp_asterisk
Versions:
Frequency of Occurrence: Frequent
Related Issues:
Environment:
Attachments: ( 0) backtrace.txt
( 1) debug.log
( 2) messages.log
Description: I have this bug in Asterisk 1.8.3.1, but I've inspected the trunk version and the code is almost the same.

The bug is in the function:

ast_rtp_raw_write

It happens when we have a remote_address but frame->data.ptr appears to be 0 (subtracting hdrlen then gives the address 0xfffffffffffffff4 for the rtpheader variable).
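
The address quoted above is consistent with unsigned wraparound when a 12-byte header length is subtracted from a zero data pointer. The following is only a minimal sketch of that arithmetic, not code from res_rtp_asterisk: the helper name `rtp_header_address` and the constant `RTP_HDRLEN` are hypothetical, the 12-byte value is the fixed RTP header size from RFC 3550, and a 64-bit address space is assumed. It uses plain integers so that no real pointer arithmetic is performed.

```c
#include <stdint.h>

/* Assumption: the fixed RTP header is 12 bytes long (RFC 3550). */
#define RTP_HDRLEN 12

/*
 * Hypothetical helper: models "data pointer minus header length" with
 * 64-bit unsigned integers. Unsigned subtraction wraps modulo 2^64, so a
 * zero data pointer yields an address near the top of the address space.
 */
static uint64_t rtp_header_address(uint64_t data_ptr)
{
    return data_ptr - RTP_HDRLEN;
}
```

Calling `rtp_header_address(0)` gives 0xfffffffffffffff4, the value seen in the backtrace, which supports the reading that the frame's data pointer was NULL.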
Comments:

By: Roberto Casas (rcasas) 2013-01-03 06:41:03.116-0600

Backtrace of the problem

By: Roberto Casas (rcasas) 2013-01-03 06:47:03.668-0600

For the moment, I'm trying a workaround that changes this line:

if (!ast_sockaddr_isnull(&remote_address)) {

to:

if (!ast_sockaddr_isnull(&remote_address) && frame->data.ptr > 0) {
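
The same kind of guard can be sketched in isolation. This is an illustration under stated assumptions, not the Asterisk implementation: `struct demo_frame` and `demo_frame_has_payload` are hypothetical names, and the struct is a simplified stand-in rather than the real `struct ast_frame`. An explicit comparison with NULL is the more conventional way to write the pointer check than `> 0`.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for a media frame. This is NOT the
 * real Asterisk struct ast_frame; it only has the fields the check needs.
 */
struct demo_frame {
    void *data_ptr; /* payload buffer, may be NULL */
    int datalen;    /* payload length in bytes */
};

/* Returns 1 when the frame carries a non-empty payload, 0 otherwise. */
static int demo_frame_has_payload(const struct demo_frame *f)
{
    return f != NULL && f->data_ptr != NULL && f->datalen > 0;
}
```

A writer could call such a check before building the RTP header and skip the frame when it returns 0, which hides the crash but does not explain where the empty frame came from.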


By: Matt Jordan (mjordan) 2013-01-03 08:26:16.316-0600

I'm curious what that frame actually is, and why it has a NULL data pointer. Can you determine what the frame type is, and why chan_dahdi is writing a frame to its bridged channel that has a NULL data pointer?

By: Roberto Casas (rcasas) 2013-01-03 09:26:19.518-0600

It should be a frame of type AST_FRAME_VOICE. The call comes from the dahdi channel, and after some dialplan actions, a SIP account is dialed. I have no idea where the NULL pointer comes from, but I've seen some weird things related to the network.

I'm going to attach the debug and messages logs from when the segfault occurs.

By: Roberto Casas (rcasas) 2013-01-03 09:28:34.684-0600

Also, sometimes I'm seeing this message in the log, which may be related to the problem:

[Jan  3 10:04:51] WARNING[14646] frame.c: Huh?  Can't smooth a non-voice frame!


By: Roberto Casas (rcasas) 2013-01-03 11:25:44.994-0600

I'm investigating it, and I'm guessing it may be a problem with different MTU sizes between the clients and the Asterisk server.

By: Roberto Casas (rcasas) 2013-01-08 03:12:30.488-0600

After adding code to log when the pointer is NULL, and then changing the MTU to match the new switch, the problem has not been seen again (4 days).

By: Rusty Newton (rnewton) 2013-01-11 14:36:12.051-0600

Roberto, is the problem still gone after the change you made? Can you describe the change to the MTU that you made, plus which switch (and firmware version) you are using?

By: Matt Jordan (mjordan) 2013-01-12 22:58:15.939-0600

The problem here is that a media frame with a NULL data pointer was written out to the SIP channel. Channel drivers shouldn't be queuing up media frames with no data.

In this particular case, it appears as if the culprit was an analog channel, since the thread that started this was in sig_analog.

By: Roberto Casas (rcasas) 2013-01-14 02:37:28.516-0600

The status right now:

*CLI> core show uptime
System uptime: 1 week, 3 days, 21 hours, 17 minutes, 11 seconds
Last reload: 1 week, 2 days, 23 hours, 26 minutes, 16 seconds

It seems that the MTU change fixed it.

I'll try to explain the changes.

The Asterisk server was connected to a network with an MTU of 1452 (the value on the switches), and its network interface was configured to this value.

The client moved the telephony system to a new switch configured with the default MTU, but the Asterisk server was still forced to 1452. The SIP clients were configured with the default MTU value.

So, the problem arose when a packet from a client was fragmented, and the second packet produced a null frame. I think that it's not a problem related to DAHDI, because it was caused by network issues. My opinion is that it's a voice frame arriving corrupted from a SIP channel.
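
To make the MTU mismatch concrete, here is a rough sketch of the size arithmetic involved. It is an illustration only: the helper names are hypothetical, the headers are assumed to be a 20-byte IPv4 header without options plus an 8-byte UDP header, and the "default" MTU is assumed to be the usual Ethernet value of 1500, which the report does not state.

```c
#include <stddef.h>

/* Assumptions: IPv4 header without options (20 bytes), UDP header (8 bytes). */
#define IPV4_HEADER_MIN 20
#define UDP_HEADER 8

/* Largest UDP payload that fits in one unfragmented packet for a given MTU. */
static size_t max_udp_payload(size_t mtu)
{
    const size_t overhead = IPV4_HEADER_MIN + UDP_HEADER;
    return mtu > overhead ? mtu - overhead : 0;
}

/* Returns 1 when a UDP payload of this size would need IP fragmentation. */
static int needs_fragmentation(size_t udp_payload_len, size_t mtu)
{
    return udp_payload_len > max_udp_payload(mtu);
}
```

Under these assumptions, a 1450-byte UDP payload fits in a single packet at an MTU of 1500 but has to be fragmented at 1452, where the limit is 1424 bytes. That is the kind of mismatch described above, although it does not by itself show how a fragment would turn into a frame with a NULL data pointer.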

By: Joshua C. Colp (jcolp) 2017-12-20 06:35:15.737-0600

Are you still experiencing this problem under recent supported versions of Asterisk?

By: Roberto Casas (rcasas) 2017-12-29 02:47:37.683-0600

I have not tried this scenario for a long time. It was a network misconfiguration that I have not tested again.


By: Joshua C. Colp (jcolp) 2017-12-29 05:08:50.385-0600

I'm suspending this issue then, since it hasn't been seen since and no one else has experienced it.