[Home]

Summary:ASTERISK-14976: [patch] Crash in local_ast_moh_start / ast_indicate_data due to AST_CONTROL_HOLD with bad pointer
Reporter:Atis Lezdins (atis)Labels:
Date Opened:2009-10-12 08:24:47Date Closed:2010-03-26 13:14:47
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) bt.asterisk2-2009-10-11T09:46:50-0700.26846.txt
( 1) bug16058.patch
( 2) bug16058-fix.patch
( 3) callid_1261511650.11325.log
( 4) debug.patch
( 5) frame_datalen.patch
( 6) framedebug.patch
( 7) framedebug2.patch
( 8) frames_jpeeler3.log
Description:I got the following backtrace:

Program terminated with signal 11, Segmentation fault.
# 0  0x00002aaab13265d0 in local_ast_moh_start (chan=0x2aaac82fad98, mclass=0xc8196c58 <Address 0xc8196c58 out of bounds>, interpclass=0xc4c1db "default") at /export/storage0/dist/1.6.1_p1/asterisk-svn-1.6.1.6-iqlabs/include/asterisk/strings.h:50
50              return (!s || (*s == '\0'));

# 0  0x00002aaab13265d0 in local_ast_moh_start (chan=0x2aaac82fad98, mclass=0xc8196c58 <Address 0xc8196c58 out of bounds>, interpclass=0xc4c1db "default") at /export/storage0/dist/1.6.1_p1/asterisk-svn-1.6.1.6-iqlabs/include/asterisk/strings.h:50
# 1  0x000000000046b106 in ast_moh_start (chan=0x2aaac82fad98, mclass=0xc8196c58 <Address 0xc8196c58 out of bounds>, interpclass=0xc4c1db "default") at channel.c:5625
# 2  0x00002aaab8e1ad84 in sip_indicate (ast=0x2aaac82fad98, condition=16, data=0xc8196c58, datalen=0) at chan_sip.c:5794
# 3  0x000000000045f40c in ast_indicate_data (chan=0x2aaac82fad98, _condition=16, data=0xc8196c58, datalen=0) at channel.c:3113
# 4  0x0000000000467171 in ast_generic_bridge (c0=0xaa7638, c1=0x2aaac82fad98, config=0x40c90c00, fo=0x40c8fd28, rc=0x40c8fd20, bridge_end={tv_sec = 1255274833, tv_usec = 81777}) at channel.c:4902

After investigating closer in frame 4 i found that:

f->datalen = 0
f->data.ptr = (void *) 0xc8196c58 (which is out of bounds in ast_strlen_zero)
f->frametype = AST_FRAME_CONTROL
f->subclass = 16 (AST_CONTROL_HOLD)

So, apparently something generates wrong HOLD frame with datalength 0 but invalid pointer.

Full backtrace attached
Comments:By: Atis Lezdins (atis) 2009-10-12 08:30:28

Attached small fix that should detect such issues in __ast_read and fix it, while providing additional debug.

By: Jeff Peeler (jpeeler) 2009-11-09 12:34:34.000-0600

What music on hold settings do you have set for your endpoints? I'm assuming you haven't changed anything and are using the default, but I need to be sure.

By: Jeff Peeler (jpeeler) 2009-11-09 15:06:06.000-0600

I think in this particular scenario the attached patch will solve the problem. If I'm correct though all uses of ast_indicate_data will need to be modified that use the data.ptr as the data arg.

By: Atis Lezdins (atis) 2009-11-10 13:35:50.000-0600

Issue is not reproducable on simple tests.

I'm going to test proposed patch on production environment for several days starting tomorrow.

By: Wolfgang Pichler (wuwu) 2009-11-12 06:28:06.000-0600

i do have the same problem here in version 1.6.1.6 - i will apply the patches tonight. I will report back if it solved the problems.

By: Jeff Peeler (jpeeler) 2009-11-12 10:29:28.000-0600

wuwu: Can you test with ./configure --enable-dev-mode and make sure to turn on DO_CRASH? Hopefully it will crash on the assert statement and a bt full will show the source of the problem.

By: Wolfgang Pichler (wuwu) 2009-11-12 10:44:52.000-0600

i thought the bug is already identified and patched with the above patch - and you only need some more patch testers... ?

Or should i enabled dev-mode on the patched version and see if any more crashes occour ?

Does enable-dev-mode has a major impact on performance ?

By: Atis Lezdins (atis) 2009-11-12 11:02:10.000-0600

wuwu:

my patch is fixing the crash, as is jpeeler's patch (he did the same thing one function higher).

However this is just a band-aid for the problem, which resides somewhere deeper.

If You can reproduce the problem, it would be very helpful. Enabling DO_CRASH (and dev-mode) shouldn't impact performance, however it will crash your system again providing much more debug to identify underlying problem (which is why i don't want to run it on production system).

By: Wolfgang Pichler (wuwu) 2009-11-12 14:39:52.000-0600

atis:

sorry - i have to problems also on a production machine (callcenter) - i can't wait for the next crash.
But i have an other hint - i did not have the crashes before i enabled a function in my app which does create a .call file to originate an outbound call. And every crash was related to an outbound call. I did not happend with outbound calls initiated by the user.
What about creating a test environment and doing some dummy outbound calls with it - maybe we are lucky and can reproduce the crash there.

By: Atis Lezdins (atis) 2009-11-12 16:25:28.000-0600

I already have two test boxes, throwing and routing calls at each other (one heavily depends on callfiles, but no music-on-hold), other one routing/playing but no call-files..

I checked logs, but it doesn't seem that either of them is having this problem. I also tried generating some calls on machine #2 from call-files and putting them on hold, but no success in reproducing the issue.

One issue case I investigated was not releated to call-file, but phone setting Hold while channel was executing Wait(1) before Hangup(). However I couldn't reproduce that.

By: Atis Lezdins (atis) 2009-11-16 06:27:48.000-0600

I recompiled our production box with --enable-dev-mode, assertion and my patch in place. For the weekend I got several frames fixed by my patch (and corresponding warnings), but no assertion failures. As far as I understand, that means, frames are queued incorrectly.

By: Jeff Peeler (jpeeler) 2009-11-19 17:23:44.000-0600

atis: I'm hoping the attached debug.patch will have something show up. It's going to be a bit verbose. It would be nice to see if any of the other warnings show up before being corrected in ast_read, but knowing that it doesn't occur is useful as well.

By: Jeff Peeler (jpeeler) 2009-11-30 11:10:50.000-0600

Any debug output to share Atis? I thought you were planning on putting the debug patch in production last week.

By: Digium Subversion (svnbot) 2009-12-01 15:37:03.000-0600

Repository: asterisk
Revision: 231911

U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r231911 | jpeeler | 2009-12-01 15:37:02 -0600 (Tue, 01 Dec 2009) | 12 lines

Fix crash with invalid frame data

The crash was happening as a result of a frame containing an invalid data
pointer, but was set with data length of zero. The few times the issue was
reproduced it _seemed_ that the frame was queued properly, that is the data
pointer was set to NULL. I never could reproduce the crash so as a last resort
the crash has been fixed, but a check in __ast_read has been added to give as
much information about the source of problematic frames in the future.

(closes issue ASTERISK-14976)
Reported by: atis

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=231911

By: Digium Subversion (svnbot) 2009-12-01 16:01:53.000-0600

Repository: asterisk
Revision: 231927

_U  trunk/
U   trunk/main/channel.c

------------------------------------------------------------------------
r231927 | jpeeler | 2009-12-01 16:01:52 -0600 (Tue, 01 Dec 2009) | 19 lines

Merged revisions 231911 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r231911 | jpeeler | 2009-12-01 15:29:31 -0600 (Tue, 01 Dec 2009) | 12 lines
 
 Fix crash with invalid frame data
 
 The crash was happening as a result of a frame containing an invalid data
 pointer, but was set with data length of zero. The few times the issue was
 reproduced it _seemed_ that the frame was queued properly, that is the data
 pointer was set to NULL. I never could reproduce the crash so as a last resort
 the crash has been fixed, but a check in __ast_read has been added to give as
 much information about the source of problematic frames in the future.
 
 (closes issue ASTERISK-14976)
 Reported by: atis
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=231927

By: Digium Subversion (svnbot) 2009-12-01 16:05:17.000-0600

Repository: asterisk
Revision: 231928

_U  branches/1.6.0/
U   branches/1.6.0/main/channel.c

------------------------------------------------------------------------
r231928 | jpeeler | 2009-12-01 16:05:16 -0600 (Tue, 01 Dec 2009) | 26 lines

Merged revisions 231927 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r231927 | jpeeler | 2009-12-01 15:54:21 -0600 (Tue, 01 Dec 2009) | 19 lines
 
 Merged revisions 231911 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r231911 | jpeeler | 2009-12-01 15:29:31 -0600 (Tue, 01 Dec 2009) | 12 lines
   
   Fix crash with invalid frame data
   
   The crash was happening as a result of a frame containing an invalid data
   pointer, but was set with data length of zero. The few times the issue was
   reproduced it _seemed_ that the frame was queued properly, that is the data
   pointer was set to NULL. I never could reproduce the crash so as a last resort
   the crash has been fixed, but a check in __ast_read has been added to give as
   much information about the source of problematic frames in the future.
   
   (closes issue ASTERISK-14976)
   Reported by: atis
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=231928

By: Digium Subversion (svnbot) 2009-12-01 16:07:46.000-0600

Repository: asterisk
Revision: 231929

_U  branches/1.6.1/
U   branches/1.6.1/main/channel.c

------------------------------------------------------------------------
r231929 | jpeeler | 2009-12-01 16:07:45 -0600 (Tue, 01 Dec 2009) | 26 lines

Merged revisions 231927 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r231927 | jpeeler | 2009-12-01 15:54:21 -0600 (Tue, 01 Dec 2009) | 19 lines
 
 Merged revisions 231911 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r231911 | jpeeler | 2009-12-01 15:29:31 -0600 (Tue, 01 Dec 2009) | 12 lines
   
   Fix crash with invalid frame data
   
   The crash was happening as a result of a frame containing an invalid data
   pointer, but was set with data length of zero. The few times the issue was
   reproduced it _seemed_ that the frame was queued properly, that is the data
   pointer was set to NULL. I never could reproduce the crash so as a last resort
   the crash has been fixed, but a check in __ast_read has been added to give as
   much information about the source of problematic frames in the future.
   
   (closes issue ASTERISK-14976)
   Reported by: atis
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=231929

By: Digium Subversion (svnbot) 2009-12-01 16:10:50.000-0600

Repository: asterisk
Revision: 231930

_U  branches/1.6.2/
U   branches/1.6.2/main/channel.c

------------------------------------------------------------------------
r231930 | jpeeler | 2009-12-01 16:10:50 -0600 (Tue, 01 Dec 2009) | 26 lines

Merged revisions 231927 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r231927 | jpeeler | 2009-12-01 15:54:21 -0600 (Tue, 01 Dec 2009) | 19 lines
 
 Merged revisions 231911 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r231911 | jpeeler | 2009-12-01 15:29:31 -0600 (Tue, 01 Dec 2009) | 12 lines
   
   Fix crash with invalid frame data
   
   The crash was happening as a result of a frame containing an invalid data
   pointer, but was set with data length of zero. The few times the issue was
   reproduced it _seemed_ that the frame was queued properly, that is the data
   pointer was set to NULL. I never could reproduce the crash so as a last resort
   the crash has been fixed, but a check in __ast_read has been added to give as
   much information about the source of problematic frames in the future.
   
   (closes issue ASTERISK-14976)
   Reported by: atis
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=231930

By: Jeff Peeler (jpeeler) 2009-12-02 15:24:50.000-0600

This framedebug patch will hopefully do the trick. I plan on improving it some more and then committing it to all branches for future frame debugging.

By: Jeff Peeler (jpeeler) 2009-12-16 18:08:05.000-0600

I know it's been a while, but atis I've uploaded a new frame debug patch. Test it when you get a chance...

By: Atis Lezdins (atis) 2009-12-23 06:41:06.000-0600

Attached two files - one is all requested debug for 10 minutes around datalen=0, another is just for the specific call.

Btw, you made a tiny mistake in printing frame, you did that only when " f->frametype == AST_FRAME_CONTROL && f->subclass == AST_CONTROL_HOLD" so it didn't matched UNHOLD frames. I had left my previous block of debug below, that's why output is a bit different (without "jpeeler:" line). Other instances of debug just check for AST_FRAME_CONTROL so the debug should be sufficent.

The corresponding line for [src: process_sdp chan_sip.c:7805] is:

if (ast_test_flag(&p->flags[1], SIP_PAGE2_CALL_ONHOLD) && sin.sin_addr.s_addr && (!sendonly || sendonly == -1)) {
ast_queue_control(p->owner, AST_CONTROL_UNHOLD);



By: Jeff Peeler (jpeeler) 2010-01-04 10:47:08.000-0600

atis: With your latest debug information I was able to reproduce the issue. I have attached a fix and eagerly await your testing results. You should no longer see any warnings with finding an initialized data pointer.

By: Leif Madsen (lmadsen) 2010-01-05 09:37:58.000-0600

<leifmadsen> any chance you'll get to look at that tomorrow?
<atis_work> well, not really
<atis_work> i was frightened by this issue and rolled back our production today
<leifmadsen> gotcha -- jpeeler seems to have reproduced and provided a fix
<atis_work> i'd like to wait til next week, when our tester is back from vacation

By: Jeff Peeler (jpeeler) 2010-01-18 17:12:33.000-0600

Atis, please test so we can finally get this resolved.

By: Atis Lezdins (atis) 2010-01-18 17:23:36.000-0600

Sorry for the delay. I had last patch on production since last Thursday, and got none of the CONTROL frames showing with datalen=0 and pointer.

There were other types - DTMF and Frametype 10, but as I understand they shouldn't cause any problems.

By: Jeff Peeler (jpeeler) 2010-01-18 17:48:16.000-0600

As long as you didn't see any warnings originating from ast_read, then we should be good to go.

By: Digium Subversion (svnbot) 2010-01-26 12:07:59.000-0600

Repository: asterisk
Revision: 243244

U   trunk/main/frame.c

------------------------------------------------------------------------
r243244 | jpeeler | 2010-01-26 12:07:58 -0600 (Tue, 26 Jan 2010) | 12 lines

Fix crash resulting from frames with invalid data pointers.

In ast_frdup the frame data union does not get set to point to malloced memory
if the datalen is zero, so make sure to handle the same case in ast_frisolate
appropriately.

(closes issue ASTERISK-14976)
Reported by: atis
Patches:
     bug16058-fix.patch uploaded by jpeeler (license 325)
Tested by: atis

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=243244

By: Digium Subversion (svnbot) 2010-01-26 12:08:48.000-0600

Repository: asterisk
Revision: 243245

_U  branches/1.6.0/

------------------------------------------------------------------------
r243245 | jpeeler | 2010-01-26 12:08:48 -0600 (Tue, 26 Jan 2010) | 18 lines

Blocked revisions 243244 via svnmerge

........
 r243244 | jpeeler | 2010-01-26 12:07:57 -0600 (Tue, 26 Jan 2010) | 12 lines
 
 Fix crash resulting from frames with invalid data pointers.
 
 In ast_frdup the frame data union does not get set to point to malloced memory
 if the datalen is zero, so make sure to handle the same case in ast_frisolate
 appropriately.
 
 (closes issue ASTERISK-14976)
 Reported by: atis
 Patches:
       bug16058-fix.patch uploaded by jpeeler (license 325)
 Tested by: atis
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=243245

By: Digium Subversion (svnbot) 2010-01-26 12:12:28.000-0600

Repository: asterisk
Revision: 243246

_U  branches/1.6.1/
U   branches/1.6.1/main/frame.c

------------------------------------------------------------------------
r243246 | jpeeler | 2010-01-26 12:12:28 -0600 (Tue, 26 Jan 2010) | 19 lines

Merged revisions 243244 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r243244 | jpeeler | 2010-01-26 12:07:57 -0600 (Tue, 26 Jan 2010) | 12 lines
 
 Fix crash resulting from frames with invalid data pointers.
 
 In ast_frdup the frame data union does not get set to point to malloced memory
 if the datalen is zero, so make sure to handle the same case in ast_frisolate
 appropriately.
 
 (closes issue ASTERISK-14976)
 Reported by: atis
 Patches:
       bug16058-fix.patch uploaded by jpeeler (license 325)
 Tested by: atis
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=243246

By: Digium Subversion (svnbot) 2010-01-26 12:16:20.000-0600

Repository: asterisk
Revision: 243247

_U  branches/1.6.2/
U   branches/1.6.2/main/frame.c

------------------------------------------------------------------------
r243247 | jpeeler | 2010-01-26 12:16:19 -0600 (Tue, 26 Jan 2010) | 19 lines

Merged revisions 243244 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r243244 | jpeeler | 2010-01-26 12:07:57 -0600 (Tue, 26 Jan 2010) | 12 lines
 
 Fix crash resulting from frames with invalid data pointers.
 
 In ast_frdup the frame data union does not get set to point to malloced memory
 if the datalen is zero, so make sure to handle the same case in ast_frisolate
 appropriately.
 
 (closes issue ASTERISK-14976)
 Reported by: atis
 Patches:
       bug16058-fix.patch uploaded by jpeeler (license 325)
 Tested by: atis
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=243247

By: Leif Madsen (lmadsen) 2010-02-28 19:34:35.000-0600

Reopened issue per Wolfgang on asterisk-dev to add additional information.

By: Wolfgang Pichler (wuwu) 2010-03-17 03:51:46

I was having version 1.6.2.2 running - this version did already have
the patch included to avoid this bug. And it was working fine - i
never had the same problem again. But in 1.6.2.2 transfer is not
working anymore - so i needed to upgrade to 1.6.2.3-rc2.

But with 1.6.2.3-rc2 the above bug is back - so there must be another
source for this problem.

I have now 1.6.2.3-rc2 running - with the frame_datalen.patch applied to get the debugging output. And i have compiled it without optimization in dev mode. Problem now is - it is not crashing anymore ;-)
I now get lots of these messages....

[Mar 17 09:49:16] WARNING[12065]: channel.c:3065 __ast_read: Found frame with datalen=0 but initialized data pointer:
<< [ TYPE: Control (4) SUBCLASS: Unknown control '15' (15) ] [IAX2/tel01pica02-20225]

By: Leif Madsen (lmadsen) 2010-03-17 09:01:16

You may want to try 1.6.2.6 as it has additional fixes beyond 1.6.2.3-rc2.

By: Leif Madsen (lmadsen) 2010-03-17 09:47:59

If there are any additional comments or questions or issues with this bug, then please open a new issue and request it be marked as related to this one. Thanks!