Summary: | ASTERISK-14976: [patch] Crash in local_ast_moh_start / ast_indicate_data due to AST_CONTROL_HOLD with bad pointer | ||
Reporter: | Atis Lezdins (atis) | Labels: | |
Date Opened: | 2009-10-12 08:24:47 | Date Closed: | 2010-03-26 13:14:47 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Channels/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) bt.asterisk2-2009-10-11T09:46:50-0700.26846.txt ( 1) bug16058.patch ( 2) bug16058-fix.patch ( 3) callid_1261511650.11325.log ( 4) debug.patch ( 5) frame_datalen.patch ( 6) framedebug.patch ( 7) framedebug2.patch ( 8) frames_jpeeler3.log | |
Description: | I got the following backtrace: Program terminated with signal 11, Segmentation fault. # 0 0x00002aaab13265d0 in local_ast_moh_start (chan=0x2aaac82fad98, mclass=0xc8196c58 <Address 0xc8196c58 out of bounds>, interpclass=0xc4c1db "default") at /export/storage0/dist/1.6.1_p1/asterisk-svn-1.6.1.6-iqlabs/include/asterisk/strings.h:50 50 return (!s || (*s == '\0')); # 0 0x00002aaab13265d0 in local_ast_moh_start (chan=0x2aaac82fad98, mclass=0xc8196c58 <Address 0xc8196c58 out of bounds>, interpclass=0xc4c1db "default") at /export/storage0/dist/1.6.1_p1/asterisk-svn-1.6.1.6-iqlabs/include/asterisk/strings.h:50 # 1 0x000000000046b106 in ast_moh_start (chan=0x2aaac82fad98, mclass=0xc8196c58 <Address 0xc8196c58 out of bounds>, interpclass=0xc4c1db "default") at channel.c:5625 # 2 0x00002aaab8e1ad84 in sip_indicate (ast=0x2aaac82fad98, condition=16, data=0xc8196c58, datalen=0) at chan_sip.c:5794 # 3 0x000000000045f40c in ast_indicate_data (chan=0x2aaac82fad98, _condition=16, data=0xc8196c58, datalen=0) at channel.c:3113 # 4 0x0000000000467171 in ast_generic_bridge (c0=0xaa7638, c1=0x2aaac82fad98, config=0x40c90c00, fo=0x40c8fd28, rc=0x40c8fd20, bridge_end={tv_sec = 1255274833, tv_usec = 81777}) at channel.c:4902 After investigating closer in frame 4 i found that: f->datalen = 0 f->data.ptr = (void *) 0xc8196c58 (which is out of bounds in ast_strlen_zero) f->frametype = AST_FRAME_CONTROL f->subclass = 16 (AST_CONTROL_HOLD) So, apparently something generates wrong HOLD frame with datalength 0 but invalid pointer. Full backtrace attached | ||
Comments: | By: Atis Lezdins (atis) 2009-10-12 08:30:28 Attached small fix that should detect such issues in __ast_read and fix it, while providing additional debug. By: Jeff Peeler (jpeeler) 2009-11-09 12:34:34.000-0600 What music on hold settings do you have set for your endpoints? I'm assuming you haven't changed anything and are using the default, but I need to be sure. By: Jeff Peeler (jpeeler) 2009-11-09 15:06:06.000-0600 I think in this particular scenario the attached patch will solve the problem. If I'm correct though all uses of ast_indicate_data will need to be modified that use the data.ptr as the data arg. By: Atis Lezdins (atis) 2009-11-10 13:35:50.000-0600 Issue is not reproducable on simple tests. I'm going to test proposed patch on production environment for several days starting tomorrow. By: Wolfgang Pichler (wuwu) 2009-11-12 06:28:06.000-0600 i do have the same problem here in version 1.6.1.6 - i will apply the patches tonight. I will report back if it solved the problems. By: Jeff Peeler (jpeeler) 2009-11-12 10:29:28.000-0600 wuwu: Can you test with ./configure --enable-dev-mode and make sure to turn on DO_CRASH? Hopefully it will crash on the assert statement and a bt full will show the source of the problem. By: Wolfgang Pichler (wuwu) 2009-11-12 10:44:52.000-0600 i thought the bug is already identified and patched with the above patch - and you only need some more patch testers... ? Or should i enabled dev-mode on the patched version and see if any more crashes occour ? Does enable-dev-mode has a major impact on performance ? By: Atis Lezdins (atis) 2009-11-12 11:02:10.000-0600 wuwu: my patch is fixing the crash, as is jpeeler's patch (he did the same thing one function higher). However this is just a band-aid for the problem, which resides somewhere deeper. If You can reproduce the problem, it would be very helpful. Enabling DO_CRASH (and dev-mode) shouldn't impact performance, however it will crash your system again providing much more debug to identify underlying problem (which is why i don't want to run it on production system). By: Wolfgang Pichler (wuwu) 2009-11-12 14:39:52.000-0600 atis: sorry - i have to problems also on a production machine (callcenter) - i can't wait for the next crash. But i have an other hint - i did not have the crashes before i enabled a function in my app which does create a .call file to originate an outbound call. And every crash was related to an outbound call. I did not happend with outbound calls initiated by the user. What about creating a test environment and doing some dummy outbound calls with it - maybe we are lucky and can reproduce the crash there. By: Atis Lezdins (atis) 2009-11-12 16:25:28.000-0600 I already have two test boxes, throwing and routing calls at each other (one heavily depends on callfiles, but no music-on-hold), other one routing/playing but no call-files.. I checked logs, but it doesn't seem that either of them is having this problem. I also tried generating some calls on machine #2 from call-files and putting them on hold, but no success in reproducing the issue. One issue case I investigated was not releated to call-file, but phone setting Hold while channel was executing Wait(1) before Hangup(). However I couldn't reproduce that. By: Atis Lezdins (atis) 2009-11-16 06:27:48.000-0600 I recompiled our production box with --enable-dev-mode, assertion and my patch in place. For the weekend I got several frames fixed by my patch (and corresponding warnings), but no assertion failures. As far as I understand, that means, frames are queued incorrectly. By: Jeff Peeler (jpeeler) 2009-11-19 17:23:44.000-0600 atis: I'm hoping the attached debug.patch will have something show up. It's going to be a bit verbose. It would be nice to see if any of the other warnings show up before being corrected in ast_read, but knowing that it doesn't occur is useful as well. By: Jeff Peeler (jpeeler) 2009-11-30 11:10:50.000-0600 Any debug output to share Atis? I thought you were planning on putting the debug patch in production last week. By: Digium Subversion (svnbot) 2009-12-01 15:37:03.000-0600 Repository: asterisk Revision: 231911 U branches/1.4/main/channel.c ------------------------------------------------------------------------ r231911 | jpeeler | 2009-12-01 15:37:02 -0600 (Tue, 01 Dec 2009) | 12 lines Fix crash with invalid frame data The crash was happening as a result of a frame containing an invalid data pointer, but was set with data length of zero. The few times the issue was reproduced it _seemed_ that the frame was queued properly, that is the data pointer was set to NULL. I never could reproduce the crash so as a last resort the crash has been fixed, but a check in __ast_read has been added to give as much information about the source of problematic frames in the future. (closes issue ASTERISK-14976) Reported by: atis ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=231911 By: Digium Subversion (svnbot) 2009-12-01 16:01:53.000-0600 Repository: asterisk Revision: 231927 _U trunk/ U trunk/main/channel.c ------------------------------------------------------------------------ r231927 | jpeeler | 2009-12-01 16:01:52 -0600 (Tue, 01 Dec 2009) | 19 lines Merged revisions 231911 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r231911 | jpeeler | 2009-12-01 15:29:31 -0600 (Tue, 01 Dec 2009) | 12 lines Fix crash with invalid frame data The crash was happening as a result of a frame containing an invalid data pointer, but was set with data length of zero. The few times the issue was reproduced it _seemed_ that the frame was queued properly, that is the data pointer was set to NULL. I never could reproduce the crash so as a last resort the crash has been fixed, but a check in __ast_read has been added to give as much information about the source of problematic frames in the future. (closes issue ASTERISK-14976) Reported by: atis ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=231927 By: Digium Subversion (svnbot) 2009-12-01 16:05:17.000-0600 Repository: asterisk Revision: 231928 _U branches/1.6.0/ U branches/1.6.0/main/channel.c ------------------------------------------------------------------------ r231928 | jpeeler | 2009-12-01 16:05:16 -0600 (Tue, 01 Dec 2009) | 26 lines Merged revisions 231927 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r231927 | jpeeler | 2009-12-01 15:54:21 -0600 (Tue, 01 Dec 2009) | 19 lines Merged revisions 231911 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r231911 | jpeeler | 2009-12-01 15:29:31 -0600 (Tue, 01 Dec 2009) | 12 lines Fix crash with invalid frame data The crash was happening as a result of a frame containing an invalid data pointer, but was set with data length of zero. The few times the issue was reproduced it _seemed_ that the frame was queued properly, that is the data pointer was set to NULL. I never could reproduce the crash so as a last resort the crash has been fixed, but a check in __ast_read has been added to give as much information about the source of problematic frames in the future. (closes issue ASTERISK-14976) Reported by: atis ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=231928 By: Digium Subversion (svnbot) 2009-12-01 16:07:46.000-0600 Repository: asterisk Revision: 231929 _U branches/1.6.1/ U branches/1.6.1/main/channel.c ------------------------------------------------------------------------ r231929 | jpeeler | 2009-12-01 16:07:45 -0600 (Tue, 01 Dec 2009) | 26 lines Merged revisions 231927 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r231927 | jpeeler | 2009-12-01 15:54:21 -0600 (Tue, 01 Dec 2009) | 19 lines Merged revisions 231911 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r231911 | jpeeler | 2009-12-01 15:29:31 -0600 (Tue, 01 Dec 2009) | 12 lines Fix crash with invalid frame data The crash was happening as a result of a frame containing an invalid data pointer, but was set with data length of zero. The few times the issue was reproduced it _seemed_ that the frame was queued properly, that is the data pointer was set to NULL. I never could reproduce the crash so as a last resort the crash has been fixed, but a check in __ast_read has been added to give as much information about the source of problematic frames in the future. (closes issue ASTERISK-14976) Reported by: atis ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=231929 By: Digium Subversion (svnbot) 2009-12-01 16:10:50.000-0600 Repository: asterisk Revision: 231930 _U branches/1.6.2/ U branches/1.6.2/main/channel.c ------------------------------------------------------------------------ r231930 | jpeeler | 2009-12-01 16:10:50 -0600 (Tue, 01 Dec 2009) | 26 lines Merged revisions 231927 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r231927 | jpeeler | 2009-12-01 15:54:21 -0600 (Tue, 01 Dec 2009) | 19 lines Merged revisions 231911 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r231911 | jpeeler | 2009-12-01 15:29:31 -0600 (Tue, 01 Dec 2009) | 12 lines Fix crash with invalid frame data The crash was happening as a result of a frame containing an invalid data pointer, but was set with data length of zero. The few times the issue was reproduced it _seemed_ that the frame was queued properly, that is the data pointer was set to NULL. I never could reproduce the crash so as a last resort the crash has been fixed, but a check in __ast_read has been added to give as much information about the source of problematic frames in the future. (closes issue ASTERISK-14976) Reported by: atis ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=231930 By: Jeff Peeler (jpeeler) 2009-12-02 15:24:50.000-0600 This framedebug patch will hopefully do the trick. I plan on improving it some more and then committing it to all branches for future frame debugging. By: Jeff Peeler (jpeeler) 2009-12-16 18:08:05.000-0600 I know it's been a while, but atis I've uploaded a new frame debug patch. Test it when you get a chance... By: Atis Lezdins (atis) 2009-12-23 06:41:06.000-0600 Attached two files - one is all requested debug for 10 minutes around datalen=0, another is just for the specific call. Btw, you made a tiny mistake in printing frame, you did that only when " f->frametype == AST_FRAME_CONTROL && f->subclass == AST_CONTROL_HOLD" so it didn't matched UNHOLD frames. I had left my previous block of debug below, that's why output is a bit different (without "jpeeler:" line). Other instances of debug just check for AST_FRAME_CONTROL so the debug should be sufficent. The corresponding line for [src: process_sdp chan_sip.c:7805] is: if (ast_test_flag(&p->flags[1], SIP_PAGE2_CALL_ONHOLD) && sin.sin_addr.s_addr && (!sendonly || sendonly == -1)) { ast_queue_control(p->owner, AST_CONTROL_UNHOLD); By: Jeff Peeler (jpeeler) 2010-01-04 10:47:08.000-0600 atis: With your latest debug information I was able to reproduce the issue. I have attached a fix and eagerly await your testing results. You should no longer see any warnings with finding an initialized data pointer. By: Leif Madsen (lmadsen) 2010-01-05 09:37:58.000-0600 <leifmadsen> any chance you'll get to look at that tomorrow? <atis_work> well, not really <atis_work> i was frightened by this issue and rolled back our production today <leifmadsen> gotcha -- jpeeler seems to have reproduced and provided a fix <atis_work> i'd like to wait til next week, when our tester is back from vacation By: Jeff Peeler (jpeeler) 2010-01-18 17:12:33.000-0600 Atis, please test so we can finally get this resolved. By: Atis Lezdins (atis) 2010-01-18 17:23:36.000-0600 Sorry for the delay. I had last patch on production since last Thursday, and got none of the CONTROL frames showing with datalen=0 and pointer. There were other types - DTMF and Frametype 10, but as I understand they shouldn't cause any problems. By: Jeff Peeler (jpeeler) 2010-01-18 17:48:16.000-0600 As long as you didn't see any warnings originating from ast_read, then we should be good to go. By: Digium Subversion (svnbot) 2010-01-26 12:07:59.000-0600 Repository: asterisk Revision: 243244 U trunk/main/frame.c ------------------------------------------------------------------------ r243244 | jpeeler | 2010-01-26 12:07:58 -0600 (Tue, 26 Jan 2010) | 12 lines Fix crash resulting from frames with invalid data pointers. In ast_frdup the frame data union does not get set to point to malloced memory if the datalen is zero, so make sure to handle the same case in ast_frisolate appropriately. (closes issue ASTERISK-14976) Reported by: atis Patches: bug16058-fix.patch uploaded by jpeeler (license 325) Tested by: atis ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=243244 By: Digium Subversion (svnbot) 2010-01-26 12:08:48.000-0600 Repository: asterisk Revision: 243245 _U branches/1.6.0/ ------------------------------------------------------------------------ r243245 | jpeeler | 2010-01-26 12:08:48 -0600 (Tue, 26 Jan 2010) | 18 lines Blocked revisions 243244 via svnmerge ........ r243244 | jpeeler | 2010-01-26 12:07:57 -0600 (Tue, 26 Jan 2010) | 12 lines Fix crash resulting from frames with invalid data pointers. In ast_frdup the frame data union does not get set to point to malloced memory if the datalen is zero, so make sure to handle the same case in ast_frisolate appropriately. (closes issue ASTERISK-14976) Reported by: atis Patches: bug16058-fix.patch uploaded by jpeeler (license 325) Tested by: atis ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=243245 By: Digium Subversion (svnbot) 2010-01-26 12:12:28.000-0600 Repository: asterisk Revision: 243246 _U branches/1.6.1/ U branches/1.6.1/main/frame.c ------------------------------------------------------------------------ r243246 | jpeeler | 2010-01-26 12:12:28 -0600 (Tue, 26 Jan 2010) | 19 lines Merged revisions 243244 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r243244 | jpeeler | 2010-01-26 12:07:57 -0600 (Tue, 26 Jan 2010) | 12 lines Fix crash resulting from frames with invalid data pointers. In ast_frdup the frame data union does not get set to point to malloced memory if the datalen is zero, so make sure to handle the same case in ast_frisolate appropriately. (closes issue ASTERISK-14976) Reported by: atis Patches: bug16058-fix.patch uploaded by jpeeler (license 325) Tested by: atis ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=243246 By: Digium Subversion (svnbot) 2010-01-26 12:16:20.000-0600 Repository: asterisk Revision: 243247 _U branches/1.6.2/ U branches/1.6.2/main/frame.c ------------------------------------------------------------------------ r243247 | jpeeler | 2010-01-26 12:16:19 -0600 (Tue, 26 Jan 2010) | 19 lines Merged revisions 243244 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ........ r243244 | jpeeler | 2010-01-26 12:07:57 -0600 (Tue, 26 Jan 2010) | 12 lines Fix crash resulting from frames with invalid data pointers. In ast_frdup the frame data union does not get set to point to malloced memory if the datalen is zero, so make sure to handle the same case in ast_frisolate appropriately. (closes issue ASTERISK-14976) Reported by: atis Patches: bug16058-fix.patch uploaded by jpeeler (license 325) Tested by: atis ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=243247 By: Leif Madsen (lmadsen) 2010-02-28 19:34:35.000-0600 Reopened issue per Wolfgang on asterisk-dev to add additional information. By: Wolfgang Pichler (wuwu) 2010-03-17 03:51:46 I was having version 1.6.2.2 running - this version did already have the patch included to avoid this bug. And it was working fine - i never had the same problem again. But in 1.6.2.2 transfer is not working anymore - so i needed to upgrade to 1.6.2.3-rc2. But with 1.6.2.3-rc2 the above bug is back - so there must be another source for this problem. I have now 1.6.2.3-rc2 running - with the frame_datalen.patch applied to get the debugging output. And i have compiled it without optimization in dev mode. Problem now is - it is not crashing anymore ;-) I now get lots of these messages.... [Mar 17 09:49:16] WARNING[12065]: channel.c:3065 __ast_read: Found frame with datalen=0 but initialized data pointer: << [ TYPE: Control (4) SUBCLASS: Unknown control '15' (15) ] [IAX2/tel01pica02-20225] By: Leif Madsen (lmadsen) 2010-03-17 09:01:16 You may want to try 1.6.2.6 as it has additional fixes beyond 1.6.2.3-rc2. By: Leif Madsen (lmadsen) 2010-03-17 09:47:59 If there are any additional comments or questions or issues with this bug, then please open a new issue and request it be marked as related to this one. Thanks! |