Summary: | ASTERISK-12717: Seg fault 1.6.0 trunk | ||
Reporter: | Eric Dantie (edantie) | Labels: | |
Date Opened: | 2008-09-12 04:38:32 | Date Closed: | 2008-09-23 11:32:16 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | CDR/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) btfull.txt ( 1) btfull2.txt ( 2) cdrcrash.fix ( 3) cdrcrash-trunk.fix ( 4) gdb-27th-sep.txt | |
Description: | Program terminated with signal 11, Segmentation fault. #0 0x0000000000446ed3 in ast_cdr_start (cdr=0x4a0) at cdr.c:680 680 if (!ast_test_flag(cdr, AST_CDR_FLAG_LOCKED)) { | ||
Comments: | By: Steve Murphy (murf) 2008-09-12 08:40:59 What's the circumstances of the crash? Is this happening during a transfer situation? What kind? By: Eric Dantie (edantie) 2008-09-13 12:39:54 I've been trying it on our main office. So I wasn't doing the tests myself. But for the extensions in action at that moment here is the situation (I suppose): One call enter from dahdi/18 (of a E1 TE110P) It called extension SIP/1015 SIP/1015 is on holidays (lucky one) and as programmed the polycom phone to be redirected to SIP/1020 (IP10s) Don't know exactly what's happen next: Normally, SIP/1020 is answering the call and do an attended call to another SIP phone (IP10s) Hope help By: Eric Dantie (edantie) 2008-09-13 12:49:26 Something like that at first part: -- Executing [928498991@incoming928498990:1] Gosub("DAHDI/13-1", "tamardialout,s,1(SIP/1015)") in new stack -- Executing [s@tamardialout:1] Set("DAHDI/13-1", "LOCAL(dev)=SIP/1015") in new stack -- Executing [s@tamardialout:2] Set("DAHDI/13-1", "~~EXTEN~~=s") in new stack -- Executing [s@tamardialout:3] Goto("DAHDI/13-1", "sw-19-928XXXXXX,10") in new stack -- Goto (tamardialout,sw-19-928XXXXXX,10) -- Executing [sw-19-928XXXXXX@tamardialout:10] Goto("DAHDI/13-1", "s,4") in new stack -- Goto (tamardialout,s,4) -- Executing [s@tamardialout:4] NoOp("DAHDI/13-1", "Finish switch-tamardialout-19") in new stack -- Executing [s@tamardialout:5] Dial("DAHDI/13-1", "SIP/1015,60") in new stack == Using SIP RTP TOS bits 184 == Using SIP RTP CoS mark 5 -- Called 1015 -- Accepting call from '928XXXXXX' to '928498991' on channel 0/13, span 1 [Sep 13 18:53:24] WARNING[24011]: chan_dahdi.c:1654 dahdi_enable_ec: Unable to enable echo cancellation on channel 13 (No such device) -- Got SIP response 302 "Moved Temporarily" back from 172.16.222.115 -- Now forwarding DAHDI/13-1 to 'Local/1020@sip1015' (thanks to SIP/1015-700c4000) -- Executing [1020@sip1015:1] Dial("Local/1020@sip1015-6847;2", "SIP/1020") in new stack == Using SIP RTP TOS bits 184 == Using SIP RTP CoS mark 5 -- Called 1020 -- SIP/1020-00899fe0 is ringing -- Local/1020@sip1015-6847;1 is ringing And then it's mystery. I've added more logging to know exactly on what conditions it has happened. By: Eric Dantie (edantie) 2008-09-13 13:02:22 line of cdr for that call: "","639XXXXXX","928498990","incoming928498990","639XXXXXX","DAHDI/18-1","SIP/1000-008d5540","Dial","SIP/1000,18","2008-09-12 12:54:00","2008-09-12 12:54:02","2008-09-1212:54:32",32,30,"ANSWERED","DOCUMENTATION","1221224040.372","" By: Eric Dantie (edantie) 2008-09-15 08:19:52 I'm adding the full message corresponding to the btfull2.txt file [Sep 15 13:37:15] VERBOSE[2230] logger.c: -- Accepting call from '928578771' to '928498996' on channel 0/4, span 1 [Sep 15 13:37:15] WARNING[2230] chan_dahdi.c: Unable to enable echo cancellation on channel 4 (No such device) [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Executing [928498996@incoming928498990:1] ^[[1;36;40mGosub^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40mtamardialout,s,1(SIP/1008)^[[0;37;40m") in new stack [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Executing [s@tamardialout:1] ^[[1;36;40mSet^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40mLOCAL(dev)=SIP/1008^[[0;37;40m") in new stack [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Executing [s@tamardialout:2] ^[[1;36;40mSet^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40m~~EXTEN~~=s^[[0;37;40m") in new stack [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Executing [s@tamardialout:3] ^[[1;36;40mGoto^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40msw-19-928578771,10^[[0;37;40m") in new stack [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Goto (tamardialout,sw-19-928578771,10) [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Executing [sw-19-928578771@tamardialout:10] ^[[1;36;40mGoto^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40ms,4^[[0;37;40m") in new stack [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Goto (tamardialout,s,4) [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Executing [s@tamardialout:4] ^[[1;36;40mNoOp^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40mFinish switch-tamardialout-19^[[0;37;40m") in new stack [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Executing [s@tamardialout:5] ^[[1;36;40mDial^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40mSIP/1008,60^[[0;37;40m") in new stack [Sep 15 13:37:15] VERBOSE[7419] logger.c: == Using SIP RTP TOS bits 184 [Sep 15 13:37:15] VERBOSE[7419] logger.c: == Using SIP RTP CoS mark 5 [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Called 1008 [Sep 15 13:37:15] VERBOSE[2226] logger.c: -- Got SIP response 302 "Moved Temporarily" back from 172.16.222.108 [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Now forwarding DAHDI/4-1 to 'Local/1023@bintersips91' (thanks to SIP/1008-58001590) [Sep 15 13:37:15] VERBOSE[7420] logger.c: -- Executing [1023@bintersips91:1] ^[[1;36;40mDial^[[0;37;40m("^[[1;35;40mLocal/1023@bintersips91-b5e4;2^[[0;37;40m", "^[[1;35;40mSIP/1023^[[0;37;40m") in new stack [Sep 15 13:37:15] VERBOSE[7420] logger.c: == Using SIP RTP TOS bits 184 [Sep 15 13:37:15] VERBOSE[7420] logger.c: == Using SIP RTP CoS mark 5 [Sep 15 13:37:15] VERBOSE[7420] logger.c: -- Called 1023 [Sep 15 13:37:15] VERBOSE[7420] logger.c: -- SIP/1023-00896620 is ringing [Sep 15 13:37:15] VERBOSE[7419] logger.c: -- Local/1023@bintersips91-b5e4;1 is ringing [Sep 15 13:37:26] VERBOSE[7420] logger.c: -- SIP/1023-00896620 answered Local/1023@bintersips91-b5e4;2 [Sep 15 13:37:26] VERBOSE[7419] logger.c: -- Local/1023@bintersips91-b5e4;1 answered DAHDI/4-1 [Sep 15 13:37:27] VERBOSE[7419] logger.c: -- fixed jitterbuffer created on channel DAHDI/4-1 [Sep 15 13:37:27] VERBOSE[7420] logger.c: -- Executing [h@bintersips91:1] ^[[1;36;40mNoOp^[[0;37;40m("^[[1;35;40mLocal/1023@bintersips91-b5e4;2^[[0;37;40m", "^[[1;35;40m16^[[0;37;40m") in new stack [Sep 15 13:37:27] VERBOSE[7420] logger.c: == Spawn extension (bintersips91, 1023, 1) exited non-zero on 'Local/1023@bintersips91-b5e4;2' RESET By: Steve Murphy (murf) 2008-09-16 15:43:10 OK, I've gone thru my scenario list and fairly quickly found one that caused a crash; I traced it from one end to the other and it was pretty interesting. I looked at from all sorts of angles, and came up with the patch I attached. Please, test quickly, I want to keep down the number of bugs reported against this. I've personally tested it against polycom430 set to fwd to another exten, blind xfer (via dahdi hookflash), blind xfer via #-key; attended dahdi xfer via 3-way; with the caller doing the xfer, and the callee doing the xfer; sip blind xfer via the phone button, etc, and all seem to survive. By: Steve Murphy (murf) 2008-09-16 15:58:04 BTW, the attached patch is for 1.4; I'll come up with the 1.6.x/trunk version here in a little bit By: Steve Murphy (murf) 2008-09-16 17:44:43 I just uploaded a trunk version By: Eric Dantie (edantie) 2008-09-17 01:27:20 I'm testing it By: Eric Dantie (edantie) 2008-09-17 14:46:45 One day with no crash!!! For me it's working. Will let it for more time. By: Doug (doug) 2008-09-18 01:17:23 gdb-27th-sep.txt is a debug from a crash with using the patch. This is using the ver 1.4 By: Eric Dantie (edantie) 2008-09-18 15:58:46 For me another whole day without crash By: Steve Murphy (murf) 2008-09-22 10:27:17 DougUDI-- I don't know what to do about your crash; it *appears* to have nothing to do with this problem, at least none I can make out. I need more evidence that the patch is somehow implicated. Do you get the crash w/o the patch? (or does this patch allow asterisk to live long enough to see this crash instead?) It might be better to file this crash as a separate bug. edantie-- This is encouraging. I shall most likely apply the patch to quiet the storm, and then we can look at "friendly fire", "collateral damage", etc. reports that may pop up. By: Eric Dantie (edantie) 2008-09-22 13:44:43 Still no crash for me till 2008-09-17. 24H/24H up. For me it's ok. By: Steve Murphy (murf) 2008-09-23 11:15:53 OK, I'm going to commit the patch to 1.4, trunk, 1.6.x here in the next few minutes. Hopefully, this solves this problem, and we can handle remaining issues under other bug reports. If we continue to see crashes involving ast_bridge_call and CDR's, please reopen! By: Digium Subversion (svnbot) 2008-09-23 11:32:12 Repository: asterisk Revision: 144066 U branches/1.4/res/res_features.c ------------------------------------------------------------------------ r144066 | murf | 2008-09-23 11:32:04 -0500 (Tue, 23 Sep 2008) | 29 lines (closes issue ASTERISK-12735) Reported by: DougUDI Tested by: murf (closes issue ASTERISK-12736) Reported by: seanbright Tested by: murf (closes issue ASTERISK-12717) Reported by: edantie Tested by: murf, edantie, DougUDI This crash happens because we are unsafely handling old pointers. The channel whose cdr is being handled, has been hung up and destroyed already. I reorganized the code a bit, and tried not to lose the fork-cdr-chain concepts of the previous code. I now verify that the 'previous' channel (the channel we had when the bridge was started), still exists, by looking it up by name in the channel list. I also do not try to reset the CDR's of channels involved in bridges. Testing shows it solves the crash problem, and should not negatively impact previous fixes involving CDR's generated during/after blind transfers. (The reason we need to reset the CDR's on the "beginning" channels in the first place). ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=144066 |