[Home]

Summary:ASTERISK-12717: Seg fault 1.6.0 trunk
Reporter:Eric Dantie (edantie)Labels:
Date Opened:2008-09-12 04:38:32Date Closed:2008-09-23 11:32:16
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:CDR/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) btfull.txt
( 1) btfull2.txt
( 2) cdrcrash.fix
( 3) cdrcrash-trunk.fix
( 4) gdb-27th-sep.txt
Description:Program terminated with signal 11, Segmentation fault.
#0  0x0000000000446ed3 in ast_cdr_start (cdr=0x4a0) at cdr.c:680
680 if (!ast_test_flag(cdr, AST_CDR_FLAG_LOCKED)) {
Comments:By: Steve Murphy (murf) 2008-09-12 08:40:59

What's the circumstances of the crash? Is this happening during a transfer situation? What kind?

By: Eric Dantie (edantie) 2008-09-13 12:39:54

I've been trying it on our main office. So I wasn't doing the tests myself. But for the extensions in action at that moment here is the situation (I suppose):

One call enter from dahdi/18 (of a E1 TE110P)
It called extension SIP/1015
SIP/1015 is on holidays (lucky one) and as programmed the polycom phone to be redirected to SIP/1020 (IP10s)

Don't know exactly what's happen next:

Normally, SIP/1020 is answering the call and do an attended call to another SIP phone (IP10s)

Hope help

By: Eric Dantie (edantie) 2008-09-13 12:49:26

Something like that at first part:
   -- Executing [928498991@incoming928498990:1] Gosub("DAHDI/13-1", "tamardialout,s,1(SIP/1015)") in new stack
   -- Executing [s@tamardialout:1] Set("DAHDI/13-1", "LOCAL(dev)=SIP/1015") in new stack
   -- Executing [s@tamardialout:2] Set("DAHDI/13-1", "~~EXTEN~~=s") in new stack
   -- Executing [s@tamardialout:3] Goto("DAHDI/13-1", "sw-19-928XXXXXX,10") in new stack
   -- Goto (tamardialout,sw-19-928XXXXXX,10)
   -- Executing [sw-19-928XXXXXX@tamardialout:10] Goto("DAHDI/13-1", "s,4") in new stack
   -- Goto (tamardialout,s,4)
   -- Executing [s@tamardialout:4] NoOp("DAHDI/13-1", "Finish switch-tamardialout-19") in new stack
   -- Executing [s@tamardialout:5] Dial("DAHDI/13-1", "SIP/1015,60") in new stack
 == Using SIP RTP TOS bits 184
 == Using SIP RTP CoS mark 5
   -- Called 1015
   -- Accepting call from '928XXXXXX' to '928498991' on channel 0/13, span 1
[Sep 13 18:53:24] WARNING[24011]: chan_dahdi.c:1654 dahdi_enable_ec: Unable to enable echo cancellation on channel 13 (No such device)
   -- Got SIP response 302 "Moved Temporarily" back from 172.16.222.115
   -- Now forwarding DAHDI/13-1 to 'Local/1020@sip1015' (thanks to SIP/1015-700c4000)
   -- Executing [1020@sip1015:1] Dial("Local/1020@sip1015-6847;2", "SIP/1020") in new stack
 == Using SIP RTP TOS bits 184
 == Using SIP RTP CoS mark 5
   -- Called 1020
   -- SIP/1020-00899fe0 is ringing
   -- Local/1020@sip1015-6847;1 is ringing


And then it's mystery.
I've added more logging to know exactly on what conditions it has happened.

By: Eric Dantie (edantie) 2008-09-13 13:02:22

line of cdr for that call:

"","639XXXXXX","928498990","incoming928498990","639XXXXXX","DAHDI/18-1","SIP/1000-008d5540","Dial","SIP/1000,18","2008-09-12 12:54:00","2008-09-12 12:54:02","2008-09-1212:54:32",32,30,"ANSWERED","DOCUMENTATION","1221224040.372",""

By: Eric Dantie (edantie) 2008-09-15 08:19:52

I'm adding the full message corresponding to the btfull2.txt file


[Sep 15 13:37:15] VERBOSE[2230] logger.c:     -- Accepting call from '928578771' to '928498996' on channel 0/4, span 1
[Sep 15 13:37:15] WARNING[2230] chan_dahdi.c: Unable to enable echo cancellation on channel 4 (No such device)
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Executing [928498996@incoming928498990:1] ^[[1;36;40mGosub^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40mtamardialout,s,1(SIP/1008)^[[0;37;40m") in new stack
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Executing [s@tamardialout:1] ^[[1;36;40mSet^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40mLOCAL(dev)=SIP/1008^[[0;37;40m") in new stack
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Executing [s@tamardialout:2] ^[[1;36;40mSet^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40m~~EXTEN~~=s^[[0;37;40m") in new stack
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Executing [s@tamardialout:3] ^[[1;36;40mGoto^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40msw-19-928578771,10^[[0;37;40m") in new stack
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Goto (tamardialout,sw-19-928578771,10)
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Executing [sw-19-928578771@tamardialout:10] ^[[1;36;40mGoto^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40ms,4^[[0;37;40m") in new stack
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Goto (tamardialout,s,4)
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Executing [s@tamardialout:4] ^[[1;36;40mNoOp^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40mFinish switch-tamardialout-19^[[0;37;40m") in new stack
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Executing [s@tamardialout:5] ^[[1;36;40mDial^[[0;37;40m("^[[1;35;40mDAHDI/4-1^[[0;37;40m", "^[[1;35;40mSIP/1008,60^[[0;37;40m") in new stack
[Sep 15 13:37:15] VERBOSE[7419] logger.c:   == Using SIP RTP TOS bits 184
[Sep 15 13:37:15] VERBOSE[7419] logger.c:   == Using SIP RTP CoS mark 5
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Called 1008
[Sep 15 13:37:15] VERBOSE[2226] logger.c:     -- Got SIP response 302 "Moved Temporarily" back from 172.16.222.108
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Now forwarding DAHDI/4-1 to 'Local/1023@bintersips91' (thanks to SIP/1008-58001590)
[Sep 15 13:37:15] VERBOSE[7420] logger.c:     -- Executing [1023@bintersips91:1] ^[[1;36;40mDial^[[0;37;40m("^[[1;35;40mLocal/1023@bintersips91-b5e4;2^[[0;37;40m", "^[[1;35;40mSIP/1023^[[0;37;40m") in new stack
[Sep 15 13:37:15] VERBOSE[7420] logger.c:   == Using SIP RTP TOS bits 184
[Sep 15 13:37:15] VERBOSE[7420] logger.c:   == Using SIP RTP CoS mark 5
[Sep 15 13:37:15] VERBOSE[7420] logger.c:     -- Called 1023
[Sep 15 13:37:15] VERBOSE[7420] logger.c:     -- SIP/1023-00896620 is ringing
[Sep 15 13:37:15] VERBOSE[7419] logger.c:     -- Local/1023@bintersips91-b5e4;1 is ringing
[Sep 15 13:37:26] VERBOSE[7420] logger.c:     -- SIP/1023-00896620 answered Local/1023@bintersips91-b5e4;2
[Sep 15 13:37:26] VERBOSE[7419] logger.c:     -- Local/1023@bintersips91-b5e4;1 answered DAHDI/4-1
[Sep 15 13:37:27] VERBOSE[7419] logger.c:     -- fixed jitterbuffer created on channel DAHDI/4-1
[Sep 15 13:37:27] VERBOSE[7420] logger.c:     -- Executing [h@bintersips91:1] ^[[1;36;40mNoOp^[[0;37;40m("^[[1;35;40mLocal/1023@bintersips91-b5e4;2^[[0;37;40m", "^[[1;35;40m16^[[0;37;40m") in new stack
[Sep 15 13:37:27] VERBOSE[7420] logger.c:   == Spawn extension (bintersips91, 1023, 1) exited non-zero on 'Local/1023@bintersips91-b5e4;2'


RESET

By: Steve Murphy (murf) 2008-09-16 15:43:10

OK, I've gone thru my scenario list and fairly quickly found one that caused a crash; I traced it from one end to the other and it was pretty interesting.

I looked at from all sorts of angles, and came up with the patch I attached. Please, test quickly, I want to keep down the number of bugs reported against this.

I've personally tested it against polycom430 set to fwd to another exten, blind xfer (via dahdi hookflash), blind xfer via #-key; attended dahdi xfer via 3-way;
with the caller doing the xfer, and the callee doing the xfer; sip blind xfer via the phone button, etc, and all seem to survive.

By: Steve Murphy (murf) 2008-09-16 15:58:04

BTW, the attached patch is for 1.4; I'll come up with the 1.6.x/trunk version here in a little bit

By: Steve Murphy (murf) 2008-09-16 17:44:43

I just uploaded a trunk version

By: Eric Dantie (edantie) 2008-09-17 01:27:20

I'm testing it

By: Eric Dantie (edantie) 2008-09-17 14:46:45

One day with no crash!!! For me it's working. Will let it for more time.

By: Doug (doug) 2008-09-18 01:17:23

gdb-27th-sep.txt is a debug from a crash with using the patch. This is using the ver 1.4



By: Eric Dantie (edantie) 2008-09-18 15:58:46

For me another whole day without crash

By: Steve Murphy (murf) 2008-09-22 10:27:17

DougUDI--

I don't know what to do about your crash; it *appears* to have nothing
to do with this problem, at least none I can make out.

I need more evidence that the patch is somehow implicated. Do you get the crash
w/o the patch?  (or does this patch allow asterisk to live long enough to see this crash instead?)

It might be better to file this crash as a separate bug.

edantie-- This is encouraging. I shall most likely apply the patch to
quiet the storm, and then we can look at "friendly fire", "collateral damage",
etc. reports that may pop up.



By: Eric Dantie (edantie) 2008-09-22 13:44:43

Still no crash for me till 2008-09-17. 24H/24H up. For me it's ok.

By: Steve Murphy (murf) 2008-09-23 11:15:53

OK, I'm going to commit the patch to 1.4, trunk, 1.6.x here in the next few minutes. Hopefully, this solves this problem, and we can handle remaining issues under other bug reports. If we continue to see crashes involving ast_bridge_call and CDR's, please reopen!

By: Digium Subversion (svnbot) 2008-09-23 11:32:12

Repository: asterisk
Revision: 144066

U   branches/1.4/res/res_features.c

------------------------------------------------------------------------
r144066 | murf | 2008-09-23 11:32:04 -0500 (Tue, 23 Sep 2008) | 29 lines

(closes issue ASTERISK-12735)
Reported by: DougUDI
Tested by: murf

(closes issue ASTERISK-12736)
Reported by: seanbright
Tested by: murf

(closes issue ASTERISK-12717)
Reported by: edantie
Tested by: murf, edantie, DougUDI


This crash happens because we are unsafely handling old pointers.
The channel whose cdr is being handled, has been hung up and
destroyed already. I reorganized the code a bit, and tried not
to lose the fork-cdr-chain concepts of the previous code.
I now verify that the 'previous' channel (the channel we
had when the bridge was started), still exists, by looking it up
by name in the channel list. I also do not try to reset the
CDR's of channels involved in bridges.

Testing shows it solves the crash problem, and should not
negatively impact previous fixes involving CDR's generated
during/after blind transfers. (The reason we need to reset
the CDR's on the "beginning" channels in the first place).



------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=144066