
Summary:ASTERISK-13994: Abort in free() in local_hangup, possibly related to failure to provide ringback indication
Reporter: David Woolley (davidw)
Labels:
Date Opened: 2009-04-21 11:44:34
Date Closed: 2009-04-27 08:23:08
Priority: Critical
Regression? No
Status: Closed/Complete
Components: General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
(0) bt.txt
(1) crashlogs.txt
(2) valgrind.txt
Description:

We tried running under valgrind, but that didn't produce any obvious indications of problems, and also made the ringback indication problem go away, suggesting that there may be a common race condition that is causing both.

I ran SIP history on the outgoing call and got the expected INVITE/Trying/Ringing sequence.

gdb was only able to see the one thread.  I don't know if that is because it was an abort(), or because memory corruption has prevented it from finding the information about the threads.

****** ADDITIONAL INFORMATION ******

I've categorised this as General, rather than chan_local, as I don't think one can be sure of the true location, and it is more likely that chan_local was simply the first thing to try to de-allocate memory after the problem developed.

We have one local patch, to work around issue ASTERISK-12766 (which isn't properly fixed - could you please re-open).  This simply prevents app_dial from bridging for early media, and has been working for some time, and shouldn't be invoked in this case, anyway, as we only got early media when the call ended on the PSTN.

At the moment, I'm assuming a common cause for the crash and the lack of ringback, although, when not running under valgrind, the lack of ringback occurs 100% of the time.

Although we used valgrind, we haven't tried DEBUG_MALLOC, as this was run on a clean install machine, and I wanted to minimise the chances of subsequently testing a version that wasn't the version released.

There is some possibility that a co-factor for the crash is the use of sip history or sip show channels.

We had no problems in this area with 1.4.21.2.

Another factor that might affect timing is that this was running under VMWare.
Comments:

By: David Woolley (davidw) 2009-04-22 05:26:33

Part of the description got lost because I let the original cookie time out and had to copy to a new submission form.  It should have started:

We have had a couple of crashes whilst trying to get debugging information for a problem where SIP calls to Cisco CCM fail to provide ring back indication, but calls to a SIP phone do.  In both cases the crash is the result of free() issuing an abort() when called to free a local channel private structure.
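The ticket does not establish the root cause, but an abort() inside free() on a structure shared by two channel halves is a symptom commonly produced when both sides race to release it. As a minimal sketch only, assuming a hypothetical `demo_pvt` structure and `demo_hangup()` function (neither is the real chan_local code), the following shows one way to make sure that only the last of two concurrent hangups frees the shared structure, using a C11 atomic reference count:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a private structure shared by both halves
 * of a channel pair. This is NOT the real chan_local definition. */
struct demo_pvt {
    atomic_int refcount;   /* one reference per half that is still alive */
};

/* Counts how many times the structure was actually released. */
static atomic_int demo_free_count;

/* Called once by each half when it hangs up. Only the caller that
 * drops the last reference frees the structure, so a second hangup
 * can never free it again. */
static void *demo_hangup(void *arg)
{
    struct demo_pvt *p = arg;
    int prev = atomic_fetch_sub(&p->refcount, 1);

    assert(prev >= 1);     /* more hangups than references is a bug */
    if (prev == 1) {
        free(p);
        atomic_fetch_add(&demo_free_count, 1);
    }
    return NULL;
}

/* Runs both hangups concurrently. Returns the number of times the
 * structure was freed, or -1 if setup failed. */
static int demo_run(void)
{
    pthread_t a, b;
    struct demo_pvt *p = malloc(sizeof(*p));

    if (p == NULL)
        return -1;
    atomic_init(&p->refcount, 2);
    atomic_store(&demo_free_count, 0);

    if (pthread_create(&a, NULL, demo_hangup, p) != 0) {
        free(p);
        return -1;
    }
    if (pthread_create(&b, NULL, demo_hangup, p) != 0) {
        pthread_join(a, NULL);
        demo_hangup(p);    /* drop the second reference ourselves */
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&demo_free_count);
}
```

Calling `demo_run()` from a test program should return 1 whichever thread finishes first. Without the reference count, both threads could call free() on the same pointer, which glibc typically reports by aborting.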

By: Joshua C. Colp (jcolp) 2009-04-23 12:40:20

Would it be possible to get the console output prior to the crash? It would help me trying to recreate the exact issue here.

By: David Woolley (davidw) 2009-04-24 06:02:32

Whilst I'm preparing the console extract, it might be worth noting that the first crash ended with this message:

chan_local.c line 534 (local_hangup): Error obtaining mutex: Invalid argument

The second crash appears to have been during a stop, and I do seem to remember that I didn't notice it at the time.

By: David Woolley (davidw) 2009-04-24 06:27:11

I've added the preludes to the two crashes, with some names changed.

Unfortunately, the full scenario is quite complex and relates to an unreleased product, so, if you need the bigger picture, we may need to use less public channels.  (I think I have found your email address.)

By: Joshua C. Colp (jcolp) 2009-04-24 08:22:28

Unfortunately, even labbing up those scenarios doesn't seem to cause my chan_local to crash, even under high load. While I investigate further, it would be useful to know whether this happens in 1.6.0.9 as well; there have been some changes in chan_local specific to the hangup handling that may change things.

By: Joshua C. Colp (jcolp) 2009-04-27 08:23:08

After looking at this further, I believe the original change that I put in also solved this issue. I let tests run for a day; previously I could see the issue happening in valgrind, but with the fix in place I no longer can.