Summary: ASTERISK-09383: Race condition leading to crash in chan_iax2
Reporter: mihai (mihai)
Labels:
Date Opened: 2007-05-04 11:05:09
Date Closed: 2007-07-11 19:59:10
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) 06asterisk-1.4-chan_iax-crash.patch
Description:

We have experienced a series of random crashes on our production systems, especially when operating under load or under poor network conditions. By investigating the core dumps, we found that all crashes were caused by a segfault in chan_iax2.c, in function __attempt_transmit(), line 1845 (official asterisk-1.4.4 tarball). The relevant code looks like this:

/* Hangup the fd */
fr.frametype = AST_FRAME_CONTROL;
fr.subclass = AST_CONTROL_HANGUP;
iax2_queue_frame(callno, &fr);
/* Remember, owner could disappear */
if (iaxs[callno]->owner)
    iaxs[callno]->owner->hangupcause = AST_CAUSE_DESTINATION_OUT_OF_ORDER;

This code is supposed to be executed with the call mutex (iaxsl[callno]) locked.  However, you will notice that two lines before the if, there's a call to iax2_queue_frame().  This function will release the lock for a short period of time in an attempt to prevent a deadlock.  If another thread grabs the lock during that window, it can call iax2_destroy(), thus NULLing the entry in the iaxs array.  The if then dereferences iaxs[callno], which is now NULL, and the process segfaults.
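
To make the window concrete, here is a minimal single-threaded sketch of the pattern. It is not the attached patch, and not necessarily what ended up in the tree. Every name in it (sessions, locked, queue_frame, destroy_session, racing_destroy, hangup_guarded, unlock_window_hook) is a hypothetical stand-in for iaxs[], iaxsl[], iax2_queue_frame() and iax2_destroy(); the "lock" is modelled as a plain flag, and the other thread is modelled as a hook that runs while the flag is down. The only point it illustrates is that the array slot itself has to be re-checked after any call that may drop the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLS 4

/* Hypothetical, simplified stand-ins for the chan_iax2 structures. */
struct owner { int hangupcause; };
struct session { struct owner *owner; };

static struct session *sessions[MAX_CALLS]; /* models iaxs[] */
static bool locked[MAX_CALLS];              /* models iaxsl[]: true while "held" */

/* Hook that runs while the lock is dropped; stands in for another thread. */
static void (*unlock_window_hook)(int callno);

/* Models iax2_destroy(): must be called with the lock held. */
static void destroy_session(int callno)
{
    assert(locked[callno]);
    sessions[callno] = NULL;
}

/* Hook body: the "other thread" takes the lock, destroys the call, releases it. */
static void racing_destroy(int callno)
{
    locked[callno] = true;
    destroy_session(callno);
    locked[callno] = false;
}

/* Models iax2_queue_frame(): briefly releases the caller's lock. */
static void queue_frame(int callno)
{
    assert(locked[callno]);
    locked[callno] = false;          /* lock dropped to avoid a deadlock */
    if (unlock_window_hook)
        unlock_window_hook(callno);  /* anything can happen in this window */
    locked[callno] = true;           /* lock re-acquired */
}

/* Guarded hangup path: returns true only if the cause was actually set. */
static bool hangup_guarded(int callno, int cause)
{
    bool set = false;

    locked[callno] = true;
    queue_frame(callno);
    /* Re-validate the slot itself, not only ->owner, after the lock was dropped. */
    if (sessions[callno] && sessions[callno]->owner) {
        sessions[callno]->owner->hangupcause = cause;
        set = true;
    }
    locked[callno] = false;
    return set;
}
```

With unlock_window_hook left NULL, hangup_guarded() sets the cause as before. With it pointing at racing_destroy, the slot is NULL by the time queue_frame() returns, and the guarded version skips the write instead of dereferencing it. Like our patch, this only treats the symptom: the lock is still released in the middle of a section that assumes it is held.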

There are several other areas in the code where iax2_queue_frame() is called which are also potential crash spots - however, for some reason, all our crashes happened in only one place, as described above.

We have a patch that attempts to fix this hole as well as several others.  I am not sure that it is the correct way of fixing the problem since it addresses the effects and not the cause.  Will post it after we test it a little bit.

****** ADDITIONAL INFORMATION ******

This problem is still present in trunk, as of revision 63045.
Comments:

By: mihai (mihai) 2007-05-14 15:17:37

Added a patch that fixes our crashes. The patch was diffed against 1.4.2 but still applies to 1.4.4.  It seems to apply to trunk as well, but I have not done any functional tests whatsoever with that.

I am pretty sure that there are other crash spots in chan_iax2.c, but it seems that for now this patch solves our problem.

Could somebody please update the title to [patch]...?

By: Steve Davies . (stevedavies) 2007-05-19 01:47:34

hi mihai,

I applied it to trunk and am running it on a production box.  I'll report back.

Steve

By: Russell Bryant (russell) 2007-06-04 18:34:05

I have addressed the code you fixed in your patch, plus some more places, in both 1.4 and trunk in revisions 67158 and 67160.  Thanks!