Summary: ASTERISK-15462: Crash In chan_local in local_queue_frame (ast_mutex_trylock)
Reporter: Geoff Mina (geoff2010)
Labels:
Date Opened: 2010-01-18 16:07:35.000-0600
Date Closed: 2011-06-07 14:00:41
Versions:
Frequency of
Environment:
Attachments: ( 0) bt.txt
( 1) bt-full.txt
Description:I had a server randomly crash on me today.  I am currently running 1.4.26, but have scanned all the release notes up until 1.4.29 and found nothing that would indicate a change to chan_local was made to correct this issue.  Unfortunately this is a very high profile platform and I can't just upgrade without knowing for certain the bug has been corrected.

I also searched the bug list and the most similar ticket I found was for the 1.6 branch.  I have attached the bt and bt full.  Please let me know if there is anything else I can provide.


Comments:By: Geoff Mina (geoff2010) 2010-01-18 20:37:10.000-0600

The following code is the source of the crash in chan_local.  It appears that 'other' is most likely an invalid pointer at this point... but I am not sure what else could be done to prevent this particular crash.  

       /* Recalculate outbound channel */
       other = isoutbound ? p->owner : p->chan;

       if (!other) {
               return 0;
       }

       /* do not queue frame if generator is on both local channels */
       if (us && us->generator && other->generator) {
               return 0;
       }

       /* Set glare detection */
       ast_set_flag(p, LOCAL_GLARE_DETECT);

       /* Ensure that we have both channels locked */
       while (other && ast_channel_trylock(other)) {
               if (us && us_locked) {
                       do {
                       } while (ast_mutex_trylock(&p->lock));
               } else {
               other = isoutbound ? p->owner : p->chan;
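
The excerpt above is cut off, so the following is only a minimal sketch of the general pattern it appears to use: try to lock the peer, back off by releasing the private lock on failure, then re-read the peer pointer. It is written against plain POSIX threads, not Asterisk's locking API. The names `struct chan`, `struct pvt`, `lock_peer`, `hangup_peer` and the `demo_*` symbols are hypothetical stand-ins, not the real chan_local definitions.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for ast_channel and local_pvt; not the real definitions. */
struct chan {
    pthread_mutex_t lock;
};

struct pvt {
    pthread_mutex_t lock;
    struct chan *owner;
    struct chan *chan;
};

/*
 * Call with p->lock held. Returns the peer channel with its lock held, or
 * NULL if the peer is gone. p->lock is held again on return in both cases.
 */
static struct chan *lock_peer(struct pvt *p, int isoutbound)
{
    struct chan *other = isoutbound ? p->owner : p->chan;

    while (other && pthread_mutex_trylock(&other->lock) != 0) {
        /* Back off so the thread that holds the peer lock can take p->lock. */
        pthread_mutex_unlock(&p->lock);
        sched_yield();
        pthread_mutex_lock(&p->lock);
        /* The peer may have been detached while p->lock was released. */
        other = isoutbound ? p->owner : p->chan;
    }
    return other;
}

/* Demo state: a helper thread detaches the peer using the opposite lock order. */
static struct chan demo_peer = { PTHREAD_MUTEX_INITIALIZER };
static struct pvt demo_pvt = { PTHREAD_MUTEX_INITIALIZER, NULL, &demo_peer };
static atomic_int peer_is_locked;

static void *hangup_peer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_peer.lock);  /* channel lock first ... */
    atomic_store(&peer_is_locked, 1);
    pthread_mutex_lock(&demo_pvt.lock);   /* ... then the private lock */
    demo_pvt.chan = NULL;                 /* detach the peer */
    pthread_mutex_unlock(&demo_pvt.lock);
    pthread_mutex_unlock(&demo_peer.lock);
    return NULL;
}

/* Returns 1 if lock_peer() observed the detached peer as NULL. */
static int demo_contended(void)
{
    pthread_t tid;
    struct chan *other;

    pthread_mutex_lock(&demo_pvt.lock);
    if (pthread_create(&tid, NULL, hangup_peer, NULL) != 0) {
        pthread_mutex_unlock(&demo_pvt.lock);
        return 0;
    }
    while (!atomic_load(&peer_is_locked)) {
        sched_yield();                    /* wait until the helper holds the peer lock */
    }
    other = lock_peer(&demo_pvt, 0);
    if (other) {
        pthread_mutex_unlock(&other->lock);
    }
    pthread_mutex_unlock(&demo_pvt.lock);
    pthread_join(tid, NULL);
    return other == NULL;
}
```

Note that re-reading the pointer only helps if whoever tears down the peer also clears the pointer while holding the private lock. If a channel were freed without that, the next trylock would touch freed memory, which would fit the "invalid pointer" guess above; nothing in this ticket confirms that this is what happened.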

By: Geoff Mina (geoff2010) 2010-01-19 07:30:31.000-0600

Issue 12012 is about a year old, but appears to be a similar issue... or at least the segfault happened in the code that Russell added in the patch to fix the last problem.

By: Leif Madsen (lmadsen) 2010-01-19 07:48:08.000-0600

Thanks for the triage in this bug report! That should be useful to a developer for sure.

By: Russell Bryant (russell) 2010-03-02 09:39:54.000-0600

There have been quite a few changes since 1.4.26 at this point (339 changes).  Can you try with the latest code in the 1.4 branch to see if you are still having a problem?

By: Geoff Mina (geoff2010) 2010-03-02 18:51:26.000-0600

I am planning on upgrading soon.  Unfortunately, we won't know for a very long time.  I run about 12 million calls through my network in a month... and this problem has only occurred a single time in over a year.

It's obviously not a common scenario which caused this.


By: Leif Madsen (lmadsen) 2010-03-17 11:02:39

Due to the nature of this bug, I'm going to close this for now. Since the issue is VERY uncommon, I don't think leaving this open is really necessary. If the reporter continues to have this issue with future versions of Asterisk, then please reopen. Thanks!