Summary: ASTERISK-14803: chan_sip deadlock in mutex sip_alloc
Reporter: Alan Graham (zerohalo)
Labels:
Date Opened: 2009-09-09 08:14:31
Date Closed: 2011-06-07 14:00:31
Versions:
Frequency of
Environment:
Attachments: ( 0) bt_scrubbed.txt
( 1) coreshowlocks.txt
Description: Output of "core show locks" and a backtrace from a forced crash are attached.
Comments:

By: David Vossel (dvossel) 2009-10-02 17:53:51

I don't see a deadlock here. One thread holds two locks and isn't waiting on any other lock. The other two threads are waiting on one of the locks the first thread holds, but I don't see anything obvious that would prevent that thread from giving up the lock.

Also, how is this a "forced crash"? Are there steps we can use to reproduce this behavior?

By: Alan Graham (zerohalo) 2009-10-07 08:58:36

Not sure how to reproduce this behavior - I've had two like this thus far with the same core show locks output. chan_sip is blocked and calls will not progress - I crashed the machine to get the bt (using the ast_grab_core script in contrib).

By: David Vossel (dvossel) 2009-10-07 12:12:29

Umm, this is puzzling. I don't see anything in the find_call code that would cause the iflock to be held indefinitely. How long do you wait before you crash it? Do you have pedantic checking on? What kind of load is on the machine when this happens? Can you provide "sip show channels" output, or at least the number of dialogs that are present when this happens?

By: David Vossel (dvossel) 2009-10-07 15:06:07

I'm marking this as related to issue ASTERISK-14332. It appears that it may be possible for the dialog list to loop forever, which would explain why find_call would never give up the dialog list lock.
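
The failure mode suggested above can be sketched in C. This is only an illustration under stated assumptions, not the chan_sip code: `struct dialog`, `list_lock`, `find_dialog`, `list_has_cycle`, and the `max_steps` bound are all hypothetical names invented for the example. It shows a lookup that walks a linked list while holding a mutex, and why a list that links back on itself would keep that mutex held: the walk only ends on a match or a NULL `next` pointer, and a cyclic list has neither for an id that is absent.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a dialog entry; not the real chan_sip structure. */
struct dialog {
    int id;
    struct dialog *next;
};

/* Hypothetical lock protecting the list, playing the role of the dialog list lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Floyd's cycle detection: returns 1 if following next pointers never reaches NULL. */
int list_has_cycle(const struct dialog *head)
{
    const struct dialog *slow = head, *fast = head;

    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}

/*
 * Lookup by id while holding list_lock. The max_steps bound exists only so
 * this sketch terminates; without it, a cyclic list and a missing id would
 * keep the loop running and the lock held, blocking every other thread
 * that needs the list.
 */
struct dialog *find_dialog(struct dialog *head, int id, size_t max_steps, int *gave_up)
{
    struct dialog *cur, *found = NULL;
    size_t steps = 0;

    assert(gave_up != NULL);
    *gave_up = 0;

    pthread_mutex_lock(&list_lock);
    for (cur = head; cur; cur = cur->next) {
        if (cur->id == id) {
            found = cur;
            break;
        }
        if (++steps >= max_steps) {
            *gave_up = 1;   /* walked further than any sane list length */
            break;
        }
    }
    pthread_mutex_unlock(&list_lock);
    return found;
}
```

With three entries linked a, b, c and then c linked back to a, `list_has_cycle` reports the loop, and `find_dialog` for an absent id stops only because of the step bound. Threads blocked on `list_lock` in the meantime would look like the waiters in the "core show locks" output, while the holder is not waiting on anything.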

By: Alan Graham (zerohalo) 2009-10-07 16:52:00

dvossel: this machine processes ~100-150 concurrent calls under normal to high load. I've waited up to 10 minutes before crashing it, and the lock never freed. I'll forward the other requested information as soon as this happens again.

By: Leif Madsen (lmadsen) 2009-10-26 10:23:23

Just pinging this issue to see if the information was obtained. Thanks!

By: Leif Madsen (lmadsen) 2009-11-17 07:34:29.000-0600

This issue is now being closed due to lack of feedback. If you're able to supply the requested information, then please feel free to reopen the issue. Thanks!