[Home]

Summary:ASTERISK-10632: deadlock in iax
Reporter:callguy (callguy)Labels:
Date Opened:2007-10-28 13:07:01Date Closed:2007-12-17 16:26:48.000-0600
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Channels/chan_iax2
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) chan_iax2.c-unlock_patch.diff.txt
( 1) coreshowlocks_iaxdealock-102807.txt
( 2) coreshowlocks-11072007.txt
( 3) lock.h
Description:We are seeing the iax channel deadlock and cause asterisk to stop processing calls intermittently. output of core show locks attached.
Comments:By: Russell Bryant (russell) 2007-10-29 11:31:24

Just curious, does this system have any patches?

By: callguy (callguy) 2007-10-29 11:39:03

Yes - this is 1.4.13, but with the added patch from bug 10571 (the DEBUG_THREADS fixes). The only resultant change from that is in lock.h.

By: callguy (callguy) 2007-10-29 11:50:18

i uploaded a copy of lock.h from our system so you can be looking at the same thing.

Edit: One additional note. I reported the same bug under 1.4.11, (bug 10830) which you had thought was resolved as part of some other fixes. The original report pre-dated the DEBUG_THREADS patches so I don't believe this issue is related to those.



By: Digium Subversion (svnbot) 2007-10-29 15:20:29

Repository: asterisk
Revision: 87396

U   branches/1.4/include/asterisk/lock.h
U   branches/1.4/main/utils.c

------------------------------------------------------------------------
r87396 | russell | 2007-10-29 15:20:28 -0500 (Mon, 29 Oct 2007) | 5 lines

Add some more details to the output of "core show locks".  When a thread
is waiting for a lock, this will now show the details about who currently
has it locked.
(inspired by issue ASTERISK-10632)

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-10-29 15:23:11

Repository: asterisk
Revision: 87397

_U  trunk/
U   trunk/include/asterisk/lock.h
U   trunk/main/utils.c

------------------------------------------------------------------------
r87397 | russell | 2007-10-29 15:23:11 -0500 (Mon, 29 Oct 2007) | 13 lines

Merged revisions 87396 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r87396 | russell | 2007-10-29 15:22:07 -0500 (Mon, 29 Oct 2007) | 5 lines

Add some more details to the output of "core show locks".  When a thread
is waiting for a lock, this will now show the details about who currently
has it locked.
(inspired by issue ASTERISK-10632)

........

------------------------------------------------------------------------

By: Russell Bryant (russell) 2007-10-29 15:30:49

I did some digging through chan_iax2 but couldn't find anything that would cause this issue.  However, I did update the "core show locks" output to provide even more details that will hopefully provide a hint as to where the problem is.  If you want to just apply the change to 1.4.13, you can get the patch by running:

svn diff -r 87395:87396 http://svn.digium.com/svn/asterisk/branches/1.4 > issue_11100.patch.txt

By: Russell Bryant (russell) 2007-10-30 08:07:12

Well, please just try the latest 1.4 branch so you'll have my last change, because I'd like you to do it for another issue you have reported, as well.

$ svn co http://svn.digium.com/svn/asterisk/branches/1.4 asterisk-1.4

By: Russell Bryant (russell) 2007-11-06 12:03:48.000-0600

I wonder if the lock initialization issues fixed this bug, as well.  There isn't a good way to know other than just waiting to see if it happens again.  I'm going to hope it won't!  But, please let me know if it does and I'll track it down.  :)

Thanks!

By: callguy (callguy) 2007-11-07 16:27:18.000-0600

russell: we just had this one happen again. I've uploaded the latest core show locks.

By: callguy (callguy) 2007-11-13 14:51:27.000-0600

russell: this just happened again, though this time we weren't able to get a console to execute core show locks (bringing up the console hung before getting to the cli prompt). From the console output it looked exactly the same as my other reports here.

By: callguy (callguy) 2007-11-14 23:03:53.000-0600

russell: we came up with a theory on this one that was inspired by what we found in bug 11080. We were wondering if the looping of the message we see when this condition occurs (example from previous bug) is actually just a symptom of the deadlock, and actually irrelevant relative to the actual problem:

[Sep 26 18:56:18] NOTICE[10762]: chan_iax2.c:6521 socket_read: Out of idle IAX2 threads for I/O, pausing!

So seeing the above from within socket_read() was a symptom of deadlock, caused because we were unable to obtain a new lock, as all available ones were consumed.

we noticed these functions:
registry_authrequest
register_verify
update_registry  

get a lock, but were not unlocking on error condistions, and just jumping to return_unref. so in the diff we added a ast_mutex_unlock here to make sure all cases have unlocked. This was the only obvious thing we could identify.

By: Russell Bryant (russell) 2007-12-05 13:36:07.000-0600

You are correct that the message is just a side effect of the problem.

Also, the functions that you have identified are doing the correct thing.  They get called with the mutex locked and are supposed to return with it locked.  They just have to unlock it temporarily in the middle.

By: Digium Subversion (svnbot) 2007-12-17 16:25:07.000-0600

Repository: asterisk
Revision: 93377

U   branches/1.4/main/utils.c

------------------------------------------------------------------------
r93377 | file | 2007-12-17 16:24:59 -0600 (Mon, 17 Dec 2007) | 7 lines

Do not try to access information about a lock when printing out a trylock attempt. It is possible for the lock that it references to no longer be valid. This would have caused segfaults or deadlocks.
(issue #BE-263)
(closes issue ASTERISK-10613)
Reported by: callguy
(closes issue ASTERISK-10632)
Reported by: callguy

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=93377

By: Digium Subversion (svnbot) 2007-12-17 16:26:48.000-0600

Repository: asterisk
Revision: 93378

_U  trunk/
U   trunk/main/utils.c

------------------------------------------------------------------------
r93378 | file | 2007-12-17 16:26:47 -0600 (Mon, 17 Dec 2007) | 15 lines

Merged revisions 93377 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r93377 | file | 2007-12-17 18:28:09 -0400 (Mon, 17 Dec 2007) | 7 lines

Do not try to access information about a lock when printing out a trylock attempt. It is possible for the lock that it references to no longer be valid. This would have caused segfaults or deadlocks.
(issue #BE-263)
(closes issue ASTERISK-10613)
Reported by: callguy
(closes issue ASTERISK-10632)
Reported by: callguy

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=93378