Summary: ASTERISK-11281: Deadlock in chan_zap between zt_request and do_monitor
Reporter: Michael FIG (michael-fig)
Labels:
Date Opened: 2008-01-22 13:17:45.000-0600
Date Closed: 2008-02-18 10:37:12.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_zap
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 11818.patch
( 1) asterisk-1.4.16.2-deadlock.log
( 2) asterisk-1.4.17-deadlock.log
( 3) asterisk-1.4.17-deadlock2.log
( 4) asterisk-1.4.17-deadlock3.log
Description: We have Asterisk installed using SIP internally and a Sangoma T1 PRI card (AFT101) to the outside world.  Every once in a while, Asterisk would hang, and when I investigated further, the deadlock detection code found a deadlock in chan_zap.

I've attached the complete log (from the first Deadlock message to the point when the connection hung) for you to look at.  In brief, there were a lot of messages of the form:

[Jan 21 14:36:10] ERROR[4211]: /usr/src/asterisk-1.4.16.2/include/asterisk/lock.h:338 __ast_pthread_mutex_lock: chan_zap.c line 2722 (zt_hangup): Deadlock? waited 10 sec for mutex '&iflock'?
[Jan 21 14:36:10] ERROR[4211]: /usr/src/asterisk-1.4.16.2/include/asterisk/lock.h:342 __ast_pthread_mutex_lock: chan_zap.c line 6804 (do_monitor): '&iflock' was locked here.


****** ADDITIONAL INFORMATION ******

I have just installed 1.4.17 with deadlock detection enabled, so I will add new notes if I see it happen again with that version.
Comments:

By: Russell Bryant (russell) 2008-01-23 11:23:13.000-0600

Instead of building with DETECT_DEADLOCKS, turn that off and build with DEBUG_THREADS turned on.  Then, if it locks up, grab the output of the "core show locks" CLI command.

By: Michael FIG (michael-fig) 2008-01-24 13:49:40.000-0600

Okay, I've uploaded the output of "core show locks".  I was also getting many lines like:

XXX ERROR XXX A thread holds more locks than '32'.  Increase AST_MAX_LOCKS!

Thanks,
Michael.

By: Michael FIG (michael-fig) 2008-01-25 16:50:41.000-0600

More details: the deadlock appears to start in chan_zap (all those ringing Dial applications in deadlock3.log are definitely not really active).  Internal SIP calls still work, but making an outbound T1 call just hangs.

At the end of the log, I demonstrate trying to show the channels again, but then the console refuses any further input (though still displays the debugging and verbose output).

Is there anything else I can do to help resolve this issue?

By: Mark Michelson (mmichelson) 2008-02-05 13:29:41.000-0600

I gave this a look, and what's odd is that the section of code where that lock is locked does not have any blocking calls. The only thing within that code section that I believe could cause problems would be if the iflist's pointers became incorrect and caused an infinite loop.

Having said that, I found a place in the code which seems to improperly handle a pointer and which could lead to the infinite loop I suspect is happening. I'm going to upload a patch. Please give it a try and see if this prevents the deadlock you are experiencing. Thanks.
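To illustrate the failure mode described above, here is a minimal sketch in C. It is not the chan_zap code and not the contents of 11818.patch; `struct iface`, `iface_unlink`, and `iface_list_has_cycle` are hypothetical names invented for this example. It assumes a doubly linked list of interfaces: an unlink that updates both neighbours keeps the list NULL-terminated, while a stale `next` pointer can close the list into a loop, so a plain `for (p = head; p; p = p->next)` walk performed while holding a lock would never finish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one interface entry; not the real chan_zap struct. */
struct iface {
    int channel;
    struct iface *next;
    struct iface *prev;
};

/* Unlink cur from the list, fixing up both neighbours and the head/tail
 * pointers. Skipping any one of these updates leaves a stale pointer behind. */
static void iface_unlink(struct iface **head, struct iface **tail,
                         struct iface *cur)
{
    assert(head != NULL && tail != NULL && cur != NULL);

    if (cur->prev)
        cur->prev->next = cur->next;
    else
        *head = cur->next;          /* cur was the first entry */

    if (cur->next)
        cur->next->prev = cur->prev;
    else
        *tail = cur->prev;          /* cur was the last entry */

    cur->next = NULL;
    cur->prev = NULL;
}

/* Floyd's cycle check: returns 1 if following next pointers from head
 * never reaches NULL, 0 otherwise. Useful as a debugging aid only. */
static int iface_list_has_cycle(const struct iface *head)
{
    const struct iface *slow = head;
    const struct iface *fast = head;

    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

A thread that walks a looped list while holding a mutex never reaches the unlock, so every other thread waiting on that mutex reports a suspected deadlock even though nothing is actually blocked.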

By: Mark Michelson (mmichelson) 2008-02-05 13:36:15.000-0600

I have uploaded 11818.patch. Please report if this fixes the problem. Thanks!

By: Michael FIG (michael-fig) 2008-02-05 14:23:44.000-0600

Thanks for the patch, putnopvut!  It could be the problem (makes sense from my own analysis and a colleague's too).

I suggest that an administrator close this issue, and I will reopen only if I see the problem happen again.

By: Mark Michelson (mmichelson) 2008-02-05 16:12:38.000-0600

I'm actually going to just leave this open for a while longer. I'd rather not merge an untested patch into the source until I've heard some sort of positive feedback on it (or at least no negative feedback :) ). I'll leave this issue open for a couple weeks longer and if I don't hear of any problems, I'll merge it into 1.4 and trunk.

By: jmls (jmls) 2008-02-17 13:05:06.000-0600

it's been 15 days now ... you should merge ;)

By: Digium Subversion (svnbot) 2008-02-18 10:34:09.000-0600

Repository: asterisk
Revision: 103770

U   branches/1.4/channels/chan_zap.c

------------------------------------------------------------------------
r103770 | mmichelson | 2008-02-18 10:34:06 -0600 (Mon, 18 Feb 2008) | 10 lines

Fix a linked list corruption that under the right circumstances
could lead to a looped list, meaning it will traverse forever.

(closes issue ASTERISK-11281)
Reported by: michael-fig
Patches:
     11818.patch uploaded by putnopvut (license 60)
 Tested by: michael-fig


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=103770

By: Digium Subversion (svnbot) 2008-02-18 10:37:12.000-0600

Repository: asterisk
Revision: 103771

_U  trunk/
U   trunk/channels/chan_zap.c

------------------------------------------------------------------------
r103771 | mmichelson | 2008-02-18 10:37:11 -0600 (Mon, 18 Feb 2008) | 18 lines

Merged revisions 103770 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r103770 | mmichelson | 2008-02-18 10:37:31 -0600 (Mon, 18 Feb 2008) | 10 lines

Fix a linked list corruption that under the right circumstances
could lead to a looped list, meaning it will traverse forever.

(closes issue ASTERISK-11281)
Reported by: michael-fig
Patches:
     11818.patch uploaded by putnopvut (license 60)
 Tested by: michael-fig


........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=103771