Summary: | ASTERISK-11281: Deadlock in chan_zap between zt_request and do_monitor | ||
Reporter: | Michael FIG (michael-fig) | Labels: | |
Date Opened: | 2008-01-22 13:17:45.000-0600 | Date Closed: | 2008-02-18 10:37:12.000-0600 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_zap |
Versions: | | Frequency of Occurrence: | |
Related Issues: | |||
Environment: | | Attachments: | (0) 11818.patch; (1) asterisk-1.4.16.2-deadlock.log; (2) asterisk-1.4.17-deadlock.log; (3) asterisk-1.4.17-deadlock2.log; (4) asterisk-1.4.17-deadlock3.log |
Description: | We have Asterisk installed using SIP internally and a Sangoma T1 PRI card (AFT101) to the outside world. Every once in a while Asterisk would hang, and when I investigated further, the deadlock detection code reported a deadlock in chan_zap. I've attached the complete log (from the first "Deadlock" message to the point where the connection hung) for you to look at. In brief, there were many messages of the form:
[Jan 21 14:36:10] ERROR[4211]: /usr/src/asterisk-1.4.16.2/include/asterisk/lock.h:338 __ast_pthread_mutex_lock: chan_zap.c line 2722 (zt_hangup): Deadlock? waited 10 sec for mutex '&iflock'?
[Jan 21 14:36:10] ERROR[4211]: /usr/src/asterisk-1.4.16.2/include/asterisk/lock.h:342 __ast_pthread_mutex_lock: chan_zap.c line 6804 (do_monitor): '&iflock' was locked here.

****** ADDITIONAL INFORMATION ******
I have just installed 1.4.17 with deadlock detection enabled, so I will add new notes if I see it happen again with that version. | ||
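The two ERROR lines above are printed by the deadlock detection code in include/asterisk/lock.h (the path shown in the log): a thread that has been blocked on a mutex for 10 seconds reports its own location and the location where the current holder took the lock. The following is a minimal sketch of that idea, not Asterisk's actual lock.h code. `struct tracked_mutex`, `tracked_lock()`, `tracked_unlock()` and `TRACKED_LOCK()` are hypothetical names, and the retry loop is built on POSIX `pthread_mutex_timedlock()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical mutex wrapper that remembers where its current holder locked it. */
struct tracked_mutex {
    pthread_mutex_t mutex;  /* the lock being protected */
    pthread_mutex_t info;   /* guards the holder fields below */
    const char *name;       /* label used in reports, e.g. "&iflock" */
    const char *file;       /* holder's source location; NULL when unlocked */
    int line;
    const char *func;
};

#define TRACKED_MUTEX_INIT(label) \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER, (label), NULL, 0, NULL }

/* Acquire m. Each time wait_sec seconds pass without getting the lock, print
 * where we are waiting and where the current holder took it, then keep
 * waiting (it never gives up). Returns the number of reports printed. */
static int tracked_lock(struct tracked_mutex *m, int wait_sec,
                        const char *file, int line, const char *func)
{
    int reports = 0;

    for (;;) {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += wait_sec;

        int rc = pthread_mutex_timedlock(&m->mutex, &deadline);
        if (rc == 0)
            break;
        assert(rc == ETIMEDOUT);  /* any other error is a programming bug */

        reports++;
        pthread_mutex_lock(&m->info);
        fprintf(stderr, "%s line %d (%s): Deadlock? waited %d sec for mutex '%s'?\n",
                file, line, func, reports * wait_sec, m->name);
        if (m->file != NULL)
            fprintf(stderr, "%s line %d (%s): '%s' was locked here.\n",
                    m->file, m->line, m->func, m->name);
        pthread_mutex_unlock(&m->info);
    }

    /* We own the lock now: record our location for other threads' reports. */
    pthread_mutex_lock(&m->info);
    m->file = file;
    m->line = line;
    m->func = func;
    pthread_mutex_unlock(&m->info);
    return reports;
}

/* Clear the holder record, then release the protected mutex. */
static void tracked_unlock(struct tracked_mutex *m)
{
    pthread_mutex_lock(&m->info);
    m->file = NULL;
    m->line = 0;
    m->func = NULL;
    pthread_mutex_unlock(&m->info);
    pthread_mutex_unlock(&m->mutex);
}

/* Convenience macro that captures the caller's location automatically. */
#define TRACKED_LOCK(m) tracked_lock((m), 10, __FILE__, __LINE__, __func__)
```

Read against the log, chan_zap.c line 2722 (zt_hangup) is the waiting location and line 6804 (do_monitor) is the recorded holder location. A wrapper like this only reports the situation; it cannot make the holder release the lock, so the waiting thread stays blocked.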
Comments: | By: Russell Bryant (russell) 2008-01-23 11:23:13.000-0600
Instead of building with DETECT_DEADLOCKS, turn that off and build with DEBUG_THREADS turned on. Then, if it locks up, grab the output of the "core show locks" CLI command.

By: Michael FIG (michael-fig) 2008-01-24 13:49:40.000-0600
Okay, I've uploaded the output of "core show locks". I was also getting many lines like:
XXX ERROR XXX A thread holds more locks than '32'. Increase AST_MAX_LOCKS!
Thanks, Michael.

By: Michael FIG (michael-fig) 2008-01-25 16:50:41.000-0600
More details: the deadlock appears to start in chan_zap (the many ringing Dial applications in deadlock3.log are definitely not actually active). Internal SIP calls still work, but an outbound T1 call just hangs. At the end of the log, I try to show the channels again, but the console then refuses any further input (although it still displays the debugging and verbose output). Is there anything else I can do to help resolve this issue?

By: Mark Michelson (mmichelson) 2008-02-05 13:29:41.000-0600
I gave this a look, and what's odd is that the section of code where that lock is taken does not make any blocking calls. The only thing in that section that I believe could cause problems is if the pointers in iflist became incorrect and caused an infinite loop. With that in mind, I found a place in the code that seems to handle a pointer improperly and could lead to the infinite loop I suspect is happening. I'm going to upload a patch. Please give it a try and see if it prevents the deadlock you are experiencing. Thanks.

By: Mark Michelson (mmichelson) 2008-02-05 13:36:15.000-0600
I have uploaded 11818.patch. Please report whether this fixes the problem. Thanks!

By: Michael FIG (michael-fig) 2008-02-05 14:23:44.000-0600
Thanks for the patch, putnopvut! It could well be the problem (it makes sense from my own analysis and a colleague's too).
I suggest that an administrator close this issue; I will reopen it only if I see the problem happen again.

By: Mark Michelson (mmichelson) 2008-02-05 16:12:38.000-0600
I'm actually going to leave this open for a while longer. I'd rather not merge an untested patch into the source until I've heard some sort of positive feedback on it (or at least no negative feedback :) ). I'll leave this issue open for a couple of weeks, and if I don't hear of any problems, I'll merge it into 1.4 and trunk.

By: jmls (jmls) 2008-02-17 13:05:06.000-0600
it's been 15 days now ... you should merge ;)

By: Digium Subversion (svnbot) 2008-02-18 10:34:09.000-0600
Repository: asterisk
Revision: 103770
U branches/1.4/channels/chan_zap.c
------------------------------------------------------------------------
r103770 | mmichelson | 2008-02-18 10:34:06 -0600 (Mon, 18 Feb 2008) | 10 lines

Fix a linked list corruption that under the right circumstances could lead to a looped list, meaning it will traverse forever.

(closes issue ASTERISK-11281)
Reported by: michael-fig
Patches: 11818.patch uploaded by putnopvut (license 60)
Tested by: michael-fig
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=103770

By: Digium Subversion (svnbot) 2008-02-18 10:37:12.000-0600
Repository: asterisk
Revision: 103771
_U trunk/
U trunk/channels/chan_zap.c
------------------------------------------------------------------------
r103771 | mmichelson | 2008-02-18 10:37:11 -0600 (Mon, 18 Feb 2008) | 18 lines

Merged revisions 103770 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r103770 | mmichelson | 2008-02-18 10:37:31 -0600 (Mon, 18 Feb 2008) | 10 lines

Fix a linked list corruption that under the right circumstances could lead to a looped list, meaning it will traverse forever.
(closes issue ASTERISK-11281)
Reported by: michael-fig
Patches: 11818.patch uploaded by putnopvut (license 60)
Tested by: michael-fig
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=103771 |
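The commit message describes a linked list corruption that could leave the list looped. If that happens to iflist while do_monitor is walking it with iflock held (the log shows iflock taken in do_monitor), the walk would never finish and iflock would never be released, which fits the zt_hangup wait in the log. This report does not include the contents of 11818.patch, so the following is only a generic illustration, not the chan_zap code or the actual fix. `struct node`, `struct list` and the `list_*` functions are hypothetical. The sketch shows one way a doubly linked list can become looped (re-inserting a node that is still linked), the safe unlink-then-insert alternative, and Floyd's tortoise-and-hare check for detecting such a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type; a stand-in for a per-channel structure. */
struct node {
    int channel;
    struct node *next;
    struct node *prev;
};

/* Hypothetical list with head and tail pointers. */
struct list {
    struct node *head;
    struct node *tail;
};

/* Link n in at the head. Precondition: n is NOT already in the list.
 * Violating it can leave a cycle that a plain traversal never escapes. */
static void list_prepend(struct list *l, struct node *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head)
        l->head->prev = n;
    else
        l->tail = n;
    l->head = n;
}

/* Unlink n, fixing both neighbours and the head/tail pointers,
 * then clear n's own links so no stale pointers remain. */
static void list_remove(struct list *l, struct node *n)
{
    if (n->prev)
        n->prev->next = n->next;
    else
        l->head = n->next;
    if (n->next)
        n->next->prev = n->prev;
    else
        l->tail = n->prev;
    n->next = n->prev = NULL;
}

/* Safe way to move a linked node to the front: unlink first, then insert. */
static void list_move_to_front(struct list *l, struct node *n)
{
    list_remove(l, n);
    list_prepend(l, n);
}

/* Floyd's tortoise-and-hare: true if following next pointers from head loops. */
static bool list_has_cycle(const struct list *l)
{
    const struct node *slow = l->head;
    const struct node *fast = l->head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}
```

For example, with the list b -> a -> c, calling `list_prepend(&l, &a)` without first unlinking `a` sets `a->next` to `&b` while `b.next` still points to `a`, so `list_has_cycle()` returns true and a plain `for (p = l.head; p; p = p->next)` loop never ends. `list_move_to_front()` avoids this by unlinking the node before re-inserting it.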