Summary: ASTERISK-16029: 1.4.31rc1 Deadlock: Tried and failed to get Lock #1 (chan_dahdi.c): MUTEX 1063 pri_grab &pri->lock 0xe09684
Reporter: David Brillert (aragon)
Labels:
Date Opened: 2010-04-29 08:53:31
Date Closed: 2010-06-18 13:36:56
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_dahdi
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) deadlock.txt
Description: 1.4.31rc1 SVN (sorry, not sure which SVN version) deadlocks.
This has happened only once, under moderate call volume, but I am reporting the issue regardless.
CentOS 5.4, Xeon 2.4GHz
1GB RAM

The attached deadlock.txt includes the output of:
core show locks
gdb thread apply all bt
Not optimized
Comments:

By: Paul Belanger (pabelanger) 2010-04-29 09:07:33

Looks good.  It would also be good to include 'bt' and 'bt full' with your backtrace.

By: Leif Madsen (lmadsen) 2010-04-29 16:28:46

Please check the latest 1.4.31-rc2 to see if that resolves your issue. There is a bad deadlock being fixed with Local channels in that version.

By: Leif Madsen (lmadsen) 2010-04-30 10:56:54

Note 1.4.31-rc2 was released today.

By: David Brillert (aragon) 2010-05-05 09:34:20

This issue looks possibly similar to ASTERISK-15987
Are they related?

By: Leif Madsen (lmadsen) 2010-05-10 10:59:16

Hmmm maybe? That issue was only resolved in trunk though...

By: Leif Madsen (lmadsen) 2010-05-10 10:59:40

Any chance this issue is related to an issue you've already closed? (ASTERISK-15987)

By: Jeff Peeler (jpeeler) 2010-05-10 11:07:52

The fix in 17216 was for trunk only.

By: David Brillert (aragon) 2010-05-10 13:13:49

Leif:
I'll hold off testing again until Asterisk 1.4.32 is released.
I haven't had any occurrences since last deadlock on 1.4.31 SVN I tested.

jpeeler: thanks for update

By: David Brillert (aragon) 2010-05-13 11:42:20

I cannot update to any new 1.4.31 release or 1.4.32rc or subversion due to this regression breaking all of my ACD CDR reports.
ASTERISK-16094

By: Jeff Peeler (jpeeler) 2010-05-25 12:51:41

Looking at the debug output, I can't see why the system would have deadlocked. All the functions that could deadlock are using pri_grab which uses deadlock avoidance. The d channel thread should have been able to continue on its way after the other threads let go of the dahdi_pvt lock. Do you know what manager commands were being executed? Console output would have been nice to have.

By: David Brillert (aragon) 2010-05-25 12:58:16

ASTERISK-16094 is now fixed so it no longer prevents me from updating Asterisk.
We do execute quite a few manager commands in real time, but it's impossible for me to know which commands were executed at the time of the lock.  If it happens again I will get console output as well and try to sniff for the manager commands.

By: Leif Madsen (lmadsen) 2010-05-27 10:43:27

@jpeeler: could issue ASTERISK-16156 be part of the problem here? The deadlock avoidance code appears to be capable of creating deadlocks itself.

By: David Brillert (aragon) 2010-05-28 08:57:38

Could also be related to ASTERISK-16162 which I believe is a child of ASTERISK-16156
I don't know how to reproduce this deadlock so there is no sense in me testing the patch pdf posted on ASTERISK-16156 (ready for testing) on a production site.

By: Jeff Peeler (jpeeler) 2010-06-04 13:10:36

aragon, were you using overlap dialing, specifically incoming or both? If so 17414 probably is the same problem. If not, adding 17407 with warning messages will probably help locate the problem.

By: David Brillert (aragon) 2010-06-04 13:18:59

jpeeler:  I don't think so.
I assume you mean on the PRI since the lock occurred there.
I do not use ! anywhere at the end of my dial patterns.
Also allowoverlap does not exist in my sip.conf file

By: Jeff Peeler (jpeeler) 2010-06-04 13:25:31

I meant the overlapdial option in chan_dahdi.conf. If it's unset then the answer is no.

By: David Brillert (aragon) 2010-06-04 13:27:09

jpeeler: Also I don't know if I should use patch from ASTERISK-16156
This deadlock only occurred once and I have no idea how to reproduce.  I'd rather wait for ASTERISK-16156 to be reviewed or committed since they are debating the merits of that patch on that report.  I'm not sure what davidw is trying to say over there but I'll put in my 2 cents though.  When it comes to telephony it is always best not to drop calls or stop the engine.  In my case the system could not process any calls while the PRI was locked.  And it stayed locked until I restarted Asterisk.

By: Jeff Peeler (jpeeler) 2010-06-04 13:30:31

Yeah I wasn't recommending you use 17407 yet. So was that a no for the option being enabled in chan_dahdi? I ask again only because I don't know what else can be done here otherwise at this point.

By: David Brillert (aragon) 2010-06-04 13:32:04

[zaptel.conf]
overlapdial=  no

By: Jeff Peeler (jpeeler) 2010-06-18 13:33:11

Aragon, please reopen this issue if you still encounter problems with a release containing the commit below.

By: Digium Subversion (svnbot) 2010-06-18 13:33:17

Repository: asterisk
Revision: 271335

U   branches/1.4/channels/chan_dahdi.c

------------------------------------------------------------------------
r271335 | jpeeler | 2010-06-18 13:33:17 -0500 (Fri, 18 Jun 2010) | 13 lines

Eliminate deadlock potential in dahdi_fixup().

(This is a backport of 269307, committed to trunk by rmudgett.)

Calling dahdi_indicate() when the channel private lock is already
held can cause a deadlock if the PRI lock is needed because
dahdi_indicate() will also get the channel private lock.  The pri_grab()
function assumes that the channel private lock is held once to avoid
deadlock.

(closes issue ASTERISK-16029)
Reported by: aragon

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=271335
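
The "held once" assumption in the commit message above can be illustrated with a minimal sketch in plain POSIX threads. This is not the chan_dahdi code: pvt_lock, pri_lock, dchan_thread, grab_pri, and try_grab_with_depth are hypothetical names, the private lock is assumed to be a recursive mutex, and the retry limit exists only so the demonstration terminates instead of hanging. The grab loop tries the PRI lock and, on failure, releases the private lock one time before retrying, so a second outstanding hold on the private lock keeps the other thread blocked.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for recursive mutexes and nanosleep */
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-ins, not the real chan_dahdi structures. */
static pthread_mutex_t pvt_lock;   /* channel private lock (recursive) */
static pthread_mutex_t pri_lock;   /* PRI lock */
static atomic_int dchan_has_pri;   /* set once the D-channel thread holds pri_lock */

/* D-channel thread: takes the PRI lock first, then needs the private lock. */
static void *dchan_thread(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&pri_lock);
    atomic_store(&dchan_has_pri, 1);
    pthread_mutex_lock(&pvt_lock);   /* waits until the channel thread lets go */
    pthread_mutex_unlock(&pvt_lock);
    pthread_mutex_unlock(&pri_lock);
    return NULL;
}

/* Caller holds pvt_lock. Try for pri_lock; on failure release pvt_lock ONCE,
 * pause, and retake it. max_tries is an assumption added so the demo ends. */
static int grab_pri(int max_tries)
{
    const struct timespec pause = { 0, 1000000 };  /* 1 ms */
    for (int i = 0; i < max_tries; i++) {
        if (pthread_mutex_trylock(&pri_lock) == 0)
            return 1;
        pthread_mutex_unlock(&pvt_lock);
        nanosleep(&pause, NULL);
        pthread_mutex_lock(&pvt_lock);
    }
    return 0;
}

/* Returns 1 if pri_lock was obtained while holding pvt_lock `depth` times. */
int try_grab_with_depth(int depth)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int got;

    assert(depth >= 1);
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&pvt_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    pthread_mutex_init(&pri_lock, NULL);
    atomic_store(&dchan_has_pri, 0);

    for (int i = 0; i < depth; i++)
        pthread_mutex_lock(&pvt_lock);

    pthread_create(&tid, NULL, dchan_thread, NULL);
    while (!atomic_load(&dchan_has_pri))
        sched_yield();               /* wait until the other side owns pri_lock */

    got = grab_pri(200);
    if (got)
        pthread_mutex_unlock(&pri_lock);
    for (int i = 0; i < depth; i++)
        pthread_mutex_unlock(&pvt_lock);  /* lets dchan_thread finish either way */

    pthread_join(tid, NULL);
    pthread_mutex_destroy(&pri_lock);
    pthread_mutex_destroy(&pvt_lock);
    return got;
}
```

Under these assumptions, try_grab_with_depth(1) obtains the PRI lock because the single unlock in the retry loop fully releases the private lock, while try_grab_with_depth(2) gives up because the lock count never reaches zero and the other thread stays blocked. Without the retry limit the second case would spin forever, which matches the commit's description of calling dahdi_indicate() with the private lock already held.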

By: Digium Subversion (svnbot) 2010-06-18 13:36:55

Repository: asterisk
Revision: 271336

_U  trunk/

------------------------------------------------------------------------
r271336 | jpeeler | 2010-06-18 13:36:55 -0500 (Fri, 18 Jun 2010) | 20 lines

Recorded merge of revisions 271335 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r271335 | jpeeler | 2010-06-18 13:33:17 -0500 (Fri, 18 Jun 2010) | 13 lines
 
 Eliminate deadlock potential in dahdi_fixup().
 
 (This is a backport of 269307, committed to trunk by rmudgett.)
 
 Calling dahdi_indicate() when the channel private lock is already
 held can cause a deadlock if the PRI lock is needed because
 dahdi_indicate() will also get the channel private lock.  The pri_grab()
 function assumes that the channel private lock is held once to avoid
 deadlock.
 
 (closes issue ASTERISK-16029)
 Reported by: aragon
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=271336