Summary:ASTERISK-08189: Potential deadlock in zt_hangup() (with driver interaction...)
Reporter:Guillaume Knispel (gknispel_proformatique)Labels:
Date Opened:2006-11-26 13:37:25.000-0600Date Closed:2007-05-03 09:59:33
Versions:Frequency of
Environment:Attachments:( 0) diff
( 1) diff-2394
Description:I'm not 100% sure about this one but even if their no problem the code should clearly document why a deadlock is not possible and what is done to prevent it for the scenario I'll describe or others close to this one. For my example I'll consider interaction with wcte11xp.c but things also happens the same way with some other drivers. Description follow:

In zaptel.c zt_hangup() is always called while holding the zt_chan lock.
zt_hangup() can call zt_cas_setbits() or zt_rbs_sethook() which will call the rbsbits() callback of the driver.
Now in wcte11xp.c t1xxp_rbsbits() will lock &wc->lock before doing anything useful.
So in this code flow we have the following locking order : the zt_chan lock first, then &wc->lock.

Now let's see what is the flow when an interrupt occur :
t1xxp_interrupt() locks &wc->lock then calls __t1_do_counters and (sometimes) __t1_check_sigbits(), and __t1_check_alarms().
__t1_do_counters() calls zt_alarm_notify() on some timer triggering ; __t1_check_sigbits() calls zt_rbsbits() when appropriate and not surprisingly __t1_check_alarms() calls zt_alarm_notify() if it has to.
So in this second code flow the locking order is now &wc->lock first, then the zt_chan lock.

Of course this means deadlock can happen on SMP boxes and also with more exotic things like PREEMPT_RT.
spin_lock_irqsave are mostly used and I think it mitigates the problem with non-SMP classic kernels.
Comments:By: jshugart (jshugart) 2007-01-30 11:42:25.000-0600

I suspect I have been experiencing this error for some time.  Most recently we have been using the Zaptel drivers version, but also impacted a pre 1.0 version from CVS.  We use TE410P cards on SMP machines.  We hit the issue usually when there are multiple simultaneous disconnects.  The effect is complete lockup of the server.  Sometimes we can ping the server.  There are never any logs generated, even on the console.  We need to power cycle the server to get everything back.  I have tried to reproduce the issue, but cannot do it.  I will make another attempt to reproduce.  I doubt I will have much luck fixing the code, but maybe I can help give a test bed for changes.

By: Matthew Fredrickson (mattf) 2007-01-30 14:13:57.000-0600

What signalling type for your lines are you using that this occurs on?

By: jshugart (jshugart) 2007-01-30 14:32:04.000-0600



relaxdtmf = yes
usecallerid = no
immediate = yes
group = 1
signalling = fxs_gs
channel => 1-96

By: matti (matti) 2007-04-17 07:00:54

A similar bug exists also in wct4xxp/base.c in zaptel 1.2.16.
When I did rmmod wct4xxp on Linux 2.6.20 ... SMP, the kernel sometimes locked up.

By: matti (matti) 2007-04-18 02:19:18

I attached a patch to wct4xxp.c of Zaptel 1.0.10. The patch seemed to fix the bug according to my quick tests.

By: matti (matti) 2007-04-21 05:13:08

I attached a patch to wct4xxp/base.c SVN Revision 2394.
I have sent a disclaimer.

By: Matthew Fredrickson (mattf) 2007-04-21 13:20:11

Thanks.  Right now, I'm reviewing the code and trying to decide the best place to fix this.  There are multiple places that this can happen, not just at that particular point.  If we decide to fix this in the device driver, then your patch will probably be part of the fix.

By: Matthew Fredrickson (mattf) 2007-04-28 16:24:17

Ok, I just committed the fix for the 1.2 wct4xxp driver into 1.2 svn.  Can you verify that the fix works for you?

By: Matthew Fredrickson (mattf) 2007-04-28 16:34:19

And just committed 1.4 if you'd like to try that as well.

By: matti (matti) 2007-05-03 01:33:40

I did not get the deadlock with the SVN trunk 2484.

By: Matthew Fredrickson (mattf) 2007-05-03 09:59:22

Fixed in 1.2, 1.4, and trunk.