Summary:ASTERISK-14568: [patch] Channel not locked when it should in local_attended_transfer
Reporter:Laurent Steffan (lmsteffan)Labels:
Date Opened:2009-07-30 18:58:10Date Closed:2009-10-08 15:11:36
Versions:Frequency of
Environment:Attachments:( 0) deadlock_local_attended_transfers_trunk.diff
Description:Right after performing the transfer, the target channel (target.chan1) is unlocked. It is not locked again before performing the masquerade (although aa comment says so).

Performing a relock of this channel leads to a deadlock with &current->chan1. Trying to solve that deadlock led me to a bunch of astobj2 errors.


In order to see this bug manifest itself, it may be necessary to compile in non-dev mode.
Comments:By: Laurent Steffan (lmsteffan) 2009-07-30 22:43:30

By the way, this has been discussed with putnovput on IRC channel #asterisk-dev

By: Mark Michelson (mmichelson) 2009-07-31 10:20:03

This function is a total mess.

When local_attended_transfer is entered, we hold the lock on current->chan1 and its tech_pvt. Then, in sip_get_pvt_byid_locked, we lock targetcall_pvt and targetcall_pvt->owner, which is the same as target.chan1. A bit later, we unlock targetcall_pvt->owner (thankfully).

So at the point where the transfer has succeeded, we have current->chan1 and target.chan1 locked. ast_cel_report will attempt to lock target.chan2. ast_do_masquerade expects target.chan1 to be locked and will lock whichever channel is in target.chan1->masq, which may be either current->chan2 or target.chan2 depending on whether the calls are bridged.

So, in summary, the following locks will be locked by this thread during the transfer:

current->chan2 (maybe, if target.chan2 is NULL)
target.chan2 (maybe, if it is non-NULL)

I think I said this in #asterisk-dev, too. This is going to be *FUN*.

By: Laurent Steffan (lmsteffan) 2009-08-02 20:20:55

Speaking of ast_cel_report, you may want to note that I had filed a report (bug ASTERISK-14475) concerning a deadlock that appears between the channel handling during a transfer (ast_do_masquerade) and the ast_cel_report_event function. This deadlock occurs (randomly but frequently) even when not using the CEL logging functions. My workaround (not submitted as it probably does not solve the main problem) is to test whether CEL is enabled right at the start of the ast_cel_report_event function.

By: David Vossel (dvossel) 2009-09-29 14:38:59

The patch I just uploaded should resolve this.  Let me know if it fixes it for you.

By: David Vossel (dvossel) 2009-09-30 17:41:20

I deleted and re-uploaded the same patch with an important tweak.

By: Laurent Steffan (lmsteffan) 2009-10-04 18:27:27

We are currently testing your patch. It seems to work on our test system, but I'll get back with more information in a few days when we have performed all the non-regression tests. Thanks for your help!

By: David Vossel (dvossel) 2009-10-08 15:11:18

This issue was resolved by issue ASTERISK-14844

By: Digium Subversion (svnbot) 2009-10-08 15:11:34

Repository: asterisk
Revision: 222761

U   trunk/channels/chan_misdn.c
U   trunk/channels/chan_sip.c
U   trunk/include/asterisk/channel.h
U   trunk/main/channel.c
U   trunk/main/features.c
U   trunk/main/pbx.c

r222761 | dvossel | 2009-10-07 17:58:38 -0500 (Wed, 07 Oct 2009) | 35 lines

Deadlock in channel masquerade handling

Channels are stored in an ao2_container.  When accessing an item within
an ao2_container the proper locking order is to first lock the container,
and then the items within it.

In ast_do_masquerade both the clone and original channel must be locked
for the entire duration of the function.  The problem with this is that
it attemptes to unlink and link these channels back into the ao2_container
when one of the channel's name changes.  This is invalid locking order as
the process of unlinking and linking will lock the ao2_container while
the channels are locked!!! Now, both the channels in do_masquerade are
unlinked from the ao2_container and then locked for the entire function.
At the end of the function both channels are unlocked and linked back
into the container with their new names as hash values.

This new method of requiring all channels and tech pvts to be unlocked
before ast_do_masquerade() or ast_change_name() required several
changes throughout the code base.

(closes issue ASTERISK-14844)
Reported by: russell
     masq_deadlock_trunk.diff uploaded by dvossel (license 671)
Tested by: dvossel, atis

(closes issue ASTERISK-14568)
Reported by: lmsteffan
     deadlock_local_attended_transfers_trunk.diff uploaded by dvossel (license 671)
Tested by: lmsteffan, dvossel

Review: https://reviewboard.asterisk.org/r/387/