[Home]

Summary:ASTERISK-14844: [patch] Deadlock in channel masquerade handling
Reporter:Russell Bryant (russell)Labels:
Date Opened:2009-09-17 15:50:56Date Closed:2009-10-08 15:11:34
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Core/Channels
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) deadlock.txt
( 1) masq_deadlock_trunk.diff
Description:In Asterisk trunk and 1.6.2, ast_channels are stored in an astobj2 container, instead of a singly linked list as before.  The hash value is based on the channel name.  So, during a masquerade, when the channel name changes, the channel must be removed and reinserted into the hash table using its new hash value.

This change introduced locking of the channels container where it did not exist before.  Specifically, it introduces locking in the order of 1) the ast_channel, and 2) the container.

Many other places in the code use the locking order 1) ast_channel container, 2) channels, and this is the locking order that must be obeyed.

Some work needs to be put into masquerade handling to resolve this deadlock and get it to follow the proper locking order.
Comments:By: Russell Bryant (russell) 2009-09-17 15:55:17

I observed a deadlock caused by this problem earlier today when someone was working on SIP transfers.  The device state processing thread had locked the container and wanted the channel lock.  The SIP monitor thread processing a transfer held a channel lock and was trying to complete a masquerade, and was waiting on the container lock.

By: Russell Bryant (russell) 2009-09-18 09:51:37

Update: this is not a blocker for 1.6.2.0, as this code isn't in 1.6.2.0.  Oops.  I have updated the target version to 1.6.3.0.

By: Jeff Peeler (jpeeler) 2009-09-24 15:42:15

Although the issue here is well understood I attached backtraces for the deadlocked threads and core show locks for this scenario. It seems to be easy to reproduce by using the bridge application.

By: David Vossel (dvossel) 2009-09-30 18:38:52

a patch is up on reviewboard for this. https://reviewboard.asterisk.org/r/387/

By: Atis Lezdins (atis) 2009-10-01 06:07:07

I could do some load testing with local channels and masquerade in a week or two when our testing server frees, as our dialplan involves a lot of masquerades.

If I'm not saying anything, send me a remainder.

By: David Vossel (dvossel) 2009-10-05 15:41:59

This patch is not ready to be tested as I have found a rather large hole in its logic.  I will update the patch once this is resolved.

By: David Vossel (dvossel) 2009-10-07 14:26:07

ok, I uploaded a new patch and it should be ready for testing now.

By: Digium Subversion (svnbot) 2009-10-07 18:01:58

Repository: asterisk
Revision: 222761

U   trunk/channels/chan_misdn.c
U   trunk/channels/chan_sip.c
U   trunk/include/asterisk/channel.h
U   trunk/main/channel.c
U   trunk/main/features.c
U   trunk/main/pbx.c

------------------------------------------------------------------------
r222761 | dvossel | 2009-10-07 18:01:57 -0500 (Wed, 07 Oct 2009) | 30 lines

Deadlock in channel masquerade handling

Channels are stored in an ao2_container.  When accessing an item within
an ao2_container the proper locking order is to first lock the container,
and then the items within it.

In ast_do_masquerade both the clone and original channel must be locked
for the entire duration of the function.  The problem with this is that
it attemptes to unlink and link these channels back into the ao2_container
when one of the channel's name changes.  This is invalid locking order as
the process of unlinking and linking will lock the ao2_container while
the channels are locked!!! Now, both the channels in do_masquerade are
unlinked from the ao2_container and then locked for the entire function.
At the end of the function both channels are unlocked and linked back
into the container with their new names as hash values.

This new method of requiring all channels and tech pvts to be unlocked
before ast_do_masquerade() or ast_change_name() required several
changes throughout the code base.


(closes issue ASTERISK-14844)
Reported by: russell
Patches:
     masq_deadlock_trunk.diff uploaded by dvossel (license 671)
Tested by: dvossel, atis

Review: https://reviewboard.asterisk.org/r/387/


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=222761

By: Digium Subversion (svnbot) 2009-10-08 15:11:33

Repository: asterisk
Revision: 222761

U   trunk/channels/chan_misdn.c
U   trunk/channels/chan_sip.c
U   trunk/include/asterisk/channel.h
U   trunk/main/channel.c
U   trunk/main/features.c
U   trunk/main/pbx.c

------------------------------------------------------------------------
r222761 | dvossel | 2009-10-07 17:58:38 -0500 (Wed, 07 Oct 2009) | 35 lines

Deadlock in channel masquerade handling

Channels are stored in an ao2_container.  When accessing an item within
an ao2_container the proper locking order is to first lock the container,
and then the items within it.

In ast_do_masquerade both the clone and original channel must be locked
for the entire duration of the function.  The problem with this is that
it attemptes to unlink and link these channels back into the ao2_container
when one of the channel's name changes.  This is invalid locking order as
the process of unlinking and linking will lock the ao2_container while
the channels are locked!!! Now, both the channels in do_masquerade are
unlinked from the ao2_container and then locked for the entire function.
At the end of the function both channels are unlocked and linked back
into the container with their new names as hash values.

This new method of requiring all channels and tech pvts to be unlocked
before ast_do_masquerade() or ast_change_name() required several
changes throughout the code base.

(closes issue ASTERISK-14844)
Reported by: russell
Patches:
     masq_deadlock_trunk.diff uploaded by dvossel (license 671)
Tested by: dvossel, atis

(closes issue ASTERISK-14568)
Reported by: lmsteffan
Patches:
     deadlock_local_attended_transfers_trunk.diff uploaded by dvossel (license 671)
Tested by: lmsteffan, dvossel

Review: https://reviewboard.asterisk.org/r/387/


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=222761