[Home]

Summary:ASTERISK-13710: [patch] Race condition between bridge and channel masquerading
Reporter:Guillermo Winkler (guillecabeza)Labels:
Date Opened:2009-03-08 16:07:07Date Closed:2009-03-11 17:01:47
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Core/Channels
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) channel.c.179741._issue_14623.patch
( 1) channel.c.179741.issue14623.patch2
( 2) channel.c.patch
( 3) issue_14623_1.4.diff.txt
Description:The bridge checks for zombieness

/* Stop if we're a zombie or need a soft hangup */
if (ast_test_flag(c0, AST_FLAG_ZOMBIE) || ast_check_hangup_locked(c0) ||
   ast_test_flag(c1, AST_FLAG_ZOMBIE) || ast_check_hangup_locked(c1)) {

To see if some of the channels is going through a masquerading, but non atomically, access to channel member variables happens a few lines later

if (!ast_strlen_zero(pbx_builtin_getvar_helper(c0, "BRIDGEPEER")))
    pbx_builtin_setvar_helper(c0, "BRIDGEPEER", c1->name);

Without locking on those acceses. That causes random crashes when the memory is touched or free'd later.


****** ADDITIONAL INFORMATION ******

The channels should be locked before touching ->name and using getvar/setvar helpers.

There is some complexity regarding locking both channels in order to leave the code as it is cause it should be tone using trylock in order to avoid deadlocks, the other alternative is to move all the accesses to variables and locking the channels in order.

If I'm told which alternative is preferable I'll submit that patch.
Comments:By: Guillermo Winkler (guillecabeza) 2009-03-08 16:09:19

Same problem occurs in 1.2/1.4/1.6

By: Tilghman Lesher (tilghman) 2009-03-08 20:33:07

When you're reporting a crash, we'd really prefer if you could submit a stack backtrace, as detailed in doc/backtrace.txt, along with the report.  If you'd like to submit a patch as well, that would be fine.

In terms of locking order, the code does not currently enforce a particular order.  However, the stated policy for locking order between channels is to lock the lowest memory address first.  The way it's currently implemented in other places, though, is with trylock deadlock avoidance.  You're welcome to implement either way, as we are not yet enforcing the stated locking order.

By: Guillermo Winkler (guillecabeza) 2009-03-08 20:41:29

tilghman, I'll post a backtrace as soon as I have another one. Anyway, I think the problem is clear from dead-listing analysis.

Will post the smallest footprint patch.

Thanks

By: Russell Bryant (russell) 2009-03-09 07:17:59

Tilghman, where is that stated as the policy for locking order?  I don't know of a single place in the code that does that.  If it does, it's wrong, as that is not sufficient to avoid deadlocks because of the use of recursive locks.

By: Russell Bryant (russell) 2009-03-09 07:27:43

guillecabeza, As you stated, there are some locking complexities to deal with here.  I'm going to write up a candidate patch and post it here.  I hope you don't mind.

By: Russell Bryant (russell) 2009-03-09 07:31:50

I have uploaded a candidate patch for 1.4.  The changes for trunk would be pretty much the same.  Let me know what you think.

By: Tilghman Lesher (tilghman) 2009-03-09 11:57:42

russell:  in that case, I am incorrect.  :-)

By: Guillermo Winkler (guillecabeza) 2009-03-09 11:58:57

I was thinking about using the approach in ast_channel_masquerade

ast_mutex_lock(&c0->lock);
while(ast_mutex_trylock(&c1->lock)) {
ast_mutex_unlock(&c0->lock);
usleep(1);
ast_mutex_lock(&c0->lock);
}

and unlocking in the same places you're calling update_bridgepeer.

By: Guillermo Winkler (guillecabeza) 2009-03-09 13:35:18

mm, maybe the check hangup should be changed, check_locked not needed...

By: Guillermo Winkler (guillecabeza) 2009-03-09 18:33:58

I think potentially your patch is better (I really don't like usleep's dangling around that much). But for consistency I think everywhere should use the same strategy when faced with the same problem, that would mean changing ast_channel_masquerade also...

my 2 cents.

By: Russell Bryant (russell) 2009-03-10 08:39:05

Thanks for the feedback, guillecabeza.

I don't think that ast_channel_masquerade() is quite the same situation.  In this case, we can pretty easily avoid having to hold a lock on 2 channels at the same time.  That lets us avoid the nasty trylock/usleep loop.  In the case of ast_channel_masquerade() and a number of other places in the code, it would be much more difficult to avoid holding 2 channel locks, so we have the nasty deadlock avoidance attempt to get the locks.

I'll go ahead and commit this patch today.  Thanks for your work on this issue!

By: Digium Subversion (svnbot) 2009-03-11 16:42:59

Repository: asterisk
Revision: 181423

U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r181423 | russell | 2009-03-11 16:42:59 -0500 (Wed, 11 Mar 2009) | 9 lines

Make code that updates BRIDGEPEER variable thread-safe.

It is not safe to read the name field of an ast_channel without the channel
locked.  This patch fixes some places in channel.c where this was being done,
and lead to crashes related to masquerades.

(closes issue ASTERISK-13710)
Reported by: guillecabeza

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=181423

By: Digium Subversion (svnbot) 2009-03-11 16:49:30

Repository: asterisk
Revision: 181424

_U  trunk/
U   trunk/main/channel.c

------------------------------------------------------------------------
r181424 | russell | 2009-03-11 16:49:30 -0500 (Wed, 11 Mar 2009) | 17 lines

Merged revisions 181423 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r181423 | russell | 2009-03-11 16:42:58 -0500 (Wed, 11 Mar 2009) | 9 lines

Make code that updates BRIDGEPEER variable thread-safe.

It is not safe to read the name field of an ast_channel without the channel
locked.  This patch fixes some places in channel.c where this was being done,
and lead to crashes related to masquerades.

(closes issue ASTERISK-13710)
Reported by: guillecabeza

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=181424

By: Digium Subversion (svnbot) 2009-03-11 16:55:25

Repository: asterisk
Revision: 181425

_U  branches/1.6.0/
U   branches/1.6.0/main/channel.c

------------------------------------------------------------------------
r181425 | russell | 2009-03-11 16:55:25 -0500 (Wed, 11 Mar 2009) | 25 lines

Merged revisions 181424 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r181424 | russell | 2009-03-11 16:49:29 -0500 (Wed, 11 Mar 2009) | 17 lines

Merged revisions 181423 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r181423 | russell | 2009-03-11 16:42:58 -0500 (Wed, 11 Mar 2009) | 9 lines

Make code that updates BRIDGEPEER variable thread-safe.

It is not safe to read the name field of an ast_channel without the channel
locked.  This patch fixes some places in channel.c where this was being done,
and lead to crashes related to masquerades.

(closes issue ASTERISK-13710)
Reported by: guillecabeza

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=181425

By: Digium Subversion (svnbot) 2009-03-11 17:01:46

Repository: asterisk
Revision: 181426

_U  branches/1.6.1/
U   branches/1.6.1/main/channel.c

------------------------------------------------------------------------
r181426 | russell | 2009-03-11 17:01:46 -0500 (Wed, 11 Mar 2009) | 25 lines

Merged revisions 181424 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r181424 | russell | 2009-03-11 16:49:29 -0500 (Wed, 11 Mar 2009) | 17 lines

Merged revisions 181423 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r181423 | russell | 2009-03-11 16:42:58 -0500 (Wed, 11 Mar 2009) | 9 lines

Make code that updates BRIDGEPEER variable thread-safe.

It is not safe to read the name field of an ast_channel without the channel
locked.  This patch fixes some places in channel.c where this was being done,
and lead to crashes related to masquerades.

(closes issue ASTERISK-13710)
Reported by: guillecabeza

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=181426