Summary: | ASTERISK-13710: [patch] Race condition between bridge and channel masquerading | ||
Reporter: | Guillermo Winkler (guillecabeza) | Labels: | |
Date Opened: | 2009-03-08 16:07:07 | Date Closed: | 2009-03-11 17:01:47 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/Channels |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) channel.c.179741._issue_14623.patch ( 1) channel.c.179741.issue14623.patch2 ( 2) channel.c.patch ( 3) issue_14623_1.4.diff.txt | |
Description: | The bridge checks for zombieness /* Stop if we're a zombie or need a soft hangup */ if (ast_test_flag(c0, AST_FLAG_ZOMBIE) || ast_check_hangup_locked(c0) || ast_test_flag(c1, AST_FLAG_ZOMBIE) || ast_check_hangup_locked(c1)) { To see if some of the channels is going through a masquerading, but non atomically, access to channel member variables happens a few lines later if (!ast_strlen_zero(pbx_builtin_getvar_helper(c0, "BRIDGEPEER"))) pbx_builtin_setvar_helper(c0, "BRIDGEPEER", c1->name); Without locking on those acceses. That causes random crashes when the memory is touched or free'd later. ****** ADDITIONAL INFORMATION ****** The channels should be locked before touching ->name and using getvar/setvar helpers. There is some complexity regarding locking both channels in order to leave the code as it is cause it should be tone using trylock in order to avoid deadlocks, the other alternative is to move all the accesses to variables and locking the channels in order. If I'm told which alternative is preferable I'll submit that patch. | ||
Comments: | By: Guillermo Winkler (guillecabeza) 2009-03-08 16:09:19 Same problem occurs in 1.2/1.4/1.6 By: Tilghman Lesher (tilghman) 2009-03-08 20:33:07 When you're reporting a crash, we'd really prefer if you could submit a stack backtrace, as detailed in doc/backtrace.txt, along with the report. If you'd like to submit a patch as well, that would be fine. In terms of locking order, the code does not currently enforce a particular order. However, the stated policy for locking order between channels is to lock the lowest memory address first. The way it's currently implemented in other places, though, is with trylock deadlock avoidance. You're welcome to implement either way, as we are not yet enforcing the stated locking order. By: Guillermo Winkler (guillecabeza) 2009-03-08 20:41:29 tilghman, I'll post a backtrace as soon as I have another one. Anyway, I think the problem is clear from dead-listing analysis. Will post the smallest footprint patch. Thanks By: Russell Bryant (russell) 2009-03-09 07:17:59 Tilghman, where is that stated as the policy for locking order? I don't know of a single place in the code that does that. If it does, it's wrong, as that is not sufficient to avoid deadlocks because of the use of recursive locks. By: Russell Bryant (russell) 2009-03-09 07:27:43 guillecabeza, As you stated, there are some locking complexities to deal with here. I'm going to write up a candidate patch and post it here. I hope you don't mind. By: Russell Bryant (russell) 2009-03-09 07:31:50 I have uploaded a candidate patch for 1.4. The changes for trunk would be pretty much the same. Let me know what you think. By: Tilghman Lesher (tilghman) 2009-03-09 11:57:42 russell: in that case, I am incorrect. :-) By: Guillermo Winkler (guillecabeza) 2009-03-09 11:58:57 I was thinking about using the approach in ast_channel_masquerade ast_mutex_lock(&c0->lock); while(ast_mutex_trylock(&c1->lock)) { ast_mutex_unlock(&c0->lock); usleep(1); ast_mutex_lock(&c0->lock); } and unlocking in the same places you're calling update_bridgepeer. By: Guillermo Winkler (guillecabeza) 2009-03-09 13:35:18 mm, maybe the check hangup should be changed, check_locked not needed... By: Guillermo Winkler (guillecabeza) 2009-03-09 18:33:58 I think potentially your patch is better (I really don't like usleep's dangling around that much). But for consistency I think everywhere should use the same strategy when faced with the same problem, that would mean changing ast_channel_masquerade also... my 2 cents. By: Russell Bryant (russell) 2009-03-10 08:39:05 Thanks for the feedback, guillecabeza. I don't think that ast_channel_masquerade() is quite the same situation. In this case, we can pretty easily avoid having to hold a lock on 2 channels at the same time. That lets us avoid the nasty trylock/usleep loop. In the case of ast_channel_masquerade() and a number of other places in the code, it would be much more difficult to avoid holding 2 channel locks, so we have the nasty deadlock avoidance attempt to get the locks. I'll go ahead and commit this patch today. Thanks for your work on this issue! By: Digium Subversion (svnbot) 2009-03-11 16:42:59 Repository: asterisk Revision: 181423 U branches/1.4/main/channel.c ------------------------------------------------------------------------ r181423 | russell | 2009-03-11 16:42:59 -0500 (Wed, 11 Mar 2009) | 9 lines Make code that updates BRIDGEPEER variable thread-safe. It is not safe to read the name field of an ast_channel without the channel locked. This patch fixes some places in channel.c where this was being done, and lead to crashes related to masquerades. (closes issue ASTERISK-13710) Reported by: guillecabeza ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=181423 By: Digium Subversion (svnbot) 2009-03-11 16:49:30 Repository: asterisk Revision: 181424 _U trunk/ U trunk/main/channel.c ------------------------------------------------------------------------ r181424 | russell | 2009-03-11 16:49:30 -0500 (Wed, 11 Mar 2009) | 17 lines Merged revisions 181423 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r181423 | russell | 2009-03-11 16:42:58 -0500 (Wed, 11 Mar 2009) | 9 lines Make code that updates BRIDGEPEER variable thread-safe. It is not safe to read the name field of an ast_channel without the channel locked. This patch fixes some places in channel.c where this was being done, and lead to crashes related to masquerades. (closes issue ASTERISK-13710) Reported by: guillecabeza ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=181424 By: Digium Subversion (svnbot) 2009-03-11 16:55:25 Repository: asterisk Revision: 181425 _U branches/1.6.0/ U branches/1.6.0/main/channel.c ------------------------------------------------------------------------ r181425 | russell | 2009-03-11 16:55:25 -0500 (Wed, 11 Mar 2009) | 25 lines Merged revisions 181424 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r181424 | russell | 2009-03-11 16:49:29 -0500 (Wed, 11 Mar 2009) | 17 lines Merged revisions 181423 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r181423 | russell | 2009-03-11 16:42:58 -0500 (Wed, 11 Mar 2009) | 9 lines Make code that updates BRIDGEPEER variable thread-safe. It is not safe to read the name field of an ast_channel without the channel locked. This patch fixes some places in channel.c where this was being done, and lead to crashes related to masquerades. (closes issue ASTERISK-13710) Reported by: guillecabeza ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=181425 By: Digium Subversion (svnbot) 2009-03-11 17:01:46 Repository: asterisk Revision: 181426 _U branches/1.6.1/ U branches/1.6.1/main/channel.c ------------------------------------------------------------------------ r181426 | russell | 2009-03-11 17:01:46 -0500 (Wed, 11 Mar 2009) | 25 lines Merged revisions 181424 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r181424 | russell | 2009-03-11 16:49:29 -0500 (Wed, 11 Mar 2009) | 17 lines Merged revisions 181423 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r181423 | russell | 2009-03-11 16:42:58 -0500 (Wed, 11 Mar 2009) | 9 lines Make code that updates BRIDGEPEER variable thread-safe. It is not safe to read the name field of an ast_channel without the channel locked. This patch fixes some places in channel.c where this was being done, and lead to crashes related to masquerades. (closes issue ASTERISK-13710) Reported by: guillecabeza ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=181426 |