|Summary:||ASTERISK-18975: Manager Redirect action on bridged channel pair causes intermittent hangup on second channel|
|Reporter:||Ben Klang (bklang)||Labels:|
|Date Opened:||2011-12-06 17:24:46.000-0600||Date Closed:||2013-01-02 15:08:45.000-0600|
|Environment:||Attachments:||( 0) broken.log|
( 1) working.log
|Description:||We have an application where two channels are first bridged and then split back out, with the option of re-bridging. When the split occurs, we use an AMI Redirect action with both channels (Channel and ExtraChannel) filled out so both legs go somewhere in the dialplan. When this happens, approximately 50% of the time, the ExtraChannel will be hung up.|
Kevin Fleming was very helpful on IRC today working on the possible cause. His last comment on the issue was:
kpfleming: so... theorizing here: if the Redirect action is creating a new PBX thread for the second channel to use after it has been pulled out of the bridge, and somehow that thread ends up on the second channel before the original thread has caused the masquerade to occur, things will get very messy
I have attached two DEBUG logs illustrating the issue. In the first example, the app works as expected, where both channels are split and continue in the dialplan. In the second example, the second channel (SIP/grant-00000019) *should* be masqueraded, but is instead hung up.
This feels like a race condition where, somehow, the AST_FLAG_ZOMBIE flag is being set on the secondary channel when it should not be.
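For reference, the dual-channel Redirect described above might look roughly like the following on the wire. This is a minimal sketch, not the reporter's actual application: the helper name `build_redirect_action`, the first channel name, and the context/extension values are hypothetical. Only `SIP/grant-00000019` comes from the attached logs, and the header names are the standard AMI Redirect fields.

```python
def build_redirect_action(channel, extra_channel,
                          context, exten, priority,
                          extra_context, extra_exten, extra_priority):
    """Format an AMI Redirect action that sends both bridged legs to the dialplan."""
    headers = [
        ("Action", "Redirect"),
        ("Channel", channel),              # first leg of the bridge
        ("ExtraChannel", extra_channel),   # second leg, the one that gets hung up
        ("Context", context),
        ("Exten", exten),
        ("Priority", str(priority)),
        ("ExtraContext", extra_context),
        ("ExtraExten", extra_exten),
        ("ExtraPriority", str(extra_priority)),
    ]
    # AMI messages are "Key: Value" lines terminated by CRLF,
    # with an empty line marking the end of the action.
    return "".join(f"{key}: {value}\r\n" for key, value in headers) + "\r\n"


# Hypothetical values, for illustration only.
message = build_redirect_action(
    channel="SIP/example-00000018",
    extra_channel="SIP/grant-00000019",
    context="split-a", exten="s", priority=1,
    extra_context="split-b", extra_exten="s", extra_priority=1,
)
```

Sending `message` over an authenticated AMI socket would request that both legs leave the bridge and continue at their respective dialplan locations.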
|Comments:||By: Ben Klang (bklang) 2011-12-07 11:24:36.754-0600|
On further research, I found that adding a sleep(1) to main/pbx.c on line 8051 makes the bug consistently reproducible. This sleep goes into ast_async_goto() just before the channel masquerade occurs. My theory is that this sleep delays the masquerade long enough for the other thread to hang up the channel first, causing the masquerade to fail. I have not yet identified the other thread.
By: Ben Klang (bklang) 2011-12-07 17:04:39.230-0600
I think I now understand the cause. These two channels are originally connected by app_dial. When the Redirect occurs, both calls are sent to new locations in the dialplan. The race is caused by the cleanup behavior in app_dial. At line 2829 of apps/app_dial.c, we can see where the two channels are bridged. That function call returns when the Redirect occurs. Later, on line 2860, there is a check for the app_dial option OPT_CALLEE_GO_ON, which is not set. The else branch of that condition calls ast_hangup(peer), which is what ultimately kills our call.
I'm not entirely clear why the channel masquerade prevents the above call flow from happening, but I suspect that is the intent of the masquerade. When the masquerade is delayed and happens after app_dial completes, then the peer is hung up. If the masquerade beats app_dial to it, then the Redirect functions as expected.
However, I still don't know what the correct fix to the issue is.
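The ordering dependence described above can be illustrated with a toy model. This is only a sketch in Python, not Asterisk code: `Channel`, `masquerade`, `dial_cleanup`, and `run` are hypothetical stand-ins, and the rule that a completed masquerade shields the peer from the hangup is an assumption taken from the observed behavior, not from the real masquerade implementation.

```python
class Channel:
    """Toy stand-in for the peer (ExtraChannel) leg of the bridge."""

    def __init__(self, name):
        self.name = name
        self.hung_up = False
        self.redirected = False


def masquerade(peer):
    """Stand-in for the masquerade done by the Redirect; fails on a dead channel."""
    if peer.hung_up:
        return False
    peer.redirected = True
    return True


def dial_cleanup(peer, callee_go_on=False):
    """Stand-in for app_dial's post-bridge cleanup (the else branch hangs up)."""
    # Assumption: once the masquerade has happened, the hangup no longer
    # affects the live call. OPT_CALLEE_GO_ON is modeled as a plain boolean.
    if callee_go_on or peer.redirected:
        return
    peer.hung_up = True


def run(order):
    """Apply the two steps in the given order and return the resulting peer."""
    peer = Channel("SIP/grant-00000019")
    steps = {
        "masquerade": lambda: masquerade(peer),
        "cleanup": lambda: dial_cleanup(peer),
    }
    for step in order:
        steps[step]()
    return peer


working = run(["masquerade", "cleanup"])  # masquerade wins the race
broken = run(["cleanup", "masquerade"])   # app_dial cleanup wins the race
```

In this model `working` continues in the dialplan while `broken` ends up hung up, which mirrors the two attached logs. Which ordering occurs in the real system depends on thread scheduling.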
By: Ben Klang (bklang) 2011-12-07 17:14:19.840-0600
And just to confirm the issue: commenting out the call to ast_hangup(peer) on line 2876 of apps/app_dial.c makes the Redirect work as expected, though obviously it's not a real solution.
By: Maciej Krajewski (jamicque) 2012-06-20 03:43:18.989-0500
It might be the same problem as mine - ASTERISK-19985
By: trol (trol) 2012-07-12 12:40:47.021-0500
I am also having this issue on 18.104.22.168. I was using this feature on 1.2 for years without any problem. Redirect with ExtraChannel is the only way to send a bridged call to a MeetMe room, as far as I know.
Any development or workaround?
By: Matt Jordan (mjordan) 2012-08-13 09:25:39.862-0500
A potentially related issue was resolved in 22.214.171.124. You may want to try reproducing the problem using that version (or later) to see if your issue still occurs.
If it does, please note that in the comments here and I'll unlink the issue.
By: Jeremy Betts (freevoice) 2012-10-23 16:48:42.083-0500
This issue still exists in 126.96.36.199.
By: Richard Mudgett (rmudgett) 2012-12-12 14:26:01.623-0600
For those waiting for a fix: a patch is available on reviewboard: