Summary:ASTERISK-20492: Stuck DTMF when using ChannelRedirect to split a two channel bridge
Reporter:Jeremiah Gowdy (jgowdy)Labels:
Date Opened:2012-09-29 10:57:09Date Closed:2012-11-06 12:21:21.000-0600
Status:Closed/CompleteComponents:Channels/chan_bridge Core/Channels
Versions:10.7.1 10.8.0 10.9.0 Frequency of
Environment:OpenSuSE 11.4 x64 virtual instance on vSphere 4, CentOS 6.3 x64 on Dell M610 bladeAttachments:( 0) bridge_end_dtmf.patch.txt
( 1) bridge_end_dtmf-v2.patch
( 2) bridge_end_dtmf-v3.patch.txt
( 3) full-log.gz
( 4) jira_asterisk_20492_v1.8.patch
( 5) jira_asterisk_20492_v10.patch
Description:When two channels are bridged together and DTMF is used to indicate to our external application to end the bridge, our application executes ChannelRedirect on either one side or both (we tried both ways).  If ChannelRedirect happens after DTMF start but before DTMF end, it seems that the fact that we're in DTMF is copied to both channels as the bridge is dissolved, but the DTMF end is only exposed to one channel causing the other channel to continuously generate DTMF (dsp.c generating the tone in an endless loop).  The channel that gets stuck in the DTMF is always the channel that did not press the digit and thus doesn't get the DTMF end.  We have been able to reproduce this scenario with the DTMF generated by the caller or callee, either direction.  We only have DTMF handling enabled for the side of the call in which the key is pressed (we don't allow the other side to press keys), using the appropriate Bridge flags.  We have even seen a scenario in which the user presses the key that should cause the second channel to be hung up on, and we attempt to clear the second channel, but it seems so busy generating DTMF that the call is never ended.  So it seems to happen if a DTMF starts, a channel is cleared causing it to dissolve the bridge, and again the cleared channel failing to get DTMF end, leaving us stuck in continuous tone.  Issuing playback commands or pressing digits on the affected channel does not clear the continuous DTMF.

As stated in the above bug fields, we have tried this on 10.7.1, 10.8.0, and 10.9.0-rc, so the recent DTMF rewrite doesn't seem to cause nor remedy the situation.  This bug is basically a fatal show stopper for our Asterisk based application which we were hoping to launch for the end of Q3.  :(

We are working very hard to debug the issue and develop a patch, but any help would be greatly appreciated.

I will attach full logs and packet traces shortly.
Comments:By: Rusty Newton (rnewton) 2012-10-01 17:06:28.163-0500

Waiting on full logs, and packet traces from reporter.  Please try to include DTMF debug message types as well as VERBOSE and DEBUG (both set to at least level 5) in your full log. Thanks.

By: Jeremiah Gowdy (jgowdy) 2012-10-01 17:54:09.450-0500

Attached is the gzipped full log of the event reproduced.

By: Richard Mudgett (rmudgett) 2012-10-01 18:52:15.270-0500

FYI You are supposed to send the issue back when you have supplied the requested information.  Otherwise, your issue may fall off the radar waiting for you to supply information.

I sent it back for you.

By: Matt Jordan (mjordan) 2012-10-02 08:19:58.930-0500

I took a look at this last night and wrote some notes below on things you may want to look into to solve this.

When a DTMF tone is begun on a channel, the {{AST_FLAG_IN_DTMF}} flag is set on the channel the DTMF is being read from.  This doesn't help you a lot - because, if I'm reading your logs correctly, the problem is that the DTMF read from SIP/gateway1-00000001 was being relayed through onto SIP/gateway1-00000000, which is what continued on in the dialplan after the bridge was broken.  You could, however, mark a flag on the channel that the DTMF was being written onto using a similar flag (say, {{AST_FLAG_DTMF_BEING_WRITTEN}} (although that's really a terrible name, but you wouldn't want to repurpose {{AST_FLAG_IN_DTMF}} for this, as a channel can have DTMF read/written at the same time)).  You could, when a bridge is permanently broken, then check to see if DTMF was being relayed along the channels.  You would want to do something like this at the end of {{ast_channel_bridge}}:

if (!ast_test_flag(c0, AST_FLAG_DTMF_BEING_WRITTEN) && !ast_check_hangup(c0)) {
 /* Make DTMF END frame here */
 ast_queue_frame(c0, f);

/* And the same again for c1, since you don't know for sure who hung up */

The trick here is that we don't cache the digit value currently in flight, and you'll need that when you construct the dummy END frame.  It may actually be easier to cache the BEGIN frame on a channel when its transmitting the DTMF, and extract the values when you need them.  You'd need to cache the frame or values when a DTMF BEGIN is read on a channel, and clear it when an END is processed.  

If you go the route of caching the BEGIN frame on the channel, you wouldn't need a flag on the channel at all - you'd merely have to test for the presence of the frame on the channel when the bridge is broken.

By: Jeremiah Gowdy (jgowdy) 2012-10-03 18:07:00.653-0500

This is our patch that seems to fix the issue.  We were forced to go off into the masquerade code as well, since if bridgepeer is the first one to ChannelRedirect(), the channel that started the bridge is left with a zombie bridge partner causing the dtmf end to be ignored.

By: Richard Mudgett (rmudgett) 2012-10-04 13:52:04.234-0500

There are locking issues with the patch.  The bridgepeer pointer in ast_do_masquerade() is *only* valid while clonechan is locked.  See the ast_bridged_channel() function comment.  You can keep bridgepeer around by getting a reference to it before clonechan is unlocked.

Those guard checks in bridge_end_dtmf() are redundant because ast_write() does them for c1 and you only conditionally call it because c0->current_dtmf_digit is non-zero.

By: Jeremiah Gowdy (jgowdy) 2012-10-04 17:12:26.106-0500

How does this look?

By: Jeremiah Gowdy (jgowdy) 2012-10-04 18:13:38.178-0500


By: Richard Mudgett (rmudgett) 2012-10-04 19:45:55.585-0500

Aside from some minor coding guideline tweaks (Curly braces on if statements.), it looks ok.

By: Richard Mudgett (rmudgett) 2012-10-25 16:14:03.152-0500

[^jira_asterisk_20492_v1.8.patch] is the initial patch put on reviewboard.

By: Richard Mudgett (rmudgett) 2012-10-25 16:18:40.811-0500

I found some problems with [^bridge_end_dtmf-v3.patch.txt] when testing the patch.
The patch would erroneously send a simulated DTMF end event for any DTMF digit because ast_channel_bridge() is too early a place to call bridge_end_dtmf().  Those calls need to be made when ast_bridge_call() is exits the bridge loop.

By: Richard Mudgett (rmudgett) 2012-10-25 16:43:12.902-0500

[^jira_asterisk_20492_v10.patch] - for convienience