Summary: | ASTERISK-29428: DTMF on progress results in infinite loop if progress followed by hangup received | ||
Reporter: | N A (InterLinked) | Labels: | |
Date Opened: | 2021-05-12 21:22:35 | Date Closed: | |
Priority: | Minor | Regression? | |
Status: | Open/New | Components: | Applications/app_dial |
Versions: | 18.3.0 | Frequency of Occurrence | Constant |
Related Issues: | |||
Environment: | Debian 10 | Attachments: | ( 0) debug.txt |
Description: | I've been trying to troubleshoot a weird issue I can reproduce consistently, which results in thousands of these spamming the console:
``` [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 ``` The frame type in question is an AST_FRAME_DTMF_END frame. I tested a few different ways, and what's happening is I'm trying to call a remote FXO line that appears to be down currently (HANGUPCAUSE = 20). Calling it directly, I just hear a reorder tone (since that's how it gets handled that particular way). In a different place, I have the D option configured to send DTMF on Progress, and the CLI confirms that's what it's trying to do: ``` [May 12 22:02:42] -- Channel Local/SIPATAxLA1@centrex-outgoing-00001d85;2 joined 'softmix' base-bridge <9f5d2e80-cdf9-433f-b84d-2494409ea318> [May 12 22:02:42] -- Local/SIPATAxLA1@centrex-outgoing-00001d85;1 requested media update control 20, passing it to Local/REDACTED@centrex-originate-local-00001d87;1 [May 12 22:02:42] -- Local/REDACTED@centrex-originate-local-00001d87;2 requested media update control 20, passing it to Local/FX1A1DC824@centrex-fx-00001d88;1 [May 12 22:02:42] -- Local/FX1A1DC824@centrex-fx-00001d88;2 requested media update control 20, passing it to IAX2/REDACTED:4569-13337 [May 12 22:02:42] -- Local/SIPATAxLA1@centrex-outgoing-00001d85;1 requested media update control 20, passing it to Local/REDACTED@centrex-originate-local-00001d87;1 [May 12 22:02:42] -- Local/REDACTED@centrex-originate-local-00001d87;2 requested media update control 20, passing it to Local/FX1A1DC824@centrex-fx-00001d88;1 [May 12 22:02:42] -- Local/FX1A1DC824@centrex-fx-00001d88;2 requested media update control 20, passing it to IAX2/REDACTED:4569-13337 [May 12 22:02:44] -- IAX2/REDACTED:4569-13337 is making progress passing it to Local/FX1A1DC824@centrex-fx-00001d88;2 [May 12 22:02:44] -- Local/FX1A1DC824@centrex-fx-00001d88;1 is making progress passing it to Local/REDACTED@centrex-originate-local-00001d87;2 [May 12 22:02:44] -- Sending DTMF 'REDACTED' to the called party as result of receiving a PROGRESS message. [May 12 22:02:44] -- Local/REDACTED@centrex-originate-local-00001d87;1 is making progress passing it to Local/SIPATAxLA1@centrex-outgoing-00001d85;1 [May 12 22:02:44] -- Hungup 'IAX2/redacted:4569-13337' [May 12 22:02:44] == Everyone is busy/congested at this time (1:0/0/1) [May 12 22:02:44] -- Executing [s@REDACTED-simple:4] Return("Local/FX1A1DC824@centrex-fx-00001d88;2", "20") in new stack [May 12 22:02:44] -- Executing [FX1A1DC824@centrex-fx:2] Hangup("Local/FX1A1DC824@centrex-fx-00001d88;2", "20") in new stack [May 12 22:02:44] == Spawn extension (centrex-fx, FX1A1DC824, 2) exited non-zero on 'Local/FX1A1DC824@centrex-fx-00001d88;2' [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 [2021-05-12 22:02:44] WARNING[1817][C-00001ec9]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 ``` That goes on forever, hundreds of times per second, until I manually hang up the call. All the while, I don't hear anything in the call itself and dialplan execution doesn't continue until I hang up. There seems to be some kind of bug here with not accounting for something here but I can't quite put my finger on it... however, this does happen every single time. It's not able to send the DTMF, but this basically locks up the call afterwards for perpetuity because of that. It's worth mentioning the end call that is being made (via IAX2), is to another Asterisk machine over the Internet. That is to say, not my dialplan. The FXO gateway is connected to that Asterisk switch. It's possible that switch is introducing a Progress() it shouldn't be that's causing this to happen, but it seems Asterisk should be able to handle that. | ||
Comments: | By: Asterisk Team (asteriskteam) 2021-05-12 21:22:36.553-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur. Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/]. By: Joshua C. Colp (jcolp) 2021-05-13 03:59:57.742-0500 We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis. Thanks! [1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information By: N A (InterLinked) 2021-05-13 07:47:51.146-0500 Attached is the debug. I cleared other calls from the system during this debug, and all I do is dial ## for Last Number Redial, which reproduces the issue. The warnings appear in the CLI printout but not in the debug. By: N A (InterLinked) 2021-05-13 21:11:43.190-0500 I did some further testing on this today and here is what I noticed: If I *remove* the Progress() on the remote end and replace it with a NoOp(), execution is as expected. There aren't any warnings, and dialplan execution continues immediately. The below will cause the above issues with the infinite barage of warning spam and frozen dialplan execution: [May 13 22:03:11] -- Executing [REDACTED@users:1] Progress("IAX2/REDACTED:4569-8334", "") in new stack [May 13 22:03:11] -- Executing [REDACTED@users:2] SIPAddHeader("IAX2/REDACTED:4569-8334", "Request-Line:7") in new stack [May 13 22:03:11] -- Executing [REDACTED@users:3] Dial("IAX2/REDACTED:4569-8334", "SIP/91,10") in new stack [May 13 22:03:11] WARNING[32382][C-00000843]: app_dial.c:2527 dial_exec_full: Unable to create channel of type 'SIP' (cause 20 - Subscriber absent) [May 13 22:03:11] == Everyone is busy/congested at this time (1:0/0/1) [May 13 22:03:11] -- Executing [REDACTED@users:4] Hangup("IAX2/REDACTED:4569-8334", "") in new stack This, however, will not: [May 13 22:06:12] -- Executing [REDACTED@users:1] NoOp("IAX2/REDACTED:4569-1345", "") in new stack [May 13 22:06:12] -- Executing [REDACTED@users:2] SIPAddHeader("IAX2/REDACTED:4569-1345", "Request-Line:7") in new stack [May 13 22:06:12] -- Executing [REDACTED@users:3] Dial("IAX2/REDACTED:4569-1345", "SIP/91,10") in new stack [May 13 22:06:12] WARNING[32470][C-00000844]: app_dial.c:2527 dial_exec_full: Unable to create channel of type 'SIP' (cause 20 - Subscriber absent) [May 13 22:06:12] == Everyone is busy/congested at this time (1:0/0/1) [May 13 22:06:12] -- Executing [REDACTED@users:4] Hangup("IAX2/REDACTED:4569-1345", "") in new stack It thus appears that if you have two Asterisk nodes, Nodes A and Node B, and Node A initiates a call to Node B which provides Progress() but immediately hangs up because it's unable to do what it wanted to do, this can cause serious issues with the call on Node A's side, since Node A now will want to send the DTMF on Progress, but the call is toast by the time it's doing that. By: George Joseph (gjoseph) 2021-05-14 09:10:15.849-0500 Is this issue IAX2 specific or does it happen with SIP calls as well? Also, do yo plan to work this issue yourself? By: N A (InterLinked) 2021-05-15 08:01:34.729-0500 Yes, this happens with SIP, too, so it appears to be technology-agnostic. I personally have no clue what could be causing this so I don't think I'd be qualified to work on this. I played around, and this is all that is necessary to cause this issue: exten => 123,1,Progress() same => n,Hangup() If you call an extension on your switch *OR* a different switch (doesn't matter), and the Dial() to it has the D progress option set (e.g. Dial(something,,D(::12345))) and you hit something like that, or that effectively does that (progress immediately followed by disconnect), then you can say goodbye to your call. Now it's an endless deluge of: WARNING[7009][C-0000085e]: app_dial.c:1798 wait_for_answer: Unable to forward frametype: 2 until you manually hang up the call. From what I can tell, there is no way to avoid this issue - obviously, it does not happen if the D(::something) option isn't used, because then the progress doesn't make it want to send DTMF, but I think the problem we have here is that app_dial wants to send DTMF as soon as it gets the Progress, and it doesn't account for scenarios where the channel might have hung up before it could successfully send all the DTMF. As a result, instead of failing with perhaps a single error "Unable to send DTMF", and moving on in the dialplan, it gets stuck in an endless loop of trying to send DTMF that it will never be able to successfully send - because the channel is gone. I actually think this is more serious now than I initially thought, because it is quite trivial for some other switch to easily start crashing calls on other Asterisk switches - perhaps completely inadvertently. It is not unreasonable to think that this could happen easily by coincidence. In fact, that's how I stumbled upon this. Hope that helps! Do you need any more info from me? By: George Joseph (gjoseph) 2021-05-17 07:31:03.063-0500 | Hope that helps! Do you need any more info from me? Nope. I can reproduce the issue. By: N A (InterLinked) 2021-05-21 10:41:32.621-0500 Based on my testing, it appears that the following condition is not getting evaluated to true (in app_dial): if (!o->chan || !ast_test_flag64(o, DIAL_STILLGOING)) { /* This outgoing channel has died so don't send the frame to it. */ continue; } This is despite the fact that the channel has hung up. Maybe there needs to be a more encompassing check? I'm not sure what else to test, seems like this ought to prevent the issue but it doesn't. I also wonder if the hangup is queued behind the DTMF, which might be blocking, locking it into something it can no longer complete? |