Summary: | ASTERISK-28139: RTP Stream Incorrect Payload Type Causes Asterisk To Drop Calls | ||||||
Reporter: | Paul Brooks (javauser) | Labels: | pjsip | ||||
Date Opened: | 2018-10-29 17:26:30 | Date Closed: | 2020-01-24 10:03:23.000-0600 | ||||
Priority: | Major | Regression? | No | ||||
Status: | Closed/Complete | Components: | Resources/res_pjsip | ||||
Versions: | 16.0.0 | Frequency of Occurrence | Frequent | ||||
Related Issues: |
| ||||||
Environment: | Debian 9.5 | Attachments: | |||||
Description: | I represent a company offering hosted PBX service to various small businesses. We use Asterisk as our host. Last week we upgraded our Asterisk 14.6.2 server to the new Asterisk 16.0.0. We began to see many calls dropping upon answer. After a thorough investigation we had to roll back to the older version of Asterisk to alleviate the problem. Hopefully this issue can be recognized as a bug and corrected in a near-future version of Asterisk 16 so that we can attempt to upgrade to Asterisk 16 once again.
The cause of the dropped calls turned out to be Asterisk's behavior in handling packet(s) received with the wrong payload type in the audio stream. One of our underlying carriers is Verizon Wholesale, and they deliver calls to us originated as either toll free or local dialed. Our dropped calls always involved Verizon's channels so we opened a ticket with them. Their support team helped us pinpoint the source of the dropped call problem. Our interconnection to Verizon is somewhat complex: they deliver calls to us over any one of 14 endpoints (in some seemingly random, quasi-round robin way). Every time a call is dropped by Asterisk 16, Asterisk logs errors similar to this, WARNING[11710][C-0000185e]: translate.c:485 ast_translator_build_path: No translator path: (ending codec is not valid) WARNING[11710][C-0000185e]: translate.c:485 ast_translator_build_path: No translator path: (starting codec is not valid) WARNING[11723][C-0000185e]: channel.c:5578 set_format: Unable to find a codec translation path: (g729) -> (ulaw) Our Asterisk is configured with all endpoints allowing ulaw only. Verizon's INVITE messages always offer media as 18 0 8 101 (g729, ulaw, alaw). Examination of SIP messages related to captured dropped calls between Verizon and our Asterisk shows normal negotiation. When the call is answered by Asterisk, it sends Verizon an answer message specifying type 0 for the media (ulaw/g711). But in a high percentage of the calls, Verizon will send a single g729 packet at the front of the resulting g711 stream. Verizon admitted to us that this is an old bug present in some of their DSP cards. They assert that they have an open case with their vendor involved but don't expect a fix anytime soon. Further, they point out that, "As a SIP wholesale connection, [Verizon will] interface [us] not only with Verizon PSTN gateways, but other wholesalers, retail customers, DSCIP carriers etc . . . [Verizon] cannot police the integrity of every RTP stream sent to [our] endpoint." Verizon pointed out to us further that RFC 3550 is pretty clear on the handling of incorrect payload types: "A receiver MUST ignore packets with payload types that it does not understand." So, from my perspective, assuming the RFC does say that Asterisk should be ignoring rtp packets containing the incorrect payload type, I must ask, why is Asterisk dropping the call when it encounters such a packet? It must be a bug. So I am reporting it as such. I must emphasize here that our previous versions of Asterisk that ran the same Verizon endpoints did not experience this dropped call issue. My hunch is that this undesirable behavior made its way into Asterisk beginning with version 15 when its framework for handling audio streams was significantly modified. It is well documented that prior to version 15, Asterisk had a "very loose" stream implementation that used a single pipe, and beginning with version 15, the framework was tightened up to allow for multiple streams for a single call. Here are a couple references on this topic: https://wiki.asterisk.org/wiki/display/AST/New+in+15, https://blogs.asterisk.org/2017/09/20/asterisk-15-multi-stream-media-sfu/ I suspect now that Asterisk 15,16 allows multiple streams for a single call, it is being confused into thinking there is a simultaneous g729 stream alongside the g711 stream when it encounters the invalid g729 packet at the beginning of Verizon's g711 stream (I performed a packet capture on a live dropped call and indeed I saw the g729 packet at the beginning of the g711 stream). I believe Asterisk's implementation of this multi-stream concept needs to somehow be tweaked to recognize when a call has negotiated only a single media stream and therefore be prepared to discard any invalid packets that pop up into a live media stream. It appears that Asterisk 14 and prior had no trouble ignoring such invalid packets. Asterisk's 16 behavior of abruptly hanging up a call whenever it encounters an invalid media packets seems untenable. | ||||||
Comments: | By: Asterisk Team (asteriskteam) 2018-10-29 17:26:31.918-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: George Joseph (gjoseph) 2018-10-30 07:52:51.740-0500 Hi Paul, We're having a hard time understanding how this could even happen. Once the negotiation is complete the only effect RTP should have is no, or one way audio. We already drop unrecognized RTP packets. This isn't to say there isn't an Asterisk bug but we need more info. Can you get us a PCAP of a bad call? Were there any other messages in the log for the dropped calls? By: Michael L. Young (elguero) 2018-10-30 09:09:12.226-0500 I am wondering if this is the same issue that we were having... we were able to solve it by setting in pjsip.conf: asymmetric_rtp_codec=yes Our configuration only allows g711 as well. From my investigation, the codec is negotiated properly but for whatever reason, the media server (only one was doing this, not sure if it was Verizon since the provider we use has multiple providers it uses for the media) would send at the very beginning of the call being answered a packet as g729. When Asterisk received this, it automatically tried to change to that codec. Since we do not have any g729 licenses, the call would fail because it could not use g729. To bypass that code which switches to the packet that it received, turning on asymmetric_rtp_codec is skipping the entire if conditional statement and we proceed without any issues. https://github.com/asterisk/asterisk/blob/fcd8e9d6ef6cad66ba3bfe58540aa1cdb372fc2c/channels/chan_pjsip.c#L834 {code:title=chan_pjsip.c} /* * Asymmetric RTP only has one native format set at a time. * Therefore we need to update the native format to the current * raw read format BEFORE the native format check */ if (!session->endpoint->asymmetric_rtp_codec && ast_format_cmp(ast_channel_rawwriteformat(ast), f->subclass.format) == AST_FORMAT_CMP_NOT_EQUAL) { struct ast_format_cap *caps; /* For maximum compatibility we ensure that the formats match that of the received media */ ast_debug(1, "Oooh, got a frame with format of %s on channel '%s' when we're sending '%s', switching to match\n", ast_format_get_name(f->subclass.format), ast_channel_name(ast), ast_format_get_name(ast_channel_rawwriteformat(ast))); caps = ast_format_cap_alloc(AST_FORMAT_CAP_FLAG_DEFAULT); if (caps) { ast_format_cap_append_from_cap(caps, ast_channel_nativeformats(ast), AST_MEDIA_TYPE_UNKNOWN); ast_format_cap_remove_by_type(caps, AST_MEDIA_TYPE_AUDIO); ast_format_cap_append(caps, f->subclass.format, 0); ast_channel_nativeformats_set(ast, caps); ao2_ref(caps, -1); } ast_set_write_format_path(ast, ast_channel_writeformat(ast), f->subclass.format); ast_set_read_format_path(ast, ast_channel_readformat(ast), f->subclass.format); if (ast_channel_is_bridged(ast)) { ast_channel_set_unbridged_nolock(ast, 1); } } {code} I haven't had time yet to look further into the code and see if this is by design or if it is actually a bug or rather error in the logic. One more tid bit, we are on 13.21.1 still on the machine that had this issue. By: Paul Brooks (javauser) 2018-10-30 12:44:41.938-0500 Michael, thank you for sharing your similar experience. I might want to tweak that asymmetric_rtp_codec setting in a test Asterisk 16 setup to see if it does resolve the issue we were having. I just peaked at the running settings for both our Asterisk 14 and Asterisk 16 versions, and the asymmetric_rtp_codec setting is turned off on our endpoints in both versions. Coincidentally, in troubleshooting our issue I noticed the asymmetric_rtp_codec setting and thought about trying it, but after seeing it described as allowing simultaneous different codecs for send/receive, I remember thinking it was the opposite of what I needed and looked past it. Your testimonial gives me hope that there might be an immediate solution on the horizon. However, if the asymmetric_rtp_codec setting does resolve our issue, I think it should only be temporary because it would be masking the real problem. George says Asterisk is supposed to already drop unrecognized rtp packets but in my case Asterisk is dropping the call, not the bad packet! George, I'll look into gathering a pcap related to a bad call for you. It might take a little time because I musk as Verizon to provide their SIP pcap from their end to supplement the pcap I obtained with the rtp packets. Luckily in my experience they appear to pcap SIP activity for all their live traffic, so I shouldn't have difficulty getting it from them. The particular pcap I captured only shows our SIP replies as encrypted on the interface, which wouldn't be of much use to you. We can see Verizon's sip packets as both unencrypted and encrypted, but the outgoing packets only show the encrypted version. Once I obtain Verizon's pcap I'll forward it to you along with my pcap of the same call and the warning messaged logged by Asterisk for the call. In the meantime, George can you weigh in on your thoughts about Michael's suggested use of asymmetric_rtp_codec to alleviate my probem? Seeing how he experienced the same issue with version 13.21.1, it makes me rethink my theory about this bug being related to Asterisk 15's new stream framework. My version 14 predates his 13.21, so maybe this issue is related to some other change made to Asterisk during the past 12 months or so? One thing I noticed about Verizon's codec negotiation that seemed different compared to others I've seen is that fact that their offer isn't followed by individual rtpmap lines like I normally see. For example, I typically see multiple rtpmap lines following a media offer, like this m=audio 11998 RTP/AVP 0 18 8 101 a=rtpmap:0 PCMU/8000 a=rtpmap:18 G729/8000 a=fmtp:18 annexb=no a=rtpmap:8 PCMA/8000 a=rtpmap:101 telephone-event/8000 a=fmtp:101 0-15 a=sendrecv a=ptime:20 by contrast, Verizon's offers don't have those additional rtpmap lines. Theirs looks like this, m=audio 38852 RTP/AVP 18 0 8 101 a=rtpmap:101 telephone-event/8000 a=fmtp:101 0-15 a=ptime:20 a=fmtp:18 annexb=no Could this have anything to do with with what's tripping up Asterisk when the bad g729 packet arrives? By: Paul Brooks (javauser) 2018-10-30 13:08:22.893-0500 George, you asked about whether we saw other types of warning messages logged for dropped calls. Yes. Usually our incoming calls from Verizon ring to SIP phone(s) somewhere. But we do have a "tech line" that rings only to Asterisk which answers with a simple IVR. Asterisk answers, plays back a few messages, then allows limited DTMF input. Whenever I saw tech line calls fail, this is typical of what Asterisk was logging, [Oct 29 09:52:09] WARNING[21341][C-000029c0] translate.c: No translator path: (ending codec is not valid) [Oct 29 09:52:09] WARNING[21341][C-000029c0] translate.c: No translator path: (ending codec is not valid) [Oct 29 09:52:10] WARNING[21341][C-000029c0] channel.c: Unable to find a codec translation path: (slin) -> (g729) [Oct 29 09:52:10] WARNING[21341][C-000029c0] file.c: Unable to open techline-tadah (format (g729)): Function not implemented [Oct 29 09:52:10] WARNING[21341][C-000029c0] app_playback.c: Playback failed on PJSIP/verizon-ATF3-00005ada for techline-tadah [Oct 29 09:52:10] WARNING[21341][C-000029c0] channel.c: Unable to find a codec translation path: (slin) -> (g729) [Oct 29 09:52:10] WARNING[21341][C-000029c0] file.c: Unable to open techline-keypad-test-intro-short (format (g729)): Function not implemented [Oct 29 09:52:10] WARNING[21341][C-000029c0] app_playback.c: Playback failed on PJSIP/verizon-ATF3-00005ada for techline-keypad-test-intro-short [Oct 29 09:52:10] WARNING[21341][C-000029c0] channel.c: Unable to find a codec translation path: (slin) -> (g729) [Oct 29 09:52:10] WARNING[21341][C-000029c0] file.c: Unable to open beep (format (g729)): Function not implemented By: George Joseph (gjoseph) 2018-10-30 15:02:53.011-0500 I've finally been able to reproduce this issue! [~elguero] is correct that asymmetric_rtp_codec=yes is a workaround. I just tested it. It IS a bug though as we should not be allowing that g729 frame to get out of the rtp or channel driver layers in the first place. By: Paul Brooks (javauser) 2018-10-30 15:19:56.047-0500 I'll assume this means you won't need my pcap after all. By: dank (dank) 2019-09-11 18:22:13.299-0500 +1 for this.. Thanks for the hard work into this Paul. Workaround fixed it for me as well. For me it was: Using 16.5.0 VSP is Internode in Australia Only had issues with inbound mobile calls from Telstra/Optus.. May have been others. https://community.asterisk.org/t/16-5-codec-inbound-issue-from-mobile/80800 ** channel.c:5589 set_format: Unable to find a codec translation path: (g729) -> (alaw) ** By: Friendly Automation (friendly-automation) 2020-01-24 10:03:23.958-0600 Change 13503 merged by Friendly Automation: chan_pjsip: Ignore RTP that we haven't negotiated [https://gerrit.asterisk.org/c/asterisk/+/13503|https://gerrit.asterisk.org/c/asterisk/+/13503] By: Friendly Automation (friendly-automation) 2020-01-27 06:05:49.385-0600 Change 13686 merged by Joshua Colp: chan_pjsip: Ignore RTP that we haven't negotiated [https://gerrit.asterisk.org/c/asterisk/+/13686|https://gerrit.asterisk.org/c/asterisk/+/13686] By: Friendly Automation (friendly-automation) 2020-01-27 06:06:08.541-0600 Change 13672 merged by Joshua Colp: chan_pjsip: Ignore RTP that we haven't negotiated [https://gerrit.asterisk.org/c/asterisk/+/13672|https://gerrit.asterisk.org/c/asterisk/+/13672] By: Friendly Automation (friendly-automation) 2020-01-27 06:06:23.280-0600 Change 13685 merged by Joshua Colp: chan_pjsip: Ignore RTP that we haven't negotiated [https://gerrit.asterisk.org/c/asterisk/+/13685|https://gerrit.asterisk.org/c/asterisk/+/13685] |