Summary: ASTERISK-28827: res_rtp_asterisk: Loop when receive buffer is flushed by a received packet that is also in receive buffer with NACK
Reporter: nappsoft (nappsoft)
Labels: patch webrtc
Date Opened: 2020-04-14 06:54:15
Date Closed: 2020-04-17 06:08:03
Versions: 16.9.0
Attachments: patch2.diff
Description: 100% CPU usage was observed during WebRTC calls with clients on a bad internet connection. When this happened, Asterisk stopped sending any RTP packets and stayed in this state until the system ran out of memory (which happened after a few minutes). However, Asterisk was still processing incoming packets (even after no more RTP packets were arriving at the network level), as the following message could be seen repeatedly on the console:

== SRTP unprotect failed on SSRC 508868665 because of authentication failure 160

After looking into the code, I suspect that this happens for the following reason:

When a packet arrives while a packet with the same sequence number is already in the data_buffer (which can happen when the client retransmits a packet, or when there is an error in the sequence number), Asterisk ends up in an endless loop: the buffered packet with the same sequence number as the one currently being processed is never removed from the data_buffer, so the while condition stays true forever.
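To illustrate the invariant at stake, here is a hypothetical, simplified model in C. It is not the code in res_rtp_asterisk.c and not the attached patch2.diff; struct recv_buffer, buffer_put(), buffer_remove() and on_packet() are invented for this sketch and are not Asterisk's ast_data_buffer API. It shows two properties that rule out the loop described above: a packet whose sequence number is already buffered is refused rather than stored a second time, and the drain loop's condition is the removal itself, so every iteration shrinks the buffer and the loop must end. As a further simplification, the model drops a packet when the buffer is full instead of flushing it, so the flush path itself is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity for this model; not an Asterisk setting. */
#define RECV_SLOTS 8

/* Hypothetical receive buffer keyed by RTP sequence number.
 * This is NOT Asterisk's ast_data_buffer. */
struct recv_buffer {
    uint16_t seq[RECV_SLOTS];
    bool used[RECV_SLOTS];
    size_t count;
};

/* Returns the slot holding seq, or -1 if it is not buffered. */
static int slot_of(const struct recv_buffer *b, uint16_t seq)
{
    for (int i = 0; i < RECV_SLOTS; i++) {
        if (b->used[i] && b->seq[i] == seq) {
            return i;
        }
    }
    return -1;
}

/* Stores seq unless the buffer is full or seq is already present,
 * so one sequence number can never occupy two slots. */
static bool buffer_put(struct recv_buffer *b, uint16_t seq)
{
    if (b->count == RECV_SLOTS || slot_of(b, seq) >= 0) {
        return false;
    }
    for (int i = 0; i < RECV_SLOTS; i++) {
        if (!b->used[i]) {
            b->used[i] = true;
            b->seq[i] = seq;
            b->count++;
            return true;
        }
    }
    return false; /* not reached: count < RECV_SLOTS guarantees a free slot */
}

/* Removes seq if present; returns whether anything was removed. */
static bool buffer_remove(struct recv_buffer *b, uint16_t seq)
{
    int i = slot_of(b, seq);
    if (i < 0) {
        return false;
    }
    assert(b->count > 0);
    b->used[i] = false;
    b->count--;
    return true;
}

/* Handles one incoming packet and returns how many packets it delivered
 * in order (the packet itself plus any buffered successors). */
static size_t on_packet(struct recv_buffer *b, uint16_t *expected, uint16_t seq)
{
    /* Signed 16-bit distance copes with sequence-number wraparound
     * (relies on the usual modular unsigned-to-signed conversion). */
    int16_t ahead = (int16_t)(uint16_t)(seq - *expected);
    size_t delivered = 0;

    if (ahead < 0) {
        return 0; /* already delivered (e.g. a late retransmission): drop */
    }
    if (ahead > 0) {
        /* Out of order: hold it. Duplicates are refused and, in this
         * simplified model, so is anything arriving when the buffer is full. */
        buffer_put(b, seq);
        return 0;
    }

    delivered++;   /* deliver the expected packet itself */
    (*expected)++; /* uint16_t wraps from 65535 to 0 */

    /* The loop condition is the removal: each pass deletes the entry it
     * delivers, so b->count strictly decreases and the loop terminates. */
    while (buffer_remove(b, *expected)) {
        delivered++;
        (*expected)++;
    }
    return delivered;
}
```

For example, starting with an expected sequence number of 0, packets 1, 2, a retransmitted 2, and 3 arrive out of order; the duplicate is refused, leaving three buffered entries. When packet 0 finally arrives, on_packet() delivers packets 0 through 3 in a single call and leaves the buffer empty.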

I've attached a patch that should fix this issue (assuming I've identified the right cause).

In the end, this means that a single changeset introduced three severe bugs, which one really wouldn't expect from an LTS release (https://github.com/asterisk/asterisk/commit/f295af447db656d14218f7b61c4bd7bd78d0b194).
Comments:

By: Asterisk Team (asteriskteam) 2020-04-14 06:54:16.061-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the Asterisk Issue Guidelines (https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines) if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the Patch Contribution Process (https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process).

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Joshua C. Colp (jcolp) 2020-04-14 07:00:10.849-0500

Unfortunately, even with testing, it's sometimes possible for such things to occur. We do our best to prevent it, and we even do release candidates that people can try before the actual release, to elicit as much input as we can. I urge everyone who can to try release candidates, so that issues like this are found and fixed sooner.

By: Joshua C. Colp (jcolp) 2020-04-14 07:28:54.140-0500

Do you plan on putting this change up for code review and inclusion?

By: Joshua C. Colp (jcolp) 2020-04-14 07:34:07.191-0500

As well, as I said, I strongly urge you to test release candidates. The code in question was tested by us daily for over a week, went through code review, and went through release candidates. Any help testing before a release occurs helps to catch issues like this.

By: Joshua C. Colp (jcolp) 2020-04-14 07:38:40.700-0500

Assigning to you since you'll be putting it up for review.

By: nappsoft (nappsoft) 2020-04-14 08:07:04.087-0500

About testing: that's what we usually do; see ASTERISK-28659.

By: Friendly Automation (friendly-automation) 2020-04-17 06:08:05.261-0500

Change 14208 merged by Joshua Colp:
res_rtp_asterisk: Resolve loop when receive buffer is flushed


By: Friendly Automation (friendly-automation) 2020-04-17 06:08:29.638-0500

Change 14237 merged by Joshua Colp:
res_rtp_asterisk: Resolve loop when receive buffer is flushed


By: Friendly Automation (friendly-automation) 2020-04-17 06:10:12.615-0500

Change 14235 merged by Friendly Automation:
res_rtp_asterisk: Resolve loop when receive buffer is flushed


By: Friendly Automation (friendly-automation) 2020-04-17 06:12:43.894-0500

Change 14236 merged by Joshua Colp:
res_rtp_asterisk: Resolve loop when receive buffer is flushed