
Summary: ASTERISK-30477: res_srtp: ROC reset bugs
Reporter: Phil Lavin (phil-lavin)
Labels:
Date Opened: 2023-03-23 14:34:46
Date Closed: 2023-04-10 12:00:18
Priority: Minor
Regression?: No
Status: Closed/Complete
Components: Resources/res_srtp
Versions: 16.30.0
Frequency of Occurrence: Constant
Related Issues:
Environment: Debian 11
Attachments:
Description:

I think ASTERISK-29519 noticed the issue but didn't describe the root cause correctly or provide useful analysis.

There are two separate cases where Asterisk resets the rollover counter (ROC) on the SRTP stream it outputs. After the reset, the stream carries invalid auth tags and is rejected by remote parties that verify SRTP auth tags (e.g. Oracle SBCs).
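For context on why a ROC reset invalidates the tags: RFC 3711 defines the SRTP packet index as 2^16 * ROC + SEQ, and the auth tag is computed over the packet together with the ROC. A sender and receiver that disagree on the ROC therefore compute different tags. A minimal sketch of the index arithmetic follows; the function name is illustrative and is not taken from Asterisk or libsrtp.

```c
#include <stdint.h>

/* RFC 3711 packet index: the 32-bit rollover counter (ROC) extends the
 * 16-bit RTP sequence number into a 48-bit index.
 * Illustrative helper only, not a libsrtp or Asterisk function. */
uint64_t srtp_packet_index(uint32_t roc, uint16_t seq)
{
    return ((uint64_t)roc << 16) | seq;
}
```

With the same sequence number, ROC 0 and ROC 1 give indices that differ by 65536, which is why a receiver still at ROC 1 rejects packets from a sender that has fallen back to ROC 0.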

The first is [here|https://github.com/asterisk/asterisk/blob/1bd8694a6e428bf17fc28506de92c177b42981a0/main/rtp_engine.c#L2744].

When using chan_sip (I haven't tested pjsip), ast_rtp_instance_add_srtp_policy() is called when the remote party sends a re-INVITE which re-keys the SRTP stream (see [backtrace|#backtrace] below). ast_rtp_instance_add_srtp_policy() replaces the libsrtp resource, adding the remote stream at the same time. It then re-adds the local stream. If the local stream's sequence number has already rolled over by this point, its ROC is 1 (or higher); however, re-creating the libsrtp resource resets it to 0, so Asterisk outputs invalid auth tags from then on.
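The effect can be shown with a toy model of a sender's per-stream state. This is only a sketch: the struct and functions are hypothetical stand-ins, not libsrtp internals, and the wrap detection assumes packets are sent strictly in order.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-stream state kept for a sender. */
struct toy_stream {
    uint32_t roc;       /* rollover counter */
    uint16_t last_seq;  /* last sequence number sent */
    int started;        /* 0 until the first packet has been sent */
};

/* Account for one outgoing packet. Because a sender emits packets in
 * order, a sequence number lower than the previous one means a wrap. */
void toy_stream_send(struct toy_stream *s, uint16_t seq)
{
    if (s->started && seq < s->last_seq) {
        s->roc++;
    }
    s->last_seq = seq;
    s->started = 1;
}

/* Re-creating the stream, as happens when the whole session is
 * replaced, starts again from zeroed state and so forgets the ROC. */
void toy_stream_recreate(struct toy_stream *s)
{
    s->roc = 0;
    s->last_seq = 0;
    s->started = 0;
}
```

After a wrap the model holds ROC 1. After toy_stream_recreate() it holds ROC 0 again, while the remote party, which never reset its view of this stream, still expects ROC 1.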

The second is [here|https://github.com/asterisk/asterisk/blob/1bd8694a6e428bf17fc28506de92c177b42981a0/res/res_srtp.c#L424].

When Asterisk gets back an err_status_replay_old from libsrtp, it resets the libsrtp session and re-adds both the remote and local streams. As above, the local stream's ROC is reset to 0 at this point. This was initially discovered because the remote SBC (our friends Oracle, again) incorrectly reset the RTCP index on every re-INVITE. This triggers the replay_old error, which in turn triggers Asterisk to reset the stream.

I have automated tests to replicate both issues. In essence, they make a SIP call to Asterisk using SIPp, capture the media IP/port and crypto key from the 200 response, start a tcpdump, wait 23 minutes, re-key or send invalid RTCP (depending on which test), hang up the call, run the tcpdump capture through libsrtp's rtp_decoder and then analyse the auth tags. This issue can be replicated every time on Asterisk 16.
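The 23-minute wait is presumably there to guarantee that the 16-bit sequence number wraps at least once before the re-key. Assuming 20 ms packetisation, i.e. 50 packets per second (an assumption; the report does not state the packet rate), a full cycle of 65536 sequence numbers takes about 1311 seconds, or just under 22 minutes. A sketch of that arithmetic, with a hypothetical helper name:

```c
/* Seconds needed to send one full cycle of 16-bit RTP sequence numbers.
 * Since the initial sequence number is arbitrary, waiting this long
 * guarantees at least one wrap. Hypothetical helper for illustration. */
double seconds_until_seq_rollover(unsigned packets_per_second)
{
    return 65536.0 / (double)packets_per_second;
}
```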

I have a working patch for Asterisk 1.8 (don't ask!) which may help explain the issue but isn't entirely applicable to Asterisk 16. It's hacky in that it uses libsrtp "private" internals to preserve the index across sessions. libsrtp 2.1 added functions to get and set the ROC (https://github.com/cisco/libsrtp/pull/289), which makes it cleaner for Asterisk 16. I'm happy to create and test a patch for Asterisk 16 which uses the libsrtp 2.1 functions, if you think this is the right approach.
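The shape of the preserve-the-ROC approach can be sketched with a toy model: read the local stream's ROC before the session is replaced, then write it back afterwards. Everything below is a hypothetical stand-in so that the example is self-contained. In a real patch, toy_get_roc() and toy_set_roc() would be replaced by the libsrtp 2.1 getter and setter (believed to be srtp_get_stream_roc() and srtp_set_stream_roc(); check srtp.h for the exact names and signatures).

```c
#include <stdint.h>

/* Hypothetical stand-in for a libsrtp session holding one local stream. */
struct toy_session {
    uint32_t local_ssrc;
    uint32_t local_roc;
};

/* Stand-in for the ROC getter: 0 on success, -1 if the SSRC is unknown. */
int toy_get_roc(const struct toy_session *s, uint32_t ssrc, uint32_t *roc)
{
    if (ssrc != s->local_ssrc) {
        return -1;
    }
    *roc = s->local_roc;
    return 0;
}

/* Stand-in for the ROC setter: 0 on success, -1 if the SSRC is unknown. */
int toy_set_roc(struct toy_session *s, uint32_t ssrc, uint32_t roc)
{
    if (ssrc != s->local_ssrc) {
        return -1;
    }
    s->local_roc = roc;
    return 0;
}

/* Stand-in for tearing down and re-creating the session: ROC is lost. */
void toy_session_recreate(struct toy_session *s)
{
    s->local_roc = 0;
}

/* Re-create the session but carry the local stream's ROC across. */
int toy_rekey_preserving_roc(struct toy_session *s)
{
    uint32_t saved_roc = 0;

    if (toy_get_roc(s, s->local_ssrc, &saved_roc) != 0) {
        return -1;
    }
    toy_session_recreate(s);
    return toy_set_roc(s, s->local_ssrc, saved_roc);
}
```

The same save-and-restore step would apply to both reset paths described above, since each of them replaces the whole session.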

Another approach would be to ensure that Asterisk always changes the SSRC of its stream when it resets it, so that a ROC value of 0 would be valid. A further approach would be to stop using a "wildcard" SSRC for the remote stream in libsrtp and use an SSRC-specific stream instead. That stream could then be replaced without having to replace the whole libsrtp session.

Happy to take some guidance on the optimal solution then crack on with the patch.

I'm reasonably sure this bug applies all the way up to Asterisk 20 because the code is the same in the affected areas, though I haven't tested it.

{anchor:backtrace}
{noformat}
#0  0x00007fe37ecd2b78 in srtp_stream_init () from /lib/libsrtp.so.1
#1  0x00007fe37ecd4a6b in srtp_add_stream () from /lib/libsrtp.so.1
#2  0x00007fe37ecf961d in ast_srtp_add_stream (srtp=0x7fe34c07b4a0, policy=0x7fe34c054228) at res_srtp.c:509
#3  0x0000000000523927 in ast_rtp_instance_add_srtp_policy (instance=0x7fe34c0779b8, remote_policy=0x7fe34c103998, local_policy=0x7fe34c054228) at rtp_engine.c:1843
#4  0x00007fe37e9f6599 in sdp_crypto_activate (p=0x7fe34c07b260, suite_val=1, remote_key=0x7fe37e281270 "\345Q\020\351|\266s\215\202\334\350\315.\343Q\351ad\217\f\303\070\260\340\247>\265m\364\004", rtp=0x7fe34c0779b8) at sip/sdp_crypto.c:172
#5  0x00007fe37e9f6baa in sdp_crypto_process (p=0x7fe34c07b260, attr=0x7fe34c2f427b "crypto:1 AES_CM_128_HMAC_SHA1_80 inline:5VEQ6Xy2c42C3OjNLuNR6WFkjwzDOLDgpz61bfQE", rtp=0x7fe34c0779b8) at sip/sdp_crypto.c:278
#6  0x00007fe37e9f26d1 in process_crypto (p=0x7fe34c075308, rtp=0x7fe34c0779b8, srtp=0x7fe34c0767b0, a=0x7fe34c2f427b "crypto:1 AES_CM_128_HMAC_SHA1_80 inline:5VEQ6Xy2c42C3OjNLuNR6WFkjwzDOLDgpz61bfQE") at chan_sip.c:30593
#7  0x00007fe37e988062 in process_sdp (p=0x7fe34c075308, req=0x7fe37e286330, t38action=1) at chan_sip.c:9454
#8  0x00007fe37e9cd418 in handle_request_invite (p=0x7fe34c075308, req=0x7fe37e286330, debug=0, seqno=11, addr=0x7fe37e2862a0, recount=0x7fe37e28624c, e=0x7fe34c2f3e6f "sip:15555551234@172.19.149.244:5060", nounlock=0x7fe37e286248) at chan_sip.c:23478
#9  0x00007fe37e9d7f60 in handle_incoming (p=0x7fe34c075308, req=0x7fe37e286330, addr=0x7fe37e2862a0, recount=0x7fe37e28624c, nounlock=0x7fe37e286248) at chan_sip.c:26032
#10 0x00007fe37e9d8984 in handle_request_do (req=0x7fe37e286330, addr=0x7fe37e2862a0) at chan_sip.c:26218
#11 0x00007fe37e9d8599 in sipsock_read (id=0x2a75360, fd=10, events=1, ignore=0x0) at chan_sip.c:26151
#12 0x00000000004da609 in ast_io_wait (ioc=0x2a60fb0, howlong=649) at io.c:292
#13 0x00007fe37e9da2d4 in do_monitor (data=0x0) at chan_sip.c:26704
#14 0x000000000055e8d2 in dummy_start (data=0x2a75360) at utils.c:1173
#15 0x00007fe382068fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#16 0x00007fe38232b06f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
{noformat}
Comments:

By: Asterisk Team (asteriskteam) 2023-03-23 14:34:50.437-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Phil Lavin (phil-lavin) 2023-03-23 14:39:51.257-0500

Patch for 1.8 attached. As mentioned, not entirely applicable to Asterisk 16 but it gives an idea of the proposed fix

By: Joshua C. Colp (jcolp) 2023-03-23 14:43:25.286-0500

Just so you're aware, Asterisk 16 does not receive bug fixes. It is in security fix only status. Any changes will not be applied to it. Asterisk also supports older versions of libsrtp before 2, so any changes that are not backwards compatible would not be accepted.

Changing the SSRC is something that has been done in the past for other purposes, with variable results sometimes causing issues and sometimes not depending on the remote endpoint. I think if that were done it would need to be behind an option in case it breaks existing working SRTP sessions with endpoints.

I don't think there is a "best" option. There's just various options with tradeoffs, but trying the SSRC option may be the best to ensure compatibility across different versions of libsrtp - if it works.

By: Joshua C. Colp (jcolp) 2023-03-23 14:44:07.924-0500

The patch has been removed. Any attached patches must be marked as a code contribution with a contributor license agreement signed.

By: Phil Lavin (phil-lavin) 2023-03-23 14:56:12.447-0500

Thanks, Joshua. Understood re 16 - patches would apply to versions up to and including 20 but I'd need to spend some time getting that set up and the issue replicated.

Resetting SSRC, behind a feature flag, doesn't seem entirely ideal, given the issues you have highlighted with certain remote endpoints. Enabling the flag may fix the issue for some endpoints and make it worse for others.

The safest thing to do would be to preserve the ROC when we re-create the stream. This is entirely transparent to the remote endpoints.

I didn't realise latest Asterisk versions supported libsrtp1 - thanks for the info. Are you averse to accessing the private structures within libsrtp1? I.e. including srtp/srtp_priv.h. It's hacky but workable. If this isn't desirable, is it acceptable to only fix this for libsrtp >= 2.1 users (i.e. wrapping the code in #if blocks)? libsrtp2 has been around for some years so I'd be surprised if there were many users on latest Asterisk versions, running on an OS which doesn't provide v2.
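A rough sketch of the #if idea follows. HAVE_SRTP_GET_STREAM_ROC is a hypothetical macro name, standing in for whatever a configure-time check would define when the installed libsrtp provides the ROC getter and setter; it is not an existing Asterisk macro.

```c
/* HAVE_SRTP_GET_STREAM_ROC is a hypothetical macro that a configure
 * check would define when libsrtp exposes the ROC getter and setter. */
int roc_preservation_available(void)
{
#if defined(HAVE_SRTP_GET_STREAM_ROC)
    /* libsrtp >= 2.1: the ROC can be saved and restored. */
    return 1;
#else
    /* Older libsrtp: behaviour stays as it is today. */
    return 0;
#endif
}
```

Builds against older libsrtp would compile the fallback branch and keep the current behaviour, so the change stays backwards compatible.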

Let me know your thoughts, when you get a moment.

By: Joshua C. Colp (jcolp) 2023-03-23 15:00:09.026-0500

You'd be surprised, then. :D

I think it would be acceptable to fix it for >= 2.1 users, though someone would need to check the versions shipped across distros to see how widely available it is.

By: Phil Lavin (phil-lavin) 2023-03-23 15:03:49.891-0500

Thanks for the advice. I'll spend some time over the next couple of weeks getting it replicated on Asterisk 20 and I'll send over a patch. Also understood re contributor license agreements. My corporate overlords have some policies about that so give me some time to navigate those first.

By: Asterisk Team (asteriskteam) 2023-04-10 12:00:16.613-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines