Summary:ASTERISK-27646: ICE fails with no candidate nominated
Reporter:Thomas Guebels (tguescaux)Labels:patch
Date Opened:2018-01-31 10:34:21.000-0600Date Closed:2018-02-23 08:14:37.000-0600
Versions:13.18.0 Frequency of
Environment:Attachments:( 0) ASTERISK-27646.diff
( 1) C-0000003b.log
( 2) udp_C-0000003b.pcap
( 3) udp_C-0000031C_20180207.pcap
( 4) workaround_ice_set_role.diff
Description:Since the change made in ASTERISK-27088, we have intermittent ICE issues.

The ICE negotiation fails on the asterisk side with all checks succeeded but none nominated.

[2018-01-31 16:22:00.159361] DEBUG[4908][C-00000023] pjproject: icess0x7efe980 .ICE process complete, status=All ICE checklists failed (PJNATH_EICEFAILED)
[2018-01-31 16:22:00.159377] DEBUG[4908][C-00000023] pjproject: icess0x7efe980 .Valid list
[2018-01-31 16:22:00.159394] DEBUG[4908][C-00000023] pjproject: icess0x7efe980 . 0: [1]> (not nominated, state=Succeeded)
[2018-01-31 16:22:00.159414] DEBUG[4908][C-00000023] pjproject: icess0x7efe980 . 1: [1]> (not nominated, state=Succeeded)

On the browser (Chrome 64) side, it thinks that ICE succeeded but there is one-way voice and the call soon stops because asterisk doesn't reply to STUN pings anymore.

It seems to be timing dependent as it can only be reproduced with at least 2 network paths with similar latency. When one is noticeably slower, it doesn't happen.
Comments:By: Asterisk Team (asteriskteam) 2018-01-31 10:34:21.772-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: George Joseph (gjoseph) 2018-01-31 14:04:56.558-0600

[~jcolp] took a look at this and at first glance didn't think it was related to the earlier patch.  Can you get a wireshark capture of the exchange and can you also confirm that you're using the bundled version of pjproject?   I know you're using chan_sip but pjproject is used for ICE.

By: Thomas Guebels (tguescaux) 2018-02-01 03:22:18.404-0600

Reverting the changes made in ASTERISK-27088 makes the issue disappear for me. More precisely, since that patch has 2 parts, either setting the ICE role earlier as before or deactivating the other part about unidirectional ICE, fixes the issue.

I don't use the bundled version of pjproject. I can reproduce with an external pjproject 2.5.5 and even when upgrading to 2.7.1.

By: Thomas Guebels (tguescaux) 2018-02-01 03:23:14.306-0600

Added pcap of the failed ICE on asterisk side and corresponding debug log

By: Joshua C. Colp (jcolp) 2018-02-06 13:03:33.685-0600

I'm currently working on this and digging into things, would you have a working packet capture as well?

By: Thomas Guebels (tguescaux) 2018-02-07 03:46:14.395-0600

With the setup I currently use to reproduce, about 1 calls in 10 succeeds. I took a capture of that case (udp_C-0000031C_20180207.pcap). You can see a role conflict being solved in STUN.

By: Joshua C. Colp (jcolp) 2018-02-07 05:08:36.035-0600

The problem itself is in chan_sip, it isn't configuring ICE properly so it has the incorrect role. It was relying on the negotiation to sort itself out automatically.

By: Thomas Guebels (tguescaux) 2018-02-07 05:21:17.184-0600

From what we saw, chan_sip is calling ice->set_role in add_sdp or process_sdp, but with the ASTERISK-27088 change this doesn't actually set the role of ICE session anymore, it is "delayed" until ice->reset is called. Do you mean chan_sip should now reset ICE at some point ?

By: Joshua C. Colp (jcolp) 2018-02-07 05:24:44.439-0600

It's not calling it in add_sdp at the start of the call, which means the ICE session is in "controlled" state while it should be in the "controlling" state. I've attached a patch which may help, give it a try.

By: Thomas Guebels (tguescaux) 2018-02-08 04:32:34.868-0600

I tested the patch, it doesn't improve the situation. ice_start is now indeed called in add_sdp (actually we saw the SIP_RESPONSE condition earlier when trying to debug it and were confused by it) but this just stores it in rtp->role.

Later on, when we receive the SDP from the browser, process_sdp calls change_ice_components and start_ice and this triggers 2 ICE reset:

[2018-02-08 10:11:42.459202] DEBUG[25391][C-00000025] res_rtp_asterisk.c: Resetting ICE for RTP instance '0x7f1e5c030bd0'
[2018-02-08 10:11:42.459215] DEBUG[25391][C-00000025] res_rtp_asterisk.c: Nevermind. ICE isn't ready for a reset
[2018-02-08 10:11:42.459224] DEBUG[25391][C-00000025] res_rtp_asterisk.c: Set role to CONTROLLING (0x7f1e5c030bd0)
[2018-02-08 10:11:42.459240] DEBUG[25391][C-00000025] res_rtp_asterisk.c: Resetting ICE for RTP instance '0x7f1e5c030bd0'
[2018-02-08 10:11:42.459247] DEBUG[25391][C-00000025] res_rtp_asterisk.c: Nevermind. ICE isn't ready for a reset

However the role still isn't applied to the ICE session since now the state check doesn't pass. We are already nominating (note that this is apparently cause by the ICE aggressive nomination activated in pjnath).

By: Thomas Guebels (tguescaux) 2018-02-08 04:35:12.583-0600

Here is the workaround we are currently using, allowing to set the role on ICE start.

By: Joshua C. Colp (jcolp) 2018-02-08 05:50:52.743-0600

I think that change is safe and shouldn't cause a problem.

By: Joshua C. Colp (jcolp) 2018-02-08 12:21:51.295-0600

I've dug deeper into things and I think your fix is the right path forward for this. I'd suggest taking it through the contribution process[1] so it can be included, specifically putting it up for review. Is this something you would be willing to do?

[1] https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process

By: Thomas Guebels (tguescaux) 2018-02-09 03:41:08.507-0600

Yes, sure. I'm reading the process documentation right now.

By: Friendly Automation (friendly-automation) 2018-02-23 08:14:37.998-0600

Change 8262 merged by Jenkins2:
res_rtp_asterisk: Fix ICE candidate nomination


By: Friendly Automation (friendly-automation) 2018-02-23 08:21:52.017-0600

Change 8264 merged by Jenkins2:
res_rtp_asterisk: Fix ICE candidate nomination


By: Friendly Automation (friendly-automation) 2018-02-23 08:28:17.293-0600

Change 8265 merged by George Joseph:
res_rtp_asterisk: Fix ICE candidate nomination