Summary:ASTERISK-20288: PhonerLite reports RTP read error when ICE Support Enabled
Reporter:Joern Krebs (smartbyte)Labels:
Date Opened:2012-08-21 01:33:49Date Closed:2012-09-05 12:43:07
Versions:SVN 11.0.0-beta1 Frequency of
is related toASTERISK-20702 res_rtp_asterisk.c:3455 ast_rtp_read: RTP Read error: Invalid or incomplete multibyte or wide character. Hanging up.
Environment:Linux UbuntuAttachments:( 0) debug
( 1) debug
( 2) jira_asterisk_20288_err_msg.patch
( 3) phonerlite-capture-filtered.pcap
( 4) phonerlite-capture-filtered2.pcap
( 5) tcpdump.log
( 6) wireshark.pcap
Description:O.K. this is a bit hard to explain for me:
If I use PhonerLite (and I can only replicate this issue with PhonerLite, but there it happens 100% of the time!) to call asterisk (any number, and any setup, nat, no-nat, etc.) I get the warning:
RTP Read error: Success. Hanging up
The "Success" part of the error should be the error message. BUT Success indicates, that there actually is no error.
So what I did was changing the code and checking if the errno is 0, ignore that line ... and it works.
I do not get this error with the SVN for Asterisk 10, just with SVN Asterisk 11.

So basically under some circumstances (I cannot replicate the real issue, ... even the debug log doesn't show any issues at all!)
the errno is set to 0, instead of whatever it should be...

Can please someone have a look at this.
(I would be open to test anything, although my dev capabilities are limited.)

It really is a show-stopper to me! :-(
Comments:By: Michael L. Young (elguero) 2012-08-21 14:16:40.529-0500

The error message is misleading and should be changed.  There must be something else happening, though, and ignoring that an error occurred would be masking the problem.  In taking a look at the code surrounding that error message, I am thinking that the ICE support is turned on and an error is occurring which leads to this message and the teardown of the call.

If you could provide a complete debug log with ICE support turned on to help triage the issue would be very helpful. This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Once you gather the logs, can you turn off ICE support in sip.conf and see if this problem still occurs?

By: Joern Krebs (smartbyte) 2012-08-21 17:21:47.212-0500

This is the asterisk debug log for this issue. I tried to keep it as small as possible.

By: Joern Krebs (smartbyte) 2012-08-21 17:25:05.009-0500

I attached the debug log to this issue and tested setting the icesupport flag.

With ice support enabled that error occurs, with
icesupport = no
(in the peer section) it doesn't.
You found a workaround for me! Thanks!
(Nevertheless I am happy to assist in further investigations!)

By: Matt Jordan (mjordan) 2012-08-23 08:34:06.301-0500

The issue to fix here is ast_rtp_read's assumption that a failure in rtp_recvfrom implies that errno has been set.  While in many cases that should be true, pj_ice_sess_on_rx_pkt can cause __rtp_recvfrom to fail without setting an errno.

Other than that, we should document that PhonerLite appears to not play nicely with the ICE settings.  If you could provide a DEBUG log complete with 'sip set debug on' enabled, that might give us some more clues as to what PhonerLite responds with when we send it an INVITE with an SDP containing ICE candidates.

By: Joern Krebs (smartbyte) 2012-08-23 17:47:16.459-0500

This is the debug with "sip set debug on".

By: Joshua C. Colp (jcolp) 2012-08-23 19:09:24.070-0500

After taking a gander at this some I think PhonerLite is sending STUN packets over the RTP and RTCP ports. This would cause pjnath to return an error, since no candidates were negotiated. Would it be possible to get a packet capture as well to confirm this?

By: Michael L. Young (elguero) 2012-08-23 19:17:18.874-0500

Glad that I was thinking things through correctly.  That is why I stated that the error message was misleading and needed to be changed.  I saw the same thing that you just wrote about assuming errno was set.

I have been playing with Phonerlite and trying to figure out what is going on with it and Asterisk.  It would appear that Phonerlite doesn't support ICE (I don't see any settings or anything on the website about support for ICE).  So, Phonerlite doesn't send any ICE candidates when sending the INVITE to Asterisk.  I added some debug messages using functions from pjlib to find out what error message pj_ice_sess_on_rx_pkt is returning back and it is "Bad Request", which is correct since there was no ICE candidates in the SDP.

I have been reviewing rfc5245 and found section 5.1.  It would appear that we need to verify ICE support first before going through the ICE procedures.  Asterisk should parse the SDP on the initial request and if there are no ICE elements in the SDP, we should handle the call as if ICE is turned off for the peer.

By: Joshua C. Colp (jcolp) 2012-08-23 19:22:08.667-0500

We don't perform the ICE negotiation if there are no remote candidates. The ICE structure still sticks around though, so if a STUN packet comes in... it would get caught by it. Tweaking things so the ICE session was destroyed would of course "fix" this but knowing why the heck it's sending a STUN packet would be useful too.

By: Michael L. Young (elguero) 2012-08-23 19:27:48.780-0500

You are correct... sorry for the wrong terminology.  I am working on getting a packet trace for you and hopefully Joern does too.

One thing I am noticing, if there are no remote candidates, should we be bothering with sending local candidates?  I have been playing around with destroying the ICE session, which "fixes" it but it doesn't feel right seeing Asterisk send ice candidates when we know that the other end doesn't have any candidates.  Or perhaps I am still not quite understanding how it should work?  I am not that familiar with ICE and have been trying to understand it better while gathering data for this issue.

By: Joern Krebs (smartbyte) 2012-08-23 19:28:46.240-0500

tcpdump ASCII Log. I hope it's usefull.

By: Joshua C. Colp (jcolp) 2012-08-23 19:29:02.807-0500

It shouldn't hurt things to send them, but not sending them would indeed be better.

By: Michael L. Young (elguero) 2012-08-23 20:26:02.150-0500

I have attached a packet capture using Phonerlite.

By: Joern Krebs (smartbyte) 2012-08-23 20:36:49.920-0500

I think what you are looking for is this little Packet (not found in the Wireshark capture)
10:26:11.790543 IP > UDP, length 20

10:26:11.790739 IP > UDP, length 64
E..\..@.@.i.g......r<\...H.....,!..B.b.....*b..a. ......Bad Request.."..pjnath-2.0...(...\.J

As tcpdump doesn't show non-ASCII characters you cannot "read" the request.
But the Bad request answer basically confirms that the Asterisk ICE implementation has an issue with PhonerLite.

By: Michael L. Young (elguero) 2012-08-23 20:51:05.228-0500

Yep, somehow I filtered out more than I wanted to.  Attaching another file that should contain those RTP packets.

By: Joern Krebs (smartbyte) 2012-08-23 20:57:42.599-0500

This is a wireshark capture from my windows PC. But basically it doesn't give more info than the tcplog protocol - at least not for me. :-/
phonerlite-capture-filtered2.pcap had some many duplicate lines. Where do they come from? My capture looks a lot cleaner.

By: Richard Mudgett (rmudgett) 2012-08-29 16:02:42.856-0500

[^jira_asterisk_20288_err_msg.patch] should give a more meaningful message than "Fail: Success".

You should get these messages:
PJ ICE Rx error status code: %d '%s'.
RTP Read error: Likely an ICE or SRTP failure. Hanging up.

By: Richard Mudgett (rmudgett) 2012-09-04 16:41:54.772-0500

Can someone please confirm that the [^jira_asterisk_20288_err_msg.patch] now gives a usable message from PJ ICE.

By: Michael L. Young (elguero) 2012-09-04 21:00:56.234-0500

Richard, here is the error message I am getting with the applied patch.  Yes, with the patch it is a usable message from PJ ICE.

[2012-09-04 21:45:41] WARNING[26756][C-00000000]: res_rtp_asterisk.c:899 __rtp_recvfrom: PJ ICE Rx error status code: 370400 'Bad Request'.
[2012-09-04 21:45:41] WARNING[26756][C-00000000]: res_rtp_asterisk.c:2887 ast_rtp_read: RTP Read error: Invalid or incomplete multibyte or wide character. Hanging up.

If you look at the notes above, I had added these debug messages on my side but didn't post the patch.  So, it is the same error message, 'Bad Request', from PJ_ICE.

If the client doesn't send ice candidates in the initial invite, Asterisk should not be using ice for the session or sending local ice candidates.

Just one comment about patch:

I think 'status' should be of type pj_status_t to match the code in res_rtp_asterisk as well as pj.

By: Richard Mudgett (rmudgett) 2012-09-05 12:42:23.046-0500

I am just fixing the confusing error message when ICE fails.