ASTERISK-25632: res_pjsip_sdp_rtp: RTP is sent from wrong IP address when multihomed


Summary: ASTERISK-25632: res_pjsip_sdp_rtp: RTP is sent from wrong IP address when multihomed

Reporter: Olivier Krief (okrief)
Labels:
Date Opened: 2015-12-16 09:58:39.000-0600
Date Closed: 2016-02-15 14:07:15.000-0600
Priority: Major
Regression? No
Status: Closed/Complete
Components: Resources/res_pjsip_sdp_rtp
Versions: 13.6.0
Frequency of Occurrence:
Related Issues: is duplicated by ASTERISK-25637 Multi homed server using wrong IP
Environment: CentOS 7
Attachments: (0) floating.pcap, (1) native.pcap

Description: The setup is a cluster of two Asterisk boxes. Each box has several Ethernet interfaces, and both share a set of floating IP addresses (Pacemaker/Corosync).

One box is deliberately powered off.
The remaining box sends an outbound call to a PJSIP trunk.
Within the SDP portion of the INVITE message, I can read:
IN IP4 10.20.143.100
where 10.20.143.100 is a floating IP.
The INVITE message itself is also sent from 10.20.143.100.

A bit later, as captured with tcpdump, I can see that outbound RTP is sent from
10.20.143.101 (a non-floating IP).

The trunk's transport is configured with:
external_media_address : 10.20.143.100

I must add I'm new to clusters.

My question is:
is this feature (use of floating IP addresses) supported?

Regards

Comments: By: Asterisk Team (asteriskteam) 2015-12-16 09:58:40.724-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].
By: Daniel Journo (journo) 2015-12-20 15:05:30.120-0600

I'm having the same issue and opened a new issue before I saw yours.
ASTERISK-25637

According to a discussion with 'file' on IRC, PJSIP determines which transport is going to be used. Asterisk doesn't know which transport was used; it only knows the IP of the endpoint, so it doesn't know which IP to send the RTP data from. If there is only one transport, the answer is obvious, but what about the case where there is more than one?

'file' suggested that it might be possible to get PJSIP to make an educated guess about which transport is being used and then get that transport's bind address, but I'm struggling to figure that one out.
By: Olivier Krief (okrief) 2016-01-05 07:58:42.075-0600

Hello,

Is there any update on this issue?
Is there any input I can gather to help reproduce this?

Regards
By: Joshua C. Colp (jcolp) 2016-01-05 08:04:51.712-0600

Any updates will be posted on this issue, as well as requests for any additional information. At this time there are no updates.
By: Daniel Journo (journo) 2016-01-05 08:26:08.273-0600

Olivier Krief, assuming that this might take some time to resolve, I'm looking at changing the default routes using a Pacemaker resource agent. Maybe give that a go?
By: Daniel Journo (journo) 2016-01-07 10:28:28.787-0600

Following comments by 'file', I've tested the theory that PJSIP lets the system decide which source IP to use. The hope was to fix this by changing the source IP on the system.

It seems that, contrary to what was previously thought, Asterisk does not let the system determine the source IP. I set the source IP to the Pacemaker virtual IP, and ping tests show that the source IP change has taken effect correctly.

However, Asterisk is still sending audio out of the original IP.
By: George Joseph (gjoseph) 2016-01-07 10:29:53.503-0600

Can you reproduce this in a non-cluster environment?
If so, can you post your pjsip configs?
By: George Joseph (gjoseph) 2016-01-07 12:12:20.941-0600

I figured it out.
By: Olivier Krief (okrief) 2016-01-11 08:11:31.941-0600

Hello,
I could also reproduce this in a non-cluster environment:
SIP signaling uses the expected "auxiliary" IP address,
while RTP media uses the "main" (default) IP address.

Main PJSIP settings are:

[foo-transport]
bind=192.168.50.16
external_signaling_address=192.168.50.16
external_media_address=192.168.50.16
local_net=192.168.50.0/24

where 192.168.50.16 is an auxiliary IP added with a command such as: ip addr add 192.168.50.16/24 dev eth0

Following the "I figured it out" comment above, should I provide more data?
By: Daniel Journo (journo) 2016-01-11 08:26:24.168-0600

It has been fixed by adding some new endpoint options. From what George has told me, I'd expect it in 13.8.0.
Or you can download the patch and apply it yourself.

Unfortunately, the patch will not make Asterisk automatically use the floating IP.
It will allow you to make Asterisk bind to the floating IP using media_address=YOUR_FLOATING_IP and bind_rtp_to_media_address=yes in the endpoint settings. You can have different floating IPs for each endpoint if required.
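As a minimal sketch of the per-endpoint approach, assuming two hypothetical endpoint sections ([office-phone] and [remote-phone]) and placeholder floating addresses, the two options named above might be set like this in pjsip.conf. Only media_address and bind_rtp_to_media_address come from the patch; the section names and IPs are illustrative, and other settings an endpoint normally needs (such as context, allow, aors, auth) are omitted.

```ini
; Hypothetical endpoint reached over the internal network.
[office-phone]
type=endpoint
; Placeholder internal floating IP.
media_address=10.0.0.50
; Bind RTP to media_address instead of letting it leave from the NIC's primary IP.
bind_rtp_to_media_address=yes

; Hypothetical endpoint reached from outside, with its own floating IP.
[remote-phone]
type=endpoint
; Placeholder external floating IP (documentation range).
media_address=203.0.113.50
bind_rtp_to_media_address=yes
```

Because each endpoint carries its own media_address, each one can be pinned to a different floating IP; the trade-off described next is that the address is fixed per endpoint rather than chosen per call.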

The side effect of this is that endpoints will not be able to move from a public external floating IP to a private internal floating IP. This might be a problem if you have users who connect to Asterisk both from within the network and from outside it. In that case, I would fix things with routing.
By: Olivier Krief (okrief) 2016-01-11 08:54:47.030-0600

@Daniel:
Can you elaborate a bit on "endpoints will not be able to move from a public external floating IP to a private internal floating IP"?
By: Daniel Journo (journo) 2016-01-11 11:55:44.034-0600

Scenario:
Server with two NICs.
NIC1 configured with an internal static IP: 10.10.10.1
NIC1 also has a floating IP: 10.10.10.3
NIC2 configured with an external static IP: 212.123.123.1
NIC2 also has a floating IP: 212.123.123.3

Before the patch, you would connect an external endpoint to 212.123.123.3, but when you make calls, the RTP data comes out of 212.123.123.1, which is the primary IP for that NIC.
After the patch, if you don't make any config changes, the same issue will occur.

After the patch, add the following config to an endpoint: 'media_address=212.123.123.3' and 'bind_rtp_to_media_address=yes'. This fixes things so that RTP data going to that endpoint comes out of the floating IP.
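Applied to the scenario above, the endpoint section might look roughly like the sketch below. The section name [ext-phone] is hypothetical, and only the options discussed in this thread are shown; a working endpoint would need its other usual settings as well.

```ini
; Hypothetical section name for the external endpoint in the scenario.
[ext-phone]
type=endpoint
; External floating IP on NIC2.
media_address=212.123.123.3
; Bind RTP to media_address so it is not sent from 212.123.123.1 (NIC2's primary IP).
bind_rtp_to_media_address=yes
```

If the same device is later moved to the internal network and pointed at 10.10.10.3, media_address in this section would have to be edited to 10.10.10.3 by hand.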

However, if you take that endpoint, plug it into the internal network and point it at the internal private floating IP 10.10.10.3, the audio will still try to go out of 212.123.123.3 until you change 'media_address=212.123.123.3' to 'media_address=10.10.10.3'.

Unfortunately, there doesn't appear to be a way around this apart from playing with routing on the system.
By: Olivier Krief (okrief) 2016-01-12 02:24:49.163-0600

@Daniel:
In your example, how do your Asterisk and your endpoint relate to each other?
Does Asterisk register itself with your endpoint, or is it the opposite? Do both have a static address known to each other?

By: Olivier Krief (okrief) 2016-01-18 03:39:38.335-0600

Hello,
I've seen the patches listed in this issue's Source tab.
If you think I can be of any help testing them, please do not hesitate to ask.

By: Daniel Journo (journo) 2016-01-18 03:45:35.515-0600

The patch has already been accepted and will be released in Asterisk 13.8.0, which might take a few months, as 13.7.0 was only just released.
By: George Joseph (gjoseph) 2016-01-18 09:11:08.182-0600

Olivier, you opened this ticket, so please do test and see whether the patch addresses your specific issue. As Daniel said, the patch was accepted and is in the current 13 git branch, so if you grab it, you should be able to test.

By: Olivier Krief (okrief) 2016-01-19 06:12:13.471-0600

Using GitHub commit 9a13df1b3c2c924dc4016a14eeadc254e5a7504b and the pjsip.conf media_address and bind_rtp_to_media_address settings, I successfully observed an RTP stream coming from an auxiliary IP address rather than from the main IP address (which is what happens with Asterisk 13.6.0 and 13.7.0).
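The exact sections used in this test were not posted. As a rough sketch only, assuming the transport from the earlier non-cluster setup and a hypothetical endpoint named [test-endpoint], the pieces might fit together like this:

```ini
; Transport from the earlier non-cluster setup (192.168.50.16 is the auxiliary IP).
[foo-transport]
type=transport
bind=192.168.50.16
external_signaling_address=192.168.50.16
external_media_address=192.168.50.16
local_net=192.168.50.0/24

; Hypothetical endpoint using the options added by the patch.
[test-endpoint]
type=endpoint
transport=foo-transport
; Advertise and bind RTP on the auxiliary IP rather than the main IP.
media_address=192.168.50.16
bind_rtp_to_media_address=yes
```

The transport settings only govern signaling and the addresses advertised to the far end; in this thread it is the two endpoint options that made RTP actually leave from the auxiliary IP.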

I'm planning to test this commit again in my cluster environment and will report my findings here.

Thank you all very much, in any case.

By: Olivier Krief (okrief) 2016-01-22 04:40:22.489-0600

My tests in my cluster environment were not successful. Please find two PCAP files attached.

In native.pcap, an Avaya PBX is sending a call to IP 100.66.1.102, which I call the native address. This call is handled correctly (Answer(), Playback(), Hangup()).

In floating.pcap, the Avaya PBX is sending an INVITE to IP 100.66.1.103, which I call the floating address.
Asterisk receives this INVITE but takes 16 seconds to reply with a 200 OK (it took 20 ms in the previous test).
By the time this 200 OK is sent, the Avaya PBX has already terminated the call.

Both tests were run on the same box, the only difference being pjsip.conf content (with a core restart between tests).

By: George Joseph (gjoseph) 2016-02-15 14:07:15.293-0600

I'm not sure what's causing the 16-second delay, but I don't think it's related to the original issue, so I'm going to close this one.

The 16-second delay could be an issue with the outbound leg, or it could be something entirely different. If you can reproduce it, definitely open another issue.