|Summary:||ASTERISK-20904: RFC1918 NAT Issue On Prune|
|Date Opened:||2013-01-08 01:58:49.000-0600||Date Closed:||2013-02-28 22:38:18.000-0600|
|Environment:||Attachments:||( 0) asterisk-20904-nat-auto-and-rt-peersv2.diff|
( 1) rfc1918_patch.diff
|Description:||Issue is related to ASTERISK-20572, but appears to have been reintroduced in Asterisk 11.x.
Bring up a realtime peer behind separated from the Asterisk server behind NAT. When the peer is pruned and brought back up, the peer's external NAT address is replaced with the private IP from the contact header, if that address was present.
|Comments:||By: JoshE (n8ideas) 2013-01-08 02:18:27.649-0600|
Sample patch attached. Not sure this is really reflective of the best approach, but it solved the problem for me.
For additional information, I tested this with nat=yes (deprecated in 11) and nat=force_rport,comedia
Seems to work in both cases.
By: Michael L. Young (elguero) 2013-01-08 10:33:33.740-0600
I took a look and the original patch is still in 11. So, I think what you are saying is that the original issue was not really fixed in 11?
By: JoshE (n8ideas) 2013-01-08 10:46:52.612-0600
That's correct. The version of the patch I supplied did fix the problem in Asterisk 10. The version committed to branches/trunk did not, due to the way !found actually processes.
With a change with what appears to be the deprecation of nat=yes, the force_rport check now doesn't work properly either. The patch I supplied fixed the issue for me when the fullcontact is in the form of:
Not sure there isn't a better approach, but the supplied diff does resolve a two step issue - the zeroing out of the ipaddr field (which ensures that the fullcontact private address is used) and the force_rport check within that conditional.
By: Michael L. Young (elguero) 2013-01-08 11:50:15.192-0600
What settings do you have for nat? The setting in [general] and the setting for the realtime peer?
By: JoshE (n8ideas) 2013-01-08 11:53:41.550-0600
General is unconfigured. Realtime peer has been tried both with nat=force_rport,comedia and nat=yes (deprecated). Both exhibit same behavior.
By: Michael L. Young (elguero) 2013-01-10 13:46:54.611-0600
I would like to confirm some things.
Do you have "rtcachefriends" enabled? I am pretty sure that you do based on looking at the code path and based on your report.
I was not able to reproduce by pruning a realtime peer and then having the peer register again. But, I was able to reproduce by issuing "sip reload" (as was done in the prior issue related to this) and then have the peer register after the reload.
Can you confirm the above and the exact steps you did to reproduce this?
By: JoshE (n8ideas) 2013-01-10 13:52:21.350-0600
Correct. Rtcachefriends is enabled.
You won't reproduce by reregistering, but if you prune and then just have the UA execute a dial event, Asterisk attempts to repopulate the peer. It does using the private IP address and then the peer goes unreachable until it registers again.
By: Michael L. Young (elguero) 2013-01-10 13:58:58.916-0600
Thanks for the quick response.
Okay... so it looks like there are two issues. I reproduced this issue by issuing a sip reload and then the peer registered, it looks like I might have a fix for that.
I just confirmed pruning and then making a call reproduces this. Let me see what I can find for this second issue (which is what you originally reported :) ).
By: Michael L. Young (elguero) 2013-01-10 22:48:36.555-0600
Give this patch a try, if you could. [^asterisk-20904-auto-nat-prune_v1.diff]
A few things. The default global nat setting is now auto_force_rport. Keep that in mind. I have updated the CHANGES file to state that. Even though nat=yes was deprecated, I think it was the changes to add auto_force_rport and auto_comedia and then setting auto_force_rport as the default global setting which caused this bug to appear in branch 11.
The force_rport setting was having no effect because the flags used to turn those bits on and off were not being applied until later in the code, instead of before we needed them. I have moved that code up before we start needing to use those flags.
The check for "found" in relation to realtime is when we have realtime caching on. So, I changed it to not check for realtime but rather whether we have realtime caching on or not. This seems to take care of the reload problem from the other issue.
The prune problem is fixed by adding a check to see if nat was detected or not from the request coming from the peer when auto_force_rport is on. When it is not on, it is also fixed by moving those flags for determining if force_rport is on or not to being called sooner as I explained up above.
Let me know if that solves your problem. It seemed to be working well in my testing. I had to do a lot of debugging before finally pinpointing the exact areas where the problems were.
By: JoshE (n8ideas) 2013-01-16 01:30:25.278-0600
This patch appears work, though I haven't tested extensively at this point.
One thing to note, when nat = force_rport,comedia it doesn't work. When nat is set to auto_force_rport,auto_comedia, it actually does do what it's supposed to.
By: Michael L. Young (elguero) 2013-01-16 09:53:43.705-0600
JoshE, Thanks for the feedback. I just want to make sure I understand what you are saying. With the patch applied, "nat=force_rport,comedia" is not working?
By: JoshE (n8ideas) 2013-01-16 09:56:31.621-0600
That's correct. Didn't try any other combinations, but that was the behavior I was seeing.
By: Michael L. Young (elguero) 2013-01-16 10:08:27.343-0600
Hmm... that is a bummer since I specifically tested for that since that is what your original report was for. I tested the reloading as reported in the other issue and I tested for the prune scenario, with:
nat not set,
nat set to no,
with nat set with just force_rport,
with just comedia,
with force_rport and comedia,
and with auto_force_rport and auto_comedia.
Did you reload chan_sip.so module or asterisk after applying and compiling the patch?
In the [general] section of sip.conf, you still have "nat=" commented out?
I guess I will have to relook at this since basically what you are saying is that there is no change in behavior since auto_force and auto_comedia should be working without this patch at all.
I am just trying to figure out what I could be missing.
By: JoshE (n8ideas) 2013-01-21 15:40:46.749-0600
I did restart Asterisk completely. Let me retest some of these scenarios, before I go too crazy. I believe some of these were incorrect, but let me retest across everything just to be sure.
By: Michael L. Young (elguero) 2013-01-21 15:44:49.029-0600
Sure... thanks. We appreciate the help with testing.
By: JoshE (n8ideas) 2013-01-24 01:27:37.018-0600
I think I can definitely confirm that with nat=yes the problem still reoccurs when the fullcontact field has a private IP address in it. Haven't verified everything else, but this case looks clear from what I am seeing in my environment.
By: Michael L. Young (elguero) 2013-02-04 12:26:46.820-0600
JoshE... sorry to keep asking but with this patch in place I tried the following:
I have "nat" commented out in the sip.conf in the general context (unconfigured, using the default setting), localnet and externaddr commented out as well. For the realtime peer I have nat=yes. The realtime peer (behind NAT) registers. Everything looks good. I then do "sip prune realtime peer <peername>". I then attempt to dial a number from the peer that was just pruned. The peer is able to complete the call and the "ipaddr" is populated with the external IP address even though the fullcontact field contains a private ip.
Can you attach a debug log that perhaps can help show what is happening? Can you attach a copy of your sip.conf (sanitized version)? In sip.conf, do you have localnet set? Is externaddr set? I am just trying to figure out the difference between our two environments.
By: JoshE (n8ideas) 2013-02-10 17:38:41.248-0600
I went ahead and retested this and I think we're good. I can't reproduce this on my production servers running 11.2.1 with the straight patch as you have defined it.
I'll let you know if I have found another case that seems to present an issue, but this appears to be resolved from my testing.
By: Mark Michelson (mmichelson) 2013-02-27 16:04:46.560-0600
elguero: I've taken a look at this patch and it gets a "Ship it!" from me. Feel free to commit it. The only suggestion I have is that you upate other realtime scripts in the contrib/ directory to have proper 'nat' configuration.
By: Michael L. Young (elguero) 2013-02-27 16:18:08.275-0600
Mark, thanks for reviewing that last patch. I was getting ready to post a review since it has been in use for over a month now without any issues.
In bringing the patch up to the latest revision and reviewing it one more time, I made two changes that I would like you to confirm that is correct.
The changes I made were to remove copying the SIP_NAT_FORCE_RPORT flag a second time since a few lines up, all these flags were already copied. Added the copying of the flags to another part of the code that did not have the flags copied at all. In both places, run check_via which determines if nat is involved and sets SIP_NAT_RPORT and SIP_PAGE2_SYMMETRICRTP accordingly if auto is on. This is needed before running do_setnat.
If the "useglobal_nat" is turned on, currently we are only copying the SIP_NAT_FORCE_RPORT flag. I think we need to also copy SIP_PAGE3_NAT_AUTO_RPORT, SIP_PAGE2_SYMMETRICRTP and SIP_PAGE3_NAT_AUTO_COMEDIA as well. Can you confirm that this change I made is correct? I have been testing this for the past week+ and haven't seen any problems occur.
Attaching my latest patch, [^asterisk-20904-nat-auto-and-rt-peers.diff], which included changes to the other realtime schemas.
By: Mark Michelson (mmichelson) 2013-02-27 16:55:18.685-0600
I gave the new patch a quick look and nothing looks harmful there. In transmit_response_using_temp(), it doesn't really seem necessary to copy AUTO_COMEDIA or SYMMETRIC_RTP flags because the function is only used in order to transmit an error response using a temporary sip_pvt struct. There should be no media involved at all. However, it makes sense to set the AUTO_RPORT flag though.
The other changes all look perfectly fine to me. Thanks for keeping in mind the CHANGES file as well!