Summary: ASTERISK-03641: chan_sip does not support pre-authenticated re-REGISTER/does not deal with non-compliant gateway
Reporter: Vahan Yerkanian (vahan)
Labels:
Date Opened: 2005-03-07 07:25:37.000-0600
Date Closed: 2011-06-07 14:05:05
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/Registration
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) ap1002-debug1.txt
( 1) ap1002-debug2.txt
( 2) ap1002-debug2.txt
( 3) patch_register_sip.txt
( 4) patch_register_sip2_invite_bug.txt
( 5) patch_register_sip2_register.txt
( 6) patch_register_sip2.txt
( 7) same-register-on-snom-sip-proxy.txt
Description: For the past 2 months I've been struggling with registration problems between Asterisk and external FXS/FXO gateways (www.addpac.com) that use the RFC3665 re-registration procedure for the 2nd and following registrations to Asterisk. This problem occurs for devices with more than one FXS port.

Those clients attempt to re-register after the initial register timeout period expires, fully compliant with RFC3665, clause 2.2 (http://www.zvon.org/tmRFC/RFC3665/Output/chapter2.html#sub2), but asterisk fails to authenticate them.

The 1st FXS port of the device always registers successfully, but the remainder fail miserably. Using an account/username with an empty password for the affected ports fixes the problem.

I've spent 2 weeks debugging this with addpac development team, and the same device authenticates flawlessly with Sonus Proxy Server, SNOM Proxy Server, LongBoard Proxy Server, Nortel Proxy so this seems to be a problem with chan_sip.

I'm attaching the sip debug - 1st registration and 2nd re-registration attempts for two FXS ports are separated by !!!!s. Notice how in the 2nd attempt the FXS gateway attempts to re-REGISTER by sending a REGISTER with full auth info, while asterisk replies to it as if it were an initial RFC3665 clause 2.1 (http://www.zvon.org/tmRFC/RFC3665/Output/chapter2.html#sub1) registration attempt.


****** ADDITIONAL INFORMATION ******

Asterisk CLI output:

sip*CLI>
sip*CLI>
   -- Registered SIP '1038' at 195.250.74.131 port 5060 expires 60
   -- Saved useragent "AddPac SIP Gateway" for peer 1038
   -- Registered SIP '1039' at 195.250.74.131 port 5060 expires 60
   -- Saved useragent "AddPac SIP Gateway" for peer 1039
sip*CLI>
sip*CLI>
sip*CLI>
sip*CLI>
sip*CLI>
   -- Saved useragent "AddPac SIP Gateway" for peer 1038
Feb  1 01:26:05 NOTICE[1916]: chan_sip.c:7654 handle_request: Registration from 'sip:1039@sip.arminco.com' failed for '195.250.74.131'
sip*CLI>
sip*CLI>
sip*CLI>
sip*CLI>
   -- Saved useragent "AddPac SIP Gateway" for peer 1038
   -- Registered SIP '1039' at 195.250.74.131 port 5060 expires 60
   -- Saved useragent "AddPac SIP Gateway" for peer 1039
sip*CLI>
sip*CLI>
sip*CLI>
   -- Saved useragent "AddPac SIP Gateway" for peer 1038
Feb  1 01:27:42 NOTICE[1916]: chan_sip.c:7654 handle_request: Registration from 'sip:1039@sip.arminco.com' failed for '195.250.74.131'
sip*CLI>
sip*CLI>
Comments:

By: Vahan Yerkanian (vahan) 2005-03-07 07:45:45.000-0600

ap1002-debug2.txt (19,371 bytes) 03-07-05 07:41 is the longer sip debug that shows several registration attempts.

Note that after the 2nd FXS's re-registration fails, the gateway attempts to register it with a full REGISTER next time and succeeds. However on the next attempt (re-register timeout was set to 1 min), the gateway sends short re-REGISTERs, and while both the 1st and 2nd FXS re-registers follow rfc3665, the 2nd port's auth fails.

Hence the 50% auth failure rate for 2nd+ FXS ports.

By: Olle Johansson (oej) 2005-03-07 09:57:21.000-0600

This is clearly an error in Asterisk. We will need some time working on this since it's the Von conference this week, so please do not expect quick answers.

Asterisk currently does not support re-use of nonces for re-registration; we just added that function for when Asterisk registers to another SIP proxy. However, we should not answer "Forbidden", that's a bit strange in the case of a re-registration with re-use of a previous nonce.
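
To illustrate what re-use of a nonce involves, here is a minimal Python sketch of the RFC 2617 digest check (the variant without qop) that a registrar has to perform on a REGISTER carrying credentials. The function names and the sample nonce are assumptions made for illustration; this is not chan_sip code. The point it shows is that the server can only verify a response if it still remembers the nonce the client used.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 of a string, as used by HTTP/SIP digest authentication."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, secret, method, uri, nonce):
    """RFC 2617 digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{user}:{realm}:{secret}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def verify(stored_nonce, user, realm, secret, method, uri, nonce, response):
    """Accept only if the server still knows the nonce and the hash matches."""
    if stored_nonce is None or nonce != stored_nonce:
        # The server no longer remembers this nonce, so it has to re-challenge.
        return False
    return response == digest_response(user, realm, secret, method, uri, nonce)
```

A server that forgets its nonces after a short time will therefore reject a re-REGISTER that re-uses an old nonce, even though the password is correct; the expected reaction is a fresh 401 challenge.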

By: Vahan Yerkanian (vahan) 2005-03-07 11:30:06.000-0600

oej, any way of doing a fast temporary local patch? we can't deploy lots of equipment because of this..

By: Kevin P. Fleming (kpfleming) 2005-03-07 15:19:44.000-0600

This is not a major bug, please re-read the bug posting guidelines.

Unless you can locate some documentation that claims Asterisk _is_ RFC-3665 compliant, this should be categorized as a feature request, not a bug.

By: Vahan Yerkanian (vahan) 2005-03-11 14:22:38.000-0600

Just added file same-register-on-snom-sip-proxy.txt that contains the sip debug from the AddPac VoiceFinder 1002 gateway registering/re-registering both FXS ports successfully on demo version of Snom SIP Proxy server.

They just reply with

'407 Proxy Authentication Required'

instead of

'401 Unauthorized'

on both initial REGISTER and following re-REGISTER.


The patch shouldn't be a difficult one, anyone?

By: Vahan Yerkanian (vahan) 2005-03-11 14:32:00.000-0600

Also the bug category should be changed from '[SIP] Subscriptions' to '[SIP] Registration' IMHO.

By: Kevin P. Fleming (kpfleming) 2005-03-12 01:02:29.000-0600

OK, first off: the Snom server is using '407 Proxy Authentication Required' in response to REGISTER, because it is a proxy, and SIP proxies are not allowed to use '401 Unauthorized' in response to any SIP request. Your chosen example (RFC3665) shows '401 Unauthorized' being used, as it should be when the UAC is registering to a UAS or registrar, not a proxy.

Please keep in mind that Asterisk is _NOT_ a proxy, and does not follow the SIP example flows for a proxy, but the example flows for a UAS do apply. These are different roles, and result in different responses and behavior.
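
The role distinction above can be summarised in a small lookup table. This is only a sketch in Python; the `CHALLENGES` table and `challenge_for` helper are hypothetical names, while the status codes and header names are the ones RFC 3261 defines for each role.

```python
# role -> (status line, challenge header, header the client answers with)
CHALLENGES = {
    "registrar": ("401 Unauthorized", "WWW-Authenticate", "Authorization"),
    "uas": ("401 Unauthorized", "WWW-Authenticate", "Authorization"),
    "proxy": (
        "407 Proxy Authentication Required",
        "Proxy-Authenticate",
        "Proxy-Authorization",
    ),
}


def challenge_for(role: str):
    """Return the challenge triple for a SIP role (raises KeyError if unknown)."""
    return CHALLENGES[role]
```

So a gateway registering to Asterisk should expect the first row, while the same gateway registering through the Snom proxy sees the last one.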

I don't see the relevant portions of your sip.conf file posted; please do so.

From your traces, it appears that your SIP gateways are not quite RFC3261 compliant. I see two problems:

1) A two-port ATA is _two_ UACs, not one. When the two UACs register, they should be using different Call-ID values, but they are not.

2) According to RFC3261 Section 10.2, a UAC SHOULD (RFC language) use the same Call-ID value for all REGISTERS sent to the same registrar. Your gateway is not doing that, at least not in the first debug trace. In the second one, it appears to be using the proper Call-ID for re-REGISTERs, but it's still sharing the same Call-ID between the two UACs.

I suspect what may be happening is that in addition to oej's comments about Asterisk not supporting re-use of a nonce for registration, your second port is failing because it is reusing the same Call-ID and the SIP structure related to the first registration has not yet expired out of memory, so chan_sip tries to use the auth information present in it and fails (saying 'Forbidden') because the secret calculation/match produces a failure.

If you can get the gateway to properly use different Call-IDs for its ports (since they are clearly separate UACs), then the registration will work properly, although Asterisk will still fail the re-REGISTER with auth info provided, and require the gateway to authenticate on each REGISTER.

By: khb (khb) 2005-03-12 03:09:14.000-0600

This is very likely an interplay of several issues and problems.
Aside from that, RFC-3665 is not a normative document, i.e. a device cannot be compliant to it, as it does not specify any requirements. Instead it is a best-practices compilation of examples or valid interpretations that conform to the specification, which is RFC-3261 and others.

Your sip traces point out several problems, on both sides, Asterisk and your gateway. Your gateway's protocol problems are alleviated by its transmission timing.  The gateway uses the same (partial) dialog identifier (i.e. the combination of Call-Id and From-tag) for both registration requests for the two ports; at the very least it should use a new From-tag for the second one, if not a new Call-Id as well.  On the re-registration it does use a new Call-ID, but again the same one for both ports.  For the re-registration it is recommended that the same Call-Id be used as originally.  The gateway's behavior works out without problems though, because it seems to transmit the two registrations in strictly sequential fashion, waiting for the first registration to complete before sending the second.
Now the Asterisk side: Asterisk still sends the 100 Trying response; someone should finally delete that ONE line in chan_sip that generates it, to make asterisk sip look less silly. It is even a malformed 100 Trying, since it should not have a To-Tag.  But the 100 Trying causes no harm in the process.
The initial registration for the two ports succeeds, it seems.  I am guessing that this is the first registration after Asterisk starts?
The re-registrations show the real problems that Asterisk has more clearly.
First of all, Asterisk does not properly accept credentials with prior nonces, since it needs to still have a copy of the nonce stored in the dialog private data space (pvt) and it doesn't store nonces anywhere else; the pvt gets destroyed after 15 seconds or so.  So it sends 401, which is the correct behavior if it doesn't want to accept old credentials.
There is no formal requirement to accept old credentials, and you can't interpret RFC-3665 that way.
The SNOM proxy sends 407, which is ok also, since that is a proxy and not a SIP endpoint.  Asterisk should never send 407 (but there is still some 407 stuff in auth_check).
The first port succeeds, ok. The less obvious question is why the second port registration fails.  The reason for that is that the first port's call pvt is still hanging around and it has the same Call-ID as the second port's request coming in.  So Asterisk, which never does proper dialog id matching, can't distinguish that the second port is a different device and reuses the first port's pvt.  And since this pvt contains the nonce of the first port, it doesn't know that this is a new call and so it doesn't send a 401 offering a new nonce.
The way it's written it can't match up the correct username/password for the second port and it can't send a 401 since it thinks it already did that, and therefore sends a 403.  Sending a 403 here is also wrong; it should just keep sending 401, unless there is some more restrictive local policy in place for escalating failed attempts, but not after the first one.
In this failure scenario you can also see why your gateway should be using different Call-Ids for each port.  Or if, at a minimum, it used different tags, Asterisk could, if it did proper dialog-id matching, differentiate between the two calls.  Asterisk has this pedantic=yes parameter for sip checking which does turn on tag matching (among other things); that is half broken in its own way, but tag matching should always be done in any case.

So, your problem is now completely understood.
You can do a quick fix, by expiring or destroying registration sip calls (the pvt) immediately after each registration process is done, not scheduling it to hang around for another 15 seconds.

Your issue really is a lot more than just a feature request,
as it is an expression of major bugs in Asterisk SIP.
It's improper protocol handling at the most basic levels.
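
The failure described in this comment, and the proposed quick fix, can be sketched with a toy model in Python. Everything here (`ToyRegistrar`, its flags, the `nonce` argument standing in for a full Authorization header) is a hypothetical simplification and not chan_sip's actual data structures. It only models the lookup key for the pvt and what happens when the pvt lingers.

```python
import secrets


class ToyRegistrar:
    """Toy model of pvt lookup for REGISTER; not real chan_sip behaviour."""

    def __init__(self, match_from_tag=False, destroy_on_success=False):
        self.match_from_tag = match_from_tag
        self.destroy_on_success = destroy_on_success
        self.pvts = {}  # lookup key -> {"user": ..., "nonce": ...}

    def _key(self, call_id, from_tag):
        return (call_id, from_tag) if self.match_from_tag else (call_id,)

    def register(self, call_id, from_tag, user, nonce=None):
        """Return (status, challenge_nonce) for one REGISTER request."""
        key = self._key(call_id, from_tag)
        pvt = self.pvts.get(key)
        if pvt is None:
            # No lingering pvt: treat as new and challenge with a fresh nonce.
            fresh = secrets.token_hex(8)
            self.pvts[key] = {"user": user, "nonce": fresh}
            return 401, fresh
        if user == pvt["user"] and nonce == pvt["nonce"]:
            if self.destroy_on_success:
                del self.pvts[key]  # the "quick fix": do not let the pvt linger
            return 200, None
        # A pvt exists, so a challenge was "already sent"; credentials mismatch.
        return 403, None


def second_port_status(**flags):
    """Register port 1 fully, then send port 2 with the same Call-ID."""
    reg = ToyRegistrar(**flags)
    _, nonce = reg.register("shared-call-id", "tag-a", "1038")
    status, _ = reg.register("shared-call-id", "tag-a", "1038", nonce)
    assert status == 200
    status, _ = reg.register("shared-call-id", "tag-b", "1039", "stale-nonce")
    return status
```

In this model `second_port_status()` gives 403, while `second_port_status(match_from_tag=True)` and `second_port_status(destroy_on_success=True)` both give 401, after which the second port can authenticate normally. The trade-off of `destroy_on_success` is that a retransmitted, already-accepted REGISTER is no longer recognised and gets challenged again.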

By: Vahan Yerkanian (vahan) 2005-03-12 06:29:39.000-0600

Here is the snip from my sip.conf; the same problem occurs when I use MYSQL_FRIENDS in 1.0.6 to auth against a mysql table.

---8<-----------------------------
[general]
realm=sip.arminco.com
amaflags=billing
port = 5060
bindaddr = 195.250.77.70
context = sip
nat=yes
canreinvite=no
videosupport=yes
dtmfmode=rfc2833
disallow=all
allow=g729
allow=gsm
allow=ulaw  

[1038]
type=friend
username=1038
secret=201038
host=dynamic
callerid=1038

[1039]
type=friend
username=1039
secret=201039
host=dynamic
callerid=1039
---8<-----------------------------

By: Kevin P. Fleming (kpfleming) 2005-03-12 10:30:36.000-0600

khb's analysis is pretty much correct, although in spite of Asterisk's current limitations, it would work properly with this gateway if the gateway followed RFC3261 more completely and did not share Call-ID values between its two (or more) ports, since they are distinct UACs.

By: khb (khb) 2005-03-12 11:50:28.000-0600

Here is your 3 minute soft quick fix.
This was taken against current stable release.

BTW, many gateways have the same problem as yours.
If you have a budget for commercial support and a better driver post something on the biz list.

By: Kevin P. Fleming (kpfleming) 2005-03-12 13:40:46.000-0600

I don't think that's a very safe solution at all... if the UAC retransmits the REGISTER because it did not receive the '200 OK', we won't treat it as a retransmission, but as a new request, with new authentication.

This is completely a bug in the gateway; it _cannot_ use the same Call-ID and tag values for registering two different users (UACs). I'll quote from RFC3261:

Section 8.1.1.3

  The From field MUST contain a new "tag" parameter, chosen by the UAC.
  See Section 19.3 for details on choosing a tag.

Section 8.1.1.4

  The Call-ID header field acts as a unique identifier to group
  together a series of messages.  It MUST be the same for all requests
  and responses sent by either UA in a dialog.  It SHOULD be the same
  in each registration from a UA.

  In a new request created by a UAC outside of any dialog, the Call-ID
  header field MUST be selected by the UAC as a globally unique
  identifier over space and time unless overridden by method-specific
  behavior.  All SIP UAs must have a means to guarantee that the Call-
  ID header fields they produce will not be inadvertently generated by
  any other UA.

Since REGISTER requests do not create a dialog, they are _always_ outside of a dialog. Based on my reading of RFC3261, every REGISTER request should have a unique tag, period. REGISTER requests from different UACs should _never_ have the same Call-ID, even if they are coming from the same physical device (because it has multiple ports, in which case it is actually multiple UACs as well).
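
As a sketch of what that reading implies for a multi-port gateway, the following Python models each port as its own UAC. The class `PortUac` and its parameters are hypothetical names chosen for illustration; the only rules it encodes are the ones quoted above plus the RFC 3261 Section 10.2 rule that CSeq increments for each REGISTER sent with the same Call-ID.

```python
import uuid


class PortUac:
    """One gateway port treated as a separate UAC (illustrative only)."""

    def __init__(self, user, domain, local_host):
        self.user = user
        self.domain = domain
        # One globally unique Call-ID per UAC, reused for all its REGISTERs.
        self.call_id = f"{uuid.uuid4().hex}@{local_host}"
        self.cseq = 0

    def next_register_headers(self):
        """Headers for the next REGISTER: same Call-ID, new tag, CSeq + 1."""
        self.cseq += 1
        tag = uuid.uuid4().hex[:10]
        aor = f"<sip:{self.user}@{self.domain}>"
        return {
            "Call-ID": self.call_id,
            "CSeq": f"{self.cseq} REGISTER",
            "From": f"{aor};tag={tag}",
            "To": aor,
        }
```

Two `PortUac` instances created for users 1038 and 1039 on the same device never share a Call-ID, which is the property the gateway in this report is missing.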

Asterisk's dialog matching (or lack thereof) does not enter into the equation; REGISTER requests are not part of a dialog. When a REGISTER request comes in, Asterisk rightly checks the Call-ID to see if it has already processed a REGISTER for that UAC (since, according to the RFC, a UAC SHOULD use the same Call-ID for every register request, and a UAC MUST use a Call-ID that is unique from all other UACs). If so, it handles the REGISTER request as a retransmission, and acts accordingly. If the authentication information provided does not match that which was previously provided, the REGISTER fails.

Now the RFC does not say a UAC _MUST_ use the same Call-ID for every REGISTER request, only that it SHOULD. Asterisk deals with this appropriately as well, since if the UAC changes Call-ID for every REGISTER, Asterisk will treat every one as new, and never assume they are retransmissions. What it _cannot_ deal with, since this behavior is not RFC compliant, is having the same Call-ID used for REGISTER requests from multiple UACs. If the requests appeared at widely spaced enough intervals, it would not actually cause a problem, but it's still broken behavior on the part of the UAC.

I'll leave it up to Olle to have the final say here, but my opinion is that while there may be protocol requirements that Asterisk does not implement properly, this is not one of them. If this gateway followed RFC3261 properly, there would be no problem.

By: Kevin P. Fleming (kpfleming) 2005-03-12 13:58:02.000-0600

bug description updated to match actual problems

By: Vahan Yerkanian (vahan) 2005-03-14 02:13:33.000-0600

FYI, khb, your 3 minute patch didn't work for me...
Asterisk crashed after first register:

----8<-----
*CLI>     -- Registered SIP '1018' at 195.250.77.72 port 5060 expires 60
asterisk in free(): error: chunk is already free
cat: stdout: Broken pipe
cat: stdout: Broken pipe
Abort (core dumped)
sip# cat: stdout: Broken pipe
cat: stdout: Broken pipe
cat: stdout: Broken pipe
cat: stdout: Broken pipe
cat: stdout: Broken pipe
cat: stdout: Broken pipe
----8<-----

By: khb (khb) 2005-03-14 08:09:01.000-0600

Oops, sorry about that, 3 Min. just never does it.
Here is another 1/2 Min.

Did you notice what happens after register_verify()?
Closing RTP ports, believe it or not. As if we ever
needed RTP for REGISTER.  Perhaps someone thought
they can REGISTER by telephone.

By: Kevin P. Fleming (kpfleming) 2005-03-14 08:31:12.000-0600

That happens because sip_alloc (which creates the private structure) always allocates RTP/RTCP ports, in case they _ever_ want to be used during the lifetime of that private structure. That's on my list to take care of, but the list is quite long, and that will be a complicated patch that will require much testing.

By: khb (khb) 2005-03-14 09:12:19.000-0600

Already done it

By: Kevin P. Fleming (kpfleming) 2005-03-14 09:35:28.000-0600

Has it been posted somewhere? I've not seen it, unless I've just overlooked it.

By: Vahan Yerkanian (vahan) 2005-03-14 10:01:09.000-0600

khb, your patch_register_sip2.txt "fixed" this "bug", thanks for your 3 1/2 minutes of time :) All ports register OK now on all models of gateways.

I've pointed AddPac developers to this URL in hope they'll fix the Call-ID issue on their side.

By: Vahan Yerkanian (vahan) 2005-03-17 03:46:02.000-0600

*EDITED*

Actually calls fail now after applying khb's patch_register_sip2.txt :

1011 is the gateway's first fxs port,
1027 is a softphone (x-pro) previously registered with *,

Please note the verbose CLI output messages near *'s reply to the call INVITE (sip debug attached as patch_register_sip2_invite_bug.txt). There is a reference to the peer 1034, which is a different fxs port on this gateway, actually the last port on the interface (gateway has 2 interfaces with 4 fxs each)

Calls from 1027 to 1011 ring the fxs phone ok.

The last gateway re-register debug is attached as patch_register_sip2_register.txt.

edited on: 03-17-05 10:48

By: Olle Johansson (oej) 2005-03-17 05:59:38.000-0600

Please *do not* add SIP debug output in the bug report, add them as attachments in .txt files. Otherwise, it will become very hard reading and working with the bug report. Thank you.

By: Olle Johansson (oej) 2005-03-17 06:08:42.000-0600

You did not add the full SIP debug of the INVITE... Please do that, starting with the INVITE without any authentication.

By: Olle Johansson (oej) 2005-03-17 06:12:27.000-0600

Vahan: Seems like we have solved the original problem in this bug report, it was really a bug with your equipment that khb solved with a patch that will not make it into CVS.

Apart from that, we have a lot to do with the SIP channel - I agree with KHB there. Looking forward to seeing more of his patches to solve this, and not only complaints :-).

If there are new problems with Invites with this buggy equipment, let's discuss that in another bug report if it is a bug with Asterisk.

By: Vahan Yerkanian (vahan) 2005-03-17 10:53:03.000-0600

I apologize for including the debug inline, I've edited my previous post.

I just tested with the unpatched chan_sip again, and the calls were working ok in both directions when the gateway had empty passwords.

So khb's last patch, while fixing the registration issue with the non-conforming gateway, creates a new bug that is present only on the patched version of chan_sip, and I think it won't make sense to report it as a separate bug.

The INVITE debug I pasted in my previous message was what I received after I attempted a call from an fxs port to a registered softphone, after the gateway was registered and had re-registered its ports several times.

The INVITE bug debug is uploaded as patch_register_sip2_invite_bug.txt
The last gateway re-REGISTER debug is uploaded as patch_register_sip2_register.txt

I can provide access to the gateway and the * server if it'll help the debugging process.

By: Olle Johansson (oej) 2005-03-17 12:27:17.000-0600

Closing this bug report, since the issue was resolved. If there is another bug, please open another bug report so we can keep them apart. Thank you.

By: Vahan Yerkanian (vahan) 2005-03-17 16:40:47.000-0600

Bug wasn't fixed, and the proposed fix actually bugged chan_sip module more.

By: Kevin P. Fleming (kpfleming) 2005-03-17 16:44:26.000-0600

This bug was closed because it has not been demonstrated that this is actually a bug in Asterisk. Asterisk is not handling registrations from different UACs at the same IP address using the same Call-ID value, but that is not RFC compliant behavior.