[Home]

Summary:ASTERISK-04240: [PATCH] REGISTER "deadlock" between SPA's and Asterisk/Non-SPA Interoperability
Reporter:Forrest Christian (fwc)Labels:
Date Opened:2005-05-20 01:09:16Date Closed:2008-01-15 15:37:02.000-0600
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/Interoperability
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) sipnoncestale0524.txt
( 1) sipstalestable.txt
Description:After a random period of time (minutes to days), asterisk starts denying authentication requests from SPA boxes with a 403 Forbidden result.   The cause of this error is twofold.  First, for some reason, the nonce value gets out of sync (stale) between the Sipura and Asterisk.  Thus, the MD5 hash provided by the sipura box does not match the hash calculated by Asterisk.   Second, the Sipura simply retries the same request, with the same nonce, instead of taking the 403 Forbidden result to mean that it should start over.  This appears to make the call never expire since the sipura retries faster than the call times out.  As a result, you end up in a state where the SPA asks to be registered, and asterisk says no. lather. rinse. repeat.  I will carefully sidestep the whole "the sipura is broken also" issue.

Based on other discussions among the VoIP community, most notably the vovida bug at http://bugzilla.vovida.org/bugzilla/show_bug.cgi?id=605, certain non-sipura clients will assume that a 403 response means "go away, come back never".  Sipura being different of course assumes 403 means "come back in 5 seconds and try the exact same request again".  (Yes, I will be submitting a bug to sipura as well).

From everything I've read, it sounds like responding with a 401 is the more interoperable way to handle a nonce mismatch.  It also appears to fix the sipura issue (see below).

****** ADDITIONAL INFORMATION ******

The attached patch is my attempt to fix this issue by checking nonce during the authentication phase, and if it doesn't match (i.e. it's stale), asterisk then replies with a 401 with a new nonce.  This appears to be working in my environment, but it's hard to tell because of the randomness of the events.  I will post updates over the next few days.
Comments:By: Michael Jerris (mikej) 2005-05-20 09:12:46

oej, can you take a look at this one please.

By: Forrest Christian (fwc) 2005-05-20 15:10:28

Additional data from submitter:  It appears there is still a condition where the sipura boxes will still get into a "registration deadlock".  However, this patch appears to have significantly reduced the instances of this occuring (additional time is needed to verify).  I am attempting to work with sipura to resolve the remaining issue since the new deadlock appears to be somewhat related to the original cause.

Based on my research, however, I am convinced that the proper response to a nonce mismatch *is* a 401, and not a 403....

By: Olle Johansson (oej) 2005-05-20 17:14:36

Please follow bug guidelines and add a full SIP debug.

I vaguely remember seeing that the SIPURA has a bug where it does not update the CSEQ properly at failed registrations, so watch the CSEQ.

Your code is a good start, maybe we should fix the whole auth thing and allowing re-use of old nonces for a limited amount of time.



By: Olle Johansson (oej) 2005-05-20 17:19:01

We need to know if this code is dislaimed or not - again, please read the bug guidelines :-)

By: Forrest Christian (fwc) 2005-05-20 19:49:17

FYI,

This code has been disclaimed...  Paperwork has been faxed in.

I came across a couple of references which indicate if the problem you are responding to with the 401 is a stale nonce, a parameter "stale=true" should also be added to help the UA know that the problem is a stale nonce, not a bad password.  Doing this is going to require some additional intrusion into the code.  I'll try to add this to the patch.

On the couple of debug snipplets I have (not sufficient to post) it does appear that the sipura is incrementing the CSEQ on each try.  I'm believe that's the correct behavior (I could re-read the RFC's a hundred times and still not be 100% sure until I saw a for-sure-good implentation in action).

I had previously mentioned here (now edited/deleted) that I had been seeing new authentication loops.  That was due to a problem with the patch I provided earlier.   This has been resolved.



By: Forrest Christian (fwc) 2005-05-23 02:17:39

Just for completeness, here are some additional notes/workarounds:

1) If the retry-timers on the sipura are sufficiently large, asterisk's calls will time out, causing asterisk to send a new nonce in a new 401, thus avoiding this issue.  By sufficently large, I have found > 30 seconds to be sufficient in testing.  However, since I found several references to people having this (or a similar) issue online, I believe turning these timers down below the "always works" threshold is desired.  Plus, it will definately improve interoperability.

2) Sipura has a register retry long interval which is apparently used when the sipura receives a 403.  The default is extremely long.  Without this patch, asterisk may send a 403 when nonce is stale, basically shutting the sipura off for the duration of this timer.  I had turned down this timer to solve this issue (and subsequently aggrevated the bug which this patch fixes).  Sending a 401 causes the sipura to retry the request immediately.

3) I'm sure similar issues occur with other SIP clients.  Especially if the default action for a 403 is to go away for a long period of time, if not forever.

4) Restarting asterisk (cold) and/or the sipura will cause the boxes to register.  Restarting asterisk causes asterisk to issue 401's to each client.   Restarting the sipura causes the sipura to issue a reg request without a MD5 hash, which causes asterisk to send a nonce.

5) In a note above, I mentioned that I thought stale=true should be added.  I'm not finding sufficent discussion and/or examples online to prove or disprove this.   Since the patch as supplied appears to resolve the issue, I am dropping the stale=true suggestion pending further research.

6) Additional verification of the register may be a good idea, however, I am probably not the right person to add this to the code (not sufficiently versed in SIP).

7) I am not convinced that the sipura is doing anything strictly wrong at this point.  However, if sipura fixed their implementation such that once they got a 403, they didn't re-use the nonce, this problem would not have occured.

-forrest



By: Olle Johansson (oej) 2005-05-23 02:28:19

I can not apply your patch to current chan_sip.c in cvs head. Are you sure you are producing the patch file from latest chan_sip.c ?

By: Forrest Christian (fwc) 2005-05-23 02:32:00

Yep.. Cvsup'd 5 mins ago...  Let me upload it differently.  (Perhaps it got mangled in the translation).

Update...  Not sure what happened to the last one...  There is a significant size difference (perhaps ascii vs binary transfer), but I can't see any difference online.

-forrest



By: Olle Johansson (oej) 2005-05-23 02:43:10

It's still not right. Line 5638 is in the middle of a completely different function in my CVS chan_sip. Either you or I have the wrong file. I deleted my chan_sip.c, check out a fresh copy and tried to apply the diff.

Move your patched chan_sip to chan_sip.c and issue a
cvs diff -u channels/chan_sip > sipnoncepatch.txt

Let's see what that produces. Thank you for your patience.

By: Forrest Christian (fwc) 2005-05-23 03:01:04

Hopefully one last time:

Ran command, then removed chan_sip.c and then re-cvs'd.
fetched patch from digium via wget:

[root@mail2 channels]# md5sum chan_sip.c
f43be4aaf01b9f66d7ad85946c939a24  chan_sip.c
[root@mail2 channels]# patch <fromtracker.patch
patching file chan_sip.c
[root@mail2 channels]# md5sum chan_sip.c
e525f28eaf6f6b43953a4b2226cfc3e4  chan_sip.c

FYI, if you are near irc, I'm on #asterisk-dev right now, user forrestc{hm}
-forrest

By: Olle Johansson (oej) 2005-05-23 03:28:04

Monday mornings. Got it patched in and testing it on my system for a while. Thank you for your patience :-)

By: Olle Johansson (oej) 2005-05-23 04:29:43

We need to apply this to CVS stable as well.

By: Olle Johansson (oej) 2005-05-23 06:41:50

Patch file for stable uploaded

By: Olle Johansson (oej) 2005-05-23 06:55:56

Rfc 2069:
  stale
    A flag, indicating that the previous request from the client was
    rejected because the nonce value was stale.  If stale is TRUE (in
    upper or lower case), the client may wish to simply retry the
    request with a new encrypted response, without reprompting the
    user for a new username and password.  The server should only set
    stale to true if it receives a request for which the nonce is
    invalid but with a valid digest for that nonce (indicating that
    the client knows the correct username/password).

By: Olle Johansson (oej) 2005-05-23 10:37:54

Uploaded an improved version for testing. This version
* Adds "stale=true" if they authenticate properly using the wrong nonce
* Gives error if they try to use another auth user name than display user name (from name in URI). Asterisk currently authenticates on From: user name so it is important that they don't try to set another auth user name, then nothing will work authenticationwise. Currently, we're just not accepting the registration if they set callerid to whatever and try to set the auth name to the [peer] name.

Please test and confirm.

Disclaimer on file.

By: Forrest Christian (fwc) 2005-05-24 01:02:50

Couple of discovered bugs in the revised patch:

1)  The test on line 5474 is backwards.  strncasecmp returns 0 if everything matches,  thus the ! is backwards.  Removing the ! in front of strncasecmp fixes this.

2) The stale=true portion of the header is mangled:

WWW-Authenticate: Digest realm="asterisk", nonce="397d281a" stale=true

Fixed by changing line 3737.  Remove the space bfore the last %s on the line, and change "stale=true" to ", stale=true"

With these changes, everything seems to be working.   Will leave in production for a while...

By: Olle Johansson (oej) 2005-05-24 01:14:28

Thank you for the feedback. Let's see how the test goes.

By: Olle Johansson (oej) 2005-05-24 01:40:15

Updated patch.

By: Olle Johansson (oej) 2005-05-31 15:51:03

fwc: Any test results?

By: Forrest Christian (fwc) 2005-05-31 17:26:59

Works good here... Seems to have solved the problem completely.
-forrest

By: Olle Johansson (oej) 2005-06-01 01:08:16

Kevin: Disclaimed, tested, approved by Mark and ready for CVS!

By: Kevin P. Fleming (kpfleming) 2005-06-03 15:14:22

Committed to CVS HEAD, with formatting fixes. Please review the coding guidelines and the existing code before submitting patches in the future. Thanks!

By: Digium Subversion (svnbot) 2008-01-15 15:37:02.000-0600

Repository: asterisk
Revision: 5834

U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r5834 | kpfleming | 2008-01-15 15:37:01 -0600 (Tue, 15 Jan 2008) | 2 lines

handle stale authentication nonces more properly (bug ASTERISK-4240, with formatting fixes)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=5834