Summary: ASTERISK-01229: chan_sip.c: LAN traffic + using sip secret causes registration to fail/drop/no-authorization
Reporter: chrisorme (chrisorme)
Labels:
Date Opened: 2004-03-17 08:08:31.000-0600
Date Closed: 2004-09-25 02:52:15
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 300304.2.snom.nr.txt
( 1) 310304.notraffic.txt
( 2) sipura_lantrafkillsauth
( 3) snom_lantrafkillsauth
( 4) snom_notregistered
( 5) snom.traffic-300304.txt
( 6) snomdebug.4apr.1.txt
( 7) snomreg.020404.2.txt
( 8) snomreg.020404.3.txt
( 9) snomreg.020404.txt
Description: Hi!   I'm currently testing an snom 200 and sipura units talking to *.   When there is no load on the LAN, both units register fine, stay registered, and make and receive calls fine, whether or not a secret is specified.   However, when traffic is put on the LAN (downloads etc., but with sufficient bandwidth still available for the SIP call), calls can only be made or received IF no secret is specified.   With a secret, the clients de-register and therefore no calls can be made.   Removing the secret and the auth=md5 allows calls to be made and received with no problem under LAN load.

****** ADDITIONAL INFORMATION ******

The two errors reported when communication breaks down under LAN traffic when secret and md5 are enabled are as follows :

Error 1:

Mar 17 02:58:23 NOTICE[81926]: AAFailed to authenticate user horizon3 <sip:horizon3@212.18.255.13>;tag=8ff720ebbd7d13b1

This is generated by the following bit of channels/chan_sip.c

               if (!p->lastinvite) {
                       /* Handle authentication if this is our first invite */
                       res = check_user(p, req, cmd, e, 1, sin);

// GOES WRONG HERE
                       if (res) {
                               if (res < 0) {
                                       ast_log(LOG_NOTICE, "AAFailed to authenticate user %s\n", get_header(req, "From"));
                                       p->needdestroy = 1;
                               }
                               return 0;


and

Mar 17 02:47:26 NOTICE[81926]: Registration from 'horizon3 <sip:horizon3@212.18.255.13>' failed for '195.10.96.134'

generated from channels/chan_sip.c

               /* Use this as the basis */
               if (sipdebug)
                       ast_verbose("Using latest request as basis request\n");
               copy_request(&p->initreq, req);
               check_via(p, req);
               if ((res = register_verify(p, sin, req, e)) < 0)
// GOES WRONG HERE - CHRIS

                       ast_log(LOG_NOTICE, "Registration from '%s' failed for '%s'\n", get_header(req, "To"), inet_ntoa(sin->sin_addr));
               if (res < 1) {
                       p->needdestroy = 1;
               }

I haven't read the code in detail as I haven't read all that much of the sip code but the problem might be coming from check_auth in that it contains an 'if' showing that all is always ok if there's no secret or md5, which is exactly what is happening.  I have currently put a nasty hack in to always return 0 from this subroutine and all works fine (except of course I have no authentication) but the clients stay registered and there are no errors.

static int check_auth(struct sip_pvt *p, struct sip_request *req, char *randdata, int randlen, char *username, char *secret, char *md5secret, char *method, char *uri, int reliable)
{

// chris - hack auth ;
       return 0;
// that should be it - c
       int res = -1;
       /* Always OK if no secret */
       if (!strlen(secret) && !strlen(md5secret))
               return 0;
       if (!strlen(randdata) || !strlen(get_header(req, "Proxy-Authorization"))) {

My guess is that, due to lag when there is traffic on the LAN, check_auth either gets the wrong answer as to whether a client is authorised, or gives the client only one shot at authenticating, so that if this doesn't work out quickly or first time the client isn't allowed to connect or make a call.

My C is bad, but if the check_auth code is in fact OK, I was wondering whether, around the two points in the code where the two log errors are generated, a variable could be defined (if it isn't already) for the number of 'goes' to give a client before giving up on it and deregistering or otherwise refusing it.

I tried calling the subroutine again to redefine 'res' inside the ifs, i.e. adding if (res < 0){res=subr;}, and this seemed to make things a little more solid but didn't totally solve the problem, and I'm not really a programmer.

This bug is 100% consistent and you should have no problem repeating it.  it's a major problem for us as our clients have dynamic IPs and we would like to authenticate them using md5, snom or sipura hardware without having to use iptel's SER, which I realise lots of people use.  

I think this is something that should work.

If anyone knows how I can get the chan_sip2.c code to try, OR can email it to me please do and I can try it.

Thanks for a brilliant product, sorry to find this - hope my report is helpful - Chris
Comments:
By: chrisorme (chrisorme) 2004-03-17 22:22:41.000-0600

Files attached.

the sipura is horizon3 / horizon4 in file sipura_lantrafkillsauth
the snom is horizon1  in file snom_lantrafkillsauth

The registration is only lost when auth=md5 secret=whatever is in sip.conf without my return=0; hack in check_auth.  Ie if the body of chan_sip.c check_auth isn't run then it doesn't matter if there is traffic on the LAN the calls all work fine - so there's something going on in there.

The sip debug output attached is first with no traffic on LAN, successful call, then traffic is put on the LAN, and more calls made, the sipura gives a tone when it doesn't go through, the snom gives a proxy authentication error on its screen when they fail.  This is repeatable 100%.  
Hope my report helps.

Chris

By: Mark Spencer (markster) 2004-03-20 16:57:26.000-0600

I can't duplicate this bug, and clearly its very existence would appear to violate even the most basic sanity of a system.  Clearly general network traffic cannot affect the calculation of the checksum within Asterisk, unless a system is totally messed up.  You need to find someone completely independent who can duplicate this bug or I'm just going to have to resolve it as "cannot be duplicated" :(

By: chrisorme (chrisorme) 2004-03-21 03:15:29.000-0600

If you can, please give me 10 days before marking this as 'cannot be duplicated', as I don't have the hardware and am off to CeBIT.  It certainly can be duplicated, but probably on a different setup than you have.  

Two of our other offices have found this bug.  
Basically a bandwidth limited pipe is needed (512/256k ADSL or ISDN), commence a big download from the net (eg. red hat iso) and then the sipura and snom start to fail with 'proxy authentication required' (see files) if a secret is defined once latency increases on the pipe ( I would assume ).
If no secret is defined they continue fine (or with my return=0 hack).
I don't think the checksum is being affected I think that maybe only one go is given at auth and maybe the reply isn't given in time it fails?.  I'll get better debug -vvvvvv -d when I get the equipment back and maybe file a short avi too.

Thanks for looking at this but it definitely happens and it'd be really handy to get a secret defined.   (the connection did get a bit more solid when I did nasty things like

changing

if ((res = register_verify(p, sin, req, e)) < 0)

to

res = register_verify(p, sin, req, e);
if (res < 0) { res = register_verify(p, sin, req, e); }  /* <<- added line */

ie to make it try again...

but it didn't make it 100% but it made it a bit better / more solid.

I'll get you more proof.  Chris

By: chrisorme (chrisorme) 2004-03-21 11:55:33.000-0600

Heya,

What I think might be a good idea is for me (or someone) to install * on a development machine on restricted bandwidth (64k or 256k) and a secret defined for a sip account and put a red hat ISO on that machine also for download from a webpage link.

That way I can give out the sip username and secret here in a bug report (and only allow a message to be played when any number is dialled) and give out the webpage with the iso (on the same dev box) then you or any independent person can try registering with the sip account with secret with some sip client (preferably an snom or sipura unit) and then once that is successful then the remote user can start to download the iso and try to register again and see the sip client give up with an error with the authentication on when downloading the iso.

Errors are usually one of something like  :
1.  Mar 17 00:12:58 NOTICE[1125329600]: Registration from 'horizon4 <sip:horizon4@212.18.254.17>' failed for '195.10.96.134'

2.  Mar 17 00:13:53 NOTICE[1125329600]: Failed to authenticate user horizon3 <sip:horizon3@212.18.254.17>;tag=1e47c10bb7d7f215

(error 2 can emanate, I think, from either of two points in the code, but error 1 only comes from one point)

I could also define a second account with no secret, because then there is no bug and no 'proxy authentication required' or any other problem with the sip client when no secret is defined.   I.e. the experiment described above can be repeated with no secret and there will be no failure for anyone who tries.

This should replicate this definite bug.

I could then use the version of the code I have where all the 'res=' lines are duplicated (as described above), so that when they give res<0 (which at the moment causes the code to bail out with the errors above) the relevant subroutine gets, I think, another chance or two to determine res.  You could then try again and see that fewer 'proxy authentication required' / 'failed to authenticate user' errors seem to occur with the download going on.  (Still, 10-15% of the time the client won't register, probably as it's such an ugly hack and I don't really understand what I'm doing to chan_sip.c.)

I think maybe it is currently taken for granted that successful registration will occur immediately, or within the required (small) timescale, on all links, and if the answer isn't right or isn't given quickly enough then the user fails authentication.  I can't see how best to change this in chan_sip.c right now, sorry.

Or for another experiment I could put in place the version of the code where return 0; is returned from subroutine check_auth immediately and again then there is no problem and no bug when this is done (apart from of course the fact that it doesn't care if the password/secret is correct or not - but we've used this hacked code successfully).

The comments in check_auth don't fill me with inspiration ! :)

It'll take me 10 days to get this done as I'm off to CeBIT tomorrow and not back until a week on Monday, but that's the gist of how I think I can get the independent verification you require.  We've spent about a week isolating this bug and making it 100% repeatable, and I am 100% confident that anyone who takes 20 minutes to work through the experiment above will see this behaviour.

At the moment none of our offices use secrets and there is no problem but this is clearly far from ideal as if the username is sniffed we're in trouble !

Looking forward to proving this to you, or to someone proving this bug definitely using the above method or something like it whilst I'm away !

Perhaps everyone has too much bandwidth and fantastic connections between everywhere to notice this in the states ?

- Chris

By: Mark Spencer (markster) 2004-03-21 13:42:04.000-0600

Okay I have a theory, but you'll have to look at the SIP debug on the *Asterisk* side to confirm this...

I'm guessing that the SNOM is transmitting INVITE, then, before it gets our 100 Trying and/or 407 Proxy Authentication Required, it's *retransmitting* its INVITE, and so we're recalculating the random stuff.  Can you confirm that Asterisk sees *two* invites and/or *two* register requests on the same call number from the snom or sipura?
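
The theory above can be illustrated with a small sketch. This is not the chan_sip.c code: `struct dialog`, `challenge`, `check_response`, the counter standing in for the random data, and the use of a repeated CSeq to spot a retransmission are all assumptions made for illustration. It only shows why regenerating the challenge on a retransmitted request would invalidate the answer the phone computes from the first challenge.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified dialog state; not the real struct sip_pvt. */
struct dialog {
    char nonce[16];      /* challenge currently expected, "" if none sent yet */
    unsigned last_cseq;  /* CSeq of the last request that was challenged */
    unsigned counter;    /* deterministic stand-in for the random data */
};

/*
 * Send a 407 challenge for the request with the given CSeq and return the
 * nonce now expected. A request that repeats the previous CSeq is treated as
 * a retransmission (an assumption of this sketch). When regen_on_retransmit
 * is 0 the existing nonce is kept for a retransmission; when it is 1 a fresh
 * nonce is generated every time.
 */
static const char *challenge(struct dialog *d, unsigned cseq, int regen_on_retransmit)
{
    int retransmit = d->nonce[0] != '\0' && cseq == d->last_cseq;

    if (!retransmit || regen_on_retransmit)
        snprintf(d->nonce, sizeof(d->nonce), "%08x", ++d->counter);
    d->last_cseq = cseq;
    return d->nonce;
}

/* 0 if the client answered the nonce we currently expect, -1 otherwise. */
static int check_response(const struct dialog *d, const char *nonce_answered)
{
    return strcmp(d->nonce, nonce_answered) == 0 ? 0 : -1;
}
```

With regen_on_retransmit set to 1, a phone that answers the first 407 after its retransmitted request has already been challenged again fails the check; with 0, the same answer is accepted.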

By: chrisorme (chrisorme) 2004-03-21 13:59:50.000-0600

The two files attached are debugs from the asterisk side of things (with sip debug on) between asterisk and the snom (in one file) and sipura (in the other) I did from a few days ago with the bug going on (secret defined) (they don't have all the -vvvvvv 's (just a few) and no -d you might need that Olle suggested) - but may still help.  They also don't end in .txt - soz.

I'm trying to read them through though.

They were produced with asterisk -r > /tmp/whatever and then I did a successful call (with no traffic on the line) and followed then by one or more failed calls rejected with Proxy Authentication Required etc type errors.

I'm having a bit of trouble trying to see if your theory is what is going on but I'm going to have a try to look for it.   There do seem a lot of INVITEs floating around.

Thanks loads for looking into this Mark.

Very best wishes, Chris

By: chrisorme (chrisorme) 2004-03-21 14:02:28.000-0600

The sipura debug I uploaded a few days ago may be more useful as I fear the 'I only want gsm' (from snom) and 'I only want g729' (from *) might be mixed into the snom debug that I did a few days ago and might make it a little trickier to follow.-C

By: chrisorme (chrisorme) 2004-03-21 14:21:19.000-0600

I've attached snom_notregistered  as there are loads of REGISTERs flying around in that.

By: Mark Spencer (markster) 2004-03-21 14:29:15.000-0600

Okay I've put a first-pass fix in asterisk CVS head, but if it works I'll back-port it to -stable.  Please let me know if it works or even makes any difference at all as soon as possible.

By: chrisorme (chrisorme) 2004-03-21 15:21:58.000-0600

Ok, I've put the latest CVS on a devbox ready to be tested.  
I'll try and explain to someone over email how to test it using our sipura or snom units whilst I'm away - if they can't manage it (quite likely) I'll do it myself a week tomorrow.  
If anyone with an snom or sipura reading this on a connection they can bandwidth limit wants to have a go at this please try.

I looked at the new CVS code on the list, and I'm not a C programmer, but I didn't see where the new ignore parameter was set to true or false anywhere.  But I should read up on C.  
Anyway I don't know.. sorry - I'll let you know how we get on, and thanks again for looking at this.  Thanks!!!

Chris

By: chrisorme (chrisorme) 2004-03-21 15:41:50.000-0600

Ok I found it in handle_request - cool :-) - thanks!

By: Mark Spencer (markster) 2004-03-22 14:52:47.000-0600

What's the story, can we close this now?

By: chrisorme (chrisorme) 2004-03-23 13:02:14.000-0600

If possible please could you wait until next Monday night to do this.
If at all possible.  I will be able to retrieve the sipura unit and try it with traffic to see if the registration still falls over.

I am currently in Germany at Cebit looking for SIP gateways to work with *.

My staff tried the new version but told me that it did not fix the bug.  

However I do not believe them (having seen the patch) and would like to test the patch myself before saying definitively if the patch solved the problem or not ?

sorry for the delay - very best wishes, Chris

By: Mark Spencer (markster) 2004-03-23 13:33:14.000-0600

Also please remember this is in CVS head only, and not in -stable!  I will be gone to VON next week, so it will be nearly impossible for me to work on it.  Please try to test it as soon as possible.

By: chrisorme (chrisorme) 2004-03-30 09:39:44.000-0600

I tried the latest CVS, and with the 'make samples' config (old one deleted) it will not boot past parsing chan_sip.conf / loading SIP, and doing asterisk -r doesn't get past the ==='s below your name.  I will try installing CVS on another machine, or an earlier CVS, until I can get one to boot, just in case your patch isn't in stable.

It appears to be in the stable download I did..

from check_auth

  if (ignore) {
               /* This is a retransmitted invite/register/etc, don't reconstruct authentication
                  information */

Not sure what went wrong with my download if it's not....

anyway.... I've attached debug 300304.txt.

The snom and sipura still lose dialtone and their registration fails at some point when the traffic is heavy but they can usually make calls without the registration.  (or they do something before the call)

Basically it seems a little better (ie calls can usually be made under traffic now with a password defined) but the dialtone does drop and they do drop into an NR state.

Hope the debug is some help.

Any chance someone could send me the configuration script for the IAXy so I can try that and perhaps kiss bye bye to these sip problems ?

Have a great week at VON.  Wish I was there!

- Chris

By: Mark Spencer (markster) 2004-03-30 11:13:27.000-0600

You were unable to update to CVS?  Did you check that /var/log/asterisk/* was not over 2 gigs in size?

By: chrisorme (chrisorme) 2004-03-30 11:43:22.000-0600

No I didn't.  But that got it to boot.  Thanks.
I will now put the latest CVS on that server and see what happens with these two bugs with that.

Maybe the iaxy greg was kind enough to send over here will talk adpcm to the CVS rather than to the stable too...  (John told me about iaxyprov btw)

Thanks!  Chris

By: chrisorme (chrisorme) 2004-03-30 12:34:15.000-0600

Sorry, but the same result with the latest CVS as with v1.0stable.
New log file 300304.2.snom.nr.txt attached/uploaded being output from the latest CVS.  (noted by all the 'urgent handlers')
The patch is in the stable branch anyway so I'm not surprised the result was the same as with our production server.

The snom goes to NR and the dialtone is lost when there is quite a bit of traffic on the network when (I think) it tries to reregister (see near the end of the log or grep).  
This bug only occurs when a password/authentication is defined.
Calls can be made when the phone isn't registered if you ignore the fact there is no dialtone as the phone is not registered 'NR'.

Chris

By: Mark Spencer (markster) 2004-03-31 03:07:09.000-0600

Do you still receive registration errors or does the registration simply expire?

By: chrisorme (chrisorme) 2004-03-31 05:40:16.000-0600

Using a password with the snom causes registration to fail reasonably frequently without traffic too but traffic will make it drop certainly.

see the 310304 uploaded file.  (with v10 stable which has the patch in and there was no traffic)
Extracts from sip.conf for this debug was

maxexpirey=1800         ; Max length of incoming registration we allow
defaultexpirey=300              ; Default length of incoming/outoing registration

[devsnom1]
type=friend
context=default
dtmfmode=rfc2833
username=devsnom1
disallow=all
allow=gsm
;;allow=g729
secret=XXX
auth=md5
nat=yes
host=dynamic
canreinvite=no
allowreinvite=no
qualify=10000   <<- this line is new today (10secs), an attempt to keep the NAT open as calls initiated from the snom were dropping after about 2 minutes -
I'm guessing either the NAT timeout period OR the registration interval which I've had between 1 and 5 minutes was causing the placed call to drop?  Obviously it's annoying when a call drops after a couple of minutes.   This happens also without passwords defined.

All the best,

Chris

By: chrisorme (chrisorme) 2004-03-31 06:05:08.000-0600

We get 'proxy authentication required' on the attached sip debug when trying to register which is an error as I see it and not an expire but I am not 100% sure as to how the difference would express itself.  
I also would think it's an error as taking the secret and auth lines out solves the problem?
Sorry if that doesn't answer your question.

If not, how can I look for the difference?

Chris

By: Mark Spencer (markster) 2004-04-02 02:44:41.000-0600

Clearly in your testing you are introducing *extreme* network latency.  Can you even operate a phone call in such an environment?  It's taking several seconds for responses to come back.  Looking at the log, we are *properly* handling the initial registration, each time sending back the proper nonce.  However once we're registered, we have no way of knowing that we're still in that dialog, meaning that we ask it to again authenticate its registration.  I'm not sure what the "right thing" to do is in that case, but clearly what we do doesn't introduce any problem as it doesn't remove the other registration.  I don't see any failed registrations in the log you posted.  Am I missing something?

By: chrisorme (chrisorme) 2004-04-02 06:20:25.000-0600

Many thanks.  I don't think the network latency is extreme in the early debugs. At maximum it ran about 600-1100ms and phone calls are intelligible in this environment and can be placed fine.  Only the registrations with secret/auth defined fail.  (all is fine without this).  I am judging a failed registration as one where the snom reports NR and gives no dialtone (and I assume has problems then receiving calls).

If the problem is latency, could it be considered for asterisk to wait for responses to come back, perhaps via a configurable variable so that people who need this functionality have it while your current behaviour stays the default, rather than * giving up and the snom dropping into an NR state if there was some traffic on the line when it went to re-register (using secret/auth) ?

I'm not sure what is happening but alternatively if a few requests for registration are sent and there are several possible answers can it be made (as an option) that if the phone comes back with one of these answers it is told it is accepted?

It seems everyone that develops gets instant responses which doesn't happen for real when * is remote from the site 150ms+ away.

I'm sorry but I would argue that there is definitely some problem as even without traffic and the resulting latency (latency down to its usual 150ms-200ms from 600-800ms) when a secret/auth is defined the snom has dropped into unregistered state (it did this in the log of the 31st, although it may not show it) although placing calls from this NR state does work (receiving may not).  Maybe I should wait for this to happen and post both the asterisk and snom log of when it drops to NR with latency at only 150ms-200ms?

Sadly something is definitely going wrong so that we can't use secret/auth on the snom without suffering NR frequently even without traffic increasing latency (although traffic always brings it on if it coincides with a registration interval).

Out of interest what do you commonly use as a registration interval ?  I've been trying 1min,5min,10min and 1 hour.

I'm not sure exactly what you're missing, and I don't want to sound bolshy when I don't understand the mechanics but I know there is a definite problem (with latency of only 150-200ms) with the snom dropping to NR and if there's any way it can be solved so we can use passwords/secret/auth I'd be so incredibly grateful as we could deploy this and build our network using * servers and connecting them to the PSTN via multiple ISDN30s, and this is all that holds us back.

Hope you had a great time at VON!  Chris

By: Mark Spencer (markster) 2004-04-02 10:41:16.000-0600

Something much more severe than 150ms of latency is going on with these messages.  
Notice in the log you attached that we send OPTIONS and then we retry not just once but *twice*.  The retransmission time built into chan_sip is one second.  That means that we are not receiving the 200 OK to the OPTIONS for at least 3000 milliseconds!  Then we receive all three back, meaning the first two weren't dropped, just extremely delayed.  You can see similarly that when it sends us messages it takes several tries of our sending responses before they get them.

This is not at all a normal environment for packet voice, and again, looking at our responses we are sending the 407 at the right times, except arguably after we've sent our 200 OK, and I'm not sure what the right thing to do there is, except possibly to keep our registration around for several more seconds just in case we get another such response.

By: Mark Spencer (markster) 2004-04-02 16:45:15.000-0600

Okay I patched chan_sip to keep the dialog around for another 15 seconds.  I think that's an extremely excessively long time but at least let's see if that makes your SNOM happy.
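
As a rough sketch of the idea (not the actual patch), the decision for a late, duplicated REGISTER + AUTH could look like the following. The names `reg_dialog`, `on_register_with_auth` and `LINGER_MS` are hypothetical, and so is the exact comparison against the time the 200 OK was sent; only the 15-second figure comes from this thread.

```c
#include <stdbool.h>

#define LINGER_MS 15000L  /* 15 seconds, the figure mentioned above */

/* Hypothetical record of a registration dialog that has already completed. */
struct reg_dialog {
    bool authenticated;  /* true once 200 OK has been sent for this dialog */
    long ok_sent_ms;     /* time the 200 OK went out, in milliseconds */
};

enum reply { REPLY_200_OK, REPLY_407_CHALLENGE };

/*
 * Decide how to answer a REGISTER + AUTH arriving at now_ms. While the
 * completed dialog is still being kept around, a delayed duplicate is
 * answered with 200 OK again; otherwise the request is treated as new
 * and challenged.
 */
static enum reply on_register_with_auth(const struct reg_dialog *d, long now_ms)
{
    if (d->authenticated && now_ms - d->ok_sent_ms <= LINGER_MS)
        return REPLY_200_OK;
    return REPLY_407_CHALLENGE;
}
```

Under these assumptions, a duplicate arriving 3 seconds after the 200 OK still gets a 200 OK, while one arriving 16 seconds later is challenged as a new request.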

By: Mark Spencer (markster) 2004-04-02 16:48:45.000-0600

Try it again and get another log and get back with me ASAP.

By: chrisorme (chrisorme) 2004-04-02 17:48:29.000-0600

Ok. Many thanks! I put the latest CVS on.

I've attached three new logs all of course with passwords/secrets defined.

snomreg.020404.txt .. here there was some lan traffic going on from the start and the snom was booted and the snom didn't ever manage to register at all.  Lots of Proxy Authentication Required errors, and Not Authorized etc. from * in the debug attached.
The 'NR' flashed on and off quickly a few times and then stayed on 'NR' status and calls couldn't be made or received over the several minutes of the log.

snomreg.020404.2.txt .. here there was very little lan traffic going on and the snom registered and actually stayed registered with a password - cool.

snomreg.020404.3.txt .. here the snom initially registered (with no traffic on the link) and then medium lan traffic was put on the link and it managed to stay registered through the traffic for a while which is good.  
A call was made and it only seemed to have about 200ms or so delay on it (you could notice the delay), but it was intelligible and usable.  
The snom then dropped into an NR state sadly a few minutes later.

I had set the registration time to 1 minute on the snom.

Christian, the developer from snom sent me the following suggestion -
--------------------------------------------
Whow you must have real heavy traffic!

Retry T1 (ms): Time between retries to send UDP SIP packets

Retry T2 (ms): Legacy, not used any more

Session Timer (s): Time after a session must be re-invited (otherwise UA
thinks session is over). Important when pulling the Ethernet cable...

Dirty Host TTL (s): When a SIP entity cannot be reached its pointless to try
immediately again. Set this to zero in case your network goes up and down.

CS
----------------------------------------------

I have the settings set at T1, 5000ms, T2,0, Session Timer 120, Dirty Host 0.

I hope some of this helps in knowing what is going on..  It'd be great to use passwords and not worry about traffic pushing the registration over.

Chris

By: Mark Spencer (markster) 2004-04-02 19:38:15.000-0600

If you were really sure you had the latest CVS head and this is still the output, I will have to login to look at the problem.

Your latencies, as I explained, are NOT 200ms or anywhere near it, or this wouldn't be an issue.  Maybe it's some sort of prioritization or something like the signalling isn't getting a high priority?

So again, if you want -- test to see if this is in fact the latest CVS *head* not *stable* and then if you still have the same problem, I will need to login so find me on IRC.

By: chrisorme (chrisorme) 2004-04-03 00:40:33.000-0600

Thanks for looking at those.  Yes it is definitely the latest CVS - I even checked and saw the '15000' in channels/chan_sip.c

I've emailed you the login details as I've got exams this weekend and might miss you on IRC.

I'm sorry if SIP isn't supposed to work with heavy traffic with passwords by design - it's something I didn't know.  We don't prioritise VoIP traffic at our sites yet, haven't found the need yet.  

If this behaviour is a limitation of SIP then is the best way to try to avoid these Proxy Auth Required / No Authorization messages to set the SIP registration to as long a period as possible, like 1 day, so that the likelihood of high traffic at the time of the attempt for reregistration is as low as possible ?  (and to set the timers on the SNOM to long periods ?)

Thanks.  Chris

By: chrisorme (chrisorme) 2004-04-03 11:21:50.000-0600

I think sometimes our latencies do go to about 1,500ms due to traffic bursts/downloads.  Calls seem to be ok even in this environment as it doesn't last long, but perhaps if this coincides with a re-registration request (by the snom?) this is where the problem arises?

If you don't see this as a normal environment and don't want to implement it in the release may I cautiously ask how can we prevent it ourselves in our environment and those of our offices.  
Here on the Isle of Man is quite bandwidth restrictive and latency can rise beyond our control at times due to traffic shapers at our ISPs - also they run their connections at 100% downstream at all times except 2-4am.

Maybe you could give us the lines in chan_sip.c for instance where the 1sec retry is set/defined and the registration period and maybe we could set these to 3 or 4 seconds so our snom (hopefully to be snoms) stay registered and can make and receive calls through this nonsense?

Many thanks, Chris

By: chrisorme (chrisorme) 2004-04-03 13:37:32.000-0600

3:April:04 - The snom dropped to NR a couple of times for about 1-2 minutes at a time during the day with passwords defined and next to no traffic on the line.  I'm not sure what caused this.
I was wondering if it loses/forgets old registrations when it goes to register.

Chris

By: Mark Spencer (markster) 2004-04-03 20:45:56.000-0600

I don't think the SNOM recovers well from having registration unavailable and then return.  I can also tell you that there is NOTHING we can do about the SNOM going into NR.  Not only that, but just because it "says" NR doesn't mean it's not actually registered.

Anyway find me on IRC and I'll login and at least see why my "keep regs around" code isn't working for you to at least be sure that the re-register that comes back does get a 200 OK even on multiple passes.

Would be nice if you could find a sample call flow that demonstrates this.

By: Mark Spencer (markster) 2004-04-03 20:55:39.000-0600

I've modified chan_sip to provide more potentially useful debugging on this.  Please cvs update and get me one more trace exhibiting the problem.

By: chrisorme (chrisorme) 2004-04-04 01:31:36.000-0600

Thanks for persevering...  I've attached snomdebug.4apr.1.txt from running the debug on the latest CVS with traffic on the network.

It has a Proxy Authentication Required in it.

If you could get us OK's rather than Proxy Auth required / Not Authorized's etc that would be great.  Thanks for setting me straight about the NR's on the snom.

I'll run another debug later today in case this one isn't very good.  I'll include a sample call flow then as I've got to dash right now.

- Chris

By: Mark Spencer (markster) 2004-04-04 12:00:26

This last sample does not exhibit the problem.

I think you're failing to see that the 407 Proxy Authentication Required is a *necessary* part of the flow of the register and cannot possibly be avoided because it is used to send the "nonce" that is used...

The flow looks like this:

SNOM => REGISTER w/out AUTH => *
SNOM <= 100 Trying <= *
SNOM <= 407 Proxy w/ nonce <= *
SNOM => REGISTER + AUTH => *
SNOM <= 100 Trying <= *
SNOM <= 200 OK <= *

The only part I'm working on has to do with what to do if we receive multiple REGISTER + AUTH's after we send the 200 OK because of your absurdly high latency.  Again, in the attached debug, the SNOM never transmitted additional REGISTER + AUTH requests, so it didn't exercise the portion I was talking about.
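
The two passes of the flow above can be sketched as a single decision function. This is an illustration only: `register_final_status` is a hypothetical name, the digest check itself is left out (only the nonce is compared), and answering a mismatched nonce with another 407 is an assumption of the sketch, not something stated in this thread.

```c
#include <stddef.h>
#include <string.h>

/*
 * Final status a registrar sends for a REGISTER, after the provisional
 * 100 Trying. auth_nonce is the nonce echoed in the request's credentials,
 * or NULL when the request carries none; issued_nonce is the nonce sent in
 * the last 407 ("" if no challenge has been sent yet).
 */
static int register_final_status(const char *auth_nonce, const char *issued_nonce)
{
    if (auth_nonce == NULL)
        return 407;   /* first pass: no credentials, so challenge with a nonce */
    if (issued_nonce[0] != '\0' && strcmp(auth_nonce, issued_nonce) == 0)
        return 200;   /* second pass: credentials answer our challenge */
    return 407;       /* assumption: anything else is simply challenged again */
}
```

Seen this way, a 407 in the trace is the normal answer to the first REGISTER; it only indicates trouble when it also follows a REGISTER that carried credentials.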

By: chrisorme (chrisorme) 2004-04-05 07:33:01

Ok, really sorry about that..   Thanks for taking the time to explain how it works too.  I really should read the RFC.  sorry

Attached is a new log where I hope it's demonstrated throughout.
 
I did three successful calls in it - 2x30secs + 1xabout 50secs.  They sound ok.

Hope that helps and your code runs / is exercised this time (is there any quick way I can check with GREP if I didn't get a decent log?).  
If you think it's just us that has lousy bandwidth then don't worry too much - the joke of it is we have a monopoly carrier, and masses of fibre to the UK which isn't lit because of Government bureaucracy!

Thanks again, Chris
PS Below is some correspondence with SNOM that may or may not help...

-------message 1------------
No, it *should* deregister first. But you can only deregister if you have
the right credentials...

CS

> -----Original Message-----
> From: chris orme [mailto:chris@XXXX.com]
> Sent: Saturday, April 03, 2004 7:14 PM
> To: Christian Stredicke
> Subject: Re: snom settings
>
> Hi
>
> Does the snom200 register without removing previous registrations
>
> I'm having problems and was wondering if this might be the cause.  (see
> the
> asterisk / digium bug report url below?)
>
> Thanks
>
> chris
>
> ----- Original Message -----
> From: "Christian Stredicke" <XXX@snom.de>
> To: "'chris orme'" <chris@fXX.com>
> Cc: "'Sven Fischer'" <sven.fischer@XXXX.de>; <suX@snom.com>
> Sent: Friday, April 02, 2004 9:11 PM
> Subject: RE: snom settings
>
>
> > Whow you must have real heavy traffic!
> >
> > Retry T1 (ms): Time between retries to send UDP SIP packets
> >
> > Retry T2 (ms): Legacy, not used any more
> >
> > Session Timer (s): Time after a session must be re-invited (otherwise UA
> > thinks session is over). Important when pulling the Ethernet cable...
> >
> > Dirty Host TTL (s): When a SIP entity cannot be reached its pointless to
> try
> > immediately again. Set this to zero in case your network goes up and
> down.
> >
> > CS

------ message 2--------

>
> Hi!
>
> I'd like to suggest the following features for inclusion in future
> firmware
> as a result of our registration issues and field trials.  The customers in
> the field trials loved the phone once the port conflicts and NAT issues
> were
> resolved.
>
> We're just having a few problems keeping the phone registered that we'd
> like
> to solve before deploying it that's all.  (or would like to hide this
> problem!!)
>
> Here are the suggestions for future firmware :
>
> * Keep playing the dialtone when phone not registered    on/off
[Christian Stredicke] I thought we have this already?!
> * Display NR when not registered                         on/off
[Christian Stredicke] Absolutely.
> * don't send subscribe and notify requests (for the userdefined buttons)
> on/off
[Christian Stredicke] There are already some settings. The only thing which
cannot be turned off is the subscription for configuration.
> * don't lose old registrations                              on/off   (may
> or
> may not be needed)
[Christian Stredicke] Well I think that sounds more like a bug...

By: Mark Spencer (markster) 2004-04-05 09:32:15

The situation I mean is one where the call flow goes something like this:
SNOM => REGISTER (Cseq=X) => *    -- The original Register
SNOM <= 407 (Cseq=X) <= *         -- Our original 407
SNOM => REGISTER (Cseq=X) => *  -- Here, the SNOM retransmits the original reg
SNOM <= 407 (Cseq=X) <= *       -- Here we properly send the original 407
SNOM => REGISTER (Cseq=X+1) => * -- Now the SNOM sends the register with auth
SNOM <= 200 (Cseq=X+1) <= *    -- We send 200 OK as we should
SNOM => REGISTER (Cseq=X+1) => * -- The SNOM again sends the register with auth
SNOM <= 200 (Cseq=X+1) <= *    -- Ideally we should send 200 OK again, if working
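
One way to get the behaviour in that flow is to remember the last response sent for a given CSeq and replay it when the same CSeq arrives again, rather than re-evaluating the request. The C sketch below illustrates that idea only. `struct reg_dialog`, `on_register`, and the `has_valid_auth` flag are hypothetical names, and this is not a description of how chan_sip actually implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-registration state; not chan_sip's real structure. */
struct reg_dialog {
    int have_last;    /* non-zero once a response has been sent */
    int last_cseq;    /* CSeq of the most recently answered REGISTER */
    int last_status;  /* status code sent for that CSeq */
    int auth_checks;  /* how many times credentials were actually evaluated */
};

/* Returns the status code to send for an incoming REGISTER.
 * has_valid_auth stands in for the result of the digest check. */
int on_register(struct reg_dialog *d, int cseq, int has_valid_auth)
{
    assert(d != NULL);

    /* Same CSeq as the request we already answered: the phone is
     * retransmitting because our reply was lost or late. Replay it. */
    if (d->have_last && d->last_cseq == cseq)
        return d->last_status;

    /* New CSeq: evaluate the request and remember what we answered. */
    d->auth_checks++;
    d->have_last = 1;
    d->last_cseq = cseq;
    d->last_status = has_valid_auth ? 200 : 407;
    return d->last_status;
}
```

Feeding it the sequence above (CSeq X twice without auth, then X+1 twice with auth) yields 407, 407, 200, 200 while the credentials are evaluated only twice, so a delayed duplicate of the authenticated REGISTER still gets its 200 OK.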

By: Mark Spencer (markster) 2004-04-05 17:03:12

Okay I really need some debug here to move along.

By: chrisorme (chrisorme) 2004-04-05 17:18:16

I pressed upload before I went out this morning, but I now see that it failed, so I'm trying again.
---------
2006
MySQL server has gone away

Fatal error: Allowed memory size of 8388608 bytes exhausted (tried to allocate 1436784 bytes) in /home/httpd/bugs/core/error_api.php on line 105
-------------
As I guess the file is too big...

Try downloading from http://www.prioryhomes.com/snom.5apr04.txt

Soz.

By: Mark Spencer (markster) 2004-04-07 10:30:27

I don't seem to be able to download your file.  Can you not duplicate this with a normal sized file just as you did before?  Either that or select the relevant region where this is happening.

By: Mark Spencer (markster) 2004-04-07 11:09:23

Never mind, I just simulated the problem by intentionally blocking the first 100 Trying / 200 OK going out to my pingtel and making it re-register.  Anyway, it's fixed now to the greatest degree it can be, meaning that we can handle all sorts of repeats within the dialog of the REGISTER and the INVITE with all this delay.