[Home]

Summary:ASTERISK-03047: sip registrations fail if there are too many regiser => entries
Reporter:mxhess (mxhess)Labels:
Date Opened:2004-12-20 02:04:52.000-0600Date Closed:2008-01-15 15:31:06.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) big_register_sip_dump.txt
( 1) big_sip_debug.txt
( 2) sipdebug.txt
( 3) sipdebug2.txt
( 4) sipregisrations.HEAD.rev1.txt
( 5) sipregistrations.STABLE.rev2.txt
Description:receiving multiple errors when asterisk attempts to register with proxy..

Dec 20 00:37:40 WARNING[19838]: chan_sip.c:685 retrans_pkt: Maximum retries exceeded on call 69fc25a63838cbe719f9639423feae3d@66.185.96.28 for seqno 102 (Critical Request)

it appears that asterisk attempts to register all the "register =>" lines at the same time and either asterisk gets confused or the proxy cannot keep up with them and respond to all the registrations at once.

also seen is this multiple error:
Dec 20 01:00:58 WARNING[9300]: chan_sip.c:6704 handle_response: Got 200 OK on REGISTER that isn't a register

to me it looks like asterisk has troubles keeping track of a lot of sip registrations at the same time. Perhaps a rate limiter needs to be in place for handling so many registrations?

To me this isn't major right now as the registrations get retried in asterisk and will eventually all register.
Comments:By: Mark Spencer (markster) 2004-12-20 04:23:32.000-0600

How many register lines are you running?

By: mxhess (mxhess) 2004-12-20 15:33:59.000-0600

# grep register sip.conf | grep -v "\;" | wc -l
    109

By: Mark Spencer (markster) 2004-12-20 17:56:09.000-0600

Yah, that's a lot heh!

By: Clod Patry (junky) 2005-01-01 02:20:22.000-0600

Do you really need all these 109 registries?
all these registers are to the same hosts?
give us more details please about all this.

Thanks.

By: mxhess (mxhess) 2005-01-01 12:23:22.000-0600

I was planning for much more than 109..
What I'm doing is setting up a master proxy and having each asterisk system act like a client.. since the sip channel so often is called a b2bua I figured why fight it..

so the plan was to have a master SER router taking all the registrations from different asterisk systems.. sort of like dynamic routing.. I could do backup systems that way with the master trying both registration paths to find the registered client when the client in our case has the capabiility to use a failover proxy..

the registrations are all to the same master router.. only one host would register the sip entries at a time so I wasn't overloading the master..

I'm happy to provide any other details you may need but I'm not sure what else.. if anyone has further questions feel free to ask.

By: twisted (twisted) 2005-01-01 19:15:58.000-0600

Does this issue raise it's head on a non-BSD box? or do we even have the resources to try it?

By: twisted (twisted) 2005-01-01 19:17:06.000-0600

Also, did the astobj patches fix this issue?  They are in -HEAD if you can try them.

By: mxhess (mxhess) 2005-01-01 22:16:29.000-0600

As this is a production system I am not able to run -head.
Also, we do not run any linux based systems at all.. sorry.
I would speculate that it is something specific to non-linux platforms.. but that's my speculation.

By: twisted (twisted) 2005-01-01 22:24:16.000-0600

Well, I'm gonna try to get oej to take a look at this, since he's the SIP Master.  Otherwise, we're going to have to co-ordinate to allow us to recreate your exact setup on our side, and do the testing that needs to be done to isolate the issue.

By: twisted (twisted) 2005-01-01 22:24:46.000-0600

Reminder sent to oej

OEJ, can you take a look at this and make any comments/suggestions/pointers as to what could be going on here?

By: syslod (syslod) 2005-01-01 22:30:48.000-0600

I have same bug on linux.  Slack 10

By: twisted (twisted) 2005-01-01 22:32:28.000-0600

syslod, how many register entries are you using?  Can you provide more detail on this bug?

By: syslod (syslod) 2005-01-01 22:42:35.000-0600

I have tested as low as 5 and up to 200 with same results.  Also noticed that subscriptions seem to disappear then reappear when device is able to register again then it looses subscription again.  I have tested this on two other boxes all running current Slack 10 distro.  Below is errors after a few minutes with
7 registrations.  Tested with 2 Poly 500's, 1 poly 300, 3 grandstreams, and 1 softphone firefly.  The only one that seems to stay subscribed is the softphone.  Not sure the bug and the subscription is related but I've noticed the errors at the same time I noticed subscriptions.

Jan  1 23:33:25 WARNING[3775]: chan_sip.c:703 retrans_pkt: Maximum retries excee
ded on call 33348a9e192174786b2f146c40e42780@68.144.15.106 for seqno 102 (Non-cr
itical Request)
Jan  1 23:33:45 WARNING[3775]: chan_sip.c:703 retrans_pkt: Maximum retries excee
ded on call 0ade8a2e3965d51f4d6bb16875fe1e36@68.144.15.106 for seqno 102 (Non-cr
itical Request)
Jan  1 23:36:41 WARNING[3775]: chan_sip.c:703 retrans_pkt: Maximum retries excee
ded on call 4b605b491f5cabb12b73d23b6388fca7@68.144.15.106 for seqno 102 (Non-cr
itical Request)
Jan  1 23:39:48 WARNING[3775]: chan_sip.c:703 retrans_pkt: Maximum retries excee
ded on call 4a89d2fe58792c0f4c60fee40ead7e21@68.144.15.106 for seqno 102 (Non-cr
itical Request)

edited on: 01-01-05 22:46

By: Olle Johansson (oej) 2005-01-02 03:39:47.000-0600

syslod: You are talking about devices registering with or subscribing to Asterisk. This bug report are about Asterisk registering with other proxies. If you've fond a registration bug, please open another bug report :-)

mxhess: Is it possible to get a SIP dbeug of this or to log in to a server when this happens?

By: mxhess (mxhess) 2005-01-02 12:13:28.000-0600

a login to my system is not an option as this is a production system.. or did I misunderstand your question? I could also read it as "is it possible to login to the system while this is happening?" and if that is the case, then yes it is possible to login while this is happening. And by sip debug do you mean a debug display from the asterisk console or something more like an ngrep dump of all the traffic? usually, an ngrep is much easier for me..

By: khb (khb) 2005-01-02 12:25:18.000-0600

Well, first of all you need to look at the packet trace itself to see if the requests actually go out on the wire or if the target is just not responding.
If they don't go out at all, then you don't have to look any further, you have a locking issue or network problems, if they do and the target is not responding you have a protocol or addressing issue and need to look at the packets to see if they are malformed.
109 registrations is not a lot of registrations.

edited on: 01-02-05 12:29

By: mxhess (mxhess) 2005-01-02 17:42:45.000-0600

Attached is a large dump of the whole registration process from asterisk start to final sip registration with the master proxy..

It gave these errors on console.. (note that I might not have been able to get all errors on console as I can only type so fast).

Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:49 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register
Jan  2 16:33:50 WARNING[13160]: chan_sip.c:6713 handle_response: Got 200 OK on REGISTER that isn't a register

unfortunately these errors do not give us the best detail as to which registration was having an issue..

edited on: 01-02-05 17:43

By: khb (khb) 2005-01-02 18:55:20.000-0600

ah,  this dump isn't any good, at least I don't have time to read it, the lines are all merged and not even complete. You're better off providing a pcap file, or maybe someone else is more patient.

Here just the following hints:

Yes, all registrations get launched at virtually the same time when the sip channel starts, in sequence of course.
Are you using pedantic=yes?   (their could be a tag matching issue)
In the packets check the IP addresses, make sure the packets are going to the right ones.
If there are response only sometimes, but not always, look at the differences.
The error messages regarding getting OK on registers that aren't registers seems to indicate that the responses are arriving way too late and chan_sip has already destroyed the call since it didn't get a response.

By: mxhess (mxhess) 2005-01-02 21:08:21.000-0600

So a tcpdump file would be more helpful because? This is what ngrep sees.. the contents of each packet.. if the packets are incomplete then that is asterisk's fault and not mine. Also.. I do ngrep dumps like this all the time for debugging.. I have no problem finding things when doing such dumps..

Also this raises a question.. why does asterisk not send authentication details in the initial register request if it is provided on the sip register line? Seems like the registration sequence could be cut in half (sip message wise) if asterisk did so.. but then again, that's just my personal speculation.

I am not using pedantic checking as that is slower.. and these registrations we are dealing with are registrations done by asterisk to another master sip proxy.. in this case, ser.

And as far as taking too long to respond..
the first registration is at: 16:33:41.471601
with the final OK at: 16:33:50.591174

And that includes the registration retries..

9 seconds to do this many registrations.. I would think that isn't too bad at all.. but from what you seem to be saying.. sip registrations time out if they do not get responded to in less than 9 seconds? Seems such a short timeout is way too impatient.. and possibly that is part of the problem?

This dump was actually done from the master proxy.. the packets all appear to be getting there in proper order for each register.

As the systems are also on the same local network segement I also see no issues in delivery between systems.

So I will bring up again my idea of a rate limiter to better handle large batches of registrations.. that or increasing the allowable timeout for register requests would probably clear this up.. but again that is just my personal speculation from the discussion in this report so far.

By: Olle Johansson (oej) 2005-01-08 04:04:55.000-0600

"why does asterisk not send authentication details in the initial register request if it is provided on the sip register line? Seems like the registration sequence could be cut in half (sip message wise) if asterisk did so.. but then again, that's just my personal speculation."
SIP digest auth requires a challenge (nonce) from the server. We don't know the realm, we don't know the nonce the first time we send a register - in fact, we don't know that we have to authenticate at all at that time.

Please upload a full SIP debug output from your asterisk system so we can see everything from Asterisk's perspective. A TCPdump does not show anything of what is happening in asterisk. set the debug level to at least 4 and verbosity to at least 4, turn on SIP debug and do a "sip reload" to force initial registrations.

If you do not provide this, we can't help you and I will have to close this bug report.

By: mxhess (mxhess) 2005-01-08 12:23:49.000-0600

Attached is a sip debug of the ip that all the registrations go to..
the debug was done from the asterisk console with a verbosity of 15 and debug 15

By: Olle Johansson (oej) 2005-01-08 14:21:11.000-0600

mxhess: THank you. Please give me a moment to look at this.

By: Olle Johansson (oej) 2005-01-08 14:25:52.000-0600

Hmmm. Did you do a "sip debug" without arguments? There seems to be missing packets here...

By: Olle Johansson (oej) 2005-01-08 14:49:48.000-0600

There is definitely something fishy going on on both sides. I can see that the proxy sends multiple "200 OK" on the same call with only one "Register" sent from us...

Also, the "Got 200 OK on a register..." seems strange in this situation. Could you start the debug at a "sip reload" so I can see what happens at reload, because that's when we are parsing all those register= lines. Something seems to go wrong there...

Also, Either the console output is out of sync or you have a sip debug filter... There seems to be a lot of transactions going on without any packets shown.

By: mxhess (mxhess) 2005-01-08 22:02:58.000-0600

I'm not totally sure what you are asking..

To obtain the last dump I did the following:
sip debug ip (ip of registration server)
set verbose 15
set debug 15

I then cleared my scroll back buffer of my terminal and did a sip reload and captured the full console output..

By: Olle Johansson (oej) 2005-01-09 03:29:26.000-0600

...you did not follow my instructions then: "set the debug level to at least 4 and verbosity to at least 4, turn on SIP debug and do a "sip reload" to force initial registrations."

* start asterisk with "asterisk -r | tee /tmp/sipdebug.txt"
* set debug and verbosity
* turn on SIP debug WITHOUT a filter, just "SIP DEBUG"
* do a "sip reload"
* after a couple of failures, exit asterisk
Upload the file

By: mxhess (mxhess) 2005-01-09 11:40:10.000-0600

Your request was not indicative of needing things done in a very specific way.
check the latest file to see if it is more to your liking..

By: Olle Johansson (oej) 2005-01-09 12:30:17.000-0600

Sorry, I still can't see any debug, notices or warnings from the reload. Are those maybe disabled for the console in logger.conf ? Please check and try again.

By: mxhess (mxhess) 2005-01-09 13:00:36.000-0600

ok.. set the logger to console => debug,notice,warning,error and reloaded logger.. then did a sip reload saving console output to file.

By: Mark Spencer (markster) 2005-01-16 02:23:33.000-0600

Where does this stand now, and does this problem exhibit itself on linux or just on *bsd?

By: mxhess (mxhess) 2005-01-16 19:36:19.000-0600

All I can attest to is the bsd side where it happens for me.. I think the linux items mentioned above were a misunderstanding.. however this could be more of a channel problem then a portability issue.. *shrug*

By: Mark Spencer (markster) 2005-03-08 23:00:36.000-0600

Modified Asterisk to space out the registraions, this should fix it.  If you have more trouble, find a bug marshall or reopen.

By: Russell Bryant (russell) 2005-04-05 04:10:08

Alright folks, I have written a patch for 1.0 to address this issue.  However, chan_sip in CVS HEAD is drastically different from chan_sip in 1.0 so I'd like for someone to take a look at it before I put it in.

It is currently 5 AM and I have been staying up all night to work on the bug tracker, so that is probably another good reason why I shouldn't commit a change like this quite yet.  :)

By: Kevin P. Fleming (kpfleming) 2005-04-05 17:09:58

No, that's not quite right. What you are doing there is computing a random offset from now(), and sending all registers at that time in the future. What the CVS HEAD patch does (IIRC) is to randomly space out the individual registers, so they don't happen at the same time, and stay offset from each other as time goes on.

By: Russell Bryant (russell) 2005-04-05 20:34:33

Yeah, I just realized that same thing before even seeing your comment.  I was also calling __sip_do_register() in there, which kind of defeats the whole purpose of scheduling them.  :)

It looks like the offset isn't updated in the HEAD version as well.  I have updated my patch, and included a patch for HEAD as well.

By: Russell Bryant (russell) 2005-04-05 21:08:20

uploading new patches that are touch more simple and efficient

By: Russell Bryant (russell) 2005-04-06 12:04:36

disclaimer is on file!  :)

By: Russell Bryant (russell) 2005-04-06 12:16:32

Fixed in CVS HEAD and 1.0

By: Digium Subversion (svnbot) 2008-01-15 15:31:05.000-0600

Repository: asterisk
Revision: 5427

U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r5427 | russell | 2008-01-15 15:31:04 -0600 (Tue, 15 Jan 2008) | 2 lines

actually space out the registrations :)  (bug ASTERISK-3047)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=5427

By: Digium Subversion (svnbot) 2008-01-15 15:31:06.000-0600

Repository: asterisk
Revision: 5428

U   branches/v1-0/channels/chan_sip.c

------------------------------------------------------------------------
r5428 | russell | 2008-01-15 15:31:05 -0600 (Tue, 15 Jan 2008) | 2 lines

space out initial registrations (bug ASTERISK-3047)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=5428