|Summary:||ASTERISK-03164: SIP registrations fail|
|Date Opened:||2005-01-02 11:55:15.000-0600||Date Closed:||2011-06-07 14:05:31|
|Environment:||Attachments:||( 0) diff_v1.604_v1.605.txt|
|Description:||This may be related to bug ASTERISK-3073104. I have tested as low as 5 and up to 200 with the same results. I noticed that SIP phones seem to disappear, then reappear when the device is able to register again. I have tested this on two other boxes, all running the current Slack 10 distro. Below are the errors after a few minutes with 7 registrations. Tested with 2 Poly 500s, 1 Poly 300, 3 Grandstreams, and 1 Firefly softphone. The only one that seems to stay subscribed is the softphone. I am not sure the errors and the subscriptions are related, but I noticed the errors at the same time I noticed subscriptions getting lost. Basically it creates a problem in that SIP phones can register and all calls start going to voicemail until the problem works itself out.|
Jan 1 23:33:25 WARNING: chan_sip.c:703 retrans_pkt: Maximum retries exceeded on call email@example.com for seqno 102 (Non-cr
Jan 1 23:33:45 WARNING: chan_sip.c:703 retrans_pkt: Maximum retries exceeded on call firstname.lastname@example.org for seqno 102 (Non-cr
Jan 1 23:36:41 WARNING: chan_sip.c:703 retrans_pkt: Maximum retries exceeded on call email@example.com for seqno 102 (Non-cr
Jan 1 23:39:48 WARNING: chan_sip.c:703 retrans_pkt: Maximum retries exceeded on call firstname.lastname@example.org for seqno 102 (Non-cr
With SIP debug things look normal when the phone is working. After 10 minutes it will then be unable to INVITE and will retry a number of times then CANCEL a number of times.
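To see which call identifiers hit the retry limit most often, warnings like the ones above can be tallied with a short script. This is only a sketch: it assumes each warning sits on a single line (wrapped lines must be rejoined first), and the function name count_retrans_failures is made up for illustration. The text after "seqno N" is truncated in the report, so the pattern does not try to match it.

```python
import re
from collections import Counter

# Matches the warning format quoted in the report. Everything after
# "for seqno N" is ignored because the report truncates it.
RETRANS_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+WARNING: chan_sip\.c:\d+ "
    r"retrans_pkt: Maximum retries exceeded on call (?P<call>\S+) "
    r"for seqno (?P<seqno>\d+)"
)


def count_retrans_failures(lines):
    """Return a Counter mapping call identifier -> number of warnings."""
    counts = Counter()
    for line in lines:
        match = RETRANS_RE.match(line.strip())
        if match:
            counts[match.group("call")] += 1
    return counts


# Sample input copied from the report (one warning per line).
sample = [
    "Jan 1 23:33:25 WARNING: chan_sip.c:703 retrans_pkt: Maximum retries exceeded on call email@example.com for seqno 102 (Non-cr",
    "Jan 1 23:33:45 WARNING: chan_sip.c:703 retrans_pkt: Maximum retries exceeded on call firstname.lastname@example.org for seqno 102 (Non-cr",
    "Jan 1 23:36:41 WARNING: chan_sip.c:703 retrans_pkt: Maximum retries exceeded on call email@example.com for seqno 102 (Non-cr",
    "Jan 1 23:39:48 WARNING: chan_sip.c:703 retrans_pkt: Maximum retries exceeded on call firstname.lastname@example.org for seqno 102 (Non-cr",
]

for call, count in count_retrans_failures(sample).items():
    print(call, count)
```

Lines that do not match the pattern are skipped silently, so the script can be fed a whole log file.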
|Comments:||By: khb (khb) 2005-01-02 13:11:21.000-0600|
I give the same initial advice as in the other report: run a debug trace on the SIP channel and on the wire to see whether the packets actually go out when they should, and whether there is a server response when they do go out but fail.
Is the registration period for the phones set to 10 minutes? What is the expiration time for the SUBSCRIBEs set to?
You say the softphone stays "subscribed". What is the softphone subscribing to? I wasn't aware that Firefly can subscribe to anything.
If you mean "registered" you need to be precise in what you describe.
By: syslod (syslod) 2005-01-02 13:36:10.000-0600
The wire is seeing packets. Registration expiry is listed as 3600 in sip show peer. The phones that don't subscribe seem to work without error (softphone Firefly). In a nutshell, the Polys and Grandstreams lose their ability to be called within about 10 minutes of resetting everything. The Firefly softphone always works. I'll gather more info shortly.
By: Brian West (bkw918) 2005-01-02 14:02:32.000-0600
Try with qualify=yes and qualify=no.
I suspect it's an issue with sip_poke_peer.
By: Olle Johansson (oej) 2005-01-02 14:32:29.000-0600
Syslod: how current is your cvs?
By: Olle Johansson (oej) 2005-01-02 14:34:26.000-0600
Please explain "Noticed that SIP phones seem to disappear then reappear when device is able to register again. " - how did you notice this?
By: khb (khb) 2005-01-02 14:39:13.000-0600
If you are actually seeing packets on the wire without response (do you have a response?), then you have to look at the packet to make sure the addresses are correct and at network issues downline. Are the phones behind a NAT? If so do you know what the binding time is for your NAT? Set the phones' expiration times to 1 Minute to make sure the NAT stays open.
Do the 3600s expires come from the phones or from sip.conf? Anything configured for 10 Minutes?
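The NAT question above comes down to comparing two timers. A rough sketch of that check follows; it makes the simplifying assumption that the phone sends no other traffic between registrations and refreshes only when the expiry is reached. The function name and the 120-second binding time are hypothetical; only the 3600-second expiry comes from this report.

```python
def registration_outlives_nat_binding(expires_s, nat_binding_s):
    """Return True when the NAT mapping would time out before the phone
    re-registers, i.e. the server could no longer reach the phone.

    Assumes (for illustration only) that the phone sends nothing between
    registrations and refreshes exactly at `expires_s`.
    """
    return expires_s > nat_binding_s


# 3600 s is the expiry mentioned in this report; 120 s is a made-up
# NAT binding time used only to demonstrate the comparison.
if registration_outlives_nat_binding(3600, 120):
    print("NAT binding may close before the next REGISTER")

# A 60 s expiry (the 1 minute suggested above) stays inside the binding.
print(registration_outlives_nat_binding(60, 120))
```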
By: syslod (syslod) 2005-01-02 14:42:12.000-0600
qualify=yes doesn't resolve it. I can definitely see errors (as originally posted) once deadlocked, as I attempt to make calls to SIP devices.
By: syslod (syslod) 2005-01-02 14:46:35.000-0600
CVS is HEAD as of today.
By: syslod (syslod) 2005-01-02 14:48:11.000-0600
I noticed SIP phones disappearing during testing cycles: calls would go to voicemail with no rings. After a couple of hours it would just start working again (i.e. the phones would reappear).
By: syslod (syslod) 2005-01-02 14:51:03.000-0600
I believe the expiry comes from the phone; sip.conf matches the 3600, but the value doesn't increase when I change it. I've tried on both routed and NAT setups with the same result. I'll be back in the lab shortly to do further testing.
By: Olle Johansson (oej) 2005-01-02 15:05:37.000-0600
syslod: You need to be more specific. When they "disappear" - what is the status in "sip show peers" or "sip show peer"?
We also need SIP debug output for SIP bugs.
The errors above just tell us that some SIP packets we sent didn't get any answer, so we cancelled the transaction. To help you, we need more detailed information.
By: Fernando Romo (el_pop) 2005-01-02 16:02:58.000-0600
Yep. I am testing with a Polycom IP 300 and have the same problem. I think using the same IP for two or more extensions is making the registration of SIP extensions erratic.
I made a call to voicemail with the Polycom, and with the channel open and sending DTMF to app_voicemail, the server suddenly sent the phone a busy header, and the phone produced a busy signal mixed with the sound of the voicemail app.
Version 1.609 of chan_sip.c produces this error; I am starting to test with the prior version.
By: Olle Johansson (oej) 2005-01-02 16:06:23.000-0600
I will repeat this until I am going insane: Please provide a SIP debug of a failed transaction (or several) so we can see what is happening in your network!
By: Brian West (bkw918) 2005-01-02 17:16:46.000-0600
oej, I don't think you can get a SIP debug out of this. I think the whole thing gets blown up, as with the fix in bug 3217: the whole do_monitor thread was getting clobbered, and once that took place all hell broke loose. To truly get the info needed once the box is in this state, do this:
gdb /usr/sbin/asterisk `cat /var/run/asterisk.pid`
gdb asterisk PIDOFASTERISK
Once connected you'll do this:
thread apply all bt
Attach that output.
PS: only do this when it's in this mucked-up state.
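The same backtrace can probably be captured without an interactive session, which makes it easier to attach to the report. The sketch below wraps gdb's batch mode (the -batch and -ex options) around the "thread apply all bt" command given above. The helper names and the output file name are made up for illustration, and the binary and PID file paths are simply the ones quoted in the comment; adjust them for the box in question.

```python
import subprocess
from pathlib import Path


def build_gdb_backtrace_cmd(binary="/usr/sbin/asterisk",
                            pid_file="/var/run/asterisk.pid",
                            pid=None):
    """Build the gdb argument list for a one-shot, all-thread backtrace.

    If `pid` is not given, it is read from `pid_file`.
    """
    if pid is None:
        pid = int(Path(pid_file).read_text().strip())
    # -batch exits after running the -ex command, detaching from the process.
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, str(pid)]


def capture_backtrace(out_path="asterisk-bt.txt", **kwargs):
    """Run gdb and save its combined output to `out_path` (hypothetical name)."""
    cmd = build_gdb_backtrace_cmd(**kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    Path(out_path).write_text(result.stdout + result.stderr)
    return out_path


if __name__ == "__main__":
    # Only meaningful while the box is in the failed state described above.
    print(capture_backtrace())
```

Attaching normally requires the same privileges as the Asterisk process, and the PS above still applies: the trace is only useful while the problem is occurring.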
By: Mark Spencer (markster) 2005-01-02 17:52:01.000-0600
Uh, if do_monitor isn't running, you won't get retransmissions anymore (duh!)
Please do what everyone has asked you to do and submit the sip debug if you want us to spend our time on this. Also updating to latest CVS head will clear out that reload issue anyway.
By: Brian West (bkw918) 2005-01-02 18:22:27.000-0600
I think Moc is having a similar issue but he has yet to get me the info... Three people are having this problem on IRC but not one can get me either access to the box nor can they get me the bt to look at... :(
By: Fernando Romo (el_pop) 2005-01-02 18:55:37.000-0600
OK, I went back version by version to test the SIP channel. I started from version 1.609 and went back until it worked, which was at version 1.604.
With version 1.604 of chan_sip.c, I recompiled * and it works fine; from version 1.605 on, SIP registration fails.
I ran "cvs diff -r 1.604 -r 1.605 chan_sip.c" (I attached the file "diff_v1.604_v1.605.txt").
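Walking back one revision at a time works for a handful of revisions. For a longer range, a binary search over the revision list needs fewer rebuilds. The sketch below is not what was done here, just an illustration: it assumes the oldest revision is good, the newest is bad, and that every revision after the first bad one is also bad. The is_bad callback stands in for "check out, recompile and test", which is hypothetical and simulated in the example.

```python
def first_bad_revision(revisions, is_bad):
    """Return the first bad revision in `revisions` (ordered oldest to newest).

    Assumes revisions[0] is good, revisions[-1] is bad, and that badness
    is monotonic (once a revision is bad, all later ones are too).
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# chan_sip.c revisions covering the range mentioned in this report.
revs = ["1.604", "1.605", "1.606", "1.607", "1.608", "1.609"]


# Stand-in for a real build-and-test step: simulates the reported result
# that 1.604 works and 1.605 onward fails.
def simulated_is_bad(rev):
    return int(rev.split(".")[1]) >= 605


print(first_bad_revision(revs, simulated_is_bad))
```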
By: Brian West (bkw918) 2005-01-02 20:50:33.000-0600
I knew it was somewhere in the subscribe code because Moc has phones that do that but I don't so I couldn't reproduce the problem.
By: Kevin P. Fleming (kpfleming) 2005-01-02 23:02:10.000-0600
So it appears that this problem only occurs with SIP phones that send SUBSCRIBE for MWI? If so, we need to change the problem summary, as it is not 'registrations' that are failing.
By: Kevin P. Fleming (kpfleming) 2005-01-02 23:42:17.000-0600
Please try applying the patch from bug ASTERISK-3193221 and see if you can still reproduce the problem.
edited on: 01-03-05 19:06
By: syslod (syslod) 2005-01-03 17:40:41.000-0600
The patch did make the phones seem to stay up longer and come back quicker, but the problem still exists. We were thinking it might be a NAT issue, so we moved everything to a private subnet. With the phones and * on the same network, the problem still exists.
By: Kevin P. Fleming (kpfleming) 2005-01-03 18:01:39.000-0600
OK, JerJer noticed some of the same problems, and also applied the patch from 3234 to see if that would help (which was just posted this morning). If you can, please try that one as well.
By: syslod (syslod) 2005-01-03 18:37:27.000-0600
I've been playing for over 20 minutes with no failures, which is way past the point it used to fail. Looks like it may be fixed. I'll make another post after I set up a stress test.
By: syslod (syslod) 2005-01-03 18:58:31.000-0600
Testing on the large platform was successful and all seems well now. I applied numerous patches listed in this report, plus other tweaks that may or may not have been needed. For those also having the problem, I'll be glad to walk through what I have. Thanks.
By: Mark Spencer (markster) 2005-01-04 13:20:36.000-0600
Was it a deadlock or not? If it's a deadlock, we need to find the source of the deadlock -- it's not sufficient to simply add some other patches.
By: syslod (syslod) 2005-01-04 15:04:13.000-0600
I've tested all day without any problems, after having problems for over a week. I'm really not sure what the "deadlock" was, or if it was even my problem, but after applying the patches recommended in this report and getting CVS HEAD as of last night, my SIP problems are gone. Some folks are rolling back to determine the cause, but I don't seem to have a thread-aware GDB on the affected equipment. If I can help in any way to figure out which patch fixed it, let me know, but at this point it appears to be fixed.
By: Mark Spencer (markster) 2005-01-05 16:21:38.000-0600
Does the problem occur in unpatched CVS?
By: syslod (syslod) 2005-01-06 20:18:20.000-0600
Seems to work without this patch running CVS head.