[Home]

Summary:ASTERISK-09063: High latency (satellite) hangup on internal calls
Reporter:Tim McKee (trm3)Labels:
Date Opened:2007-03-21 08:07:35Date Closed:2011-06-07 14:03:26
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) phone-with-problem.pcap
( 1) poly-ok
( 2) qual-yes.rtf
( 3) sat-poly
( 4) sip.conf
( 5) trouble.log
( 6) verbosedebug.txt
Description:Calls from polycom 601 phones with high latency (700ms) to voicemail or echo time out after 15 seconds.  There are a series of INVITE retries followed by timer expiration.

****** ADDITIONAL INFORMATION ******

Identical setup was fully functional and worked on 8/1/04 HEAD.  4 port T1 card failure required upgrade to use a new card, so upgrade was precipitous.  Normal calls appear to work properly.  Calls from a phone that is on a high latency link to an internal PBX phone number such as voicemail or echo exhibit this failure.  Low latency phones continue to work just fine.  

I upped the DEFAULT_T1MIN to 1200 without effect.  Using ANSWER prior to VOICEMAILMAIN has no impact.

SIP Debug attached.
Comments:By: Tim McKee (trm3) 2007-03-21 09:01:54

just uploaded sip.conf.

the phone is a polycom 601, it is actually my desk phone moved to a satellite link.

As a workaround, I have the customers dialing LD to their autoattendants and entering the extension for their voicemail...   This appears to work just fine, it is only a connection directly to a local extension that terminates to an app on the asterisk box that exhibits the problem.

By: Serge Vecher (serge-v) 2007-03-21 09:17:28

did you try 1.4.1?

By: Serge Vecher (serge-v) 2007-03-21 09:19:08

Also, please redo the log as per following:
1) Prepare test environment (reduce the amount of unrelated traffic on the server);
2) Make sure your logger.conf has the following line:
  console => notice,warning,error,debug
3) restart Asterisk with the following command:
  'asterisk -Tvvvvvdddddngc | tee /tmp/verbosedebug.txt'
4) Enable SIP transaction logging with the following CLI commands (1.4/trunk commands in parenthesis):
set debug 4 (core set debug 4)
set verbose 4 (core set verbose 4)
sip debug (sip set debug)
5) Reproduce the problem
6) Trim startup information and attach verbosedebug.txt to the issue.

By: Tim McKee (trm3) 2007-03-21 09:25:30

unfortunately this in on a production box with hundreds of phones all over the world.  I am out of the office today, but will attempt to duplicate it on a test box either this evening or tomorrow morning to capture the info.

By: Tim McKee (trm3) 2007-03-21 16:04:50

Per query from Michael Owens at NetxUSA:

First let me see if I understand the problem correctly.  The way I am understanding the bug report is that after upgrading to Asterisk 1.2.16 you are having a problem checking voicemail locally from phones on a satellite link but if you dial into your auto attendant from the sip phone through your PSTN phone number you can check voicemail without a problem.  

Do you get the same problem if you access an auto attendant through an local extension and then going to the voicemail app?

================================

right, if I hairpin out via the PSTN I can dial the extension code that accesses the VM.  If I dial anything that is LOCAL to the pbx, including the autoattendant, it will time out after 15 seconds.  This includes dialing the autoattendant, then using it to call tghe VM extension.  Everything *works*, it just times out.

By: Tim McKee (trm3) 2007-03-21 16:25:04

ticket on hold.  was just able to test with non-polycom sip device.  initial indications are that the problem may be polycom specific.  more testing tonight.

By: Tim McKee (trm3) 2007-03-26 14:02:59

uploaded sat-poly, a tcpdump capture of a polycom experiencing the error across a satellite link

uploaded poly-ok, a tcpdump capture of the same polycom working fine on my lan

working on the remaining captures now

By: Tim McKee (trm3) 2007-04-06 09:31:34

Added capture from phone side of satellite link when problem occurs.

By: Tim McKee (trm3) 2007-04-07 11:53:55

Verbose SIP debug captured from a 1.4 platform.  Dialed exten 1990 (a simple extension that calls 'ECHO' application.

Please reactivate this bug as I've had no feedback from Polycom yet.  I will update as I see anything else of interest.

By: Tim McKee (trm3) 2007-04-10 21:52:40

Per Olle I set qualify=yes and posted a capture of sip show peer and a failed call at the standard verbosity level as qual-yes.rtf.

By: Tim McKee (trm3) 2007-04-18 21:16:30

Patched version 1.4 and 1.2 to restore old behavior.  It appears that the problem lies with the code that now *requires* an ACK for every OK.  I don't have the time or the resources to debug further at this time, may be able to get back to it later and make a determination.  It appears from the capture that the phone does send an ACK to the first INVITE with CSEQ 2, but * has already sent another INVITE prior to receiving this ACK.  Once the phone has send an ACK it will apparently not re-ACK the same INVITE CSEQ number.  This race condition would not be detected except on high latency links.

Changed chan_sip.c, function sip_answer(), line:

     res = transmit_response_with_sdp(p, "200 OK", &p->initreq, 2);

to:

     res = transmit_response_with_sdp(p, "200 OK", &p->initreq, 1);

Perhaps a switch in sip.conf named 'permissive-acks' or something like that?

I don't run a development environment at this time, so I can't publish a decent patch set.

tim



By: Olle Johansson (oej) 2007-05-09 08:56:01

Then it all comes back to the ability to handle multiple transactions. The issue here is that we should not send another invite until we have an ACK on the first one, and that's an important issue.

By: Olle Johansson (oej) 2007-12-16 03:18:52.000-0600

Sorry. When upgrading mantis, this bug report lost the chan_sip category and ended up outside of my search pattern...

Please test with the latest 1.4 from svn, we have added some changes that may help you.

By: Joshua C. Colp (jcolp) 2008-02-14 14:03:49.000-0600

I agree with oej, the latest 1.4 should fix this so I've set this to feedback. Please do respond back with a yes or no.

By: Joshua C. Colp (jcolp) 2008-03-04 14:16:22.000-0600

Closing out due to lack of response, and I strongly suspect oej's changes fixed it.