[Home]

Summary:ASTERISK-12069: Dead air between answer and packet2packet bridge
Reporter:Kamil Czajko (kactus)Labels:
Date Opened:2008-05-22 20:26:52Date Closed:2008-08-07 14:28:24
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) OK-SDP-delay.png
( 1) rtp_debug.txt
( 2) SipDebugCleaned.txt
( 3) sip_working_-_Cleaned.txt
( 4) svn1.6.0_debug
Description:Hi we have been testing asterisk 1.6 extensively as we intend to replace our long in the tooth 1.2 box that acts as our gateway between our offices and the switched telephony network.

Asterisk 1.6 talks to directly to a cisco call gateway via sip which talks to the out side world via PRI

One issue that we have noticed repeatedly is that there is a large delay between when a call is answered and when voice traffic actually flows. The delay is also asymmetrical and of the scope of about 2 seconds. This is very noticeable as calling someone generally misses the entire greeting.

Call flow essentially goes like this:
start call -> ringing -> answered (other party start talking “welcome to company this is Cameron”) -> their voice flows 2 seconds later and we hear “ameron”

If we talk they can't here anything either at the beginning.

I have been mainly testing this with a snom 190 (have also tried sp962) connected via sip to the 1.6 box (over nat).

We have also tested this by passing the voice out to one of the larger voice providers (who also use cisco equipment) and they have stated time and time again that it is not their end. Both Cisco gateways run unauthenticated accepting calls from particular ips automatically.

RTP debug information is attached (RTP stats attached to bottom of it.)

Please let me know if you need anything else. We have run this on two 1.6 boxes one running beta 8 the other running beta 9.


****** ADDITIONAL INFORMATION ******

Can not run svn due to issues raised in http://bugs.digium.com/view.php?id=12508
Comments:By: Kamil Czajko (kactus) 2008-06-24 04:50:41

Hello been playing around with this a little more to maybe make it less obscure an issue.

I've created some test cases to maybe highlight the problem and so maybe someone can suggest a further course of action/testing/debugging.

We have a couple of servers

A1.2 - old asterisk  1.2.9.1 current core of our network that we would ultimately like to replace. It has a digium card within that talks straight out one of our PRIs (PRI1). It has old hardware and has been running in place since 2004 or so.
ATB - trixbox 2.6.0.0 office pbx
A1.6 - New asterisk 1.6beta9 machine which is configured to talk sip out to our local cisco call gateway with a PRI wic (PRIcisco). This is by far the beefiest box.

Psip - Snom 190 test phone that talks sip
Ppstn- standard pstn phone as benchmark

We are calling a local isp with a three cylable name and their greeting kicks off as soon as their pbx picks up. Their greeting is "Welcome to ABC support. Please note that for quality ...."

The greeting kicks in so quick that Ppstn sounds like it drops the W ie “elcome to ABC ....” this is our bench mark against all others and I called it from each system several time to ascertain there was no variability.

Next we tested the setup as it is currently for the office.

Psip - SIP -> ATB – IAX2 -> A1.2 -> PRI1 = “elcome to ABC ...” ie this has no perceivable delay relative to the pstn line.

Next we swapped ATB with A1.6

Psip - SIP -> A1.6 – IAX2 -> A1.2 -> PRI1 = “(t)o ABC support. Please note....” ie it dropped the "welcome" and the “t” off “to”

Finally we removed the A1.2 box from the chain and the result was:

Psip - SIP -> A1.6 – SIP -> PRIcisco = “ort. Please note...” which was around 2 and a bit seconds of dead air before voice flowed.

It seems the more that A1.6 has to do the longer the delay.

What tests would you suggest I try to get around this or what debug information would be useful. I’m afraid I am unable to run a recent svn due to the bug lodged as 0012508.

I also thought it may be due to the dont_optimize compile flag so reran the tests with dont_optimize unticked with no difference.

All the best - Kactus

By: Jared Smith (jsmith) 2008-06-24 07:04:38

This issue will be *much* easier to track down if you can provide a SIP trace of the problem.

Please add "debug" to the "console" line in logger.conf, and then do the following from the Asterisk CLI:

core set verbose 9
core set debug 4
logger reload
sip set debug

Then go ahead and make your call, and copy/paste the information into a text file and attach it to this bug report.

By: Kamil Czajko (kactus) 2008-06-25 02:08:35

Hello JSmith

Please find the SipDebugCleaned.txt attached

This is the wosrt case of it talking to the cisco

Psip - SIP -> A1.6 – SIP -> PRIcisco

Let me know if you need any thing else, the 600+ lines of
[Jun 25 14:16:40] DEBUG[76060]: channel.c:1129 channel_find_locked: Avoiding initial deadlock for channel '0x8800000'
might have something to do with it if I was a betting man.

Cheers

By: Kamil Czajko (kactus) 2008-07-03 01:26:32

Hello again

Please note that I'm now in a position to set up several xen virtuals side by side so can do extensive testing of this issue if you require with multiple codes bases.

Please let me know what you would like me to provide you and what compiler flags you would like set.

Thank you.

By: Himal R Rana (ranahimal) 2008-07-03 03:29:41

Are u using realtime peers?
Here it showing lot of realtime peer lookups in the debug log.
U can enable   rtcachefriends=yes sip.conf so they will be cached in the peer list of chan_sip.

By: Kamil Czajko (kactus) 2008-07-03 04:33:48

Hello Ranahimal

Yes we are using realtime peers through odbc to a MSSQL database with freetds 0.64 as our driver.

Unfortunately some of the things we are doing do not allow us to run rtcachefriends=yes as we need to be able to programatically manage end users sip connections (such as password and context) on the fly without requiring a reload.

Also our sip service interconnect provider does not take registration, just accepts connections from an ip address so even if we wanted to we couldn't cache the connection. (they are also our ISP so this method is secure)

I didn't notice until you pointed it out that it appears to be querying the database several times after it begins to dial, but this does not explain why when it talks through the 1.2 box there is much less delay (as this connection is not cached either) unless SIP and IAX code differences are causing this to happen.

What would you like to to try for you, to help you diagnose the cause of this?

Thanks you for your time in investigating this issue.

By: Kamil Czajko (kactus) 2008-07-10 03:53:23

Hello Ranahimal

I installed 1.4.17 on one of the boxes and no longer had a delay. I am updating the sip debug.

Please let me know how you would like me to proceed

By: Kamil Czajko (kactus) 2008-07-21 01:14:01

Hello everyone

I am now able to run with SVN and can confirm that the delay is in Asterisk SVN-trunk-r132312 but not in 1.4.17 (but that has its own issue)

I have only tested with the xen box with the Psip - SIP -> ASVN – IAX2 -> A1.2 -> PRI1 configuration which produces “to ABC support. Please note....” ie it dropped the "welcome"

Any progress or advice would be greatly appreciated.

By: Himal R Rana (ranahimal) 2008-07-21 06:49:23

Hello!
Can i have look at ur dial plan here?
I m also seeing lot of channel_deadlock which occurs in asterisk 1.6 due to new extensions.conf look up engine i think!

By: Kamil Czajko (kactus) 2008-07-21 09:51:08

Hello Ranahimal

For this one box I have kept things as simple as I can so my dial plan is:

[from-sip55]
exten => h,1,Hangup()
exten => _.,1,Dial(IAX2/Account:Pass@GW/${EXTEN})

I have now set up a couple of different tests and have had a longer play with the systems and am perplexed as for the first time I am experiencing variable performance including occasionally better than PSTN.

ATB = Trixbox 2.6
SVN1 = SVN-trunk-r132312 on xen sitting on same network as ATB
A1.2 = current core at datacentre with quadspan e1 card
SVN2 = SVN-trunk-r132312 on freebsd at datacentre talking to sip provider (67ms)

snom -> ATB -> A1.2 is pstn speed ("elcome")
snom -> ATB -> SVN2 looses the first second or so (ie the starting "welcome")
snom -> SVN1 -> A1.2 is variable from pstn speed to dropping a second of voice (this is one or the other not a range ie its occasionally quick but usually drops the full "welcome").
snom -> SVN2 is very quick, faster than pstn the W is quite clearly made out in "Welcome" but first second is slightly garbled
snom -> SVN1 ->SVN2 is faster than pstn but first second is much more garbled. (this might be due to the sip provider though)

It seems svn playing with svn is very quick, but mixing older boxes and newer boxes causes more delays. I'm not sure if this is due to different sip handling code but I can confirm that there is significant improvement over the previous 1.6 beta9 which experienced at worst over 2 seconds of dead air consitently.

I know the logical thing to test would be the A1.2 to the sip provider to confirm that its responsible for the garbling but unfortunately I'd need to change the ip of the A1.2 for it to authenticate and the box is too busy to take off line even now at night.

I'll contact the provider to add a couple of IP's to the range and update once they have allowed it.

Regardless it is much better than beta 9.

By: Kamil Czajko (kactus) 2008-07-21 10:01:43

Also please note I was unable to test svn to local cisco with pri card as that span is currently in maintenance mode. When that is resolved I will include those tests.

Thank you - Kactus

By: Kamil Czajko (kactus) 2008-07-27 22:29:55

Hello again

I have tested with both Asterisk SVN-branch-1.6.0-r133300 and SVN-branch-1.6.0-r134006 both result in the same (new) behaviour in the

Snom -SIP> SVN -IAX2> A1.2 test, namely voice will begin to flow stop and then start again. ("Welcome <long silence> ort Please note")this misses "to ABC Supp"

I will upload the debug in a second, it has RTP,RTCP,SIP and IAX debug turned on.

Thank you

By: pj (pj) 2008-07-28 08:59:42

Asterisk SVN-trunk-r133579
I have similar issue, sip2sip call, same codecs, both peers behind NAT, canreinvite=no, delay about two sec, until both party hears each other.
Maybe this problem can be related to bug 0013115, almost every call generates locks erros after packet2packet bridging:

[Jul 28 15:49:34]     -- SIP/286-0ab9d2d8 answered SIP/ipbx-gw-b5b44e98
[Jul 28 15:49:35]     -- Packet2Packet bridging SIP/ipbx-gw-b5b44e98 and SIP/286-0ab9d2d8
[Jul 28 15:49:35] ERROR[19413]: chan_sip.c:19128 handle_request_do: We could NOT get the channel lock for SIP/ipbx-gw-b5b44e98!
[Jul 28 15:49:35] ERROR[19413]: chan_sip.c:19129 handle_request_do: SIP transaction failed: 2d578b5958a6934d0345434b02483e1a@192.168.40.4



By: pj (pj) 2008-07-28 09:12:48

I should also notice my voip topology:
sip phone - asteriskA(trunk-r133579) - asteriskB(trunk-r114064)-phone
Locks errors appears on asteriskA console
I will try to update asteriskB to fresh svn checkout and will report, if things will be better.



By: pj (pj) 2008-08-06 16:14:19

I found potential source of delay until both party hears each other after call is connected. I found, that after one party answers call, it sends SIP/OK with SDP to asterisk, but asterisk forwards this OK/SDP to other party with about 500ms delay! I think, there is no reason for this. My asterisk is completelly idle, only one call precessed, when this debug was taken.
For ilustration attaching picture with graph of sip message exchange between two parties.



By: pj (pj) 2008-08-07 02:34:15

I think, that category for this issue should be changed from RTP to SIP, because problem is caused probably in signaling control (sip), not transport media (rtp).
kactus, if you can, please upload packet dump from asterisk (packet capture when communicating affected parties), that should give better overview when opened eg. with wireshark, sip debugs from asterisk are not providing an easy survey, not contain timestamps etc.

By: Leif Madsen (lmadsen) 2008-08-07 08:17:44

Changing category to Channels/chan_sip/General per pj's comments.

By: Steve Murphy (murf) 2008-08-07 11:50:05

Hopefully, the changes we made to trunk/1.6.0 to solve 12772 (and others) clears up this problem for you. Please retest with the latest svn and report back.


By: pj (pj) 2008-08-07 11:55:41

I'm using Asterisk SVN-trunk-r135717 (updated about two days ago)
sip graph, that shows this SIP/OK lag is also from this revision

By: Steve Murphy (murf) 2008-08-07 12:27:02

I added relationship to 12772, the chan_sip slowdown in trunk bug, but pj says
that this bug does not go away when the newest trunk version is tested. Sigh.


By: Mark Michelson (mmichelson) 2008-08-07 13:05:57

Repost of what I said in response to PJ on the asterisk-users list:

"If you check ast_answer in channel.c of trunk, you can see that it calls __ast_answer(chan, 500). The 500 there is a 500 ms delay that occurs before calling the channel's answer callback. In the case of SIP, this would indeed mean that there is a 500 ms delay between receiving the 200 OK from the callee and sending a 200 OK to the caller."

The initial commit that added this delay was 50538, and it was modified slightly with commit 50571.

Then issue ASTERISK-12255 came up and the delay behavior was changed again with commits 127113 and 127157.

The reason for adding the delay in the first place was to allow time for the media to be set up after the call had been answered. To be honest, with where the delay is now, it really doesn't make a lot of sense for it to be there since it's not actually allowing for media to be set up (since Asterisk hasn't even sent a 200 OK to the caller yet).

By: pj (pj) 2008-08-07 13:42:06

Mark, thanks for explanation, if I understand your last sentence correctly, you will try to find some better solution to solve issues about your wrote, than adding this ugly 500ms delay?
Keep in mind, that adding any delays to setup call process have negative impact, especially, when users are behind NAT device. If you look at sip graph picture, that I posted here, overall end-end setup delay from one party answers and starts sending rtp, to time when both rtp streams are established is about 1,2s! It's time of initial greetings in conversation, if it's lost it's very confusing and annoying for voip users.

By: Mark Michelson (mmichelson) 2008-08-07 14:12:37

Yes, I have written a patch which addresses this issue. It will be committed very shortly.

By: Digium Subversion (svnbot) 2008-08-07 14:28:21

Repository: asterisk
Revision: 136631

U   trunk/main/channel.c

------------------------------------------------------------------------
r136631 | mmichelson | 2008-08-07 14:28:20 -0500 (Thu, 07 Aug 2008) | 13 lines

Scrap the 500 ms delay when Asterisk auto-answers a channel.
Instead, poll the channel until receiving a voice frame. The
cap on this poll is 500 ms.

The optional delay is still allowable in the Answer() application,
but the delay has been moved back to its original position, after
the call to the channel's answer callback. The poll for the voice
frame will not happen if a delay is specified when calling Answer().

(closes issue ASTERISK-12069)
Reported by: kactus


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=136631