Summary: ASTERISK-01673: [patch] Lag responses cannot go through jitter buffer
Reporter: stevekstevek (stevekstevek)
Labels:
Date Opened: 2004-05-21 15:58:55
Date Closed: 2008-01-15 14:56:52.000-0600
Versions:
Frequency of
Environment:
Attachments: ( 0) lagpatch2.txt
Description: Currently, in chan_iax2, for two peers A and B, A will generate a LAGRQ (using A's timestamps) and send the request to B.

B will run the frame through its jitter buffer, and then send a LAGRP with the same timestamp (A's original timestamp) back to A.

A will then try to send the frame back through its _own_ jitter buffer, before updating the lag value.  This is broken.  It might have been done so that the lag measurement would cover the round-trip transit time plus the average jitter buffer delay, but it doesn't do that correctly, because the timestamp which comes back is from A's reference, not B's.

In cases I've seen, the clocks on two hosts may drift [this is more of a case with some Win9x machines running libiax2], and then when the LAGRP comes back to A, it is way out of line with the timestamps that B normally sends.  If the packet seems very late, it goes right through the jitter buffer, and all is OK.  But if the packet seems very early, it may be delayed for a very long time in the jitter buffer, and the LAG measurement will be incredibly large.

I've seen cases where the clocks drift by 1 second/minute:  After a 30 minute call, the lag measurement might be 30000ms  + the actual lag.
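The arithmetic behind that figure can be sketched as below. This is only an illustration: drift_error_ms is a made-up helper, not anything in chan_iax2, and it assumes the clocks diverge at a constant rate.

```c
/* Hypothetical helper, not part of chan_iax2: the extra lag (in ms)
 * that builds up when two clocks diverge at a constant rate. */
long drift_error_ms(long drift_ms_per_minute, long call_minutes)
{
	return drift_ms_per_minute * call_minutes;
}
```

With a drift of 1000 ms per minute and a 30 minute call, drift_error_ms(1000, 30) gives 30000, matching the 30000ms above.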

I do need to hunt down the source of the clock drift, but the current behavior is definitely not doing what may have been intended.

This patch causes LAGRPs to be handled immediately.  The LAG is then measured as the RTT + the average delay imposed by the remote jitterbuffer.
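Roughly, the difference can be sketched as follows. This is a simplified model and not the code from lagpatch2.txt: the struct and both function names are invented for illustration, and all times are assumed to be milliseconds on A's clock.

```c
/* Hypothetical model of one LAGRQ/LAGRP exchange as seen by A. */
struct lag_sample {
	long lagrq_ts_ms;       /* timestamp A put in the LAGRQ */
	long arrival_ms;        /* A's clock when the LAGRP arrives */
	long local_jb_delay_ms; /* how long A's jitter buffer would hold it */
};

/* Old behavior: the LAGRP waits in A's jitter buffer first, so any
 * bogus hold time ends up in the reported lag. */
long lag_via_jitterbuffer_ms(const struct lag_sample *s)
{
	return s->arrival_ms + s->local_jb_delay_ms - s->lagrq_ts_ms;
}

/* Patched behavior: handle the LAGRP on arrival, so the result is
 * the RTT plus whatever delay the remote jitter buffer added. */
long lag_immediate_ms(const struct lag_sample *s)
{
	return s->arrival_ms - s->lagrq_ts_ms;
}
```

For a sample with lagrq_ts_ms = 1000, arrival_ms = 1080 and a skew-induced local_jb_delay_ms of 30000, the first function reports 30080 and the second 80.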
Comments:
By: Brian West (bkw918) 2004-05-21 17:46:00

Update the patch to use the coding guidelines.  We use tabs, not spaces, to indent code.


By: Mark Spencer (markster) 2004-05-22 11:37:29

I'm not sure I understand the "bug".  Of course the timestamp has to be "A"'s timestamp, because it has to measure the lag through the whole system, e.g.:

Measure time at instant 'q'.  Send message with timestamp 'q' to other side, go through its jitter buffer, receive, send back through our own.  Now measure the time with instant 'r'.  The "Lag" round trip is 'r' - 'q' since they were both measured off the same clock.

Am I missing something?

By: stevekstevek (stevekstevek) 2004-05-23 10:30:58

The problem is that when the frame comes back to us, we try to run it through _our_ jitterbuffer, with our timestamp.

Our jitterbuffer is set up for jitterbuffering frames that come to us with the remote timestamps; i.e. "min" and "max" are calculated using frames that come from the remote side.  So, when a frame comes to us with our timestamp, it may be way off from the timestamps we're set up to receive.

An example:

Assume almost no jitter and almost no network delay.

After having a call up for 5000 seconds, both timestamp values should be in the range of 5000s.  However, due to clock skew, our timestamps might be in the 5000s range while the remote timestamps are in the 4950s range.  We send a LAGRQ with a timestamp of 5000s, and shortly thereafter, the remote sends back a LAGRP with that same timestamp.

This LAGRP carries a timestamp about 50s ahead of what the remote side normally sends, so to our jitterbuffer it looks about 50s early.  What happens is that schedule_delivery will deliver the frame in 50s, and the LAG value shown in iax2 show channels will be ridiculously high.  This is clearly not representative of what our jitterbuffer is doing for the average frame sent by the remote side [it compensates for the clock skew because min and max will both be based on the remote timestamps].

If the clock skew is in the other direction, schedule_delivery will end up delivering the frame immediately.  This will of course not lead to a high LAG value in iax2 show channels, but it still means that the local jitterbuffer is not used in the calculation.
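Both directions of skew can be illustrated with a toy model. This is not the real schedule_delivery: delivery_delay_ms is a hypothetical function that just compares a frame's timestamp with the timestamp the jitterbuffer would expect from the remote side at that moment, in milliseconds.

```c
/* Toy model, not the real schedule_delivery(): a frame whose timestamp
 * is ahead of what the remote normally sends is held back by that
 * amount; a frame that is behind is delivered immediately. */
long delivery_delay_ms(long frame_ts_ms, long expected_remote_ts_ms)
{
	long early_by_ms = frame_ts_ms - expected_remote_ts_ms;

	return early_by_ms > 0 ? early_by_ms : 0;
}
```

With the numbers from the example, delivery_delay_ms(5000000, 4950000) is 50000 (the 50s hold), and with the skew reversed, delivery_delay_ms(4950000, 5000000) is 0.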

So, unless we change the protocol such that LAGRPs have both the original (local) timestamp [which we can use in the end for lag calculation] and a remote timestamp [used for jitter buffering], the easiest solution to make the LAG calculation sensible seems to be to just process them immediately, and have them reflect just the network transmission time + the remote jitterbuffer.
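For reference, the two-timestamp alternative might look something like the sketch below. It is purely hypothetical: no such frame layout is part of the patch, and the struct, field and function names are invented here.

```c
/* Hypothetical LAGRP carrying two timestamps (not an existing frame). */
struct lagrp_two_ts {
	long orig_ts_ms;   /* copied from the LAGRQ, on the requester's clock */
	long remote_ts_ms; /* stamped by the responder, on its own clock */
};

/* The jitter buffer would key on the remote timestamp, which matches
 * the other frames the remote side sends. */
long jb_timestamp_ms(const struct lagrp_two_ts *f)
{
	return f->remote_ts_ms;
}

/* The lag would be computed from the original timestamp, so both
 * instants come from the requester's clock. */
long two_ts_lag_ms(const struct lagrp_two_ts *f, long now_ms)
{
	return now_ms - f->orig_ts_ms;
}
```

A frame with orig_ts_ms = 5000000 and remote_ts_ms = 4950000 would be buffered against 4950000, and if it is finally delivered at local time 5000120 the lag comes out as 120.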

In this case, it might make sense to also change the display of jitter to display instead the average jitterbuffer delay for incoming packets.  Currently, it displays the calculated jitter, which is generally the same, except in the cases of large clock skew.

By: Mark Spencer (markster) 2004-05-30 18:07:23

Reasoning (and patch) accepted.  Added to latest CVS head.

By: Digium Subversion (svnbot) 2008-01-15 14:56:52.000-0600

Repository: asterisk
Revision: 3115

U   trunk/channels/chan_iax2.c

r3115 | markster | 2008-01-15 14:56:52 -0600 (Tue, 15 Jan 2008) | 2 lines

Fix lag in diverging clocks (bug ASTERISK-1673)