[Home]

Summary:ASTERISK-10992: RTP timestamp skewed after return from hold.
Reporter:K Anderson (kanderson)Labels:
Date Opened:2007-12-07 12:08:43.000-0600Date Closed:2011-04-26 08:06:27
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Core/RTP
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) Bandwidth_1way.tgz
( 1) capture_with_patch_1.pcap
( 2) cli.vg01.gocentrix.net
( 3) cli.vg01.gocentrix.net.txt
( 4) issue-11491.diff
( 5) joint_filtered_capture.tgz
Description:This is the response from Level 3 on a one way audio issue I have.  For your reference I am including one of the captures.  'Bandwidth_1way' was preformed by Level 3 and 'filtered_joint_capture' was done on our box.  Both are for the same call but started at different times.  I have many other examples spanning several weeks and will upload on request.  If you use wireshark to look at the RTP jitter you will see the timestamp gaps cause large jitter numbers (sometimes as high as 30382910.59ms on other captures).

I believe the timestamps may be slipping when a call is placed on hold.  This behavior has been noted in several of our calls that worked as well as the ones that fail.  Furthermore, looking at the example skew we can see the timestamps increment by 41,120 while the wireshark capture time only increments by .123238.

As this is what I originally expected I have already tested it without NTP running and ensured the time is synchronized with Level 3, but I was still able to recreate the problem.

It may be worth mentioning this is on a Dell, which I have been told has a crummy RTC.  I dont know if that is true but it appears to have manifested itself if so.  

I have also noted two other recent tickets that deal with RTP timestamps, 0011429 and 0010355.  I am mentioning these because perhaps returning from hold uses the wrong clock or is related to the issue with incorrect timestamps after 200 ok (which I have also noticed).

Bellow is the email from Level 3:

This definitely appears to be an rtp issue based on the capture that
Marty performed. I have attached the capture that was taken in front of
our media gateway.
Based on the rtp stream, here is what I have found. If you look at the
packets from the customer up until packet number 7357, they are
correctly incrementing each packet sequence by 1 and each timestamp by
160.
Packet no. 7357 shows sequence no. 22225 and timestamp 2678222704.
Packet no. 7365 shows sequence no. 22226 and timestamp 2678263824.

This is the next packet in the sequence so the timestamp would increment
by 160 (at the most the timestamp would increment by 640(, however, the
timestamp that we receive for the next packet in sequence has
incremented by 41,120. You can see that in ethereal that this is
interpreted as a huge amount of jittter. Because our gateway is
receiving the next sequence number it also expects the timestamp to have
incremented by 160, the huge time delta causes the issue and the gateway
stops playing the audio. Certainly not an ideal behavior on our part,
however, the sequence number and timestamp should be tied together (rfc
3550 is the spec but the actual relationship between sequence and time
is relatively vague so this is how Lucent implemented).

Unfortunately there isn't much we can do on the gateway since it's a dsp
to jitter buffer relationship and would require a software patch. Is
there a way that the customer can send us re-invite for on-hold portion?
Of they could send a new SSRCID once the call comes off of hold which
the gateway would interpret as a new stream and should play out the
audio with no issues.....just a few tought. Let me know if you have any
questions.

****** ADDITIONAL INFORMATION ******


Just for the record my symptoms:
When calls are placed on hold and retrieved occasionally the PSTN side can not hear the IP or there is a 5 second delay. Half of the time it is fine.  This only happens when retrieving calls from hold.

Bellow are the statements that we (Centrix, Bandwidth.com, and Level 3) have made.

1) Audio packets are transmitted and received from Centrix equipment.
2) The issue can only be recreated if Bandwidth.com is utilized in the call path (ie. other carriers utilized by Centrix do not have the same issue).
3) The issue has effected all clients of Centrix, including those with no changes or interventions to their setup.
4) With or without the firewall active the problem still occurs.
5) The internal clock for the phone is within sync with the servers.
6) Disabling the network time protocol daemon (to eliminate any possible time shifts during a call) does not effect the issue.
7) A different pbx on a different subnet has the same issue.
8) A different phone on different Internet connection (ie DSL) has the same issue.
9) Limiting the codec to g711 or g729 has no effect.
10) The calling number or called number has no effect, issue occurs regardless.
11) Power cycling the phone or pbx has no effect.
12) There is no apparent break in the RTP stream.
13) Unassociated network traffic does not appear to be a factor.
14) There does not appear to be an correlation between CPU, memory,read/write cycles, or bandwidth utilization on the pbx.
15) Re-registration cycle during issue has no effect.
16) The no audio condition only occurs after a call is retrieved from hold.
17) Some calls suffer from a five second delay which adheres to all the above findings (only with Bandwidth.com, occurs on different pbx or phone, number called/calling has no effect, ect).
18) Assigning the phone a public IP thereby eliminating NAT does not correct the issue.
19) A phone on the same subnet as the pbx does not correct the issue.
20) The issue appeared 'suddenly' after months of service with no correlations to changes made by Centrix Networks.
21) The percentage of calls with issues fluctuates and has no known correlation to server functions or load.
22) Replacing or resetting modem cards where they hand off to the SS7 network has no effect (level 3).
23) NOC reset the modem card that the one way audio call hit with no effect
Comments:By: K Anderson (kanderson) 2007-12-07 12:21:58.000-0600

I tried to upload the captures but I am getting

APPLICATION ERROR # 401

Followed by the contents of the file in ascii.  I will tar them and try again even though they are both bellow the 2000k limit.



By: K Anderson (kanderson) 2007-12-07 12:41:06.000-0600

Managed to upload, probably had problems becuase they didnt have file extensions.  But anyway, I thought I should note that I have attempted to work on this issue as follows before opening this ticket:

1) IRC
2) Google
3) Bandwidth.com and Level 3
4) Asterisk guru I know at thevoipconnection.com
5) Email to an individual at digium
6) A previous bug report [0011400] by request of Level 3 that turned out to be erroneous and I regret submitting (I have tried to ensure this is a valid issue before I submitted this case).  Thanks file for helping me on that one.

Thank you everyone who has helped me so far and those who do.  I really appreciate it!

By: Martin Vit (festr) 2007-12-10 11:34:31.000-0600

We were having the same issue with some Chinese SIP phone. During the transfer, timestamp changed and there was one-way audio. Asterisk 1.4 put MARK to one of RTP packet (when transfering) and SIP device should recalculate its jitter. Asterisk 1.2 does not change timestamp during the transfer so there was no problem with our phones.

We have solved this upgrading firmware to new one based on our bug report.

By: Joshua C. Colp (jcolp) 2007-12-11 13:00:46.000-0600

Okay what I need is the complete console output from Asterisk with debug enabled to console in logger.conf, and a core set debug 9 with sip debug. Asterisk should be sending out the indication for the remote side to know to resync.

By: Martin Vit (festr) 2007-12-11 13:06:44.000-0600

asterisk alreaady indicates this change by marking RTP packet with MARK. We don't have these phones now so I cannot reproduce it anymore. Maybe kanderson could.

By: Joshua C. Colp (jcolp) 2007-12-11 13:07:17.000-0600

That is who I was talking to specifically.

By: K Anderson (kanderson) 2007-12-11 13:10:13.000-0600

I will upload as soon as I can but I may need to wait for the traffic to decrease (meaning after 18:00) before I can get usable logs.

By: K Anderson (kanderson) 2007-12-11 19:23:26.000-0600

Please disregard cli.vg01.gocentrix.net (it is the wrong one).  If you could delete it that would be good to avoid confusion.  

Correct caputure: cli.vg01.gocentrix.net.txt

Call from 14075469137 to 14073133136.  From the IVR the direct extension 120 was dialed ringing sip phone 001120 in context 001 at IP 72.188.148.139 and behind NAT.  The asterisk box is at 209.208.47.106 and public.  Upon answer two way audio was verified, then the caller was placed on hold, upon retrieval the caller 4075469137 could not hear the callee 4073133136; however, audio in the other direction (PSTN to IP) was fine.

If you need any other info please let me know.  Also as a heads up my dial plan is a little abnormal....  There are verbose messages that provide dial plan status, this regex should find them .*\s{8}\[[0-9]{1,4}\].*

By: Joshua C. Colp (jcolp) 2007-12-12 10:18:18.000-0600

Please give the attached patch a go.

By: K Anderson (kanderson) 2007-12-12 11:44:04.000-0600

I installed the patch but was still able to recreate the issue.  I will include a capture. In the capture packet 1652 was a comfort noise from asterisk to level 3.   That is the first time I have seen that, is it possible your patch did that?  Anyway, jitter is all over the place but going on hold/retrieving is setting the MARK and reseting the seq.  However, SSRC stays the same, shouldnt that change?  Bellow are the jitter calculations from wireshark.

Src IP addr->Dest IP addr : Max Delta (ms), Max Jitter (ms), Mean Jitter (ms)

209.208.47.106->209.247.22.194 : 119.44, 4365266.02, 68093.78
209.247.22.194->209.208.47.106 : 22.56, 1383218.64, 79433.77
209.208.47.106->72.188.148.139 : 6859.74, 1.32, 0.46
72.188.148.139->209.208.47.106 : 30.07, 374.30, 1.21

By: Joshua C. Colp (jcolp) 2007-12-12 14:14:01.000-0600

The marker bit should be enough of an indication that something changed and to start over again.

By: K Anderson (kanderson) 2007-12-12 14:27:51.000-0600

Ok, I wasn't sure.  Is there anyway we can address the timestamps as even with the MARK set in this way my underlying issue exists.  Even though this ticket is not to fix my one way audio, looking at the capture you can see large slides in the timestamps, beyond what would naturally occur.  I think these slides/skews are within the scope of the ticket or at least an indication asterisk has an issue with tracking timestamps, please advise.

By: K Anderson (kanderson) 2007-12-13 12:51:08.000-0600

I don't mean to sound ungrateful but what would you advise I do next?  Also, I noticed that since I added your patch I still get the one way audio but I can but the caller back on hold and retrieve them to correct it (something I could not do before).  This makes since I we are reseting the stream with Level 3 but I can not figure out why it would happen in the first place.  Once we have clean timestamps leading up to the hold I hope it will correct the issue.  Thank you for your help so far.

By: K Anderson (kanderson) 2007-12-17 12:24:08.000-0600

My timestamps are still out-of-wack and according to Level 3 this is causing one way audio. Can somebody please tell me what I need to do next?  Do I need to hire Digium to fix this problem?  Is this ticket considered closed? Thank you.

By: Joshua C. Colp (jcolp) 2007-12-17 12:28:29.000-0600

No, it is not considered closed. You just have to have patience. There are over 370 issues currently open and only a few of us work on bugs. As well some of us try not to work on the weekends :)

By: K Anderson (kanderson) 2007-12-17 13:46:46.000-0600

I apologize, I just didn't know what was going on.  I really do appreciate all your help so far and I will wait for a update.  Thank you.

By: jmls (jmls) 2008-02-17 14:36:57.000-0600

file, were you able to work some weekends ?

:)

By: jcomellas (jcomellas) 2008-03-04 11:56:53.000-0600

We have been experiencing the same problem with Asterisk 1.4.18 and Level 3. The suggestion from Level 3 to resolve the problem was:

1) Send us a new SSRC and we will treat this as a new RTP stream and there will be no audio loss.

2) Send the time in the expected next increment 160.  Packet 11623 Time=59200 and then packet 11632 Time=2568454632.  Packet 11632 should be 59360.

By: Digium Subversion (svnbot) 2008-03-04 12:02:32.000-0600

Repository: asterisk
Revision: 105674

U   branches/1.4/channels/chan_sip.c
U   branches/1.4/include/asterisk/rtp.h
U   branches/1.4/main/rtp.c

------------------------------------------------------------------------
r105674 | file | 2008-03-04 12:01:46 -0600 (Tue, 04 Mar 2008) | 8 lines

When a new source of audio comes in (such as music on hold) make sure the marker bit gets set.
(closes issue ASTERISK-10000)
Reported by: wdecarne
Patches:
     10355.diff uploaded by file (license 11)
(closes issue ASTERISK-10992)
Reported by: kanderson

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105674

By: Digium Subversion (svnbot) 2008-03-04 12:05:23.000-0600

Repository: asterisk
Revision: 105675

_U  trunk/
U   trunk/channels/chan_sip.c
U   trunk/include/asterisk/rtp.h
U   trunk/main/rtp.c

------------------------------------------------------------------------
r105675 | file | 2008-03-04 12:05:15 -0600 (Tue, 04 Mar 2008) | 16 lines

Merged revisions 105674 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105674 | file | 2008-03-04 14:05:28 -0400 (Tue, 04 Mar 2008) | 8 lines

When a new source of audio comes in (such as music on hold) make sure the marker bit gets set.
(closes issue ASTERISK-10000)
Reported by: wdecarne
Patches:
     10355.diff uploaded by file (license 11)
(closes issue ASTERISK-10992)
Reported by: kanderson

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105675

By: Digium Subversion (svnbot) 2008-03-04 12:51:42.000-0600

Repository: asterisk
Revision: 105728

_U  team/murf/bug6002/
U   team/murf/bug6002/CHANGES
U   team/murf/bug6002/channels/chan_local.c
U   team/murf/bug6002/channels/chan_sip.c
U   team/murf/bug6002/channels/chan_zap.c
U   team/murf/bug6002/funcs/func_version.c
U   team/murf/bug6002/include/asterisk/_private.h
U   team/murf/bug6002/include/asterisk/rtp.h
U   team/murf/bug6002/main/asterisk.c
U   team/murf/bug6002/main/autoservice.c
U   team/murf/bug6002/main/channel.c
U   team/murf/bug6002/main/hashtab.c
U   team/murf/bug6002/main/pbx.c
U   team/murf/bug6002/main/rtp.c
U   team/murf/bug6002/res/snmp/agent.c

------------------------------------------------------------------------
r105728 | murf | 2008-03-04 12:51:34 -0600 (Tue, 04 Mar 2008) | 228 lines

Merged revisions 105558,105561,105564,105566,105569,105571,105573-105574,105589-105590,105592-105595,105597,105675,105677 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r105558 | file | 2008-03-03 08:16:57 -0700 (Mon, 03 Mar 2008) | 14 lines

Merged revisions 105557 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105557 | file | 2008-03-03 11:15:39 -0400 (Mon, 03 Mar 2008) | 6 lines

Add a comment to describe some logic.
(closes issue ASTERISK-11558)
Reported by: flefoll
Patches:
     chan_sip.c.br14.patch-just-a-comment uploaded by flefoll (license 244)

........

................
r105561 | file | 2008-03-03 08:30:29 -0700 (Mon, 03 Mar 2008) | 15 lines

Merged revisions 105560 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105560 | file | 2008-03-03 11:28:59 -0400 (Mon, 03 Mar 2008) | 7 lines

It is possible for no audio to pass between the current digit and next digit so expand logic that clears emulation to AST_FRAME_NULL.
(closes issue ASTERISK-11364)
Reported by: edgreenberg
Patches:
     v1-11911.patch uploaded by dimas (license 88)
Tested by: tbsky

........

................
r105564 | russell | 2008-03-03 08:59:50 -0700 (Mon, 03 Mar 2008) | 40 lines

3) In addition to merging the changes below, change trunk back to a regular
  LIST instead of an RWLIST.  The way this list works makes it such that
  a RWLIST provides no additional benefit.  Also, a mutex is needed for
  use with the thread condition.


Merged revisions 105563 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105563 | russell | 2008-03-03 09:50:43 -0600 (Mon, 03 Mar 2008) | 24 lines

Merge in some changes from team/russell/autoservice-nochans-1.4

These changes fix up some dubious code that I came across while auditing what
happens in the autoservice thread when there are no channels currently in
autoservice.

1) Change it so that autoservice thread doesn't keep looping around calling
  ast_waitfor_n() on 0 channels twice a second.  Instead, use a thread condition
  so that the thread properly goes to sleep and does not wake up until a
  channel is put into autoservice.

  This actually fixes an interesting bug, as well.  If the autoservice thread
  is already running (almost always is the case), then when the thread goes
  from having 0 channels to have 1 channel to autoservice, that channel would
  have to wait for up to 1/2 of a second to have the first frame read from it.

2) Fix up the code in ast_waitfor_nandfds() for when it gets called with no
  channels and no fds to poll() on, such as was the case with the previous code
  for the autoservice thread.  In this case, the code would call alloca(0), and
  pass the result as the first argument to poll().  In this case, the 2nd
  argument to poll() specified that there were no fds, so this invalid pointer
  shouldn't actually get dereferenced, but, this code makes it explicit and
  ensures the pointers are NULL unless we have valid data to put there.

(related to issue ASTERISK-11555)

........

................
r105566 | russell | 2008-03-03 09:02:19 -0700 (Mon, 03 Mar 2008) | 11 lines

Merged revisions 105565 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105565 | russell | 2008-03-03 10:01:50 -0600 (Mon, 03 Mar 2008) | 3 lines

Update the copyright information for autoservice.  Most of the code in this file
now is stuff that I have written recently ...

........

................
r105569 | russell | 2008-03-03 10:06:35 -0700 (Mon, 03 Mar 2008) | 11 lines

Merged revisions 105568 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105568 | russell | 2008-03-03 11:05:16 -0600 (Mon, 03 Mar 2008) | 3 lines

Fix a potential memory leak of the local_pvt struct when ast_channel allocation
fails.  Also, in passing, centralize the code necessary to destroy a local_pvt.

........

................
r105571 | russell | 2008-03-03 10:17:27 -0700 (Mon, 03 Mar 2008) | 11 lines

Merged revisions 105570 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105570 | russell | 2008-03-03 11:16:53 -0600 (Mon, 03 Mar 2008) | 3 lines

In the case of an ast_channel allocation failure, take the local_pvt out of the
pvt list before destroying it.

........

................
r105573 | qwell | 2008-03-03 11:08:05 -0700 (Mon, 03 Mar 2008) | 15 lines

Merged revisions 105572 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105572 | qwell | 2008-03-03 12:06:52 -0600 (Mon, 03 Mar 2008) | 7 lines

Fix types for astNumChannels and astConfigCallsProcessed.

(closes issue ASTERISK-11553)
Reported by: jeffg
Patches:
     12114.patch uploaded by jeffg (license 192)

........

................
r105574 | russell | 2008-03-03 11:49:34 -0700 (Mon, 03 Mar 2008) | 4 lines

Fix some code that was improperly changed in revision 104866 from issue ASTERISK-11520.

(closes issue ASTERISK-11566, reported by elguero, patched by me)

................
r105589 | russell | 2008-03-03 21:26:39 -0700 (Mon, 03 Mar 2008) | 3 lines

Use ast_copy_string() instead of strncpy(), and use sizeof() instead of
a magic number

................
r105590 | russell | 2008-03-03 21:28:48 -0700 (Mon, 03 Mar 2008) | 3 lines

- Add curly braces around the while loop
- Properly break out of the loop on error when an included context is not found

................
r105592 | russell | 2008-03-03 21:31:53 -0700 (Mon, 03 Mar 2008) | 11 lines

Blocked revisions 105591 via svnmerge

........
r105591 | russell | 2008-03-03 22:31:29 -0600 (Mon, 03 Mar 2008) | 4 lines

Backport a minor bug fix from trunk that I found while doing random code
cleanup.  Properly break out of the loop when a context isn't found when
verify that includes are valid.

........

................
r105593 | russell | 2008-03-03 21:44:28 -0700 (Mon, 03 Mar 2008) | 2 lines

remove unnecessary casts

................
r105594 | russell | 2008-03-03 21:47:32 -0700 (Mon, 03 Mar 2008) | 2 lines

Make it so you don't have to cast away const in a couple places

................
r105595 | russell | 2008-03-03 21:57:29 -0700 (Mon, 03 Mar 2008) | 2 lines

Simplify a trivial snprintf() with ast_copy_string()

................
r105597 | russell | 2008-03-04 09:55:17 -0700 (Tue, 04 Mar 2008) | 2 lines

Update CHANGES heading

................
r105675 | file | 2008-03-04 11:08:42 -0700 (Tue, 04 Mar 2008) | 16 lines

Merged revisions 105674 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105674 | file | 2008-03-04 14:05:28 -0400 (Tue, 04 Mar 2008) | 8 lines

When a new source of audio comes in (such as music on hold) make sure the marker bit gets set.
(closes issue ASTERISK-10000)
Reported by: wdecarne
Patches:
     10355.diff uploaded by file (license 11)
(closes issue ASTERISK-10992)
Reported by: kanderson

........

................
r105677 | file | 2008-03-04 11:11:38 -0700 (Tue, 04 Mar 2008) | 10 lines

Merged revisions 105676 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105676 | file | 2008-03-04 14:10:34 -0400 (Tue, 04 Mar 2008) | 2 lines

In addition to setting the marker bit let's change our ssrc so they know for sure it is a different source.

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105728

By: Digium Subversion (svnbot) 2008-03-05 18:03:36.000-0600

Repository: asterisk
Revision: 106299

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_sip.c
U   branches/1.6.0/include/asterisk/rtp.h
U   branches/1.6.0/main/rtp.c

------------------------------------------------------------------------
r106299 | russell | 2008-03-05 18:03:32 -0600 (Wed, 05 Mar 2008) | 24 lines

Merged revisions 105675 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r105675 | file | 2008-03-04 12:08:42 -0600 (Tue, 04 Mar 2008) | 16 lines

Merged revisions 105674 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105674 | file | 2008-03-04 14:05:28 -0400 (Tue, 04 Mar 2008) | 8 lines

When a new source of audio comes in (such as music on hold) make sure the marker bit gets set.
(closes issue ASTERISK-10000)
Reported by: wdecarne
Patches:
     10355.diff uploaded by file (license 11)
(closes issue ASTERISK-10992)
Reported by: kanderson

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=106299