|Summary:||ASTERISK-18964: Stuttering jittery audio after MOH|
|Reporter:||Kris Shaw (shawkris)||Labels:|
|Date Opened:||2011-12-05 07:32:39.000-0600||Date Closed:||2012-03-14 12:48:03|
|Environment:||Attachments:||( 0) hold-filtered.pcap|
( 1) iaxdebug.txt
( 2) iaxdebug-with-ts.txt
( 3) jitter_buffer_12_02_12.diff
( 4) jitter-1.patch
( 5) jitterbuffer-2012-02-23.diff
( 6) jitterbuffer-2012-02-26.diff
( 7) spa501-hold-decode.txt
|Description:||1000ms jitter buffer is enabled for a IAX2 trunk. One side of the trunk has Cisco Linksys SPA50x handsets (and ISDN hardware). The other side of the trunk has ISDN connection to a PBX. When a SPA50x puts acall on hold, the audio starts to become jittery for a few seconds on the other side of the trunk. The remote asterisk console shows a resync, as expected. If the jitter buffer is disabled, then the audio is fine.|
I believe the issue is caused by the change in timestamps between the handset audio source and the MOH source. I believe the jitter buffer is throwing away good frames. If I set the jitter buffer to a very large value (60000ms) then the audio is OK. I remember seeing something similar with SIP; a large ts change (e.g. if a high latency connection clears up) would result in stuttery audio for a few seconds. I found this was caused by jb_put not allowing frames outside of the threshold being added to the queue. However, these would have been 'good' frames to be played out a future time.
|Comments:||By: Kris Shaw (shawkris) 2011-12-05 12:39:48.913-0600|
I've done some experiementing with jitterbuf.c. I have found that if the jitterbuffer is emptied when a resync is required (history_put) everything works OK. This is similar to jb_empty_and_reset_adaptive in abstract_jb.c.
By: Leif Madsen (lmadsen) 2011-12-07 11:53:27.502-0600
Is it possible for you to provide a patch that demonstrates how this could potentially be resolved? Thanks!
By: Kris Shaw (shawkris) 2011-12-08 14:35:38.083-0600
This patch will empty and reset the jitter buffer on a resync. I don't know if it's the right approach, or if it will leak frames.
By: Kris Shaw (shawkris) 2011-12-14 13:30:45.080-0600
After looking again at the code, I think some of the problems are with r52494 (jitterbuf allows itself to be overfilled).
I think this has two effects:
- if there is a sudden decrease in latency, what would have been good frames are dropped which then causes audio drop-outs when their playback time is reached.
- if the jb is at its maximum and a frame that is outside of the resync threshold comes in, it cannnot reach history_put and a resync is not performed because it is dropped.
By: Kris Shaw (shawkris) 2012-01-31 11:52:21.688-0600
Update. I have been running with r52494 effectively reverted by commenting out the relevant 'if' statements in jitterbuf.c (and not using the attached jitter-1.patch). Everything is working OK, with the JB resyncing OK after MOH.
By: Matt Jordan (mjordan) 2012-02-01 08:48:04.294-0600
There may be that something needs to be tweaked in r52494, but it also existed for a reason. To get a better feel for the audio that is causing your particular scenario, can you attach a pcap containing the RTP stream of the audio?
By: Kris Shaw (shawkris) 2012-02-08 10:07:47.073-0600
Trace from SPA, repeatedly putting the call on/off hold. The capture shows the jump in timestamp coming off hold, but the sequence doesn't jump. e.g. seq 26409=960, seq 26410=1120.
By: Matt Jordan (mjordan) 2012-02-17 15:02:38.531-0600
I've been thinking through this for awhile and working through the pcap. What I can say is that removing r52494 would be a bad idea - the jitter buffer cannot let itself be filled beyond the amount of time its allowed to buffer. In general, its not preferable to let audio that occurred later in time appear to have occurred at the next possible time slot of the previous packet.
That being said, I'm confused by the pcap based on your previous comment. In your comment, you note that sequence 26409 has a time of 960 and sequence 26410 has a time of 1120. Unless I'm mistaken, however, that's not the problem - those look to be packets coming within a reasonable frame of time. I scanned through the pcap and found a few examples of what appear to be the handset switching between its audio and your MOH source. Looking at Sequence 26689 (packet #569), the next packet in the sequence is 26690. However, the time difference between 26689/26690 is huge - both in the timestamp that the packet capture observed the packets (approximately 4.4 seconds) and the difference between the timestamps in the packets themselves (45760 -> 81120). Nothing in the pcap indicates that those packets, regardless of the time they arrived on the network interface or the timestamps in the packets themselves, did not have a huge time gap between them.
Based on what the problem description reads, it appears as if the handsets and whatever the MOH source is have different clocks. If the sources of the data have inconsistent time sources, such that the audio stream sent to Asterisk appears to have jumps in timestamps, I'm not sure Asterisk should or try to correct that problem. I don't think there is anything that can be done that wouldn't introduce worse problems. If I'm misunderstanding something, please let me know.
By: Kris Shaw (shawkris) 2012-02-18 03:50:06.322-0600
If the SPA is on the local Asterisk, then the jittering audio is heard at the remote end of an IAX trunk. The MOH source is the local Asterisk.
With regards to the jb, I am not sure it can be overfilled. jb_put calls history_put where there are checks for changes in delay.
At the moment, if the jb is working at its maximum, nothing gets through to history_put as the frames are dropped in jb_put.
Perhaps these checks and resync should be moved to jb_put. e.g. 3 frames > threshold then empty and reset jb, as is done in history_put.
By: David Woolley (davidw) 2012-02-20 06:01:25.596-0600
As noted in an email to asterisk-dev, that is lost somewhere in lists.digium.com, the Asterisk frame structure doesn't track the synchronisation source, preferring to use the marker bit instead. Maybe all IAX devices cope with this convention, in which case it may be a red herring for this particular issue, but there are RTP devices that don't. Cisco phones respond particularly badly, but not 100% so, to timestamp changes without a corresponding source identifier change.
By: Matt Jordan (mjordan) 2012-02-20 09:39:04.239-0600
That's an interesting point, in that its missing the resync logic in history_put. I still don't think we can simply remove the overfill logic, but I've gone ahead and done a very simple patch to see what happens if we move the overfill logic to after the resync logic. This isn't sufficient for a fix, as non-VOICE packets now aren't considered for overfill, but its a start. Can you test it and see if it at least addresses the problem you have? If it does, I'll move towards making a real patch for review.
By: Kris Shaw (shawkris) 2012-02-21 05:31:30.981-0600
I've done a quick test and it wasn't successful. Repeatedly putting a call on/off hold produced progressively worse jittering.
I think this is because the jb isn't emptied by the reset, so the frames are still dropped by the overfill check as the timestamps are out of threshold.
This could probably be fixed with an empty/reset (as I tried in my original patch) or re-writing the timestamps (as done in fixedjitter.c).
By: Sean Bright (seanbright) 2012-02-21 13:00:15.888-0600
This may be completely unrelated, but a fix for IAX2 trunk timing went in a few days ago ([r355746|http://svnview.digium.com/svn/asterisk?view=revision&revision=355746]) that have resolved similar symptoms I was facing. You can try to apply [the patch|http://svnview.digium.com/svn/asterisk/branches/1.8/channels/chan_iax2.c?view=patch&r1=355746&r2=355745&pathrev=355746] on both sides of your IAX trunk and see if that helps you at all. Might be a red herring, but worth mentioning.
By: Kris Shaw (shawkris) 2012-02-22 12:50:28.131-0600
I think I have worked out where the problem is coming from. I added some debug code so I could see the timestamps as frames were entering the jb.
The issue is that control frames are not being considered for the re-sync logic, so they are added straight into the jb with the wrong resync_offset. However, this means that the jb can now contain frames with a large jump in timestamps, which breaks the overfill logic.
[2012-02-22 18:46:06.952] DEBUG jitterbuf.c: Head 11165, Tail 11205, TS 7103
[2012-02-22 18:46:06.953] DEBUG jitterbuf.c: Head 11165, Tail 11225, TS 7123
[2012-02-22 18:46:07.002] DEBUG jitterbuf.c: Head 11205, Tail 11245, TS 7143
[2012-02-22 18:46:07.002] DEBUG jitterbuf.c: Head 11205, Tail 11265, TS 7163
[2012-02-22 18:46:07.039] DEBUG jitterbuf.c: Control frame with ts 11260
[2012-02-22 18:46:07.040] DEBUG jitterbuf.c: Control frame with ts 11263
[2012-02-22 18:46:07.042] DEBUG jitterbuf.c: Control frame with ts 11266
[2012-02-22 18:46:07.051] DEBUG jitterbuf.c: Head 11265, Tail 15388, TS 7183
[2012-02-22 18:46:07.051] DEBUG jitterbuf.c: Attempting to exceed Jitterbuf max 1000 timeslots
[2012-02-22 18:46:07.051] DEBUG jitterbuf.c: Head 11265, Tail 15388, TS 11220
[2012-02-22 18:46:07.101] DEBUG jitterbuf.c: Head 15382, Tail 15388, TS 11240
[2012-02-22 18:46:07.101] DEBUG jitterbuf.c: Head 15382, Tail 15388, TS 11260
[2012-02-22 18:46:07.152] DEBUG jitterbuf.c: Head 15382, Tail 15388, TS 11280
[2012-02-22 18:46:07.152] WARNING chan_iax2.c: Resyncing the jb. last_delay 28, this delay -3969, threshold 1082, new offset -153
[2012-02-22 18:46:07.152] DEBUG jitterbuf.c: Head 11433, Tail 15388, TS 11300
[2012-02-22 18:46:07.152] DEBUG jitterbuf.c: Attempting to exceed Jitterbuf max 1000 timeslots
[2012-02-22 18:46:07.152] DEBUG jitterbuf.c: Head 11433, Tail 15388, TS 11320
[2012-02-22 18:46:07.202] DEBUG jitterbuf.c: Head 11433, Tail 15388, TS 11340
[2012-02-22 18:46:07.202] DEBUG jitterbuf.c: Head 11433, Tail 15388, TS 11360
[2012-02-22 18:46:07.253] DEBUG jitterbuf.c: Head 15382, Tail 15388, TS 11380
[2012-02-22 18:46:07.253] DEBUG jitterbuf.c: Head 11533, Tail 15388, TS 11400
[2012-02-22 18:46:07.253] DEBUG jitterbuf.c: Attempting to exceed Jitterbuf max 1000 timeslots
[2012-02-22 18:46:07.253] DEBUG jitterbuf.c: Head 11533, Tail 15388, TS 11420
[2012-02-22 18:46:07.302] DEBUG jitterbuf.c: Head 11533, Tail 15388, TS 11440
[2012-02-22 18:46:07.302] DEBUG jitterbuf.c: Head 11533, Tail 15388, TS 11460
[2012-02-22 18:46:07.352] DEBUG jitterbuf.c: Head 15382, Tail 15388, TS 11480
[2012-02-22 18:46:07.352] DEBUG jitterbuf.c: Head 11633, Tail 15388, TS 11500
[2012-02-22 18:46:07.352] DEBUG jitterbuf.c: Attempting to exceed Jitterbuf max 1000 timeslots
By: Kris Shaw (shawkris) 2012-02-23 14:51:17.294-0600
This patch attempts to resync the buffer immediately if a control frame is received. The jitter buffer is still checked for overfilling. I've done some testing and it appears to resolve the jittering audio after moh.
By: Kris Shaw (shawkris) 2012-02-26 17:52:39.423-0600
This patch doesn't count dropped frames twice.
By: Sean Bright (seanbright) 2012-02-27 05:33:40.841-0600
Did applying that patch from r355746 have no effect?
By: Kris Shaw (shawkris) 2012-02-27 05:47:12.282-0600
r355746 didn't improve things here. I've also noticed this problem with SIP, but it's only recently with I've been able to see what could be wrong with the JB.
By: Matt Jordan (mjordan) 2012-03-02 15:20:03.498-0600
I am still working on this. Can you attach a DEBUG log with iaxdebug turned on? If it also had your jitter buffer debug, that'd be great - I'm curious as to what chan_iax has to say about the control frames with the odd timestamps.
By: Kris Shaw (shawkris) 2012-03-07 14:14:17.261-0600
iaxdebug-with-ts.txt contains IAX with debug and additional ts logging. A call was placed over the IAX trunk and put on/off hold.
By: Matt Jordan (mjordan) 2012-03-13 12:47:49.900-0500
I've tested your patch and wrote some unit tests to verify the functionality. I'm putting it up on review board for peer review.
I actually added back in the frames_dropped counts - when I tested overfilling the jitter buffer, I found that the frames weren't being counted as dropped at all. If a check_resync fails, the frame is never put into the queue. When a jb_get is called for a frame that would occur at that time slot, you should get back either a NO_FRAME or an interpolated frame (if voice) - in either case, the dropped count wasn't incremented.