Summary:ASTERISK-16380: [patch] Using app_jack with JACK_HOOK will result in noise rather than the channel sound.
Reporter:Fabio Torchetti (mrwho)Labels:patch
Date Opened:2010-07-16 05:50:24Date Closed:
Versions:Frequency of
Environment:Attachments:( 0) app-jack.patch
( 1) app-jack-rev2.patch
( 2) asterisk_debug_version0
( 3) asterisk_jack_output_fix_version0.patch
Description:Using JACK_HOOK to manipulate the sound of a channel may result in noise rather than proper sound. It was quite hard to squash this bug, but it's all related to the way the ringbuffers behave.

When a channel is set up app_jack will start queuing frames from jackd to the ringbuffer to be sent to asterisk, problem being that at that point the channel may be not yet ready - say that you're doing a Dial next - and the buffer fills up.

Once the buffer is filled the next frames coming in may be cut. If a float - made of 4 bytes - could be only partially written, that would end up in a consequent succession of "wrong" samples, all off by one, in other words: noise.

A bit of ASCII art to make the point, say that we have three samples (A,B and C) coming from jack looking like:

------A------ ------B------ ------C------
|A1|A2|A3|A4| |B1|B2|B3|B4| |C1|C2|C3|C4|

 Now, on the ringbuffer we have space only for 7 bytes, what will be written in the buffer will look like:

------A------ ----B-----
|A1|A2|A3|A4| |B1|B2|B3|

When the next samples (D,E) come in - and say we have the space to write them completely - our ringbuffer will look like:

------A------ ------B------ ------D------ ----E-----
|A1|A2|A3|A4| |B1|B2|B3|D1| |D2|D3|D4|E1| |E2|E3|E4|

When those samples will be sent to Asterisk they will only be noise, since they're all off by one.

The same issue may happen if - for any reason - the buffer is filled by any other reason. This will happen often expecially if  - to have very low latency - you're using very small buffers in app_jack and short periods in jackd.

We have three possible solutions to this issue:

# We free, by reading, enough bytes from the ringbuffer and then write the new samples. So we would drop the older frames, we just have to make sure to drop entire frames and not only parts of them - else we are simply reintroducing the same problem;
# When we don't have space we will drop the whole buffer and start writing from the beginning again. With small buffers this is not really an issue and we avoid running into a case where we simply keep writing at the top of the buffer dropping each time a few frames from the bottom of it;
# We use (1) for a set amount of consecutive times, after which we do (2). In this way we have a chance to drop very few frames in case of a momentary buffer underrun.

The patch I'm attaching at the moment for testing uses (2), it's very well suited for small buffers. I'm working on a patch to allow (1) (2) and (3) to be chosen as parameters to the JACK_HOOK function.  There is (1) in the patch as well, but it has been commented out, you may want to test drive it to see how it works on your environment.

I started my experimentinig with app_jack and used for a while the patch attached to issue 0016023. However that patch has a flaw in the way it handles the AST_AUDIOHOOK_DIRECTION_READ, the fact that it does itself is wrong.

My understanding - and my usage so far - of app_jack is that it manipulates audio when it's being written on the channel, the calls to the functions that fill the jack output ringbuffer and that read from the input ringbuffer are correct in the original version of app_jack in they way they are.

Separating the two calls in two different directions makes it so that data written to the channel is not actually modified by the passage to jackd.

One last thing: the patch adds an option for app_jack to set the ringbuffer size from the dialplan. The option is "r(buffersize)".
Comments:By: Fabio Torchetti (mrwho) 2010-07-16 07:57:06

Added the a more "production ready" patch. It includes the fix discussed above and there are three added options to JACK and JACK_HOOK that allow to define the behaviour to have on a ringbuffer overrun:

 - 'd': 'drop', will drop the whole RB every time there is a full buffer;
 - 'e': 'empty', will empty enough bytes from the RB to allow for writing;
 - 't(<limit>)': 'tricky', will behave as 'e' for <limit> consecutive times,
   then it will drop the RB completely and start counting again.

By: Leif Madsen (lmadsen) 2010-07-16 10:55:13

Thanks for the patches and the descriptive overview! I'm marking this as Ready for Testing. I guess I can drop the first patch right?

By: Fabio Torchetti (mrwho) 2010-07-16 12:54:14

Yes, the first patch was a quick fix - the second is a real patch.

The app has still some issues - not related to my patch, fortunately - and I'm still looking into it, since I'm planning on using app_jack in a project and I'd like to have it more stable that it currently is.

By: Leif Madsen (lmadsen) 2010-07-16 13:13:31

mrwho: sounds good -- when you have solutions for the other issues (or at least an analysis of them) please feel free to open other issues.

By: Motiejus Jakstys (motiejus) 2010-08-25 09:00:49

I am eager to test this. Patch is not really ready:
patching file app_jack.c
Hunk #2 succeeded at 40 with fuzz 1.
Hunk ASTERISK-1 succeeded at 191 (offset -1 lines).
Hunk ASTERISK-2 FAILED at 262.
Hunk ASTERISK-3 succeeded at 292 (offset -1 lines).
Hunk ASTERISK-4 succeeded at 432 (offset -1 lines).
Hunk ASTERISK-5 FAILED at 458.
Hunk ASTERISK-6 succeeded at 491 (offset -1 lines).
Hunk ASTERISK-7 succeeded at 587 (offset -1 lines).
Hunk ASTERISK-8 succeeded at 709 (offset -1 lines).
Hunk ASTERISK-9 succeeded at 717 (offset -1 lines).
Hunk ASTERISK-10 succeeded at 751 (offset -1 lines).
Hunk ASTERISK-11 succeeded at 799 (offset -1 lines).
Hunk ASTERISK-12 succeeded at 838 (offset -1 lines).
Hunk ASTERISK-13 succeeded at 849 (offset -1 lines).
Hunk ASTERISK-14 succeeded at 858 (offset -1 lines).
Hunk ASTERISK-15 succeeded at 905 (offset 3 lines).
Hunk ASTERISK-16 succeeded at 969 (offset 3 lines).
Hunk ASTERISK-17 succeeded at 1078 (offset 8 lines).
Hunk ASTERISK-18 succeeded at 1100 (offset 8 lines).
2 out of 22 hunks FAILED -- saving rejects to file app_jack.c.rej

app_jack.c.rej: http://paste.ubuntu.com/483453/

Please review if possible.

By: Motiejus Jakstys (motiejus) 2010-08-26 01:20:45

Applied rejects manually.

This patch is a life-saver for those who use Jack. I use JACK_HOOK this way:

exten => s,n,Set(JACK_HOOK(manipulate,i(${CHANNEL}-${ARG1}:input),o(${CHANNEL}-${ARG1}:output),c(${CHANNEL}-${ARG1}))=on)

But still in rare cases get the garbled sound (1 call out of 200). Any way to debug this or change parameters?

But on the whole the system is _much_ more stable now. Thank you for great work!

By: Fabio Torchetti (mrwho) 2010-08-26 07:40:29

Hi Motiejus,

 the patch should be applied to the trunk (SVN) version of asterisk. What's the version you are patching? I think there is till some work to be done on this, at the moment the project I was working on is on hold.
 As soon as I get back to this I'm planning a radical rewrite.


By: Motiejus Jakstys (motiejus) 2010-08-26 09:34:02

I was patching against
Yes, SVN and differ. Not too much, but in some places significant:

I'll have to try the SVN version.


By: David Richards (rawdod) 2010-11-19 09:16:11.000-0600

I want to express that I am extremely interested in this patch. But I think there is some additional complications that should be looked into. Yesterday I started messing with asterisk and jack and ran into problems with the jack output app right away. The problem was that the audio input would go all screechy and drop out after a short time, usually less than a minute. It did atleast initially seem to have some correlation to higher volume and/or higher frequency, but I didn't go out of the way to prove that. The next thing I tested was changing the jack server from 44100 to 8000hz. This actually was completely stable. (Using the app_jack and asterisk system from ubuntu 10.10) I was happy that it was stable, but of course I need to run jack at 44100. I switched it back to 44100, and tried several different codecs and that did not seem to make a difference. So I jumped into the code. (I've been around a few jacks and rings before..) I cut out all the code dealing with jack_hook so I was left with just the app. I found that the function that seemed to be actually report getting mashed up from the issue was queue_voice_frame as it would report so once I changed the debug line to a log line.. so I knew there was a ringbuffer related issue at this point. I went down one path (not fully understanding the code at all at this point ) of changing handle_jack_audio to not write a ast_frame if (res < read_len). That might have been totally up the wrong tree tho. Then after the 10 hours of messing with it I found this patch :P

This is good stuff perhaps, but its not stable for me. I've tried the patch on the SVN revision mentioned in here, the latest SVN and 1.8. For me, I still get the same 'tried to write bytes but couldn't' error that I got before, however the audio won't go screechy and fail. However, the entire asterisk server will crash sometimes when the caller hangs up from the jack_app!  I have tried running jack at various period sizes and changing the ringbuffer size as advised in the patch, however nothing is totally stable or glitch free. The best number I got was something like 9600, it caused only 15 or so buffer glitches on a 10 minute call but when the call was over asterisk crashed.

At any rate, I definitely want to do whatever I can to get this working, If I have to go study up on asterisk internals and ringbuffer logic, I'll do it (could take a while tho...)  Yesterday was really the first time I had messed with asterisk in years, and I code Ruby mostly, so I'm not used to as much debugging off by one pointer to pointer arrays and such things, but I can cause some damnage ;P Hats off to anyone else working on this problem, any patches or comments you got I'm ready to test and listen. You can find me on freenode irc as oneman.

Cheers, David

By: David Richards (rawdod) 2010-11-19 09:29:47.000-0600

I want to mention also that the use of some ascii art here got me wanting to turn the whole app_jack into an ascii art to help fully understand it. Im a coder, but even for me this is a little down and dirty, so a good map might just help. It is my personal opinion that this code should ideally be extremely solid, in that, not matter what happens with asterisk, this is never at fault, we send blank samples to jackd if we have to, etc.  If its documented well it would help ensure we don't end up with another annoying situation, which is it works perfectly but crashes once a week or what not. Another thing that should be in this if needed (And I suspect it might be) Is an option for atleast a hard limiter, so that no samples clip. Clipping samples can cause some trouble down the line literally with jack apps listening to this. In my case I actually use the ladspa 5ms lookahead limiter right after asterisk and it works perfectly and is better than a hard limiter, but I think it should an option and probably on by default.

By: David Richards (rawdod) 2010-11-21 07:05:50.000-0600

For some reason the upload of things here is a bit jacked up so to speak.

Here is my patch: https://gist.github.com/708718 The license to which is whatever you want it to be, however it was developed with the help of satan himself, so if you use it lets just say its not going to win you any points with the big man.  

And just as importantly my debug output and commentary: http://media.rawdod.com/asterisk_debug_version0