ASTERISK-12324: Recording speed too fast (running BRI B410P)

[Home]

Summary: ASTERISK-12324: Recording speed too fast (running BRI B410P)

Reporter: Alex Barnes (alexb_uk) Labels:

Date Opened: 2008-07-07 05:12:37 Date Closed: 2008-10-15 09:39:46

Priority: Major Regression? No

Status: Closed/Complete Components: Applications/app_mixmonitor

Versions: Frequency of
Occurrence

Related
Issues:

Environment: Attachments: ( 0) 1221502040.50904.WAV
( 1) 13005_final.patch
( 2) 13005_tolerance.patch
( 3) 13005_try2.patch
( 4) 13005.patch
( 5) audiohook.patch

Description: Hi,

I have been asked by Digium support to raise a bug against Mixmonitor for making corrupt recordings in the latest version of Asterisk (1.4.21.1 and 1.4.21).

The bug does not appear when running 1.4.19.2 (with or without the latest Zaptel drivers).

The recordings are playing back at high speed and with very bad quality.

I'm not the only person seeing this issue: http://forums.digium.com/viewtopic.php?t=20985

Someone on the forum claims this bug has previously been raised and fixed but sorry I could not locate it in Mantis.

Just to note I have tested the original WAV files prior to converting them to MP3 and the issue remains.

The audio on the call itself is near perfect.

There are no errors in the logs / on the console.

Hardware:
Digium B410P card on BT ISDN2e lines (UK).

Example Command:
[Jul 3 11:06:20] VERBOSE[29391] logger.c: -- Executing [s@macro-record:19] MixMonitor("mISDN/11-u10", "/tmp/03072008-110620--IN.wav||/usr/local/bin/wav2mp3 "/tmp/03072008-110620--IN.wav" "03072008-110620--IN.mp3" "1215079580.4" "0" "X"") in new stack

If anyone is kind enough to look into this issue but doesnt have access to a BRI setup then please let me know and I can provide remote access to a server that does.

I could be incorrectly assuming this is related to BRI as I would have thought this issue would have been found already if it happened on T1/E1 lines. I do have access to an E1 line although it will take me a little time to update a server and test. Please let me know if this would be helpful.

Thanks for any help.

Comments: By: Mark Michelson (mmichelson) 2008-07-08 19:23:25

This is a very strange problem. I think the first step in figuring this out is to isolate the problem as either BRI-related or not. If you can test using either an E1 or with VoIP and see if the problem remains, that would help.

I tried reproducing using SIP phones and analog phones on a Digium TDM400P and had clear recordings with no problems. I used a very limited dialplan to test though (turn on Mixmonitor, dial an endpoint, hangup). I think there are some people here at the office who have a B410P set up as well. I can try using it tomorrow for MixMonitor recordings and see if I can reproduce the error then.
By: adria vidal (adriavidal) 2008-07-09 03:50:56

Same bug for us with Mixmonitor Asterisk 1.4.21.1 with a full sip instalation, no cards on it. Changing format of sound from wav, to WAV or gsm make no change, audio is speedup.
By: adria vidal (adriavidal) 2008-07-09 04:10:10

For us going back to a Asterisk 1.4.19.1 solved the bug too.
By: Mark Michelson (mmichelson) 2008-07-09 09:50:06

All right, it could be an audiohooks change that was made after 1.4.19.1 caused the speedup. I'll diff the code and see if there's anything that could point to the cause.
By: Mark Michelson (mmichelson) 2008-07-09 09:58:06

Out of curiosity, if you issue the following command in the CLI "core set debug 1 audiohook.c" does the following ever appear on the screen:

"Flushing audiohook <some hex value> so it remains in sync" ?

Unfortunately the CLI is going to get spammed with a bunch of other audiohook-related messages, so spotting the above message in real-time may not be easy. You'll probably want to grep the log file to see if it's there.
By: adria vidal (adriavidal) 2008-07-09 10:07:26

Sorry the machine was in production so we can't not upgrade it right now, maybe in 2 or 3 days.
By: Alex Barnes (alexb_uk) 2008-07-09 10:14:06

I enabled the debug "core set debug 1 audiohook.c" and you constantly get the following printed out in the "full" log file:

[Jul 9 16:19:29] DEBUG[22200] audiohook.c: Flushing audiohook 0x85fe738 so it remains in sync
[Jul 9 16:19:29] DEBUG[22201] audiohook.c: Read factory 0x85fe790 and write factory 0x85ff1ac both fail to provide 160 samples
[Jul 9 16:19:29] DEBUG[22201] audiohook.c: Read factory 0x85fe790 and write factory 0x85ff1ac both fail to provide 160 samples
[Jul 9 16:19:29] DEBUG[22201] audiohook.c: Write factory 0x85ff1ac was pretty quick last time, waiting for them.
By: Diogenes Eduardo Makibara (dioedu) 2008-07-10 16:45:53

I have the same problem, when i update the system from 1.4.19 to 1.4.21.1. In my case, ALL the mixmonitor calls have problems, then, i reproduce that easily.
One of the changes from those versions, is a change to correct a "out of sync" problem, that flush audios in audiohook.c. When i commented that, the problem was resolved.

I'll do the patch.

Diógenes Makibara
By: Mark Michelson (mmichelson) 2008-07-10 20:00:22

dioedu's license is still pending so his patch isn't visible yet. While looking into this issue more, I found a typo in audiohook code. I'm not 100% sure if fixing it will actually fix this speedup issue though.

I am attaching the fix as 13005.patch. Please test and let me know if it fixes the problem.
By: Alex Barnes (alexb_uk) 2008-07-11 04:27:50

Unfortunately putnopvut your patch hasnt fixed the fast audio problem on B410P/BRI.

On a side note I have tested using E1 lines and the recording appears to be ok, although I wasn't able to test with a real E1 line yet just loop back using Digium hardware:

SIP Phone -> Asterisk Server 1 (rec) -> Asterisk Server 2 -> Asterisk Server 1 (moh)

Thanks for the efforts though.
By: Mark Michelson (mmichelson) 2008-07-11 16:07:39

dioedu's patch comments out the section that I suspected was causing the trouble. However, that section was added for the specific purpose that mixmonitor files were recording audio which was out-of-sync.

See issue ASTERISK-11394 to see why that section was added. There could be a logical flaw there, though. I'll see if I can tweak some things and come up with something that makes sense and works.
By: Mark Michelson (mmichelson) 2008-07-11 16:10:21

Correction: That section of code was added to fix issue ASTERISK-11719. My mistake. It still had to do with out-of-sync audio, though.
By: Mark Michelson (mmichelson) 2008-07-14 09:55:21

I have uploaded a new patch (13005_try2.patch) which may also help in this issue. It and 13005.patch should *both* be applied.

One thing that definitely occurred on my system when I applied 13005_try2.patch was that the number of debug messages about "flushing audiohook" and "read/write factory was pretty quick" dropped to nearly zero. This is encouraging to me. Please try this and see if it helps. Thanks for your patience on this one.
By: Alex Barnes (alexb_uk) 2008-07-17 11:20:43

Sorry for the delayed response.

I have applied both patches and am still getting speedy recordings......

However the "Flushing audiohook " is now not being generated or at least not that I could see AND randomly some recordings are ok.

Occasionally recordings appear to start out sounding speeded up but very quickly (1-2seconds) slow down to the correct speed. This could be my imagination as the majority of recordings have ringing noise only for the first few seconds.

The ratio of broken to good is in the range of 1 good for 7 bad.

Thanks

Alex
By: Joel Vandal (jvandal) 2008-07-31 11:07:05

Currently testing these patches on 2 servers that have some problem with out of sync audio (can happen 1 call on 100, before 12996, we have about 95% out of sync).

For now, all look ok (i.e. no out of sync recording) but will continue to monitor for few days since the problem doesn't occur on each call.
By: Leif Madsen (lmadsen) 2008-08-07 07:40:35

I've added a relationship to bug 12837 as these issues may be related based on some experiences with a customer I'm working with. For now I will continue monitoring these issues and will update you should I make any progress.
By: Leif Madsen (lmadsen) 2008-08-07 07:41:32

Changing to 'Ready for Testing' as there are patches here to be tried and feedback on them required.
By: FACORAT Fabrice (dark_schneider971) 2008-08-07 08:27:09

I have also the same issue ( asterisk 1.4.21.1, 64bits, SIP calls ) for all recorded wav.
I'm ready to test a SVN revision to see if this is fixed ( however this box is used in production, so I hope it won't be broken too much ).

IMHO having all recordings too fast is worst than just having a out of sync issue with ringing in the recording ...
By: Alex Barnes (alexb_uk) 2008-08-07 14:41:47

I hope this doesnt sound antagonistic but I dont understand why this bug has been marked "Ready for Testing".

I have tried applying both of the two patches that putnopvut uploaded to a live server and it hasn't fixed the speeded up recording. I may have misunderstood putnopvut's comments but I thought the other patch uploaded by dioedu wasn't going to help.

Kind Regards

Alex
By: Leif Madsen (lmadsen) 2008-08-07 16:55:20

It is marked ready for testing because there are patches attached to this bug to be tested -- there is no guarantee patches will fix anything, but rather it means feedback as to what, if anything changed, is required.

This will be changed back to an appropriate status should this patch no resolve any of the issues presented here. At this point I am looking for some additional testers to verify if the patches resolve issues for anyone experiencing similar problems.

Thanks!
Leif.
By: Steve Davies . (stevedavies) 2008-08-20 02:56:56

Hi,
With my customers, I'd like to try these fixes, but I'm not sure which of the attached patches should be applied?
Steve
By: Alex Barnes (alexb_uk) 2008-08-22 09:21:48

Steve, the ones I applied were 13005.patch followed by 13005_try2.patch.

I have not tested audiohook.patch as I understood from the comments that this was included in 13005.patch, but I could be mistaken.
By: Mark Michelson (mmichelson) 2008-08-22 09:36:13

To alleviate confusion, I have uploaded 13005_final.patch. This is the only one that needs to be applied. The patch was created against the 1.4.21.2 tag. If you attempt to apply this to the 1.4 svn branch, you may encounter a slight problem since one part of the patch has already been committed.

There also is a slight change from the previous patches that was made after I had a discussion with file about this issue. So if any of you were not successful with the previous patches, please give 13005_final.patch a try.
By: c0rnoTa (c0rnota) 2008-08-30 07:46:02

I'll try 13005_final.patch according issue# 0013116 (there are some things about direct connection of this issues).
By: FACORAT Fabrice (dark_schneider971) 2008-09-12 05:37:06

I've tried 13005_final.patch with asterisk-1.4.21.2, and unfortunately this doesn't solve the speed recording issue. From users reports, things have improved only slightly.
This is asterisk on a 64bits box, running Mandriva Linux, SIP calls only.
By: Diogenes Eduardo Makibara (dioedu) 2008-09-15 15:14:06

In my case, i don't have any problem with MixMonitor in my dialplan. But when the call was monitor by the app_queue, the problem persist.
I know that the routine used in app_queue is the same of the dialplan MixMonitor call. Weird.
By: Sigitas (signer) 2008-09-26 07:02:34

I have tried 13005_final.patch for 1.4.21.2 but it didn't help. Fast recording speed problem occurs when call is received from b410p mISDN channel and bridged to SIP channel. I am running asterisk on Centos-5 (2.6.18-53.1.4.el5)
By: Alex Barnes (alexb_uk) 2008-09-26 11:26:52

It appears that many people have tried the patch and confirmed it does not help.

Please can we have the status reset.

Kind Regards

Alex
By: Leif Madsen (lmadsen) 2008-10-02 09:57:27

Changing status back to acknowledged as the patches here do not seem to help.
By: Sigitas (signer) 2008-10-02 11:02:56

I have tried asterisk 1.4.22-rc5 and this problem exists there as well.
By: Damian Adamski (sh0t) 2008-10-03 09:02:11

Hello, we are using asterisk 1.4.22, sip-sip calls with mixmonitor, and we have the same problems.. Some strange speedups, and sometimes recorded call file is a couple of seconds shorter than it should be (comparing to billsec from cdr)

It looks like sometimes mixmonitor doesn't record silence on both call legs.
For now I have to use 1.4.11 or something old like that to make it work.

By: Tony Russell (tonils) 2008-10-07 02:51:06

I have just upgraded to dahdi-linux-complete-2.0.0+2.0.0, asterisk-1.4.22 and mISDN-1_1_8 (for the B410P) and am still experiencing the problem with MixMonitor. Call recordings made with Monitor are OK.
By: Mark Michelson (mmichelson) 2008-10-07 15:03:44

I have finally reproduced this issue locally. This should make it much easier to locate the problem and get a fix in. I will update when I have pinpointed the cause.
By: Mark Michelson (mmichelson) 2008-10-08 10:07:41

Here's a detailed explanation of why this problem is occurring. It's going to require a bit of background about how audiohooks work in Asterisk, so bear with me here.

An audiohook contains two buffers where slinear samples are stored. These buffers are referred to as "slinfactories." One slinfactory is for audio read from the channel and the other is used for audio written to the channel. Each time audio is written into one of the slinfactories, the audiohook triggers a signal which tells MixMonitor to read from the factories. This read operation grabs 20 ms of audio from each factory (assuming that much is available in each slinfactory), mixes the audio, and applies any volume adjustments necessary. Then MixMonitor takes the audio and writes it to whatever file the recording is stored in.

Issue ASTERISK-11719 was opened because if one of the slinfactories was to build up many samples of audio before the other were to get any, then when MixMonitor was able to read 20 ms of audio from both slinfactories, the audio would be out of sync. The fix for this was to add logic when writing to a slinfactory such that if the time between now and the previous write to that factory is greater than the number of milliseconds of audio currently available in the other slinfactory, then all of the audio is removed from both slinfactories prior to writing the new audio in. These flushed samples are what is missing from the recording and causing it to sound "fast."

So why does this happen for some people (especially those with B410P's) and not others? Well, in the case of the B410P, we get somewhat "odd" packetization of audio. In my case, one of the slinfactories was fed a constant input of 20ms of samples, whereas the other would usually get 12 or 16 ms of audio. This meant that whenever we would write to the slinfactory getting 20ms of audio every time, this meant that the check I mentioned in the previous paragraph would cause the audio in both slinfactories to get flushed nearly every time. This is because in the 20 ms between writes to the constant slinfactory, we would see only 12 or 16 ms of audio in the other factory most of the time.

While I agree with the logic introduced in ASTERISK-11719 to some extent, the problem appears to be that it is too intolerant of small differences between the slinfactories and that a tolerance factor needs to be added to account for this. This is my plan for fixing the issue. I will do some local testing and when I have a patch ready for general testing, I will post it here.

Thanks for your patience on this one.
By: Leif Madsen (lmadsen) 2008-10-08 10:16:13

putnopvut: Now that you have seemed to narrow down to the why -- is this issue still related to 12837? Will a fix here positively affect that issue? Thanks!
By: Mark Michelson (mmichelson) 2008-10-08 14:24:19

blitzrage: I don't think this is related to issue 12837, since chanspy does not have a sync trigger on the audiohook in use, so this code path would not have been followed for it.

I have uploaded 13005_tolerance.patch. In this, I have defined a tolerance of 100 ms (thanks very much to Steve Davies for the suggestion) so that the streams may get out of sync by a maximum of 100 ms before the audiohooks get flushed and resync.

I have tested this on my setup where I was reproducing the problem at will and to my knowledge, the problem does not exist any more. I would like confirmation from those who have been experiencing this themselves, though, before I commit the changes to the 1.4 branch.

Speaking of which, the patch I have uploaded was made against the latest 1.4 svn , but given that I can't think of anything audiohook-related which has been done since 1.4.22 was tagged, the patch should apply cleanly to the 1.4.22 tag as well. Please le me know how testing goes with this.
By: Tony Russell (tonils) 2008-10-09 01:46:46

I have implemented 13005_tolerance.patch on our internal Asterisk server (dahdi-linux-complete-2.0.0+2.0.0, asterisk-1.4.22 and mISDN-1_1_8) and the call recordings are no longer choppy.

I will implement the patch at our two clients that are experiencing the problems and report back here in about 24 hours
By: Damian Adamski (sh0t) 2008-10-09 03:47:06

Asterisk 1.4.22 with 13005_tolerance.patch
mixmonitored sip-sip calls

It looks OK for now, so this patch probably fixed our problems with recording.
Thanks
By: Alex Barnes (alexb_uk) 2008-10-09 06:10:37

putnopvut you're a star it appears inbound and outbound calls are now recording correctly.

Tested using B410P (UK BT Lines), Asterisk 1.4.22, Libpri 1.4.7 and Zaptel 1.4.12.1.

I will monitor the recordings for the next few days but fingers crossed it's all sorted.

Thanks

Alex
By: c0rnoTa (c0rnota) 2008-10-10 02:05:46

Many thanks for you, putnopvut!! 13005_tolerance.patch solves my problem!
By: FACORAT Fabrice (dark_schneider971) 2008-10-10 08:36:43

Ok, I have a good news, and a bad news.
Good news : the 13005_tolerance.patch solves the recording speed issue

bad news : if the call last a long time ( more than about 10 minutes ), then at the end the record will be again too fast. I guess that caches are more subject to be flushed after a long time because of ressources issues.

Where is coming the 100 ms value ? Maybe this value should be somewhat dynamic ? It could be calculated from the sample rate, sample size and the frequency of the record ? And if possible, the tolerance should be increase under CPU or memory pressure.

Nonetheless, thanks for the hard work : good job :)
By: Mark Michelson (mmichelson) 2008-10-10 09:09:33

Dark_Schneider971: If you don't mind doing so, could you set the debug level to at least 1 and get the calls to go fast again? When this starts happening, check the console to see if there are a lot of debug messages that say "Flushing audiohook <some hex value> so it remains in sync." If these messages are appearing, then my fix needs some work because it's still the same area causing trouble. If not, then something else is contributing after a long time and I'll need to see the console output to try to figure out what's going wrong. Since I now have a setup which could cause the issue to start with, I can also attempt to reproduce the problem again.

In answer to where the 100 ms value comes from, it was a suggestion made to me by Steve Davies, who said that if audio becomes out of sync by 100 ms, you won't really notice it. The idea of dynamically setting the tolerance based on the packetization of the media seems like it could be a good idea, but it may require more overhead than is worth using in order to do it, especially if we use the suggestion of dynamically increasing the tolerance under load.

By the way, if I'm going to try to make this happen here, it would help to know what technologies were being used when you got the recordings to speed up as well as what codecs were being used.
By: FACORAT Fabrice (dark_schneider971) 2008-10-10 09:16:46

Asterisk 1.4.22 running on Mandriva Linux.
SIP calls using Thomson ST2030 phones. Calls are recorded in wav :
exten => _0.,4,MixMonitor(Table_3/${CALLFILENAME}.wav).
We are using G711 codecs
By: FACORAT Fabrice (dark_schneider971) 2008-10-10 09:30:43

BTW, without having a dynamic value, at least a value based on the packetization of the media could have less perf issues, as this value would have to be calculated only 1 time : at the beginning of the recording/call.
By: Steve Davies . (stevedavies) 2008-10-10 10:38:17

Hi,
I pulled 100msec out of my ear as a suggestion. It shouldn't happen that the two slinfactory queues drift out of step to such a large amount; the underlying problem is just that the packetisation of the BRI and the SIP side are different so a small amount of difference - less than 20msec - occurs.

If they really HAVE drifted a lot then it needs to be fixed by flushing the buffers or similar otherwise the call recording will have questions being answered before they are asked.

Mark - another approach to keeping things in sync is actually to reclock the audio coming out of the slinfactories by factors which actually are calculated by the average rate that audio is arriving on each queue. So you might have a ratio of say 1:1.001 and so you'll skip one sample in a thousand from the one buffer and thereby keep them as in sync as is possible.

Steve
By: Mark Michelson (mmichelson) 2008-10-10 11:19:10

stevedavies: Ah, I hadn't realized where the 100ms suggestion had come from :)

Dark_Schneider: um, right now since the value is a constant, there is no calculation at all, so you would actually be introducing a very very minute performance decrease by making the tolerance dynamic.

Anyway, I don't consider this issue to be completely solved as of now given Dark_Schneider971's report of the problem still existing on long calls. On the setup where I could reproduce the problem, I applied the patch and ran a call for approx. 16 minutes and did not encounter the sped-up recording. I'll try more tests though and will report back if I encounter the problem myself.
By: Martin Vit (festr) 2008-10-13 16:05:44

confirming taht tolerance patch resolves recording issue. I've not tested long recordings.
By: Mark Michelson (mmichelson) 2008-10-14 17:46:34

Okay, so it appears that everyone so far has reported that the tolerance patch has worked for them. The only exception to the good reports is Dark_Schneider971's report that the problem exists still on long calls.

I think it would be appropriate, then, for me to commit the changes so that the problem will, for the most part, be gone. I won't close this issue, though, until the problem is fully resolved. This will require either Dark_Schneider971 or someone else who still experiences the issue to upload console output as I instructed in note ~93469.

Also, I realize I never remarked regarding stevedavies' suggestion regarding reclocking the audio based on packetization rates of the channels. The truth is, this should not be necessary. Raw samples are stored in the slinfactories, and we always pull 20 ms of audio from both. If one or both does not yet have 20ms of audio, we delay until the next time mixmonitor triggers to attempt reading audio from both again. Reclocking shouldn't be necessary since we pretty much have a built-in smoother for the audio. We can afford to do this since it's not vital that the mixmonitor recordings are kept exactly up-to-date based on the audio which has been passed through Asterisk.
By: Digium Subversion (svnbot) 2008-10-14 17:49:29

Repository: asterisk
Revision: 149204

U branches/1.4/include/asterisk/audiohook.h
U branches/1.4/main/audiohook.c

------------------------------------------------------------------------
r149204 | mmichelson | 2008-10-14 17:49:29 -0500 (Tue, 14 Oct 2008) | 12 lines

Add a tolerance period for sync-triggered audiohooks
so that if packetization of audio is close (but not equal)
we don't end up flushing the audiohooks over small
inconsistencies in synchronization.

Related to issue ASTERISK-12324, and solves the issue
for most people who were experiencing the problem.
However, a small number of people are still experiencing
the problem on long calls, so I am not closing
the issue yet

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=149204
By: Mark Michelson (mmichelson) 2008-10-14 17:53:48

Since progress on this issue is predicated on receiving new console output, I'm changing the status back to "feedback"
By: FACORAT Fabrice (dark_schneider971) 2008-10-15 05:23:06

After further investigations with my users, it seems that there are no issue with long call. The issue was coming on the client audio reader application side.
You can close the issue as fixed. Thanks for your hard work.
By: Leif Madsen (lmadsen) 2008-10-15 08:16:48

Thanks to everyone who helped on this issue! You're all getting karma for helping test :)
By: Mark Michelson (mmichelson) 2008-10-15 09:39:33

Okay, the latest note here leads me to believe that this should be closed, so that's what I'm doing. The 13005_tolerance.patch patch has already been committed to the 1.4 branch.