[Home]

Summary:ASTERISK-07907: [patch] G.711 codec woes
Reporter:fossil (fossil)Labels:
Date Opened:2006-10-09 20:21:04Date Closed:2007-08-22 09:11:50
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Core/CodecInterface
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 8126.complete.patch3
( 1) 8126.finalpatch-1.4
( 2) 8126.finalpatch-trunk
( 3) 8126-1.4.patch
( 4) 8126-1.4-patch1
( 5) a-u-Law-fails.txt
( 6) g711-complete.patch
( 7) g711-complete-v2.patch
( 8) g711-minimum.patch
( 9) g711-recommend.patch
(10) g711-tests.patch
Description:There is a *number* of problems in the a-law and u-law core transcoders (most severe first):

1. a-Law decoder does not add the rounding error to the linear samples output;
This results in a stable amplitude drop in the decoded signal overall, but the negative phase portion of the signal is even more adversely affected: the amplitude drop actually accumulates with consequtive transcodings (see attached test patch). If the call encounters 127 tandem a-law transcodings (a-alaw -> slin -> a-law -> slin -> ...), the entire negative portion will be reduced to ~0.

2. Lookup table-driven slin->law coding rounds the negative values the wrong way;
The breaks in linear value sequences do not happen where the table-driven slin->law system expect them to. This results in certain negative linear values to be encoded incorrectly (see attached test patch), which isn't such a *big* problem, but a problem nonetheless.
There is no one-liner fix for this issue. To fix this, for example, we could generate only half the slin->law table, for positive values only. This table would contain half-cooked law bytes, so that the sign could be added later to the values, along with the post-coding transform (NOT for u-law and XOR 0x55 for a-law). In this case, AST_LIN2MU() would look something like this:

inline unsigned char AST_LIN2MU(short sample)
{
 unsigned sign = ((unsigned)sample & 0x8000) >> 8;
 unsigned char law = __ast_lin2mu[(sample & 0x7fff) >> 2];
 return ~(law | sign);
}

3. slin->a-law and slin->u-law functions handle value -32768 incorrectly;
This is not really a problem when using a lookup table system because the slot of -32768 is overwritten later, but for the sake of correctness...

4. alaw.c:linear2alaw() is less than optimal;

5. slin->law lookup table generation code is less than optimal;
There is no reason to enumerate all the possible values between -32768 and 32767 when most of the results are overwritten later.


****** ADDITIONAL INFORMATION ******

Attached is a patch to test the x-law coding tables. Build asterisk with CFLAGS='-DTEST_CODING_TABLES -DTEST_TANDEM_TRANSCODING' to run the tests.
Comments:By: fossil (fossil) 2006-10-09 20:28:58

g711-minimum.patch addresses the most severe issue only: (1) a-Law decoder errors.

g711-recommend.patch addresses issues 1, 3, 4 and 5. In addition, it contains the test code from g711-tests.patch, and minor comment and code style fixes.

These patches will most likely apply cleanly to trunk/main as well.

By: jmls (jmls) 2006-11-12 02:48:53.000-0600

ping. housekeeping

By: fossil (fossil) 2006-11-12 14:19:32.000-0600

pong. "Can't you see that DO NOT DISTURB sign?!" :P

By: Olle Johansson (oej) 2007-01-23 02:56:00.000-0600

Which developer takes this on? We need to move forward with this.

By: Olle Johansson (oej) 2007-05-04 08:22:20

This issue really needs a developer who understands codecs. Ping, ping, ping.

By: Steve Murphy (murf) 2007-08-02 16:57:49

Hey, fossil-- Can you do the "license" thing? We won't even be be able to look at the stuff until you do. Shouldn't take but a moment. I had to do it myself. It might take a day or so to clear, tho.

Many thanks!

By: fossil (fossil) 2007-08-02 17:18:27

Ok, I've signed *this* license, and I've had the disclaimer on file for 4 years, IIRC. I guess I'll wait another 10 months, and probably sign the license agreement v4 at that time. Have fun.

By: Steve Murphy (murf) 2007-08-03 12:44:18

fossil-- First of all, many thanks for this submission. It's always neat to have solid improvements made to fundamental algorithms.

Next, sorry for the delay; last I measured, I estimated about 200 bug submissions per month, and we estimated about 40% of those are enhancements. 1.4 has been a focus for several months, trying to improve its quality. Somehow, your bug report has been lost in the "sea"; I'm happy to have bumped into it. I'll personally see it thru making it into the source. I'd like to have the "blessing" of at least one guy who is involved in codecs, and I'll run some tests myself, which hopefully will not take long.

Many thanks!

ten months? v4 should be out in six! (just kidding!)

By: Steve Murphy (murf) 2007-08-03 12:47:03

just patched this into 1.4, and got the messages in a-u-Law-fails.txt, which I've attached... perhaps I've patched wrongly?

By: Steve Murphy (murf) 2007-08-03 12:59:09

Ok, I've also attached the changes I made to 1.4; It looks like your patch; I did add 4 lines: 4 ast_log messages letting you know when the tests were finished, in case no errs were found. (just a temporary hack).

By: Steve Murphy (murf) 2007-08-03 13:06:32

fossil--

Just to demonstrate my ignorance, I've been looking at the code patterns,
and I see this:

+ if (sample < 0) {
+ mag = -sample;
+ sign = 0;
+ } else {
+ mag = sample;
+ sign = 0x80;
+ }

In other places, I see, when sample<0, sign= 0x80, for positive, sign=0, but here it's the opposite. I assume this is ok... but I scratch my head.

By: fossil (fossil) 2007-08-03 14:36:59

murf, a-Law and u-Law have their sign bits reversed from each other, so this is normal. The error that you see are the result of issue (2) which I did not address in any of the previous patches. The last g711-complete.patch addresses all the issues and should provide 100% correct transcoding.

The last version may look slower than the original blind table lookups, and it probably is slower on some systems indeed, but it is correct. This version also shrinks the linear->law tables to 1/4 of their original size, which puts a lot less strain on CPU's level 1 cache, so it is probably faster on some systems too (I did not run any real benchmarks).

You can enable the #define G711_REDUCED_BRANCHING for systems with expensive branching (put it in ASTCFLAGS if you like), but when building with -march=i686 it is not necessary because of gcc's use of wonderful CMOV opcodes.

By: Steve Murphy (murf) 2007-08-06 11:39:47

OK, I've uploaded 8126.finalpatch-1.4 and 8126.finalpatch-trunk for 1.4 and trunk, for those who would like to test on these versions.

They aren't really "final", in that the Makefile will NOT be set up to compile these in test mode by default. This is just for the sake of testing.

Looks good to me, and takes care of all the points in your list of problems!

(Although, the probability of having a situation that would require 256 transcodings (or more) that would demonstrate the transcoding deficiencies are pretty low, methinks, but still... having the codecs be lossless will make them of higher quality.)

By: Steve Murphy (murf) 2007-08-08 18:58:07

OK, I did a "core show translation" between the normal version of u/a law.c, and the "new" version of u/a law.c, and in both cases, the translation times were 1 msec for 1 sec of data; both ulaw and alaw, to and from slin; This is maybe a poor measure of performance, but at least in this case, it shows that the new algorithm is roughly the same speed as the old one. It's still amongst the fastest of the codecs.

By: fossil (fossil) 2007-08-08 23:53:19

"show translation" would have to be chaged to make it capable of displaying values  less than 1 (floating point perhaps?) in order to get a more accurate measure. As computers get faster we will need this anyway for other codecs too, or pretty soon they will all be showing "1", except for SpeeX ;-) But this is a separate bug item, of course.
And yes, u-Law and a-Law will always be the fastest codecs -- they are merely logarithmic representations of linear samples.

By: Steve Murphy (murf) 2007-08-09 13:01:00

That last benchmark was just not good enough! So I did another. This time,
I went into translate.c, and bumped the number of samples from 1 sec (8000 samples) to 100 sec (800,000 samples), so I get two digits of accuracy in the total.

I found:

Old algorithm:

ulaw -> slin = .08 msec / sec
alaw -> slin = .08 msec / sec
slin -> ulaw = .05 msec / sec
slin -> alaw = .06 msec / sec

New algorithm:

ulaw -> slin = .08 msec / sec
alaw -> slin = .08 msec / sec
slin -> ulaw = .20 msec / sec
slin -> alaw = .19 msec / sec

So, the new algorithm slows down translation into ulaw by a factor of 4x,
and into ulaw, by factor of 3x.

But--- it's still pretty fast, comparing to other algorithm costs.

By: fossil (fossil) 2007-08-10 01:19:08

This last benchmark mostly shows how inaccurate our translation times estimates are ;-). Seriously, ulaw->slin cannot be slower than slin->ulaw, yet the benchmark suggests that it is.
The inaccuracy also comes from the sample data used in the translation estimates -- both ulaw_slin_ex.h and slin_ulaw_ex.h contain sample data that consists of only zeros. This means that the size of the x-law translation tables is not taken into account in the estimates, since it only ever needs to lookup one value over and over. If you like, we can roll this as issue (6) right here. Sample data for some other codecs is also like this, and I bet it affects the result in many cases, but that is a separate issue.

By: fossil (fossil) 2007-08-10 04:34:48

I have repeated the benchmark experiment above on my system, with some changes.
System: Celeron 2.53GHz, 256K cache, gcc 4.1.2
Changes to get more accuracy:
1) in translate.c
MAX_RECALC = 1000
ast_translator_dir.cost changed to *micro*seconds
2) ulaw_slin_ex.h
table changed to contain 100 ms worth of assortment of values in 0x00 - 0xff
3) slin_ulaw_ex.h
copied half the table from slin_speex_ex (250 ms of data); this provides a better spread of memory accesses to the __ast_lin2X tables
4) in codecs.conf, [plc] section
genericplc => false (to prevent cost skewing due to extra data copying)

Tests ran with "show translation recalc 300".

Old algos:
 law -> slin = 26 usec/sec
 slin -> u-law = 21 usec/sec (slower than a-law: table is bigger)
 slin -> a-law = 19 usec/sec
(I have to rethink my earlier "ulaw->slin cannot be slower than slin->ulaw" comment. Clearly, the extra PLC branches in the law->slin translators are having an effect here.)

New algos:
 law -> slin = 26 usec/sec (unchanged, as expected)
 slin -> u-law = 74 usec/sec (several runs, lowest value)
 slin -> a-law = 78 usec/sec (why slower than u-law -- you got me)
(For comparison, with original all-zero sample data, slin->law is 54 usec/sec)

New algos with G711_REDUCED_BRANCHING:
 slin -> law = 47 usec/sec (several runs, 46-47)

Obviously, the new algos result in quite a bit more processing per linear sample (if not obvious, take my word for it, I've compared the asm code), but it looks like the reduction in level 1 cache pressure makes itself known on my system. Clearly, G711_REDUCED_BRANCHING provides a visible improvement with my builds -- it's only double the original time. I've checked, and for some reason gcc is not using the CMOV opcodes to eliminate the branches in the codec_xlaw.so, possibly because of the register pressure. You may get pretty different results with gcc 3.x.
This benchmark is still flawed, however, since we would not transcode this much data in one go during normal processing. To get better results, we need to flush the CPU cache after every 20 msecs of data or so to simulate normal operating conditions. However, I fear this is going too far. ;-)

For reference, the whole slin -> XXX translation line (all in microsecs):
        g723   gsm  ulaw  alaw  g726 adpcm  slin lpc10  g729 speex  ilbc
  slin     -  2142    47    46  1340   248     -  4756 17277     - 15161

And just for the completeness sake, I eliminated the slin->law translation tables and used the linear2Xlaw functions directly:
 slin -> u-law = 142 usec/sec
 slin -> a-law = 121 usec/sec
And with G711_REDUCED_BRANCHING:
 slin -> u-law = 116 usec/sec
 slin -> a-law = 108 usec/sec

By: fossil (fossil) 2007-08-13 13:03:50

Complete patch v2: I reworked the ast_Xlaw_get_sign_mag functions slightly. Their new form helps gcc eliminate the branch using the CMOV opcodes (verified in codec_Xlaw.so). Benchmark results, using the same premises as the previous ones, are below:

 slin -> law = 45 usec/sec

By: Steve Murphy (murf) 2007-08-18 17:08:41

OK, I've done this: Matt Fredrickson felt that the new algorithm should be a compile time option, especially since it may negatively impact the speed of the codec. So, I have made it so. The new and old algorithms coexist in the same files wrapped in #ifdefs to allow them to be compiled either way.

I have added the entries to the menuselect stuff, so you can turn on and off the options. I did this in trunk, which is a clear, no-brainer type solution. 1.4 is a little trickier. By default, the New Algorithm is turned off.  I don't know if it will be all right to add the same options to 1.4's menuselect. I'll ask Monday.

Now, I've done some more timing and comparing based on your last patch.

I see this

ulaw > slin  137    82/100    125/103/99    82/84/85
alaw > slin  83     82/82     82/95/95      82/111/81/300

slin > ulaw  59     64/65     203/202/243   202/203/287
slin > alaw  59     62        242/295/236   223/223/311
            OLD-A   OLD-B    NEW-noreduce   NEW-reducedbran


OLD-A is with the test using just zeros. OLD-B is the more realistic
test with values that vary across the test range. My patch includes the
test input files so modified.

Since the REDUCED_BRANCHING option seems to yield, on the average, faster
times, I make it the default. There's a bit of variance in the results, but
it seems to average out.

I've also included the changes I made to the translate.c stuff to make it show microseconds instead of milliseconds, and display 5 digits instead of 3.
I see that the new converter -> u/alaw is 3.1 to 3.6 x slower, using best times.
This seems really bad, but it's still very fast (in comparison to other converters), and it's more complete and correct. When multiple transcodings are possible, this algorithm should better results.

Please, double check my code and let me know if it's all correct. I am slightly concerned about having the table/transcode tests available in both old and new algorithms. (but, then, it would just say that the old has problems, and the new doesn't, wouldn't it?) So, for the moment, I only allow the tests if you select the new algorithm.

By: Steve Murphy (murf) 2007-08-20 11:55:05

There was a bug in the menuselect stuff; it wouldn't allow dependent options to go to the makeopts, so I could turn on the NEW_ALGORITHM, but not the reduced branching, etc. Russell has fixed this, so if anybody out there is trying this, try svn update and then the make menuselect should work for the next build.

I feel like this code is ready for trunk. The real question is, do I change the menuselect stuff in 1.4 to make this an option? Or make it so someone desparate can hand set the options in the source before the compile there (so as not to make any big, visible changes to 1.4)?

By: fossil (fossil) 2007-08-20 16:49:37

I honestly cannot believe anyone would willingly choose a broken but faster implementation of something as trivial as G.711, but if you must go the #ifdef way, all right then, let's at least merge the versions in a better way (and I promise to try to contain my rants re: correctness vs. speed). I will merge the code the way I think it should be and submit. If, however, you insist on merging the way you did it, please at least apply the fix for issue (1) to the original version.

In general, most of the issues raised here and fixed in the patch do not impact the codec speed at all. The exception is issue (2), the fix for which apparently slows down the slin->law translation. Taking this into account, we can merge the two versions quite elegantly, with minimal #ifdef sections.

I would turn on the tests in both versions, but in modified form -- type and amount of output controlled by verbose/debug options. I never really meant the tests to be used in their current form on production systems. The tests should *theoretically* be used once -- to confirm correctness, and then never again. Seeing coding tables warnings in the logs may encourage users to switch to the new version ;-).

As for menuselect stuff, I cannot weigh in, do whatever you feel is right. I would not turn on REDUCED_BRANCHING by default, however, because it ends up being slower on newer systems with newer gcc (as indicated by my system).

By: fossil (fossil) 2007-08-20 17:03:12

Since we are apparently ditching v1.2, and I have little hope of getting this patch into 1.2 branch, we should probably change the "SVN Branch" to something appropriate so that others affected by these patches notice the bug.

By: Steve Murphy (murf) 2007-08-20 17:40:27

This code merged into trunk via 80113. In a conf call with other developers, we decided that the impact on the current user crowd was minimal enough, that we are not going to bother putting this in 1.4; but rather, if a 1.4 user needs this, I have just attached a patch for 1.4, that could be applied (perhaps 1.2 also).

But an interesting question was asked, and perhaps fossil can supply an answer: How many transcodings will it take to make a discernable difference in sound quality? Before you can hear the difference? If the number is low enough, we might change our minds about 1.4.

By: Steve Murphy (murf) 2007-08-20 17:45:07

Fossil--

I'm closing this bug for now. Please, feel free, if it's not too much trouble for you, to add a note to this bug, and tell us what your estimate of the number of times sound would be passed thru the 'old' (current) ulaw/alaw codecs before an average human would be able to detect the degradation. If it's low enough, we'll port your changes to 1.4.

Many thanks for your hard work! I'm going to make sure you get some karma for this!

By: fossil (fossil) 2007-08-20 19:05:35

Reopening only to add a note.

The tandem transcoding degradation is only present in a-law -- issue (1). The other issues are independent of this. Issue (1) is fixed by a trivial one-liner patch -- g711-minimum.patch. This patch cannot possibly make anything worse, only better, just to let you know. It may even deserve a bug of its own.

Now to the effects: audibility is highly subjective, but I think the distortion would be audible to an average user after 4-6 tandem transcodings (alaw -> slin -> alaw), depending on the original signal quality. Recall that the coding is logarithmic -- mantissa + exponent, and the problem is such that during decoding the a-law byte is effectively reduced by 1 every time. This means that the amplitude of signal's negative portion is halved every 8 tandem transcodings. That is a pretty nasty entropy ;-).
Also, in my experience, depending on the strength of the original signal, just 2 tandem transcodings can prohibit faxing over a-law -- the accumulated distortion creates harmonics that screw up DSPs.
My 2 cents.

By: fossil (fossil) 2007-08-21 02:07:05

I am now not completely sure if I made this clear, so please let me reiterate.

Issue (1): a-Law decoder is broken, meaning not up to G.711 spec, meaning it has a bug.
g711-minimum.patch trivially fixes the worst badness, which not coincidentally brings a-Law decoder up to G.711 spec (and *decoder* only, the encoder is still broken, but a different issue).

By: Steve Murphy (murf) 2007-08-21 11:29:16

Fossil: gotcha. I've merged in the "minimal" fix into 1.4; This is clearly a bug. Sorry, I lost track of that. It's now committed in 1.4 via 80166 & 80167 (added changes to ulaw.h and alaw.h by accident, and then removed them again via 80167).

By: Steve Murphy (murf) 2007-08-21 11:32:13

PS. I'd make this patch to 1.2, but it's closed to anything but security fixes.

By: Steve Murphy (murf) 2007-08-21 13:31:23

OK, closing this again... Anyone is welcome to re-open it to comment!

By: fossil (fossil) 2007-08-21 13:52:25

All right, we are getting somewhere! ;) Now since we have two versions of the G.711 codecs in trunk, we need to apply the same g711-minimum.patch to trunk to fix the old code. I guess I have no choice but to let it go after this. I could still produce a much better merge of two versions, if you want -- the differences between versions are actually minimal.

It's a shame about v1.2, especially since I know a lot of people who are using 1.2 in production, including myself. I'll just have to maintain this patch in my Asterisk Survival Kit with others.

In retrospect, I think I should have opened two different bugitems.
Thanks for your help!

By: Steve Murphy (murf) 2007-08-22 09:11:47

In the immortal words of Homer Simpson (TV cartoon hero):  Doh!

OK, the rounding fix has been added to the 'old' algorithm in trunk via 80241.

I'n not too concerned about minimizing the diffs between the two;

It IS a shame about 1.2; I very, very much wish I'd spotted this bug a little earlier; might have been able to slip it in before the deadline.

Again, thanks for the contribution!