Summary:ASTERISK-07187: [patch] pointer problem in code2str() function
Reporter:Frederic LE FOLL (flefoll)Labels:
Date Opened:2006-06-16 08:59:55Date Closed:2006-07-07 10:37:37
Versions:Frequency of
Environment:Attachments:( 0) bug-static.txt
( 1) patch_cheat_chan_zap.c.txt
( 2) patch_static_q931.c_branches_1.2_rev347.txt
( 3) patch_static_q931.c_trunk_rev352.txt
Description:I have a weird problem with code2str() function. I'm not sure it is really "minor" or not.

Preliminary : code2str() is used for plenty of debug function in order to display text associated to integer codes, according to msgtype structures passed as a parameter.

Observation : code2str() works fine, except when called from pri_cause2str() : it seems to always display "unknown", whatever cause code is passed in parameters.

> Message type: RELEASE COMPLETE (90)
> [08 02 87 90]
> Cause (len= 4) [ Ext: 1  Coding: CCITT (ITU) standard (0) 0: 0   Location: International network (7)
>                  Ext: 1  Cause: Unknown (16), class = Normal Event (1) ]

Debug : I changed a little bit code2str() so that it displays the list of "legal" codes when the input code matches nothing, and I get a somewhat frightening (at first sight, at least) result :

> Message type: RELEASE COMPLETE (90)
> [08 02 87 90]
> Cause (len= 4) [ Ext: 1  Coding: CCITT (ITU) standard (0) 0: 0   Location: International network (7)
>                  Ext: 1  Cause: code2str(code=16,msgtype*,max=44) code not in ( 1 17 28 41 52 69 98 111 0 0 1953720679 26723 134612576 134612576 1685021535 0 1 0 1600939374 1684300127 1970339947 134620386 1868918121 1702125923 1936029041 1702195571 1702195571 134631925 0 134635949 1685221222 1701733735 6516588 1633645683 1601467233 0 29295 544108393 1680154726 1953785203 2957344 1634890847 1634890847 1818845510  ) (16), class = Normal Event (1) ]

Ooops : "1, 17, 28, 41, ..." : the 'for' loop runs 44 times (correct) but goes way too fast in the array ... and beyond !

What differs from other code2str() calls that work fine ? Most often, msgtype structure is declared locally static in the function that calls code2str(). I notice that pri_cause2str() *AND* FUNC_DUMP(dump_network_spec_fac) are two exceptions, because they reference msgtype structures that are declared global in q931.c.

Workaround : I get a correct cause display if I add a 'static' ahead of 'struct msgtype causes[]' and 'struct msgtype facilities[]' declarations. But I have no idea, whether this problem may affect other parts of the code.

Environment :
CentOS 4.2, kernel 2.6.9-22.ELsmp
Asterisk libpri 1.2.3 with their original Makefiles.
Comments:By: Frederic LE FOLL (flefoll) 2006-06-19 05:02:17

More tests since friday. They all show that the binary code behaves quite insane, compared to what I expect it should do when I read libpri C source code !

By the way, I declared the bug in "libpri" category, because this is where I saw the bug, as far as I understand what I see (... ?), I find no problem in libpri C source code. Is it a problem in compiling/linking options from miscellaneous Asterisk/Libpri Makefiles ?

In order to do more tests on more platforms, I introduced a "cheat code" in chan_zap.c (for debug only !), so that one can enter a CLI command "pri debug span 1xxx" to display the result of pri_cause2str(xxx) : patch attached.

Tests on all platforms below show the same problem with, for instance, pri_cause2str(16) returning "Unknown" instead of "Normal Clearing" (the REAL PROBLEM being that the processor reads data far beyond the end of legal area !) :

CentOS 4.2 on an Intel P4 2.8 GHz, kernel 2.6.9-22.ELsmp, gcc-3.4.4-2
FC4 on an AMD XP 2800+, kernel 2.6.16-1.2111_FC4, gcc-4.0.2-8.fc4
FC5 on an Intel P3 Mobile, kernel 2.6.16-1.2111_FC5, gcc-4... (recent update)
Gentoo on a Intel P4 2.8 GHz, kernel 2.6.16-gentoo-r2, gcc 3.4.6

By: Frederic LE FOLL (flefoll) 2006-06-23 02:09:05

I found the problemS :

1 : Libpri q931.c *and* Asterisk channel.c both contain a global "causes" array declaration.
- causes was already declared in q931.c in 2001 (Libpri / trunk : rev 9)
- causes was introduced in channel.c in July 2005 (Asterisk / trunk : rev 6024)

2 : Unluckily, channel.c's causes array is very similar to q931.c's causes, since they both are arrays of structures that associate a string to a cause code (it looks like a copy/paste), but structures are *not* the same :
- q931.c : struct msgtype {int msgnum; char *name; int mandies[MAX_MAND_IES];}
- channel.c : struct ast_cause {int cause; const char *desc;}

3 : gcc does not notify any conflict when compiling and linking Asterisk, and what happens is :
- If both causes are declared global, Asterisk application's causes overrides Libpri library's causes, and Libpri finally work on Asterisk's version of causes (kind of OO overriding ???), but is not aware of structures mismatch.
- If Libpri's causes is declared static, Libpri works on its own local version of causes.

4 : It probably is not a good idea to declare global variables in a library, except if this is really what we want. Above all when these variables have very generic names (a rule could be to name libpri_<something> all variables that really have to be global).

I will propose a patch to declare "static" Libpri variables that don't have to be global.

By: Frederic LE FOLL (flefoll) 2006-06-26 04:26:53

patch_static_q931.c_branches_1.2_rev347.txt is for current branches/1.2 head.
patch_static_q931.c_trunk_rev352.txt is for current trunk head.

Both declare static causes, facilities, ies, and msgs arrays that formerly were global. No change is needed in other files of libpri.

Of course, don't apply patch_cheat_chan_zap.c.txt anywhere in SVN, since it was just a temporary change suggested for investigating the problem.

Another additional change could be to use a more optimal structure to store simple code/string associations, and keep "heavy" msgtype structure (int, char *, int[10]) dedicated to msgs array. What's your opinion ?

By: Matthew Fredrickson (mattf) 2006-07-07 10:37:22

Put into trunk and 1.2