Summary:ASTERISK-10480: double call to ast_frame_free
Reporter:Diego Iastrubni (diegoias)Labels:
Date Opened:2007-10-09 10:38:00Date Closed:2007-11-06 17:13:32.000-0600
Versions:Frequency of
Description:The scenario is easy to reproduce here, execute from the shell this command:

for i in `seq 60`; do asterisk -rx 'originate Local/19@from-internal-custom extension 21@from-internal-custom'; echo -n "$i "; done

The dialplan has 2 extensions:
;---- 19 - call the SIPp channel
exten => 19,1,Answer
exten => 19,2,Dial(SIP/100@turing:5061)

;---- 21 - looped music for an hour
exten => 21,1,Answer
exten => 21,2,Set(WAIT_TIME=7)
exten => 21,3,GotoIf($[${WAIT_TIME} < 2]?7)
exten => 21,4,PlayBack(/music-8khz-10min)
exten => 21,5,Set(WAIT_TIME=$[${WAIT_TIME} - 1])
exten => 21,6,Goto(3)
exten => 21,7,Hangup

in "turing", we have a SIPp server running, with this command:
./sipp -sn uas -p  5061  -d 100 -mp 12000  -l 100 -mi $IP -i $IP -rtp_echo

and /music-8khz-10min is a 10 minutes wav file (16 bits, 8khz).

In this scenario, asterisk dies, and gdb gives me this trace from the core dump:
(gdb) bt
#0  0x00c7382c in free () from /lib/libc.so.6
#1  0x0809f09a in ast_frame_free (fr=0xd45ff4, cache=1) at frame.c:360
#2  0x08086e0a in ast_channel_bridge (c0=0x98a2fc0, c1=0x98a2fc0, config=0xb78cdda0, fo=0xb78ccf18, rc=0xb78ccf14)
   at /root/bristuff-0.4.0-test4-xr3/asterisk-
#3  0x0059a08d in ast_bridge_call (chan=0x98a2fc0, peer=0x98b2ff0, config=0xb78cdda0) at res_features.c:1404
#4  0x00584791 in dial_exec_full (chan=0x98a2fc0, data=<value optimized out>, peerflags=0xb78cde64, continue_exec=0x0) at app_dial.c:1656
ASTERISK-1  0x00586b02 in dial_exec (chan=0x98a2fc0, data=0xb78d1f38) at app_dial.c:1710
ASTERISK-2  0x080c751e in pbx_extension_helper (c=0x98a2fc0, con=0x0, context=0x98a3140 "from-internal-custom", exten=0x98a3190 "19", priority=2, label=0x0,
   callerid=0x0, action=E_SPAWN) at pbx.c:532
ASTERISK-3  0x080c9206 in __ast_pbx_run (c=0x98a2fc0) at pbx.c:2288
ASTERISK-4  0x080ca1ae in pbx_thread (data=0x98a2fc0) at pbx.c:2603
ASTERISK-5  0x080f616b in dummy_start (data=0x9872fa8) at utils.c:793
ASTERISK-6 0x00d8016a in start_thread () from /lib/libpthread.so.0
ASTERISK-7 0xb78d4480 in ?? ()
ASTERISK-8 0xb78d4480 in ?? ()
ASTERISK-9 0xb78d4480 in ?? ()
ASTERISK-10 0xb78d4480 in ?? ()
ASTERISK-11 0x00000000 in ?? ()


What I see is this:

ast_channel_bridge -> ast_generic_bridge -> __ast_read -> ast_frfree -> ast_frame_free

The problem seems to be (IMHO) somewhere in ast_generic_bridge, which tries to free the same frame twice. I was able to reproduce this when running "normally", but not when running under GDB, so I cannot continue the report :)

The system is Centos5, Elastix 0.8.5. Asterisk is Asterisk, self compiled.
Comments:By: Joshua C. Colp (jcolp) 2007-10-09 11:16:06

I've tried numerous ways to on my own system to reproduce this using exactly the information provided (thanks for that!) but have been unable, which means this bug may be timing critical. To that extent would it be possible to get access to the Asterisk box where this happens so I can add additional debug and track it down?

By: Dmitry Andrianov (dimas) 2007-10-09 14:16:44

have you tried running under valgrind? It will catch most of memory manipulation problems.

By: Tzafrir Cohen (tzafrir) 2007-10-10 09:29:39

file has connected and found some strange things. e.g.: the crash seems to have been triggered by a local channel bridged to itself.

I sometimes had to run the above command several times to reproduce the crash.

file: Anything else you need from us? Any thoughts on this?

I'll try valgrid as well later.

By: Tzafrir Cohen (tzafrir) 2007-10-11 04:29:19

Please let us know if any further feedback is needed.

By: Russell Bryant (russell) 2007-10-11 11:42:48

file: Did you happen to see where the frame in question came from?  i.e. Did you print f->src, and if so, did it have anything in it?

By: Diego Iastrubni (diegoias) 2007-10-15 09:21:33

I have upgraded the machine to 1.4.13 and still I can reproduce. I upgraded now to SVN branch/1.4/r85550. I can still reproduce the problem.

If you want me to add debug prints, you will have to guide me more carefully, since I am quite new to Asterisk code. Which function, which variable (patch would be even cooler).

If you want another peak at the machine, please contact Tzafrir he will let you in once more. I assume you know his email.

By: Diego Iastrubni (diegoias) 2007-10-17 03:23:20

I upgraded the source once more, and now Asterisk dies when generating a single FXS call.

Now the lame factor gets a new scale:
The original test was done on a VIA based machine. We decided to move the setup to an Intel based machine, and puf! Everything is working perfectly, and I do mean everything.

It seems that the trigger for this was the VIA based CPU, or gcc generating bad code for that platform (I don't think that you have any Intel only specific code in the source no...?).

How would you like me to continue this investigation?

By: Russell Bryant (russell) 2007-11-02 16:13:24

If you can't make it happen on the intel machine, then it was likely a via specific problem.  With those CPUs, I have heard people say that you have to make sure it doesn't use optimizations for anything above i386, or else things will not work properly.

On the via box, you can try changing the PROC variable in the makeopts file to i386, if it isnt already set to that.

By: Russell Bryant (russell) 2007-11-03 23:18:15

Also, the fix that I put in this branch may be related to this problem.  Give it a try ...

$ svn co http://svn.digium.com/svn/asterisk/team/russell/readq-1.4

By: Russell Bryant (russell) 2007-11-06 17:13:32.000-0600

I have merged in the changes from my readq branch.  There is a good chance that will fix this.  So, let me know if it happens again.  But for now, it's closed as fixed.