|Summary:||ASTERISK-10296: ooh323 heap corruption every 3 days.|
|Reporter:||P. Christeas (xrg)||Labels:|
|Date Opened:||2007-09-14 02:45:21||Date Closed:||2011-06-07 14:01:06|
|Environment:||Attachments:||( 0) asterisk-more-backtraces-no-opt.txt|
( 1) backtraces.txt
|Description:||Asterisk crashes about every 2-3 days (given my usage). All the core dumps suggest heap corruption.|
The data I have collected so far is probably not very useful, I know. But any suggestion you may have will be appreciated. Running with MALLOC_CHECK should be my last resort, since the machine is a production one.
****** ADDITIONAL INFORMATION ******
#0 0x00002aef39e2dd25 in raise () from /lib64/libc.so.6
#1 0x00002aef39e2f340 in abort () from /lib64/libc.so.6
#2 0x00002aef39e6410b in __gcc_personality_v0 () from /lib64/libc.so.6
#3 0x00002aef39e6b193 in __gcc_personality_v0 () from /lib64/libc.so.6
#4 0x00002aef39e6b314 in free () from /lib64/libc.so.6
#5 0x00002aaab42a123a in memHeapFreeAll ()
#6 0x00002aaab42a12cd in memHeapRelease ()
#7 0x00002aaab42996cc in freeContext () from /usr/lib64/asterisk/modules/chan_ooh323.so
#8 0x00002aaab42bd004 in ooCleanCall () from /usr/lib64/asterisk/modules/chan_ooh323.so
#9 0x00002aaab42bcb7a in ooEndCall () from /usr/lib64/asterisk/modules/chan_ooh323.so
#10 0x00002aaab42a38a6 in ooProcessFDSETsAndTimers ()
#11 0x00002aaab42a3aed in ooMonitorChannels ()
#12 0x00002aaab4291e01 in ooh323c_stack_thread ()
|Comments:||By: P. Christeas (xrg) 2007-09-27 02:29:43|
Update: for the last 5 days, asterisk hasn't crashed (yet).
There are two new conditions: I've been running asterisk with MALLOC_CHECK_=2,
and my ISP's round-trip times have drastically improved (from 150 ms to 20 ms).
Malloc checking means I should get an earlier crash, at the point where the heap is about to be double-freed.
Both conditions, however, change the timing of thread opening/closing and of unwanted timeouts. I still stand by my wild guess that the data corruption is associated with the functions' return paths.
By: Ezio Vernacotola (ezio) 2007-10-12 03:48:09
I get an almost immediate crash on an asterisk box operating as a PSTN <-> VoIP gateway. Normally I have a full PRI (30 channels) routed to SIP and IAX, and all goes well. When I begin to route another PRI towards ooh323, the PBX crashes after a few seconds.
At the moment I can't reproduce it all with a non-optimized asterisk, because I can't stop the PBX anymore. I only have 2 backtraces from yesterday and 1 from today. If I can install a non-optimized build tonight, I will send more of them.
Asterisk SVN-branch-1.4-r85280, asterisk-addons/branches/1.4 -r464
By: P. Christeas (xrg) 2007-10-14 02:55:04
ezio, FYI, it is not a matter of optimizing. With gcc, what matters is whether the symbols are available (the -g option; they may be in a separate ELF file).
If it crashes that early, it might be a different case. In mine, it *must* be related to h323 peer latency. The very same binary hasn't crashed ever since my ISP gave me a good round trip to the h323 peer. Wild guess (again): some ooh323 timeout function, or some out-of-sync reply, does not clear the objects in a sane way.
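To make the guess above concrete: if a timeout handler and a late reply both tear down the same call, the second teardown frees memory that is already gone, which is the kind of double free MALLOC_CHECK_ is meant to catch. Below is a minimal C sketch, under that assumption, of an idempotent cleanup that makes a second teardown harmless. The names struct call_ctx, call_ctx_create, call_ctx_release and call_ctx_destroy are hypothetical and are not part of ooh323c; the real freeContext()/memHeapRelease() code may be structured quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-call context; NOT the real ooh323c call structure. */
struct call_ctx {
    void *heap;      /* stands in for the per-call memory heap */
    bool released;   /* set once the heap has been freed */
    int free_count;  /* counts actual free() calls, for illustration only */
};

/* Allocate a context with a small dummy heap. Returns NULL on failure. */
static struct call_ctx *call_ctx_create(size_t heap_size)
{
    struct call_ctx *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    ctx->heap = malloc(heap_size);
    if (ctx->heap == NULL) {
        free(ctx);
        return NULL;
    }
    return ctx;
}

/* Idempotent release: may be called from both a (hypothetical) timeout
 * path and a late-reply path. Only the first call frees the heap; later
 * calls return immediately. In multithreaded code the check-and-free
 * would have to run under a lock shared by both paths. */
static void call_ctx_release(struct call_ctx *ctx)
{
    if (ctx == NULL || ctx->released)
        return;
    free(ctx->heap);
    ctx->heap = NULL;      /* never leave a dangling pointer behind */
    ctx->released = true;
    ctx->free_count++;
    assert(ctx->heap == NULL);
}

/* Final destruction: release the heap (if still held), then the context. */
static void call_ctx_destroy(struct call_ctx *ctx)
{
    call_ctx_release(ctx);
    free(ctx);
}
```

With this pattern, calling call_ctx_release() twice on the same context frees the heap only once, so an out-of-order teardown cannot corrupt the allocator.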
By: Dmitry Andrianov (dimas) 2007-10-15 09:45:47
In my experience, running Asterisk under valgrind has so far been the best way to catch these nasty memory-related issues, which are very difficult to track otherwise.
However, it slows the system down a lot, so you may not be able to do it in production...
By: Ezio Vernacotola (ezio) 2007-10-24 05:24:56
I continue to have crashes that appear to be related to ooh323.
I've added more backtraces, from asterisk compiled with DONT_OPTIMIZE and MALLOC_DEBUG:
asterisk/branches/1.4 rev 86278
asterisk-addons/branches/1.4 rev 471
I don't know if my problem is related to xrg's; maybe I should open a new, separate bug.
By: Dmitry Andrianov (dimas) 2007-10-24 06:28:12
I strongly recommend running under valgrind if you really want to understand what corrupts the memory...
By: Russell Bryant (russell) 2008-01-16 12:06:01.000-0600
This module has been unsupported for a long time. Now, it is officially marked as unsupported. So, only bug reports with patches will be accepted at this time.