
Summary: ASTERISK-15259: Crash due to fault about twice daily
Reporter: Dave Hawkes (hevad)
Labels:
Date Opened: 2009-12-02 12:05:31.000-0600
Date Closed: 2011-06-07 14:01:05
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) backtrace_all.txt, (1) backtrace.txt
Description: Asterisk crashes sporadically, about once or twice per day. The backtrace is always very similar:

(gdb) bt full
#0  0x00002ab0b3dd2265 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002ab0b3dd3d10 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00002ab0b3e0c84b in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3  0x00002ab0b3e142ef in _int_free () from /lib64/libc.so.6
No symbol table info available.
#4  0x00002ab0b3e1473b in free () from /lib64/libc.so.6
No symbol table info available.
#5  0x0000000000496f29 in frame_cache_cleanup (data=0x1caf4220) at frame.c:325
       frames = (struct ast_frame_cache *) 0x1caf4220
       f = (struct ast_frame *) 0x1cae7030
#6  0x00002ab0b4303ac9 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
No symbol table info available.
#7  0x00002ab0b43044b5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8  0x00002ab0b3e75c2d in clone () from /lib64/libc.so.6
No symbol table info available.
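The lower part of this trace (frame_cache_cleanup called from __nptl_deallocate_tsd inside start_thread) shows the crash happening while a thread exits and glibc runs the destructors for its thread-specific data; the _int_free, __libc_message and abort frames are what glibc typically produces when free() detects a corrupted heap or an invalid pointer. The simplified C sketch below, assuming a per-thread free list registered with pthread_key_create(), illustrates how a frame cache ends up being freed at thread exit. The names (struct frame, frame_alloc, frame_release, CACHE_MAX) are hypothetical; this is not Asterisk's frame.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define CACHE_MAX 10  /* hypothetical cap on cached frames per thread */

/* Hypothetical, simplified frame type (not Asterisk's struct ast_frame). */
struct frame {
    struct frame *next;   /* link used only while the frame is cached */
    char payload[160];    /* e.g. 20 ms of 8 kHz, 8-bit audio */
};

/* Per-thread list of released frames kept for reuse. */
struct frame_cache {
    struct frame *head;
    size_t size;
};

static pthread_key_t cache_key;
static pthread_once_t cache_once = PTHREAD_ONCE_INIT;

/* TSD destructor: when a thread exits, glibc calls this for the thread's
 * cache. Every cached pointer is handed to free() here, so a corrupt or
 * foreign entry makes free() abort at this point, long after the code
 * that actually damaged the list has run. */
static void cache_cleanup(void *data)
{
    struct frame_cache *cache = data;
    struct frame *f;

    while ((f = cache->head) != NULL) {
        cache->head = f->next;
        free(f);
    }
    free(cache);
}

static void make_key(void)
{
    pthread_key_create(&cache_key, cache_cleanup);
}

static struct frame_cache *get_cache(void)
{
    struct frame_cache *cache;

    pthread_once(&cache_once, make_key);
    cache = pthread_getspecific(cache_key);
    if (cache == NULL) {
        cache = calloc(1, sizeof(*cache));
        if (cache != NULL && pthread_setspecific(cache_key, cache) != 0) {
            free(cache);
            cache = NULL;
        }
    }
    return cache;
}

/* Allocate a frame, reusing one from this thread's cache if possible. */
struct frame *frame_alloc(void)
{
    struct frame_cache *cache = get_cache();

    if (cache != NULL && cache->head != NULL) {
        struct frame *f = cache->head;
        cache->head = f->next;
        cache->size--;
        return f;
    }
    return malloc(sizeof(struct frame));
}

/* Return a frame to this thread's cache, or free it if the cache is full. */
void frame_release(struct frame *f)
{
    struct frame_cache *cache = get_cache();

    assert(f != NULL);
    if (cache == NULL || cache->size >= CACHE_MAX) {
        free(f);
        return;
    }
    f->next = cache->head;
    cache->head = f;
    cache->size++;
}

/* Number of frames currently cached by the calling thread. */
size_t frame_cache_size(void)
{
    struct frame_cache *cache = get_cache();

    return cache != NULL ? cache->size : 0;
}
```

In a thread that allocates and releases frames and then returns, cache_cleanup() runs automatically during thread teardown, which has the same shape as the start_thread, __nptl_deallocate_tsd, frame_cache_cleanup, free() chain in the trace.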

Comments:

By: Elazar Broad (ebroad) 2009-12-02 12:17:24.000-0600

Per the bug guidelines, please recompile Asterisk with DONT_OPTIMIZE enabled (in make menuconfig) and post a backtrace (see doc/backtrace.txt). Thanks!

By: Dave Hawkes (hevad) 2009-12-02 12:25:09.000-0600

This was already compiled with DONT_OPTIMIZE.

By: Dave Hawkes (hevad) 2009-12-03 08:37:49.000-0600

I have noticed that the crash invariably occurs immediately after a fax is received (using ReceiveFAX). The fax is received and emailed correctly, and then Asterisk faults. I can match the times of the overnight core dumps to the times faxes were received...

By: Dave Hawkes (hevad) 2009-12-07 13:22:44.000-0600

I modified the source so that the backtrace would show where the frame that corrupted the heap was allocated, with this result:

#5  0x0000000000496f89 in frame_cache_cleanup (data=0x1398a980) at frame.c:326
       src = 0x12d5bd50 "ulawtolin"
       frames = (struct ast_frame_cache *) 0x1398a980
       f = (struct ast_frame *) 0x12d5bb60

This indicates that the problem frame was, for some reason, allocated in ulawtolin.
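The extra src value in this trace suggests that each frame carries a label naming the code that allocated it (here the ulawtolin translator). A minimal sketch of that debugging technique, assuming a hypothetical tagged_frame type rather than Asterisk's real frame structure: the label is copied into a local variable just before free(), so that gdb's "bt full" prints it if free() aborts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical frame type carrying the name of the code that allocated it. */
struct tagged_frame {
    struct tagged_frame *next;
    const char *src;   /* static string such as "ulawtolin"; never freed */
};

struct tagged_frame *tagged_frame_alloc(const char *src)
{
    struct tagged_frame *f;

    assert(src != NULL);
    f = calloc(1, sizeof(*f));
    if (f != NULL)
        f->src = src;
    return f;
}

/* Free a linked list of frames and return how many were freed.
 * The label is copied into a local before each free() so that, if free()
 * aborts on a bad pointer, gdb's "bt full" shows which code allocated the
 * offending frame. The volatile qualifier discourages the compiler from
 * discarding the otherwise unused local in optimized builds. */
size_t tagged_list_free(struct tagged_frame *head)
{
    size_t n = 0;

    while (head != NULL) {
        struct tagged_frame *f = head;
        const char *volatile src = f->src;

        (void)src;
        head = f->next;
        free(f);
        n++;
    }
    return n;
}
```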

By: Dave Hawkes (hevad) 2009-12-09 08:18:30.000-0600

This issue can be fixed by disabling the frame cache in frame.c, which is done by defining LOW_MEMORY near the top of the file. I can't see anything specifically wrong in the frame cache code, and it doesn't crash with anything other than fax reception.
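A minimal sketch of how a compile-time switch such as LOW_MEMORY can remove a frame cache entirely: with the macro undefined, released frames are kept on a free list for reuse; with it defined, they go straight to free(), so there is no cache left to clean up. The cframe type and its functions are hypothetical, the list is single-threaded for brevity, and this is not the actual frame.c code.

```c
#include <assert.h>
#include <stdlib.h>

/* Uncomment to compile the cache out, in the spirit of LOW_MEMORY. */
/* #define LOW_MEMORY */

struct cframe {
    struct cframe *next;   /* hypothetical frame; only the link is shown */
};

#ifndef LOW_MEMORY
static struct cframe *cache_head;   /* single-threaded free list */
static size_t cache_count;
#endif

struct cframe *cframe_new(void)
{
#ifndef LOW_MEMORY
    if (cache_head != NULL) {       /* reuse a cached frame first */
        struct cframe *f = cache_head;
        cache_head = f->next;
        cache_count--;
        return f;
    }
#endif
    return malloc(sizeof(struct cframe));
}

void cframe_free(struct cframe *f)
{
    assert(f != NULL);
#ifndef LOW_MEMORY
    f->next = cache_head;           /* keep it for later reuse */
    cache_head = f;
    cache_count++;
#else
    free(f);                        /* no cache: release immediately */
#endif
}

size_t cframe_cached(void)
{
#ifndef LOW_MEMORY
    return cache_count;
#else
    return 0;
#endif
}
```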

What appears to happen is that an invalid frame is freed from the cache (causing the crash), even though, as far as I can tell from traces I added to the code, that frame never entered the cache.
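One way to test the theory that a frame is freed from the cache without ever having entered it is to stamp frames as they are cached and check the stamp before freeing them. The sketch below assumes a hypothetical dframe type and an arbitrary magic value; it is a debugging aid rather than a fix, and nothing in this report says Asterisk uses such a check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define IN_CACHE_MAGIC 0xCAC4E5EDu   /* arbitrary marker value */

/* Hypothetical frame with a stamp that is set only while it is cached. */
struct dframe {
    struct dframe *next;
    uint32_t magic;
};

static struct dframe *dcache;

/* Frames start zeroed, so a fresh frame never carries the stamp. */
struct dframe *dframe_new(void)
{
    return calloc(1, sizeof(struct dframe));
}

int dframe_was_cached(const struct dframe *f)
{
    return f->magic == IN_CACHE_MAGIC;
}

void dcache_put(struct dframe *f)
{
    assert(!dframe_was_cached(f));   /* catches a double insert */
    f->magic = IN_CACHE_MAGIC;
    f->next = dcache;
    dcache = f;
}

/* Free everything in the cache; return the number of frames freed. */
size_t dcache_drain(void)
{
    size_t n = 0;

    while (dcache != NULL) {
        struct dframe *f = dcache;

        assert(dframe_was_cached(f));   /* never entered the cache? stop */
        dcache = f->next;
        f->magic = 0;                   /* poison the stamp before release */
        free(f);
        n++;
    }
    return n;
}
```

If a stray pointer reaches the list (for example through a corrupted next link), the assertion in dcache_drain() would usually fire at the point of detection, giving a backtrace there rather than a glibc abort deeper inside free().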

By: Pavel Troller (patrol-cz) 2009-12-14 12:25:35.000-0600

I've tested the suggested fix by defining LOW_MEMORY and verified that the frame cache code is not compiled into frame.o, but Asterisk still crashes when fax reception ends on a DAHDI channel (E1/PRI). Evidently the corrupted frames cause harm elsewhere too. I'm afraid it's necessary to find and fix the real cause of the problem.

By: Dave Hawkes (hevad) 2009-12-14 12:35:35.000-0600

Originally I tried the "fix" I suggested above as a way to narrow down the problem, by eliminating the frame cache and getting a backtrace closer to the real problem. However, I no longer get any crashes after doing this.

So if someone whose system still crashes after disabling the frame cache can capture and post a backtrace, it may isolate the problem a little better.

By: Pavel Troller (patrol-cz) 2009-12-14 13:34:56.000-0600

I've found that I didn't have internal timing properly initialized on my system (the -I flag to asterisk was missing by mistake). After I added this flag, "core show settings" shows that internal timing is enabled, and the crashes are gone! I've tested successfully on two systems that were crashing on every fax reception: I sent two faxes to each of them and got no crash at all. So maybe the timing changes in 1.6.1 introduced this problem, and using a proper timing interface (DAHDI in my case) rectifies the situation. What's the status of internal timing on your system?

By: Dave Hawkes (hevad) 2009-12-15 07:58:42.000-0600

I can confirm that enabling internal timing also prevents this issue. I suspect the problem may still exist and could reappear with different settings that produce a particular sequence of events, but for now it works for me...

By: Benny Amorsen (amorsen) 2010-02-04 09:44:31.000-0600

This issue exists in 1.6.0.22 as well. Enabling dahdi_dummy plus internal_timing=yes mitigates the issue, matching the reports by hevad and patrol-cz.

By: Paul Belanger (pabelanger) 2010-07-24 22:06:10

Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions, maintenance (bug) support for the 1.6.0 and 1.6.1 branches has ended. For continued maintenance support, please move to the 1.6.2 branch.

More information on this change can be found in the release announcement: http://www.asterisk.org/node/49924


By: Paul Belanger (pabelanger) 2010-08-04 11:54:03

Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested.

Further information can be found at http://www.asterisk.org/developers/bug-guidelines