
Summary: ASTERISK-10544: Asterisk 1.4.13 segfaults at least once daily
Reporter: David Brillert (aragon)
Labels:
Date Opened: 2007-10-16 10:52:57
Date Closed: 2011-06-07 14:00:45
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: The system segfaults at least once daily.
It started segfaulting after the upgrade from 1.2.24 to 1.4.11.
I tried several incremental upgrades:
1.4.11
1.4.12
1.4.12 SVN
1.4.12.1
1.4.13
The problem still exists in 1.4.13.

I don't know what steps are needed to reproduce the crash.

****** ADDITIONAL INFORMATION ******

Here is my backtrace:

(gdb) set pagination off
(gdb) bt
#0  0x0073775e in _int_malloc () from /lib/tls/libc.so.6
#1  0x007396e1 in malloc () from /lib/tls/libc.so.6
#2  0x08067764 in ast_expr ()
#3  0x080eaccf in pbx_substitute_variables_helper_full ()
#4  0x080f14e0 in pbx_extension_helper ()
#5  0x080f27e5 in __ast_pbx_run ()
#6  0x080f4dc1 in pbx_thread ()
#7  0x08127272 in dummy_start ()
#8  0x008853cc in start_thread () from /lib/tls/libpthread.so.0
#9  0x0079dc3e in clone () from /lib/tls/libc.so.6

(gdb) bt full
#0  0x0073775e in _int_malloc () from /lib/tls/libc.so.6
No symbol table info available.
#1  0x007396e1 in malloc () from /lib/tls/libc.so.6
No symbol table info available.
#2  0x08067764 in ast_expr ()
No symbol table info available.
#3  0x080eaccf in pbx_substitute_variables_helper_full ()
No symbol table info available.
#4  0x080f14e0 in pbx_extension_helper ()
No symbol table info available.
#5  0x080f27e5 in __ast_pbx_run ()
No symbol table info available.
#6  0x080f4dc1 in pbx_thread ()
No symbol table info available.
#7  0x08127272 in dummy_start ()
No symbol table info available.
#8  0x008853cc in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
#9  0x0079dc3e in clone () from /lib/tls/libc.so.6
No symbol table info available.
Comments:

By: Michiel van Baak (mvanbaak) 2007-10-16 11:06:38

Can you recompile Asterisk with the DONT_OPTIMIZE flag set and post a new backtrace?

By: David Brillert (aragon) 2007-10-16 11:58:36

DONT_OPTIMIZE was enabled on this server.
I will have to find out why it is not working, then recompile and upload a new backtrace after the next segfault.

By: Tilghman Lesher (tilghman) 2007-10-16 12:48:19

aragon: if the next backtrace ends in a crash inside malloc(), there's no need to post another backtrace. What we need instead is the log of exactly what happens just before the crash. If you could post the last 5 seconds' worth of activity from the "full" log, that should help.

The problem is one of memory corruption, so we'll need to track down exactly what is stomping on the malloc structures (which causes malloc to abort(3)).

By: Tilghman Lesher (tilghman) 2007-10-16 12:52:45

One more thing:  in compiler options, please turn on MALLOC_DEBUG prior to the next crash.
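The memory-corruption explanation above can be illustrated with the guard-word ("fence") technique that heap debugging builds generally rely on: pad each allocation with known values and check them when the block is freed, so an overrun is reported close to the offending code instead of surfacing later as a crash inside malloc(). This is a minimal, hypothetical sketch (guarded_malloc, guarded_check, guarded_free and FENCE_VALUE are invented names); it is not Asterisk's MALLOC_DEBUG implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard pattern; any distinctive value works. */
#define FENCE_VALUE UINT64_C(0xdeadbeefdeadbeef)

/* Header stored immediately in front of each user block. */
struct guarded_hdr {
    size_t len;          /* size requested by the caller */
    uint64_t head_fence; /* low guard, directly adjacent to the user data */
};

/* The low guard only catches underruns if no padding separates it from the data. */
_Static_assert(offsetof(struct guarded_hdr, head_fence) + sizeof(uint64_t)
                   == sizeof(struct guarded_hdr),
               "head fence must directly precede the user data");

/* Allocate len bytes surrounded by a low and a high guard. */
void *guarded_malloc(size_t len)
{
    const uint64_t fence = FENCE_VALUE;
    if (len > SIZE_MAX - sizeof(struct guarded_hdr) - sizeof fence)
        return NULL; /* total size would overflow size_t */
    struct guarded_hdr *hdr = malloc(sizeof *hdr + len + sizeof fence);
    if (hdr == NULL)
        return NULL;
    hdr->len = len;
    hdr->head_fence = fence;
    unsigned char *data = (unsigned char *)(hdr + 1);
    memcpy(data + len, &fence, sizeof fence); /* high guard; may be unaligned */
    return data;
}

/* Return 1 if both guards are intact, 0 if something wrote past either end. */
int guarded_check(const void *ptr)
{
    const struct guarded_hdr *hdr = (const struct guarded_hdr *)ptr - 1;
    uint64_t tail;
    memcpy(&tail, (const unsigned char *)ptr + hdr->len, sizeof tail);
    return hdr->head_fence == FENCE_VALUE && tail == FENCE_VALUE;
}

/* Verify the guards, report a violation, then release the whole block. */
void guarded_free(void *ptr)
{
    if (ptr == NULL)
        return;
    if (!guarded_check(ptr))
        fprintf(stderr, "guarded_free: fence violation around %p\n", ptr);
    free((struct guarded_hdr *)ptr - 1);
}
```

With this scheme a one-byte overrun such as writing p[8] into an 8-byte block is reported by guarded_free() itself, rather than silently damaging the allocator's bookkeeping and crashing some unrelated later malloc() call, which is the pattern seen in the first backtrace.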

By: Dmitry Andrianov (dimas) 2007-10-16 14:00:08

Valgrind can also help, and running Asterisk under valgrind is even simpler than getting backtraces from core dumps :)



By: Volnikov Ivan (ivan) 2007-10-17 01:28:16

aragon -
Can you check how much memory is allocated to Asterisk (ideally just before a crash)?
Use the "top" command at the Linux command line. The "VIRT", "%MEM", "SWAP" and "DATA" fields are of interest, along with a detailed backtrace of the cause...
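Instead of watching top by hand, the same kind of figures can be sampled periodically from the kernel's per-process status file so that the last values before a crash are on record. A minimal sketch, assuming a Linux /proc filesystem: read_proc_status and status_field_kb are hypothetical helper names, and the VmSize and VmData fields only roughly correspond to top's VIRT and DATA columns. Which Vm* fields exist depends on the kernel version, so a missing field is reported as -1.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read /proc/<pid>/status (Linux only) into buf as a NUL-terminated string.
 * Returns 0 on success, -1 on failure. bufsize must be at least 1. */
int read_proc_status(long pid, char *buf, size_t bufsize)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/status", pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    size_t got = fread(buf, 1, bufsize - 1, fp);
    buf[got] = '\0';
    fclose(fp);
    return 0;
}

/* Find a line such as "VmSize:\t  123456 kB" and return its number (kB),
 * or -1 if the field is not present in this kernel's status output. */
long status_field_kb(const char *status_text, const char *field)
{
    size_t n = strlen(field);
    const char *line = status_text;
    while (line != NULL && *line != '\0') {
        /* Match the whole field name followed by ':' at the start of a line. */
        if (strncmp(line, field, n) == 0 && line[n] == ':')
            return strtol(line + n + 1, NULL, 10); /* strtol skips the whitespace */
        line = strchr(line, '\n'); /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

For example, calling read_proc_status() with the Asterisk PID once a minute and logging status_field_kb(buf, "VmData") would show whether the data segment grows steadily in the hours before a segfault, which would point toward a leak rather than a one-off corruption.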

By: David Brillert (aragon) 2007-10-17 17:57:20

I got a new core dump this afternoon. I think DONT_OPTIMIZE was compiled in correctly this time, but MALLOC_DEBUG was not enabled.

The resulting backtrace does not look anything like my first backtrace, but it bears a striking resemblance to http://bugs.digium.com/view.php?id=10841
Tonight I plan to recompile with MALLOC_DEBUG enabled and wait for the next core dump.
If this has anything in common with 10841, I should be able to trigger a segfault by sending some faxes to the main number.
I hope I am dealing with one bug and not two separate issues.
I guess the only way to know for sure is to remove one problem and see whether the site keeps crashing.

Here is today's backtrace.
It was confirmed that at 16:37:17, extension 721 received a fax tone.
Extension 721 is my reception phone, so I presume the inbound fax was an unsolicited fax to the main number.

The backtrace timestamp is 16:37:46.

#0  0x006bb7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) set pagination off
(gdb) bt full
#0  0x006bb7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
No symbol table info available.
#1  0x006fc7a5 in raise () from /lib/tls/libc.so.6
No symbol table info available.
#2  0x006fe209 in abort () from /lib/tls/libc.so.6
No symbol table info available.
#3  0x00730a1a in __libc_message () from /lib/tls/libc.so.6
No symbol table info available.
#4  0x007372bf in _int_free () from /lib/tls/libc.so.6
No symbol table info available.
#5  0x0073763a in free () from /lib/tls/libc.so.6
No symbol table info available.
#6  0x080b9e69 in frame_cache_cleanup ()
No symbol table info available.
#7  0x00885258 in __nptl_deallocate_tsd () from /lib/tls/libpthread.so.0
No symbol table info available.
#8  0x008853da in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
#9  0x0079dc3e in clone () from /lib/tls/libc.so.6
No symbol table info available.
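In this second backtrace, free() aborts inside frame_cache_cleanup, which is called from libpthread's thread-specific data (TSD) teardown (the __nptl_deallocate_tsd frame) while a thread exits. The following sketch shows how such a cleanup path is wired up with the POSIX pthread_key_create() API: a destructor registered for a key runs automatically in each exiting thread that stored a non-NULL value. The cache structure and all function names here are hypothetical and do not reproduce Asterisk's frame cache; the sketch only shows why the crash surfaces at thread exit.

```c
#include <pthread.h>
#include <stdlib.h>

#define CACHE_SLOTS 8

/* Hypothetical per-thread cache standing in for a frame cache. */
struct thread_cache {
    void *items[CACHE_SLOTS];
    int count;
};

static pthread_key_t cache_key;
static pthread_once_t cache_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int destructor_runs; /* demonstration counter, guarded by stats_lock */

/* Destructor registered with the key: libpthread calls it in each exiting
 * thread whose value for cache_key is non-NULL. */
static void cache_cleanup(void *data)
{
    struct thread_cache *cache = data;
    for (int i = 0; i < cache->count; i++)
        free(cache->items[i]); /* a double free here would make free() abort */
    free(cache);
    pthread_mutex_lock(&stats_lock);
    destructor_runs++;
    pthread_mutex_unlock(&stats_lock);
}

/* Create the key once per process (error handling omitted in this sketch). */
static void make_key(void)
{
    pthread_key_create(&cache_key, cache_cleanup);
}

/* Return the calling thread's cache, creating it on first use. */
struct thread_cache *get_thread_cache(void)
{
    pthread_once(&cache_once, make_key);
    struct thread_cache *cache = pthread_getspecific(cache_key);
    if (cache == NULL) {
        cache = calloc(1, sizeof *cache);
        if (cache != NULL && pthread_setspecific(cache_key, cache) != 0) {
            free(cache);
            cache = NULL;
        }
    }
    return cache;
}

/* Worker thread: stash one buffer in its cache, then exit. */
void *worker(void *arg)
{
    (void)arg;
    struct thread_cache *cache = get_thread_cache();
    if (cache != NULL && cache->count < CACHE_SLOTS)
        cache->items[cache->count++] = malloc(32);
    return NULL; /* returning ends the thread and triggers cache_cleanup */
}

/* Number of times the destructor has run so far. */
int destructor_count(void)
{
    pthread_mutex_lock(&stats_lock);
    int n = destructor_runs;
    pthread_mutex_unlock(&stats_lock);
    return n;
}
```

Because the destructor runs during thread teardown, a trace taken at that point typically shows only libpthread and the destructor, not the code that earlier freed the same memory or corrupted the heap; that is one reason MALLOC_DEBUG or valgrind output is more informative than the backtrace alone.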

By: David Brillert (aragon) 2007-10-17 19:00:19

Hi Ivan

top - 20:18:09 up 2 days, 10:51,  2 users,  load average: 0.35, 0.35, 0.29
Tasks:  71 total,   1 running,  70 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2% us,  0.3% sy,  0.0% ni, 99.2% id,  0.3% wa,  0.0% hi,  0.0% si
Mem:   2074384k total,  1264592k used,   809792k free,   119856k buffers
Swap:   522040k total,        0k used,   522040k free,   975004k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   1 root      16   0  2020  548  468 S    0  0.0   0:02.47 init
   2 root      RT   0     0    0    0 S    0  0.0   0:01.15 migration/0
   3 root      34  19     0    0    0 S    0  0.0   0:00.09 ksoftirqd/0
   4 root      RT   0     0    0    0 S    0  0.0   0:01.19 migration/1
   5 root      34  19     0    0    0 S    0  0.0   0:00.05 ksoftirqd/1
   6 root       5 -10     0    0    0 S    0  0.0   1:57.17 events/0
   7 root       5 -10     0    0    0 S    0  0.0   1:53.09 events/1
   8 root       8 -10     0    0    0 S    0  0.0   0:00.01 khelper
   9 root      15 -10     0    0    0 S    0  0.0   0:00.00 kacpid
  52 root       5 -10     0    0    0 S    0  0.0   0:00.00 kblockd/0
  53 root       5 -10     0    0    0 S    0  0.0   0:00.00 kblockd/1
  54 root      15   0     0    0    0 S    0  0.0   0:00.00 khubd
  71 root      20   0     0    0    0 S    0  0.0   0:00.00 pdflush
  72 root      15   0     0    0    0 S    0  0.0   0:13.62 pdflush
  73 root      25   0     0    0    0 S    0  0.0   0:00.00 kswapd0
  74 root      11 -10     0    0    0 S    0  0.0   0:00.00 aio/0
  75 root      11 -10     0    0    0 S    0  0.0   0:00.00 aio/1

By: Volnikov Ivan (ivan) 2007-10-18 04:34:00

I think the last problem is in the addon modules.

By: David Brillert (aragon) 2007-10-18 07:43:33

Hi Ivan

Yes, the last backtrace looks exactly like the backtrace in http://bugs.digium.com/view.php?id=10815

I am going to test the patch from 10815 today and keep monitoring this site and another one for segfaults.
I'm hoping to see something like my original backtrace so I can identify each issue separately. Naturally, I want to track down every bug and kill them all.

On a side note, I was not able to compile with MALLOC_DEBUG last night; I hope to do so tonight.

By: David Brillert (aragon) 2007-10-24 07:59:08

I have upgraded this site to 1.4.13 SVN and applied the patch from 10815.
The segfaults are gone.
I think this ticket is a duplicate of 10875 and 10815, since there were two separate issues causing segfaults.
My sites have not segfaulted on 1.4.13 SVN since that version was installed on 10/19/2007.

I think it is OK to close this ticket.

r85994 | russell | 2007-10-16 17:14:36 -0500 (Tue, 16 Oct 2007) | 16 lines
This revision appears to have solved my first segfault problem (first backtrace).

10815-2.patch (34,206 bytes, 10-02-07 04:58, License OK) from http://bugs.digium.com/view.php?id=10815
This patch appears to have fixed the second segfault problem (second backtrace) in the Asterisk addon module.

By: David Brillert (aragon) 2007-10-31 22:46:06

Seriously, these issues are gone with SVN 1.4.13 plus 10815-2.patch (34,206 bytes, 10-02-07 04:58, License OK) from http://bugs.digium.com/view.php?id=10815
There have been no segfaults for almost 2 weeks now.

I think it is OK to close this ticket.

By: Tilghman Lesher (tilghman) 2007-11-01 00:44:19

Closed at request of reporter.