Summary: ASTERISK-16724: Crash on load
Reporter: Eldad Ran (eldadran)
Labels:
Date Opened: 2010-09-23 17:22:25
Date Closed: 2010-10-11 11:18:28
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) asterisk_bt_260910.txt
( 1) backtrace.txt
( 2) bt_300910_1.txt
( 3) bt_300910_2.txt
( 4) bt031010.txt
( 5) bt041010.txt
( 6) bt051010.txt
( 7) live_crash_051010.txt
Description: The system just crashed; see the backtrace output below.

****** ADDITIONAL INFORMATION ******

#0  0x0000003fd0e30265 in raise () from /lib64/libc.so.6
#1  0x0000003fd0e31d10 in abort () from /lib64/libc.so.6
#2  0x0000003fd0e6a84b in __libc_message () from /lib64/libc.so.6
#3  0x0000003fd0e723e5 in _int_free () from /lib64/libc.so.6
#4  0x0000003fd0e7273b in free () from /lib64/libc.so.6
#5  0x000000000048c765 in ast_rtp_destroy (rtp=0x2aaab84d8220) at rtp.c:2286
#6  0x00002aaaac92ee8b in __sip_destroy (p=0x2aaab84d52e0, lockowner=-1) at chan_sip.c:3396
#7  0x00002aaaac9354f5 in do_monitor (data=<value optimized out>) at chan_sip.c:17103
#8  0x00000000004b119c in dummy_start (data=<value optimized out>) at utils.c:856
#9  0x0000003fd1606617 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003fd0ed3c2d in clone () from /lib64/libc.so.6
(gdb) bt full
#0  0x0000003fd0e30265 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003fd0e31d10 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000003fd0e6a84b in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000003fd0e723e5 in _int_free () from /lib64/libc.so.6
No symbol table info available.
#4  0x0000003fd0e7273b in free () from /lib64/libc.so.6
No symbol table info available.
#5  0x000000000048c765 in ast_rtp_destroy (rtp=0x2aaab84d8220) at rtp.c:2286
       __PRETTY_FUNCTION__ = "ast_rtp_destroy"
#6  0x00002aaaac92ee8b in __sip_destroy (p=0x2aaab84d52e0, lockowner=-1) at chan_sip.c:3396
       cur = <value optimized out>
       cp = <value optimized out>
       __PRETTY_FUNCTION__ = "__sip_destroy"
#7  0x00002aaaac9354f5 in do_monitor (data=<value optimized out>) at chan_sip.c:17103
       res = <value optimized out>
       t = 1285277441
       fastrestart = 0
       lastpeernum = -1
       curpeernum = <value optimized out>
       reloading = 1285277441
       __PRETTY_FUNCTION__ = "do_monitor"
#8  0x00000000004b119c in dummy_start (data=<value optimized out>) at utils.c:856
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {77605424, 3131717165269338011, 0, 1096605696, 0, 4096, 3131717164206364427,
       3131717165264426521}, __mask_was_saved = 0}}, __pad = {0x415cd1d0, 0x0, 0x0, 0x0}}
       __cancel_arg = (void *) 0x415cd940
       not_first_call = <value optimized out>
       ret = <value optimized out>
#9  0x0000003fd1606617 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#10 0x0000003fd0ed3c2d in clone () from /lib64/libc.so.6
No symbol table info available.
Comments:

By: Leif Madsen (lmadsen) 2010-09-24 15:21:58

Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory.

Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

After enabling it, reproduce the crash, and then follow the instructions in doc/backtrace.txt.

When complete, attach that file to this issue report. Thanks!
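The request above boils down to loading the core file into gdb and capturing a few backtrace commands. A minimal Python sketch of that step follows. It assumes the Asterisk binary is installed at /usr/sbin/asterisk and that `bt`, `bt full` and `thread apply all bt` are the commands wanted; doc/backtrace.txt remains the reference for the exact procedure. The helper names are hypothetical. `looks_optimized` flags gdb's "optimized out" markers (seen throughout the trace in the description), which is the symptom DONT_OPTIMIZE is meant to remove.

```python
# Assumption: typical install path of the Asterisk binary; adjust as needed.
ASTERISK_BINARY = "/usr/sbin/asterisk"

# gdb commands commonly used when collecting Asterisk backtraces.
GDB_COMMANDS = ["bt", "bt full", "thread apply all bt"]


def gdb_backtrace_argv(core_path, binary=ASTERISK_BINARY, commands=GDB_COMMANDS):
    """Build a gdb command line that prints backtraces from a core file.

    -batch runs every -ex command and then exits, without pagination or
    confirmation prompts, so the output can be captured and attached.
    """
    argv = ["gdb", "-batch"]
    for cmd in commands:
        argv += ["-ex", cmd]
    argv += [binary, core_path]
    return argv


def looks_optimized(backtrace_text):
    """Return True if gdb reported optimized-out values.

    Older gdb prints "<value optimized out>", newer gdb "<optimized out>";
    both end with "optimized out>".
    """
    return "optimized out>" in backtrace_text
```

For example, `subprocess.run(gdb_backtrace_argv("core.1234"), capture_output=True, text=True).stdout` would return the text to attach, where `core.1234` stands in for whatever core file the crash produced.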

By: Eldad Ran (eldadran) 2010-09-26 09:06:19

Backtrace files attached, as requested.

By: Eldad Ran (eldadran) 2010-09-30 12:36:27

The system crashes more than once a day. Do you need more traces, or do you need me to run more tests?

By: Jason Parker (jparker) 2010-09-30 12:45:52

How many channels are active when this happens?  It's failing to allocate memory...  Is there an adequate amount available?

By: Eldad Ran (eldadran) 2010-09-30 13:23:52

It varies, from a few calls up to 2000 channels. It crashes randomly, even when only a few channels are active; before this, it had been running for about 230 days without a reset.
We have 8 GB of memory and dual quad-core Xeon CPUs.
I've just reset the server to clear everything and will see if it crashes again.

By: Eldad Ran (eldadran) 2010-09-30 16:39:33

Two crashes, one after another within the same minute. Memory status:
free -m
            total       used       free     shared    buffers     cached
Mem:          7974       1086       6887          0        128        702
-/+ buffers/cache:        256       7718
Swap:         4000          0       4000

See the attached backtrace files.

By: Eldad Ran (eldadran) 2010-10-03 13:40:34

Yet another crash, with plenty of free memory on the system:
free -m
            total       used       free     shared    buffers     cached
Mem:          7974       1569       6405          0        169       1115
-/+ buffers/cache:        284       7690
Swap:         4000          0       4000
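To put the two `free -m` snapshots in context: in this older procps layout, the "-/+ buffers/cache" used figure is approximately used - buffers - cached, i.e. memory held by processes once the reclaimable buffers and page cache are set aside. The Python sketch below (the helper name `app_memory_mb` is hypothetical) parses a snapshot and recomputes that figure. It gives 256 MB for the first snapshot and 285 MB for the second (free prints 284, because each column is rounded to MB separately), out of 7974 MB total, which is consistent with the host not being short of memory when Asterisk aborted.

```python
def app_memory_mb(free_m_output):
    """Return (reported, computed) application memory in MB from old-style
    `free -m` output that still has a '-/+ buffers/cache' row."""
    mem = None
    reported = None
    for line in free_m_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Columns: total used free shared buffers cached
            _total, used, _free, _shared, buffers, cached = map(int, fields[1:7])
            mem = (used, buffers, cached)
        elif line.strip().startswith("-/+ buffers/cache:"):
            # First number after the colon is the adjusted "used" value.
            reported = int(line.split(":", 1)[1].split()[0])
    if mem is None or reported is None:
        raise ValueError("not old-style `free -m` output")
    used, buffers, cached = mem
    # Memory held by processes, excluding reclaimable buffers and page cache.
    return reported, used - buffers - cached
```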

By: Stefan Schmidt (schmidts) 2010-10-03 15:30:14

As I see it, you use a PHP AGI to dial, and except in BT 300902 this AGI always shows up. Do you have any other logs showing what happens to the AGI when Asterisk crashes?
Maybe it's a memory problem between PHP and Asterisk that only occurs sometimes.

By: Eldad Ran (eldadran) 2010-10-03 16:41:59

The PHP process (PID=312) just hangs there consuming CPU, like the rest of the AGIs after an Asterisk crash; I had to send SIGTERM to kill it.
No logs were found for PHP, since it didn't crash, I guess.

By: Eldad Ran (eldadran) 2010-10-04 10:26:10

This last trace (bt041010.txt) breaks the link between the segfault and the AGI. The call that crashed Asterisk started in the AGI, which set the context and the variables on that context; the AGI then returned control to the dialplan, and the dialplan attempted to dial using the variables previously set by the AGI.
This is where the call goes after the AGI, and this is where it crashes:
exten => redirect,1,Dial(${TC_DSTR}|${TC_DTO}|${TC_DOPT})
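The flow described above (the AGI sets the dial variables and the routing, then hands control back to the dialplan, which runs Dial) can be sketched with the plain-text AGI protocol: Asterisk first writes `agi_*` header lines ending in a blank line, then the script writes one command per line and reads a `200 result=...` reply for each. The Python sketch below is a hypothetical stand-in for the reporter's PHP AGI, not its actual code. All function names and example values are made up, and sending the call to extension `redirect`, priority 1, is an assumption about how it reaches the dialplan line shown.

```python
class AGIHangup(Exception):
    """Raised when Asterisk closes the AGI pipe (end of file)."""


def read_env(inp):
    """Read the agi_* header lines Asterisk sends first; a blank line ends them."""
    env = {}
    while True:
        line = inp.readline()
        if line == "":  # EOF: Asterisk is gone
            raise AGIHangup()
        line = line.strip()
        if not line:
            return env
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()


def command(inp, out, cmd):
    """Send one AGI command and return N from a '200 result=N ...' reply."""
    out.write(cmd + "\n")
    out.flush()
    reply = inp.readline()
    if reply == "":  # EOF: stop rather than retrying forever
        raise AGIHangup()
    if not reply.startswith("200"):
        raise RuntimeError("AGI command failed: " + reply.strip())
    return int(reply.split("result=", 1)[1].split()[0])


def route_to_redirect(inp, out, context, dial_string, timeout, options):
    """Set the variables read by Dial(${TC_DSTR}|${TC_DTO}|${TC_DOPT}) and send
    the call to extension 'redirect' in `context` (hypothetical routing)."""
    env = read_env(inp)
    # Values are wrapped in double quotes; embedded quotes are not escaped here.
    command(inp, out, 'SET VARIABLE TC_DSTR "%s"' % dial_string)
    command(inp, out, 'SET VARIABLE TC_DTO "%s"' % timeout)
    command(inp, out, 'SET VARIABLE TC_DOPT "%s"' % options)
    command(inp, out, "SET CONTEXT %s" % context)
    command(inp, out, "SET EXTENSION redirect")
    command(inp, out, "SET PRIORITY 1")
    return env
```

A real script would call `route_to_redirect(sys.stdin, sys.stdout, ...)`. Stopping on EOF is deliberate: a read loop that ignores end-of-file can spin at full CPU once Asterisk has gone away, which may (unconfirmed) explain the PHP AGI processes left running after each crash.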



By: Eldad Ran (eldadran) 2010-10-05 16:26:42

Take a look at the files marked 051010. This crash happened while my terminal was open, so I got output I usually don't get from gdb; maybe you can learn more from it than I can.

By: Jason Parker (jparker) 2010-10-06 12:11:59

What Linux distribution is this happening on?

By: Eldad Ran (eldadran) 2010-10-06 13:37:23

Linux  2.6.18-164.10.1.el5 #1 SMP Thu Jan 7 19:54:26 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
CentOS release 5.4 (Final)

By: Eldad Ran (eldadran) 2010-10-11 10:55:04

I upgraded the glibc libraries on this server 6 days ago, and there have been no crashes since then, so I guess this was the problem in the first place.
The system had not been updated since Jan 07 2010, and the problems started on Sep 20 2010, so I had no way of knowing this was the cause.
The faulty version was glibc-2.5-42.el5_4.2.x86_64; upgrading to glibc-2.5-49.el5_5.4.x86_64 fixed it.
Thanks for all your help.
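Both packages are upstream glibc 2.5, so the fix is only visible in the RPM release field (42.el5_4.2 versus 49.el5_5.4); a runtime glibc version query would report 2.5 either way. A simplified Python sketch of RPM-style version-release comparison follows, for checking whether an installed package predates the fixed build. It ignores epochs and the `~`/`^` rules, so `rpm` itself (or `rpmdev-vercmp` from rpmdevtools) should be preferred for real decisions; the function names are hypothetical.

```python
import re
from itertools import zip_longest


def split_nevra(pkg):
    """Split 'name-version-release.arch', e.g. glibc-2.5-42.el5_4.2.x86_64."""
    base, _, arch = pkg.rpartition(".")
    name, version, release = base.rsplit("-", 2)
    return name, version, release, arch


def _segments(s):
    # Runs of digits or letters; separators such as '.' and '_' are ignored.
    return re.findall(r"\d+|[A-Za-z]+", s)


def vercmp(a, b):
    """Return -1, 0 or 1, roughly following rpm's segment-wise comparison."""
    for x, y in zip_longest(_segments(a), _segments(b)):
        if x is None:  # a ran out of segments first: b is newer
            return -1
        if y is None:
            return 1
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # numeric segment counts as newer
        elif x != y:
            return -1 if x < y else 1
    return 0


def is_older(pkg_a, pkg_b):
    """True if pkg_a's version-release sorts before pkg_b's."""
    _, va, ra, _ = split_nevra(pkg_a)
    _, vb, rb, _ = split_nevra(pkg_b)
    return (vercmp(va, vb) or vercmp(ra, rb)) < 0
```

With the two packages from this report, `is_older("glibc-2.5-42.el5_4.2.x86_64", "glibc-2.5-49.el5_5.4.x86_64")` is true because the first release segment compares 42 against 49.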