Summary: ASTERISK-10779: Asterisk crash on non-responsive gateway
Reporter: Private Name (falves11)
Labels:
Date Opened: 2007-11-15 13:07:45.000-0600
Date Closed: 2011-06-07 14:00:23
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) crash.txt
( 1) crash1.txt
( 2) crash10.txt
( 3) crash11.txt
( 4) crash2.txt
( 5) crash3.txt
( 6) crash4.txt
( 7) crash5.txt
( 8) crash6.txt
( 9) crash7.txt
(10) crash9.txt
Description: I start my simulator, and Asterisk sends the calls to a SIP gateway that rejects all of them. After 120 seconds, Asterisk crashes.
Comments:
By: Mark Michelson (mmichelson) 2007-11-15 18:17:52.000-0600

Please open a core from one of these crashes with gdb and issue the following commands:

f 0
p rtp->rtcp
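
The same two commands can also be run without an interactive session. This is a minimal sketch, assuming gdb is installed; the function name inspect_core and both paths in the usage line are placeholders, not values taken from this report.

```shell
# Hypothetical helper: print frame 0 and rtp->rtcp from a core file.
# $1 = path to the asterisk binary that produced the core (assumed)
# $2 = path to the core file (assumed)
inspect_core() {
    # -batch exits once the commands have run; each -ex runs one gdb command.
    gdb -batch -ex "frame 0" -ex "print rtp->rtcp" "$1" "$2"
}
```

Usage would look like: inspect_core /usr/sbin/asterisk /tmp/core.12345. The print command can only succeed when frame 0 is a function that has a variable named rtp.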

By: Private Name (falves11) 2007-11-15 20:14:52.000-0600

f 0
0x08083f85 in ast_poll_channel_del (chan0=0xb18b8e60, chan1=0x0) at channel.c:1349
1349                    if (chan1->fds[i] == -1)
---------
In the file crash3, the command p rtp->rtcp fails because there is no such symbol. I erased the other cores.

By: Private Name (falves11) 2007-11-15 20:25:05.000-0600

After a few hours of low traffic, I noticed that I had only 19 open calls but 14448 SIP channels. When I type sip show channels, the whole list scrolls past. This is a sample:
66.28.197.100    1956546818  1dccafa0580  00104/00000  unkn  No  (d)  Tx: BYE
66.28.197.100    1704262616  7302e0f1510  00104/00000  unkn  No  (d)  Tx: BYE
66.28.197.100    1636947561  5f7ffddf397  00104/00102  unkn  No  (d)  Tx: BYE
66.28.197.100    1440366651  2279c0c340b  00104/00000  unkn  No  (d)  Tx: BYE
38.102.64.95     1620389280  2cfdca4e5c2  00104/00000  unkn  No  (d)  Tx: BYE

In my opinion, the SIP channels remain open and the handles never get released. Asterisk had 29000+ file handles open (lsof | grep asterisk | wc -l).
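
The lsof pipeline above counts every lsof line that mentions asterisk, which can include entries that are not descriptors, such as the working directory and memory-mapped libraries. A narrower count, sketched here under the assumption of a Linux system with /proc mounted, lists the descriptor directory of a single process. The function name count_fds is hypothetical.

```shell
# Hypothetical helper: count the open file descriptors of one process.
# $1 = process ID; each entry in /proc/<pid>/fd is one open descriptor.
count_fds() {
    ls "/proc/$1/fd" | wc -l
}
```

For example, count_fds 1234, where 1234 stands in for the Asterisk PID. Reading the descriptor directory of another user's process usually requires root.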

If somebody wants to log into my box and do some further analysis, please contact me at falves1 at hot mail.

By: Private Name (falves11) 2007-11-16 01:59:00.000-0600

This is what you needed. It corresponds to file crash7.txt. I can reproduce it at will.

(gdb) f 0
#0  0x080e3e39 in ast_rtcp_write (data=0x90e3290) at rtp.c:2906
2906            if (rtp->txcount > rtp->rtcp->lastsrtxcount)
(gdb) p rtp->rtcp
$1 = (struct ast_rtcp *) 0xd300d9
(gdb)

By: Private Name (falves11) 2007-11-16 08:52:00.000-0600

This information corresponds to file crash9
(gdb) f 0
#0  0x080e3e15 in ast_rtcp_write (data=0xabe90630) at rtp.c:2903
2903            if (!rtp || !rtp->rtcp)
(gdb) p rtp->rtcp
Cannot access memory at address 0xabe92fb8
(gdb)

By: Tilghman Lesher (tilghman) 2007-11-16 16:02:13.000-0600

In core 7, please try the following command:

p *(rtp->rtcp)

By: Private Name (falves11) 2007-11-16 16:23:20.000-0600

(gdb)
(gdb) p *(rtp->rtcp)
Cannot access memory at address 0xc380eb4a
(gdb) p rtp->rtcp
$1 = (struct ast_rtcp *) 0xc380eb4a
(gdb)
--------------------
#0  0x080e3e39 in ast_rtcp_write (data=0x9961bc0) at rtp.c:2906
2906            if (rtp->txcount > rtp->rtcp->lastsrtxcount)
(gdb) bt full
#0  0x080e3e39 in ast_rtcp_write (data=0x9961bc0) at rtp.c:2906
       rtp = (struct ast_rtp *) 0x9961bc0
       res = 136234256
#1  0x0810284e in ast_sched_runq (con=0x81ec510) at sched.c:371
       current = (struct sched *) 0x9472820
       tv = {tv_sec = 1195231216, tv_usec = 978519}
       numevents = 18
       res = 64
#2  0xb676e25a in do_monitor (data=0x0) at chan_sip.c:16805
       res = 1
       dialog = (struct sip_pvt *) 0x0
       t = 1195231216
       reloading = 0
       __PRETTY_FUNCTION__ = "do_monitor"
#3  0x08111777 in dummy_start (data=0xb6600fa8) at utils.c:858
       _buffer = {__routine = 0x806ca45 <ast_unregister_thread>,
 __arg = 0xb65ffba0, __canceltype = 0, __prev = 0x0}
       ret = (void *) 0x0
       a = {start_routine = 0xb676df3a <do_monitor>, data = 0x0,
 name = 0xb6600fb8 "do_monitor", ' ' <repeats 11 times>, "started at [16833] chan_sip.c restart_monitor()"}
#4  0xb7eaa3cc in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
---Type <return> to continue, or q <return> to quit---
#5  0xb71ffc3e in clone () from /lib/tls/libc.so.6
No symbol table info available.

By: Olle Johansson (oej) 2007-11-18 11:48:40.000-0600

Please use informative summary lines, thank you.

By: Private Name (falves11) 2007-11-19 10:14:13.000-0600

These are the ulimits corresponding to crash11, just in case.

[root@Sipserver tmp]# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 400000
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 400000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 137216
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited



By: Tilghman Lesher (tilghman) 2007-12-14 17:20:22.000-0600

Please apply the patch from ASTERISK-10897 and attempt to reproduce this issue.

By: Private Name (falves11) 2007-12-14 19:29:43.000-0600

Just how do I apply the patch? Can you please send me detailed instructions? Sorry for my lack of patching experience.

By: Tilghman Lesher (tilghman) 2007-12-14 19:55:31.000-0600

cd /usr/src/asterisk
patch -p0 < /path/to/patch
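
A slightly more cautious variant of the two commands above, assuming GNU patch: the --dry-run option checks whether the patch applies without changing any file. The function name apply_patch is hypothetical, and the paths passed to it are placeholders.

```shell
# Hypothetical helper: test a patch first, then apply it for real.
# $1 = source tree (for example /usr/src/asterisk)
# $2 = absolute path to the patch file
apply_patch() (
    # The parentheses run the body in a subshell, so the caller's
    # working directory is left unchanged.
    cd "$1" || exit 1
    # Stop here if the patch would not apply cleanly.
    patch -p0 --dry-run < "$2" || exit 1
    patch -p0 < "$2"
)
```

Usage would look like: apply_patch /usr/src/asterisk /path/to/patch. The patch path should be absolute, because the function changes directory before reading it.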

By: Private Name (falves11) 2007-12-17 09:50:11.000-0600

I am setting up a test bed today. Tomorrow I will be able to reproduce it.

By: Private Name (falves11) 2008-01-06 14:10:35.000-0600

I applied the patch to the current trunk version (96645) and it makes no difference. However, I compiled it with all the debugging options enabled, and I get thousands of lines like this:
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 75 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 5 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 5 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 70 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 40 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 80 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 85 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 25 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 10 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 65 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 75 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 30 sec for mutex '&(&class->odbc_obj)->lock'?
res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 30 sec for mutex '&(&class->odbc_obj)->lock'?

By: Tilghman Lesher (tilghman) 2008-01-07 12:01:57.000-0600

NEVER EVER EVER EVER EVER compile with all debug flags set unless you are prepared to deal with the consequences.  In this case, you enabled DETECT_DEADLOCKS, which you should not have done.

By: Tilghman Lesher (tilghman) 2008-01-15 20:36:09.000-0600

No response from reporter.