Summary: ASTERISK-10779: Asterisk crash on non-responsive gateway
Reporter: Private Name (falves11)
Labels:
Date Opened: 2007-11-15 13:07:45.000-0600
Date Closed: 2011-06-07 14:00:23
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) crash.txt, (1) crash1.txt, (2) crash10.txt, (3) crash11.txt, (4) crash2.txt, (5) crash3.txt, (6) crash4.txt, (7) crash5.txt, (8) crash6.txt, (9) crash7.txt, (10) crash9.txt
Description: I start my simulator, and Asterisk sends the calls to a SIP gateway that rejects all of them. After 120 seconds, Asterisk crashes.
Comments:

By: Mark Michelson (mmichelson) 2007-11-15 18:17:52.000-0600

Please open a core from one of these crashes with gdb and issue the following commands:

    f 0
    p rtp->rtcp

By: Private Name (falves11) 2007-11-15 20:14:52.000-0600

    f 0
    #0  0x08083f85 in ast_poll_channel_del (chan0=0xb18b8e60, chan1=0x0) at channel.c:1349
    1349            if (chan1->fds[i] == -1)

---------

In the file crash3, the command p rtp->rtcp fails because there is no symbol. I erased the other cores.

By: Private Name (falves11) 2007-11-15 20:25:05.000-0600

I noticed that after a few hours of low traffic I had only 19 open calls but 14448 SIP channels. When I type sip show channels, the whole list is printed. This is a sample:

    66.28.197.100 1956546818 1dccafa0580 00104/00000 unkn No (d) Tx: BYE
    66.28.197.100 1704262616 7302e0f1510 00104/00000 unkn No (d) Tx: BYE
    66.28.197.100 1636947561 5f7ffddf397 00104/00102 unkn No (d) Tx: BYE
    66.28.197.100 1440366651 2279c0c340b 00104/00000 unkn No (d) Tx: BYE
    38.102.64.95 1620389280 2cfdca4e5c2 00104/00000 unkn No (d) Tx: BYE

In my opinion the SIP channels remain open and their handles never get released. Asterisk had 29000+ file handles open (lsof | grep asterisk | wc -l). If somebody wants to log into my box and do some further analysis, please contact me at falves1 at hot mail.

By: Private Name (falves11) 2007-11-16 01:59:00.000-0600

This is what you needed. It corresponds to file crash7.txt. I can reproduce it at will.
    (gdb) f 0
    #0  0x080e3e39 in ast_rtcp_write (data=0x90e3290) at rtp.c:2906
    2906            if (rtp->txcount > rtp->rtcp->lastsrtxcount)
    (gdb) p rtp->rtcp
    $1 = (struct ast_rtcp *) 0xd300d9
    (gdb)

By: Private Name (falves11) 2007-11-16 08:52:00.000-0600

This information corresponds to file crash9:

    (gdb) f 0
    #0  0x080e3e15 in ast_rtcp_write (data=0xabe90630) at rtp.c:2903
    2903            if (!rtp || !rtp->rtcp)
    (gdb) p rtp->rtcp
    Cannot access memory at address 0xabe92fb8
    (gdb)

By: Tilghman Lesher (tilghman) 2007-11-16 16:02:13.000-0600

In core 7, please try the following command:

    p *(rtp->rtcp)

By: Private Name (falves11) 2007-11-16 16:23:20.000-0600

    (gdb) p *(rtp->rtcp)
    Cannot access memory at address 0xc380eb4a
    (gdb) p rtp->rtcp
    $1 = (struct ast_rtcp *) 0xc380eb4a
    (gdb)

    --------------------

    #0  0x080e3e39 in ast_rtcp_write (data=0x9961bc0) at rtp.c:2906
    2906            if (rtp->txcount > rtp->rtcp->lastsrtxcount)
    (gdb) bt full
    #0  0x080e3e39 in ast_rtcp_write (data=0x9961bc0) at rtp.c:2906
            rtp = (struct ast_rtp *) 0x9961bc0
            res = 136234256
    #1  0x0810284e in ast_sched_runq (con=0x81ec510) at sched.c:371
            current = (struct sched *) 0x9472820
            tv = {tv_sec = 1195231216, tv_usec = 978519}
            numevents = 18
            res = 64
    #2  0xb676e25a in do_monitor (data=0x0) at chan_sip.c:16805
            res = 1
            dialog = (struct sip_pvt *) 0x0
            t = 1195231216
            reloading = 0
            __PRETTY_FUNCTION__ = "do_monitor"
    #3  0x08111777 in dummy_start (data=0xb6600fa8) at utils.c:858
            _buffer = {__routine = 0x806ca45 <ast_unregister_thread>, __arg = 0xb65ffba0, __canceltype = 0, __prev = 0x0}
            ret = (void *) 0x0
            a = {start_routine = 0xb676df3a <do_monitor>, data = 0x0, name = 0xb6600fb8 "do_monitor", ' ' <repeats 11 times>, "started at [16833] chan_sip.c restart_monitor()"}
    #4  0xb7eaa3cc in start_thread () from /lib/tls/libpthread.so.0
    No symbol table info available.
    #5  0xb71ffc3e in clone () from /lib/tls/libc.so.6
    No symbol table info available.
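The crash3 frame quoted earlier in this thread shows ast_poll_channel_del entered with chan1=0x0 and faulting on chan1->fds[i], which is a NULL pointer dereference. The following is a minimal, self-contained C sketch of that loop pattern with a NULL guard added. It is not Asterisk's code: struct sketch_channel, SKETCH_MAX_FDS and sketch_poll_channel_del are hypothetical names, and the body only counts descriptors instead of removing them from a poll set.

```c
#include <stddef.h>

/* Hypothetical slot count; not Asterisk's real AST_MAX_FDS value. */
#define SKETCH_MAX_FDS 10

/* Hypothetical, heavily reduced stand-in for a channel: only the fd slots. */
struct sketch_channel {
    int fds[SKETCH_MAX_FDS]; /* -1 marks an unused slot */
};

/* Walks chan1's fd slots like the loop at channel.c:1349, but returns early
 * when either channel pointer is NULL instead of dereferencing it.
 * Returns the number of in-use descriptors that would be removed from
 * chan0's poll set, or -1 when called with a NULL channel. */
int sketch_poll_channel_del(const struct sketch_channel *chan0,
                            const struct sketch_channel *chan1)
{
    int removed = 0;

    if (chan0 == NULL || chan1 == NULL) {
        return -1; /* guard: the crash3 frame had chan1 == 0x0 */
    }

    for (int i = 0; i < SKETCH_MAX_FDS; i++) {
        if (chan1->fds[i] == -1) {
            continue; /* unused slot, nothing to remove */
        }
        /* A real implementation would remove chan1->fds[i] from chan0's
         * poll set here; this sketch only counts it. */
        removed++;
    }
    return removed;
}
```

A guard like this would only turn the segfault into a no-op; it does not explain why a caller passed a NULL peer channel in the first place, which this thread never established.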
By: Olle Johansson (oej) 2007-11-18 11:48:40.000-0600

Please use informative summary lines, thank you.

By: Private Name (falves11) 2007-11-19 10:14:13.000-0600

These are the ulimits corresponding to crash11, just in case.

    [root@Sipserver tmp]# ulimit -a
    core file size          (blocks, -c) unlimited
    data seg size           (kbytes, -d) unlimited
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 400000
    max locked memory       (kbytes, -l) 32
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 400000
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    stack size              (kbytes, -s) 10240
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 137216
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

By: Tilghman Lesher (tilghman) 2007-12-14 17:20:22.000-0600

Please apply the patch from ASTERISK-10897 and attempt to reproduce this issue.

By: Private Name (falves11) 2007-12-14 19:29:43.000-0600

How do I apply the patch? Can you please send me detailed instructions? Sorry for my lack of experience with patching.

By: Tilghman Lesher (tilghman) 2007-12-14 19:55:31.000-0600

    cd /usr/src/asterisk
    patch -p0 < /path/to/patch

By: Private Name (falves11) 2007-12-17 09:50:11.000-0600

I am setting up a test bed today. Tomorrow I will be able to reproduce it.

By: Private Name (falves11) 2008-01-06 14:10:35.000-0600

I applied the patch to the current trunk version (96645) and it makes no difference. However, I compiled it with all the debugging options enabled and I get thousands of lines like this:

    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 75 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 5 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 5 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 70 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 40 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 80 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 85 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 25 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 10 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 65 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 75 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 90 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 30 sec for mutex '&(&class->odbc_obj)->lock'?
    res_odbc.c line 451 (ast_odbc_request_obj): Deadlock? waited 30 sec for mutex '&(&class->odbc_obj)->lock'?

By: Tilghman Lesher (tilghman) 2008-01-07 12:01:57.000-0600

NEVER EVER EVER EVER EVER compile with all debug flags set unless you are prepared to deal with the consequences. In this case, you enabled DETECT_DEADLOCKS, which you should not have done.

By: Tilghman Lesher (tilghman) 2008-01-15 20:36:09.000-0600

No response from reporter.