Summary: | ASTERISK-00051: Apparent bug in chan_sip.c register timer routines causes crash | ||
Reporter: | John Todd (jtodd) | Labels: | |
Date Opened: | 2003-08-05 18:43:35 | Date Closed: | 2011-06-07 14:00:32 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | I have 7 "register" lines in my sip.conf, and I have half a dozen or so SIP UA's pointed at my Asterisk server as well. I came home today and found Asterisk dead, with a corefile leftover. Anyone who wishes to have access to the corefile may mail me (jtodd@loligo.com) within a week or so, and I'll send it along on a website somewhere. ****** ADDITIONAL INFORMATION ****** [foo] # gdb asterisk /tmp/core.1071 . . . Reading symbols from /usr/lib/asterisk/modules/format_h263.so...done. Loaded symbols for /usr/lib/asterisk/modules/format_h263.so #0 sip_reg_timeout (data=0x815c898) at chan_sip.c:2747 2747 p->registry = NULL; (gdb) bt #0 sip_reg_timeout (data=0x815c898) at chan_sip.c:2747 #1 0x080519f5 in ast_sched_runq (con=0x80d96d8) at sched.c:355 #2 0x434555ae in do_monitor (data=0x0) at chan_sip.c:4993 #3 0x40033b9c in pthread_start_thread (arg=0x43c5ebe0) at manager.c:274 (gdb) bt full #0 sip_reg_timeout (data=0x815c898) at chan_sip.c:2747 r = (struct sip_registry *) 0x815c898 #1 0x080519f5 in ast_sched_runq (con=0x80d96d8) at sched.c:355 tv = {tv_sec = 1060120808, tv_usec = 675337} x = 0 #2 0x434555ae in do_monitor (data=0x0) at chan_sip.c:4993 res = 135108312 sip = (struct sip_pvt *) 0x80d96d8 peer = (struct sip_peer *) 0x80d96d8 t = 1060120807 fastrestart = 0 lastpeernum = -1 curpeernum = 15 #3 0x40033b9c in pthread_start_thread (arg=0x43c5ebe0) at manager.c:274 self = 0x43c5ebe0 request = {req_thread = 0x0, req_kind = REQ_CREATE, req_args = {create = {attr = 0x0, fn = 0, arg = 0x0, mask = {__val = {0 <repeats 32 times>}}}, free = {thread_id = 0}, exit = { code = 0}, post = 0x0}} outcome = (void *) 0xd060000 (gdb) ms1*CLI> show version Asterisk CVS-08/04/03-17:27:09 built by root@foo.somewhere.com on a i686 running Linux ms1*CLI> | ||
Comments: | By: Mark Spencer (markster) 2003-08-05 22:19:20 Does running it under valgrind provide any more information? By: John Todd (jtodd) 2003-08-06 15:26:22 can't say I'm familiar with how to get valgrind to work. Can you give me some more details and syntaxes, and I'll attempt to try. By: John Todd (jtodd) 2003-08-11 17:22:57 Again, I'd be happy to run under valgrind if I could get an example. By: Mark Spencer (markster) 2003-08-24 14:53:03 in /usr/src/asterisk: make clean ; make valgrind valgrind --gdb-attach=yes ./asterisk -vvvgc Just answer "yes" to attach *if* it looks like something sip related, or find me to analyze. By: John Todd (jtodd) 2003-09-04 03:03:46 I haven't had much time to get valgrind installed on my ancient 7.2 system; all the RPMs I seem to try have a million other dependencies. Still working on it... In other news, this is what I see after letting Asterisk CVS-08/29/03-17:14:03 run for about 24 hours. Obviously, I don't have 86 live calls - these seem to be some sort of "leftover" that isn't being cleared correctly in the case of (perhaps) no reply on a SIP REGISTER request. I am not 100% certain that the symptoms I describe below are related to my corefile output above, as I have not had a core dump in some weeks due to the problem. However, my * server locks up when it reaches 499 SIP channels in the "sip show channels" output, and my system runs out of file descriptors. Reload does not work, inbound/outbound calls do not work. Asterisk must then be shut down and restarted to clear the channels. soft hangup doesn't show the channels, so I can't even kill them off that way. ms1*CLI> sip show channels Peer User/ANR Call ID Seq (Tx/Rx) Lag Jitter Format 213.137.73.178 1415874724 0ee74ecc01b 00684/00000 00000ms 0000ms UNKN 204.61.208.90 174*627 51674d8f2dd 00103/00104 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00544/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00543/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00535/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00468/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00467/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00466/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00465/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00464/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00463/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00462/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00461/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00460/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00459/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00390/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00192/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00191/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00190/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00189/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00188/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00187/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00186/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00185/00000 00000ms 0000ms UNKN 192.246.69.223 18438 467b13af3cc 00171/00000 00000ms 0000ms UNKN 195.37.77.101 (None) 73d5fc26-0@ 00101/00011 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01206/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01203/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01070/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01069/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01061/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01059/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01057/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01056/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 01055/00000 00000ms 0000ms UNKN 213.137.73.178 1415874724 33eb4a4561d 00996/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00998/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00997/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00996/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00995/00000 00000ms 0000ms UNKN 213.137.73.178 1415874724 33eb4a4561d 00937/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00980/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00775/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00774/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00773/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00772/00000 00000ms 0000ms UNKN 213.137.73.178 1415874724 33eb4a4561d 00622/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00551/00000 00000ms 0000ms UNKN 204.61.208.90 3856*548@i 48d53c29681 00317/00000 00000ms 0000ms UNKN 204.61.208.90 0@inoc-dba 3bd028d64f1 00317/00000 00000ms 0000ms UNKN 204.61.208.90 11893@inoc 62b19b6e57f 00317/00000 00000ms 0000ms UNKN 204.61.208.90 42*548@ino 3ef8bd0c7b1 00317/00000 00000ms 0000ms UNKN 195.37.77.101 jtodd 56b5ddbb0dd 00525/00000 00000ms 0000ms UNKN 204.61.208.90 3856*548@i 48d53c29681 00316/00000 00000ms 0000ms UNKN 204.61.208.90 0@inoc-dba 3bd028d64f1 00316/00000 00000ms 0000ms UNKN 204.61.208.90 11893@inoc 62b19b6e57f 00316/00000 00000ms 0000ms UNKN 204.61.208.90 42*548@ino 3ef8bd0c7b1 00316/00000 00000ms 0000ms UNKN 195.37.77.101 jtodd 56b5ddbb0dd 00524/00000 00000ms 0000ms UNKN 204.61.208.90 3856*548@i 48d53c29681 00170/00000 00000ms 0000ms UNKN 204.61.208.90 0@inoc-dba 3bd028d64f1 00170/00000 00000ms 0000ms UNKN 204.61.208.90 11893@inoc 62b19b6e57f 00170/00000 00000ms 0000ms UNKN 204.61.208.90 42*548@ino 3ef8bd0c7b1 00170/00000 00000ms 0000ms UNKN 204.61.208.90 3856*548@i 48d53c29681 00169/00000 00000ms 0000ms UNKN 204.61.208.90 0@inoc-dba 3bd028d64f1 00169/00000 00000ms 0000ms UNKN 204.61.208.90 11893@inoc 62b19b6e57f 00169/00000 00000ms 0000ms UNKN 204.61.208.90 42*548@ino 3ef8bd0c7b1 00169/00000 00000ms 0000ms UNKN 204.61.208.90 3856*548@i 48d53c29681 00168/00000 00000ms 0000ms UNKN 204.61.208.90 0@inoc-dba 3bd028d64f1 00168/00000 00000ms 0000ms UNKN 204.61.208.90 11893@inoc 62b19b6e57f 00168/00000 00000ms 0000ms UNKN 204.61.208.90 42*548@ino 3ef8bd0c7b1 00168/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00232/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00229/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00225/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00223/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00222/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00218/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00209/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00208/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00207/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00201/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00198/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00197/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00196/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00195/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00193/00000 00000ms 0000ms UNKN 192.246.69.223 18438 4aaaaeb419d 00187/00000 00000ms 0000ms UNKN 86 active SIP channel(s) ms1*CLI> By: John Todd (jtodd) 2003-09-04 03:07:08 This may be similar or identical to the bug reported in http://bugs.digium.com/bug_view_page.php?bug_id=0000188 - however, there are some differences so I won't close this one just yet. By: Brian West (bkw918) 2003-09-05 08:23:33 I see the same thing happening here also with my FWD registration. By: tekp (tekp) 2003-09-05 14:26:28 I have the exact same thing where sip calls stay "leftover" in the system even after they are disconnected. I don't know exactly how many it takes but just like jtodd my asterisk "locks" and no calls come in or out, sip clients can't register, and I can't reload asterisk. I have to stop asterisk. This happens on 3 different boxes and four different CVS versions all after 07/18/03. A version of asterisk I use from 07/18/03 does not lock up even though some of the sip calls stay "leftover". I use a cisco 3640 for my sip voice gateway and cisco sip ip phones By: paul_cheng (paul_cheng) 2003-09-07 14:33:51 I also had these "artifacts" of left over calls since June. After latest CVS from last week and after I changed the SIP default register interval to something higher than 240 (like 3500), the problem seems to have gone away--at least with FWD. I had register statements also with iConnect, but didn't have as many artifacts as with Freeworld. There are definitely still instances where after a call has ended, Asterisk does not close the SIP channel, but soft hangup doesn't work. I'll try to be more specific in my next post after I've had a chance to see what happened over the weekend. By: John Todd (jtodd) 2003-09-07 14:45:52 Changing the default register interval to a higher number just masks this problem, since there are fewer chances for REGISTER requests to go awry. I prefer to keep that number lower, so as to catch "broken" network and server configurations more quickly. Many of the remote servers I register with have failover paths based on my registration status, and a false registration status is worse than no registration status, thus I keep the numbers down around 120 seconds. By: John Todd (jtodd) 2003-09-08 15:14:16 I closed out bug 188, which was a duplicate report of this error. See http://bugs.digium.com/bug_update_page.php?bug_id=0000188 for some minor additional notes on symptoms. By: skandranon (skandranon) 2003-09-09 17:07:57 This is a fairly serious bug for us. Is there any progress since a month ago that we can test? I've looked at the code, but nothing jumps out at me. That's not shocking since my looking was perhaps 10 minutes. By: oliver (oliver) 2003-09-09 18:32:50 Hey skandranon, Same here. After it caused one of our production-systems to grind to a halt after one of it's SIP-peers was unresponsive for a while, I decided to remove all SIP-registrations from the config, but that's not an ideal solution, ofcourse. A few weeks ago, Mark told me on IRC that he could reproduce this problem, so it's either very hard to track down or he didn't get around to taking a good look at it yet. Anyways, I'm willing to help with debugging in any way I can. By: John Todd (jtodd) 2003-09-11 13:55:12 another duplicate bug for this problem closed out. For details, see: http://bugs.digium.com/bug_view_page.php?bug_id=0000241 By: John Todd (jtodd) 2003-09-15 14:07:19 From: Jayson Vantuyl <kagato@souja.net> To: asterisk-dev@lists.digium.com Subject: Re: [Asterisk-Dev] clearing 'stuck' channels Reply-To: asterisk-dev@lists.digium.com Date: Sun, 14 Sep 2003 19:38:11 -0500 On Thu, Aug 28, 2003 at 01:59:47PM -0700, John Todd wrote: > Did you find any solutions to this? I've recently discovered the > same problem with my server here. I had 499 "sip show channels" > entries, and then Asterisk refused to start up any new sessions due > to "Too many files open". In fact, 'reload' worked but didn't work, > due to the inability of the system to open any of the config files. > I had to shut Asterisk down to get things working again. My CVS > version was four days old. You may be able to temporarily avoid this problem (i.e. thus allowing you to plan an Asterisk restart every midnight) by cranking up your max-files in the kernel. Echo a larger value into /proc/sys/fs/file-max. Jayson By: skandranon (skandranon) 2003-09-23 19:30:17 Here's a sniplet of ls -l of the /proc thing: lrwx------ 1 root root 64 Sep 23 16:58 990 -> socket:[4913837] lrwx------ 1 root root 64 Sep 23 16:58 991 -> socket:[4913838] lrwx------ 1 root root 64 Sep 23 16:58 992 -> socket:[4913840] lrwx------ 1 root root 64 Sep 23 16:58 993 -> socket:[4913841] lrwx------ 1 root root 64 Sep 23 16:58 994 -> socket:[4913843] lrwx------ 1 root root 64 Sep 23 16:58 995 -> socket:[4913844] lrwx------ 1 root root 64 Sep 23 16:58 996 -> socket:[4913864] lrwx------ 1 root root 64 Sep 23 16:58 997 -> socket:[4913865] lrwx------ 1 root root 64 Sep 23 16:58 998 -> socket:[4913867] lrwx------ 1 root root 64 Sep 23 16:58 999 -> socket:[4913868] Nearly all 2048 FDs were sockets. I suspect an error condition in a SIP error path. --Michael By: Mark Spencer (markster) 2003-09-30 23:55:03 I believe I have fixes for both problems in CVS. Please try and let me know. By: John Todd (jtodd) 2003-10-01 04:21:05 seems to work for me - I no longer see sockets piling up due to SIP registration timeouts. I artificially induced bad registrations by giving both bogus (no answer) and invalid (ICMP invalid port) answers, and the system now seems to do the right thing. By: oliver (oliver) 2003-10-01 17:42:24 Same over here. Seems the fixes did it. By: Brian West (bkw918) 2003-10-01 18:12:51 nope sure doesn't.. signup for an FWD account.. mine fills right up. By: Brian West (bkw918) 2003-10-01 18:13:22 asterisk*CLI> sip show channels Peer User/ANR Call ID Seq (Tx/Rx) Lag Jitter Format 192.246.69.223 (None) 22C9719A-6D 00101/31104 00000ms 0000ms UNKN 192.246.69.223 (None) 32E81BAD-F4 00101/61778 00000ms 0000ms UNKN 192.246.69.223 (None) 333880C0-F4 00101/38681 00000ms 0000ms UNKN 192.246.69.223 (None) FF3E76F9-F4 00101/40697 00000ms 0000ms UNKN 192.246.69.223 (None) 250C0BA0-F4 00101/06944 00000ms 0000ms UNKN 5 active SIP channel(s) asterisk*CLI> show version Asterisk CVS-10/01/03-12:47:32 built by root@asterisk on a i686 running Linux By: Brian West (bkw918) 2003-10-01 18:15:25 going to cvs and try again.. seems alot of updates. By: John Todd (jtodd) 2003-10-01 21:09:58 Well, I don't know why your system is acting differently than mine, but I have several bogus SIP register statements to non-existant IP addresses (no answer) or to IP addresses that give back bad replies. It still appears that this patch is working correctly. Also: I have no problems registering with FWD at the IP addres you reference. Perhaps this is some different type of problem? There are intervals where I see something in "sip show channels" but that is only during an "active" registration attempt. In those instances, I see my FWD username in the User/ANR column, but your example shows "(None)" - is that a clue to perhaps some different problem? ms1*CLI> sip show channels Peer User/ANR Call ID Seq (Tx/Rx) Lag Jitter Format 0 active SIP channel(s) ms1*CLI> show version Asterisk CVS-10/02/03-01:41:50 built by root@somewhere.com on a i686 running Linux ms1*CLI> By: Brian West (bkw918) 2003-10-01 22:28:38 I answer the 55555 line for FWD sometimes.. and today I cvsed about noon and recompiled so we shall see how it goes .. I will let it go for the next few hours and see how it does. By: Brian West (bkw918) 2003-10-02 10:23:03 Ok seems to be gone now.... :P By: John Todd (jtodd) 2003-10-02 14:27:03 We have a quorum. Ticket resolved. |