[Home]

Summary:ASTERISK-00051: Apparent bug in chan_sip.c register timer routines causes crash
Reporter:John Todd (jtodd)Labels:
Date Opened:2003-08-05 18:43:35Date Closed:2011-06-07 14:00:32
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:I have 7 "register" lines in my sip.conf, and I have half a dozen or so SIP UA's pointed at my Asterisk server as well.  I came home today and found Asterisk dead, with a corefile leftover.  

Anyone who wishes to have access to the corefile may mail me (jtodd@loligo.com) within a week or so, and I'll send it along on a website somewhere.

****** ADDITIONAL INFORMATION ******

[foo] # gdb asterisk /tmp/core.1071
.
.
.
Reading symbols from /usr/lib/asterisk/modules/format_h263.so...done.
Loaded symbols for /usr/lib/asterisk/modules/format_h263.so
#0  sip_reg_timeout (data=0x815c898) at chan_sip.c:2747
2747                    p->registry = NULL;
(gdb) bt
#0  sip_reg_timeout (data=0x815c898) at chan_sip.c:2747
#1  0x080519f5 in ast_sched_runq (con=0x80d96d8) at sched.c:355
#2  0x434555ae in do_monitor (data=0x0) at chan_sip.c:4993
#3  0x40033b9c in pthread_start_thread (arg=0x43c5ebe0) at manager.c:274
(gdb) bt full
#0  sip_reg_timeout (data=0x815c898) at chan_sip.c:2747
       r = (struct sip_registry *) 0x815c898
#1  0x080519f5 in ast_sched_runq (con=0x80d96d8) at sched.c:355
       tv = {tv_sec = 1060120808, tv_usec = 675337}
       x = 0
#2  0x434555ae in do_monitor (data=0x0) at chan_sip.c:4993
       res = 135108312
       sip = (struct sip_pvt *) 0x80d96d8
       peer = (struct sip_peer *) 0x80d96d8
       t = 1060120807
       fastrestart = 0
       lastpeernum = -1
       curpeernum = 15
#3  0x40033b9c in pthread_start_thread (arg=0x43c5ebe0) at manager.c:274
       self = 0x43c5ebe0
       request = {req_thread = 0x0, req_kind = REQ_CREATE, req_args = {create = {attr = 0x0,
     fn = 0, arg = 0x0, mask = {__val = {0 <repeats 32 times>}}}, free = {thread_id = 0}, exit = {
     code = 0}, post = 0x0}}
       outcome = (void *) 0xd060000
(gdb)


ms1*CLI> show version
Asterisk CVS-08/04/03-17:27:09 built by root@foo.somewhere.com on a i686 running Linux
ms1*CLI>
Comments:By: Mark Spencer (markster) 2003-08-05 22:19:20

Does running it under valgrind provide any more information?

By: John Todd (jtodd) 2003-08-06 15:26:22

can't say I'm familiar with how to get valgrind to work.  Can you give me some more details and syntaxes, and I'll attempt to try.

By: John Todd (jtodd) 2003-08-11 17:22:57

Again, I'd be happy to run under valgrind if I could get an example.

By: Mark Spencer (markster) 2003-08-24 14:53:03

in /usr/src/asterisk:
make clean ; make valgrind
valgrind --gdb-attach=yes ./asterisk -vvvgc

Just answer "yes" to attach *if* it looks like something sip related, or find me to analyze.

By: John Todd (jtodd) 2003-09-04 03:03:46

I haven't had much time to get valgrind installed on my ancient 7.2 system; all the RPMs I seem to try have a million other dependencies.  Still working on it...

In other news, this is what I see after letting Asterisk CVS-08/29/03-17:14:03 run for about 24 hours.  Obviously, I don't have 86 live calls - these seem to be some sort of "leftover" that isn't being cleared correctly in the case of (perhaps) no reply on a SIP REGISTER request.  I am not 100% certain that the symptoms I describe below are related to my corefile output above, as I have not had a core dump in some weeks due to the problem.   However, my * server locks up when it reaches 499 SIP channels in the "sip show channels" output, and my system runs out of file descriptors.  Reload does not work, inbound/outbound calls do not work.  Asterisk must then be shut down and restarted to clear the channels.  soft hangup doesn't show the channels, so I can't even kill them off that way.

ms1*CLI> sip show channels
Peer             User/ANR    Call ID      Seq (Tx/Rx)  Lag      Jitter  Format
213.137.73.178   1415874724  0ee74ecc01b  00684/00000  00000ms  0000ms  UNKN  
204.61.208.90    174*627     51674d8f2dd  00103/00104  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00544/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00543/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00535/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00468/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00467/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00466/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00465/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00464/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00463/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00462/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00461/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00460/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00459/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00390/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00192/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00191/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00190/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00189/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00188/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00187/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00186/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00185/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       467b13af3cc  00171/00000  00000ms  0000ms  UNKN  
195.37.77.101    (None)      73d5fc26-0@  00101/00011  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01206/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01203/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01070/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01069/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01061/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01059/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01057/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01056/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  01055/00000  00000ms  0000ms  UNKN  
213.137.73.178   1415874724  33eb4a4561d  00996/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00998/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00997/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00996/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00995/00000  00000ms  0000ms  UNKN  
213.137.73.178   1415874724  33eb4a4561d  00937/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00980/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00775/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00774/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00773/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00772/00000  00000ms  0000ms  UNKN  
213.137.73.178   1415874724  33eb4a4561d  00622/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00551/00000  00000ms  0000ms  UNKN  
204.61.208.90    3856*548@i  48d53c29681  00317/00000  00000ms  0000ms  UNKN  
204.61.208.90    0@inoc-dba  3bd028d64f1  00317/00000  00000ms  0000ms  UNKN  
204.61.208.90    11893@inoc  62b19b6e57f  00317/00000  00000ms  0000ms  UNKN  
204.61.208.90    42*548@ino  3ef8bd0c7b1  00317/00000  00000ms  0000ms  UNKN  
195.37.77.101    jtodd       56b5ddbb0dd  00525/00000  00000ms  0000ms  UNKN  
204.61.208.90    3856*548@i  48d53c29681  00316/00000  00000ms  0000ms  UNKN  
204.61.208.90    0@inoc-dba  3bd028d64f1  00316/00000  00000ms  0000ms  UNKN  
204.61.208.90    11893@inoc  62b19b6e57f  00316/00000  00000ms  0000ms  UNKN  
204.61.208.90    42*548@ino  3ef8bd0c7b1  00316/00000  00000ms  0000ms  UNKN  
195.37.77.101    jtodd       56b5ddbb0dd  00524/00000  00000ms  0000ms  UNKN  
204.61.208.90    3856*548@i  48d53c29681  00170/00000  00000ms  0000ms  UNKN  
204.61.208.90    0@inoc-dba  3bd028d64f1  00170/00000  00000ms  0000ms  UNKN  
204.61.208.90    11893@inoc  62b19b6e57f  00170/00000  00000ms  0000ms  UNKN  
204.61.208.90    42*548@ino  3ef8bd0c7b1  00170/00000  00000ms  0000ms  UNKN  
204.61.208.90    3856*548@i  48d53c29681  00169/00000  00000ms  0000ms  UNKN  
204.61.208.90    0@inoc-dba  3bd028d64f1  00169/00000  00000ms  0000ms  UNKN  
204.61.208.90    11893@inoc  62b19b6e57f  00169/00000  00000ms  0000ms  UNKN  
204.61.208.90    42*548@ino  3ef8bd0c7b1  00169/00000  00000ms  0000ms  UNKN  
204.61.208.90    3856*548@i  48d53c29681  00168/00000  00000ms  0000ms  UNKN  
204.61.208.90    0@inoc-dba  3bd028d64f1  00168/00000  00000ms  0000ms  UNKN  
204.61.208.90    11893@inoc  62b19b6e57f  00168/00000  00000ms  0000ms  UNKN  
204.61.208.90    42*548@ino  3ef8bd0c7b1  00168/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00232/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00229/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00225/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00223/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00222/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00218/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00209/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00208/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00207/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00201/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00198/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00197/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00196/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00195/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00193/00000  00000ms  0000ms  UNKN  
192.246.69.223   18438       4aaaaeb419d  00187/00000  00000ms  0000ms  UNKN  
86 active SIP channel(s)
ms1*CLI>

By: John Todd (jtodd) 2003-09-04 03:07:08

This may be similar or identical to the bug reported in http://bugs.digium.com/bug_view_page.php?bug_id=0000188   - however, there are some differences so I won't close this one just yet.

By: Brian West (bkw918) 2003-09-05 08:23:33

I see the same thing happening here also with my FWD registration.

By: tekp (tekp) 2003-09-05 14:26:28

I have the exact same thing where sip calls stay "leftover" in the system even after they are disconnected.  I don't know exactly how many it takes but just like jtodd my asterisk "locks" and no calls come in or out, sip clients can't register, and I can't reload asterisk.  I have to stop asterisk.  

This happens on 3 different boxes and four different CVS versions all after 07/18/03.  A version of asterisk I use from 07/18/03 does not lock up even though some of the sip calls stay "leftover".

I use a cisco 3640 for my sip voice gateway and cisco sip ip phones

By: paul_cheng (paul_cheng) 2003-09-07 14:33:51

I also had these "artifacts" of left over calls since June. After latest CVS from last week and after I changed the SIP default register interval to something higher than 240 (like 3500), the problem seems to have gone away--at least with FWD. I had register statements also with iConnect, but didn't have as many artifacts as with Freeworld.

There are definitely still instances where after a call has ended, Asterisk does not close the SIP channel, but soft hangup doesn't work. I'll try to be more specific in my next post after I've had a chance to see what happened over the weekend.

By: John Todd (jtodd) 2003-09-07 14:45:52

Changing the default register interval to a higher number just masks this problem, since there are fewer chances for REGISTER requests to go awry.  I prefer to keep that number lower, so as to catch "broken" network and server configurations more quickly.  Many of the remote servers I register with have failover paths based on my registration status, and a false registration status is worse than no registration status, thus I keep the numbers down around 120 seconds.

By: John Todd (jtodd) 2003-09-08 15:14:16

I closed out bug 188, which was a duplicate report of this error.  See http://bugs.digium.com/bug_update_page.php?bug_id=0000188  for some minor additional notes on symptoms.

By: skandranon (skandranon) 2003-09-09 17:07:57

This is a fairly serious bug for us.  Is there any progress since a month ago that we can test?

I've looked at the code, but nothing jumps out at me.  That's not shocking since my looking was perhaps 10 minutes.

By: oliver (oliver) 2003-09-09 18:32:50

Hey skandranon,

Same here.  After it caused one of our production-systems to grind to a halt after one of it's SIP-peers was unresponsive for a while, I decided to remove all SIP-registrations from the config, but that's not an ideal solution, ofcourse.

A few weeks ago, Mark told me on IRC that he could reproduce this problem, so it's either very hard to track down or he didn't get around to taking a good look at it yet.

Anyways, I'm willing to help with debugging in any way I can.

By: John Todd (jtodd) 2003-09-11 13:55:12

another duplicate bug for this problem closed out.  For details, see:  http://bugs.digium.com/bug_view_page.php?bug_id=0000241

By: John Todd (jtodd) 2003-09-15 14:07:19

From: Jayson Vantuyl <kagato@souja.net>
To: asterisk-dev@lists.digium.com
Subject: Re: [Asterisk-Dev] clearing 'stuck' channels
Reply-To: asterisk-dev@lists.digium.com
Date: Sun, 14 Sep 2003 19:38:11 -0500

On Thu, Aug 28, 2003 at 01:59:47PM -0700, John Todd wrote:
> Did you find any solutions to this?  I've recently discovered the
> same problem with my server here.  I had 499 "sip show channels"
> entries, and then Asterisk refused to start up any new sessions due
> to "Too many files open".   In fact, 'reload' worked but didn't work,
> due to the inability of the system to open any of the config files.
> I had to shut Asterisk down to get things working again.  My CVS
> version was four days old.
You may be able to temporarily avoid this problem (i.e. thus allowing
you to plan an Asterisk restart every midnight) by cranking up your
max-files in the kernel.  Echo a larger value into
/proc/sys/fs/file-max.

Jayson

By: skandranon (skandranon) 2003-09-23 19:30:17

Here's a sniplet of ls -l of the /proc thing:

lrwx------    1 root     root           64 Sep 23 16:58 990 -> socket:[4913837]
lrwx------    1 root     root           64 Sep 23 16:58 991 -> socket:[4913838]
lrwx------    1 root     root           64 Sep 23 16:58 992 -> socket:[4913840]
lrwx------    1 root     root           64 Sep 23 16:58 993 -> socket:[4913841]
lrwx------    1 root     root           64 Sep 23 16:58 994 -> socket:[4913843]
lrwx------    1 root     root           64 Sep 23 16:58 995 -> socket:[4913844]
lrwx------    1 root     root           64 Sep 23 16:58 996 -> socket:[4913864]
lrwx------    1 root     root           64 Sep 23 16:58 997 -> socket:[4913865]
lrwx------    1 root     root           64 Sep 23 16:58 998 -> socket:[4913867]
lrwx------    1 root     root           64 Sep 23 16:58 999 -> socket:[4913868]

Nearly all 2048 FDs were sockets.

I suspect an error condition in a SIP error path.

--Michael

By: Mark Spencer (markster) 2003-09-30 23:55:03

I believe I have fixes for both problems in CVS.  Please try and let me know.

By: John Todd (jtodd) 2003-10-01 04:21:05

seems to work for me - I no longer see sockets piling up due to SIP registration timeouts.  I artificially induced bad registrations by giving both bogus (no answer) and invalid (ICMP invalid port) answers, and the system now seems to do the right thing.

By: oliver (oliver) 2003-10-01 17:42:24

Same over here.  Seems the fixes did it.

By: Brian West (bkw918) 2003-10-01 18:12:51

nope sure doesn't.. signup for an FWD account.. mine fills right up.

By: Brian West (bkw918) 2003-10-01 18:13:22

asterisk*CLI> sip show channels
Peer             User/ANR    Call ID      Seq (Tx/Rx)  Lag      Jitter  Format
192.246.69.223   (None)      22C9719A-6D  00101/31104  00000ms  0000ms  UNKN
192.246.69.223   (None)      32E81BAD-F4  00101/61778  00000ms  0000ms  UNKN
192.246.69.223   (None)      333880C0-F4  00101/38681  00000ms  0000ms  UNKN
192.246.69.223   (None)      FF3E76F9-F4  00101/40697  00000ms  0000ms  UNKN
192.246.69.223   (None)      250C0BA0-F4  00101/06944  00000ms  0000ms  UNKN
5 active SIP channel(s)
asterisk*CLI> show version
Asterisk CVS-10/01/03-12:47:32 built by root@asterisk on a i686 running Linux

By: Brian West (bkw918) 2003-10-01 18:15:25

going to cvs and try again.. seems alot of updates.

By: John Todd (jtodd) 2003-10-01 21:09:58

Well, I don't know why your system is acting differently than mine, but I have several bogus SIP register statements to non-existant IP addresses (no answer) or to IP addresses that give back bad replies.  It still appears that this patch is working correctly.  Also: I have no problems registering with FWD at the IP addres you reference.  Perhaps this is some different type of problem?  There are intervals where I see something in "sip show channels" but that is only during an "active" registration attempt.  In those instances, I see my FWD username in the User/ANR column, but your example shows "(None)" - is that a clue to perhaps some different problem?

ms1*CLI> sip show channels
Peer             User/ANR    Call ID      Seq (Tx/Rx)  Lag      Jitter  Format
0 active SIP channel(s)
ms1*CLI> show version
Asterisk CVS-10/02/03-01:41:50 built by root@somewhere.com on a i686 running Linux
ms1*CLI>

By: Brian West (bkw918) 2003-10-01 22:28:38

I answer the 55555 line for FWD sometimes.. and today I cvsed about noon and recompiled so we shall see how it goes .. I will let it go for the next few hours and see how it does.

By: Brian West (bkw918) 2003-10-02 10:23:03

Ok seems to be gone now.... :P

By: John Todd (jtodd) 2003-10-02 14:27:03

We have a quorum.  Ticket resolved.