Summary: | ASTERISK-14217: [patch] asterisk lock in sipsock_read for several seconds and drop sip packets | ||
Reporter: | Stefan Schmidt (schmidts) | Labels: | |
Date Opened: | 2009-05-28 09:47:36 | Date Closed: | 2009-08-10 09:29:51 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) sip_inf_loop.patch | |
Description: | i have around 1600 sip peers registered on this server with 2 sip trunks to other asterisk server and after upgrading to 1.4 from 1.2 i had this problem that the asterisk freeze for several seconds and drop all sip packets. with netstat -su i see that the amount of packet receive errors increase by 800 pps while asterisk locks. normal callflow actions like mysql and so on runs without any problem and also active calls are not disconnected but new calls cant be opened. i ve compiled with debug thread and dont optimize and see with core show lock these lock helds: ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <file> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: 1092675920 (do_monitor started at [16595] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 16264 sipsock_read &netlock 0x7f6d0e9d0700 (1) === ------------------------------------------------------------------- === ======================================================================= and when it run again it show this: ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <file> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: 1092675920 (do_monitor started at [16595] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 16264 sipsock_read &netlock 0x7f6d0e9d0700 (1) === ---> Lock #1 (chan_sip.c): MUTEX 4728 find_call &p->lock 0x374ffd0 (1) === ---> Lock #2 (sched.c): MUTEX 220 ast_sched_add_variable &con->lock 0x1b2eee0 (1) === ------------------------------------------------------------------- === ======================================================================= the system is a HP proliant DL 380 G5 2xXeon 2,3 Ghz with 6 GB Ram OS: Debian Lenny v. 5.0.1 AMD 64 Bit Kernel 2.6.26-1 Asterisk version: 1.4.25 zaptel 1.4.12.1 is only need for meetme cause there is no isdn hardware in this server. libpri 1.4.10 i´ve allready tried to increase the linux udp buffer but without sucess. | ||
Comments: | By: David Brillert (aragon) 2009-05-29 10:13:39 I'm also seeing this issue in 1.4.25 === Thread ID: 3083189136 (do_monitor started at [16743] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 16412 sipsock_read &netlock 0xe2a780 (1) === ---> Lock #1 (chan_sip.c): MUTEX 4730 find_call &p->lock 0xb4828f30 (1) === ------------------------------------------------------------------- By: Joel Vandal (jvandal) 2009-05-29 10:14:35 Hi, I have the same issues when doing load test using 'pbx-test' with about 50 SIP extensions that do ~50 calls per seconds. Have this problem only with latest 1.4 SVN but only since I updated yesterday so maybe something changed recently, maybe related to r197588. Hmm, just curious, how to see a specific change ? ex. svn diff -r 197588 ? By: Joel Vandal (jvandal) 2009-05-29 10:28:14 If I check on my server, the working revision for is r197562 but fail with r197588 -ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197562M $") +ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197588M $") Maybe related to the yesterday changes on chan_sip and rtp for alternate source address ? By: Joel Vandal (jvandal) 2009-05-31 07:43:22 Related to 15143 ? locks look similar. By: Stefan Schmidt (schmidts) 2009-05-31 08:15:50 maybe but i dont have to kill the asterisk it just handle no sip packets for several seconds (10 to 30). By: Joshua C. Colp (jcolp) 2009-06-01 08:54:00 What is the configuration of the SIP peers? are you using qualify on them? Do you have any console output at the time the issue is happening? By: Stefan Schmidt (schmidts) 2009-06-01 13:37:19 sip peers are no realtime and qualify is just yes. console output is mostly just nothing, sometimes when a call is running an dialplan issue this runs normal through the console. mostly when this happens a huge amout of peers gets unreachable, but thats no suprise cause there is no sip handling at this time. something ive just found today is the following message shortly before this happens: Jun 1 19:51:54 multi-01 asterisk[28814]: WARNING[28834]: chan_sip.c:7155 in check_via: '[91.113.139.202]' is not a valid host this message comes several times a day, and mostly before the freeze is. if tried to check with sipsak -vvvv -s to the this address and on of the 16 clients which is behind this ip, and i just get back an received ICMP message: Ignoring (ICMP Data is not a UDP Packet) maybe this is important? By: David Brillert (aragon) 2009-06-17 23:27:56 Possibly related to this revision? I only began seeing this issue in 1.4.25, it did not occur in 1.4.24.1 2009-05-28 15:27 +0000 [r197588] Mark Michelson <mmichelson@digium.com> * main/rtp.c, channels/chan_sip.c, include/asterisk/rtp.h: Allow for media to arrive from an alternate source when responding to a reinvite with 491. When we receive a SIP reinvite, it is possible that we may not be able to process the reinvite immediately since we have also sent a reinvite out ourselves. The problem is that whoever sent us the reinvite may have also sent a reinvite out to another party, and that reinvite may have succeeded. As a result, even though we are not going to accept the reinvite we just received, it is important for us to not have problems if we suddenly start receiving RTP from a new source. The fix for this is to grab the media source information from the SDP of the reinvite that we receive. This information is passed to the RTP layer so that it will know about the alternate source for media. Review: https://reviewboard.asterisk.org/r/252 [^] I must concur with jvandal on his note https://issues.asterisk.org/view.php?id=15213#105717 jvandal (reporter) 2009-05-29 10:28 If I check on my server, the working revision for is r197562 but fail with r197588 -ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197562M $") +ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197588M $") By: Stefan Schmidt (schmidts) 2009-06-18 02:13:00 We have reinvite=no so i dont think it depends on a reinvite. What i´ve seen is that we also had this problem with check_via after we downgrade to version 1.2.31 again and also had one freeze like we had with 1.4. After we had blocked the ip with iptables there had been no hanging calls since 2 weeks and also no freezes with version 1.2. We will try an upgrade to 1.4 again in some time (after our customer calms down a little bit ;) so i have no chance to test it by now. By: Amilcar S Silvestre (amilcar) 2009-06-18 08:25:08 schmidts, what exactly did you block? Did you just block the IP (91.113.139.202) that chan_sip has warned to be invalid host, and this possible solved the problem? By: Stefan Schmidt (schmidts) 2009-06-18 08:33:49 yes, in first step we blocked this ip. after the customer had checked his firewall we reopen it, and after this (2 weeks ago) not even one check_via error appears. We had tried to rebuild this scenario with sipsak and Options package but couldn't see what could cause this check_via warning. By: Mark Michelson (mmichelson) 2009-06-18 12:50:18 Discussion on issue 15345 leads me to believe that the problem is due to an infinite loop under certain circumstances. Please try applying sip_inf_loop.patch and see if you still experience the problem. Thanks! By: Mark Michelson (mmichelson) 2009-06-22 09:28:43 Further discussion on issue ASTERISK-14328 makes me believe the patch fixes the problem. I am going to close this issue when I commit the patch. Should you find that the problem is not fixed for you, please feel free to reopen. By: Digium Subversion (svnbot) 2009-06-22 09:34:06 Repository: asterisk Revision: 202336 U branches/1.4/channels/chan_sip.c ------------------------------------------------------------------------ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202336 By: Digium Subversion (svnbot) 2009-06-22 09:35:10 Repository: asterisk Revision: 202337 _U trunk/ U trunk/channels/chan_sip.c ------------------------------------------------------------------------ r202337 | mmichelson | 2009-06-22 09:35:10 -0500 (Mon, 22 Jun 2009) | 31 lines Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202337 By: Digium Subversion (svnbot) 2009-06-22 09:35:36 Repository: asterisk Revision: 202338 _U branches/1.6.0/ U branches/1.6.0/channels/chan_sip.c ------------------------------------------------------------------------ r202338 | mmichelson | 2009-06-22 09:35:35 -0500 (Mon, 22 Jun 2009) | 38 lines Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202338 By: Digium Subversion (svnbot) 2009-06-22 09:36:00 Repository: asterisk Revision: 202339 _U branches/1.6.1/ U branches/1.6.1/channels/chan_sip.c ------------------------------------------------------------------------ r202339 | mmichelson | 2009-06-22 09:36:00 -0500 (Mon, 22 Jun 2009) | 38 lines Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202339 By: Digium Subversion (svnbot) 2009-06-22 09:36:27 Repository: asterisk Revision: 202340 _U branches/1.6.2/ U branches/1.6.2/channels/chan_sip.c ------------------------------------------------------------------------ r202340 | mmichelson | 2009-06-22 09:36:26 -0500 (Mon, 22 Jun 2009) | 38 lines Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202340 |