[Home]

Summary:ASTERISK-14217: [patch] asterisk lock in sipsock_read for several seconds and drop sip packets
Reporter:Stefan Schmidt (schmidts)Labels:
Date Opened:2009-05-28 09:47:36Date Closed:2009-08-10 09:29:51
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) sip_inf_loop.patch
Description:i have around 1600 sip peers registered on this server with 2 sip trunks to other asterisk server and after upgrading to 1.4 from 1.2 i had this problem that the asterisk freeze for several seconds and drop all sip packets.

with netstat -su i see that the amount of packet receive errors increase by 800 pps while asterisk locks.

normal callflow actions like mysql and so on runs without any problem and also active calls are not disconnected but new calls cant be opened.

i ve compiled with debug thread and dont optimize and see with core show lock these lock helds:

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 1092675920 (do_monitor           started at [16595] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 16264 sipsock_read &netlock 0x7f6d0e9d0700 (1)
=== -------------------------------------------------------------------
===
=======================================================================

and when it run again it show this:

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 1092675920 (do_monitor           started at [16595] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 16264 sipsock_read &netlock 0x7f6d0e9d0700 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 4728 find_call &p->lock 0x374ffd0 (1)
=== ---> Lock #2 (sched.c): MUTEX 220 ast_sched_add_variable &con->lock 0x1b2eee0 (1)
=== -------------------------------------------------------------------
===
=======================================================================

the system is a HP proliant DL 380 G5 2xXeon 2,3 Ghz with 6 GB Ram
OS: Debian Lenny v. 5.0.1 AMD 64 Bit Kernel 2.6.26-1
Asterisk version: 1.4.25
zaptel 1.4.12.1 is only need for meetme cause there is no isdn hardware in this server.
libpri 1.4.10

i´ve allready tried to increase the linux udp buffer but without sucess.
Comments:By: David Brillert (aragon) 2009-05-29 10:13:39

I'm also seeing this issue in 1.4.25

=== Thread ID: 3083189136 (do_monitor started at [16743] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 16412 sipsock_read &netlock 0xe2a780 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 4730 find_call &p->lock 0xb4828f30 (1)
=== -------------------------------------------------------------------

By: Joel Vandal (jvandal) 2009-05-29 10:14:35

Hi,

I have the same issues when doing load test using 'pbx-test' with about 50 SIP extensions that do ~50 calls per seconds.

Have this problem only with latest 1.4 SVN but only since I updated yesterday so maybe something changed recently, maybe related to r197588.

Hmm, just curious, how to see a specific change ? ex. svn diff -r 197588 ?

By: Joel Vandal (jvandal) 2009-05-29 10:28:14

If I check on my server, the working revision for is r197562 but fail with r197588

-ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197562M $")
+ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197588M $")

Maybe related to the yesterday changes on chan_sip and rtp for alternate source address ?

By: Joel Vandal (jvandal) 2009-05-31 07:43:22

Related to 15143 ? locks look similar.

By: Stefan Schmidt (schmidts) 2009-05-31 08:15:50

maybe but i dont have to kill the asterisk it just handle no sip packets for several seconds (10 to 30).

By: Joshua C. Colp (jcolp) 2009-06-01 08:54:00

What is the configuration of the SIP peers? are you using qualify on them? Do you have any console output at the time the issue is happening?

By: Stefan Schmidt (schmidts) 2009-06-01 13:37:19

sip peers are no realtime and qualify is just yes.

console output is mostly just nothing, sometimes when a call is running an dialplan issue this runs normal through the console.

mostly when this happens a huge amout of peers gets unreachable, but thats no suprise cause there is no sip handling at this time.

something ive just found today is the following message shortly before this happens:
Jun  1 19:51:54 multi-01 asterisk[28814]: WARNING[28834]: chan_sip.c:7155 in check_via: '[91.113.139.202]' is not a valid host

this message comes several times a day, and mostly before the freeze is. if tried to check with sipsak -vvvv -s to the this address and on of the 16 clients which is behind this ip, and i just get back an
received ICMP message: Ignoring (ICMP Data is not a UDP Packet)

maybe this is important?

By: David Brillert (aragon) 2009-06-17 23:27:56

Possibly related to this revision?
I only began seeing this issue in 1.4.25, it did not occur in 1.4.24.1

2009-05-28 15:27 +0000 [r197588] Mark Michelson <mmichelson@digium.com>

   * main/rtp.c, channels/chan_sip.c, include/asterisk/rtp.h: Allow
     for media to arrive from an alternate source when responding to a
     reinvite with 491. When we receive a SIP reinvite, it is possible
     that we may not be able to process the reinvite immediately since
     we have also sent a reinvite out ourselves. The problem is that
     whoever sent us the reinvite may have also sent a reinvite out to
     another party, and that reinvite may have succeeded. As a result,
     even though we are not going to accept the reinvite we just
     received, it is important for us to not have problems if we
     suddenly start receiving RTP from a new source. The fix for this
     is to grab the media source information from the SDP of the
     reinvite that we receive. This information is passed to the RTP
     layer so that it will know about the alternate source for media.
     Review: https://reviewboard.asterisk.org/r/252 [^]

I must concur with jvandal on his note
https://issues.asterisk.org/view.php?id=15213#105717
jvandal (reporter)
2009-05-29 10:28

If I check on my server, the working revision for is r197562 but fail with r197588

-ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197562M $")
+ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197588M $")



By: Stefan Schmidt (schmidts) 2009-06-18 02:13:00

We have reinvite=no so i dont think it depends on a reinvite.
What i´ve seen is that we also had this problem with check_via after we downgrade to version 1.2.31 again and also had one freeze like we had with 1.4.

After we had blocked the ip with iptables there had been no hanging calls since 2 weeks and also no freezes with version 1.2.

We will try an upgrade to 1.4 again in some time (after our customer calms down a little bit ;) so i have no chance to test it by now.

By: Amilcar S Silvestre (amilcar) 2009-06-18 08:25:08

schmidts, what exactly did you block? Did you just block the IP (91.113.139.202) that chan_sip has warned to be invalid host, and this possible solved the problem?

By: Stefan Schmidt (schmidts) 2009-06-18 08:33:49

yes, in first step we blocked this ip.

after the customer had checked his firewall we reopen it, and after this (2 weeks ago) not even one check_via error appears.

We had tried to rebuild this scenario with sipsak and Options package but couldn't see what could cause this check_via warning.

By: Mark Michelson (mmichelson) 2009-06-18 12:50:18

Discussion on issue 15345 leads me to believe that the problem is due to an infinite loop under certain circumstances. Please try applying sip_inf_loop.patch and see if you still experience the problem. Thanks!

By: Mark Michelson (mmichelson) 2009-06-22 09:28:43

Further discussion on issue ASTERISK-14328 makes me believe the patch fixes the problem. I am going to close this issue when I commit the patch. Should you find that the problem is not fixed for you, please feel free to reopen.

By: Digium Subversion (svnbot) 2009-06-22 09:34:06

Repository: asterisk
Revision: 202336

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines

Fix a possible infinite loop in SDP parsing during glare situation.

There was a while loop in get_ip_and_port_from_sdp which was controlled
by a call to get_sdp_iterate. The loop would exit either if what we were
searching for was found or if the return was NULL. The problem is that
get_sdp_iterate never returns NULL. This means that if what we were searching
for was not present, the loop would run infinitely. This modification of the
loop fixes the problem.

(closes issue ASTERISK-14217)
Reported by: schmidts

(closes issue ASTERISK-14332)
Reported by: samy

(closes issue ASTERISK-13569)
Reported by: pj

(closes issue ASTERISK-14328)
Reported by: aragon
Patches:
     sip_inf_loop.patch uploaded by mmichelson (license 60)
Tested by: aragon


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202336

By: Digium Subversion (svnbot) 2009-06-22 09:35:10

Repository: asterisk
Revision: 202337

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r202337 | mmichelson | 2009-06-22 09:35:10 -0500 (Mon, 22 Jun 2009) | 31 lines

Merged revisions 202336 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
 
 Fix a possible infinite loop in SDP parsing during glare situation.
 
 There was a while loop in get_ip_and_port_from_sdp which was controlled
 by a call to get_sdp_iterate. The loop would exit either if what we were
 searching for was found or if the return was NULL. The problem is that
 get_sdp_iterate never returns NULL. This means that if what we were searching
 for was not present, the loop would run infinitely. This modification of the
 loop fixes the problem.
 
 (closes issue ASTERISK-14217)
 Reported by: schmidts
 
 (closes issue ASTERISK-14332)
 Reported by: samy
 
 (closes issue ASTERISK-13569)
 Reported by: pj
 
 (closes issue ASTERISK-14328)
 Reported by: aragon
 Patches:
       sip_inf_loop.patch uploaded by mmichelson (license 60)
 Tested by: aragon
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202337

By: Digium Subversion (svnbot) 2009-06-22 09:35:36

Repository: asterisk
Revision: 202338

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_sip.c

------------------------------------------------------------------------
r202338 | mmichelson | 2009-06-22 09:35:35 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
 
 Merged revisions 202336 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
   
   Fix a possible infinite loop in SDP parsing during glare situation.
   
   There was a while loop in get_ip_and_port_from_sdp which was controlled
   by a call to get_sdp_iterate. The loop would exit either if what we were
   searching for was found or if the return was NULL. The problem is that
   get_sdp_iterate never returns NULL. This means that if what we were searching
   for was not present, the loop would run infinitely. This modification of the
   loop fixes the problem.
   
   (closes issue ASTERISK-14217)
   Reported by: schmidts
   
   (closes issue ASTERISK-14332)
   Reported by: samy
   
   (closes issue ASTERISK-13569)
   Reported by: pj
   
   (closes issue ASTERISK-14328)
   Reported by: aragon
   Patches:
         sip_inf_loop.patch uploaded by mmichelson (license 60)
   Tested by: aragon
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202338

By: Digium Subversion (svnbot) 2009-06-22 09:36:00

Repository: asterisk
Revision: 202339

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_sip.c

------------------------------------------------------------------------
r202339 | mmichelson | 2009-06-22 09:36:00 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
 
 Merged revisions 202336 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
   
   Fix a possible infinite loop in SDP parsing during glare situation.
   
   There was a while loop in get_ip_and_port_from_sdp which was controlled
   by a call to get_sdp_iterate. The loop would exit either if what we were
   searching for was found or if the return was NULL. The problem is that
   get_sdp_iterate never returns NULL. This means that if what we were searching
   for was not present, the loop would run infinitely. This modification of the
   loop fixes the problem.
   
   (closes issue ASTERISK-14217)
   Reported by: schmidts
   
   (closes issue ASTERISK-14332)
   Reported by: samy
   
   (closes issue ASTERISK-13569)
   Reported by: pj
   
   (closes issue ASTERISK-14328)
   Reported by: aragon
   Patches:
         sip_inf_loop.patch uploaded by mmichelson (license 60)
   Tested by: aragon
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202339

By: Digium Subversion (svnbot) 2009-06-22 09:36:27

Repository: asterisk
Revision: 202340

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r202340 | mmichelson | 2009-06-22 09:36:26 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
 
 Merged revisions 202336 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
   
   Fix a possible infinite loop in SDP parsing during glare situation.
   
   There was a while loop in get_ip_and_port_from_sdp which was controlled
   by a call to get_sdp_iterate. The loop would exit either if what we were
   searching for was found or if the return was NULL. The problem is that
   get_sdp_iterate never returns NULL. This means that if what we were searching
   for was not present, the loop would run infinitely. This modification of the
   loop fixes the problem.
   
   (closes issue ASTERISK-14217)
   Reported by: schmidts
   
   (closes issue ASTERISK-14332)
   Reported by: samy
   
   (closes issue ASTERISK-13569)
   Reported by: pj
   
   (closes issue ASTERISK-14328)
   Reported by: aragon
   Patches:
         sip_inf_loop.patch uploaded by mmichelson (license 60)
   Tested by: aragon
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202340