Summary: ASTERISK-14328: [patch] SIP deadlock in 1.4 revision 199472
Reporter: David Brillert (aragon)
Labels:
Date Opened: 2009-06-17 13:11:39
Date Closed: 2009-08-10 09:30:54
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 06172009deadlock.txt
( 1) lotsofsiplockswithsipdebugenabled.zip
( 2) sip_inf_loop.patch
Description: After a short time, SIP locks up and no calls are processed.

****** ADDITIONAL INFORMATION ******

Output of core show locks and gdb thread apply all bt is attached.

Comments:

By: David Brillert (aragon) 2009-06-17 13:12:40

core show locks
lab*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3080715152 (do_monitor           started at [16743] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 16412 sipsock_read &netlock 0x6cf780 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 4730 find_call &p->lock 0xb6eefa50 (1)

By: David Brillert (aragon) 2009-06-17 13:27:00

This looks like this older bug report:
https://issues.asterisk.org/view.php?id=15213

By: David Brillert (aragon) 2009-06-17 21:01:59

This also looks like:
https://issues.asterisk.org/view.php?id=14464

By: David Brillert (aragon) 2009-06-17 23:23:06

Possibly related to this revision?
I only began seeing this issue in 1.4.25; it did not occur in 1.4.24.1.

2009-05-28 15:27 +0000 [r197588]  Mark Michelson <mmichelson@digium.com>

* main/rtp.c, channels/chan_sip.c, include/asterisk/rtp.h: Allow
 for media to arrive from an alternate source when responding to a
 reinvite with 491. When we receive a SIP reinvite, it is possible
 that we may not be able to process the reinvite immediately since
 we have also sent a reinvite out ourselves. The problem is that
 whoever sent us the reinvite may have also sent a reinvite out to
 another party, and that reinvite may have succeeded. As a result,
 even though we are not going to accept the reinvite we just
 received, it is important for us to not have problems if we
 suddenly start receiving RTP from a new source. The fix for this
 is to grab the media source information from the SDP of the
 reinvite that we receive. This information is passed to the RTP
 layer so that it will know about the alternate source for media.
 Review: https://reviewboard.asterisk.org/r/252



By: David Brillert (aragon) 2009-06-17 23:28:52

I must concur with jvandal's note:
https://issues.asterisk.org/view.php?id=15213#105717
jvandal (reporter)
2009-05-29 10:28

When I check on my server, the working revision is r197562, but it fails with r197588.

-ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197562M $")
+ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197588M $")

By: David Brillert (aragon) 2009-06-18 09:50:10

I find this very easy to reproduce in my lab.
I'm able to pass a lot of calls through 4 PRI interfaces in a non-production environment.
The basis of my tests is to load 4 PRI interfaces and pass multiple calls to an ACD queue with logged-in agents. As a result, some calls are answered and some are held with MOH. A percentage of the ACD calls answered by agents are transferred to other extensions using Asterisk blind transfer.

I uploaded CLI output with sip debug enabled, and each time I ran into the lock I issued the core show locks command. At least one of the core show locks outputs appears to match my first capture. The main difference is that in this CLI trace session I did not have to restart Asterisk to recover from the lock, so I did not include gdb thread apply all bt output in this attachment. This is a pretty big text file, but it only spans about a 10-minute test period. I ran into at least 7 deadlocks in that time span.

My previous attachment 06172009deadlock.txt includes gdb thread apply all bt and core show locks output.

I do not use re-invites in my configuration.

By: Mark Michelson (mmichelson) 2009-06-18 12:42:17

06172009deadlock.txt shows that the SIP monitor thread is currently executing the sscanf function in get_ip_and_port_from_sdp.

I wonder if perhaps the while loop is not terminating for some reason... I'll investigate further.

By: Mark Michelson (mmichelson) 2009-06-18 12:44:53

Yes, I think that is the problem. I have an idea for a patch, and I will post it here and on the other related bug reports as soon as I can.

By: Mark Michelson (mmichelson) 2009-06-18 12:48:56

Try sip_inf_loop.patch and see if you still experience the same problem. Thanks for the good debug info!

By: David Brillert (aragon) 2009-06-18 14:17:58

I have no problem reproducing the lock in my lab, so I should be able to give test results quickly.
But I must wait for jvandal to produce an rpm with this patch before I can test.
Thanks for getting back to me so quickly; this bug has been driving me nuts.

By: David Brillert (aragon) 2009-06-18 16:25:16

I have this all labbed up and will test overnight and then again in the morning for locks.

By: David Brillert (aragon) 2009-06-19 08:06:24

I ran a pretty intense test on this overnight (about 14 hours), which would normally have resulted in a deadlock by the time I came in to review the status.
I also scripted a reload command every 5 minutes.
For good measure, this morning I did everything in my power to confuse Asterisk by restarting the service multiple times with 73 active calls on the PRIs.

I could not reproduce a lock with the patch installed.

By: Digium Subversion (svnbot) 2009-06-22 09:34:14

Repository: asterisk
Revision: 202336

U   branches/1.4/channels/chan_sip.c

------------------------------------------------------------------------
r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines

Fix a possible infinite loop in SDP parsing during glare situation.

There was a while loop in get_ip_and_port_from_sdp which was controlled
by a call to get_sdp_iterate. The loop would exit either if what we were
searching for was found or if the return was NULL. The problem is that
get_sdp_iterate never returns NULL. This means that if what we were searching
for was not present, the loop would run infinitely. This modification of the
loop fixes the problem.

(closes issue ASTERISK-14217)
Reported by: schmidts

(closes issue ASTERISK-14332)
Reported by: samy

(closes issue ASTERISK-13569)
Reported by: pj

(closes issue ASTERISK-14328)
Reported by: aragon
Patches:
     sip_inf_loop.patch uploaded by mmichelson (license 60)
Tested by: aragon


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202336
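
For readers following along, the failure mode described in the r202336 commit message can be sketched in a few lines of C. This is only an illustration, not the actual chan_sip.c code or the contents of sip_inf_loop.patch: next_sdp_value() and find_connection_host() are hypothetical stand-ins, and the sketch assumes the iterator signals "no more lines" with an empty string rather than NULL. Under that assumption, a loop that only tests for NULL spins forever when no matching line exists, while a loop that tests for the empty string terminates.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for get_sdp_iterate(): returns the value of the next
 * line of the form "<name>=<value>", or "" (never NULL) once the lines run
 * out. The empty-string sentinel is an assumption made for this sketch. */
static const char *next_sdp_value(int *start, const char **lines, int count,
                                  const char *name)
{
    size_t len = strlen(name);

    while (*start < count) {
        const char *line = lines[(*start)++];
        if (strncmp(line, name, len) == 0 && line[len] == '=') {
            return line + len + 1;
        }
    }
    return "";
}

/* Hypothetical search loop: returns 1 and fills host when a
 * "c=IN IP4 <host>" line is present, 0 when no such line exists. */
int find_connection_host(const char **lines, int count, char host[256])
{
    int iter = 0;
    const char *value;

    /* A condition of the form
     *     (value = next_sdp_value(...)) != NULL
     * would never become false here, because the iterator returns "" and
     * not NULL once the lines are exhausted. Testing for the empty string
     * lets the loop end when nothing matches. */
    while (*(value = next_sdp_value(&iter, lines, count, "c")) != '\0') {
        if (sscanf(value, "IN IP4 %255s", host) == 1) {
            return 1;
        }
    }
    return 0;
}
```

Calling find_connection_host() on a body with no c= line returns 0 instead of hanging. In the reported case the looping thread also held the netlock and p->lock mutexes shown in the core show locks output, which is why the hang looked like a deadlock to every other SIP thread.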

By: Digium Subversion (svnbot) 2009-06-22 09:35:13

Repository: asterisk
Revision: 202337

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r202337 | mmichelson | 2009-06-22 09:35:10 -0500 (Mon, 22 Jun 2009) | 31 lines

Merged revisions 202336 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202337

By: Digium Subversion (svnbot) 2009-06-22 09:35:39

Repository: asterisk
Revision: 202338

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_sip.c

------------------------------------------------------------------------
r202338 | mmichelson | 2009-06-22 09:35:35 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202338

By: Digium Subversion (svnbot) 2009-06-22 09:36:05

Repository: asterisk
Revision: 202339

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_sip.c

------------------------------------------------------------------------
r202339 | mmichelson | 2009-06-22 09:36:00 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202339

By: Digium Subversion (svnbot) 2009-06-22 09:36:31

Repository: asterisk
Revision: 202340

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_sip.c

------------------------------------------------------------------------
r202340 | mmichelson | 2009-06-22 09:36:26 -0500 (Mon, 22 Jun 2009) | 38 lines

Merged revisions 202337 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=202340