| Summary: | ASTERISK-14328: [patch] SIP deadlock in 1.4 revision 199472 | ||
| Reporter: | David Brillert (aragon) | Labels: | |
| Date Opened: | 2009-06-17 13:11:39 | Date Closed: | 2009-08-10 09:30:54 | 
| Priority: | Major | Regression? | No | 
| Status: | Closed/Complete | Components: | Channels/chan_sip/General | 
| Versions: | Frequency of Occurrence | ||
| Related Issues: | |||
| Environment: | Attachments: | ( 0) 06172009deadlock.txt ( 1) lotsofsiplockswithsipdebugenabled.zip ( 2) sip_inf_loop.patch | |
| Description: | After a brief time, SIP locks up and no calls are processed. ****** ADDITIONAL INFORMATION ****** Output of core show locks and gdb thread apply all bt is attached. | ||
| Comments: | By: David Brillert (aragon) 2009-06-17 13:12:40
core show locks
lab*CLI>
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3080715152 (do_monitor started at [16743] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 16412 sipsock_read &netlock 0x6cf780 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 4730 find_call &p->lock 0xb6eefa50 (1)

By: David Brillert (aragon) 2009-06-17 13:27:00
Looks like this older bug report https://issues.asterisk.org/view.php?id=15213

By: David Brillert (aragon) 2009-06-17 21:01:59
This also looks like https://issues.asterisk.org/view.php?id=14464

By: David Brillert (aragon) 2009-06-17 23:23:06
Possibly related to this revision? I only began seeing this issue in 1.4.25, it did not occur in 1.4.24.1
2009-05-28 15:27 +0000 [r197588] Mark Michelson <mmichelson@digium.com>
* main/rtp.c, channels/chan_sip.c, include/asterisk/rtp.h: Allow for media to arrive from an alternate source when responding to a reinvite with 491. When we receive a SIP reinvite, it is possible that we may not be able to process the reinvite immediately since we have also sent a reinvite out ourselves. The problem is that whoever sent us the reinvite may have also sent a reinvite out to another party, and that reinvite may have succeeded. As a result, even though we are not going to accept the reinvite we just received, it is important for us to not have problems if we suddenly start receiving RTP from a new source. The fix for this is to grab the media source information from the SDP of the reinvite that we receive. This information is passed to the RTP layer so that it will know about the alternate source for media.
Review: https://reviewboard.asterisk.org/r/252

By: David Brillert (aragon) 2009-06-17 23:28:52
I must concur with jvandal on his note https://issues.asterisk.org/view.php?id=15213#105717
jvandal (reporter) 2009-05-29 10:28
If I check on my server, the working revision for is r197562 but fail with r197588
-ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197562M $")
+ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197588M $")
jvandal (reporter) 2009-05-29 10:28

By: David Brillert (aragon) 2009-06-18 09:50:10
I find this very easy to reproduce in my lab. I'm able to pass a lot of calls through 4 PRI interfaces in a non production environment... The basis of my tests is to load 4 PRI interfaces and pass multiple calls to an ACD queue to logged agents. As a result some calls are answered and some are held with MOH. A percentage of ACD calls answered by agents are transferred to other extensions using Asterisk blind transfer.
I uploaded CLI output with sip debug enabled and each time I ran into the lock I issued the core show locks command. At least one of the outputs of core show locks appears to show the same output as my first capture. The main difference being that in this CLI trace session I did not have to restart Asterisk to recover from the lock. Therefore I did not include output from gdb thread apply all bt in this attachment. This is a pretty big text file but it only spans about a 10 minute test period. I ran into at least 7 deadlocks in that time span.
My previous attachment 06172009deadlock.txt includes gdb thread apply all bt and core show locks output. I do not use re-invites in my configuration.

By: Mark Michelson (mmichelson) 2009-06-18 12:42:17
06172009deadlock.txt shows that the sip monitor thread is currently executing the sscanf function in get_ip_and_port_from_sdp. I wonder if perhaps the while loop is not terminating for some reason... I'll investigate further.

By: Mark Michelson (mmichelson) 2009-06-18 12:44:53
Yes, I think that is the problem. I have an idea for a patch and I will post it here as well as the other related bug reports as soon as I can.

By: Mark Michelson (mmichelson) 2009-06-18 12:48:56
Try sip_inf_loop.patch and see if you still experience the same problem. Thanks for the good debug info!

By: David Brillert (aragon) 2009-06-18 14:17:58
I have no problem reproducing the lock in my lab so I should be able to give test results quickly. But I must wait for jvandal to produce an rpm with this patch so I can test... Thanks for getting back to me so quickly, this bug has been driving me nuts.

By: David Brillert (aragon) 2009-06-18 16:25:16
I have this all labbed up and will test overnight and then again in the morning for locks.

By: David Brillert (aragon) 2009-06-19 08:06:24
I ran a pretty intense test on this overnight (about 14 hours) which would normally result in a deadlock when I came in to review status. I also scripted a reload command every 5 minutes. For good measure this morning I did everything in my power to confuse Asterisk by restarting the service with 73 active calls on the PRI's multiple times. I could not reproduce a lock with the patch installed.

By: Digium Subversion (svnbot) 2009-06-22 09:34:14
Repository: asterisk
Revision: 202336
U branches/1.4/channels/chan_sip.c
------------------------------------------------------------------------
r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
Fix a possible infinite loop in SDP parsing during glare situation.
There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem.
(closes issue ASTERISK-14217)
Reported by: schmidts
(closes issue ASTERISK-14332)
Reported by: samy
(closes issue ASTERISK-13569)
Reported by: pj
(closes issue ASTERISK-14328)
Reported by: aragon
Patches:
sip_inf_loop.patch uploaded by mmichelson (license 60)
Tested by: aragon
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=202336

By: Digium Subversion (svnbot) 2009-06-22 09:35:13
Repository: asterisk
Revision: 202337
_U trunk/
U trunk/channels/chan_sip.c
------------------------------------------------------------------------
r202337 | mmichelson | 2009-06-22 09:35:10 -0500 (Mon, 22 Jun 2009) | 31 lines
Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
Fix a possible infinite loop in SDP parsing during glare situation.
There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem.
(closes issue ASTERISK-14217)
Reported by: schmidts
(closes issue ASTERISK-14332)
Reported by: samy
(closes issue ASTERISK-13569)
Reported by: pj
(closes issue ASTERISK-14328)
Reported by: aragon
Patches:
sip_inf_loop.patch uploaded by mmichelson (license 60)
Tested by: aragon
........
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=202337

By: Digium Subversion (svnbot) 2009-06-22 09:35:39
Repository: asterisk
Revision: 202338
_U branches/1.6.0/
U branches/1.6.0/channels/chan_sip.c
------------------------------------------------------------------------
r202338 | mmichelson | 2009-06-22 09:35:35 -0500 (Mon, 22 Jun 2009) | 38 lines
Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk
................
r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
Fix a possible infinite loop in SDP parsing during glare situation.
There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem.
(closes issue ASTERISK-14217)
Reported by: schmidts
(closes issue ASTERISK-14332)
Reported by: samy
(closes issue ASTERISK-13569)
Reported by: pj
(closes issue ASTERISK-14328)
Reported by: aragon
Patches:
sip_inf_loop.patch uploaded by mmichelson (license 60)
Tested by: aragon
........
................
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=202338

By: Digium Subversion (svnbot) 2009-06-22 09:36:05
Repository: asterisk
Revision: 202339
_U branches/1.6.1/
U branches/1.6.1/channels/chan_sip.c
------------------------------------------------------------------------
r202339 | mmichelson | 2009-06-22 09:36:00 -0500 (Mon, 22 Jun 2009) | 38 lines
Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk
................
r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
Fix a possible infinite loop in SDP parsing during glare situation.
There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem.
(closes issue ASTERISK-14217)
Reported by: schmidts
(closes issue ASTERISK-14332)
Reported by: samy
(closes issue ASTERISK-13569)
Reported by: pj
(closes issue ASTERISK-14328)
Reported by: aragon
Patches:
sip_inf_loop.patch uploaded by mmichelson (license 60)
Tested by: aragon
........
................
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=202339

By: Digium Subversion (svnbot) 2009-06-22 09:36:31
Repository: asterisk
Revision: 202340
_U branches/1.6.2/
U branches/1.6.2/channels/chan_sip.c
------------------------------------------------------------------------
r202340 | mmichelson | 2009-06-22 09:36:26 -0500 (Mon, 22 Jun 2009) | 38 lines
Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk
................
r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines
Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines
Fix a possible infinite loop in SDP parsing during glare situation.
There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem.
(closes issue ASTERISK-14217)
Reported by: schmidts
(closes issue ASTERISK-14332)
Reported by: samy
(closes issue ASTERISK-13569)
Reported by: pj
(closes issue ASTERISK-14328)
Reported by: aragon
Patches:
sip_inf_loop.patch uploaded by mmichelson (license 60)
Tested by: aragon
........
................
------------------------------------------------------------------------
http://svn.digium.com/view/asterisk?view=rev&revision=202340 | ||
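
The loop problem described in the r202336 commit message can be illustrated in isolation. What follows is a minimal sketch, not the actual chan_sip.c code and not the contents of sip_inf_loop.patch. The names `sdp_body`, `sdp_iterate` and `find_audio_port` are hypothetical stand-ins, and the sketch assumes the iterator signals "no more lines" by returning an empty string, which would be consistent with the statement that get_sdp_iterate never returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a parsed SDP body: an array of "x=value" lines. */
struct sdp_body {
    const char *const *lines;
    int count;
};

/*
 * Hypothetical iterator modelled on the behaviour described in the commit
 * message: it never returns NULL. ASSUMPTION: when no further line of the
 * requested type exists, it returns the empty string "".
 */
static const char *sdp_iterate(int *start, const struct sdp_body *body, char type)
{
    while (*start < body->count) {
        const char *line = body->lines[(*start)++];
        /* Match "<type>=<non-empty value>" and return the value part. */
        if (line[0] == type && line[1] == '=' && line[2] != '\0')
            return line + 2;
    }
    return ""; /* exhausted: empty string, not NULL */
}

/*
 * Search the "m=" lines for an audio stream and return its port,
 * or -1 if the body contains no audio media line.
 */
static int find_audio_port(const struct sdp_body *body)
{
    int iter = 0;
    int port;
    const char *m;

    assert(body != NULL);

    /*
     * A loop written as
     *     while ((m = sdp_iterate(&iter, body, 'm'))) { ... }
     * only stops on NULL. Because the iterator returns "" instead, that
     * condition stays true forever when no audio line is present.
     * Testing for the empty string lets the loop terminate.
     */
    while ((m = sdp_iterate(&iter, body, 'm'))[0] != '\0') {
        if (sscanf(m, "audio %d", &port) == 1)
            return port;
    }
    return -1;
}
```

Calling `find_audio_port` on a body that has no `m=audio` line returns -1 here, whereas the NULL-terminated form of the loop would spin on `sscanf` indefinitely, which matches the backtrace observation that the SIP monitor thread was sitting in sscanf while still holding its locks.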