Summary: | ASTERISK-13569: [patch] lock during simple call processing | ||
Reporter: | pj (pj) | Labels: | |
Date Opened: | 2009-02-12 06:27:37.000-0600 | Date Closed: | 2009-08-10 09:30:19 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) gdb.txt ( 1) showlocks.txt ( 2) sip_inf_loop.patch | |
Description: | During normal operation, pure sip-sip calls, without special features (eg. transfer), asterisk locks. Asterisk server had very small utilization - three concurrent sip calls at maximum. After this lockout, all peers becomed unreachable and asterisk must be manually restarted. CLI was not locked. "core show locks" attached ****** ADDITIONAL INFORMATION ****** === <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: -1216238704 (do_monitor started at [21149] chan_sip.c restart_monitor()) === ---> Lock #0 (chan_sip.c): MUTEX 20656 handle_request_do &netlock 0xb78e18a0 (1) asterisk(ast_bt_get_addresses+0x19) [0x80fe866] /usr/lib/asterisk/modules/chan_sip.so [0xb785b6a9] /usr/lib/asterisk/modules/chan_sip.so [0xb78b4227] /usr/lib/asterisk/modules/chan_sip.so [0xb78b3fa9] asterisk(ast_io_wait+0x14d) [0x80f4581] /usr/lib/asterisk/modules/chan_sip.so [0xb78b5bf0] asterisk [0x8175261] /lib/i686/libpthread.so.0 [0xb7c694e2] /lib/i686/libc.so.6(clone+0x5e) [0xb7ec047e] === ---> Lock #1 (astobj2.c): MUTEX 198 ao2_trylock &p->priv_data.lock 0xb7351928 (1) asterisk(ast_bt_get_addresses+0x19) [0x80fe866] asterisk [0x808266a] asterisk(ao2_trylock+0x4e) [0x80825d5] /usr/lib/asterisk/modules/chan_sip.so [0xb786f0b0] /usr/lib/asterisk/modules/chan_sip.so [0xb78b4243] /usr/lib/asterisk/modules/chan_sip.so [0xb78b3fa9] asterisk(ast_io_wait+0x14d) [0x80f4581] /usr/lib/asterisk/modules/chan_sip.so [0xb78b5bf0] asterisk [0x8175261] /lib/i686/libpthread.so.0 [0xb7c694e2] /lib/i686/libc.so.6(clone+0x5e) [0xb7ec047e] | ||
Comments: | By: Mark Michelson (mmichelson) 2009-02-12 15:46:18.000-0600 This is a case where it is very difficult to tell where the problem is just based on the core show locks output. If the problem happens again, could you upload both core show locks and a thread apply all bt full from gdb? That way, we can see the state of all the threads on the system when the lock occurs. Thanks! By: pj (pj) 2009-02-12 15:58:30.000-0600 OK, when next lock will appear, I will try to do: gdb /usr/sbin/asterisk <pid of main * process> and then: thread apply all bt full core show locks so, please keep this bugreport open for some time By: pj (pj) 2009-03-17 08:32:25 locked again, uploaded gdb debug output from locked asterisk, and "core show locks" Asterisk SVN-trunk-r176174 By: pj (pj) 2009-03-22 04:34:50 feedback and required traces was given, please change status of this bugreport. By: pj (pj) 2009-04-10 07:35:47 Do you have sufficient information to find source of this lockout yet? I posted gdb, that you require in the past. If you can't find where problem is, according to supplied debugs, you can close this bugreport, because I can't supply better outputs, like valgrind, because I don't know, how to invoke this lock. And also can't run from valgrind all the time, for eg. until another lock appears again. thanks for understanding! By: Leif Madsen (lmadsen) 2009-06-10 12:42:31 Assigned to Mark to review the information. If this is useful information, but you don't have the time to resolve this issue, please un-assign yourself and set status to Acknowledged. Thanks! By: Mark Michelson (mmichelson) 2009-06-10 17:28:47 I just took a look at this, and it has me a bit baffled. I can see from the "core show locks" output that the SIP monitor thread is blocking at some point. The problem is that the corresponding thread in the gdb output (thread 6) has a bunch of ?? frames. In addition, the portion of the stack that is not ?? has garbage (e.g. free() does not call poll()). It's very frustrating that the most important thread from the gdb output is the only one that has garbage data in it. By: Mark Michelson (mmichelson) 2009-06-11 09:20:49 I don't know if I can trust the gdb output for thread 6, but I just noticed that the "free" in frame 2 says that it came from libresolv.so.2, so this may mean that the SIP monitor thread is blocking during a DNS lookup for some reason. By: pj (pj) 2009-06-11 15:20:12 so, do you agree, that this is real bug? I think so, because asterisk stability shouldn't rely on dns system and definitively shouldn't lock in case of slow or no response from dns. By: Mark Michelson (mmichelson) 2009-06-11 17:07:06 I don't even know for sure if the problem is related to DNS. That thread in the trace provided might be completely bogus. That's what's a bit frustrating with this issue right now. It's hard to tell what the cause is, and the tools that are supposed to help with that are not helpful at the moment. By: David Brillert (aragon) 2009-06-17 23:23:10 Possibly related to this revision? I only began seeing this issue in 1.4.25, it did not occur in 1.4.24.1 2009-05-28 15:27 +0000 [r197588] Mark Michelson <mmichelson@digium.com> * main/rtp.c, channels/chan_sip.c, include/asterisk/rtp.h: Allow for media to arrive from an alternate source when responding to a reinvite with 491. When we receive a SIP reinvite, it is possible that we may not be able to process the reinvite immediately since we have also sent a reinvite out ourselves. The problem is that whoever sent us the reinvite may have also sent a reinvite out to another party, and that reinvite may have succeeded. As a result, even though we are not going to accept the reinvite we just received, it is important for us to not have problems if we suddenly start receiving RTP from a new source. The fix for this is to grab the media source information from the SDP of the reinvite that we receive. This information is passed to the RTP layer so that it will know about the alternate source for media. Review: https://reviewboard.asterisk.org/r/252 By: David Brillert (aragon) 2009-06-17 23:26:46 I must concur with jvandal on his note https://issues.asterisk.org/view.php?id=15213#105717 jvandal (reporter) 2009-05-29 10:28 If I check on my server, the working revision for is r197562 but fail with r197588 -ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197562M $") +ASTERISK_FILE_VERSION(__FILE__, "$Revision: 197588M $") jvandal (reporter) 2009-05-29 10:28 By: Mark Michelson (mmichelson) 2009-06-18 11:02:38 aragon, if you are also seeing this issue, would you mind uploading "core show locks" from the Asterisk CLI and "thread apply all bt" from gdb when the issue occurs? When pj tried, gdb crapped all over the thread I cared about. Maybe you'll have better luck. By: David Brillert (aragon) 2009-06-18 11:24:37 I opened another bug report for this here https://issues.asterisk.org/view.php?id=15345 All of my debugs are uploaded to that ticket... By: Mark Michelson (mmichelson) 2009-06-18 12:50:50 Discussion on issue 15345 leads me to believe that the problem is due to an infinite loop under certain circumstances. Please try applying sip_inf_loop.patch and see if you still experience the problem. Thanks! By: pj (pj) 2009-06-18 13:34:17 it will be hard to prove, if this patch will solve initial issue, because, as I wrote before, I don't know, how to invoke this lock and currently I haven't even access to system, that was originally affected. But recently, I discovered another lockout, for what I created bugreport 0015343 so you can look at it, if can be eventually related to issue, for what you created patch. By: Digium Subversion (svnbot) 2009-06-22 09:34:11 Repository: asterisk Revision: 202336 U branches/1.4/channels/chan_sip.c ------------------------------------------------------------------------ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202336 By: Digium Subversion (svnbot) 2009-06-22 09:35:12 Repository: asterisk Revision: 202337 _U trunk/ U trunk/channels/chan_sip.c ------------------------------------------------------------------------ r202337 | mmichelson | 2009-06-22 09:35:10 -0500 (Mon, 22 Jun 2009) | 31 lines Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202337 By: Digium Subversion (svnbot) 2009-06-22 09:35:38 Repository: asterisk Revision: 202338 _U branches/1.6.0/ U branches/1.6.0/channels/chan_sip.c ------------------------------------------------------------------------ r202338 | mmichelson | 2009-06-22 09:35:35 -0500 (Mon, 22 Jun 2009) | 38 lines Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202338 By: Digium Subversion (svnbot) 2009-06-22 09:36:03 Repository: asterisk Revision: 202339 _U branches/1.6.1/ U branches/1.6.1/channels/chan_sip.c ------------------------------------------------------------------------ r202339 | mmichelson | 2009-06-22 09:36:00 -0500 (Mon, 22 Jun 2009) | 38 lines Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202339 By: Digium Subversion (svnbot) 2009-06-22 09:36:29 Repository: asterisk Revision: 202340 _U branches/1.6.2/ U branches/1.6.2/channels/chan_sip.c ------------------------------------------------------------------------ r202340 | mmichelson | 2009-06-22 09:36:26 -0500 (Mon, 22 Jun 2009) | 38 lines Merged revisions 202337 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r202337 | mmichelson | 2009-06-22 09:35:09 -0500 (Mon, 22 Jun 2009) | 31 lines Merged revisions 202336 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r202336 | mmichelson | 2009-06-22 09:34:05 -0500 (Mon, 22 Jun 2009) | 25 lines Fix a possible infinite loop in SDP parsing during glare situation. There was a while loop in get_ip_and_port_from_sdp which was controlled by a call to get_sdp_iterate. The loop would exit either if what we were searching for was found or if the return was NULL. The problem is that get_sdp_iterate never returns NULL. This means that if what we were searching for was not present, the loop would run infinitely. This modification of the loop fixes the problem. (closes issue ASTERISK-14217) Reported by: schmidts (closes issue ASTERISK-14332) Reported by: samy (closes issue ASTERISK-13569) Reported by: pj (closes issue ASTERISK-14328) Reported by: aragon Patches: sip_inf_loop.patch uploaded by mmichelson (license 60) Tested by: aragon ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=202340 |