Summary: | ASTERISK-15958: [patch] [regression] Using Local channels with queues causes deadlocks | ||
Reporter: | Schmooze Com (schmoozecom) | Labels: | |
Date Opened: | 2010-04-14 16:23:28 | Date Closed: | 2011-01-04 16:54:03.000-0600 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_local |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) backtrace2-threads.txt ( 1) backtrace-threads.txt ( 2) issue_17185_v1.diff ( 3) issue_17185_v2.diff ( 4) lock.log | |
Description: | Since upgrading to asterisk 1.4.30 we are seeing calls that get answered in the queue still show as calls waiting in the queue. We have seen this on 2 systems now and downgrading to 1.4.29 solves the issue. When looking at "Show Queue" it will show a dahdi channel as still waiting in the queue even though the call has been answered and hung up. Below is a print out of "core show locks" when asterisk is in this state. ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <file> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: -1208542320 (do_devstate_changes started at [ 363] devicestate.c ast_device_state_engine_init()) === ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0) === ------------------------------------------------------------------- === === Thread ID: -1215984752 (pbx_thread started at [ 2645] pbx.c ast_pbx_start()) === ---> Lock #0 (channel.c): MUTEX 1556 ast_hangup (channel lock) 0x9be8648 (1) === ---> Tried and failed to get Lock #1 (chan_local.c): MUTEX 581 local_hangup (channel lock) 0x9bc63f8 (0) === ------------------------------------------------------------------- === === Thread ID: -1215247472 (pbx_thread started at [ 2645] pbx.c ast_pbx_start()) === ---> Lock #0 (channel.c): MUTEX 1704 ast_activate_generator (channel lock) 0x9bc63f8 (1) === ---> Tried and failed to get Lock #1 (chan_local.c): MUTEX 185 local_queue_frame (channel lock) 0x9be8648 (0) === ------------------------------------------------------------------- === === Thread ID: -1217889392 (netconsole started at [ 1036] asterisk.c listener()) === ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0) === ------------------------------------------------------------------- === === Thread ID: -1219118192 (netconsole started at [ 1036] asterisk.c listener()) === ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0) === ------------------------------------------------------------------- === === Thread ID: -1215493232 (netconsole started at [ 1036] asterisk.c listener()) === ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0) === ------------------------------------------------------------------- === === Thread ID: -1217643632 (netconsole started at [ 1036] asterisk.c listener()) === ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0) === ------------------------------------------------------------------- === ======================================================================= While this is happening I can go into the asterisk CLI any type a command like "core show channels" and see all active channels but if I want to run another command after that like "sip show peers" I have to leave the CLI and come back each time. Restarting asterisk cleans it up and removes the lock call. | ||
Comments: | By: Schmooze Com (schmoozecom) 2010-04-14 16:26:18 Also we have tried this with the 1.4.31 RC that was just released this week with the same problems By: Paul Belanger (pabelanger) 2010-04-14 18:00:34 Are you able to provide a backtrace (see below). -- Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory. Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then: make install after enabling, reproduce the crash, and then execute the instructions in doc/backtrace.txt. When complete, attach that file to this issue report. Thanks! By: Jared Smith (jsmith) 2010-04-15 07:55:40 There is no backtrace, because this isn't a crash issue. It's a locking issue. I might be able to force a backtrace by killing the Asterisk process, but I'm not sure it would be helpful in this case. By: Leif Madsen (lmadsen) 2010-04-15 09:27:13 As discussed on IRC we do actually need a backtrace. This can be done by attaching to the running process and getting the output of "thread apply all bt". Here is a command that will attach to the running Asterisk process and write the information to the /tmp/backtrace.txt file. Please upload that file to this issue when you see the deadlocks happen again. gdb -ex "thread apply all bt" --batch /usr/sbin/asterisk `pidof asterisk` > /tmp/backtrace-threads.txt Also be sure to run 'core show locks' from the Asterisk console as closely as possible to running the 'gdb' command so that things match up. asterisk -rx "core show locks" > /tmp/core-show-locks.txt By: Leif Madsen (lmadsen) 2010-04-15 09:27:26 Feedback requested from the reporter. By: Leif Madsen (lmadsen) 2010-04-15 09:28:55 I'm upgrading the severity of this issue to block the next release of 1.4. By: Schmooze Com (schmoozecom) 2010-04-15 09:37:09 Ok well we will need to compile asterisk 1.4.30 back on this system with the menu selects and have it create the problem again. I will try to get this done today and get you the traces. By: Leif Madsen (lmadsen) 2010-04-15 14:48:13 Also note the updated backtrace.txt document I just committed: http://svn.asterisk.org/svn/asterisk/branches/1.4/doc/backtrace.txt By: Leif Madsen (lmadsen) 2010-04-19 13:06:45 Any update on this issue? By: Bryan Walters (gamegamer43) 2010-04-19 14:26:46 Uploaded the backtrace and locks per your request. By: Leif Madsen (lmadsen) 2010-04-19 14:33:42 Thanks for the information! A developer may be looking at this shortly, so if you could respond promptly to any questions asked, it would be appreciated. By: David Vossel (dvossel) 2010-04-20 10:43:55 It looks to me like this may be related to the changes made in r256014. By: David Vossel (dvossel) 2010-04-20 16:43:43 I take that back, this is not related to r256014. I spent quite a bit of time reviewing this and I'm not sure how this is possible yet. One thing I did notice is that the line numbers in the back trace provided line up perfectly for all the files in release 1.4.30 except for the items in app_queue.c. Do you have any local changes to app_queue you are using? By: David Vossel (dvossel) 2010-04-20 17:38:59 If there are custom code changes to app_queue.c please provide them or any other changes for that matter. We don't support third party code, but I may be able to look at it and tell whether it should make a difference or not if the changes are small enough. By: Bryan Walters (gamegamer43) 2010-04-21 06:43:07 The only code changes to app_queue.c are the changes listed and accepted by Digium in r227424. https://issues.asterisk.org/view.php?id=15168 is where the patch can be found that is being used on this system. By: David Vossel (dvossel) 2010-04-21 11:56:25 Thanks for the list of changes, they shouldn't affect locking. I uploaded a patch that I believe will fix this. Please let me if it resolves the issue or not. By: Bryan Walters (gamegamer43) 2010-04-22 16:11:32 I applied your patch to a system last night and it continued to have the same issues today. Unfortunately I was not able to get a new backtrace from it. Please let me know if additional information is needed going forward so I can ensure we can provide it. By: David Vossel (dvossel) 2010-04-23 10:07:03 Well, that's not good. I'll take another pass at this and see if I can't come up with another possible cause. Thanks for testing it. By: David Vossel (dvossel) 2010-04-23 10:53:58 I uploaded a new patch. If you can, please test it. Thanks again for all your help with this issue. By: Leif Madsen (lmadsen) 2010-04-26 13:17:20 Please test this patch at your earliest convenience so we can move this issue to completion. By: Schmooze Com (schmoozecom) 2010-04-26 15:22:43 We applied the latest patch last night. I am at Digium rest of this week teaching a class so if anyone wants to chat one on one about this come find me in the training room. By: Amilcar S Silvestre (amilcar) 2010-04-27 03:53:28 schmoozecom, did the patch fix the issue? By: Digium Subversion (svnbot) 2010-04-28 16:16:04 Repository: asterisk Revision: 259858 U branches/1.4/channels/chan_local.c U branches/1.4/main/channel.c ------------------------------------------------------------------------ r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines resolves deadlocks in chan_local Issue_1. In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan, and pvt->owner. Proper deadlock avoidance is done when the channel to hangup is the outbound chan_local channel, but when it is not the outbound channel we have an issue... We attempt to do deadlock avoidance only on the tech pvt, when both the tech pvt and the pvt->owner are locked coming into that loop. By never giving up the pvt->owner channel deadlock avoidance is not entirely possible. This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt when trying to get the pvt->chan lock. Issue_2. ast_prod() is used in ast_activate_generator() to queue a frame on the channel and make the channel's read function get called. This function is used in ast_activate_generator() while the channel is locked, which mean's the channel will have a lock both from the generator code and the frame_queue code by the time it gets to chan_local.c's local_queue_frame code... local_queue_frame contains some of the same crazy deadlock avoidance that local_hangup requires, and this recursive lock prevents that deadlock avoidance from happening correctly. This patch removes ast_prod() from the channel lock so only one lock is held during the local_queue_frame function. (closes issue ASTERISK-15958) Reported by: schmoozecom Patches: issue_17185_v1.diff uploaded by dvossel (license 671) issue_17185_v2.diff uploaded by dvossel (license 671) Tested by: schmoozecom, GameGamer43 Review: https://reviewboard.asterisk.org/r/631/ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=259858 By: Digium Subversion (svnbot) 2010-04-28 16:20:04 Repository: asterisk Revision: 259870 _U trunk/ U trunk/channels/chan_local.c U trunk/main/channel.c ------------------------------------------------------------------------ r259870 | dvossel | 2010-04-28 16:20:04 -0500 (Wed, 28 Apr 2010) | 39 lines Merged revisions 259858 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines resolves deadlocks in chan_local Issue_1. In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan, and pvt->owner. Proper deadlock avoidance is done when the channel to hangup is the outbound chan_local channel, but when it is not the outbound channel we have an issue... We attempt to do deadlock avoidance only on the tech pvt, when both the tech pvt and the pvt->owner are locked coming into that loop. By never giving up the pvt->owner channel deadlock avoidance is not entirely possible. This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt when trying to get the pvt->chan lock. Issue_2. ast_prod() is used in ast_activate_generator() to queue a frame on the channel and make the channel's read function get called. This function is used in ast_activate_generator() while the channel is locked, which mean's the channel will have a lock both from the generator code and the frame_queue code by the time it gets to chan_local.c's local_queue_frame code... local_queue_frame contains some of the same crazy deadlock avoidance that local_hangup requires, and this recursive lock prevents that deadlock avoidance from happening correctly. This patch removes ast_prod() from the channel lock so only one lock is held during the local_queue_frame function. (closes issue ASTERISK-15958) Reported by: schmoozecom Patches: issue_17185_v1.diff uploaded by dvossel (license 671) issue_17185_v2.diff uploaded by dvossel (license 671) Tested by: schmoozecom, GameGamer43 Review: https://reviewboard.asterisk.org/r/631/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=259870 By: Digium Subversion (svnbot) 2010-04-28 16:26:31 Repository: asterisk Revision: 259899 _U branches/1.6.2/ U branches/1.6.2/channels/chan_local.c U branches/1.6.2/main/channel.c ------------------------------------------------------------------------ r259899 | dvossel | 2010-04-28 16:26:30 -0500 (Wed, 28 Apr 2010) | 46 lines Merged revisions 259870 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r259870 | dvossel | 2010-04-28 16:20:03 -0500 (Wed, 28 Apr 2010) | 39 lines Merged revisions 259858 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines resolves deadlocks in chan_local Issue_1. In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan, and pvt->owner. Proper deadlock avoidance is done when the channel to hangup is the outbound chan_local channel, but when it is not the outbound channel we have an issue... We attempt to do deadlock avoidance only on the tech pvt, when both the tech pvt and the pvt->owner are locked coming into that loop. By never giving up the pvt->owner channel deadlock avoidance is not entirely possible. This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt when trying to get the pvt->chan lock. Issue_2. ast_prod() is used in ast_activate_generator() to queue a frame on the channel and make the channel's read function get called. This function is used in ast_activate_generator() while the channel is locked, which mean's the channel will have a lock both from the generator code and the frame_queue code by the time it gets to chan_local.c's local_queue_frame code... local_queue_frame contains some of the same crazy deadlock avoidance that local_hangup requires, and this recursive lock prevents that deadlock avoidance from happening correctly. This patch removes ast_prod() from the channel lock so only one lock is held during the local_queue_frame function. (closes issue ASTERISK-15958) Reported by: schmoozecom Patches: issue_17185_v1.diff uploaded by dvossel (license 671) issue_17185_v2.diff uploaded by dvossel (license 671) Tested by: schmoozecom, GameGamer43 Review: https://reviewboard.asterisk.org/r/631/ ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=259899 By: Digium Subversion (svnbot) 2010-04-28 16:33:17 Repository: asterisk Revision: 259930 _U branches/1.6.1/ U branches/1.6.1/channels/chan_local.c U branches/1.6.1/main/channel.c ------------------------------------------------------------------------ r259930 | dvossel | 2010-04-28 16:33:17 -0500 (Wed, 28 Apr 2010) | 46 lines Merged revisions 259870 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r259870 | dvossel | 2010-04-28 16:20:03 -0500 (Wed, 28 Apr 2010) | 39 lines Merged revisions 259858 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines resolves deadlocks in chan_local Issue_1. In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan, and pvt->owner. Proper deadlock avoidance is done when the channel to hangup is the outbound chan_local channel, but when it is not the outbound channel we have an issue... We attempt to do deadlock avoidance only on the tech pvt, when both the tech pvt and the pvt->owner are locked coming into that loop. By never giving up the pvt->owner channel deadlock avoidance is not entirely possible. This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt when trying to get the pvt->chan lock. Issue_2. ast_prod() is used in ast_activate_generator() to queue a frame on the channel and make the channel's read function get called. This function is used in ast_activate_generator() while the channel is locked, which mean's the channel will have a lock both from the generator code and the frame_queue code by the time it gets to chan_local.c's local_queue_frame code... local_queue_frame contains some of the same crazy deadlock avoidance that local_hangup requires, and this recursive lock prevents that deadlock avoidance from happening correctly. This patch removes ast_prod() from the channel lock so only one lock is held during the local_queue_frame function. (closes issue ASTERISK-15958) Reported by: schmoozecom Patches: issue_17185_v1.diff uploaded by dvossel (license 671) issue_17185_v2.diff uploaded by dvossel (license 671) Tested by: schmoozecom, GameGamer43 Review: https://reviewboard.asterisk.org/r/631/ ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=259930 By: Digium Subversion (svnbot) 2010-04-28 16:35:10 Repository: asterisk Revision: 259936 _U branches/1.6.0/ U branches/1.6.0/channels/chan_local.c U branches/1.6.0/main/channel.c ------------------------------------------------------------------------ r259936 | dvossel | 2010-04-28 16:35:10 -0500 (Wed, 28 Apr 2010) | 46 lines Merged revisions 259870 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r259870 | dvossel | 2010-04-28 16:20:03 -0500 (Wed, 28 Apr 2010) | 39 lines Merged revisions 259858 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines resolves deadlocks in chan_local Issue_1. In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan, and pvt->owner. Proper deadlock avoidance is done when the channel to hangup is the outbound chan_local channel, but when it is not the outbound channel we have an issue... We attempt to do deadlock avoidance only on the tech pvt, when both the tech pvt and the pvt->owner are locked coming into that loop. By never giving up the pvt->owner channel deadlock avoidance is not entirely possible. This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt when trying to get the pvt->chan lock. Issue_2. ast_prod() is used in ast_activate_generator() to queue a frame on the channel and make the channel's read function get called. This function is used in ast_activate_generator() while the channel is locked, which mean's the channel will have a lock both from the generator code and the frame_queue code by the time it gets to chan_local.c's local_queue_frame code... local_queue_frame contains some of the same crazy deadlock avoidance that local_hangup requires, and this recursive lock prevents that deadlock avoidance from happening correctly. This patch removes ast_prod() from the channel lock so only one lock is held during the local_queue_frame function. (closes issue ASTERISK-15958) Reported by: schmoozecom Patches: issue_17185_v1.diff uploaded by dvossel (license 671) issue_17185_v2.diff uploaded by dvossel (license 671) Tested by: schmoozecom, GameGamer43 Review: https://reviewboard.asterisk.org/r/631/ ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=259936 |