[Home]

Summary:ASTERISK-15958: [patch] [regression] Using Local channels with queues causes deadlocks
Reporter:Schmooze Com (schmoozecom)Labels:
Date Opened:2010-04-14 16:23:28Date Closed:2011-01-04 16:54:03.000-0600
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Channels/chan_local
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) backtrace2-threads.txt
( 1) backtrace-threads.txt
( 2) issue_17185_v1.diff
( 3) issue_17185_v2.diff
( 4) lock.log
Description:Since upgrading to asterisk 1.4.30 we are seeing calls that get answered in the queue still show as calls waiting in the queue.  We have seen this on 2 systems now and downgrading to 1.4.29 solves the issue.  When looking at "Show Queue" it will show a dahdi channel as still waiting in the queue even though the call has been answered and hung up.

Below is a print out of "core show locks" when asterisk is in this state.

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1208542320 (do_devstate_changes  started at [  363] devicestate.c ast_device_state_engine_init())
=== ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0)
=== -------------------------------------------------------------------
===
=== Thread ID: -1215984752 (pbx_thread           started at [ 2645] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1556 ast_hangup (channel lock) 0x9be8648 (1)
=== ---> Tried and failed to get Lock #1 (chan_local.c): MUTEX 581 local_hangup (channel lock) 0x9bc63f8 (0)
=== -------------------------------------------------------------------
===
=== Thread ID: -1215247472 (pbx_thread           started at [ 2645] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1704 ast_activate_generator (channel lock) 0x9bc63f8 (1)
=== ---> Tried and failed to get Lock #1 (chan_local.c): MUTEX 185 local_queue_frame (channel lock) 0x9be8648 (0)
=== -------------------------------------------------------------------
===
=== Thread ID: -1217889392 (netconsole           started at [ 1036] asterisk.c listener())
=== ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0)
=== -------------------------------------------------------------------
===
=== Thread ID: -1219118192 (netconsole           started at [ 1036] asterisk.c listener())
=== ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0)
=== -------------------------------------------------------------------
===
=== Thread ID: -1215493232 (netconsole           started at [ 1036] asterisk.c listener())
=== ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0)
=== -------------------------------------------------------------------
===
=== Thread ID: -1217643632 (netconsole           started at [ 1036] asterisk.c listener())
=== ---> Tried and failed to get Lock #0 (channel.c): MUTEX 1130 channel_find_locked (channel lock) 0x9bc63f8 (0)
=== -------------------------------------------------------------------
===
=======================================================================


While this is happening I can go into the asterisk CLI any type a command like "core show channels" and see all active channels but if I want to run another command after that like "sip show peers" I have to leave the CLI and come back each time.  Restarting asterisk cleans it up and removes the lock call.
Comments:By: Schmooze Com (schmoozecom) 2010-04-14 16:26:18

Also we have tried this with the 1.4.31 RC that was just released this week with the same problems

By: Paul Belanger (pabelanger) 2010-04-14 18:00:34

Are you able to provide a backtrace (see below).
--
Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory.

Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

after enabling, reproduce the crash, and then execute the instructions in doc/backtrace.txt.

When complete, attach that file to this issue report. Thanks!

By: Jared Smith (jsmith) 2010-04-15 07:55:40

There is no backtrace, because this isn't a crash issue.  It's a locking issue.  I might be able to force a backtrace by killing the Asterisk process, but I'm not sure it would be helpful in this case.

By: Leif Madsen (lmadsen) 2010-04-15 09:27:13

As discussed on IRC we do actually need a backtrace. This can be done by attaching to the running process and getting the output of "thread apply all bt". Here is a command that will attach to the running Asterisk process and write the information to the /tmp/backtrace.txt file. Please upload that file to this issue when you see the deadlocks happen again.

gdb -ex "thread apply all bt" --batch /usr/sbin/asterisk `pidof asterisk` > /tmp/backtrace-threads.txt

Also be sure to run 'core show locks' from the Asterisk console as closely as possible to running the 'gdb' command so that things match up.

asterisk -rx "core show locks" > /tmp/core-show-locks.txt

By: Leif Madsen (lmadsen) 2010-04-15 09:27:26

Feedback requested from the reporter.

By: Leif Madsen (lmadsen) 2010-04-15 09:28:55

I'm upgrading the severity of this issue to block the next release of 1.4.

By: Schmooze Com (schmoozecom) 2010-04-15 09:37:09

Ok well we will need to compile asterisk 1.4.30 back on this system with the menu selects and have it create the problem again.  I will try to get this done today and get you the traces.

By: Leif Madsen (lmadsen) 2010-04-15 14:48:13

Also note the updated backtrace.txt document I just committed:  http://svn.asterisk.org/svn/asterisk/branches/1.4/doc/backtrace.txt

By: Leif Madsen (lmadsen) 2010-04-19 13:06:45

Any update on this issue?

By: Bryan Walters (gamegamer43) 2010-04-19 14:26:46

Uploaded the backtrace and locks per your request.

By: Leif Madsen (lmadsen) 2010-04-19 14:33:42

Thanks for the information! A developer may be looking at this shortly, so if you could respond promptly to any questions asked, it would be appreciated.

By: David Vossel (dvossel) 2010-04-20 10:43:55

It looks to me like this may be related to the changes made in r256014.

By: David Vossel (dvossel) 2010-04-20 16:43:43

I take that back, this is not related to r256014.

I spent quite a bit of time reviewing this and I'm not sure how this is possible yet.  One thing I did notice is that the line numbers in the back trace provided line up perfectly for all the files in release 1.4.30 except for the items in app_queue.c.  Do you have any local changes to app_queue you are using?

By: David Vossel (dvossel) 2010-04-20 17:38:59

If there are custom code changes to app_queue.c please provide them or any other changes for that matter.  We don't support third party code, but I may be able to look at it and tell whether it should make a difference or not if the changes are small enough.

By: Bryan Walters (gamegamer43) 2010-04-21 06:43:07

The only code changes to app_queue.c are the changes listed and accepted by Digium in r227424. https://issues.asterisk.org/view.php?id=15168 is where the patch can be found that is being used on this system.

By: David Vossel (dvossel) 2010-04-21 11:56:25

Thanks for the list of changes, they shouldn't affect locking.

I uploaded a patch that I believe will fix this. Please let me if it resolves the issue or not.

By: Bryan Walters (gamegamer43) 2010-04-22 16:11:32

I applied your patch to a system last night and it continued to have the same issues today. Unfortunately I was not able to get a new backtrace from it. Please let me know if additional information is needed going forward so I can ensure we can provide it.

By: David Vossel (dvossel) 2010-04-23 10:07:03

Well, that's not good. I'll take another pass at this and see if I can't come up with another possible cause.  Thanks for testing it.

By: David Vossel (dvossel) 2010-04-23 10:53:58

I uploaded a new patch.  If you can, please test it.  Thanks again for all your help with this issue.

By: Leif Madsen (lmadsen) 2010-04-26 13:17:20

Please test this patch at your earliest convenience so we can move this issue to completion.

By: Schmooze Com (schmoozecom) 2010-04-26 15:22:43

We applied the latest patch last night.  I am at Digium rest of this week teaching a class so if anyone wants to chat one on one about this come find me in the training room.

By: Amilcar S Silvestre (amilcar) 2010-04-27 03:53:28

schmoozecom, did the patch fix the issue?

By: Digium Subversion (svnbot) 2010-04-28 16:16:04

Repository: asterisk
Revision: 259858

U   branches/1.4/channels/chan_local.c
U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines

resolves deadlocks in chan_local

Issue_1.
In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan,
and pvt->owner.  Proper deadlock avoidance is done when the channel to hangup
is the outbound chan_local channel, but when it is not the outbound channel we
have an issue... We attempt to do deadlock avoidance only on the tech pvt, when
both the tech pvt and the pvt->owner are locked coming into that loop.  By
never giving up the pvt->owner channel deadlock avoidance is not entirely possible.
This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt
when trying to get the pvt->chan lock.

Issue_2.
ast_prod() is used in ast_activate_generator() to queue a frame on the channel
and make the channel's read function get called.  This function is used in
ast_activate_generator() while the channel is locked, which mean's the channel
will have a lock both from the generator code and the frame_queue code by the
time it gets to chan_local.c's local_queue_frame code... local_queue_frame
contains some of the same crazy deadlock avoidance that local_hangup requires,
and this recursive lock prevents that deadlock avoidance from happening correctly.
This patch removes ast_prod() from the channel lock so only one lock is held during
the local_queue_frame function.

(closes issue ASTERISK-15958)
Reported by: schmoozecom
Patches:
     issue_17185_v1.diff uploaded by dvossel (license 671)
     issue_17185_v2.diff uploaded by dvossel (license 671)
Tested by: schmoozecom, GameGamer43

Review: https://reviewboard.asterisk.org/r/631/


------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=259858

By: Digium Subversion (svnbot) 2010-04-28 16:20:04

Repository: asterisk
Revision: 259870

_U  trunk/
U   trunk/channels/chan_local.c
U   trunk/main/channel.c

------------------------------------------------------------------------
r259870 | dvossel | 2010-04-28 16:20:04 -0500 (Wed, 28 Apr 2010) | 39 lines

Merged revisions 259858 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines
 
 resolves deadlocks in chan_local
 
 Issue_1.
 In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan,
 and pvt->owner.  Proper deadlock avoidance is done when the channel to hangup
 is the outbound chan_local channel, but when it is not the outbound channel we
 have an issue... We attempt to do deadlock avoidance only on the tech pvt, when
 both the tech pvt and the pvt->owner are locked coming into that loop.  By
 never giving up the pvt->owner channel deadlock avoidance is not entirely possible.
 This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt
 when trying to get the pvt->chan lock.
 
 Issue_2.
 ast_prod() is used in ast_activate_generator() to queue a frame on the channel
 and make the channel's read function get called.  This function is used in
 ast_activate_generator() while the channel is locked, which mean's the channel
 will have a lock both from the generator code and the frame_queue code by the
 time it gets to chan_local.c's local_queue_frame code... local_queue_frame
 contains some of the same crazy deadlock avoidance that local_hangup requires,
 and this recursive lock prevents that deadlock avoidance from happening correctly.
 This patch removes ast_prod() from the channel lock so only one lock is held during
 the local_queue_frame function.
 
 (closes issue ASTERISK-15958)
 Reported by: schmoozecom
 Patches:
       issue_17185_v1.diff uploaded by dvossel (license 671)
       issue_17185_v2.diff uploaded by dvossel (license 671)
 Tested by: schmoozecom, GameGamer43
 
 Review: https://reviewboard.asterisk.org/r/631/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=259870

By: Digium Subversion (svnbot) 2010-04-28 16:26:31

Repository: asterisk
Revision: 259899

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_local.c
U   branches/1.6.2/main/channel.c

------------------------------------------------------------------------
r259899 | dvossel | 2010-04-28 16:26:30 -0500 (Wed, 28 Apr 2010) | 46 lines

Merged revisions 259870 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r259870 | dvossel | 2010-04-28 16:20:03 -0500 (Wed, 28 Apr 2010) | 39 lines
 
 Merged revisions 259858 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines
   
   resolves deadlocks in chan_local
   
   Issue_1.
   In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan,
   and pvt->owner.  Proper deadlock avoidance is done when the channel to hangup
   is the outbound chan_local channel, but when it is not the outbound channel we
   have an issue... We attempt to do deadlock avoidance only on the tech pvt, when
   both the tech pvt and the pvt->owner are locked coming into that loop.  By
   never giving up the pvt->owner channel deadlock avoidance is not entirely possible.
   This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt
   when trying to get the pvt->chan lock.
   
   Issue_2.
   ast_prod() is used in ast_activate_generator() to queue a frame on the channel
   and make the channel's read function get called.  This function is used in
   ast_activate_generator() while the channel is locked, which mean's the channel
   will have a lock both from the generator code and the frame_queue code by the
   time it gets to chan_local.c's local_queue_frame code... local_queue_frame
   contains some of the same crazy deadlock avoidance that local_hangup requires,
   and this recursive lock prevents that deadlock avoidance from happening correctly.
   This patch removes ast_prod() from the channel lock so only one lock is held during
   the local_queue_frame function.
   
   (closes issue ASTERISK-15958)
   Reported by: schmoozecom
   Patches:
         issue_17185_v1.diff uploaded by dvossel (license 671)
         issue_17185_v2.diff uploaded by dvossel (license 671)
   Tested by: schmoozecom, GameGamer43
   
   Review: https://reviewboard.asterisk.org/r/631/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=259899

By: Digium Subversion (svnbot) 2010-04-28 16:33:17

Repository: asterisk
Revision: 259930

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_local.c
U   branches/1.6.1/main/channel.c

------------------------------------------------------------------------
r259930 | dvossel | 2010-04-28 16:33:17 -0500 (Wed, 28 Apr 2010) | 46 lines

Merged revisions 259870 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r259870 | dvossel | 2010-04-28 16:20:03 -0500 (Wed, 28 Apr 2010) | 39 lines
 
 Merged revisions 259858 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines
   
   resolves deadlocks in chan_local
   
   Issue_1.
   In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan,
   and pvt->owner.  Proper deadlock avoidance is done when the channel to hangup
   is the outbound chan_local channel, but when it is not the outbound channel we
   have an issue... We attempt to do deadlock avoidance only on the tech pvt, when
   both the tech pvt and the pvt->owner are locked coming into that loop.  By
   never giving up the pvt->owner channel deadlock avoidance is not entirely possible.
   This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt
   when trying to get the pvt->chan lock.
   
   Issue_2.
   ast_prod() is used in ast_activate_generator() to queue a frame on the channel
   and make the channel's read function get called.  This function is used in
   ast_activate_generator() while the channel is locked, which mean's the channel
   will have a lock both from the generator code and the frame_queue code by the
   time it gets to chan_local.c's local_queue_frame code... local_queue_frame
   contains some of the same crazy deadlock avoidance that local_hangup requires,
   and this recursive lock prevents that deadlock avoidance from happening correctly.
   This patch removes ast_prod() from the channel lock so only one lock is held during
   the local_queue_frame function.
   
   (closes issue ASTERISK-15958)
   Reported by: schmoozecom
   Patches:
         issue_17185_v1.diff uploaded by dvossel (license 671)
         issue_17185_v2.diff uploaded by dvossel (license 671)
   Tested by: schmoozecom, GameGamer43
   
   Review: https://reviewboard.asterisk.org/r/631/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=259930

By: Digium Subversion (svnbot) 2010-04-28 16:35:10

Repository: asterisk
Revision: 259936

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_local.c
U   branches/1.6.0/main/channel.c

------------------------------------------------------------------------
r259936 | dvossel | 2010-04-28 16:35:10 -0500 (Wed, 28 Apr 2010) | 46 lines

Merged revisions 259870 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r259870 | dvossel | 2010-04-28 16:20:03 -0500 (Wed, 28 Apr 2010) | 39 lines
 
 Merged revisions 259858 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r259858 | dvossel | 2010-04-28 16:16:03 -0500 (Wed, 28 Apr 2010) | 33 lines
   
   resolves deadlocks in chan_local
   
   Issue_1.
   In the local_hangup() 3 locks must be held at the same time... pvt, pvt->chan,
   and pvt->owner.  Proper deadlock avoidance is done when the channel to hangup
   is the outbound chan_local channel, but when it is not the outbound channel we
   have an issue... We attempt to do deadlock avoidance only on the tech pvt, when
   both the tech pvt and the pvt->owner are locked coming into that loop.  By
   never giving up the pvt->owner channel deadlock avoidance is not entirely possible.
   This patch resolves that by doing deadlock avoidance on both the pvt->owner and the pvt
   when trying to get the pvt->chan lock.
   
   Issue_2.
   ast_prod() is used in ast_activate_generator() to queue a frame on the channel
   and make the channel's read function get called.  This function is used in
   ast_activate_generator() while the channel is locked, which mean's the channel
   will have a lock both from the generator code and the frame_queue code by the
   time it gets to chan_local.c's local_queue_frame code... local_queue_frame
   contains some of the same crazy deadlock avoidance that local_hangup requires,
   and this recursive lock prevents that deadlock avoidance from happening correctly.
   This patch removes ast_prod() from the channel lock so only one lock is held during
   the local_queue_frame function.
   
   (closes issue ASTERISK-15958)
   Reported by: schmoozecom
   Patches:
         issue_17185_v1.diff uploaded by dvossel (license 671)
         issue_17185_v2.diff uploaded by dvossel (license 671)
   Tested by: schmoozecom, GameGamer43
   
   Review: https://reviewboard.asterisk.org/r/631/
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=259936