[Home]

Summary:ASTERISK-11973: Lock errors on the CLI
Reporter:Doug (doug)Labels:
Date Opened:2008-05-05 10:05:32Date Closed:2008-05-13 16:14:46
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_local
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) chan_local.so
( 1) chan-local.txt
( 2) lock_info
( 3) locks20080507.txt
( 4) messages.txt
Description:We have seen lock errors on a few different servers running 1.4.19. These seem to be mostly related to the queues. If anyone can please shed som light on these errors. In some cases the system seems to stop call delivery on the queues.

****** ADDITIONAL INFORMATION ******

[May 5 09:37:15] ERROR[6008]: /dar/build/asterisk-1.4.19/include/asterisk/lock.h:281 __ast_pthread_mutex_destroy: channel.c line 1258 (ast_channel_free): Error destroying mutex &chan->lock: Device or resource busy

[May 5 09:36:32] ERROR[6008]: /dar/build/asterisk-1.4.19/include/asterisk/lock.h:462 __ast_pthread_mutex_unlock: chan_local.c line 188 (local_queue_frame): mutex '&us->lock' freed more times than we've locked!

[May 5 09:36:32] ERROR[6008]: /dar/build/asterisk-1.4.19/include/asterisk/lock.h:479 __ast_pthread_mutex_unlock: chan_local.c line 188 (local_queue_frame): Error releasing mutex: Operation not permitted
Comments:By: Russell Bryant (russell) 2008-05-05 11:32:36

Please give 1.4.20 a try

By: Julian Yap (jyap) 2008-05-06 19:07:36

I'm not sure if it's the same issue but running 1.4.20-rc1, I see similar errors.

[May  6 13:36:01] ERROR[8583] /usr/src/asterisk-1.4.20-rc1/include/asterisk/lock.h: channel.c line 4549 (ast_channel_unlock): mutex '&chan->lock' freed more times than we've locked!
[May  6 13:36:01] ERROR[8583] /usr/src/asterisk-1.4.20-rc1/include/asterisk/lock.h: channel.c line 4549 (ast_channel_unlock): Error releasing mutex: Operation not permitted
[May  6 13:36:05] ERROR[8583] /usr/src/asterisk-1.4.20-rc1/include/asterisk/lock.h: channel.c line 1258 (ast_channel_free): Error destroying mutex &chan->lock: Device or resource busy
[May  6 13:56:04] ERROR[11072] /usr/src/asterisk-1.4.20-rc1/include/asterisk/lock.h: channel.c line 4549 (ast_channel_unlock): mutex '&chan->lock' freed more times than we've locked!
[May  6 13:56:04] ERROR[11072] /usr/src/asterisk-1.4.20-rc1/include/asterisk/lock.h: channel.c line 4549 (ast_channel_unlock): Error releasing mutex: Operation not permitted
[May  6 13:56:14] ERROR[11072] /usr/src/asterisk-1.4.20-rc1/include/asterisk/lock.h: channel.c line 1258 (ast_channel_free): Error destroying mutex &chan->lock: Device or resource busy

By: Doug (doug) 2008-05-07 02:35:22

Russell - Can you confirm if 1.4.20 has resolved this issue? I see jyap had a similar problem on 20 rc1. Im not sure that it is related but we had a sever that become unresponsive and no channel count on show channels and I saw the same lock message in the CLI. This seems to be a constant problem and many tickets have been created around this. I have attached the lock info.



By: Doug (doug) 2008-05-07 05:29:09

We updated a server to 1.4.20 and still see the same errors on CLI. We have had 3 times today where the lock has happened. This is strange as 1.4.19 was locking once every few days but before we updated had it three times in a few hours. Any Ideas what is causing this and any idea if the CLI and lock_info errors are related?

By: Doug (doug) 2008-05-07 05:32:59

[May 7 11:06:50] ERROR[10324]: /dar/build/asterisk-1.4.20/include/asterisk/lock.h:479 __ast_pthread_mutex_unlock: chan_local.c line 188 (local_queue_frame): mutex '&us->lock' freed more times than we've locked!
[May 7 11:06:50] ERROR[10324]: /dar/build/asterisk-1.4.20/include/asterisk/lock.h:496 __ast_pthread_mutex_unlock: chan_local.c line 188 (local_queue_frame): Error releasing mutex: Operation not permitted

By: Julian Yap (jyap) 2008-05-07 14:42:38

Uploaded is locks20080507.txt which required an Asterisk restart to resolve.

By: Doug (doug) 2008-05-08 06:12:21

jyap - are you using queues extensivly? if so are you using direct agents or local channel.



By: Julian Yap (jyap) 2008-05-08 13:29:19

DougUDI, I am not using queues.  Your bug may or may not be related to mine so I have filed a separate bug ASTERISK-11990.

By: Doug (doug) 2008-05-09 07:12:27

Please can someone assist here we have been having major problems with this and have no resolution yet. Thanks

By: Doug (doug) 2008-05-12 06:39:16

We rolled back to 1.4.17 for now allthough this has problems on the IAX channels and problems with Local channel for queues. Anyone have any idea's?

By: Doug (doug) 2008-05-12 08:52:27

I have attached a copy of our chan_local as most locks seem to occur on line 188 of this file...if this helps.

By: Phillip Smith (phillip) 2008-05-12 09:12:05

We currently get this on Asterisk 1.4.18/19/20-rc1/20-rc2

By: Doug (doug) 2008-05-12 09:44:46

Even though our locks seem to be related to chan_local I am doubting that there is much difference between our problem and ticket 0012603 as we both get the same error on the same line in channel.c as below:

[May 7 05:38:06] ERROR[5024] /usr/src/asterisk-1.4.20-rc1/include/asterisk/lock.h: channel.c line 1258 (ast_channel_free): Error destroying mutex &chan->lock: Device or resource busy

By: Mark Michelson (mmichelson) 2008-05-12 12:05:44

A few notes here:

@DougUDI: In your original bug description, the timestamps are out of order on the messages. Did you re-arrange them or did they actually appear in that order? This information could make a big difference in trying to track this issue down. Also, uploading chan_local.so is useless since .so files are compiled and don't mean anything to the average developer and also since we can use svn to get whatever revision of chan_local.c you are presently using.

@jyap: I'd recommend that you disable the DEBUG_CHANNEL_LOCKS compiler option. It's pretty much a subset of DETECT_DEADLOCKS, and in a case such as this one, it actually hinders the ability to determine where a deadlock is occurring. The reason for this is that when DEBUG_CHANNEL_LOCKS is on, there is a single function that all channel locks stem from, and so that same function is shown in the core show locks output for all channels that are locked. In contrast, when DEBUG_CHANNEL_LOCKS is disabled, ast_channel_lock is a macro, meaning that the core show locks output will show exactly where the call to ast_channel_lock was made.

@all: I looked at the code for about an hour and while I can't say for sure the circumstances behind this, it seems like this is probably stemming from a race condition between threads and their lock operations. I'll have more information as I figure out more.



By: Julian Yap (jyap) 2008-05-12 13:16:21

putnopvut: OK, I'll recompile Asterisk _without DEBUG_CHANNEL_LOCKS compiler option tonight.

By: Doug (doug) 2008-05-13 01:27:22

The timestamps are fine sorry I pasted it that way was not out of the CLI like that. I have added a /var/log/asterisk/messages as the lock accours as well. You will see the lock errors in that as well.

Is there anything else you need?

By: Doug (doug) 2008-05-13 10:25:34

Thanks for the help so far. We have found that the lock is definitly related to the chan_local as we moved to Direct channel on 1 of the problem servers and this has stopped the locks. We do need to use Local channel as some of the features are needed. Please let me know if you have found any more detail around the locking.

Thanks

By: Digium Subversion (svnbot) 2008-05-13 16:13:49

Repository: asterisk
Revision: 116038

U   branches/1.4/channels/chan_local.c

------------------------------------------------------------------------
r116038 | russell | 2008-05-13 16:12:25 -0500 (Tue, 13 May 2008) | 24 lines

Fix a deadlock involving channel autoservice and chan_local that was debugged
and fixed by mmichelson and me.

We observed a system that had a bunch of threads stuck in ast_autoservice_stop().
The reason these threads were waiting around is because this function waits to
ensure that the channel list in the autoservice thread gets rebuilt before the
stop() function returns.  However, the autoservice thread was also locked, so
the autoservice channel list was never getting rebuilt.

The autoservice thread was stuck waiting for the channel lock on a local channel.
However, the local channel was locked by a thread that was stuck in the autoservice
stop function.

It turned out that the issue came down to the local_queue_frame() function in
chan_local.  This function assumed that one of the channels passed in as an
argument was locked when called.  However, that was not always the case.  There
were multiple cases in which this channel was not locked when the function was
called.  We fixed up chan_local to indicate to this function whether this channel
was locked or not.  The previous assumption had caused local_queue_frame() to
improperly return with the channel locked, where it would then never get unlocked.

(closes issue ASTERISK-11973)
(related to issue ASTERISK-11990)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=116038

By: Digium Subversion (svnbot) 2008-05-13 16:14:33

Repository: asterisk
Revision: 116039

_U  trunk/
U   trunk/channels/chan_local.c

------------------------------------------------------------------------
r116039 | russell | 2008-05-13 16:14:28 -0500 (Tue, 13 May 2008) | 32 lines

Merged revisions 116038 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r116038 | russell | 2008-05-13 16:17:23 -0500 (Tue, 13 May 2008) | 24 lines

Fix a deadlock involving channel autoservice and chan_local that was debugged
and fixed by mmichelson and me.

We observed a system that had a bunch of threads stuck in ast_autoservice_stop().
The reason these threads were waiting around is because this function waits to
ensure that the channel list in the autoservice thread gets rebuilt before the
stop() function returns.  However, the autoservice thread was also locked, so
the autoservice channel list was never getting rebuilt.

The autoservice thread was stuck waiting for the channel lock on a local channel.
However, the local channel was locked by a thread that was stuck in the autoservice
stop function.

It turned out that the issue came down to the local_queue_frame() function in
chan_local.  This function assumed that one of the channels passed in as an
argument was locked when called.  However, that was not always the case.  There
were multiple cases in which this channel was not locked when the function was
called.  We fixed up chan_local to indicate to this function whether this channel
was locked or not.  The previous assumption had caused local_queue_frame() to
improperly return with the channel locked, where it would then never get unlocked.

(closes issue ASTERISK-11973)
(related to issue ASTERISK-11990)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=116039

By: Digium Subversion (svnbot) 2008-05-13 16:14:46

Repository: asterisk
Revision: 116040

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_local.c

------------------------------------------------------------------------
r116040 | russell | 2008-05-13 16:14:41 -0500 (Tue, 13 May 2008) | 40 lines

Merged revisions 116039 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r116039 | russell | 2008-05-13 16:18:55 -0500 (Tue, 13 May 2008) | 32 lines

Merged revisions 116038 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r116038 | russell | 2008-05-13 16:17:23 -0500 (Tue, 13 May 2008) | 24 lines

Fix a deadlock involving channel autoservice and chan_local that was debugged
and fixed by mmichelson and me.

We observed a system that had a bunch of threads stuck in ast_autoservice_stop().
The reason these threads were waiting around is because this function waits to
ensure that the channel list in the autoservice thread gets rebuilt before the
stop() function returns.  However, the autoservice thread was also locked, so
the autoservice channel list was never getting rebuilt.

The autoservice thread was stuck waiting for the channel lock on a local channel.
However, the local channel was locked by a thread that was stuck in the autoservice
stop function.

It turned out that the issue came down to the local_queue_frame() function in
chan_local.  This function assumed that one of the channels passed in as an
argument was locked when called.  However, that was not always the case.  There
were multiple cases in which this channel was not locked when the function was
called.  We fixed up chan_local to indicate to this function whether this channel
was locked or not.  The previous assumption had caused local_queue_frame() to
improperly return with the channel locked, where it would then never get unlocked.

(closes issue ASTERISK-11973)
(related to issue ASTERISK-11990)

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=116040