[Home]

Summary:ASTERISK-15213: [patch] [regression] asterisk deadlocks and calls will stop queueing.
Reporter:valon24 (valon24)Labels:
Date Opened:2009-11-25 11:31:03.000-0600Date Closed:2011-06-07 14:00:54
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_agent
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) bug16321.patch
( 1) coreshowlocks1124_1424.txt
( 2) coreshowlocks1124_1457.txt
( 3) coreshowlocks1124_1658.txt
( 4) coreshowlocks1124_1703.txt
( 5) our_confs.rar
Description:Our system will go fine for days processing calls queueing them and then deliver the calls to agents, we will then see the problem at random, it will happen sometimes every 3 or 4 days and then sometimes 3 times in one day.

The problem will force us to restart asterisk.

A call will come into the inbound route and then the queue application is called.
after the queue application is call there is just dead air and nothing more will happen with that call. Once it happens no other calls will be processed they will just keep hitting the queue application and go to silence.

Normally in the CLI we will see something like:

[Nov 23 14:35:18] VERBOSE[4930] logger.c:     -- Executing [909@from-internal:18] Queue("SIP/192.168.6.132-c8170bc0", "909|t|1712|1712") in new stack

[Nov 23 14:35:18] VERBOSE[4930] logger.c:     -- Started music on hold, class 'acc_1', on SIP/192.168.6.132-c8170bc0


After the system deadlocks we will only see:


[Nov 23 14:36:14] VERBOSE[5009] logger.c:     -- Executing [903@from-internal:18] Queue("SIP/192.168.6.132-c8114300", "903|t|1709|1709") in new stack

I have will attach the output of core show locks and so everyone can see the full log but this is normally what we see each time we see the issue

This happened 3 times yesterday:

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 1097607488 (do_devstate_changes  started at [  363] devicestate.c ast_device_state_engine_init())
=== ---> Lock #0 (devicestate.c): MUTEX 289 do_state_change &(&devstate_cbs)->lock 0x75e390 (1)
=== ---> Lock #1 (chan_agent.c): MUTEX 304 agent_devicestate_cb &(&agents)->lock 0x2aaaba9a24f0 (1)
=== ---> Waiting for Lock #2 (chan_agent.c): MUTEX 313 agent_devicestate_cb &p->lock 0x2aaaaceecd80 (1)
=== --- ---> Locked Here: chan_agent.c line 498 (agent_read)
=== -------------------------------------------------------------------
===
=== Thread ID: 1106180416 (session_do           started at [ 2522] manager.c accept_thread())
=== ---> Lock #0 (manager.c): RDLOCK 2281 process_message &actionlock 0x76b720 (1)
=== ---> Waiting for Lock #1 (app_queue.c): MUTEX 4781 manager_queues_status &(&queues)->lock 0x2aaab6129bd0 (1)
=== --- ---> Locked Here: app_queue.c line 2842 (try_calling)
=== -------------------------------------------------------------------
===
=== Thread ID: 1112938816 (pbx_thread           started at [ 2623] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 2061 __ast_read &chan->lock 0x2768a68 (1)
=== ---> Lock #1 (chan_agent.c): MUTEX 498 agent_read &p->lock 0x2aaaaceecd80 (1)
=== ---> Waiting for Lock #2 (chan_agent.c): MUTEX 1669 agent_logoff &(&agents)->lock 0x2aaaba9a24f0 (1)
=== --- ---> Locked Here: chan_agent.c line 304 (agent_devicestate_cb)
=== -------------------------------------------------------------------
===
=== Thread ID: 1129191744 (pbx_thread           started at [ 2623] pbx.c ast_pbx_start())
=== ---> Lock #0 (app_queue.c): MUTEX 2842 try_calling &(&queues)->lock 0x2aaab6129bd0 (1)
=== ---> Waiting for Lock #1 (chan_agent.c): MUTEX 1456 agent_request &(&agents)->lock 0x2aaaba9a24f0 (1)
=== --- ---> Locked Here: chan_agent.c line 304 (agent_devicestate_cb)
=== -------------------------------------------------------------------





Comments:By: valon24 (valon24) 2009-11-25 11:59:27.000-0600

This issue actually happened 4 times yesterday 1 time 3 days prior and then randomly for the last 6 weeks.

We are using multiple versions and have seen it happen across all of the systems
1.4.25 1.4.26.2 1.4.26.3 we upgraded to 1.4.27 a few days ago and still see the same issue, however we do not recall seeing this issue in 1.4.24.

Some additional info on the system here:
The system processes about 4k calls per day.
There are normally between 50 and 100 agents signed in to the box.
This box is used only as an ACD server.

Please let me know if there is anything I can add to help out with this as it seems like a fairly serious problem.

Thanks

By: valon24 (valon24) 2009-11-25 21:26:17.000-0600

We again saw the issue today on a machine running 1.4.25 this machine had the deadlock about 2 weeks ago.

By: Leif Madsen (lmadsen) 2009-12-01 12:44:38.000-0600

Could you provide your configuration and console output so this can be reproduced in the lab?

By: valon24 (valon24) 2009-12-01 17:48:20.000-0600

On the 26th, we downgraded three systems to 1.4.24.1.  All three have been running flawlessly and we have not run into the same issue.  One of the boxes is still running AsteriskNow 1.5 64bit with 1.4.26.3, and has continued to have this happen once every day since and twice this morning.  We will be downgrading this machine to version 1.4.24.1 tonight.

I'll post our configuration the moment i have a chance.  The full log does not show anything strange during these occurrences.

By: valon24 (valon24) 2009-12-05 13:09:29.000-0600

Update:

Since the 26th we have seen no issues running with 1.4.24.1

We have since downgraded our last install to 1.4.24.1 running AsteriskNow.

I have our custom configs for that system. with the extentions_custom, queues_custom, agents etc.

Would you like the full configs or just an example of a single item from each?

By: Leif Madsen (lmadsen) 2009-12-07 12:32:35.000-0600

I'd say the full configs just in case it has something to do with a large number of items vs. a single item

By: valon24 (valon24) 2009-12-08 14:17:31.000-0600

I added our custom configs.

We have still seen no issues since the 1.4.24.1 downgrade.

Let me know if we can get you anything else.

By: Jeff Peeler (jpeeler) 2010-01-05 18:09:40.000-0600

I'm almost positive the uploaded patch will fix this deadlock scenario, but if the reporter could test anyway that would be appreciated.

By: Digium Subversion (svnbot) 2010-01-19 11:22:20.000-0600

Repository: asterisk
Revision: 241227

U   branches/1.4/channels/chan_agent.c

------------------------------------------------------------------------
r241227 | jpeeler | 2010-01-19 11:22:19 -0600 (Tue, 19 Jan 2010) | 13 lines

Fix deadlock in agent_read by removing call to agent_logoff.

One must always lock the agents list lock before the agent private. agent_read
locks the private immediately, so locking the agents list lock is not an
option (which is what agent_logoff requires). Because agent_read already
has access to the agent private all that is necessary is to do the required
hanging up that agent_logoff performed.

(closes issue ASTERISK-15213)
Reported by: valon24
Patches:
     bug16321.patch uploaded by jpeeler (license 325)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=241227

By: Digium Subversion (svnbot) 2010-01-19 12:46:12.000-0600

Repository: asterisk
Revision: 241314

_U  trunk/
U   trunk/channels/chan_agent.c

------------------------------------------------------------------------
r241314 | jpeeler | 2010-01-19 12:46:11 -0600 (Tue, 19 Jan 2010) | 20 lines

Merged revisions 241227 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r241227 | jpeeler | 2010-01-19 11:22:18 -0600 (Tue, 19 Jan 2010) | 13 lines
 
 Fix deadlock in agent_read by removing call to agent_logoff.
 
 One must always lock the agents list lock before the agent private. agent_read
 locks the private immediately, so locking the agents list lock is not an
 option (which is what agent_logoff requires). Because agent_read already
 has access to the agent private all that is necessary is to do the required
 hanging up that agent_logoff performed.
 
 (closes issue ASTERISK-15213)
 Reported by: valon24
 Patches:
       bug16321.patch uploaded by jpeeler (license 325)
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=241314

By: Digium Subversion (svnbot) 2010-01-19 12:58:11.000-0600

Repository: asterisk
Revision: 241316

_U  branches/1.6.0/
U   branches/1.6.0/channels/chan_agent.c

------------------------------------------------------------------------
r241316 | jpeeler | 2010-01-19 12:58:11 -0600 (Tue, 19 Jan 2010) | 27 lines

Merged revisions 241314 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r241314 | jpeeler | 2010-01-19 12:46:11 -0600 (Tue, 19 Jan 2010) | 20 lines
 
 Merged revisions 241227 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r241227 | jpeeler | 2010-01-19 11:22:18 -0600 (Tue, 19 Jan 2010) | 13 lines
   
   Fix deadlock in agent_read by removing call to agent_logoff.
   
   One must always lock the agents list lock before the agent private. agent_read
   locks the private immediately, so locking the agents list lock is not an
   option (which is what agent_logoff requires). Because agent_read already
   has access to the agent private all that is necessary is to do the required
   hanging up that agent_logoff performed.
   
   (closes issue ASTERISK-15213)
   Reported by: valon24
   Patches:
         bug16321.patch uploaded by jpeeler (license 325)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=241316

By: Digium Subversion (svnbot) 2010-01-19 13:01:10.000-0600

Repository: asterisk
Revision: 241317

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_agent.c

------------------------------------------------------------------------
r241317 | jpeeler | 2010-01-19 13:01:10 -0600 (Tue, 19 Jan 2010) | 27 lines

Merged revisions 241314 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r241314 | jpeeler | 2010-01-19 12:46:11 -0600 (Tue, 19 Jan 2010) | 20 lines
 
 Merged revisions 241227 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r241227 | jpeeler | 2010-01-19 11:22:18 -0600 (Tue, 19 Jan 2010) | 13 lines
   
   Fix deadlock in agent_read by removing call to agent_logoff.
   
   One must always lock the agents list lock before the agent private. agent_read
   locks the private immediately, so locking the agents list lock is not an
   option (which is what agent_logoff requires). Because agent_read already
   has access to the agent private all that is necessary is to do the required
   hanging up that agent_logoff performed.
   
   (closes issue ASTERISK-15213)
   Reported by: valon24
   Patches:
         bug16321.patch uploaded by jpeeler (license 325)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=241317

By: Digium Subversion (svnbot) 2010-01-19 13:02:48.000-0600

Repository: asterisk
Revision: 241318

_U  branches/1.6.2/
U   branches/1.6.2/channels/chan_agent.c

------------------------------------------------------------------------
r241318 | jpeeler | 2010-01-19 13:02:48 -0600 (Tue, 19 Jan 2010) | 27 lines

Merged revisions 241314 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r241314 | jpeeler | 2010-01-19 12:46:11 -0600 (Tue, 19 Jan 2010) | 20 lines
 
 Merged revisions 241227 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r241227 | jpeeler | 2010-01-19 11:22:18 -0600 (Tue, 19 Jan 2010) | 13 lines
   
   Fix deadlock in agent_read by removing call to agent_logoff.
   
   One must always lock the agents list lock before the agent private. agent_read
   locks the private immediately, so locking the agents list lock is not an
   option (which is what agent_logoff requires). Because agent_read already
   has access to the agent private all that is necessary is to do the required
   hanging up that agent_logoff performed.
   
   (closes issue ASTERISK-15213)
   Reported by: valon24
   Patches:
         bug16321.patch uploaded by jpeeler (license 325)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=241318