Summary: | ASTERISK-15213: [patch] [regression] asterisk deadlocks and calls will stop queueing. | ||
Reporter: | valon24 (valon24) | Labels: | |
Date Opened: | 2009-11-25 11:31:03.000-0600 | Date Closed: | 2011-06-07 14:00:54 |
Priority: | Major | Regression? | Yes |
Status: | Closed/Complete | Components: | Channels/chan_agent |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) bug16321.patch ( 1) coreshowlocks1124_1424.txt ( 2) coreshowlocks1124_1457.txt ( 3) coreshowlocks1124_1658.txt ( 4) coreshowlocks1124_1703.txt ( 5) our_confs.rar | |
Description: | Our system will go fine for days processing calls queueing them and then deliver the calls to agents, we will then see the problem at random, it will happen sometimes every 3 or 4 days and then sometimes 3 times in one day. The problem will force us to restart asterisk. A call will come into the inbound route and then the queue application is called. after the queue application is call there is just dead air and nothing more will happen with that call. Once it happens no other calls will be processed they will just keep hitting the queue application and go to silence. Normally in the CLI we will see something like: [Nov 23 14:35:18] VERBOSE[4930] logger.c: -- Executing [909@from-internal:18] Queue("SIP/192.168.6.132-c8170bc0", "909|t|1712|1712") in new stack [Nov 23 14:35:18] VERBOSE[4930] logger.c: -- Started music on hold, class 'acc_1', on SIP/192.168.6.132-c8170bc0 After the system deadlocks we will only see: [Nov 23 14:36:14] VERBOSE[5009] logger.c: -- Executing [903@from-internal:18] Queue("SIP/192.168.6.132-c8114300", "903|t|1709|1709") in new stack I have will attach the output of core show locks and so everyone can see the full log but this is normally what we see each time we see the issue This happened 3 times yesterday: ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <file> <line num> <function> <lock name> <lock addr> (times locked) === === Thread ID: 1097607488 (do_devstate_changes started at [ 363] devicestate.c ast_device_state_engine_init()) === ---> Lock #0 (devicestate.c): MUTEX 289 do_state_change &(&devstate_cbs)->lock 0x75e390 (1) === ---> Lock #1 (chan_agent.c): MUTEX 304 agent_devicestate_cb &(&agents)->lock 0x2aaaba9a24f0 (1) === ---> Waiting for Lock #2 (chan_agent.c): MUTEX 313 agent_devicestate_cb &p->lock 0x2aaaaceecd80 (1) === --- ---> Locked Here: chan_agent.c line 498 (agent_read) === ------------------------------------------------------------------- === === Thread ID: 1106180416 (session_do started at [ 2522] manager.c accept_thread()) === ---> Lock #0 (manager.c): RDLOCK 2281 process_message &actionlock 0x76b720 (1) === ---> Waiting for Lock #1 (app_queue.c): MUTEX 4781 manager_queues_status &(&queues)->lock 0x2aaab6129bd0 (1) === --- ---> Locked Here: app_queue.c line 2842 (try_calling) === ------------------------------------------------------------------- === === Thread ID: 1112938816 (pbx_thread started at [ 2623] pbx.c ast_pbx_start()) === ---> Lock #0 (channel.c): MUTEX 2061 __ast_read &chan->lock 0x2768a68 (1) === ---> Lock #1 (chan_agent.c): MUTEX 498 agent_read &p->lock 0x2aaaaceecd80 (1) === ---> Waiting for Lock #2 (chan_agent.c): MUTEX 1669 agent_logoff &(&agents)->lock 0x2aaaba9a24f0 (1) === --- ---> Locked Here: chan_agent.c line 304 (agent_devicestate_cb) === ------------------------------------------------------------------- === === Thread ID: 1129191744 (pbx_thread started at [ 2623] pbx.c ast_pbx_start()) === ---> Lock #0 (app_queue.c): MUTEX 2842 try_calling &(&queues)->lock 0x2aaab6129bd0 (1) === ---> Waiting for Lock #1 (chan_agent.c): MUTEX 1456 agent_request &(&agents)->lock 0x2aaaba9a24f0 (1) === --- ---> Locked Here: chan_agent.c line 304 (agent_devicestate_cb) === ------------------------------------------------------------------- | ||
Comments: | By: valon24 (valon24) 2009-11-25 11:59:27.000-0600 This issue actually happened 4 times yesterday 1 time 3 days prior and then randomly for the last 6 weeks. We are using multiple versions and have seen it happen across all of the systems 1.4.25 1.4.26.2 1.4.26.3 we upgraded to 1.4.27 a few days ago and still see the same issue, however we do not recall seeing this issue in 1.4.24. Some additional info on the system here: The system processes about 4k calls per day. There are normally between 50 and 100 agents signed in to the box. This box is used only as an ACD server. Please let me know if there is anything I can add to help out with this as it seems like a fairly serious problem. Thanks By: valon24 (valon24) 2009-11-25 21:26:17.000-0600 We again saw the issue today on a machine running 1.4.25 this machine had the deadlock about 2 weeks ago. By: Leif Madsen (lmadsen) 2009-12-01 12:44:38.000-0600 Could you provide your configuration and console output so this can be reproduced in the lab? By: valon24 (valon24) 2009-12-01 17:48:20.000-0600 On the 26th, we downgraded three systems to 1.4.24.1. All three have been running flawlessly and we have not run into the same issue. One of the boxes is still running AsteriskNow 1.5 64bit with 1.4.26.3, and has continued to have this happen once every day since and twice this morning. We will be downgrading this machine to version 1.4.24.1 tonight. I'll post our configuration the moment i have a chance. The full log does not show anything strange during these occurrences. By: valon24 (valon24) 2009-12-05 13:09:29.000-0600 Update: Since the 26th we have seen no issues running with 1.4.24.1 We have since downgraded our last install to 1.4.24.1 running AsteriskNow. I have our custom configs for that system. with the extentions_custom, queues_custom, agents etc. Would you like the full configs or just an example of a single item from each? By: Leif Madsen (lmadsen) 2009-12-07 12:32:35.000-0600 I'd say the full configs just in case it has something to do with a large number of items vs. a single item By: valon24 (valon24) 2009-12-08 14:17:31.000-0600 I added our custom configs. We have still seen no issues since the 1.4.24.1 downgrade. Let me know if we can get you anything else. By: Jeff Peeler (jpeeler) 2010-01-05 18:09:40.000-0600 I'm almost positive the uploaded patch will fix this deadlock scenario, but if the reporter could test anyway that would be appreciated. By: Digium Subversion (svnbot) 2010-01-19 11:22:20.000-0600 Repository: asterisk Revision: 241227 U branches/1.4/channels/chan_agent.c ------------------------------------------------------------------------ r241227 | jpeeler | 2010-01-19 11:22:19 -0600 (Tue, 19 Jan 2010) | 13 lines Fix deadlock in agent_read by removing call to agent_logoff. One must always lock the agents list lock before the agent private. agent_read locks the private immediately, so locking the agents list lock is not an option (which is what agent_logoff requires). Because agent_read already has access to the agent private all that is necessary is to do the required hanging up that agent_logoff performed. (closes issue ASTERISK-15213) Reported by: valon24 Patches: bug16321.patch uploaded by jpeeler (license 325) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=241227 By: Digium Subversion (svnbot) 2010-01-19 12:46:12.000-0600 Repository: asterisk Revision: 241314 _U trunk/ U trunk/channels/chan_agent.c ------------------------------------------------------------------------ r241314 | jpeeler | 2010-01-19 12:46:11 -0600 (Tue, 19 Jan 2010) | 20 lines Merged revisions 241227 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r241227 | jpeeler | 2010-01-19 11:22:18 -0600 (Tue, 19 Jan 2010) | 13 lines Fix deadlock in agent_read by removing call to agent_logoff. One must always lock the agents list lock before the agent private. agent_read locks the private immediately, so locking the agents list lock is not an option (which is what agent_logoff requires). Because agent_read already has access to the agent private all that is necessary is to do the required hanging up that agent_logoff performed. (closes issue ASTERISK-15213) Reported by: valon24 Patches: bug16321.patch uploaded by jpeeler (license 325) ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=241314 By: Digium Subversion (svnbot) 2010-01-19 12:58:11.000-0600 Repository: asterisk Revision: 241316 _U branches/1.6.0/ U branches/1.6.0/channels/chan_agent.c ------------------------------------------------------------------------ r241316 | jpeeler | 2010-01-19 12:58:11 -0600 (Tue, 19 Jan 2010) | 27 lines Merged revisions 241314 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r241314 | jpeeler | 2010-01-19 12:46:11 -0600 (Tue, 19 Jan 2010) | 20 lines Merged revisions 241227 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r241227 | jpeeler | 2010-01-19 11:22:18 -0600 (Tue, 19 Jan 2010) | 13 lines Fix deadlock in agent_read by removing call to agent_logoff. One must always lock the agents list lock before the agent private. agent_read locks the private immediately, so locking the agents list lock is not an option (which is what agent_logoff requires). Because agent_read already has access to the agent private all that is necessary is to do the required hanging up that agent_logoff performed. (closes issue ASTERISK-15213) Reported by: valon24 Patches: bug16321.patch uploaded by jpeeler (license 325) ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=241316 By: Digium Subversion (svnbot) 2010-01-19 13:01:10.000-0600 Repository: asterisk Revision: 241317 _U branches/1.6.1/ U branches/1.6.1/channels/chan_agent.c ------------------------------------------------------------------------ r241317 | jpeeler | 2010-01-19 13:01:10 -0600 (Tue, 19 Jan 2010) | 27 lines Merged revisions 241314 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r241314 | jpeeler | 2010-01-19 12:46:11 -0600 (Tue, 19 Jan 2010) | 20 lines Merged revisions 241227 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r241227 | jpeeler | 2010-01-19 11:22:18 -0600 (Tue, 19 Jan 2010) | 13 lines Fix deadlock in agent_read by removing call to agent_logoff. One must always lock the agents list lock before the agent private. agent_read locks the private immediately, so locking the agents list lock is not an option (which is what agent_logoff requires). Because agent_read already has access to the agent private all that is necessary is to do the required hanging up that agent_logoff performed. (closes issue ASTERISK-15213) Reported by: valon24 Patches: bug16321.patch uploaded by jpeeler (license 325) ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=241317 By: Digium Subversion (svnbot) 2010-01-19 13:02:48.000-0600 Repository: asterisk Revision: 241318 _U branches/1.6.2/ U branches/1.6.2/channels/chan_agent.c ------------------------------------------------------------------------ r241318 | jpeeler | 2010-01-19 13:02:48 -0600 (Tue, 19 Jan 2010) | 27 lines Merged revisions 241314 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r241314 | jpeeler | 2010-01-19 12:46:11 -0600 (Tue, 19 Jan 2010) | 20 lines Merged revisions 241227 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r241227 | jpeeler | 2010-01-19 11:22:18 -0600 (Tue, 19 Jan 2010) | 13 lines Fix deadlock in agent_read by removing call to agent_logoff. One must always lock the agents list lock before the agent private. agent_read locks the private immediately, so locking the agents list lock is not an option (which is what agent_logoff requires). Because agent_read already has access to the agent private all that is necessary is to do the required hanging up that agent_logoff performed. (closes issue ASTERISK-15213) Reported by: valon24 Patches: bug16321.patch uploaded by jpeeler (license 325) ........ ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=241318 |