Summary: | ASTERISK-13059: CPU Usage Increases and then Asterisk Crashes | ||
Reporter: | A.R. Nasir Qureshi (nasirq) | Labels: | |
Date Opened: | 2008-11-12 08:45:39.000-0600 | Date Closed: | 2011-06-07 14:00:41 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | (0) debug1-optimized.txt (1) Lock-Log-1-4-2009.txt (2) Lock-Log-30-4-2009.txt | |
Description: | I am using Asterisk in a call center environment, with AgentCallbackLogin for agents. The agents are on SIP phones (Polycom). I use the Asterisk::Manager Perl API to connect to the manager interface for CTI.

One Perl script (using Asterisk::Manager) does two things. First, it sends Action: Status every 500 ms to get the list of currently active channels and stores them in a MySQL table. Second, it watches a directory for files which, when created, contain Manager actions. It reads each file, sends the actions to Asterisk, and deletes the file. This way I can send many Manager actions over a single manager connection to Asterisk. I chose this approach to reduce the load on Asterisk when many clients need to read events or send actions.

The problem is that sometimes (and I have been unable to find steps to reproduce it) the interface seems to hang while sending actions. No more actions get processed and CPU usage starts to climb. Using top I have seen more than two, and sometimes five or six, asterisk threads using a lot of CPU. Sometimes new calls go through and sometimes they do not. Sometimes everything returns to normal after the active calls are hung up, and sometimes it gets worse. Soft hangup on the CLI does not work, and even restart now does nothing. The only option left is killall -9 asterisk.

At this stage I need expert guidance on how to dig further and find out what is going wrong when this occurs. Please help me out. | ||
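The spool-directory pattern described above can be sketched roughly as follows. This is a minimal illustration in Python, not the reporter's Perl script: `drain_spool`, the `send` callback, and the file naming are hypothetical, and the actual Asterisk::Manager calls are not shown.

```python
import os
import tempfile


def drain_spool(directory, send):
    """Send the contents of every file in `directory`, then delete the file.

    `send` is a hypothetical callback standing in for whatever writes an
    action to the single manager connection. Returns the number of files
    processed.
    """
    processed = 0
    # Sort by name so files are handled in a predictable order.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        # newline="" keeps the CRLF line endings of the action text intact.
        with open(path, "r", encoding="utf-8", newline="") as handle:
            text = handle.read()
        send(text)
        # Delete only after the action has been handed to the connection.
        os.remove(path)
        processed += 1
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        sample = os.path.join(spool, "0001.action")  # hypothetical file name
        with open(sample, "w", encoding="utf-8", newline="") as f:
            f.write("Action: Status\r\n\r\n")
        sent = []
        count = drain_spool(spool, sent.append)
        print(count, sent)
```

Because the same connection is also polled with Action: Status, a blocked `send` in a loop like this would stall both duties at once, which is consistent with the symptom reported, although the thread does not establish that as the cause.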
Comments: |
By: Leif Madsen (lmadsen) 2008-11-12 14:16:44.000-0600
Can you please enable DONT_OPTIMIZE in menuselect, then rebuild and reinstall Asterisk, and when you get a core dump (after Asterisk crashes), attach the backtrace? Follow the instructions in the doc/backtrace.txt file located in your Asterisk source directory. Thanks!

By: A.R. Nasir Qureshi (nasirq) 2008-11-12 23:43:36.000-0600
I'll do that. What if Asterisk does not crash, but uses a lot of CPU and becomes unresponsive? Will killing it with -9 create a core dump? If not, how can I find out what is happening or which part of Asterisk is causing the problem?

By: has1084 (hasitha) 2008-11-14 05:58:46.000-0600
Hi, I'm having exactly the same issue. The service crashed again this morning after running for 6 hours. When it crashed, the CPU time of the process was 96 hours and CPU usage was 130%. I had to use 'kill -9' (as usual) because there was no other way to restart the service. I compiled Asterisk with DONT_OPTIMIZE enabled and it was running with option -g when it crashed, but I can't find any core dump files in the /tmp directory. Please advise what else can be done.

By: has1084 (hasitha) 2008-11-17 03:34:46.000-0600
Hi, Asterisk does not produce any core dump files. Actually, it does not crash and stop. The process just freezes and stops responding to any SIP messages. Does anyone have any idea about this problem? Thanks!

By: Leif Madsen (lmadsen) 2009-02-02 14:58:19.000-0600
Does this still happen to be an issue?

By: A.R. Nasir Qureshi (nasirq) 2009-02-09 03:33:37.000-0600
I have not tested the procedures that caused the problems for some time. Let me do it and then I'll know whether it is happening again. BTW, I need a way to know where Asterisk is stuck when it does not crash (and produce a core dump) but just uses a lot of CPU and becomes unresponsive.

By: Leif Madsen (lmadsen) 2009-02-09 10:47:16.000-0600
I've assigned to file for now.
file: can you answer the reporter's question? I'm thinking that at this point we're looking for the issue to be reproduced in valgrind?

By: Joshua C. Colp (jcolp) 2009-02-09 10:48:44.000-0600
Sounds like it could be spinning 'round and 'round... attaching gdb using:
    gdb asterisk --pid=`pidof asterisk`
and doing "thread apply all bt" would give the needed information.

By: Joshua C. Colp (jcolp) 2009-02-25 11:06:23.000-0600
nasirq: Have you been able to get the needed information?

By: A.R. Nasir Qureshi (nasirq) 2009-02-25 23:15:03.000-0600
Not yet. The system in question has now been moved to production, so I am unable to simulate the conditions. I have set up another test system similar to the first one, and I will start testing soon. Will post the results soon.

By: Joshua C. Colp (jcolp) 2009-03-09 16:15:29
nasirq: Any results yet?

By: Dominic Böttger (dominic) 2009-03-10 06:11:09
Same issue in our production environment :-( Attached the debug output of gdb. Can't recompile Asterisk at the moment (production). Didn't have this issue for 9 weeks. Today it happened again. In the CLI I was able to execute "core show channels" only once. Then I had to exit the CLI and start it again to execute "core show channels" again. I am using Asterisk 1.4.22 and can't upgrade to 1.4.23 because there are issues with Siemens Openstage phones. I am not using AgentCallbackLogin. I am using the manager interface to read the device states for a device state server.

By: Joshua C. Colp (jcolp) 2009-03-11 14:40:44
dominic: Going to need Asterisk compiled with DONT_OPTIMIZE and DEBUG_THREADS in menuselect / Compiler Flags. Attach the output of core show locks taken when it has deadlocked.

By: Leif Madsen (lmadsen) 2009-03-23 12:32:49
dominic: ping? Any luck on getting the requested information, or determining an ETA for it?

By: A.R. Nasir Qureshi (nasirq) 2009-04-01 01:34:06
Today we faced this problem again, and I was not doing any of the things (sending actions) that I described when I opened this report. First I was able to do a "show channels" and get results, and then I ran gdb. After that, show channels stopped working (output of the Asterisk CLI and gdb attached). I ran gdb again (both outputs attached). The Asterisk process was using a lot of CPU:
    PID  USER PR NI VIRT RES SHR  S %CPU %MEM TIME+     COMMAND
    3283 root 20 0  608m 16m 7284 S 83.8 0.2  128:21.42 asterisk
To get out of the situation I had to do a "killall -9 asterisk". My Asterisk is only compiled with DONT_OPTIMIZE. I am recompiling it to your specs and will do the changeover on the weekend. I didn't read the post carefully, so I was not able to run core show locks. I will remember it next time.

By: A.R. Nasir Qureshi (nasirq) 2009-04-01 01:55:11
One more thing. I tried to telnet into the manager interface. Doing Action: Status the first time showed the result, but after that it didn't work.
    Action: Status
    Response: Success
    Message: Channel status will follow

    Event: Status
    Privilege: Call
    Channel: SIP/807-01e6cd60
    CallerID: 807
    CallerIDNum: 807
    CallerIDName: <unknown>
    Account:
    State: Up
    Link: Local/807@extensions-96f8,2
    Uniqueid: 1238563323.13158

    Event: Status
    Privilege: Call
    Channel: Agent/1004
    CallerID: unknown
    CallerIDNum: unknown
    CallerIDName: unknown
    Account:
    State: Up
    Link: SIP/901-ec051ae0
    Uniqueid: 1238563323.13157

    Event: Status
    Privilege: Call
    Channel: Local/807@extensions-96f8,2
    CallerID: unknown
    CallerIDNum: unknown
    CallerIDName: unknown
    Account:
    State: Up
    Context: extensions
    Extension: 807
    Priority: 1
    Seconds: 2975
    Link: SIP/807-01e6cd60
    Uniqueid: 1238563323.13156

    Action: Status

    Action: Status

By: Leif Madsen (lmadsen) 2009-04-01 09:46:13
nasirq: if your note above isn't related to the issue here, then please open a separate bug to track it. Thanks!

By: A.R. Nasir Qureshi (nasirq) 2009-04-02 01:21:48
This is related to the same issue: it is the same machine, the same usage of the Manager interface, and the same symptoms. I was just not sending actions this time. Since we do not know what caused the original issue, sending actions may not have been the cause of the problem. However, if you still think it is not related, I'll open another bug.

By: Frank Waller (explidous) 2009-04-07 08:52:55
We had similar issues when using the same Manager connection to send actions and receive events. Using one Manager connection for listening and a separate connection for sending each set of dependent commands has solved our problems with this. As a note, we also use a separate thread for each manager connection.

By: Joshua C. Colp (jcolp) 2009-04-15 12:49:37
Pinging anyone at all. I need Asterisk compiled with DONT_OPTIMIZE and DEBUG_THREADS in menuselect / Compiler Flags and the output of core show locks attached to this issue to proceed.

By: A.R. Nasir Qureshi (nasirq) 2009-04-16 03:55:25
I have compiled Asterisk with the required flags. As soon as we face the issue again, I will update this bug with the required info.

By: Joshua C. Colp (jcolp) 2009-04-27 09:34:11
nasirq: Glad to hear that! I'll be here waiting.

By: A.R. Nasir Qureshi (nasirq) 2009-04-30 05:26:25
We just got another lockup. I have attached the file with the output of core show locks, (gdb) thread apply all bt, and show channels. I had to kill Asterisk using -9 again and then restart it to restore service.

By: Joshua C. Colp (jcolp) 2009-05-04 11:26:28
After further examination, this has already been fixed in newer versions. It was fixed in revision 148912 from issue 13676.
|
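The diagnostics requested in the thread (a backtrace of every thread from gdb, plus core show locks on a DEBUG_THREADS build) could be collected non-interactively along the following lines. This is only a sketch: the helper names are hypothetical, it assumes gdb and the asterisk binary are on the PATH, and it uses gdb's `-p`, `-batch` and `-ex` options and Asterisk's `-rx` remote-console option in place of the interactive session described above.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb command that dumps every thread's stack."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def core_show_locks_command():
    """Build the remote-console command for `core show locks`.

    Per the thread, this output requires a build with DEBUG_THREADS.
    """
    return ["asterisk", "-rx", "core show locks"]


def capture(command, timeout=60):
    """Run `command` and return its combined stdout/stderr as text.

    Raises subprocess.TimeoutExpired if the command does not finish in
    `timeout` seconds, which can happen when the CLI itself is stuck.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout
```

For example, `capture(gdb_backtrace_command(3283))` would target the PID shown in the top output quoted above. Running the gdb capture before the lock listing may be the safer order, since the thread reports the CLI becoming unresponsive once the lockup has started.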