|Summary:||ASTERISK-15058: [regression] High CPU usage, choppy sound|
|Date Opened:||2009-10-30 19:45:47||Date Closed:||2011-06-07 14:07:45|
|Environment:||Attachments:||( 0) gdb.txt|
( 1) gdb2.txt
( 2) gdb3.txt
|Description:||After upgrading from Asterisk 184.108.40.206 to 220.127.116.11 I seem to have a random problem with high CPU usage. When making a SIP to SIP call or SIP to System call Asterisk process CPU usage goes much higher than normal and the sound get's choppy. This is somewhat random but occurs fairly regularly, maybe 50% of the time.|
If it happens it happens as soon as a call is answered. CPU usage sometimes stays high for minutes and will suddenly go low even though I didn't do anything. More often than not it only stays high for a short period of time and then goes back to normal.
I am running 64bit CentOS 5.4. Log files and Asterisk CLI -vvvvvvvvvvr look clean.
Not sure if this is useful but I will attached a gdb.txt file taken while it is happening
|Comments:||By: mustardman (mustardman) 2009-10-31 03:51:50|
I cannot get Asterisk to actually answer calls with gdb attached but maybe it will still help.
By: mustardman (mustardman) 2009-10-31 16:35:24
Upon further testing it is worse than I originally thought. The longer Asterisk sits idling the worse it is when it first answers a call. After about 10 minutes of idling it goes to 50% cpu usage (on my dual core HT system). If I leave Asterisk idling for an hour or two and call in to the system the Asterisk process pins the CPU at 100% for about 30 seconds.
I am starting asterisk with safe_asterisk but it is the asterisk process that pins the CPU.
By: mustardman (mustardman) 2009-11-02 02:35:35.000-0600
Still testing and trying to track this down. It is not random at all. Basically the longer Asterisk sits idle the longer the high CPU usage. It basically spikes to 100% and after a period of time goes down to normal. After numerous tests it seems directly proportional to the length of time Asterisk sits idle.
If Asterisk is running all night and I make a call it spikes and stays at 100% for about 30seconds to 1 minute. If I make a call 5minutes later it spikes but comes down to normal after only a few seconds.
It occurs as soon as the call comes in requiring Asterisk to do something. Memory usage of all processes seems normal and does not change overnight or before or during or after the spike.
Anyone know how to trace the Asterisk process while it is running? I tried gdb but Asterisk seems to stop working when I try trace the process even though the Asterisk process is still running.
By: Leif Madsen (lmadsen) 2009-11-02 09:08:00.000-0600
And does this still happen if you revert back to 18.104.22.168? 22.214.171.124 is a security only update, so that *shouldn't* cause this, but if that really is the case, then we need to review the change and determine why that might be happening.
By: Leif Madsen (lmadsen) 2009-11-02 09:09:49.000-0600
Additionally, for your backtrace to be useful, you will need to enable DONT_OPTIMIZE in menuselect within the Compiler Flags setting of Asterisk. After enabling, do "make install" and re-run your backtrace.
I'm not sure if this will be useful to a developer or not, but since you were willing to provide it, then we might as well have it setup correctly :) Thanks!
By: mustardman (mustardman) 2009-11-06 21:32:08.000-0600
I have installed 126.96.36.199 RC3 and now RC4 and so far it looks like maybe the issue is not there anymore. Still testing. I have also added some development packages. It may have had something to do with a missing dependency but I have no information either way to confirm or deny that. I do look at the checks when I run ./configure but it is not always clear or conclusive.
Would it be too much to ask for an exact list of all the dependencies and packages required in a readme file? I know that I can just install everything or the Development tools group etc. but that adds hundreds of megs of stuff I don't want or need. On hardware limited or VPS systems that is not practical.
It sure would be nice if there was an up to date list of dependencies required so I can only install the packages needed and don't have to constantly guess or overinstall to be sure.
By: Leif Madsen (lmadsen) 2009-11-07 14:10:49.000-0600
There is no list of dependencies other than the list which is provided in menuselect under each application. If the application is available for selection/deselection, then the dependencies have been satisfied.
If Asterisk compiles, and the packages that you want to select are available for select and build, then the dependencies have been satisfied. It is unlikely that installing a package caused anything to change. It is likely either a bug was fixed in the source to cause this to no longer be an issue for you, or something on your network or OS changed to resolve this.
Regardless, I'm going to close this issue, and if it reappears, then you can reopen this issue.
By: mustardman (mustardman) 2009-11-09 02:56:51.000-0600
Downgraded from 1.6.2RC4 to 188.8.131.52 without changing anything else on my system and the issue re-appeared. So it is definitely an issues with 1.6.1.x that is NOT present in 1.6.2.x.
By: Leif Madsen (lmadsen) 2009-11-09 08:19:55.000-0600
The difference between 1.6.1.x and 1.6.2.x is a major version change -- they cannot be compared to each other.
You have still not answered the initial question of whether just running 184.108.40.206 continues to have the same problem as 220.127.116.11 was only a security update which did not touch this part of the code.
Additionally, 18.104.22.168 was also only a security update which should not have affected this.
22.214.171.124-rc1 would be a more appropriate version to check against as bug fixes will be contained in there beyond 126.96.36.199.
I would like to see if 188.8.131.52 is experiencing this same issue, as I don't see how the update to 184.108.40.206 could have alone explained this issue.
By: mustardman (mustardman) 2009-11-09 20:51:27.000-0600
Ok I went and re-installed all over again and compiled with don't optimize flag and attached gdb3.txt. Let me know what else I need to do.
This was on Asterisk 220.127.116.11
By: mustardman (mustardman) 2009-11-09 23:00:10.000-0600
Problem is still there in 18.104.22.168.
Sigh........going back to 1.6.0.x to give that a try.
By: mustardman (mustardman) 2009-11-10 00:59:28.000-0600
Asterisk 22.214.171.124 is ok as far as I can tell. No CPU spike!
So it appears only Asterisk 1.6.1.x has the problem.
By: Leif Madsen (lmadsen) 2009-11-10 10:01:52.000-0600
Issue is Acknowledge. Per the issue guidelines though, as this is part of chan_sip, please provide the following information (as quoted from the issue guidelines):
Include debug output! Please include output from "sip debug" if you have a SIP problem. This seems obvious, but apparently is not. Set debug to 4, verbose to 4, turn on sip history and dumphistory in sip.conf and capture all output. A packet trace from ethereal will not tell us what is happening inside your Asterisk server, so that is not a replacement.
By: Leif Madsen (lmadsen) 2009-11-12 15:53:38.000-0600
Can you provide the circumstances that this is happening, such as console output when it happens so that we can try to track down why this is happening? Thanks!
By: mustardman (mustardman) 2009-11-12 16:40:43.000-0600
Ok, it is as I described in the first post. Simplest test setup is a SIP client such as Xlite or a SIP phone registered to Asterisk as an extension. Simply dial into the system. Simplest test case is echo test. Spike occurs the instant Asterisk responds to the dial in.
Spike ONLY occurs after a period of idle time (maybe 15-30minutes). If Asterisk is busy doing other simultaneous things or has recently serviced other calls then the spike will NOT occur.
There is some correlation between the length of idle time and the length of time the CPU remains spiked at 100%. If I leave the server on overnight, for example, then call into the system in the morning the spike is quite long (1-2minutes). If I call in after maybe 15-30minutes of idle time the spike only lasts a few seconds. Same thing with re-boots. Spike will not occur immediately after a system reboot. Only after running idle for 15-30minutes. I do not see any changes to memory usage during idle. All my observations are done using the 'top' tool on CentOS 5.
I am not installing/using Zaptel/Dahdi in my system and I do not have any trunks configured btw. All I do is install Asterisk from scratch and configure one SIP device. I use an Aastra 55i and Xlite for SIP testing. It does not seem to matter what the SIP client is.
Asterisk is installed with all defaults + ulaw+ gsm sounds. I also install Asterisk Addons with all defaults.
By: Leif Madsen (lmadsen) 2009-11-13 08:53:15.000-0600
Well that is certainly interesting.
Your last note makes it perfectly clear how to reproduce. I will see if I can reproduce on a virtual machine.
By: Leif Madsen (lmadsen) 2009-11-18 11:19:56.000-0600
Last note deleted due to being in the wrong bug!
By: Leif Madsen (lmadsen) 2009-12-10 15:05:07.000-0600
I want to try and reproduce this now, and want to make sure -- I should be seeing this if I use something like Echo() or Milliwatt() right?
You said SIP to System call, but I'm not sure exactly what you mean by that.
By: mustardman (mustardman) 2009-12-10 15:16:09.000-0600
What I mean is that you do not need to call another extension or anything like that. You simply do an echo test with one extension.
You will not see anything until after the Asterisk and safe_asterisk processes have been idling for serveral minutes. The longer they are idling the longer the CPU spike.
By: Leif Madsen (lmadsen) 2009-12-10 15:18:33.000-0600
Awesome, that's what I thought too. I'm going to try and reproduce this now on 126.96.36.199, and then if successful, I'll try and reproduce on the latest 1.6.1 branch. Thanks!
By: Leif Madsen (lmadsen) 2009-12-10 17:37:03.000-0600
Hmmm, so I have CentOS 5.3, 64-bit, running 188.8.131.52, and a Zoiper softphone attached. I didn't have anything going on for 41 minutes, and the first thing that I did was place a call to Echo(), and didn't see any spike at all.
I was running top in another window and watched it while I placed the call, and there was no spike.
Unfortunately I'm not sure what else I can do to reproduce this at the moment. Any suggestions?
[root@asttest1 asterisk]# uname -a
Linux asttest1 2.6.18-128.4.1.el5 #1 SMP Tue Aug 4 20:19:25 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
The one thing I noticed that you did differently was to run safe_asterisk instead of what I did of running Asterisk in the foreground (asterisk -cvvvvg).
I just noticed that you say you don't have DAHDI installed at all. I just had DAHDI dummy module loaded while doing this test, so I'm going to unload that module, and then restart Asterisk with 'service asterisk start' so it uses the safe_asterisk script.
In another 30 minutes I'll try placing a call to Echo() again and see if I get the spike inside of top.
By: Leif Madsen (lmadsen) 2009-12-10 17:37:38.000-0600
Oops, forgot to send the last note after I typed it up (dinner as called :))
I just tried again after leaving it up for 1.5 hrs, and I still can't seem to reproduce this issue unfortunately.
By: mustardman (mustardman) 2009-12-10 17:41:49.000-0600
I am running FreePBX which you guys immediately refuse to recognized (for the purposes of bug reports) so I just don't bother mentioning that.
Not running safe_asterisk is a pretty big difference.
By: Leif Madsen (lmadsen) 2009-12-10 17:47:43.000-0600
In my last test I was running safe_asterisk as I used the init.d script, which utilizes safe_asterisk.
By: mustardman (mustardman) 2009-12-16 11:55:23.000-0600
I am running this on an OpenVZ VPS with FreePBX. Neither of those things should make a difference though.
I have 3 identical VPS installs. The ONLY difference is the version of Asterisk they are running. The one running 1.6.0.x and the one running 1.6.2.x are fine. It's only the one running 1.6.1.x that seems to have this problem. I have installed this one over and over again trying different versions of 1.6.1.x. They all seem to do it in the 1.6.1 version.
No Zaptel/Dahdi. Ulaw sounds and addon sounds selected in addition to the default GSM sounds.
By: Leif Madsen (lmadsen) 2009-12-17 09:17:59.000-0600
Someone else will need to reproduce or follow up on this, as my test setup does not exhibit this problem, which means there is something unique in your setup causing this.
By: Leif Madsen (lmadsen) 2009-12-22 15:54:08.000-0600
Would it be possible for a developer to gain access to the box that is exhibiting this issue?
By: mustardman (mustardman) 2009-12-22 16:05:52.000-0600
Yes I can probably arrange something. I am running 1.6.0.x right now. I'll see if I can load up 1.6.1.x
By: mustardman (mustardman) 2010-01-17 13:29:15.000-0600
Little bit of an update. I came across something that may (may?) explain what is going on. Perhaps someone who understands the inner workings of Asterisk can confirm if this could be a problem.
I am running this on an OpenVZ VPS. OpenVZ is not compatible with udev which I believe Asterisk uses in some way. There are some work arounds for this. I was disabling udev and adding these lines on boot up instead:
Searching around a more complete work around which I was NOT doing is to also add:
/bin/rm -rf /dev/null
/bin/rm -rf /dev/random
/bin/rm -rf /dev/tty*
/bin/rm -rf /dev/pty*
/bin/mknod /dev/null c 1 3
/bin/mknod /dev/random c 1 8
So I am wondering if Asterisk was trying to use /dev/null and/or /dev/random and was not able to which could cause the CPU spike?
By: Paul Belanger (pabelanger) 2010-06-02 13:10:52
Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions maintenance (bug) support for the 1.6.0 and 1.6.1 branches has ended. For continued maintenance support please move to the 1.6.2 branch.
More information on this change can be found in the release announcement: http://www.asterisk.org/node/49924
By: Leif Madsen (lmadsen) 2010-06-08 10:40:19
I'm closing this issue due to it's age, and that 1.6.0 and 1.6.1 are no longer supported via maintenance releases. If you are able to reproduce this issue on the latest 1.6.2, then please open a new issue along with debug information. Thanks!