[Home]

Summary:ASTERISK-16135: CPU usage very high when putting call on MOH
Reporter:Miguel Molina (coolmig)Labels:
Date Opened:2010-05-24 11:26:00Date Closed:2010-08-31 14:46:51
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Resources/res_musiconhold
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) issue_17384_full_log
( 1) sip.conf
Description:Test scenario is very simple, two asterisk servers talking to each other SIP.

SIP phone dials a extension on the first server which makes a call to the other server that queues the call (and plays MOH). CPU usage is very little, as always.

As long as I enable MOH on the SIP client, the CPU usage skyrockets... as soon as I stop MOH on the SIP phone, everything goes back to normal.

This limits normal asterisk use with many channels involved.

****** ADDITIONAL INFORMATION ******

This happens the same in 1.6.2.7-rc1.
Comments:By: Paul Belanger (pabelanger) 2010-05-24 13:29:34

We need a debug log (see below), also post your sip.conf file.
---
We require a complete debug log to help triage the issue.

This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue:

http://svn.digium.com/svn/asterisk/trunk/doc/HOWTO_collect_debug_information.txt

By: Miguel Molina (coolmig) 2010-05-24 17:07:40

Ok, I followed your instructions. Please see the attached log, I started the call described in the simple test scenario, and put it into MOH (see line 1109 of the log). What I can see is that MOH files that are of minutes in length are played in a couple of seconds! Could this be a timing issue?

I am running this tests on two virtual machines in an OpenVZ environment that belong to the same physical host.

By: Paul Belanger (pabelanger) 2010-05-25 08:31:53

What format are you using for MOH?  I suspect you are transcoding your audio, which is causing your CPU to spike.

By: Miguel Molina (coolmig) 2010-05-25 08:52:23

There are MOH files in GSM and WAV format (native file playback). My calls are in GSM format, so asterisk must be choosing the right one and it shouldn't be transcoding.

By: Paul Belanger (pabelanger) 2010-05-25 09:01:39

According to your debug log, your format is actually alaw, not GSM.  So, it looks like Asterisk is transcoding our MOH.  Install the alaw MOH format.
---
Thanks for your comments. This does not appear to be a bug report and we are closing it. We appreciate the difficulties you are facing, but it would make more sense to raise your question in the support tracker, http://www.asterisk.org/support

By: Miguel Molina (coolmig) 2010-07-30 15:37:17

I'm reopening this because I repeated the test making sure there was no transcoding (pabelanger was right about the alaw format on the previous test). The results are the same making sure there's no transcoding. The debug file shows that the musiconhold files that normally have a length of several minutes are being played in a couple of seconds, like being played at a very high speed.

I think this is related to https://issues.asterisk.org/view.php?id=17726
because of the dependency of res_musiconhold of DAHDI, and the tests were made using res_timing_pthread.

By: Paul Belanger (pabelanger) 2010-07-30 16:05:48

Do you mind test ASTERISK-16441 and reporting back.

By: Sean Bright (seanbright) 2010-07-31 09:01:24

This isn't related to ASTERISK-16441.



By: Leif Madsen (lmadsen) 2010-07-31 09:49:53

Odd, because I just tested this same scenario the other day on my virtual machine and I was not able to reproduce this.

By: Leif Madsen (lmadsen) 2010-08-05 14:43:05

What does the reporter mean by, "As long as I enable MOH on the SIP client, the CPU usage skyrockets... as soon as I stop MOH on the SIP phone, everything goes back to normal."

Do you mean if the client hits the hold button? Please elaborate.

By: Leif Madsen (lmadsen) 2010-08-05 14:43:47

Also I wonder if a backtrace of this issue being reproduced would be useful as well. If the reporter can get a backtrace of the issue that would be great.

By: Leif Madsen (lmadsen) 2010-08-06 13:34:48

Do you get better response than res_timing_pthread? What are the specs of this server you're getting the issues on?

The reason I ask about the specs of the server is that res_timing_pthread is not terribly efficient, but if this is a powerful server you're running on we may want to look into this further to determine if anything can be done.

Trying another timing source may help with this if the server is not very powerful.

Thanks!

By: Miguel Molina (coolmig) 2010-08-06 18:25:50

"Do you mean if the client hits the hold button? Please elaborate. "

Yes, I mean when I hit the hold button on the SIP softphone, the issue appears.

I'm sorry for not answering early, I've been quite busy these days... I'll try to test when I get a chance.

One question, is there an easy way to change from res_timing_dahdi to res_timing_pthread without uninstalling DAHDI?

The server is not that powerful, is a HP ML150 with a Dual-Core Xeon 3.2GHz with 2GB RAM.

By: Leif Madsen (lmadsen) 2010-08-16 15:50:51

noload => res_timing_dahdi.so in modules.conf should force res_timing_pthread.so if compiled.

By: Leif Madsen (lmadsen) 2010-08-16 15:52:53

Wait, you said all along that you were using res_timing_pthread, not res_timing_dahdi -- then I asked if you got a better response if NOT using res_timing_pthread -- so I have no idea what timing module you're actually using here.

By: Miguel Molina (coolmig) 2010-08-17 14:28:50

The initial report tests were made with res_timing_pthread.so. After a while I managed to install DAHDI and use res_timing_dahdi.so on the virtual machines, but I didn't have a chance to test this particular scenario with res_timing_dahdi.so.

Thanks for the noload => res_timings_dahdi.so tip, I'll see if it works.

I'll do the tests and report back here.

By: Miguel Molina (coolmig) 2010-08-17 16:03:16

OK I tested with the latest version (1.6.2.11) and this looks solved... I think you can close this issue. I both tested with DAHDI and pthread timing modules without difference. I set up the same scenario having positive results, no CPU spikes and no weird MOH behavior.

As to the question "finally what was the issue?" I only can think about what I saw on the first report (I will paste only interesting debug lines):

[May 24 16:44:51] VERBOSE[13490] res_musiconhold.c:     -- Started music on hold, class 'default', on SIP/TRUNK_INTER_MAQUINAS-00000005
[May 24 16:44:51] DEBUG[13490] channel.c: Scheduling timer at (50 requested / 50 actual) timer ticks per second

Here is the weird part for me: the timing was requested but "generator got voice" so timing is scheduled 0/0 with MOH on:

[May 24 16:44:51] DEBUG[13490] channel.c: Generator got voice, switching to phase locked mode
[May 24 16:44:51] DEBUG[13490] channel.c: Scheduling timer at (0 requested / 0 actual) timer ticks per second
[May 24 16:44:52] DEBUG[13490] rtp.c: Got RTCP report of 64 bytes

That made MOH play at a very fast speed with the corresponding CPU spike:

[May 24 16:44:52] DEBUG[13490] res_musiconhold.c: SIP/TRUNK_INTER_MAQUINAS-00000005 Opened file 2 '/var/lib/asterisk/moh/fpm-calm-river'
[May 24 16:44:53] DEBUG[13490] res_musiconhold.c: SIP/TRUNK_INTER_MAQUINAS-00000005 Opened file 0 '/var/lib/asterisk/moh/fpm-sunshine'
[May 24 16:44:54] DEBUG[13490] res_musiconhold.c: SIP/TRUNK_INTER_MAQUINAS-00000005 Opened file 1 '/var/lib/asterisk/moh/fpm-world-mix'
[May 24 16:44:56] DEBUG[13490] res_musiconhold.c: SIP/TRUNK_INTER_MAQUINAS-00000005 Opened file 2 '/var/lib/asterisk/moh/fpm-calm-river'
[May 24 16:44:57] DEBUG[13490] res_musiconhold.c: SIP/TRUNK_INTER_MAQUINAS-00000005 Opened file 0 '/var/lib/asterisk/moh/fpm-sunshine'
[May 24 16:44:57] DEBUG[13490] rtp.c: Got RTCP report of 64 bytes
[May 24 16:44:58] DEBUG[13490] res_musiconhold.c: SIP/TRUNK_INTER_MAQUINAS-00000005 Opened file 1 '/var/lib/asterisk/moh/fpm-world-mix'
[May 24 16:44:59] DEBUG[13490] res_musiconhold.c: SIP/TRUNK_INTER_MAQUINAS-00000005 Opened file 2 '/var/lib/asterisk/moh/fpm-calm-river'
[May 24 16:45:00] DEBUG[13490] res_musiconhold.c: SIP/TRUNK_INTER_MAQUINAS-00000005 Opened file 0 '/var/lib/asterisk/moh/fpm-sunshine'



Now with 1.6.2.11, the same scenario plays well, with no "generator got voice" and no 0/0 timer scheduling while MOH on:

[Aug 17 16:00:16] DEBUG[3556]: channel.c:2488 ast_settimeout: Scheduling timer at (50 requested / 50 actual) timer ticks per second
[Aug 17 16:00:16] DEBUG[1813]: chan_sip.c:4091 __sip_ack: Stopping retransmission on 'f29a31c8-b112-45a7-93de-e68074695d00' of Response 23987: Match Found
   -- Started music on hold, class 'default', on SIP/TRUNK_INTER_MAQUINAS-00000003
[Aug 17 16:00:16] DEBUG[3556]: channel.c:2488 ast_settimeout: Scheduling timer at (50 requested / 50 actual) timer ticks per second
virtual1_ast1*CLI> [Aug 17 16:00:16] DEBUG[3556]: res_musiconhold.c:307 ast_moh_files_next: SIP/TRUNK_INTER_MAQUINAS-00000003 Opened file 0 '/var/lib/asterisk/moh/fpm-sunshine'
[Aug 17 16:00:16] DEBUG[3556]: res_musiconhold.c:307 ast_moh_files_next: SIP/TRUNK_INTER_MAQUINAS-00000003 Opened file 0 '/var/lib/asterisk/moh/fpm-sunshine

File plays well with almost 0% CPU consumption and no CPU spike. Timing only goes to 0/0 when I hit the hold button again to disable MOH:

   -- Stopped music on hold on SIP/TRUNK_INTER_MAQUINAS-00000003
virtual1_ast1*CLI> [Aug 17 16:02:19] DEBUG[3556]: channel.c:2488 ast_settimeout: Scheduling timer at (0 requested / 0 actual) timer ticks per second
[Aug 17 16:02:19] DEBUG[3556]: channel.c:2488 ast_settimeout: Scheduling timer at (0 requested / 0 actual) timer ticks per second


That was the cause IMHO.

By: Leif Madsen (lmadsen) 2010-08-31 14:46:50

Closing this issue per the reporter. If additional information comes up in the future to nail down a bug that needs to be fixed, please open a new issue and we can relate it to this one.

Thanks!