Summary: ASTERISK-17163: Asterisk consumes 100% of CPU on Mac OS X
Reporter: Steven Sokol (ssokol)
Labels:
Date Opened: 2010-12-26 11:12:43.000-0600
Date Closed:
Versions:
Frequency of
is related to: ASTERISK-20750 res_timing_kqueue makes Asterisk use 100% CPU
Environment:
Attachments: (0) asterisk_sample.txt
Description: I've run Asterisk 1.8.0 and 1.8.1 on several Macs (all running Mac OS X 10.6.5) and in every case it consumes 100.x% of available CPU. (Note that in all cases the system has at least 2 cores, so a figure above 100% is possible.) I've run a profile (what the Mac calls a "sample") and it shows a lot of this:

   2169 Thread_29623
     2169 thread_start
       2169 _pthread_start
         2169 dummy_start
           2169 do_devstate_changes
             2169 _pthread_cond_wait
               2169 __semwait_signal
   2169 Thread_29624
     2169 thread_start
       2169 _pthread_start
         2169 dummy_start
           2169 tps_processing_function
             2169 _pthread_cond_wait
               2169 __semwait_signal
   2169 Thread_29625
     2169 thread_start
       2169 _pthread_start
         2169 dummy_start
           2169 do_parking_thread
             2169 ast_internal_poll
               2169 select$DARWIN_EXTSN

I'll attach the complete sample file.  Please let me know what else you need to help figure this out.
Comments:

By: John Todd (jtodd) 2010-12-26 12:43:48.000-0600

I'm seeing the same issue on my system running TRUNK.

By: Tilghman Lesher (tilghman) 2010-12-26 16:47:31.000-0600

The culprit appears to be chan_iax2, specifically that the call to ast_cond_wait() is returning prematurely.

By: Steven Sokol (ssokol) 2010-12-26 21:32:38.000-0600

I tried loading Asterisk without chan_iax2.so (noload => chan_iax2.so) and I still see the CPU at 100.x%.  I see a very high number of calls to __semwait_signal and a slightly lower number of calls to _pthread_cond_wait.

By: Tilghman Lesher (tilghman) 2010-12-27 13:49:37.000-0600

Right, but as the callstack indicates, pthread_cond_wait is implemented internally with __semwait_signal.  The use of the GUI sampler in the OS X Dev Tools indicated that the chan_iax2 threads took up 26% of the total.  This bears more investigation, still.
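
For context on a condition wait "returning prematurely": POSIX allows pthread_cond_wait() to wake up spuriously, so callers normally re-check a predicate in a loop before proceeding. The following is a minimal sketch of that guard only; it is not the chan_iax2 code, and the {{work_flag}} struct and its functions are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for illustration; not taken from chan_iax2. */
struct work_flag {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool ready;            /* predicate protected by lock */
    unsigned long wakeups; /* how many times the wait returned */
};

void work_flag_init(struct work_flag *w)
{
    assert(w != NULL);
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->ready = false;
    w->wakeups = 0;
}

/* Block until ready is set, tolerating early or spurious wakeups. */
void work_flag_wait(struct work_flag *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->ready) {
        /* A return from the wait does not prove the predicate holds,
         * so the loop re-checks it every time. */
        pthread_cond_wait(&w->cond, &w->lock);
        w->wakeups++;
    }
    w->ready = false; /* consume the event */
    pthread_mutex_unlock(&w->lock);
}

/* Set the predicate and wake one waiter. */
void work_flag_post(struct work_flag *w)
{
    pthread_mutex_lock(&w->lock);
    w->ready = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}
```

With this pattern an early return costs one extra loop iteration. If the wait returned immediately every time, the loop itself would spin, and a counter such as {{wakeups}} is one way to spot that while profiling.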

By: Steven Sokol (ssokol) 2010-12-28 12:49:39.000-0600

The culprit seems to be pbx_spool.so.  I started Asterisk without any modules and loaded in key suspects one at a time until the CPU load jumped to 100%.  Loading pbx_spool.so did it.  

So to confirm that it is the only issue, I restarted with a noload statement for spool but everything else as normal.  The CPU load was a constant 0.6%.  I tried again with both pbx_spool.so and chan_iax2.so noloaded and the CPU load dropped to 0% at idle.

By: John Papandriopoulos (jpap) 2011-06-28 21:21:40.414-0500

I can confirm this bug in the {{pbx_spool.so}} module on Asterisk running under OSX 10.6.8.

The 100% CPU occurs because of a tight polling loop on the Asterisk call file spool directory.

A simple, if inelegant, workaround is to add the following preprocessor directives near the top of the {{pbx/pbx_spool.c}} source file:

{noformat}
/* Force filesystem polling on OSX */
#ifdef __APPLE__
{noformat}

This kicks in the second {{static void *scan_thread(void *unused)}} function that does a poll spaced at one-second intervals instead of the tight loop.  The CPU utilization issue is then resolved.  

I also updated the {{struct timespec ts}} variable in the above function to be 10 seconds instead of 1 second, resulting in negligible ("zero") CPU utilization.
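
To illustrate the interval-based polling described above, here is a minimal sketch of a directory scan that sleeps between passes rather than looping continuously. It is not the {{scan_thread}} implementation from {{pbx_spool.c}}; the function names {{count_spool_entries}} and {{poll_spool}} and their parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <time.h>

/* Count entries in dir, skipping "." and "..". Returns -1 if the
 * directory cannot be opened. Hypothetical helper for illustration. */
int count_spool_entries(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int n = 0;

    if (d == NULL) {
        return -1;
    }
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) {
            continue;
        }
        n++;
    }
    closedir(d);
    return n;
}

/* Scan dir `rounds` times, sleeping interval_sec seconds between scans.
 * Returns the entry count from the last scan. */
int poll_spool(const char *dir, int rounds, time_t interval_sec)
{
    struct timespec ts = { .tv_sec = interval_sec, .tv_nsec = 0 };
    int last = -1;

    assert(rounds > 0);
    for (int i = 0; i < rounds; i++) {
        last = count_spool_entries(dir);
        if (i + 1 < rounds) {
            /* The sleep keeps the thread off the CPU between scans. */
            nanosleep(&ts, NULL);
        }
    }
    return last;
}
```

A longer interval trades latency in noticing new call files for lower idle CPU use, which matches the difference reported between the 1-second and 10-second settings.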

An elegant fix would be to support the {{/dev/fsevents}} API on OSX, which is analogous to inotify on Linux.

By: Charles Day (cedayiv) 2012-09-30 18:14:21.797-0500

I can also confirm this bug in the {{pbx_spool.so}} module of Asterisk 10.8.0 running under OSX 10.8.2 (Mountain Lion).

The 100% CPU occurs when {{pbx_spool.so}} is loaded by modules.conf. Without {{pbx_spool.so}}, CPU usage is only about 0.9%.

However, I tried adding the code suggested by John Papandriopoulos to {{pbx/pbx_spool.c}} and rebuilding (make clean, make, sudo make install). It didn't seem to help, as CPU usage was still 100%.