[Home]

Summary:ASTERISK-05793: Limit on number of simultaneous app_ices instances
Reporter:Daniel Mikusa (dmikusa)Labels:
Date Opened:2005-12-07 09:12:29.000-0600Date Closed:2011-06-07 14:10:07
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Applications/app_ices
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20060214__close_ALL_fds.diff.txt
( 1) bt.txt
( 2) bt-2006-02-15.txt
( 3) bt-new.txt
Description:As far as I can tell there is a limit of running 24 simultaneous app_ices applications.  I do not think there should be any limit.  The general process to reproduce this error is as follows.  

1.)Setup an extension in the dial plan as such.

exten => 400,1,SetGroup('x')
exten => 400,2,GetGroupCount()
exten => 400,3,Ices(${GROUPCOUNT}-ices.xml)

2.)You will need to generate 25 ICES configuration files as below, replacing the values in parenthesis with values for your setup.

<?xml version="1.0"?>
<ices>
   <background>1</background>
   <logpath>/var/log/asterisk/ices</logpath>
   <logfile>(number).log</logfile>
   <loglevel>4</loglevel>
   <consolelog>0</consolelog>
   <pidfile>/var/log/asterisk/ices/(number).pid</pidfile>

   <stream>
       <metadata>
           <name>(number)</name>
           <genre>Test</genre>
           <description>Test</description>
           <url>http://www.asterisk.org</url>
       </metadata>

       <input>
           <module>stdinpcm</module>
           <param name="rate">8000</param>
           <param name="channels">1</param>
       </input>

       <instance>
           <hostname>(your host)</hostname>
           <port>(your port)</port>
           <password>(your password)</password>
           <mount>/(number)</mount>
           <yp>0</yp>

           <encode>
               <quality>0</quality>
               <samplerate>8000</samplerate>
               <channels>1</channels>
           </encode>

           <downmix>0</downmix>
           <savefile>/tmp/(number).ogg</savefile>
       </instance>

   </stream>
</ices>

3.)Begin by dialing the extension 24 times.  You will notice that there are now 24 instances of app_ices running.  As you bring up the 25 instance, one of the first 24 instances will stop processing audio, and timeout with the remote server.


****** ADDITIONAL INFORMATION ******

Here's what I have found out through my testing.

1.)Outside of Asterisk I can run as many instances of the ICES application as I want.  In my testing I created and ran 35 instances for about 20 minutes.
2.)As I examine the logs on the remote Icecast server, Icecast reports that the ICES process has not sent it any data in the last 45 seconds, so it drops the connection due to a timeout.  ( I have my source timeout set to 45 seconds, which is twice the default ).
3.)I have increased the system open file limit ( ulimit -n ) to a very large number.
4.)The machine I am testing on has plenty of resources.  Processing 24 app_ices instances consumes about 20% of the CPU.  The machine has 1GB Ram and is no where near using it all.
5.) I have had this problem under Asterisk 1.08, 1.09, 1.20.  I haven't not tried it with 1.21 as it just came out this morning, but will be trying it soon.
6.) I have run three different Linux distro's, Suse 9.3, CentOS 4.2, and Fedora Core 4.
Comments:By: Daniel Mikusa (dmikusa) 2005-12-07 11:41:31.000-0600

I've tested version 1.21 of Asterisk with this bug and it still exists.  More importantly though I've found what appears to be the problem.

When the 25 instance of the app_ices application is run, one of the original 24 instances ( doesn't seem to always be the same instance, seems random ) stops receiving frames.

I was able to use 'show channel xxxx' on a channel that was still working, and 'show channel xxxx' on the channel that dropped its connection to the remote Icecast server.  The difference was the number of Frames into the channel.  On the working channel the frames in counter was incrementing very rapidly.  On the channel which lost its connection the frames in counter has stopped.

By: Olle Johansson (oej) 2006-01-04 04:55:55.000-0600

Can you attach debug output?

By: Daniel Mikusa (dmikusa) 2006-01-04 07:58:54.000-0600

I'd be glad to do this, but I do not know how.  Can you explain or point me to some docs?

By: Daniel Mikusa (dmikusa) 2006-01-10 09:15:57.000-0600

Ok, I tried some of the stuff from the voip-info.org wiki about debugging.  Since * is not crashing there is no core file.  So I tried to to connect to the thread with gdb to run a backtrace.  Here's the output although it doesn't seem useful at all.

#0  0xffffe410 in ?? ()
#1  0xbffff198 in ?? ()
#2  0xffffffff in ?? ()
#3  0x00000000 in ?? ()
#4  0x40193954 in poll () from /lib/tls/libc.so.6
ASTERISK-1  0x080c2a7e in main (argc=1, argv=0xbffff614) at asterisk.c:2400

That being said, here's where I notice that there is a problem with *.  First here is the output of the show channel command for a working channel running app_ices.

The First Reading...

show channel IAX2/cartman-2
-- General --
          Name: IAX2/cartman-2
          Type: IAX2
      UniqueID: 1136907798.1
     Caller ID: (N/A)
Caller ID Name: (N/A)
   DNID Digits: (N/A)
         State: Up (6)
         Rings: 0
  NativeFormat: 4
   WriteFormat: 4
    ReadFormat: 64
1st File Descriptor: -1
     Frames in: 25957
    Frames out: 0
Time to Hangup: 0
  Elapsed Time: 0h8m39s
 Direct Bridge: <none>
Indirect Bridge: <none>
--   PBX   --
       Context: internal
     Extension: 410400021
      Priority: 2
    Call Group: 0
  Pickup Group: 0
   Application: ICES
          Data: contrib/2-ices.xml
   Blocking in: ast_waitfor_nandfds
     Variables:
GROUP=feed

Second Reading ---

show channel IAX2/cartman-2
-- General --
          Name: IAX2/cartman-2
          Type: IAX2
      UniqueID: 1136907798.1
     Caller ID: (N/A)
Caller ID Name: (N/A)
   DNID Digits: (N/A)
         State: Up (6)
         Rings: 0
  NativeFormat: 4
   WriteFormat: 4
    ReadFormat: 64
1st File Descriptor: -1
     Frames in: 27357
    Frames out: 0
Time to Hangup: 0
  Elapsed Time: 0h9m7s
 Direct Bridge: <none>
Indirect Bridge: <none>
--   PBX   --
       Context: internal
     Extension: 410400021
      Priority: 2
    Call Group: 0
  Pickup Group: 0
   Application: ICES
          Data: contrib/2-ices.xml
   Blocking in: ast_waitfor_nandfds
     Variables:
GROUP=feed

Over a 28 seconds period * send 1400 frames to the channel.  Now here's the output from the show channel command on the broken channel running app_ices.

The First Reading...

show channel IAX2/cartman-16
-- General --
          Name: IAX2/cartman-16
          Type: IAX2
      UniqueID: 1136907807.15
     Caller ID: (N/A)
Caller ID Name: (N/A)
   DNID Digits: (N/A)
         State: Up (6)
         Rings: 0
  NativeFormat: 4
   WriteFormat: 4
    ReadFormat: 64
1st File Descriptor: -1
     Frames in: 4783
    Frames out: 0
Time to Hangup: 0
  Elapsed Time: 0h6m40s
 Direct Bridge: <none>
Indirect Bridge: <none>
--   PBX   --
       Context: internal
     Extension: 410400021
      Priority: 2
    Call Group: 0
  Pickup Group: 0
   Application: ICES
          Data: contrib/16-ices.xml
   Blocking in: ast_waitfor_nandfds
     Variables:
GROUP=feed

Second Reading...

show channel IAX2/cartman-16
-- General --
          Name: IAX2/cartman-16
          Type: IAX2
      UniqueID: 1136907807.15
     Caller ID: (N/A)
Caller ID Name: (N/A)
   DNID Digits: (N/A)
         State: Up (6)
         Rings: 0
  NativeFormat: 4
   WriteFormat: 4
    ReadFormat: 64
1st File Descriptor: -1
     Frames in: 4805
    Frames out: 0
Time to Hangup: 0
  Elapsed Time: 0h7m27s
 Direct Bridge: <none>
Indirect Bridge: <none>
--   PBX   --
       Context: internal
     Extension: 410400021
      Priority: 2
    Call Group: 0
  Pickup Group: 0
   Application: ICES
          Data: contrib/16-ices.xml
   Blocking in: ast_waitfor_nandfds
     Variables:
GROUP=feed

Over a 47 second period * only sent 22 frames to this channel.  This does not seem correct.  Also note that this only started happening after I connected the 25th channel.  Before that * was sending frames to this channel just fine.

I'd be happy to run some more tests if need be.  I have a machine setup in the office running the trunk version of * from the svn repository.  Just let me know what I can do to help.

By: Tilghman Lesher (tilghman) 2006-01-21 13:46:00.000-0600

Was this backtrace from a build of Asterisk made with 'make dontoptimize' ?  Note that you have to 'make clean' first.

By: Daniel Mikusa (dmikusa) 2006-01-23 09:01:31.000-0600

My latest tests were run on Asterisk SVN-trunk-r7833M.  It was compiled with the standard 'make' and 'make install'.  No special options.

By: Tilghman Lesher (tilghman) 2006-01-23 09:45:28.000-0600

Good.  For a backtrace to be helpful, you need to compile with 'make dontoptimize'.  Upload, don't post, the resulting backtrace to this bug, in plaintext, not zipped.

By: Daniel Mikusa (dmikusa) 2006-01-23 12:18:21.000-0600

Ok, this looks better.

I ran 'svn update', 'make clean', and 'make dont-optimize'.

Setup the test, and after Asterisk stopped sending frames to the channel, I ran 'gdb /usr/sbin/asterisk pid', and lastly 'bt'.

I uploaded the results.



By: Tilghman Lesher (tilghman) 2006-01-23 12:34:33.000-0600

Okay, now upload the output of:

(gdb) thread apply all bt full

By: Daniel Mikusa (dmikusa) 2006-01-23 13:17:02.000-0600

I've uploaded that.  It's under bt-new.txt.  For future reference, is there a way to easily pipe that output to a file?

By: Kevin P. Fleming (kpfleming) 2006-02-14 16:50:14.000-0600

Do you want to continue chasing this one?

By: Tilghman Lesher (tilghman) 2006-02-14 18:33:59.000-0600

This patch might work.  I suspect the problem is that not all fds are getting properly closed after a fork() and an open File Descriptor is causing other threads to hang.

Note that this patch applies to trunk and will only have an effect on Linux.

By: Daniel Mikusa (dmikusa) 2006-02-15 12:20:21.000-0600

I applied the update, re-compiled, and re-tested.  Still having the same problem.  I've uploaded another backtrace.  Also as I was testing this time I noticed that the ICES processes didn't exit after I hungup the channels in Asterisk.

I'm not sure if this information is helpful or not, but I had originally believed that the problem was related to the use of fork() and pipes.  My solution to this problem was to create an application module for Asterisk that encoded the audio inline and did not use fork() and pipes.  While the application I created worked just fine, it also suffered from this 24 instance maximum.  I can post the source code for that if it will be helpful.

By: Olle Johansson (oej) 2006-03-08 14:22:17.000-0600

Anyone still interested in this?

By: Clod Patry (junky) 2006-04-11 22:43:13

Re-Open if there's any interest.