Summary: ASTERISK-07854: PRI Channels become unavailable if too many call files are queued
Reporter: Andre Courchesne (acourchesne)
Labels:
Date Opened: 2006-10-02 09:43:05
Date Closed: 2011-06-07 14:02:45
Versions:
Frequency of
Environment:
Attachments:
( 0) Appel_double.txt
( 1) channel_info.txt
( 2) full_log.txt
( 3) pri_debug.txt
Description: Using: asterisk
      libpri 1.2.3

Using call files, we generate medium to heavy dialing load on 63 channels (3 PRIs configured as 23B+1D). When a call is established, it is sent to a context where an IVR takes over.

We are seeing a problem where channels become unavailable one after the other until no channel is available to dial on. When this happens, even a manual dial (using an IAX softphone) fails with congestion. Stopping and restarting asterisk "clears" all channels.


full_log.txt: Shows a call on channel 22 that got through (lines 1-95), then some PRI warnings or errors, and then an attempt to initiate a call on channel 22 that failed.
Comments:By: Andre Courchesne (acourchesne) 2006-10-02 12:15:08

Thanks for the update, but I would like to note that I am not experiencing a segfault, nor am I using IAX channels (except for debugging purposes).

By: Andre Courchesne (acourchesne) 2006-10-02 12:42:51

Just uploaded 2 info files, as I was able to get the system to fail. I took channel 60 as an example.

pri_debug.txt shows the pri debug output when trying to make a call from an IAX phone after the problem has occurred.

channel_info.txt is the output of zap show channel(s) for all the channels and for channel 60.

By: Sean (c0w) 2006-10-03 08:18:54

< Cause (len= 4) [ Ext: 1  Coding: CCITT (ITU) standard (0) 0: 0   Location: User (0)
<                  Ext: 1  Cause: Invalid number format (28), class = Normal Event (1) ]
-- Processing IE 8 (cs0, Cause)

I have seen this error message before. You fix this by setting pridialplan=unknown in asterisk/zapata.conf


By: Andre Courchesne (acourchesne) 2006-10-04 07:35:22

Ok, the problem seems to be caused by placing too many call files in /var/spool/asterisk/outgoing

If I put a 500 ms to 750 ms delay between call file generations and check that the number of files in /var/spool/asterisk/outgoing does not exceed the total number of channels, everything seems fine.
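
A minimal sketch of that kind of throttle, in Python for illustration only (the dialer code was never posted in this thread, so this is not the original implementation). The helper names count_queued and queue_call_file, the staging directory, and the injectable sleep parameter are all hypothetical. The call file is written in a staging directory and then renamed into the spool, which assumes both directories are on the same filesystem.

```python
import os
import tempfile
import time


def count_queued(spool_dir):
    """Count the call files currently waiting in the spool directory."""
    return sum(1 for entry in os.scandir(spool_dir) if entry.is_file())


def queue_call_file(body, spool_dir, staging_dir, max_queued,
                    delay=0.75, sleep=time.sleep):
    """Wait until the spool holds fewer than max_queued files, then add one.

    staging_dir is a hypothetical scratch directory; it must be on the same
    filesystem as spool_dir for os.rename to work.
    """
    # Hold back while the spool already holds as many files as we allow.
    while count_queued(spool_dir) >= max_queued:
        sleep(delay)

    # Write the file outside the spool, then move it in, so that a
    # half-written call file is never visible in the spool directory.
    fd, tmp_path = tempfile.mkstemp(suffix=".call", dir=staging_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    final_path = os.path.join(spool_dir, os.path.basename(tmp_path))
    os.rename(tmp_path, final_path)

    # Pacing delay between consecutive call files (500-750 ms in the report).
    sleep(delay)
    return final_path
```

A caller would pass the total channel count as max_queued (63 in the original setup) and a body such as "Channel: Zap/g0/5551234" followed by Context, Extension and Priority lines; the number here is made up.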

Went through 19700 dials without a glitch with this delay.

Is any of this documented?

By: Andre Courchesne (acourchesne) 2006-10-05 10:49:55

Well, I had failures again after around 36000 dials. The delay was 900 ms using usleep... The most stable I got so far was with a sleep of 1 second.
I am restarting our dialer now with 1 second delay.

By: Andre Courchesne (acourchesne) 2006-10-05 23:16:18

Possibly the same as http://bugs.digium.com/view.php?id=7870

By: Serge Vecher (serge-v) 2006-10-06 09:40:36

acourchesne: Looks like you've identified the problem pretty well. AFAIK, Asterisk will automatically "generate a call" for every call file placed in the outgoing dir. So if you queue too many, "bad things" will happen. Now, how to remedy the situation is not exactly clear... Should the "excess" call files be "queued" until they are "ready" to be dialed? Are there mechanisms in Asterisk to allow that ...? I think this bug needs some feedback from the development community at large. Can you please email the asterisk-dev mailing list to solicit some feedback on this?

By: Andre Courchesne (acourchesne) 2006-10-09 20:18:37

Well, posted on asterisk-dev on Oct 06 http://lists.digium.com/pipermail/asterisk-dev/2006-October/023799.html and no feedback yet.

For my 2 cents, I would prefer that the files be queued until asterisk is ready or able to dial them.

By: Andre Courchesne (acourchesne) 2006-10-10 15:07:02

New development...
In a more controlled system we were able to reproduce what we believe is happening. If a call is generated on, let's say, Zap/47, there is a 13-second delay in PRI D-channel communication.

During that time our code was looking at the output of the API "action: status" for the Zap channel status. When our code "saw" that Zap/47 was not in use, we issued another call on Zap/47 and got the "Unable to request..." message.

Now, is it possible that the API "action: status" does not report the line as being in use as soon as a Dial command is issued to it?

I have attached the file Appel_double.txt. Look at the following lines:
Line 1: First call on ZAP/47
Line 103: Second call on ZAP/47 (at that time action status was reporting ZAP/47 as unused)
Line 412: Zap/47 of first call was answered
Line 795: First call on Zap/47 completed.

By: jmls (jmls) 2006-11-12 12:19:51.000-0600

ping. housekeeping - did you get any feedback ?

By: Andre Courchesne (acourchesne) 2006-11-29 14:34:03.000-0600

No feedback.

We modified our code to access lines directly (Zap/x) instead of using the line group (Zap/gx). We keep an SQL table of the last time each channel was hung up and do not re-use that channel for a 2-second period. This seems to work... for now...
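
The idea of resting a channel after hangup can be sketched roughly as follows. This is an illustration in Python, not the code described above: an in-memory dict stands in for the SQL table, and the class name ChannelCooldown, its methods, and the injectable clock are hypothetical.

```python
import time


class ChannelCooldown:
    """Track when each channel was last hung up and enforce a rest period.

    Assumption: a dict replaces the SQL table mentioned in the comment.
    """

    def __init__(self, channels, cooldown=2.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        # None means the channel has not been hung up yet.
        self.last_hangup = {channel: None for channel in channels}
        self.in_use = set()

    def acquire(self):
        """Return a channel that is idle and rested, or None if none is."""
        now = self.clock()
        for channel, hung_up in self.last_hangup.items():
            if channel in self.in_use:
                continue
            if hung_up is None or now - hung_up >= self.cooldown:
                self.in_use.add(channel)
                return channel
        return None

    def release(self, channel):
        """Record the hangup time; the channel rests for `cooldown` seconds."""
        self.in_use.discard(channel)
        self.last_hangup[channel] = self.clock()
```

The dialer would call acquire() before writing a call file for a specific Zap/x channel and release() when that call hangs up.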

By: Alex Richardson (alexrch) 2006-12-04 08:13:22.000-0600

@acourchesne: I'm facing a similar problem. Could you post your changes of the source code? Or would you rather send it to me over e-mail? (alexrixhardson@nospam.yahoo.com - delete nospam.)

Thank you

By: Serge Vecher (serge-v) 2007-02-28 13:39:18.000-0600

acourchesne: do you want to post those modifications in this bugnote?

By: Andre Courchesne (acourchesne) 2007-03-13 07:17:31

The modifications are not within Asterisk code but rather in our specific application. Here is how I do it:

Call files are generated and specific channels are used (i.e. Zap/15 and not Zap/g0)

- Before generating a call file, I check in an SQL table for an available line.
- If a line is free, I flag it in the same table as used and I generate a call file using that line.
- The dialer dialplan will, using an AGI script, reset the line flag in the SQL table when the call is completed.

The tricky part is covering every exit point of the dialer dialplan so that the line flag is always reset when the line is hung up.
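
A rough sketch of the flag-based bookkeeping described in the steps above, using Python's sqlite3 purely for illustration. The real table layout and AGI script were never posted, so the table name lines, its columns, the function names, and the dialer context are all assumptions.

```python
import sqlite3


def init_lines(conn, channels):
    """Create the (hypothetical) lines table with one row per channel."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines ("
        "channel TEXT PRIMARY KEY, in_use INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO lines (channel) VALUES (?)",
        [(channel,) for channel in channels],
    )
    conn.commit()


def reserve_line(conn):
    """Flag one free line as used and return it, or None if all are busy."""
    with conn:  # commits on success, rolls back on an exception
        row = conn.execute(
            "SELECT channel FROM lines WHERE in_use = 0 "
            "ORDER BY channel LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The in_use = 0 guard keeps a second dialer from taking the same row.
        cur = conn.execute(
            "UPDATE lines SET in_use = 1 WHERE channel = ? AND in_use = 0",
            (row[0],),
        )
        return row[0] if cur.rowcount == 1 else None


def release_line(conn, channel):
    """Reset the flag; the AGI script would do this when the call ends."""
    with conn:
        conn.execute("UPDATE lines SET in_use = 0 WHERE channel = ?", (channel,))


def build_call_file(channel, number, context="dialer"):
    """Build call file text that dials on one specific channel."""
    return (
        f"Channel: {channel}/{number}\n"
        f"Context: {context}\n"
        "Extension: s\n"
        "Priority: 1\n"
    )
```

Keeping the reservation in the dialer's own table means the decision does not depend on how quickly "action: status" reflects a newly dialed channel.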

Using that technique, I was able to keep 4 PRI (82 lines for dial + 10 lines for transfers) busy using a Dual Xeon Core Duo Intel motherboard based server.

The CPU load calculation leads me to believe that between 8 and 10 PRIs could be serviced by this code on this hardware platform.

By: Serge Vecher (serge-v) 2007-03-13 08:53:29

ok, if you are interested, you could document this solution and together with your agi scripts submit it for testing. If it works well, there is a place in ./contrib directory for these kinds of solutions.

By: Michiel van Baak (mvanbaak) 2007-09-07 09:22:53

Looks like acourchesne does not want to share the agi (which is fine)
If you still want to document and/or submit the agi you can reopen this bug.