Summary: ASTERISK-00131: PRI channels die under load
Reporter: casey0999 (casey0999)
Labels:
Date Opened: 2003-08-20 10:31:15
Date Closed: 2004-09-25 02:46:18
Versions:
Frequency of
Environment:
Attachments: (0) 1.txt
(1) 2.txt
(2) 3.txt
(3) 4.txt
(4) AstLog5.txt.gz
Description: Running the latest (Aug 19) CVS. B channels on an E1 die under a call load greater than 10 simultaneous calls on a span, while running a simple IVR application (answer, playback, hangup). Various errors are logged (see attached files). Dead channels do not restart properly unless Asterisk is restarted.


File 1 shows the first error indication: is a rejected frame being retransmitted on the wrong frame? (see the Warning message)
Comments:
By: casey0999 (casey0999) 2003-08-20 10:34:23

File 2 shows another error, logged in "intense" mode. I tried to capture the first error that occurs under load, because the errors seem to be cumulative.

By: casey0999 (casey0999) 2003-08-20 10:36:15

File 3 shows strange "hangup on channel -1" error.

By: casey0999 (casey0999) 2003-08-20 14:42:12

Complete debug file sent to Martin as requested (5 minutes of incoming calls under heavy load). It shows a number of warnings, and two channels are apparently locked up after this period.

Note the many occurrences of frame errors on a particular frame #, followed by "re-transmitting frame" messages for frame# + 1, frame# + 2, etc. Is this a bug?

By: x martinp (martinp) 2003-08-20 17:54:25

I still haven't received the log from you. Can you attach it to the bug tracker or send it once again to martinp@digium.com?

By: casey0999 (casey0999) 2003-08-20 18:01:27

I have added it here. Just gunzip it!
I've also emailed it, thanks.

By: x martinp (martinp) 2003-08-21 15:20:54

How do you know that the two channels are locked? You shouldn't look at the callerid, extension and context when you do "zap show channel <ch_no>". If the PRI flag is set to Call then it's locked. Also, "zap show channels" doesn't work correctly.

By: x martinp (martinp) 2003-08-21 15:22:23

As for the frames being rejected: the Q.921 protocol is like TCP in that it makes sure the message it's carrying gets to the destination. So it looks like the other side rejects our frames (they might be busy ...), but we retransmit them, and that shouldn't be regarded as an error.
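
To illustrate the retransmission point, here is a minimal sketch, assuming the standard Q.921 reject behaviour (the peer's REJ carries N(R), and the sender resends every outstanding I-frame from N(R) onward, with modulo-128 sequence numbers). The function name and the frame numbers are hypothetical and are not taken from libpri or from the attached logs. It shows why one rejected frame # is followed by retransmissions of frame# + 1, frame# + 2, and so on.

```python
MODULUS = 128  # Q.921 extended operation numbers I-frames modulo 128


def frames_to_retransmit(n_r, v_s, modulus=MODULUS):
    """Return the N(S) values to resend after a REJ carrying N(R) = n_r.

    v_s is the sender's send state variable V(S): the sequence number
    the next *new* I-frame would use. Everything from n_r up to (but not
    including) v_s is outstanding and gets resent, in order.
    """
    outstanding = (v_s - n_r) % modulus
    return [(n_r + i) % modulus for i in range(outstanding)]


# Hypothetical case: peer rejects frame 41 after we already sent 41, 42, 43.
print(frames_to_retransmit(41, 44))   # [41, 42, 43]

# Sequence numbers wrap around at 128.
print(frames_to_retransmit(126, 1))   # [126, 127, 0]
```

Under this assumption, a run of "re-transmitting frame" messages after a single reject is normal layer 2 recovery rather than a sign of corruption.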

By: casey0999 (casey0999) 2003-08-21 15:41:11

Customer reported 2 locked channels, i.e. unable to connect further calls to Asterisk. Martin has requested further detail on this to determine whether it is an Asterisk or a customer issue. Will provide in next test. -Scott

By: casey0999 (casey0999) 2003-08-26 17:45:35

Have just completed a test under very heavy call load (72,000 incoming 4-second calls/hour to 4 E1s!), generated by another Asterisk system. Results will be summarized separately, but the general result is that receiving channels lock up (show a busy state to the PBX) fairly often under medium to heavy load, but are almost always cleared after a period of time (presumably by the built-in code that restarts each B channel periodically). The lock-up may be related to the following error received on the receiving end (still trying to find out):
"WARNING: File chan_zap.c, Line 5482 (zt_pri_error):  PRI Read on 137 failed, unknown error 500.   PRI: Got event 8"
I will provide more information on the stuck channels, but the good news is that they seem to heal themselves.
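
For scale, the load figures quoted above can be turned into an average number of simultaneous calls. This is a rough sketch only: it assumes each call holds a B channel for exactly 4 seconds (setup and teardown time ignored), that calls are spread evenly over the 4 spans, and that each E1 PRI carries the standard 30 B channels. The function name is hypothetical.

```python
def offered_load_erlangs(calls_per_hour, hold_time_s):
    """Average number of simultaneous calls (Erlangs) for a given call rate."""
    calls_per_second = calls_per_hour / 3600
    return calls_per_second * hold_time_s


SPANS = 4                  # E1s in the test, as reported
B_CHANNELS_PER_SPAN = 30   # standard E1 PRI: 30 B channels plus the D channel

total = offered_load_erlangs(72_000, 4)      # figures from the test above
per_span = total / SPANS                     # assumes an even spread
occupancy = per_span / B_CHANNELS_PER_SPAN

print(f"total simultaneous calls (avg): {total:.0f}")
print(f"per span (avg): {per_span:.0f}")
print(f"B channel occupancy per span: {occupancy:.0%}")
```

Under those assumptions the test averages about 20 simultaneous calls per span, which is above the 10 simultaneous calls per span at which the original failure was reported.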

By: x martinp (martinp) 2003-08-26 17:50:49

But how do you know that they're locked? Can you send a debug/trace that you think shows that the channels are locked (preferably a PRI trace of layer 3)?

By: John Todd (jtodd) 2003-09-12 18:28:11

Casey0999 - this has some specific interest to me.  Can you forward the notes to the report here so Martin can take a look?

By: John Todd (jtodd) 2003-09-29 03:30:57

Is this still pending?  Has there been any progress by either Digium or casey0999 towards solving this, or finding out what the root cause is?