Summary:ASTERISK-05043: [patch] Alarm on an idle PRI span that recovers quickly will likely leave the PRI D-Channel stuck in "Down" status
Reporter:Steve Davies . (stevedavies)
Labels:
Date Opened:2005-09-10 06:18:11
Date Closed:2008-01-15 15:47:52.000-0600
Versions:
Frequency of
Environment:
Attachments:( 0) asterisk-dontmarkdchandown.patch
Description:If a PRI span sees a layer1 alarm that clears quickly, the D-channel on that span can end up stuck in Down status.  This is more likely on an otherwise idle span.

We've seen various reports of mysteriously down spans, on and off, on asterisk-users.  I'm sure this is the cause.

The failure is pretty easy to duplicate:
 1) Bring up two PRI ports on an Asterisk box with a cross-over PRI cable between them
 2) Confirm that both ports show Provisioned, Up, Active
 3) Disconnect the cable and reconnect it within a couple of seconds
 4) Check the spans again; they are likely now Provisioned, Down, Active
 5) Wait as long as you like and note that the spans never come back up.

If this doesn't happen the first time then try again.  The alarm has to come and go before libpri decides that the d-channel is down.

Here's the reason for the problem:  In chan_zap, the D-chan is marked down if a layer1 alarm is detected on the underlying timeslot.  But the D-chan is only marked up again when libpri sends through a PRI event of some sort.

But if the alarm comes and goes quickly enough, libpri never notices anything wrong.  From libpri's point of view nothing happened, so it sends no events and the span ends up stuck down.
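
The sequence described above can be illustrated with a small toy simulation in Python. This is only a sketch of the reported behaviour, not chan_zap or libpri code: the class names, method names, event strings and the three-tick timeout are all hypothetical and chosen purely for illustration.

```python
# Toy model only: none of these names correspond to real chan_zap or
# libpri symbols, and the tick-based timeout is an assumption.

class ToyLibpri:
    """Declares the D-channel down only after `timeout` silent ticks."""

    def __init__(self, timeout=3):
        self.timeout = timeout
        self.silent_ticks = 0
        self.dchan_up = True

    def tick(self, frames_flowing):
        """Advance one tick; return an event name on a state change, else None."""
        if frames_flowing:
            self.silent_ticks = 0
            if not self.dchan_up:
                self.dchan_up = True
                return "DCHAN_UP"
            return None
        self.silent_ticks += 1
        if self.dchan_up and self.silent_ticks >= self.timeout:
            self.dchan_up = False
            return "DCHAN_DOWN"
        return None


class ToyChannelDriver:
    """An alarm marks the D-channel down; only a PRI event marks it up."""

    def __init__(self):
        self.in_alarm = False
        self.dchan_up = True

    def on_alarm(self):
        self.in_alarm = True
        self.dchan_up = False   # the layer-mixing step from the report

    def on_alarm_clear(self):
        self.in_alarm = False   # dchan_up is deliberately left untouched

    def on_pri_event(self, event):
        if event == "DCHAN_UP":
            self.dchan_up = True
        elif event == "DCHAN_DOWN":
            self.dchan_up = False


def run(alarm_ticks, total_ticks=10, timeout=3):
    """Apply an alarm for the first `alarm_ticks` ticks, then let it clear.

    Returns (span_in_alarm, driver_thinks_dchan_up) at the end of the run.
    """
    pri, drv = ToyLibpri(timeout), ToyChannelDriver()
    for t in range(total_ticks):
        alarmed = t < alarm_ticks
        if alarmed and not drv.in_alarm:
            drv.on_alarm()
        elif not alarmed and drv.in_alarm:
            drv.on_alarm_clear()
        event = pri.tick(frames_flowing=not alarmed)
        if event:
            drv.on_pri_event(event)
    return drv.in_alarm, drv.dchan_up
```

In this model `run(alarm_ticks=1)` ends with the alarm cleared but the driver still holding the D-channel down, because the toy libpri never saw enough silence to raise an event. `run(alarm_ticks=5)` recovers, because the longer alarm makes the toy libpri send a down event and later an up event.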

I think this is an error of mixing the layers.  In this case the D-channel never actually went down, so chan_zap shouldn't have marked it down.  libpri is quite able to detect that the d-channel is down, by timers and so on, and when that happens it tells chan_zap via events.

So - my fix is simply to change chan_zap not to mark the dchan down.

The span is still marked as not NOTINALARM (ie INALARM), so won't be used for calls.  And if the alarm persists libpri will soon notice the failure and tell us that the d-chan is down.  Then, we are in step and will always hear from libpri when it manages to bring it up again.
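
As a rough illustration of why this change keeps the two layers in step, here is a self-contained toy simulation in Python. It is a sketch under stated assumptions, not the actual patch: the function name, the `mark_down_on_alarm` flag, and the tick-based timeout are hypothetical and exist only to compare the two behaviours side by side.

```python
def simulate(alarm_ticks, mark_down_on_alarm, total_ticks=10, pri_timeout=3):
    """Toy comparison of the old and proposed behaviour (hypothetical model).

    mark_down_on_alarm=True  models the driver marking the D-channel down
                             as soon as a layer1 alarm is seen.
    mark_down_on_alarm=False models the proposed change: only the PRI
                             layer's own events change the D-channel state.

    Returns (span_in_alarm, driver_dchan_up) at the end of the run.
    """
    pri_dchan_up = True   # the PRI layer's own view of the D-channel
    drv_dchan_up = True   # the channel driver's view
    in_alarm = False
    silent = 0            # consecutive ticks without frames

    for t in range(total_ticks):
        alarmed = t < alarm_ticks

        # Layer1 alarm handling in the channel driver.
        if alarmed and not in_alarm:
            in_alarm = True
            if mark_down_on_alarm:
                drv_dchan_up = False
        elif not alarmed and in_alarm:
            in_alarm = False

        # PRI layer timers; a state change is delivered as an event
        # that updates the driver's view.
        if alarmed:
            silent += 1
            if pri_dchan_up and silent >= pri_timeout:
                pri_dchan_up = False
                drv_dchan_up = False
        else:
            silent = 0
            if not pri_dchan_up:
                pri_dchan_up = True
                drv_dchan_up = True

    return in_alarm, drv_dchan_up
```

With a one-tick alarm, `simulate(1, mark_down_on_alarm=True)` leaves the driver's D-channel down after the alarm has cleared, while `simulate(1, mark_down_on_alarm=False)` leaves it up. With a five-tick alarm both variants recover, since the modelled PRI layer reports both the failure and the recovery. While the alarm is active the span is still flagged as in alarm in either variant, which matches the point that it won't be used for calls.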

Disclaimer on file.

Comments:By: Edwin Groothuis (mavetju) 2005-09-11 05:16:20

This might also be the cause of http://lists.digium.com/pipermail/asterisk-users/2005-August/122299.html

By: Mark Spencer (markster) 2005-09-12 00:19:27

I'm thinking the right answer may be to restart the PRI when coming out of alarm *if* the backup d-channel (if configured) isn't already online.

By: Steve Davies . (stevedavies) 2005-09-12 13:52:40


Why do you want to restart the span?  The D-channel didn't go down from our POV.  No frames were lost during the alarm period, or not enough for libpri to decide it was down.

So when the alarm clears we can just carry on...

If the other end thinks the D-channel went down then IT will restart things.


By: Mark Spencer (markster) 2005-09-12 22:21:56

I think I've fixed this "properly" in CVS head.  Please confirm the fix works in your test scenarios, thanks!  You'll need latest libpri *and* Asterisk.

By: Digium Subversion (svnbot) 2008-01-15 15:47:52.000-0600

Repository: asterisk
Revision: 6567

U   trunk/channels/chan_zap.c

r6567 | markster | 2008-01-15 15:47:52 -0600 (Tue, 15 Jan 2008) | 2 lines

Try a more generally correct solution, for NFAS (bug ASTERISK-5043)