Summary: ASTERISK-01814: T1 PRI chronic Red Alarms, but no alarm on zttool utility
Reporter: fcofer (fcofer)
Labels:
Date Opened: 2004-06-13 18:50:53
Date Closed: 2011-06-07 14:04:54
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: Erratic red alarm on T1 PRI in asterisk, but zttool running concurrently during the alarm shows no errors, irq misses, or alarms on any span.

Using asterisk and quad Digium T405P, configured as follows:

Span 1 connects to ISDN PRI (fractional 8 B channels, D channel 24).
Span 2 connects to T1 Mux and analog stations.
Span 3 connects to ISDN PRI Nortel BCM hybrid key system digital trunk.
Span 4 is not configured.

Host system is Athlon > 1.8GHz, 512MB running Gentoo Linux 2.4.20 vanilla kernel, RAID 1 (software Linux), 2 x IDE 80GB.  T1 card shares no interrupt.

After 12 hours to several days, asterisk detects a red alarm on the configured channels 1 through 8 of span 1 (ISDN PRI.)  Concurrently run zttool shows no alarms, no irq misses, and no errors on any span.
When the alarm is first detected, it "bounces" several times, then quits for 6 hours or so, then recurs.  Suspecting a timing problem, the configuration has been repeatedly checked for proper clocking (to CO span 1) and this is also verified by zttool which shows sync source as "Card 0, span1" on all configured spans.  The original Digium TE410P Quad card was replaced with a Digium T405P with no improvement.  Telco has replaced both Adtran HDSL2 smartjack and repeater cards.  Problem is independent of traffic and occurs late night, early morning, weekends or during load.  Simulated load of CPU does not produce any failure and CPU load is otherwise negligible.  The utility zttest has been run on the new T405P card with no errors.  Zttool shows red alarm if a span is disconnected, LB when under loopback test, but otherwise shows no errors on any span.

****** ADDITIONAL INFORMATION ******

Span 1 connects to telco Adtran HDSL2-R.  Numerous loop around tests and 30m span pattern tests were performed over several days, and the span tested clear in both directions through to CSU with no errors.
Span 2 connects to a Zhone T1 mux and shows no alarms or errors, either from asterisk console or messages (zap channels 25-48).
Span 3 connects to Nortel BCM PRI (fractional 8 channels, plus D on 24; channels 9-23 are deprovisioned); it likewise shows no errors.
Span 4 is not connected and not configured.  It has been swapped with Span 1 on occasion with no improvement.

Upon occurrence of a red alarm condition on Span 1, all calls on that span are dropped. Successive call attempts during the red alarm condition encounter 120 IPM congestion tone. However, calls can still be made between the BCM (Span 3) and the T1 mux (Span 2) while the asterisk console reports the red alarm on Span 1. That is, during a red alarm, only service through Span 1 is affected.

Here is an example from the messages log (grepped for "channel 1: Red"):

Jun 10 04:02:29 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:02:17 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:05:12 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:08:23 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:47:01 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:58:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm

... the alarm continues bouncing, then eventually abates...

Jun 13 01:44:57 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:52:19 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:15:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:20:32 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:25:45 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:40:02 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:57:49 WARNING[163851]: Detected alarm on channel 1: Red Alarm

... it then subsides for several hours, e.g., none further as of Sun Jun 13 17:39:39 EDT 2004.
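
To make the "bounce, then quiet" pattern in the excerpt above easier to see, here is a minimal Python sketch that groups the logged alarms into bursts. It is not part of asterisk or zaptel. The function names, the one-hour gap threshold, and the year 2004 (the log lines carry no year, so it is taken from the report date) are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
#   Jun 12 21:02:17 WARNING[163851]: Detected alarm on channel 1: Red Alarm
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) WARNING\[\d+\]: "
    r"Detected alarm on channel (?P<chan>\d+): Red Alarm"
)

def parse_alarms(lines, year=2004):
    """Return a datetime for each red-alarm line (the year is an assumption)."""
    times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            stamp = f"{year} {m['stamp']}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times

def group_bursts(times, max_gap=timedelta(hours=1)):
    """Group alarms so that consecutive alarms within max_gap share a burst."""
    bursts = []
    for t in sorted(times):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts

SAMPLE = """\
Jun 10 04:02:29 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:02:17 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:05:12 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:08:23 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:47:01 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:58:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:44:57 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:52:19 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:15:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:20:32 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:25:45 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:40:02 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:57:49 WARNING[163851]: Detected alarm on channel 1: Red Alarm
"""

if __name__ == "__main__":
    for burst in group_bursts(parse_alarms(SAMPLE.splitlines())):
        span = burst[-1] - burst[0]
        print(f"{burst[0]:%b %d %H:%M:%S}  alarms={len(burst)}  duration={span}")
```

With the one-hour threshold, the excerpt falls into four bursts of 1, 5, 2, and 5 alarms. Burst start times and durations like these are the kind of detail that can be handed to the telco when asking for line monitoring.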

Debug output is apparently suppressed during alarm. (This has been reported separately in another bug report.)

Configuration (/etc/zapata.conf)
span=1,1,3,esf,b8zs  
span=2,0,0,esf,b8zs
span=3,0,0,esf,b8zs
#span=4,1,3,esf,b8zs
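
For readers checking the clocking claim, the span lines above can be decoded field by field. The sketch below assumes the span= field order documented for Zaptel's zaptel.conf (span number, timing priority, line build-out, framing, coding), where a timing value of 0 means the span is not used as a sync source and 1 marks the primary source. The `Span` type and the helper names are hypothetical, and any optional trailing fields are ignored.

```python
from typing import List, NamedTuple, Optional

class Span(NamedTuple):
    number: int   # span number on the card
    timing: int   # 0 = not a sync source, 1 = primary, 2 = secondary, ...
    lbo: int      # line build-out setting
    framing: str  # e.g. "esf"
    coding: str   # e.g. "b8zs"

def parse_span(line: str) -> Optional[Span]:
    """Parse one 'span=' line; return None for comments and other lines."""
    line = line.split("#", 1)[0].strip()  # drop '#' comments
    if not line.startswith("span="):
        return None
    fields = [f.strip() for f in line[len("span="):].split(",")]
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {line!r}")
    number, timing, lbo, framing, coding = fields[:5]
    return Span(int(number), int(timing), int(lbo), framing, coding)

def sync_sources(spans: List[Span]) -> List[Span]:
    """Spans configured as timing sources, in priority order."""
    return sorted((s for s in spans if s.timing > 0), key=lambda s: s.timing)

CONFIG = """\
span=1,1,3,esf,b8zs
span=2,0,0,esf,b8zs
span=3,0,0,esf,b8zs
#span=4,1,3,esf,b8zs
"""

if __name__ == "__main__":
    spans = [s for s in map(parse_span, CONFIG.splitlines()) if s is not None]
    for s in sync_sources(spans):
        print(f"span {s.number} is sync source with priority {s.timing}")
```

Read this way, only span 1 is configured as a timing source, which matches the sync source reported by zttool ("Card 0, span1").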

NOTES
All cables, protectors, and the like, have been verified, reterminated or swapped. However, these types of errors should have been discovered via a short-term error test.

The server and all connected telephony equipment are powered by two separate UPSes, which report no downtime during the red alarm events.

Except during artificial load tests, the server's CPU usage has never risen above 1%.

ASTERISK-496 errors occur sporadically, but without any clear relationship to the red alarms. UDMA was reduced from UDMA 5 to UDMA 3, but this failed to correct the ASTERISK-496 problem.

Debug and warning message logs have been retained. Inexplicably, the red alarms appear to occur with no external stimulus.

QUESTIONS
1. How can the asterisk messages log show a red alarm, yet the zttool utility (running concurrently and watched during alarm transition) shows no red alarm?

2. What are the conditions that asterisk uses to declare red alarms? How does this differ from the zttool utility?

3. Any other ideas?
Comments:

By: Mark Spencer (markster) 2004-06-13 20:48:18

No CPU utilization, interrupt misses, or anything even remotely of the sort can cause a red alarm.  Your red alarms occur frequently and randomly, making it unlikely to be any sort of test condition placed on the line.  This is almost certainly a problem with your T1 line.  How long does the line stay in alarm when it does go into alarm?  If you have a PC with a serial cable you can look at the HDSL2 unit and it will show you the signal strength locally and remotely on both loops of your HDSL circuit.  If one is solid, and the other is not, you have an open circuit in your HDSL circuit (this one bit me back when I was in Auburn).

Ask your telco to place a T-bird on your line and watch for the error.

By: Mark Spencer (markster) 2004-06-13 20:49:35

In any case, this is a technical support issue and not a bug tracker issue.  If for any reason you are unable to get the assistance you need from technical support, find me on IRC.