Summary: ASTERISK-03920: D-channel knocked down by remote zap channels
Reporter: jharragi (jharragi)
Labels:
Date Opened: 2005-04-12 12:00:42
Date Closed: 2011-06-07 14:10:23
Priority: Minor
Regression? No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description:
AstN(=1,2...) -> IAX2 -> AstD -> Merlin_Legend_5ess_PRI

Calls originating on certain Zap channels or groups at any one of the AstN locations cause the d-channel of the PRI between AstD and the Merlin Legend to go down, dropping all in-progress calls and resetting the b-channels.
I initially thought the problem was at one end of the AstD PRI's configuration (and tried several flavors of PRI & nsf), but I am also getting some messages that have been observed by other ast users:
Apr 12 12:15:45 NOTICE[31627]: chan_zap.c:7395 pri_dchannel: PRI got event: HDLC Abort (6) on Primary D-channel of span 1
Apr 12 12:15:45 WARNING[31627]: chan_zap.c:7149 zt_pri_error: PRI: !! Got reject for frame 120, retransmitting frame 120 now, updating n_r!

...that appear to be related. The problem also seems like it might be related to the CID strings. Is there a size limit on these strings that asterisk allows to be exceeded?

I can't hack around too much, as I have been having interrupted calls for the last week. I am now avoiding the routes that kill the pri and can't rock the boat.

****** ADDITIONAL INFORMATION ******

Remote boxes are on HEAD (AstD was too; it is currently on Stable, but the problem remains).

Here is some zapata.conf from the first location, AstHighSchool, causing the trouble. Calls originating on the PRI work fine, those on group 1 bring down the channels at the remote PRI:
emdigitwait=3500
signalling=em_w
context=toshiba
group=1
callerid="Monroe-Woodbury High School" <845-460-7000>
channel=>1-24

switchtype = national
signalling = pri_cpe
context=hs-pri-in
group = 2
channel => 25-47

Now, on this box, AstHarCtr, calls originating on channels 8 or 9 are accepted. Calls originating on channel 7 cause the remote d-channel to reset, dropping all calls on that PRI. Notice the extra " typo on the two working channels in zapata.conf...

group=11
callgroup=11
pickupgroup=11,12
callerid="HC FS 6111 dmauriel" <845-460-6110>
mailbox=6111
channel=7
callerid="HC SD 6112 "lgorman" <845-460-6110>
mailbox=6112
channel=8
callgroup=12
callerid="HC SD 6113 "aansons" <845-460-6110>
mailbox=6113
channel=9

Now here is another zapata.conf snip at another location AstMicro that brings down the pri on the following cli:
callerid="MCC 6625 mrivelli" <845-460-6600>
mailbox=6625
channel=17


< Protocol Discriminator: Q.931 (8)  len=5
< Call Ref: len= 2 (reference 32869/0x8065) (Terminator)
< Message type: ALERTING (1)
   -- Zap/2-1 is ringing
< Protocol Discriminator: Q.931 (8)  len=5
< Call Ref: len= 2 (reference 32869/0x8065) (Terminator)
< Message type: CONNECT (7)
> Protocol Discriminator: Q.931 (8)  len=5
> Call Ref: len= 2 (reference 101/0x65) (Originator)
> Message type: CONNECT ACKNOWLEDGE (15)
   -- Zap/2-1 answered Zap/22-1
   -- Attempting native bridge of Zap/22-1 and Zap/2-1
   -- Accepting unauthenticated call from 192.168.32.111, requested format = 4, actual format = 4
   -- Executing Dial("IAX2/mcc@mcc/1", "Zap/g1/6425|60") in new stack
-- Making new call for cr 32870
> Protocol Discriminator: Q.931 (8)  len=60
> Call Ref: len= 2 (reference 102/0x66) (Originator)
> Message type: SETUP (5)
> [04 03 80 90 a2]
> Bearer Capability (len= 5) [ Ext: 1  Q.931 Std: 0  Info transfer capability: Speech (0)
>                              Ext: 1  Trans mode/rate: 64kbps, circuit-mode (16)
>                              Ext: 1  User information layer 1: u-Law (34)
> [18 03 a1 83 83]
> Channel ID (len= 5) [ Ext: 1  IntID: Implicit, PRI Spare: 0, Preferred Dchan: 0
>                        ChanSel: Reserved
>                       Ext: 1  Coding: 0   Number Specified   Channel Type: 3
>                       Ext: 1  Channel: 3 ]
> [20 02 00 e6]
> Network-Specific Facilities (len= 2) [ ACCUNET Switched Digital Service ]
> [28 12 b1 4d 43 43 20 36 36 32 35 20 6d 72 69 76 65 6c 6c 69]
> Display (len=18) Charset: 31 [ MCC 6625 mrivelli ]
> [6c 0c 21 81 38 34 35 34 36 30 36 36 30 30]
> Calling Number (len=14) [ Ext: 0  TON: National Number (2)  NPI: ISDN/Telephony Numbering Plan (E.164/E.163) (1)
>                           Presentation: Presentation permitted, user number passed network screening (1) '8454606600' ]
> [70 05 a1 36 34 32 35]
> Called Number (len= 7) [ Ext: 1  TON: National Number (2)  NPI: ISDN/Telephony Numbering Plan (E.164/E.163) (1) '6425' ]
   -- Called g1/6425
 == Primary D-Channel on span 1 down
Apr 12 09:04:09 WARNING[31627]: chan_zap.c:1931 pri_find_dchan: No D-channels available!  Using Primary on channel anyway 24!
NEW_HANGUP DEBUG: Calling q931_hangup, ourstate Active, peerstate Connect Request
> Protocol Discriminator: Q.931 (8)  len=9
> Call Ref: len= 2 (reference 100/0x64) (Originator)
> Message type: DISCONNECT (69)
> [08 02 81 90]
> Cause (len= 4) [ Ext: 1  Coding: CCITT (ITU) standard (0) 0: 0   Location: Private network serving the local user (1)
>                  Ext: 1  Cause: Normal Clearing (16), class = Normal Event (1) ]
NEW_HANGUP DEBUG: Destroying the call, ourstate Disconnect Request, peerstate Disconnect Indication
NEW_HANGUP DEBUG: Calling q931_hangup, ourstate Active, peerstate Connect Request
> Protocol Discriminator: Q.931 (8)  len=9
> Call Ref: len= 2 (reference 101/0x65) (Originator)
> Message type: DISCONNECT (69)
> [08 02 81 90]
> Cause (len= 4) [ Ext: 1  Coding: CCITT (ITU) standard (0) 0: 0   Location: Private network serving the local user (1)
>                  Ext: 1  Cause: Normal Clearing (16), class = Normal Event (1) ]
NEW_HANGUP DEBUG: Destroying the call, ourstate Disconnect Request, peerstate Disconnect Indication
NEW_HANGUP DEBUG: Calling q931_hangup, ourstate Call Initiated, peerstate Overlap sending
> Protocol Discriminator: Q.931 (8)  len=9
> Call Ref: len= 2 (reference 102/0x66) (Originator)
> Message type: DISCONNECT (69)
> [08 02 81 90]
> Cause (len= 4) [ Ext: 1  Coding: CCITT (ITU) standard (0) 0: 0   Location: Private network serving the local user (1)
>                  Ext: 1  Cause: Normal Clearing (16), class = Normal Event (1) ]
NEW_HANGUP DEBUG: Destroying the call, ourstate Disconnect Request, peerstate Disconnect Indication
NEW_HANGUP DEBUG: Calling q931_hangup, ourstate Active, peerstate Active
> Protocol Discriminator: Q.931 (8)  len=9
> Call Ref: len= 2 (reference 32804/0x8024) (Terminator)
> Message type: DISCONNECT (69)
> [08 02 81 90]
> Cause (len= 4) [ Ext: 1  Coding: CCITT (ITU) standard (0) 0: 0   Location: Private network serving the local user (1)
>                  Ext: 1  Cause: Normal Clearing (16), class = Normal Event (1) ]
NEW_HANGUP DEBUG: Destroying the call, ourstate Disconnect Request, peerstate Disconnect Indication
NEW_HANGUP DEBUG: Calling q931_hangup, ourstate Active, peerstate Active
> Protocol Discriminator: Q.931 (8)  len=9
> Call Ref: len= 2 (reference 32802/0x8022) (Terminator)
> Message type: DISCONNECT (69)
> [08 02 81 90]
Comments:

By: Paul Cadach (pcadach) 2005-04-12 12:13:49

Did your configuration ever work before? Which hardware do you use to handle PRIs on the Asterisk side (AstD, as I understand)? Are there any messages in your syslog related to zaptel?

By: Matthew Fredrickson (mattf) 2005-04-12 12:43:52

This might be better directed to Digium support, but check these things in this order:
1.) Make sure that the card is not sharing interrupts with any other cards on your system (`cat /proc/interrupts`)
2.) Make sure (if you're using IDE devices) that the IDE interrupt isn't blocking the T1 card's interrupt (hdparm -u1 /dev/hdwhichever).  I don't know if there's a way to do that in SCSI systems.
3.) Make sure that you're not running X or frame buffer console.
4.) Make sure that your timing is correct in /etc/zaptel.conf (i.e. if  you're CPE, you most likely should be drawing timing from the telco and your span line should look like this: span=1,1,0,esf,b8zs.  The 1 in the second column means that you use that span as a primary sync source).
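
As a rough illustration of point 4, here is a small Python sketch that splits a zaptel.conf span line into its columns so the timing column can be checked. The helper `parse_span` and the `Span` type are made up for this example. Only the meaning of the second column comes from this thread; the other field names are assumptions based on the comments in the sample zaptel.conf.

```python
from typing import NamedTuple


class Span(NamedTuple):
    number: int   # span number
    timing: int   # 1 = use this span as primary sync source (per the note above)
    lbo: int      # assumed: line build out
    framing: str  # assumed: framing, e.g. esf
    coding: str   # assumed: line coding, e.g. b8zs


def parse_span(line):
    """Parse a 'span=...' line into its first five comma-separated fields."""
    key, _, value = line.partition("=")
    if key.strip() != "span":
        raise ValueError("not a span line: %r" % line)
    fields = [field.strip() for field in value.split(",")]
    if len(fields) < 5:
        raise ValueError("expected at least 5 fields: %r" % line)
    number, timing, lbo = (int(field) for field in fields[:3])
    return Span(number, timing, lbo, fields[3], fields[4])


suggested = parse_span("span=1,1,0,esf,b8zs")
print("draws timing from the far end:", suggested.timing == 1)
```

A timing value of 0 would mean the span is not used as a sync source, which is the case to look for when the card is expected to follow the telco's clock.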

By: jharragi (jharragi) 2005-04-12 14:04:32

The machine having the trouble only connects IAX2 or tdm to the merlin_legend pri. The asterisk box has a tor2 (on a dedicated irq) with the single T in use configured as pri_net. It had been working reliably for a long time as a wink start tie line T, but I wanted to utilize isdn features. It became really confusing because I also updated all of the asterisk boxes over spring break (we are a school district, and everything seemed ok), and initially I was sure the trouble was in the new configuration of the merlin_legend. The pc is dedicated to asterisk and runs in text mode.

span=1,0,3,esf,b8zs

Note that this is the only zap device and no clock source is specified. I've always thought it odd that this isn't specified; nevertheless, zttool reports that the tor2 is internally clocked. No irq misses and 1 bipolar violation (I'm not sure I've seen these before on this machine).

By: Paul Cadach (pcadach) 2005-04-12 14:20:49

All TDM networks are truly synchronous, so you need a synchronization source either way: either the zaptel card is the clock source for the PBX, or the PBX is the clock source for the zaptel card. If the PBX plays the clock source role, you should update your zaptel.conf to indicate that the tor2 card is not self-clocked (i.e. span=1,1,3,esf,b8zs).

By: jharragi (jharragi) 2005-04-12 14:47:53

As pri_net the asterisk box is master. At any rate I have the legend set as slave. Also, a timing problem looks doubtful, as calls originating at another machine predictably succeed or fail depending on which zap channel (or channel type) they are generated on. For instance, the calls passing through AstHighSchool that work come from our local telco:

telco_national_pri -> AstHighSchool -> AstD -> Merlin_Legend
works, while
Toshiba_424_wink_T -> AstHighSchool -> AstD -> Merlin_Legend
fails

and at another building,

fxo_bad_CID_string -> AstHarCtr -> AstD -> Merlin_Legend
works, while...
fxo_good_CID_string -> AstHarCtr -> AstD -> Merlin_Legend
fails

By: jharragi (jharragi) 2005-04-12 15:37:40

If I place calls to a working asterisk system using the 2 caller id strings (fxo_bad_CID_string and fxo_good_CID_string) and capture the channel details:

asterisk -rx 'show channel IAX2/hc@hc-4' > aanson
asterisk -rx 'show channel IAX2/hc@hc-2' > dmauriel

pt-ast:~ # diff aanson dmauriel

<  Caller ID Name: HC SD 6113
---
>  Caller ID Name: HC FS 6111 dmauriel

I simply get a shorter string. There appeared to be no other significant difference. I have to leave for today, and I can't test this until the system is mostly unused. But does anyone know if there are historic limits on ISDN CID Name string lengths?
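
To put numbers on the string-length question, the names in the callerid= lines quoted in this thread can be measured. The sketch below is only illustrative: `cid_name` is a made-up helper that takes the text between the first pair of double quotes, which is a simplification and not Asterisk's actual parser (it does happen to give "HC SD 6113" for the line with the stray quote, matching the channel dump above). The 15-character threshold is an assumption used for comparison, not a limit confirmed for this PBX.

```python
import re

# callerid= lines quoted in this thread (the third keeps its stray quote).
LINES = [
    'callerid="Monroe-Woodbury High School" <845-460-7000>',
    'callerid="HC FS 6111 dmauriel" <845-460-6110>',
    'callerid="HC SD 6113 "aansons" <845-460-6110>',
    'callerid="MCC 6625 mrivelli" <845-460-6600>',
    'callerid="MSCSD-HS" <845-460-7000>',
]

LIMIT = 15  # assumed threshold, for comparison only


def cid_name(line):
    """Return the text between the first pair of double quotes, stripped."""
    match = re.search(r'"([^"]*)"', line)
    return match.group(1).strip() if match else ""


for line in LINES:
    name = cid_name(line)
    verdict = "over" if len(name) > LIMIT else "ok"
    print(f"{len(name):2d} {verdict:4s} {name}")
```

Under that assumption, the names flagged as "over" are the ones on the routes reported as failing, and the short ones are on the routes reported as working.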

By: jharragi (jharragi) 2005-04-12 17:21:53

...ahh. It is definitely related to CID Name. Shortening
callerid="Monroe-Woodbury High School" <845-460-7000>
to
callerid="MSCSD-HS" <845-460-7000>
now allows
Toshiba_424_wink_T -> AstHighSchool -> AstD -> Merlin_Legend
to work
...so it is either a bug in the Merlin Legend's or asterisk's implementation of pri. Either way asterisk is going to have to truncate the string (a patch for the old merlin is unlikely, unless it has a limitation variable set somehow). I will see if I can find the string limits for various flavors of PRI.

By: Matthew Fredrickson (mattf) 2005-04-12 19:10:24

Hrm... presumably, it should be truncated at the libpri level.  Can you find out what the magic length is that causes the D-channel to flip flop?  If so, we can update the code so that it automatically truncates if it's too long.
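
One data point toward the magic length is already in the description: the SETUP that preceded the D-channel drop carried a Display element. A minimal Python sketch that decodes the hex bytes printed on that trace line is shown here. The function name `decode_display_ie` is made up, and treating the first body octet as a charset value with the top bit masked off is an assumption based on the trace printing "Charset: 31" for the byte 0xb1.

```python
# Hex bytes copied from the Display line of the SETUP trace in the description.
RAW = "28 12 b1 4d 43 43 20 36 36 32 35 20 6d 72 69 76 65 6c 6c 69"


def decode_display_ie(hex_dump):
    """Split a Display information element into (ie_id, length, charset, text)."""
    data = bytes.fromhex(hex_dump)
    ie_id, length = data[0], data[1]
    body = data[2:2 + length]
    if len(body) != length:
        raise ValueError("element shorter than its length octet claims")
    charset = body[0] & 0x7F          # assumed: low 7 bits, as the trace prints 31
    text = body[1:].decode("ascii")   # remaining octets are the name itself
    return ie_id, length, charset, text


ie_id, length, charset, text = decode_display_ie(RAW)
print(hex(ie_id), length, hex(charset), repr(text), len(text))
```

The element length is one more than the name length because of the leading charset octet, so a 17-character name goes out as an 18-octet Display element in that trace.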

By: Kevin P. Fleming (kpfleming) 2005-04-13 00:49:16

I've been told by my telco contacts that the "normal" maximum length for a CNAM string is 15 characters. I don't know what standard, if any, specifies that, though.

By: Paul Cadach (pcadach) 2005-04-13 01:36:57

This is definitely a Merlin Legend bug. Existing standards define two lengths for the calling name: 15 symbols, moving to 63 (as I remember) symbols. So limiting the length of a received calling name is a task for the receiving side, not the transmitting one (because the transmitting side could be compliant with newer standards while the receiving side is not).

By: jharragi (jharragi) 2005-04-13 09:46:48

Maybe bug is too harsh a word. The Merlin just needed to connect to '96 gear (or whatever year), while asterisk's connectivity continues to expand. So let's call it currently unsupported legacy gear, but obviously there is no reason to exclude it. I've got a quick patch that just inserts a null in the callername string. I'm about to test it. Can you reload libpri on the fly, or must you restart asterisk or the zap driver?

edited on: 04-13-05 09:53

By: Matthew Fredrickson (mattf) 2005-04-13 09:50:53

You'll have to restart Asterisk for a code update in libpri to work.

By: jharragi (jharragi) 2005-04-13 11:06:36

...ok this change seems to be keeping it up. I'll look at making a more general solution next week. Any suggestions regarding #ifdef OldMerlnLegend vs. a configurable solution? I'm also still getting the frame retransmission (which may be related to the older isdn implementation too) but can't investigate this at the moment.

diff q931.c.orig q931.c
2223c2223
<               if (req->callername)
---
>               if (req->callername) {
2225c2225,2226
<               else
---
>                       c->callername[15] = 0;
>               } else

By: Matthew Fredrickson (mattf) 2005-04-13 11:39:47

You probably wouldn't want to truncate it there, if we did make an option to truncate it.  That would break long CNAMs for other switches that do support CNAMs of longer length (see associated transmit_display function).
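
For readers less familiar with C strings, the effect of the quick patch above can be modelled in a few lines of Python. This is only a model of the semantics, not libpri code: the buffer size of 256 is an assumption, and `patched_callername` is a made-up name. Writing a zero at index 15 ends the string there, so at most 15 characters survive, and it does so for every call regardless of which switch is on the other end.

```python
BUF_SIZE = 256  # assumption: size of the fixed callername buffer


def patched_callername(name, cut=15):
    """Model a bounded copy into a C buffer followed by buf[cut] = 0."""
    buf = bytearray(BUF_SIZE)                    # zero-filled, like a cleared array
    raw = name.encode("ascii")[:BUF_SIZE - 1]    # bounded copy, room for the NUL
    buf[:len(raw)] = raw
    buf[cut] = 0                                 # the added line: callername[15] = 0
    # A C string ends at the first NUL byte.
    return bytes(buf).split(b"\x00", 1)[0].decode("ascii")


print(repr(patched_callername("HC FS 6111 dmauriel")))
print(repr(patched_callername("MSCSD-HS")))
```

Names shorter than 15 characters are unaffected, because their own terminator comes before index 15.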

By: jharragi (jharragi) 2005-04-13 12:33:26

Thanks. Yeah, this was just a hack to get through the problem at hand. I noticed there was libpri work happening recently. If this can be done easily while you are thinking about the 'big picture', I'm happy to step aside.

By: Paul Cadach (pcadach) 2005-04-13 12:40:30

Also, transmission of Caller name on CPE interfaces is prohibited by specifications (Q.931, etc.), and should be configurable too (for some "advanced" systems like Cisco).

By: Mark Spencer (markster) 2005-04-15 01:50:26

This should be able to be done with dialplan logic.  Nice detective work in finding the problem, everyone.

By: jharragi (jharragi) 2005-04-15 08:43:56

I've looked at another legend that does accept longer CIDs (18 char; I'll try sending it longer strings next week just to make sure that wasn't a fluke). I have not compared T interface vintages yet, but apparently they were making the transition to the longer strings. So for the record, this is probably an old isdn thing rather than a merlin-legend thing.

edited on: 04-15-05 08:45

By: Matthew Fredrickson (mattf) 2005-04-15 10:00:06

Can we close this out now?

By: jharragi (jharragi) 2005-04-15 13:12:53

One more tangential thought I wanted to jot down but can't investigate (gotta do my taxes :^( ). Anyway, here is something you guys might know off the top of your head:
What is bringing the dchannel down: is the remote end just dropping, or is asterisk deciding to do this? If the latter, maybe a less harsh response is in order. Initially I was thinking it would be reasonable to (re)transmit a most generic SETUP or, better, shift into a truncate (or simplified) mode, since the problem won't go away, and post a notice. But this is moving into the area of autoprobing, which might not be reasonable in libpri, though it could be cool. It also might be a reasonable pursuit to be able to use asterisk to probe a T and write auto or suggested confs (knowing how hard it can be to set some of this stuff up, particularly for newer users, or for instance to even have a telco tell you what they are feeding you). Just jabbering... John

By: jharragi (jharragi) 2005-04-19 11:54:28

...by dialplan logic, do you mean code or configuration? It is easy enough to do:
SetCIDName(${CALLERIDNAME:0:15})
...and of course it can be worked into a more complex dialplan - but it looks like an animal trap waiting to be stepped on in the future. My boss is pushing a GUI configurator - they just don't understand!

By: Matthew Fredrickson (mattf) 2005-04-20 10:59:04

That's what I would assume Mark had in mind.  There isn't really a clean way to get around it.  It'd be a big mess to put the switch specific nuances for every little private-vendor PBX out there.  It makes a lot more sense to just put it in the dialplan.

By: Kevin P. Fleming (kpfleming) 2005-04-20 11:07:08

I agree, this should happen before the call hits chan_zap, since you are aware of the problem with that switch. If there are more reports of this problem, it's possible libpri could be extended to support a 'maximum CNAM length' on a per-span basis.
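
To make the per-span idea concrete, here is a purely hypothetical Python sketch of what a 'maximum CNAM length' lookup could look like. Nothing here exists in libpri or chan_zap: the table `MAX_CNAM_LEN`, the function `cnam_for_span`, and the choice of span 1 with a limit of 15 are all assumptions for illustration.

```python
# Hypothetical per-span limits; span 1 stands in for the PRI toward the old PBX.
MAX_CNAM_LEN = {1: 15}


def cnam_for_span(span, name):
    """Return the name to send on a span, truncated only if that span has a limit."""
    limit = MAX_CNAM_LEN.get(span)
    if limit is None:
        return name          # no limit configured: send the name unchanged
    return name[:limit]


print(cnam_for_span(1, "Monroe-Woodbury High School"))
print(cnam_for_span(2, "Monroe-Woodbury High School"))
```

The point of keying the limit by span is that switches accepting longer names keep receiving them, which is the concern raised earlier about truncating unconditionally.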