Summary: | ASTERISK-00050: Out-of-service or stuck B channels on PRIs not handled correctly | ||
Reporter: | John Todd (jtodd) | Labels: | |
Date Opened: | 2003-08-05 15:42:27 | Date Closed: | 2008-01-15 14:44:10.000-0600 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) disabled-bchan-test.txt | |
Description: | When B channels are taken out of service or that are "stuck" on a PRI (OOSNE on the Summa 4k VCO we are using) it is the case that Asterisk continues to send calls to the out-of-service channels and/or does not forward calls correctly around the bad channel(s). The switch then replies (or Asterisk determines, whichever) that the chosen channel is not available, and provides an alternate channel. Asterisk recognizes this, but fails to appropriate re-terminate the call on the new channel, and the call fails. This leads to multiple call failures on systems with "stuck" or "out-of-service" channels on a PRI. Several times I have had "stuck" channels that have not un-stuck themselves with the periodic resets. We tested by taking certain channels out of service and then seeing what happened when we tried to make calls outbound through the PRI. ****** ADDITIONAL INFORMATION ****** In this example, the first 8 B channels have been taken out of service on the switch side of things. You can see that the call is redirected to channel 9, but the call terminates instead of being sent through. Now, this may ultimately hinge back on my long-ago comments on why the administrator needs to have better error message reporting from the Dial command, but that is beside the point. :-) gw3*CLI> -- Executing SetCallerID("SIP/2403652270-f42a", "3015841518") in new stack -- Executing Dial("SIP/2403652270-f42a", "Zap/g1/12129040788") in new stack -- Making new call for cr 32775 > Protocol Discriminator: Q.931 (8) len=50 > Call Ref: len= 2 (reference 7/0x7) (Originator) > Message type: SETUP (5) > Bearer Capability (len= 3) [ Ext: 1 Q.931 Std: 0 Info transfer capability: Speech (0) > Ext: 1 Trans mode/rate: 64kbps, circuit-mode (16) > Ext: 1 User information layer 1: u-Law (34) > Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Preferred Dchan: 0 > ChanSel: Reserved > Ext: 1 Coding: 0 Number Specified Channel Type: 3 > Ext: 1 Channel: 1 ] > Display (len= 1) [ > Display (len= 1) [ 1> Display (len= 1) [ 1 ] > Progress Indicator (len= 2) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: User (0) > Ext: 1 Progress Description: Calling equipment is non-ISDN. (3) ] > Calling Number (len=14) [ Ext: 0 TON: Unknown Number Type (0) NPI: Unknown Number Plan (0) > Presentation: Unknown (1) '3015841518' ] > Called Number (len=14) [ Ext: 1 TON: Unknown Number Type (0) NPI: Unknown Number Plan (0) '12129040788' ] -- Called g1/12129040788 < Protocol Discriminator: Q.931 (8) len=10 < Call Ref: len= 2 (reference 32775/0x8007) (Terminator) < Message type: CALL PROCEEDING (2) < Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Exclusive Dchan: 0 < ChanSel: Reserved < Ext: 1 Coding: 0 Number Specified Channel Type: 3 < Ext: 1 Channel: 9 ] -- Processing IE 24 (Channel Identification) < Protocol Discriminator: Q.931 (8) len=14 < Call Ref: len= 2 (reference 32775/0x8007) (Terminator) < Message type: ALERTING (1) < Channel ID (len= 5) [ Ext: 1 IntID: Implicit, PRI Spare: 0, Exclusive Dchan: 0 < ChanSel: Reserved < Ext: 1 Coding: 0 Number Specified Channel Type: 3 < Ext: 1 Channel: 9 ] < Progress Indicator (len= 2) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: Public network serving the local user (2) < Ext: 1 Progress Description: Inband information or appropriate pattern now available. (8) ] -- Processing IE 24 (Channel Identification) -- Processing IE 30 (Progress Indicator) -- Moving call from channel 1 to channel 9 == No one is available to answer at this time > Protocol Discriminator: Q.931 (8) len=9 > Call Ref: len= 2 (reference 7/0x7) (Originator) > Message type: DISCONNECT (69) > Cause (len= 2) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: Private network serving the local user (1) > Ext: 1 Cause: Normal Clearing (16), class = Normal Event (1) ] -- Hungup 'Zap/1-1' < Protocol Discriminator: Q.931 (8) len=9 < Call Ref: len= 2 (reference 32775/0x8007) (Terminator) < Message type: RELEASE (77) < Cause (len= 2) [ Ext: 1 Coding: CCITT (ITU) standard (0) 0: 0 Location: User (0) < Ext: 1 Cause: Normal Clearing (16), class = Normal Event (1) ] -- Processing IE 8 (Cause) > Protocol Discriminator: Q.931 (8) len=5 > Call Ref: len= 2 (reference 7/0x7) (Originator) > Message type: RELEASE COMPLETE (90) gw3*CLI> gw3*CLI> | ||
Comments: | By: John Todd (jtodd) 2003-08-05 18:19:32 Additional notes: I have tested this with pri_cpe and pri_net with the same results. (this was due to other debugging I was doing) By: John Todd (jtodd) 2003-09-08 21:47:23 citats asked about this problem on IRC - don't remember if he had the same issues or not... By: John Todd (jtodd) 2003-09-29 03:32:35 Still pending. I suppose that deactivation of B channels wasn't tested in the latest round of certifications... :-) By: Brian West (bkw918) 2003-11-22 16:13:26.000-0600 jtodd are you still having this problem? By: John Todd (jtodd) 2003-12-01 20:53:13.000-0600 Yes, I'm stil having this problem. By: tlieberman (tlieberman) 2003-12-02 07:04:05.000-0600 Reminder sent to markster Mark, I've been having this problem for 3 months now. Could it be my providor? I have been working with John but I think this is my problem. Can I offer an incentive to help this issue get a priority bump? Regards, Todd. [digium.com ASTERISK-1897] Help w/Busy Out Connect to the asterisk console with it logging like so: asterisk -vvvr|tee /tmp/astlog Then do: pri debug span 1 When this problem happens again do the following command: tail -1000 /tmp/astlog > astlog2 Then send us the astlog2 file. This is strange that it happens at the same time on monday mornings. It looks like your telco is doing something weird but we can look at this new log file for you. Regards John By: John Todd (jtodd) 2004-01-09 21:21:27.000-0600 so, I'm informed that "some work" has been done on this bug, though there are no notes to that extent. I will attempt, in the next few weeks, to talk with my vendor and have them test bringing down several PRI B channels out of a group to see if the failure mode still exists. By: John Todd (jtodd) 2004-01-09 21:55:56.000-0600 Well, I got lucky and managed to find a helpful switch jockey who assisted me with testing this again. Same results, as expected. We turned off channels 1-8, but outbound calls still tried to go through channel 1. Despite "moving call..." messages, the call failed. See attached file. By: Brian West (bkw918) 2004-01-09 21:57:36.000-0600 Ok maybe this will help nail it down :) By: jrollyson (jrollyson) 2004-01-24 16:05:08.000-0600 2 sightings of problem, with solid debugging info, marking as a Confirmed bug ;) By: Brian West (bkw918) 2004-02-01 19:24:26.000-0600 jtodd can you contact kram so he can login and watch this as it happens. By: John Todd (jtodd) 2004-02-01 21:02:03.000-0600 Tom, maybe you can help with this, since having me in the middle will slow down the process. The problem is easy to duplicate, and you have control of a switch on which to test. Can you see if you can get Mark to log into your machine to show these symptoms? By: steveu (steveu) 2004-02-22 11:13:00.000-0600 The key problem seems to be this. In pri_fixup you will find the lines: if (pri->pvt[channel]->owner) { pri->pvt[channel]->owner->pvt->pvt = pri->pvt[channel]; pri->pvt[channel]->owner->fds[0] = pri->pvt[channel]->subs[SUB_REAL].zfd; } else There is something missing here. It needs to be more like this: if (pri->pvt[channel]->owner) { pri->pvt[channel]->owner->pvt->pvt = pri->pvt[channel]; pri->pvt[channel]->owner->fds[0] = pri->pvt[channel]->subs[SUB_REAL].zfd; pri->pvt[channel]->subs[SUB_REAL].owner = pri->pvt[x]->subs[SUB_REAL].owner; } else With this change I can reselect B-channels, and everything seems to work OK. I'm not sure if anything else should actually be updated in this loop, though. By: steveu (steveu) 2004-02-22 11:53:30.000-0600 Thinking about this some more I guess a more polished solution is: if (pri->pvt[channel]->owner) { pri->pvt[channel]->owner->pvt->pvt = pri->pvt[channel]; pri->pvt[channel]->owner->fds[0] = pri->pvt[channel]->subs[SUB_REAL].zfd; pri->pvt[channel]->subs[SUB_REAL].owner = pri->pvt[x]->subs[SUB_REAL].owner; pri->pvt[channel]->subs[SUB_CALLWAIT].owner = pri->pvt[x]->subs[SUB_CALLWAIT].owner; pri->pvt[channel]->subs[SUB_THREEWAY].owner = pri->pvt[x]->subs[SUB_THREEWAY].owner; pri->pvt[x]->subs[SUB_REAL].owner = NULL; pri->pvt[x]->subs[SUB_CALLWAIT].owner = NULL; pri->pvt[x]->subs[SUB_THREEWAY].owner = NULL; } else I think in reality what I posted before in adequate, but this seems the more complete solution By: Mark Spencer (markster) 2004-02-22 15:54:29.000-0600 The one line is the correct one since we check already to be sure it's not in use. By: Digium Subversion (svnbot) 2008-01-15 14:44:09.000-0600 Repository: asterisk Revision: 2213 U trunk/channels/chan_zap.c ------------------------------------------------------------------------ r2213 | markster | 2008-01-15 14:44:09 -0600 (Tue, 15 Jan 2008) | 2 lines Small but important fix for channel relocation (bug ASTERISK-50) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=2213 By: Digium Subversion (svnbot) 2008-01-15 14:44:10.000-0600 Repository: asterisk Revision: 2214 U branches/v1-0_stable/channels/chan_zap.c ------------------------------------------------------------------------ r2214 | markster | 2008-01-15 14:44:10 -0600 (Tue, 15 Jan 2008) | 2 lines Small but important pri_fixup addition (bug ASTERISK-50, thangs steveu!) ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=2214 |