[Home]

Summary:DAHLIN-00025: [patch] bad master selection
Reporter:Pavel Selivanov (biohumanoid)Labels:
Date Opened:2008-07-31 00:11:27Date Closed:2019-06-10 07:13:49
Priority:MajorRegression?No
Status:Closed/CompleteComponents:dahdi (the module)
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 0001-dahdi-wip-Fix-dynamic_loc-span-operation-with-DAHDI-.patch
( 1) 2013.01.17-dahdi-base.c.patch
( 2) dahdi_dummy.c_syncprio.patch
( 3) dahdi_dummy.c.patch
( 4) dahdi_dynamic.c_syncprio.patch
( 5) dahdi-base.c_syncprio.patch
( 6) dahdi-base.c.patch
( 7) dahdi-base.c.patch
( 8) zaptel-base.c.patch
( 9) ztdummy.c.patch
Description:Current dahdi master handling work in the following way:
The first SPAN become a master
If a new registered !!! SPAN is preferred , it is the new master
If a span is unregistered - the new master is the first available span
If master SPAN is in alarm state - the new master is the first non in alarm state SPAN

So
1. dahdi-dummy once loaded will be the master forever.
2. even not-running SPAN can be a master
3. master implementation is ugly...

****** ADDITIONAL INFORMATION ******

Implemented: correct(symmetric) master arbitrage

1. trying preffered master SPAN
2. trying non-zero channel SPANS
3. trying zero-channel SPANS (ztdummy,...)

thus:
1. dahdi-dynamic CAN be (and it works perfect) a master
2. at any time, we can tell/expect - who is the master
3. If (and only if) we have no master - dahdi-dummy will be our master
Comments:By: Pavel Selivanov (biohumanoid) 2008-07-31 00:12:47

The patched attached are for zaptel-1.4.11.
If you feel you want to apply them to dahdi, I'll make them for dahdi.

By: Kevin P. Fleming (kpfleming) 2008-07-31 11:23:07

We will not be adding any more features to Zaptel at this point, so yes, your patches must be made against dahdi-linux.

The patches look reasonably good, but you'll need to follow our coding guidelines and the existing style of the code. If you can upload patches that meet our coding guidelines, I'll get them merged. Thanks!

By: Pavel Selivanov (biohumanoid) 2008-09-23 00:42:48

I've uploaded a patches for DAHDI, please remove old one (zaptel-).
Briefly:
1. trying preffered master SPAN
2. trying non-zero channel SPANS
3. trying zero-channel SPANS (dahdi_dummy,...)

By: Pavel Selivanov (biohumanoid) 2008-10-15 00:37:21

Are this patches going to be tested/included in trunk ?
please, remove zaptel-base.c.patch and ztdummy.c.patch files.

By: Michael Haigh (madrouter) 2008-12-04 10:03:40.000-0600

These patches seem to only half fix the problem.  There's still no definitive order in which spans will be used as the master.  This is a problem if the prefmaster (which is a silly concept) goes down.  DAHDI will then simply use the first non-zero channel span as it's master which might be something you definitely don't want.

Please see attached patches (that I happened to be working on at the same time) for something that works better (IMHO).  With these patches applied the "timing" field in the span definitions in system.conf actually does what the documentation says.

If you have a span with timing set to 0 it will NEVER become the master (implies you can have a situation with no master - and hence no DAHDI ticks - but this is a misconfiguration rather than a bug).  If you have a span with timing set to 1 it will be master in preference to a span with timing of 2 etc.  If a span is in alarm it will not be considered when deciding who is master.

If the current master goes into alarm DAHDI simply picks the non-alarmed span with the next highest priority and makes it master.  Importantly - if a "better" master span comes OUT of alarm (or gets registered if it's a dynamic span) DAHDI will change to using the better span as master.

dahdi_dummy is hard configured with a sync priority of 100 (arbitrary value), so it can be loaded and will not become master unless ALL other spans configured as possible masters are in alarm.

By: Michael Haigh (madrouter) 2008-12-04 10:09:59.000-0600

Note that this patch also potentially affects bug 0013954.

By: Michael Haigh (madrouter) 2008-12-08 11:20:37.000-0600

Sorry guys - disclaimer just got approved so you might actually be able to look at the patches now :)

By: Pavel Selivanov (biohumanoid) 2008-12-25 01:16:26.000-0600

Yes, I aggree that prefmaster is a silly concept.
I aggree that dahdi-base (and my patch) ignore timing field (and that's sucks).

I've implemented a work-around with as small patch as possible to make it working. Not more.
That was done for 2 reasons:
1. I need a working dahdy_dynamic (in current release it's not usable).
2. Even this small patch not included/tested.
So I'll have to make and maintain my own bundle (as I made with zaptel).

But I really hope, It will be fixed :-) sometime...

By: Tzafrir Cohen (tzafrir) 2009-02-08 06:40:56.000-0600

Suppose we want to have a single select_sync_master() function[1]. How could DAHDI choose better?

Here is the general order I'd like:

1. Digital spans synched from an external source
 * Order among them: choose manually?
 * PRefer PRI to BRI?
2. Other spans from a hardware device (that provides clock)
 * Order among them: manual.
3. dahdi_dummy
4. Spans that provide no timing at all.

Cases of spans that provide no timing:
* Dynamic (dahdi_dynamic) spans that are timing providers (to the remote side).
* Astribank (xpp) span in a NOTOPEN alarm.

At the moment DAHDI has no way of getting manual preference order from the user. The "timing" parameter of the span line seems useful, but has a few problems:

* It already overloads two separate selections:
 1. Is the timing on this span set from here or from the remote side?
 2. What is the priority among different spans in the current device?
For instance, I'd like to give a dynamic span that is a timing provider a different priority than such span from a card. But both get timing=0.

* DAHDI does not know its value. THe real value is never saved anywhere. We could save the value after a successful call to the span_config method of the span. But we can't be sure if the span didn't change it (see wct4xxp).


[1] 'master' on its own has other meanings in the code, which is why I prefer 'sync_master'

By: Pavel Selivanov (biohumanoid) 2009-02-10 00:06:40.000-0600

I'm happy you've asked it :-)

As far as I know, in most drivers syncsrc is used as a span priority in an adapter. The lowest non-alarm span is used as a sync source for the whole adapter.
So, "timing" field can be(and should be) used for dahdi arbitrage.

I'd like this simplified order:
1. a non-alarm OPEN SPAN with a lower priority (except DUMMY)
2. DUMMY span
dahdi_dynamic should use syncsrc field (removing checkmaster function, ...).
Now it's possible as long as dahdi_span have sync_tick callback.
If a SPAN have syncsrc !=0, and dahdi's master is other SPAN, then driver must use (if can) sync_tick callback to provide timing for a SPAN.

A quick example:
#board 1
span=1,1,0,ccs,hdb3,crc4
span=2,2,0,ccs,hdb3,crc4
#board 2
span=3,3,0,ccs,hdb3,crc4
dynamic=eth,eth0/00:02:b3:35:43:9c,24,4

All SPAN's have good line.
SPAN 1 is a master.
SPAN 2 using timing from line 1.
SPAN 3 isn't a master for dahdi, so, using sync_tick.
SPAN 4 using sync_tick.

SPAN 1 & SPAN 2 changed to RED ALARM.
dahdi changing it's master to SPAN 3.
SPAN 3 switching to use timing from line.
SPAN 4 using sync_tick.

and so on.
If we want SPAN 3 & SPAN 4 to be timing providers - we can change it's priority to zero.

I think it's a good to give users a choice.
DAHDI should't not differ SPAN, dSPAN, PRI, BRI,...

By: Shaun Ruffell (sruffell) 2009-03-05 14:03:17.000-0600

biohumanoid, I'm confused by the sync_tick callback idea that you have here.   What do you envision would happen in the driver when it's sync_tick callback is called?



By: Michael Haigh (madrouter) 2009-03-06 03:37:17.000-0600

I think the only driver that actually supports the sync_tick callback at the moment is the XPP driver.  Essentially it provides a mechanism to synchronise a device from the DAHDI clock.

For example - I have a live deployment (using my version of the patches, plus biohumanoid's patches from bug 0012406) where I have a box with Sangoma spans physically connected to the carrier.  It also has a number of dahdi_dynamic_eth spans which connect to a variety of remote boxes.  The remote boxes all have Xorcom XR0056s hanging off them which then attach to local PABXs.

Clock configuration is that on the box connected to the carrier the carrier provides clock to DAHDI (which then naturally clocks the dynamic spans).  On the remote boxes I have the dynamic spans configured to provide clock to DAHDI and then have /proc/xpp/sync set to DAHDI which tells the Xorcom to get it's clock from DAHDI.  The remote PABXs are configured to take clock from the Xorcom.

All that was done to make faxes work (and they do work perfectly) without having to muck around with T.38 gateways and other such horrible things.

By: Pavel Selivanov (biohumanoid) 2009-03-06 04:06:16.000-0600

A brief history:
There was no sync_tick callback at all.

Once, someone made dynamic SPAN to inter-connect 2 PC's (zaptels).
But one of 2 dynamic SPAN's (2 PC's) MUST send the first packet to initiate a responce. Moreover, one of 2 PC's should be a master (provide timing).
So, he made:
if (span == master) {
/* If we have dynamic stuff, call the ioctl with 0,0 parameters
                  make it run */
               if (dahdi_dynamic_ioctl)
                       dahdi_dynamic_ioctl(0,0);

By: Pavel Selivanov (biohumanoid) 2009-03-06 04:19:27.000-0600

A brief history (an addition to madrouter):
There was no sync_tick callback at all.

Once, someone made dynamic SPAN to inter-connect 2 PC's (zaptels).
But one of 2 dynamic SPAN's (2 PC's) MUST send the first packet to initiate a talk. Moreover, one of 2 PC's should be a master (provide timing).
So, he made an ugly hack:
if (span == master) {
 /* If we have dynamic stuff, call the ioctl with 0,0 parameters
 make it run */
 if (dahdi_dynamic_ioctl)
   dahdi_dynamic_ioctl(0,0);
}

DAHDI(Xorcom) have added better code:
               for (x=0;x<maxspans;x++) {
                       struct dahdi_span       *s = spans[x];
                       
                       if (s && s->sync_tick)
                               s->sync_tick(s, s == master);
               }

So:
1. No more need to dahdi_dynamic_ioctl(0,0)
2. dahdi_dynamic can now work as an ordinary SPAN.
No need to have own "timing" parameter.
arbitrage can now be removed in dahdi_dynamic, but it MUST be implemented in dahdi-base.c...

SLIP/SKIP is bad for Fax, dacs=, CCS signaling...

By: Shaun Ruffell (sruffell) 2009-03-06 13:25:16.000-0600

madrouter, biohumanoid: Thanks for that.  I haven't yet done anything with, or thought about, dynamic spans so I do see how the sync_tick can be used there.  I still need to read through the xpp drivers to see how it makes use of the sync_tick.

The drivers that I work on can not use any host software based mechanism to control the rate at which samples are shifted out on the interface.  So, for me, the master span was only used to determine when conferencing took place, etc..

I'm looking at this issue because I'm thinking about moving the functionality of dahdi_dummy directly into dahdi-base.c, and then allow dahdi to to determine if it should use internal timing or timing from an attached span like is being discussed here.  That would be one less thing for new users to understand and hopefully reduce support related to questions about timing.



By: Shaun Ruffell (sruffell) 2009-03-06 13:48:09.000-0600

I'm also wondering out loud, with the cards that do not have such stringent requirements for interrupt to isr latency, whether the sync_tick method could be used to allow a board to mask off it's interrupt and be polled along with the other hardware.  I know there have been some discussion about this subject that predates my involvement with the project.

But, for example, if a wcte12xp or wctdm24xxp supported card is not the master span, it could implement the sync_tick function and then disable interrupts on it's interface and then poll in the sync_tick function.

By: Michael Haigh (madrouter) 2009-03-06 15:52:00.000-0600

I believe the way the XPP driver works is to have an adjustable oscillator in the hardware.  The software keeps track of when the DAHDI clock ticks (via the sync_tick callback) and when the hardware ticks and compares the rate of the two.  If the hardware is ticking faster than DAHDI it commands the hardware to slow down and if it's ticking faster it commands the hardware to slow down.

The problem with just masking off the interrupt for cards that aren't master is that almost certainly their local oscillators will be running at a different speed to the DAHDI clock, so you're going to get slips/skips.  If the current hardware doesn't have an adjustable oscillator then there's no way of really dealing with the problem.

With regard to putting dahdi_dummy into dahdi-base, be careful that you don't screw things up for people who know what they're doing for the sake of newbies who can't RTFM.  An example of where letting dahdi figure things out "automatically" is bad would be a test lab like the one I have set up.  Two machines are configured identically with only dahdi_dynamic_eth spans.  However - one of the machines needs to actually provide a real clock, so I load dahdi_dummy and clock the first machine from that.  If dahdi tries to be clever and figure things out the behaviour would be undefined.

By: Michael Haigh (madrouter) 2009-03-06 16:03:35.000-0600

Another thought about solving the support issue.  Why not let Asterisk ask for the dahdi_dynamic module to load?  If Asterisk starts and figures out that there's no dahdi spans at all (can't open /dev/dahdi/pseudo??) then it tries to load dahdi_dummy and then tries to open /dev/dahdi/pseudo again.  If there's still no /dev/dahdi/pseudo (maybe the user didn't install dahdi) then it throws warnings about not having a timing source.

By: Shaun Ruffell (sruffell) 2009-03-06 16:10:48.000-0600

Just so I understand, how would an automatic timing source in dahdi-base cause problems with your test environment?  If on the machine that is receiving timing from the dynamic span suffers a loss of network connection, you would not want dahdi to be able to generate it's own timing internally in order to continue to process any meetme conferences that may be running on pseudo channels?

As long as that dynamic span doesn't try and start to provide timing to the upstream device, what would break?  And if a dynamic_span is configured to take timing from the network, it doesn't seem like it should ever then switch to provide timing (just like ISDN spans, etc..)

By: Shaun Ruffell (sruffell) 2009-03-06 16:33:07.000-0600

Why might it be better to have asterisk load dahdi_dummy?  If the logic is such that dahdi should have an optional timing source at all times, then it doesn't seem like getting asterisk involved gets us anything, other than a little bit less code loaded up that may be unused much of the time.

Also, one of the cases that kpfleming brought up is this:  Someone has a single span connected to a provider and are conferencing a group of sip channels together in meetme.  If the span goes down, the conference will become intelligible since there is not timing source to use to conference the channels together even though none of the participants are using the span in this case.  If asterisk is responsible for loading dahdi_dummy, then the window of time to switch over would be greater and it involves more components in the process of getting the alternate timing up or supporting kernels in the future that do not want to use loadable modules but instead want everything compiled in by default.



By: Michael Haigh (madrouter) 2009-03-07 02:26:48.000-0600

Having thought about it a bit deeper you're right - probably nothing would break.  On the machine that I want to be the master clock source under normal running the dynamic spans would be configured not to be a timing source so it would use dahdi_dummy.  On the machine that is meant to be receiving clock from the dynamic spans, the dynamic spans would be configured with non-zero timing priority so if they weren't in alarm it would use those but if it lost the network it would switch to it's local dahdi_dummy.

Ok - I obviously didn't think about it properly before thinking it could break something :(

By: Shaun Ruffell (sruffell) 2009-03-09 09:41:05

tzafrir:  Is there any reason that you can think of to not to use the simplified order that biohumanoid proposes for determining which span is going to be the master?

By: Kaloyan Kovachev (knk) 2009-03-09 10:01:11

I am testing the patch provided from madrouter and it works fine (after patching the wct4xxp driver too). I have actually changed it a bit:

      /* Good a time as any to check our clock config and
          to switch to other master if current master in alarm */
      master = NULL;
      for (x=1; x<maxspans; x++) {
              if (spans[x] && !spans[x]->alarms) {
                      if (!master || (spans[x]->syncprio > 0 && (spans[x]->syncprio < master->syncprio))) {
                              if (spans[x]->syncprio > 0) {
                                      module_printk(KERN_NOTICE, "Span ('%s') is new master with syncprio %d which is better than %d\n", spans[x]->name, spans[x]->syncprio, (master) ? master->syncprio : 0);
                              }
                              master = spans[x];
                      }
              }
      }

so the first span will become a master if none of them have priority set and dummy is not loaded. And in case it is loaded it will be used as last resort (because of the hardcoded prio of 100), so it is perfectly safe to leave it loaded.
About the order preference for PRI/BRI/analog etc. may be some weight/offset hardcoded in the driver will do the trick in addition to the config file clock value

By: Tzafrir Cohen (tzafrir) 2009-03-09 10:28:49

I'm not sure I follow. Simply use the "timing" value in the span line?

* What about analog cards that have no "timing" defined?
* How do you deliver the information that certain spans should have a priority that is lower than a "local device" (that is: they can provide no timing whatsoever)?

By: Michael Haigh (madrouter) 2009-03-09 10:51:10

Using the timing value from the span line was exactly what I had in mind when I wrote the patch.  This gives the behaviour as documented in the sample system.conf file.  If you want a span never to be a clock source just set it's timing value to 0.

Not sure I see the need for any weighting of PRI/BRI/Analog cards as sync source.  If a user has more than one card, chances are they have some idea what they're doing (as opposed to someone who's just experimenting with a couple of SIP phones and maybe an analog trunk or two) and so should be setting their clocking.

The other thing is - if they've got two different cards in the system they're already in trouble unless they're using Xorcoms because unless you've got some ugly solution like a sync cable there's no way the two cards are going to be in sync.  I don't think any of the other hardware manufacturers have any way of letting DAHDI clock the card rather than the other way around.

By: Kaloyan Kovachev (knk) 2009-03-09 10:53:55

yes <timing source> value in system.conf ... for wct4xpp i have added "span->syncprio = lc->sync;" in t4_spanconfig only

For analog cards you have that set to 0 and they will not be master if there are other cards or dummy and in case you do not have any of them you will still have no timing.

With weight 0 for PRI, weight 1 for BRI and weight 5 for eth: if they are all (+ dummy) present, up and set to timing=1 you get 1 for PRI, 11 for BRI and 51 for eth - that's the order we want and if none of them is up dummy comes to play with prio 100

By: Michael Haigh (madrouter) 2009-03-09 10:59:32

I really don't like this whole weighting thing.  It makes assumptions about how the user's network is built that may not be valid.  Going back to some live deployments I have - I have both dynamic ethernet spans and PRI spans.  I WANT the dynamic spans to be the master and to have the PRIs clocked by Dahdi (and have had to replace all my Digium/Sangoma cards with Xorcom kit as a result).  With the weighting I would have to artificially make my PRI spans less desirable as a clock source, which would result in a VERY confusing system.conf!!

By: Pavel Selivanov (biohumanoid) 2009-03-11 01:50:50

> about moving the functionality of dahdi_dummy directly into dahdi-base.c
Please, don't do it.
Reasons:
1. You can load dahdy_dummy in /etc/init.d/dahdi
2. You can load dummy from dahdi.ko with "request_module"
3. You can load dummy with udev rules
Don't reinvent the wheel.

Some systems don't need dummy at all.
Quick example: blackfin based devices (DSP) with limited MIPS...

By: Pavel Selivanov (biohumanoid) 2009-03-11 02:14:39

tzafrir> * What about analog cards that have no "timing" defined?
They do have an internal oscillator, so they can be used as a timing source.

tzafrir>* How do you deliver the information that certain spans should have a priority that is lower than a "local device" (that is: they can provide no timing whatsoever)?
Any device driver is interrupt driven. Interrupts are produced using internal oscillator (Analog) or from external oscillator{from line} (PRI/BRI/...).

KNR>I am testing the patch provided from madrouter and it works fine
Yes, syncprio must be used for choosing a master for DAHDI.
But you have to fix all DAHDI-based device driver (because only god knows how they are using syncprio :-) )

KNK> With weight 0 for PRI, weight 1 for BRI and weight 5 for eth
wow, wow....
1. You should do priority suffix in asterisk gui, freepbx, any script.
2. The users are not always dump. They want flexibility, but you are going to fix it in the kernel...

madrouter>I don't think any of the other hardware manufacturers have any way of letting DAHDI clock the card rather than the other way around.
Our hardware do (one of them http://www.parabel-labs.com/products/asteroid/ ).
Our hardware can tune an internal oscillator using timing from dahdi. So no slips/skips...

That's why I'm interrested to have correct master selection.
DAHDI can not even tell the user, that there are slips/skips !!! No slips/skips counters exported at all !!!

By: Kaloyan Kovachev (knk) 2009-03-11 03:01:36

Well i don't think weights are necessary (even the opposite - better not to have them) and simply making the code do what the docs (comments inside system.conf) say is the right think.

biohumanoid: Yes all dahdi drivers should be updated to set syncprio, but without this there is no guarantee which span will be master.
For example one may have two PRI boards - to PSTN and to PBX, but if the one to PBX is registered first it will be chosen as master, while you must sync to the PSTN in this case. The same will happen if PSTN is just on the second port of the same card

By: Shaun Ruffell (sruffell) 2009-03-11 10:56:11

biohumanoid: I start with the assumption that we would like dahdi to always be able to provide a source of timing for any conferences that are in progress in the kernel, even if spans go into alarm.  If this is not the case, then I guess the discussion is moot.

biohumanoid> 1. You can load dahdy_dummy in /etc/init.d/dahdi

What is the best way for /etc/init.d/dahdi to determine when it should load dahdi_dummy?  Currently the /proc/dahdi/ directory is checked for a present span, but this can cause problems on reload if a user has a reference open to /proc/dahdi (like it's their current directory) which would prevent new files in that directory from showing up on reload.  Then dahdi_dummy is reloaded anyway, registers a span that messes with the span configuration.  Also doesn't allow for dahdi_dummy to be loaded dynamically in response to alarm conditions.

biohumanoid> 2. You can load dummy from dahdi.ko with "request_module"

But if the module is loaded when the span goes into alarm, it seems like it might lead to some tricky code to either determine when the system is being started up (when their are often alarms as the framers are being configured), or that dahdi_dummy.ko would just ended up being loaded by request_module most of the time.

biohumanoid> 3. You can load dummy with udev rules

How could one use udev to load dahdi_dummy dynmically in response to an alarm, unless there is a sysfs attribute used on spans to indicate when it goes into alarm that could be monitored by udev?

If the functionality of dahdi_dummy is moved into the dahdi-base.c, it wouldn't need to "tick" or consume CPU cycles unless it is actually the source of timing, but it increase slightly the size of the dahdi.ko file....but not by much.



By: Shaun Ruffell (sruffell) 2009-03-11 11:06:00

biohumanoid, also, analog cards can be, and are a source of timing, but I think the issue with fitting them into this model is that they don't have timing configured relative to the other spans in the configuration files currently without adding new parameters.

Using your system and the current configs, how would you configure dahdi to prefer an analog card over the internal oscillator on a span that is configured to provide timing to its far side?

By: Tzafrir Cohen (tzafrir) 2009-03-11 11:17:56

One small thing to consider: dahdi_dummy today creates a span. Simply adding it as a built-in dummy span would probably break existing configurations.

By: Shaun Ruffell (sruffell) 2009-03-11 11:23:07

tzafrir, that is a good point. My thought was  that the "dummy span" would be special and have all user-visibility removed.  It just becomes an implementation detail of a dahdi that knows how to always find a timing source.



By: Kaloyan Kovachev (knk) 2009-03-11 11:30:33

back to weights sorry ... but instead of weight better name is hwprio ... in case no preferred sync master is found, which is not in alarm and is not dummy, re-walk the non alarm spans for built-in hwprio.

and quick and dirty one ... sync-clock-order line at the end of dahdi/system.conf with coma separated list of span numbers configured previously, which stores the info in dedicated list inside dahdi-base which is walked in order for non alarm state and starts it's own 'ticks' only if no master was found and stops them before selecting a new master the next time

By: Shaun Ruffell (sruffell) 2010-11-06 15:58:18

Just making a note that I find myself re-reading this discussion.  Yesterday I realized that DAHDI_CONFIG_CORE_TIMER doesn't work with "spans that provide no timing at all" from tzafrir's comment [ https://issues.asterisk.org/view.php?id=13205#99672 ].

I was trying to demonstrate to someone how to use dynamic_loc spans to simulate a PRI link on a local machine and it wasn't working since one of the dynamic spans became the master.  The call chain was 'coretimer_func()' ->  'process_masterspan()' -> 'dahdi_dynamic_ioctl(0,0)' -> '__ztdynamic_run()' -> 'dahdi_receive()' -> 'process_masterspan()' -> XXX

So if DAHDI should be able to do masterspan processing out of the context of a span, I think the best way is to (finally) close out this issue and among other things implement a way to mark a span as never providing timing.

If anyone has any new thoughts please add them....  otherwise I'll update some of these patches, post some candidates.



By: Shaun Ruffell (sruffell) 2010-11-06 16:19:22

For the casually interested, 0001-dahdi-wip-Fix-dynamic_loc-span-operation-with-DAHDI-.patch is how I temporarily prevented recursive calls to process_masterspan, but it would be moot if the local spans never became the master span to begin with.

By: Digium Subversion (svnbot) 2011-01-03 12:25:47.000-0600

Repository: dahdi
Revision: 9576

U   linux/trunk/drivers/dahdi/dahdi-base.c
U   linux/trunk/drivers/dahdi/dahdi_dynamic.c

------------------------------------------------------------------------
r9576 | sruffell | 2011-01-03 12:25:46 -0600 (Mon, 03 Jan 2011) | 16 lines

dahdi_dynamic: Use dahdi_span_ops.sync_tick for driving dynamic spans.

Dynamic spans that are unable to provide their own timing, like dahdi
local spans, typically derived their timing source from
dahdi_dynamic_ioctl(0,0) call in process_masterspan.

This change uses the sync_tick member of dahdi_span_ops instead so that
dynamic operations do not happen on a span until it is fully registered.
Also removes the check for dahdi_dynamic_ioctl in process masterspan for
those users that never load a dynamic span.

This was originally suggested in a comment on:
(issue DAHLIN-25)

Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Acked-by: Kinsey Moore <kmoore@digium.com>
------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=9576

By: Digium Subversion (svnbot) 2011-01-03 12:26:10.000-0600

Repository: dahdi
Revision: 9581

U   linux/trunk/drivers/dahdi/dahdi-base.c
U   linux/trunk/drivers/dahdi/dahdi_dynamic_loc.c
U   linux/trunk/include/dahdi/kernel.h

------------------------------------------------------------------------
r9581 | sruffell | 2011-01-03 12:26:09 -0600 (Mon, 03 Jan 2011) | 18 lines

dahdi: Allow spans to advertise if they can provide any timing.

Some spans, specifically dynamic local spans, should never be the timing
master since they are dependent on some other timing source driving
them.

The bit in 'struct dahdi_span' is named cannot provide timing so that by
default the other drivers will set it to 0.

This is loosely related to issue DAHLIN-25 but doesn't address any of the
other elements of that issue about how to allow the user to configure
what the master span order of succession is.

(issue DAHLIN-25)
Reported by: biohumanoid

Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Acked-by: Kinsey Moore <kmoore@digium.com>
------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=9581

By: Digium Subversion (svnbot) 2011-02-02 13:28:42.000-0600

Repository: dahdi
Revision: 9729

U   linux/trunk/drivers/dahdi/dahdi-base.c

------------------------------------------------------------------------
r9729 | sruffell | 2011-02-02 13:28:41 -0600 (Wed, 02 Feb 2011) | 16 lines

dahdi: Analog spans do not have to be marked RUNNING to become master.

Fixes a regression introduced in r9611. Analog spans would never become
the master since the SPANSTART ioctl is not typically called on them,
and therefore they were never marked RUNNING.  The result could be audio
problems.

I'm marking this as related to issue 13205 since it's generally all
related to how should the drivers select which span is the master.  I
also mark as related to issue 16165 because the problems experienced are
a result of the same fundamental issue.

(issue DAHLIN-25)(issue DAHLIN-159)

Reported-and-Tested-by: Tzafrir Cohen <tzafrir.cohen@xorcom.com>
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=9729

By: Pavel Selivanov (biohumanoid) 2011-02-15 22:59:19.000-0600

Sorry gentlemen I've left this discussion for a time.

Glad to see you've started patching master selection code.
Even this simple patch will make life easier, and will be a good starting point.

I have some things to ask/note about (__dahdi_find_master_span):
1. Why don't you check span->alarms for alarms ?
Not sure which condition is better,
!span->alarms
or
!(span->alarms & (DAHDI_ALARM_RED | DAHDI_ALARM_BLUE | DAHDI_ALARM_LOOPBACK))

2.Why do you specially care about analog cards ?
The do have an oscillator, and it's customers choice to use it or not to use it as a timing source for DAHDI.

3. Why don't you try to avoid dahdi_dummy as a sync master ?
!span->channels
You still have dahdi_dummy module, and it's good (I'll write a bit later)

4. can_provide_timing – what for ?
dahdy_dynamic_loc &= ~DAHDI_FLAG_RUNNING
and it will not be the master...

5. pref_master not used

I'll tell you my vision on synchronization in a few next posts, based on my experience/troubles.
I don't insist I'm right, no way.



By: Pavel Selivanov (biohumanoid) 2011-02-15 22:59:32.000-0600

What is the best master for TDM systems ?
SPAN(analog/digital), DynamicSPAN(analog/digital), PC timer ?
It highly depends on customers configuration, but it should not be hardcoded no way.
The best master is the expected master (by the custemer), because only he/she know, what he/she want.

Samples:
2 DynamicSPAN devices. 1-st provide timing for DAHDI, DAHDI provide timing for 2-nd device.
Device #2 will tune it's oscillator for the incoming ethernet packets.
That's the way my devices can work, XORCOM devices can work, someone else can work.

1 SPAN device, 1 DynamicSPAN device.
SPAN device provide timing for DAHDI, DAHDI provide timing for Dynamic – DynamicSPAN device.

dahdi_dummy, DynamicSPAN.
Dummy provide timing (hrtimer) for DAHDI, DAHDI provide timing for Dynamic – DynamicSPAN device.

It's a samples from a real life, that's why I have to maintain my patched release by release...

By: Pavel Selivanov (biohumanoid) 2011-02-15 22:59:48.000-0600

Patching DAHDI: STEP1
1.Using SPAN/DPSAN
2.Using Dummy SPAN

Good SPAN:
1.no alarm
2.running
We will have some limits, but we can live with it.
If we want adapter2:span1 to be the master – we have to load it's module first.
It's not perfect, but it's a way (right now, in 2.4.0 you don't have even this....)

Patching DAHDI: STEP2
1.dahdi-base.c: saving dahdi_lineconfig->sync to span->sync
the same for dahdi_dynamic
2.taking into account span->sync priority

Thus, user can choose, what will be the master as flexible, as possible.
I't really major In the world of TDM...

I've looked, how  dahdi_lineconfig->sync is used in current drivers – it should not be a problem for them (they don't care the absolute values, they are searching for the lower).

Sample:
2 cards * 2 SPANS.
span=1,1,0,ccs,hdb3,crc4
span=2,0,0,ccs,hdb3,crc4
span=3,2,0,ccs,hdb3,crc4
span=4,0,0,ccs,hdb3,crc4
span 1 will be the master.
If SPAN1 will fail – SPAN2 will be the master.

By: Pavel Selivanov (biohumanoid) 2011-02-15 23:06:06.000-0600

If you feel my suggestions is good - let me know, I'll produce a patch and will test it on a hardware before posting.
I't prefer to implement STEP1, then STEP2.

STEP2 will also require to patch dahdi_dynamic (remove checkmaster function, as long as it's going to be implemented in dahdi-base in a good way).
Then there will be only one checkmaster in the whole DAHDI driver, as it must be in TDM system.

I don't want to make produce patches if you don't like the idea :-)

By: Shaun Ruffell (sruffell) 2011-02-16 02:07:39.000-0600

biohumanoid: I'm happy that you're commenting on this issue again!

You bring up some good points, but instead of addressing them (specifically about the details in the code) perhaps we should just focus only on flow charting how the master span and timing sync for the different potential clock domains should be selected?  There's the global master, the sync source on multiport cards, and cards that can potentially share timing via cables.  Am I missing any others?

Did you happen to see this recent thread: http://thread.gmane.org/gmane.comp.telephony.pbx.asterisk.devel/45223/focus=45233

And I hate to say this, but given all the different thoughts, perhaps this should be taken to the asterisk-dev list in general so we can reply inline to the different threads?



By: Pavel Selivanov (biohumanoid) 2011-02-16 05:16:21.000-0600

sruffell: You bring up some good points, but instead of addressing them (specifically about the details in the code) perhaps we should just focus only on flow charting how the master span and timing sync for the different potential clock domains should be selected?

reasonable.

sruffell: There's the global master, the sync source on multiport cards, and cards that can potentially share timing via cables. Am I missing any others?

Even more simple:
1. Global master - Master for the whole DAHDI, and, should be the master for all cards/spans.
2. SPANS which can provide timing.
One of this SPAN's must became a sync "Global master".
3. SPANS which can't provide timing.

I think, we shouldn't differ cards with a timing cable, cards without timing cable, dynamic span devices.
Any 2 cards can have "external timing cable" - we can connect card1:line4 with card2:line1, and
span=1,1,0, span=2,0,0, span=3,0,0, span=4,0,0
span=5,2,0, span=6,0,0, span=7,0,0, span=8,0,0
So, SPAN1 will receive sync from TELCO, SPAN5 will receive sync from SPAN4(from TELCO).
Card1 & card2 will be in sync, but DAHDI will know nothing about it...
But the user will.

Lets examine sync source on multiport cards.
span=1,1,0, span=2,0,0, span=3,0,0, span=4,0,0 #card1
span=5,2,0, span=6,0,0, span=7,0,0, span=8,0,0 #card2
card1 will use port1 (SPAN1) a sync source for the whole card. It's good.
card2 will use port1 (SPAN5) a sync source for the while card.
SPAN1 will be the "Global Master", and all cards will be in sync with it.

By: Pavel Selivanov (biohumanoid) 2011-02-16 05:22:30.000-0600

sruffell: And I hate to say this, but given all the different thoughts, perhaps this should be taken to the asterisk-dev list in general so we can reply inline to the different threads?

We can summarize and post it, but I don't like mailing lists.
A lot's of SPAM and a lot's of mails, but I will do it if it will help.

http://www.asterisk.org/support/mailing-lists
There are no dahdi mailing list, but asterisk-dev is not for DAHDI.
Isn't it ?

By: Tzafrir Cohen (tzafrir) 2011-02-16 05:51:25.000-0600

The asterisk-dev mailing list is also the forum for the development of DAHDI. Looking forward for your post there.

By: Pavel Selivanov (biohumanoid) 2011-03-10 00:07:49.000-0600

I've posted my proposal to asterisk mailing list
http://lists.digium.com/pipermail/asterisk-dev/2011-February/048102.html
Seems no one interested it...



By: Shaun Ruffell (sruffell) 2011-03-10 10:48:46.000-0600

Sorry about the delay, but both myself and others that I've spoken with are quite interested. I'll reply on the mailing list today.

By: Pavel Selivanov (biohumanoid) 2013-01-16 06:10:42.857-0600

There are still some troubles with locating true MASTER in DAHDI 2.6.1.
1. ::master not initialized (::master = NULL).
2. even span with alarm can become a master, if old_master == NULL.
3. Nobody call __dahdi_find_master_span in _dahdi_unassign_span.
_dahdi_unassign_span have own find_master loop. What for ?
4. dynamic spans still can not become a master span.
WHY ?
5. dummy span can not become a master span (it's good to have a backup span with good 1ms ticks, provided by HPET).
WHY ?

Provided patches can be used to fix this annoing bugs.
It's done to have logic, compatible with current DAHDI logic (it will not break current installations).
To make it clear, I've wrote a number of possible cases (possible installations).

Terms:
dummy - dummy SPAN. SPAN, provided by dahdi_dummy. Have 0 channels.
card0.0 - fixed SPAN, card #0, port 0, span=...
dynamic0 - dynamic SPAN, dynamic=...
master - master SPAN.

== case 0
dummy
card0.0 - clock source (sync from E1)
card0.1 - backup clock source  (sync from the same E1)
dynamic0 - clock slave (sync from sync_tick, so, from E1)
dynamic1 - clock slave (sync from sync_tick, so, from E1)

- card0.0 = OK
 master = card0.0
- card0.0 = RED
 master = card0.1
- card0.0 = card0.1 = RED
 master = dummy (slave dynamic spans have cannot_provide_timing=1).

== case 1
card0.0 - clock source (sync from E1)
card0.1 - backup clock source  (sync from the same E1)
dynamic.0 - clock slave (sync from sync_tick, so, from E1)
dynamic.1 - clock slave (sync from sync_tick, so, from E1)

- card0.0 = OK
 master = card0.0
- card0.0 = RED
 master = card0.1
- card0.0 = card0.1 = RED
 master = NULL (slave dynamic spans have cannot_provide_timing=1).
 _process_masterspan will be called by the timer 4 times every 4ms (true for x86. actually HZ/250)...



== case 2
dummy
dynamic0 - clock source (sync from E1)
dynamic1 - backup clock source (sync from the same E1)

- dynamic0 = OK
 master = dynamic0 (master dynamic spans have cannot_provide_timing=0).
- dynamic0 = RED
 master = dynamic1 (slave dynamic spans have cannot_provide_timing=1).
- dynamic0 = dynamic1 = RED
 master = dummy

== case 3
dynamic0 - clock source (sync from E1)
dynamic1 - backup clock source (sync from the same E1)

- dynamic0 = OK
 master = dynamic0 (master dynamic spans have cannot_provide_timing=0).
- dynamic0 = RED
 master = dynamic1 (slave dynamic spans have cannot_provide_timing=1).
- dynamic0 = dynamic1 = RED
 master = NULL
 _process_masterspan will be called by the timer 4 times every 4ms (true for x86. actually HZ/250)...



== case 4
dummy
dynamic0 - clock source (sync from E1)
dynamic1 - clock slave (sync from dynamic master SPAN)

- dynamic0 = OK
 master = dynamic0 (master dynamic spans have cannot_provide_timing=0).
- dynamic0 = RED
 master = dummy (slave dynamic spans have cannot_provide_timing=1).

== case 5
dynamic0 - clock source (sync from E1)
dynamic1 - clock slave (sync from dynamic master SPAN)

- dynamic0 = OK
 master = dynamic0 (master dynamic spans have cannot_provide_timing=0).
- dynamic0 = RED
 master = NULL (slave dynamic spans have cannot_provide_timing=1).
 _process_masterspan will be called by the timer 4 times every 4ms (true for x86. actually HZ/250)...



= case 6
dummy
dynamic0 - clock slave (sync from DAHDI, via sync_tick)
dynamic1 - clock slave (sync from DAHDI, via sync_tick)

 master = dummy

= case 7
dynamic0 - clock slave
dynamic1 - clock slave

 master = NULL (slave dynamic spans have cannot_provide_timing=1).
 _process_masterspan will be called by the timer 4 times every 4ms (true for x86. actually HZ/250)...


= case 8
card0.0 - clock source (E1 slave)
card1.0 - clock slave (card 1 should take sync tick from DAHDI, so, from card0).

- card0.0 = OK
 master = card0.0
- card0.0 = RED
 master = card1.0 (if card will not set cannot_provide_timing)
 master = NULL (if card will not set cannot_provide_timing).
 So, card1 driver must pass sync_tick(s, 1) to hardware, but ignore sync_tick(s, 0), to prevent self-tune itself.


By: Pavel Selivanov (biohumanoid) 2013-01-16 06:12:46.753-0600

Patch file provided https://issues.asterisk.org/jira/secure/attachment/45926/dahdi-base.c.patch .
Please, let me know if have any questions.


By: Pavel Selivanov (biohumanoid) 2013-01-18 04:24:50.376-0600

patch over current git trunk.
2013.01.17-dahdi-base.c.patch


By: Shaun Ruffell (sruffell) 2013-01-19 16:18:49.182-0600

{quote}
There are still some troubles with locating true MASTER in DAHDI 2.6.1.
1. ::master not initialized (::master = NULL).
{quote}

This is not a problem because it's declared static.  Static global variables are by definition initialized to 0 or NULL.

{quote}
2. even span with alarm can become a master, if old_master == NULL.
{quote}

This is again not a problem with core timer. If there is only one span, we want that span to become the "master". If the span in alarm is not providing timing, the coretimer will kick in and tick the spans.

{quote}
3. Nobody call __dahdi_find_master_span in _dahdi_unassign_span.
_dahdi_unassign_span have own find_master loop. What for ?
{quote}

I agree with you here. It would be better to have unassign span call something like __dahdi_find_master_span.  Would need some tweaking though because it probably is the way it is to eliminate the window where master would unnecessarily be NULL when the master is unassigned. So probably need to change __dahdi_find_master_span() to require chan_lock to be held when call and not take that lock internally.

{quote}
4. dynamic spans still can not become a master span.
WHY ?
{quote}

Why do you say that? dynamic local spans are the only one that cannot be the master span, since those are the only spans that set cannot_provide_timing in struct dahdi_span. Local spans cannot become the master since they have no way to "tick" on their own unlike all other spans. Granted, I've not set this up in awhile but I'm surprised that you say this.

{quote}
5. dummy span can not become a master span (it's good to have a backup span with good 1ms ticks, provided by HPET).
WHY ?
{quote}

Because the dummy_span is a concept that is expired. If dynamic_eth spans need a 1ms timer running, it should set one up itself instead of relying on the sync_tick from the core_timer.  For the vast majority of DAHDI users, they don't need a 1ms tick to mix audio that typically comes in 20ms chunks. It just adds unnecessary overhead.

By: Pavel Selivanov (biohumanoid) 2013-01-21 03:58:00.184-0600

{quote}
This is not a problem because it's declared static. Static global variables are by definition initialized to 0 or NULL.
{quote}
Yep, sorry.

{quote}
This is again not a problem with core timer. If there is only one span, we want that span to become the "master". If the span in alarm is not providing timing, the coretimer will kick in and tick the spans.
{quote}
Yes, DAHDI will roll using core timer ticks. But it will not be reflected in procfs.
cat /proc/dahdi/1 will display "(MASTER)", what is not truth.

{quote}
Why do you say that? dynamic local spans are the only one that cannot be the master span, since those are the only spans that set cannot_provide_timing in struct dahdi_span. Local spans cannot become the master since they have no way to "tick" on their own unlike all other spans. Granted, I've not set this up in awhile but I'm surprised that you say this.
{quote}
dynamic will fail test_bit(DAHDI_FLAGBIT_RUNNING, &s->flags), as nobody set this flag !
It's fixed in patch for dynamic, probably I should write it there...
https://issues.asterisk.org/jira/browse/DAHLIN-26 .

{quote}
Because the dummy_span is a concept that is expired. If dynamic_eth spans need a 1ms timer running, it should set one up itself instead of relying on the sync_tick from the core_timer. For the vast majority of DAHDI users, they don't need a 1ms tick to mix audio that typically comes in 20ms chunks. It just adds unnecessary overhead.
{quote}
It theory - I would agree, but in practice some bad use-cases:

1. "static" span - from PSTN. dynamic span - to local PBX. PSTN - sync source for DAHDI, local PBX.
span=1,1,0,cas,hdb3,crc4
dynamic=eth,eth0/00:02:b3:35:43:9c,30,0
If span is in alarm - dynamic will be ticked by core_timer.
core timer will do 1000/HZ ticks (4 in my case) every HZ/1000 ms.
On first tick, dynamic will run tasklet.
ticks 2-4 will be lost, as tasklet will not even start.
It will be rejected with "if (likely(!taskletpending))"

2. "dynamic" span 1 - from PSTN. "dynamic" span 2- to local PBX. PSTN - sync source for DAHDI, local PBX.
dynamic=eth,eth0/00:02:b3:35:43:9c,30,1
dynamic=eth,eth0/00:02:b3:35:43:9d,30,0
If dynamic 1 will lost link (RED alarm), will burn - dynamic 2 will be ticked by core_timer.

So, it's good to have a backup sync source.
Moreover, patched __dahdi_find_master_span even more compact, than the original one (nobody loose nothing).

It's also possible to rework tasklet in dynamic (to run tasklet untill taskletsched != taskletrun, but deadlock protection).

Thank you for a dialog, I appreciate it.