Summary: | DAHLIN-00329: Support multiple dynamic spans with FIFO | ||
Reporter: | Michael Walton (mike@farsouthnet.com) | Labels: | |
Date Opened: | 2013-10-30 03:14:08 | Date Closed: | 2019-05-31 09:36:08 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | dahdi (the module) dahdi_dynamic |
Versions: | 2.7.0 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Kernel versions 2.6.24, 2.6.38 and 3.2.0 tested | Attachments: | ( 0) 2013.12.04_kernel.h.patch ( 1) 2013.12.04_pavel_dahdi_dynamic.c.patch ( 2) dahdi_dummy.c.patch ( 3) dahdi_dynamic.c.patch ( 4) dahdi-base.c.patch ( 5) dahdi-dynamic-multispan-fifo.patch ( 6) kernel.h.patch |
Description: | The dynamic span driver works for a single span, but it cannot handle the phase differences caused by line and network jitter and clock drift in a multiple-span situation. Introducing a configurable receive FIFO, proper master clock priority switching and slip/skip processing solves this problem. Additionally, to support configurations where no telephony hardware is present, the high-resolution timer from the (defunct) dahdi_dummy is re-introduced into dahdi-base to replace the core timer functionality (which is limited to 250Hz) with a true 1000Hz tick. As with the core timer, this is the fallback timing source when no span provides master timing.
Dynamic multispan introduces:
* Configurable FIFO on incoming dynamic frames (a minimal sketch of the idea follows this list)
* Master/slave dynamic spans with priority switching based on alarm status
* All transmit and receive processing is done in the dahdi_dynamic_run tasklet to ensure proper handling of jitter and phase differences
* Unreachable spans (RED alarm) switch to a poll mode to prevent the network being flooded by 1ms frames that go nowhere
* Slip/skip statistics per span available in /proc/dahdi/dahdi_dynamic_stats
* High-resolution timer added to dahdi-base to replace the core timer with a true 1ms tick | ||
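For illustration, a minimal sketch of the receive-FIFO idea using the in-kernel kfifo API. The names here (dyn_span_fifo, dyn_fifo_init, dyn_fifo_rx, dyn_fifo_tick) are hypothetical and not taken from the attached patches; the sketch only shows frames being queued on network receive and drained one per 1ms tick, with overflow and underrun corresponding to slip and skip events.

    #include <linux/kfifo.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    /* Hypothetical per-span receive FIFO; illustrates the buffering idea only. */
    struct dyn_span_fifo {
        struct kfifo fifo;  /* byte FIFO holding whole received frames */
        spinlock_t lock;    /* protects the FIFO between rx path and tasklet */
    };

    /* Allocate room for 'depth' frames of 'framelen' bytes each. */
    static int dyn_fifo_init(struct dyn_span_fifo *f, unsigned int depth,
                             unsigned int framelen)
    {
        spin_lock_init(&f->lock);
        return kfifo_alloc(&f->fifo, depth * framelen, GFP_KERNEL);
    }

    /* Network receive path: queue the frame, or drop it (a slip) when full. */
    static void dyn_fifo_rx(struct dyn_span_fifo *f, const u8 *frame,
                            unsigned int framelen)
    {
        if (kfifo_avail(&f->fifo) < framelen)
            return;  /* overflow: count a slip here */
        kfifo_in_spinlocked(&f->fifo, frame, framelen, &f->lock);
    }

    /* 1ms tick path: dequeue one frame if available; 0 means underrun (skip). */
    static unsigned int dyn_fifo_tick(struct dyn_span_fifo *f, u8 *frame,
                                      unsigned int framelen)
    {
        return kfifo_out_spinlocked(&f->fifo, frame, framelen, &f->lock);
    }

Note that kfifo_alloc() rounds the requested size up to a power of two, so the effective FIFO depth can be larger than depth * framelen.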
Comments: | By: Michael Walton (mike@farsouthnet.com) 2013-10-30 03:15:26.125-0500
DAHDI dynamic multispan with FIFO patch.

By: Shaun Ruffell (sruffell) 2013-10-31 09:24:56.217-0500
Thanks Michael. I'll take a look at this and hopefully get biohumanoid to comment as well, since I know he's the most active user of dynamic_spans that participates in the issue tracker. It might take me a week or so. Also (and no problem if you don't have the time): it's a little different for DAHDI and Asterisk currently, but feel free to attach patches generated with git here. That way the credit for authorship is more easily tracked through the different systems.

By: Shaun Ruffell (sruffell) 2013-10-31 09:27:46.793-0500
Ahh... just scanning through the patch I already see that you have a note where you based some sections on biohumanoid's (Pavel Selivanov) original work.

By: Pavel Selivanov (biohumanoid) 2013-11-07 03:09:48.628-0600
Hi. Sorry I could not review the patches sooner. I have a number of notes; I'll go through them one by one.

By: Pavel Selivanov (biohumanoid) 2013-11-07 03:10:01.291-0600
Timing patch in dahdi-base.c. A little theory is needed to understand how it works and how it should work. It's not just theory, it's practice, as we use TDMoE in our hardware.
==
The original DAHDI provides timing via the core timer, which in most cases (see HZ in .config) gives us a tick every 4ms (250Hz).
(1) If DAHDI has a master span and at least one dahdi_receive call per 4ms, DAHDI will consider the timer OK:
    dahdi_receive -> _process_masterspan -> atomic_inc(&core_timer.count)
(2) If DAHDI has no master span, or no dahdi_receive calls, DAHDI will call _process_masterspan 4 times every 4ms:
    while (ms_since_start > msecs_processed(&core_timer))
        _process_masterspan();
So, case (1): DAHDI will have an excellent timing source (1ms), or DAHDI will have an awful timing source (in the worst case 3.9999ms, so 250Hz instead of 1000Hz). Yes, it's possible. It's a bug, but it's rare:
    if (atomic_read(&core_timer.count) == atomic_read(&core_timer.last_count)) {
    } else {
        atomic_set(&core_timer.count, 0);
        atomic_set(&core_timer.last_count, 0);
        core_timer.start_interval = now;
    }
Case (2): DAHDI will call _process_masterspan 4 times every 4ms. That is fine for Asterisk (queue), but bad for TDMoE if the hardware has an rx FIFO of less than 5ms.
==
What does your patch bring us? If the kernel has HIGHRESTIMER, dahdi_hr_int will be called every 1s to check whether there was at least one tick from the master span, and every 1ms if there are no ticks from the master span:
    if (atomic_read(&core_timer.count) == atomic_read(&core_timer.last_count)) {
        _process_masterspan();
    } else {
        /* board driver is spinning DAHDI, relax, check back in 1 second */
        if (debug)
            printk(KERN_DEBUG "dahdi_hr_int: slow hrtimer tick\n");
        hrtimer_forward(htmr, hrtimer_get_expires(htmr), ktime_set(1, 0));
        if (!core_timer.dahdi_receive_used) {
            core_timer.dahdi_receive_used = 1;
            printk(KERN_NOTICE "dahdi_hr_int: disabled master core hrtimer\n");
        }
    }
So DAHDI will have an excellent timing source (1ms), or DAHDI will have an awful timing source (in the worst case 999ms, so 1Hz instead of 1000Hz).
What configurations can we have with TDMoE?
(1) Two TDMoE devices: master, master.
(2) Two TDMoE devices: slave, master.
(3) A non-dynamic span and one TDMoE device: slave, master.
(1) We need either your patch with HIGHRESTIMER, or a dahdi_dummy device (also with HIGHRESTIMER) to provide timing for TDMoE.
(2) We don't need "software" timing, as we'll get it from the TDMoE device.
(3) We don't need "software" timing, as we'll get it from the non-dynamic span device.
Neither the patched nor the original dahdi-base will let a TDMoE device become the "(MASTER)" span. That's sad, as it is widely used on our hardware...
    if (!can_provide_timing(s))
        continue;
I think it's better to leave DAHDI's timing as is, but enable dahdi_dummy in the Makefile:
1. It's a rare situation (one of three).
2. dahdi-base is 285kb of (dirty) code, with a lot of #if #else #endif. That's enough. Classic joke: #define i j /* Wish you happy debugging. */ It's better to use the dummy instead. The changes are minimal.
find_master should look like this (obsolete code):
    static void __dahdi_find_master_span(void)
    {
        struct dahdi_span *s, *old_master, *new_master = NULL;
        unsigned long flags;

        spin_lock_irqsave(&chan_lock, flags);
        old_master = master;
        /* Searching for the FIRST reliable sync source with a reverse for. */
        list_for_each_entry_reverse(s, &span_list, spans_node) {
            if (!can_provide_timing(s) || s->alarms)
                continue;
            if (dahdi_is_digital_span(s) &&
                !test_bit(DAHDI_FLAGBIT_RUNNING, &s->flags))
                continue;
            if ((0 == s->channels) && !new_master)
                /* dummy is better than nothing (coretimer). */
                new_master = s;
            new_master = s; /* But a real span is better than a dummy. */
        }
        master = new_master;
        spin_unlock_irqrestore(&chan_lock, flags);
        if ((debug & DEBUG_MAIN) && (old_master != master))
            module_printk(KERN_NOTICE, "Master changed to '%s'\n",
                          master ? master->name : "no master (core timer)");
    }
That's all for the timing in dahdi-base. I'll write about dynamic in 4 hours. Thank you.
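To make the timing discussion above concrete, here is a minimal sketch of a 1ms (1000Hz) hrtimer used as fallback timing, in the spirit of the dahdi_hr_int code quoted above. It is not the actual patch: span_is_ticking() and process_masterspan() are hypothetical stand-ins for the checks dahdi-base performs against core_timer.count, and the back-off to a 1 second check interval mirrors the behaviour described above.

    #include <linux/hrtimer.h>
    #include <linux/kernel.h>
    #include <linux/ktime.h>

    #define DYN_TICK_NS NSEC_PER_MSEC  /* 1 ms, i.e. a 1000 Hz tick */

    static struct hrtimer dyn_hrtimer;

    /* Hypothetical stand-ins for the checks made against core_timer.count. */
    static bool span_is_ticking(void) { return false; }
    static void process_masterspan(void) { }

    /* Runs every 1 ms while no span provides master timing; once a span is
     * seen ticking, back off and only check again after 1 second. */
    static enum hrtimer_restart dyn_hrtimer_fn(struct hrtimer *t)
    {
        if (span_is_ticking()) {
            hrtimer_forward_now(t, ktime_set(1, 0));
        } else {
            process_masterspan();
            hrtimer_forward_now(t, ns_to_ktime(DYN_TICK_NS));
        }
        return HRTIMER_RESTART;
    }

    static void dyn_hrtimer_start(void)
    {
        hrtimer_init(&dyn_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        dyn_hrtimer.function = dyn_hrtimer_fn;
        hrtimer_start(&dyn_hrtimer, ns_to_ktime(DYN_TICK_NS), HRTIMER_MODE_REL);
    }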
By: Pavel Selivanov (biohumanoid) 2013-11-07 03:25:49.866-0600
In two words: I like your fixes, like the flood protection and the tasklet in module_param. What I don't like, and what I did another way: I don't like lots of #if #else #endif around all the code. kfifo has (FINALLY) finished changing its API. That's great! I've wrapped it a bit:
    #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,35)
    # define kfifo_in(fifo,buf,n)                  __kfifo_put(fifo,buf,n)
    # define kfifo_in_spinlocked(fifo,buf,n,lock)  kfifo_put(fifo,buf,n)
    # define kfifo_out(fifo,buf,n)                 __kfifo_get(fifo,buf,n)
    # define kfifo_out_spinlocked(fifo,buf,n,lock) kfifo_get(fifo,buf,n)
    # define kfifo_size(fifo)                      (fifo->size)
    # define my_kfifo_alloc(size,mask,lock)        kfifo_alloc(size,mask,lock)
    # define my_kfifo_free(fifo)                   kfifo_free(fifo)
    #else // >= 2.6.35
    #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,36)
    /* 2.6.35 <= x < 2.6.36 */
    #define kfifo_in_spinlocked(fifo, buf, n, lock) \
            kfifo_in_locked(fifo, buf, n, lock)
    #define kfifo_out_spinlocked(fifo, buf, n, lock) \
            kfifo_out_locked(fifo, buf, n, lock)
    #endif // < KERNEL_VERSION(2,6,36)
    inline struct kfifo *my_kfifo_alloc(unsigned int size, gfp_t gfp_mask,
                                        spinlock_t *lock)
    {
        int ret;
        struct kfifo *fifo = kmalloc(sizeof(struct kfifo), gfp_mask);
        ret = kfifo_alloc(fifo, size, gfp_mask);
        return fifo;
    }
    inline void my_kfifo_free(struct kfifo *fifo)
    {
        kfifo_free(fifo);
        kfree(fifo);
    }
    #endif // < KERNEL_VERSION(2,6,35)
Now we can use the ->timing field instead of DAHDI_FLAG_RUNNING to signal that we're online. What is msgbuf2 for? It's not used. kfifo_alloc has an alignment bug in some kernels (still not fixed); it will crash the kernel with certain channel counts (like 128)... sprintf should be changed to the more up-to-date seq_printf. I'll attach my current patches (against DAHDI 2.6.1). I will update them to DAHDI 2.7.0.1 and attach them here. It may take some time, as I have a business trip on Sunday. I'll try to update them before Sunday. Thank you all.
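On the sprintf versus seq_printf point, a minimal sketch of the classic seq_file pattern for a read-only proc entry such as /proc/dahdi/dahdi_dynamic_stats, using struct file_operations as kernels of this era expect (newer kernels use struct proc_ops instead). The dyn_slips and dyn_skips counters are hypothetical; the real patch keeps slip/skip statistics per span, and the code assumes the /proc/dahdi directory already exists.

    #include <linux/errno.h>
    #include <linux/fs.h>
    #include <linux/module.h>
    #include <linux/proc_fs.h>
    #include <linux/seq_file.h>

    /* Hypothetical counters; the real patch keeps slip/skip stats per span. */
    static unsigned long dyn_slips;
    static unsigned long dyn_skips;

    static int dyn_stats_show(struct seq_file *m, void *v)
    {
        seq_printf(m, "slips: %lu\nskips: %lu\n", dyn_slips, dyn_skips);
        return 0;
    }

    static int dyn_stats_open(struct inode *inode, struct file *file)
    {
        return single_open(file, dyn_stats_show, NULL);
    }

    static const struct file_operations dyn_stats_fops = {
        .owner   = THIS_MODULE,
        .open    = dyn_stats_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = single_release,
    };

    /* proc_create() returns NULL on failure, so callers should check it. */
    static int dyn_stats_proc_init(void)
    {
        return proc_create("dahdi/dahdi_dynamic_stats", 0444, NULL,
                           &dyn_stats_fops) ? 0 : -ENOMEM;
    }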
By: Michael Walton (mike@farsouthnet.com) 2013-11-07 08:08:39.124-0600
Hi Pavel, thanks for the comments. I have a question: you say "Neither the patched nor the original dahdi-base will let a TDMoE device become the '(MASTER)' span". We also use this on our hardware; we have many installations where DAHDI master timing is derived from our TDMoE device. I have not done too much testing with 2.7.0.1, we use 2.4.1 in production. My question: why can the TDMoE device not become the master span?
Also, regarding "DAHDI will have an awful timing source (in the worst case 999ms, so 1Hz instead of 1000Hz)": I don't agree. This only happens for a short time, to allow switching back to the core timer when no more masters are available. It won't happen under normal circumstances.
msgbuf2 is used to temporarily store the FIFO contents from kfifo_out in dahdi_process_rxfifo(), before copying into the channel readchunk buffers. We could share msgbuf with sendmessage, I guess?
Mike

By: Pavel Selivanov (biohumanoid) 2013-11-07 11:28:52.065-0600
> Neither the patched nor the original dahdi-base will let a TDMoE device become the "(MASTER)" span
Oops, sorry, I read can_provide_timing instead of cannot_provide_timing (zero by default). A TDMoE device can become a master span; dahdi_dummy cannot.
> Also, regarding "DAHDI will have an awful timing source (in the worst case 999ms, so 1Hz instead of 1000Hz)": I don't agree. This only happens for a short time, to allow switching back to the core timer when no more masters are available. It won't happen under normal circumstances.
Let's imagine you have buggy hardware, a buggy switch or Ethernet driver, or just a wrong configuration (your device is configured as a slave to TDMoE, while dynamic is configured with timing >= 1). These are real situations we have had: a 3com driver grouping incoming packets two by two, and a wrong configuration. I'm not saying the original dahdi-base handles this; I'm just pointing out one more bug for the community. I'm just giving my opinion: the kernel driver should be as tiny as possible, and mixing two timing implementations is not good.
> We could share msgbuf with sendmessage, I guess?
I did that.

By: Pavel Selivanov (biohumanoid) 2013-11-07 11:32:59.964-0600
I've just updated my patches to 2.7.0.1. Please take a look. I can inject your fixes (or you can) over my up-to-date patches.

By: Pavel Selivanov (biohumanoid) 2013-11-07 11:58:36.970-0600
One more comment/question about #define ENABLE_TASKLETS. The non-master span call flow will be:
    dahdi_dynamic_receive -> dahdi_ec_span, dahdi_receive
The master span call flow will be:
    dahdi_dynamic_receive -> dahdi_ec_span, dahdi_receive -> dahdi_dynamic_run -> tasklet_hi_schedule -> dahdi_dynamic_tasklet -> __dahdi_dynamic_run -> dahdi_process_rxfifo, dahdi_ec_span, dahdi_receive
So, the following problems:
1. dahdi_receive will be called twice...
2. dahdi_ec_span will be called in interrupt context. EC is very resource hungry; it really should be called from a tasklet.
ENABLE_TASKLETS is no longer relevant. The question is not whether the tasklet will be used, but whether it will be used on RX or TX. Why don't we just use one and the same dahdi_dynamic_tlet for both schemes?

By: Pavel Selivanov (biohumanoid) 2013-11-07 12:06:52.004-0600
Please let me know whether you would like to add your patches on top of mine, or you would like me to do it. I'd also like to rework the tasklet, to make it more compact and clear... My timezone is GMT+6; from 10 Nov till 24 Nov it is GMT+1.
By: Michael Walton (mike@farsouthnet.com) 2013-11-08 00:30:32.395-0600
I agree, it always needs to be tasklets. It would be great if you could add my patches over yours and re-submit. One thing though: I think that bringing hrtimer into dahdi-base makes sense; it will also help other users by improving the behaviour when no other timing source is available. Overall, the way I have done the timing control in dahdi_dynamic works pretty well for master and slave TDMoE, and it also properly supports local timing sources (our product uses both TDMoE and PCIe based timing). I don't think I understand what part of it doesn't work for your situation. So my summary:
1. Your kfifo improvements should go in.
2. I think the call to dahdi_receive and dahdi_ec_span in dahdi_dynamic_receive is a merge error (my bad); these lines should be removed as far as I can tell. This will address your comment 07/Nov/13 7:58 PM.
3. My opinion is that dahdi_dummy has been phased out in favour of the core timer in dahdi-base; we should keep to this and add hrtimer to dahdi-base as per my patch (Shaun/Russ, comments?).

By: Shaun Ruffell (sruffell) 2013-11-08 01:56:44.466-0600
Regarding 1: I haven't looked at the patches, but kfifo might still need to be wrapped in some #ifdefs or back-ported, since DAHDI still needs to compile against 2.6.18 at least until all major distributions drop support for it, and that won't be for another couple of years.
Regarding 3: I'm not convinced that hrtimer needs to be added to dahdi-base. When there aren't any hardware cards installed on the system, audio is typically only mixed in 20ms chunks to/from user space. Even if HZ is set to 100, the timer still fires twice as fast as the typical audio chunk size. I think the overhead of running the hrtimer interrupt isn't warranted for the "default" case. I'm not a big dynamic eth span user, but I imagine it would be best for dahdi_dynamic_eth, when intended to be the timing master, to set an hrtimer internally and use that. The core timer would then see it "ticking" and go to the slower rate, looking for problems.

By: Pavel Selivanov (biohumanoid) 2013-12-04 07:22:27.318-0600
Gentlemen, please forgive me, I have been extremely busy. Please take a look at the patch; I hope you'll like it. I took my latest dahdi_dynamic.c.patch and integrated the patches from Michael. Feel free to change the credit.
1. Injected the anti-flood protection by Michael.
2. Reworked the tasklet scenario (described below).
3. Fixed txcnt incrementing a bit.
4. Prepared a stub to implement an rx tasklet for every dynamic span.
As I wrote, the question is not whether we will use a tasklet in dynamic, but where. We MUST call dev_queue_xmit with interrupts enabled. So ENABLE_TASKLETS is useless, and absolutely the wrong word, and I removed all the #if ENABLE_TASKLETS code.
modprobe rxtasklet=0: TDMoE data will be processed (echo cancelled) in interrupt context, and all TX will be done in the tasklet. Thus, data can be processed on different CPU cores.
modprobe rxtasklet=1 (default): TDMoE data will be processed (echo cancelled) in the tasklet, and all TX will be done in the tasklet, so all spans will be processed in one tasklet on one CPU core. All TX will be done (indirectly) in the tasklet.
TODO: improve rxtasklet=1 by creating an rx tasklet per span, so every span will be processed in a separate tasklet. If the card is good, every new packet will come from another CPU core, so the tasklet will be started on another core. |
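For readers following the rxtasklet discussion, a minimal sketch of the module parameter and the RX/TX split described above, using the DECLARE_TASKLET API of that era (the callback signature changed in later kernels). dyn_process_rx() and dyn_process_tx() are hypothetical placeholders for the FIFO drain plus echo cancellation and the frame transmit path; they are not the functions from the attached patches.

    #include <linux/interrupt.h>
    #include <linux/module.h>

    /* Hypothetical placeholders for the real processing steps. */
    static void dyn_process_rx(void) { /* drain FIFO, run EC, call dahdi_receive */ }
    static void dyn_process_tx(void) { /* build frames, call dev_queue_xmit() */ }

    /* Mirrors the rxtasklet=0/1 behaviour described above:
     * 0 = process RX in interrupt context, 1 = defer RX to the tasklet. */
    static int rxtasklet = 1;
    module_param(rxtasklet, int, 0600);
    MODULE_PARM_DESC(rxtasklet, "Process received TDMoE frames in the tasklet");

    static void dyn_tasklet_fn(unsigned long data)
    {
        if (rxtasklet)
            dyn_process_rx();  /* RX and TX handled together on one core */
        dyn_process_tx();      /* TX always runs here, with interrupts enabled */
    }
    static DECLARE_TASKLET(dyn_tasklet, dyn_tasklet_fn, 0);

    /* Called from the network receive hook (softirq context). */
    static void dyn_rx_hook(void)
    {
        if (!rxtasklet)
            dyn_process_rx();  /* inline, possibly on another CPU core */
        tasklet_hi_schedule(&dyn_tasklet);
    }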