[Home]

Summary:DAHLIN-00290: dahdi_dynamic_eth have a huge jitter on transmit
Reporter:Pavel Selivanov (biohumanoid)Labels:
Date Opened:2012-04-18 08:03:53Date Closed:2019-05-31 09:54:50
Priority:MajorRegression?No
Status:Closed/CompleteComponents:dahdi_dynamic_eth
Versions:2.6.1 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:tasklets enabled in dahdi_dynamic.c (#define ENABLE_TASKLETS)Attachments:( 0) 0001-dahdi_dynamic-Use-a-tasklet-for-flushing-dynamic-dri.patch
( 1) 0002-Revert-dahdi_dynamic_eth-Move-tx-packet-flushing-to-.patch
( 2) dahdi_dynamic_eth.c.patch
( 3) dahdi_dynamic.c.patch
Description:At r10562 "Move tx packet flushing to process context".
Seems, using a common "thread" can produce a huge jitter.
Since this "common" thread can be used by any driver - it can't guarantee 1ms...
Comments:By: Pavel Selivanov (biohumanoid) 2012-04-18 10:06:45.761-0500

Seems, troubles started by applying patches from.
https://issues.asterisk.org/jira/browse/DAHLIN-245

Before:
dahdi_dynamic, by default, done all job in tasklet, so no troubles with dev_queue_xmit.
After:
dahdi_dynamic, by default, is working in interrupt context.

I'll do my best to cleanup patches https://issues.asterisk.org/jira/browse/DAHLIN-26 as soon as I can.


By: Shaun Ruffell (sruffell) 2012-04-18 10:16:15.575-0500

Hmm. I had problems when the work was done in the tasklet with running patloop test.  I made the change in [r9577 "dahdi_dynamic: Do not enable tasklets by default for dynamic_spans"|http://svnview.digium.com/svn/dahdi?view=revision&revision=9577] to make patloop tests work reliably for me.

So...I guess I'll need to think about this some more.  I want to get away from the assumption that the masterspan is called every 1ms for performance reasons. There must be a way to rework the dynamic_eth drivers to make this work.

What is the error that you're seeing?  Are you seeing patloop test problems on certain machines?  I ran patloop test between machines linked with TDMoE spans during the last round of changes to dynamic spans without any problems.

By: Shaun Ruffell (sruffell) 2012-04-18 10:52:47.139-0500

Looking through that code again, I think what needs to happen is that the dynamic_run can be split up into two parts. One part that runs in the interrupt handler that calls dahdi_transmit / dahdi_dynamic_sendmessage and then the flushing can be done in the workqueue.  I'll prepare a patch.

By: Shaun Ruffell (sruffell) 2012-04-18 11:29:21.781-0500

Attached two patches that I tested against trunk. There are still patgen / pattest errors when running on debug kernel because the tasklet is pushed of to the ksoftirqd thread, but on a release 3.4-rc3 kernel it seems to sit at about 5% in SI context without any hiccups.

Is the issue you saw resolved when applying these two patches?

By: Pavel Selivanov (biohumanoid) 2012-04-19 07:11:09.089-0500

I've submitted patches we and our clients have being using for years.
You can test a ready to use bundle http://parabel.ru/d/dahdi_2.6.0%2B2.6.0-parabel_2.6.0.tar.bz2

I didn't submitted it to https://issues.asterisk.org/jira/browse/DAHLIN-26
because there are a number of thing I'd like to do/cleanup:
1. replace snprintf with seq_prints

2. implement per-span tasklet
NIC (if it's good) will call dahdi_dynamic_hook from different CPU's, so, tasklet will fit us.

3. rework dahdi_dynamic_sync_tick, put it in a tasklet.
dahdi_dynamic_sync_tick should do all transmissions.
So, we'll get rid of dualism - (MASTER) in DAHDI & Clock Source in dahdi_dynamic.
It will always be the same (as it should).


By: Pavel Selivanov (biohumanoid) 2012-04-19 07:16:48.428-0500

I didn't tested your patches, sorry.
I can do it tomorrow (I'm at GMT+6).

>What is the error that you're seeing? Are you seeing patloop test problems on certain machines? I ran patloop test between machines linked with TDMoE spans during the last round of changes to dynamic spans without any problems.

I'm using "nethdlc" feature to test + (selfmade) trafgen + crc16.
On hardware, with fifo up to 5ms, I get sliperr=skiperr (with unpatched DAHDI).


By: Pavel Selivanov (biohumanoid) 2012-04-19 07:29:32.095-0500

Simple test to force jitter/errors:
#top
s(sleep) 0.001


By: Pavel Selivanov (biohumanoid) 2013-01-16 06:59:46.659-0600

Any news on problem ?
Actually, I think it (workqueue or tasklet) should be moved to dahdi_dynamic.

dynamic already has tasklet.
dynamic should start tasklet on rx (default untill dahdi 2.6.x), on tx(will solve this bug), or not used at all.
We can set a flag in dahdi_dynamic_driver, irq_safe_tx (zero by default).


By: Shaun Ruffell (sruffell) 2013-01-16 10:42:23.872-0600

I believe this issue is addressed in [(f44b252472ad "dahdi_dynamic: Use a tasklet for flushing dynamic drivers.")|http://git.asterisk.org/gitweb/?p=dahdi/linux.git;a=commitdiff;h=f44b252472ad76c87b682d56e2d603aba751fb29] and [(ef065a5e2a "Revert "dahdi_dynamic_eth: Move tx packet flushing to process context.")|http://git.asterisk.org/gitweb/?p=dahdi/linux.git;a=commitdiff;h=ef065a5e2a26ba7bc9c9c15158e7f38445f3523f].  Do you still have the same issues with the master branch of DAHDI?  

DAHDI-Linux has switched to git for the primary source control, so you can grab it like:

{{git clone git://git.asterisk.org/dahdi/linux dahdi-linux}}

By: Pavel Selivanov (biohumanoid) 2013-01-18 03:32:58.883-0600

Sorry, knew nothing about git repository, so, knew nothing on fixes...

It's working now.

some hints:
1. why don't you use the same (one) tasklet for either receive or transmit job ?
Too much #if #else #endif ...

2. ENABLE_TASKLET is not correct anymore.
dynamic will always have a tasklet (either for receive or transmit).
It's better to choose via module parameter.
modprobe dahdi_dynamic tasklet=0/1, receive/transmit.

3. it's probably a good idea to start flush tasklet if (irqs_disabled()).
Otherwise, flush can be called directly.

By: Shaun Ruffell (sruffell) 2013-01-20 19:47:32.254-0600

{quote}
some hints:
1. why don't you use the same (one) tasklet for either receive or transmit job ?
Too much #if #else #endif ...
{quote}

Ok, perhaps.  I thought it would make things clearer to indicate that only the flush of the transmit was happening in the flush_tlet.  But, perhaps not.

{quote}
2. ENABLE_TASKLET is not correct anymore.
dynamic will always have a tasklet (either for receive or transmit).
It's better to choose via module parameter.
modprobe dahdi_dynamic tasklet=0/1, receive/transmit.
{quote}

I'm not crazy about this, but then I'm not a huge dynamic span user.  What I think would be preferable is to make things work reliably without exposing a tuning knob like that. I believe that by just using the flush tasklet, things should be better now, but I could be mistaken.

{quote}
3. it's probably a good idea to start flush tasklet if (irqs_disabled()).
Otherwise, flush can be called directly.
{quote}

Isn't it only called in the context of dahdi_receive?  If not, then I agree with you.

By: Pavel Selivanov (biohumanoid) 2013-01-21 04:08:22.066-0600

{quote}
Ok, perhaps. I thought it would make things clearer to indicate that only the flush of the transmit was happening in the flush_tlet. But, perhaps not.
{quote}
#if is an evil...
In this driver, it will just produce future troubles.

{quote}
I'm not crazy about this, but then I'm not a huge dynamic span user. What I think would be preferable is to make things work reliably without exposing a tuning knob like that. I believe that by just using the flush tasklet, things should be better now, but I could be mistaken.
{quote}
Actually, both modes works reliable.
But one will produce less latency (echo cancel will be done in tasklet), while second will give better system response).

{quote}
Isn't it only called in the context of dahdi_receive? If not, then I agree with you.
{quote}
dahdi_receive could already be called from tasklet/workqueue, of another driver or even dahdi_dummy.
So, dynamic (with timing=0) could be called out of interrupt context.