Summary: | DAHLIN-00290: dahdi_dynamic_eth have a huge jitter on transmit | ||
Reporter: | Pavel Selivanov (biohumanoid) | Labels: | |
Date Opened: | 2012-04-18 08:03:53 | Date Closed: | 2019-05-31 09:54:50 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | dahdi_dynamic_eth |
Versions: | 2.6.1 | Frequency of Occurrence | Frequent |
Related Issues: | |||
Environment: | tasklets enabled in dahdi_dynamic.c (#define ENABLE_TASKLETS) | Attachments: | ( 0) 0001-dahdi_dynamic-Use-a-tasklet-for-flushing-dynamic-dri.patch ( 1) 0002-Revert-dahdi_dynamic_eth-Move-tx-packet-flushing-to-.patch ( 2) dahdi_dynamic_eth.c.patch ( 3) dahdi_dynamic.c.patch |
Description: | At r10562 "Move tx packet flushing to process context". Seems, using a common "thread" can produce a huge jitter. Since this "common" thread can be used by any driver - it can't guarantee 1ms... | ||
Comments: | By: Pavel Selivanov (biohumanoid) 2012-04-18 10:06:45.761-0500 Seems, troubles started by applying patches from. https://issues.asterisk.org/jira/browse/DAHLIN-245 Before: dahdi_dynamic, by default, done all job in tasklet, so no troubles with dev_queue_xmit. After: dahdi_dynamic, by default, is working in interrupt context. I'll do my best to cleanup patches https://issues.asterisk.org/jira/browse/DAHLIN-26 as soon as I can. By: Shaun Ruffell (sruffell) 2012-04-18 10:16:15.575-0500 Hmm. I had problems when the work was done in the tasklet with running patloop test. I made the change in [r9577 "dahdi_dynamic: Do not enable tasklets by default for dynamic_spans"|http://svnview.digium.com/svn/dahdi?view=revision&revision=9577] to make patloop tests work reliably for me. So...I guess I'll need to think about this some more. I want to get away from the assumption that the masterspan is called every 1ms for performance reasons. There must be a way to rework the dynamic_eth drivers to make this work. What is the error that you're seeing? Are you seeing patloop test problems on certain machines? I ran patloop test between machines linked with TDMoE spans during the last round of changes to dynamic spans without any problems. By: Shaun Ruffell (sruffell) 2012-04-18 10:52:47.139-0500 Looking through that code again, I think what needs to happen is that the dynamic_run can be split up into two parts. One part that runs in the interrupt handler that calls dahdi_transmit / dahdi_dynamic_sendmessage and then the flushing can be done in the workqueue. I'll prepare a patch. By: Shaun Ruffell (sruffell) 2012-04-18 11:29:21.781-0500 Attached two patches that I tested against trunk. There are still patgen / pattest errors when running on debug kernel because the tasklet is pushed of to the ksoftirqd thread, but on a release 3.4-rc3 kernel it seems to sit at about 5% in SI context without any hiccups. Is the issue you saw resolved when applying these two patches? By: Pavel Selivanov (biohumanoid) 2012-04-19 07:11:09.089-0500 I've submitted patches we and our clients have being using for years. You can test a ready to use bundle http://parabel.ru/d/dahdi_2.6.0%2B2.6.0-parabel_2.6.0.tar.bz2 I didn't submitted it to https://issues.asterisk.org/jira/browse/DAHLIN-26 because there are a number of thing I'd like to do/cleanup: 1. replace snprintf with seq_prints 2. implement per-span tasklet NIC (if it's good) will call dahdi_dynamic_hook from different CPU's, so, tasklet will fit us. 3. rework dahdi_dynamic_sync_tick, put it in a tasklet. dahdi_dynamic_sync_tick should do all transmissions. So, we'll get rid of dualism - (MASTER) in DAHDI & Clock Source in dahdi_dynamic. It will always be the same (as it should). By: Pavel Selivanov (biohumanoid) 2012-04-19 07:16:48.428-0500 I didn't tested your patches, sorry. I can do it tomorrow (I'm at GMT+6). >What is the error that you're seeing? Are you seeing patloop test problems on certain machines? I ran patloop test between machines linked with TDMoE spans during the last round of changes to dynamic spans without any problems. I'm using "nethdlc" feature to test + (selfmade) trafgen + crc16. On hardware, with fifo up to 5ms, I get sliperr=skiperr (with unpatched DAHDI). By: Pavel Selivanov (biohumanoid) 2012-04-19 07:29:32.095-0500 Simple test to force jitter/errors: #top s(sleep) 0.001 By: Pavel Selivanov (biohumanoid) 2013-01-16 06:59:46.659-0600 Any news on problem ? Actually, I think it (workqueue or tasklet) should be moved to dahdi_dynamic. dynamic already has tasklet. dynamic should start tasklet on rx (default untill dahdi 2.6.x), on tx(will solve this bug), or not used at all. We can set a flag in dahdi_dynamic_driver, irq_safe_tx (zero by default). By: Shaun Ruffell (sruffell) 2013-01-16 10:42:23.872-0600 I believe this issue is addressed in [(f44b252472ad "dahdi_dynamic: Use a tasklet for flushing dynamic drivers.")|http://git.asterisk.org/gitweb/?p=dahdi/linux.git;a=commitdiff;h=f44b252472ad76c87b682d56e2d603aba751fb29] and [(ef065a5e2a "Revert "dahdi_dynamic_eth: Move tx packet flushing to process context.")|http://git.asterisk.org/gitweb/?p=dahdi/linux.git;a=commitdiff;h=ef065a5e2a26ba7bc9c9c15158e7f38445f3523f]. Do you still have the same issues with the master branch of DAHDI? DAHDI-Linux has switched to git for the primary source control, so you can grab it like: {{git clone git://git.asterisk.org/dahdi/linux dahdi-linux}} By: Pavel Selivanov (biohumanoid) 2013-01-18 03:32:58.883-0600 Sorry, knew nothing about git repository, so, knew nothing on fixes... It's working now. some hints: 1. why don't you use the same (one) tasklet for either receive or transmit job ? Too much #if #else #endif ... 2. ENABLE_TASKLET is not correct anymore. dynamic will always have a tasklet (either for receive or transmit). It's better to choose via module parameter. modprobe dahdi_dynamic tasklet=0/1, receive/transmit. 3. it's probably a good idea to start flush tasklet if (irqs_disabled()). Otherwise, flush can be called directly. By: Shaun Ruffell (sruffell) 2013-01-20 19:47:32.254-0600 {quote} some hints: 1. why don't you use the same (one) tasklet for either receive or transmit job ? Too much #if #else #endif ... {quote} Ok, perhaps. I thought it would make things clearer to indicate that only the flush of the transmit was happening in the flush_tlet. But, perhaps not. {quote} 2. ENABLE_TASKLET is not correct anymore. dynamic will always have a tasklet (either for receive or transmit). It's better to choose via module parameter. modprobe dahdi_dynamic tasklet=0/1, receive/transmit. {quote} I'm not crazy about this, but then I'm not a huge dynamic span user. What I think would be preferable is to make things work reliably without exposing a tuning knob like that. I believe that by just using the flush tasklet, things should be better now, but I could be mistaken. {quote} 3. it's probably a good idea to start flush tasklet if (irqs_disabled()). Otherwise, flush can be called directly. {quote} Isn't it only called in the context of dahdi_receive? If not, then I agree with you. By: Pavel Selivanov (biohumanoid) 2013-01-21 04:08:22.066-0600 {quote} Ok, perhaps. I thought it would make things clearer to indicate that only the flush of the transmit was happening in the flush_tlet. But, perhaps not. {quote} #if is an evil... In this driver, it will just produce future troubles. {quote} I'm not crazy about this, but then I'm not a huge dynamic span user. What I think would be preferable is to make things work reliably without exposing a tuning knob like that. I believe that by just using the flush tasklet, things should be better now, but I could be mistaken. {quote} Actually, both modes works reliable. But one will produce less latency (echo cancel will be done in tasklet), while second will give better system response). {quote} Isn't it only called in the context of dahdi_receive? If not, then I agree with you. {quote} dahdi_receive could already be called from tasklet/workqueue, of another driver or even dahdi_dummy. So, dynamic (with timing=0) could be called out of interrupt context. |