Summary: DAHLIN-00060: dahdi_dummy does not tick on some systems
Reporter: Tzafrir Cohen (tzafrir)
Labels:
Date Opened: 2008-11-19 08:34:40.000-0600
Date Closed: 2009-04-29 12:48:34
Priority: Minor
Regression?: No
Status: Closed/Complete
Components: dahdi_dummy
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) no_dahdi_dummy_rtc.patch
(1) no_dahdi_dummy_rtc-2.patch
(2) no_dahdi_dummy_rtc-3.patch
Description: On some systems dahdi_dummy (or ztdummy, in the case of Zaptel) fails to "tick" DAHDI and hence DAHDI does not provide a working timing source.

Indications that DAHDI (Zaptel) provides no timing source:

1. dahdi_test (zttest) does not give an error on startup, but hangs.
2. Asterisk >= 1.4.20 fails to start, and gives the ugly "no timing source" error message:

ERROR[10981]: asterisk.c:3036 main: Asterisk has detected a problem
with your DAHDI configuration and will shutdown for your protection.
You have options:
       1. You only have to compile DAHDI support into Asterisk if you
need it.  One option is to recompile without DAHDI support.
       2. You only have to load DAHDI drivers if you want to take
advantage of DAHDI services.  One option is to unload DAHDI modules if
you don't need them.
       3. If you need DAHDI services, you must correctly configure DAHDI.

An indication that dahdi_dummy should be the timing source for dahdi could be:

 lsmod | grep ^dahdi
 dahdi                 231888  1 dahdi_dummy

Or to see that /proc/dahdi/1 is dahdi_dummy and is listed as "MASTER".


I have seen various suggestions on how to solve this. None seems to be a silver bullet.
Comments:

By: Shaun Ruffell (sruffell) 2008-11-19 09:44:41.000-0600

Do you have any clue about what might be different on those systems where dahdi_dummy 'ticks' DAHDI and those where it does not?

By: thomas yates (nullpointer) 2008-11-27 18:20:52.000-0600

dont know if this will help, but i have the same issue (dahdi doesnt seem to "tick"); i will try and offer what few details i have.

asterisk 1.6.0.1
dahdi-linux 2.0.0
dahdi-tools 2.0.0
libpri 1.4.7
spandsp 0.0.6 (pre 1)

CentOS 5.2 on an IBM 326 server (64 bit dual proc 2 GHZ opteron, dual disk in RAID1 config)

[root@localhost asterisk]# uname -a
Linux localhost.localdomain 2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

i have added these entries to /etc/modprobe.d/dahdi

options dahdi debug=1
options dahdi_dummy debug=2

doing "service dahdi start" and tailing /var/log/messages gives:

Nov 27 13:44:52 localhost kernel: dahdi: Telephony Interface Registered on major 196
Nov 27 13:44:52 localhost kernel: dahdi: Version: 2.0.0
Nov 27 13:44:52 localhost kernel: dahdi: Registered Span 1 ('DAHDI_DUMMY/1') with 0 channels
Nov 27 13:44:52 localhost kernel: dahdi: Span ('DAHDI_DUMMY/1') is new master
Nov 27 13:44:52 localhost kernel: dahdi_dummy: RTC rate is 1024
Nov 27 13:44:52 localhost kernel: dahdi: Registered tone zone 0 (United States / North America)

and no "tick" debug info - no more debug info from dahdi after hours pass


[root@localhost modprobe.d]# lsmod | grep ^dahdi
dahdi_dummy            38984  0
dahdi                 231760  1 dahdi_dummy

[root@localhost modprobe.d]# cat  /proc/dahdi/1
Span 1: DAHDI_DUMMY/1 "DAHDI_DUMMY/1 (source: RTC) 1" (MASTER)


both of these entries are as expected (i assume)

however, the following results in nothing being returned (i am a noob - is this bad?)

strings /lib/modules/2.6.18-92.1.18.el5/dahdi/dahdi.ko | grep source

worth noting i have no telephony hardware installed, so i have followed the advice on this bug: http://bugs.digium.com/view.php?id=13966

i commented out all the modules in /etc/dahdi/modules, and added
noload => codec_dahdi.so

to /etc/asterisk/modules.conf

thanks for all you do - even tho i am a noob, i REALLY appreciate how wonderful asterisk is!



By: Tzafrir Cohen (tzafrir) 2008-11-28 02:03:37.000-0600

The relevant command is:

 strings /lib/modules/2.6.18-92.1.18.el5/dahdi/dahdi_dummy.ko | grep source

If (as in the case of your system) dahdi_dummy is used, the timing source is also printed in the first line of /proc/dahdi/1 (or whatever span number dahdi_dummy gets after it is loaded).

For kernel 2.6.18 it should be RTC.
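
As a rough illustration of reading that first line, the following Python sketch parses a span header of the form quoted earlier in this thread. The function name `parse_span_header` and the regular expression are assumptions inferred from that single sample line; other DAHDI versions may format /proc/dahdi/<span> differently.

```python
import re

# Pattern inferred from the one sample line quoted in this thread;
# it is an assumption, not a documented /proc/dahdi format.
_SPAN_RE = re.compile(
    r'^Span (?P<num>\d+): (?P<name>\S+) "(?P<desc>[^"]*)"(?P<flags>.*)$'
)

def parse_span_header(line):
    """Return span number, name, timing source and master flag, or None."""
    m = _SPAN_RE.match(line.strip())
    if m is None:
        return None
    source = re.search(r'\(source: ([^)]+)\)', m.group('desc'))
    return {
        'span': int(m.group('num')),
        'name': m.group('name'),
        'source': source.group(1) if source else None,
        'master': '(MASTER)' in m.group('flags'),
    }

sample = 'Span 1: DAHDI_DUMMY/1 "DAHDI_DUMMY/1 (source: RTC) 1" (MASTER)'
info = parse_span_header(sample)
print(info)
```

On a live system the first line of /proc/dahdi/1 could be passed in instead of `sample`.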

By: Tzafrir Cohen (tzafrir) 2008-12-28 03:59:56.000-0600

Possibly related: on one Elastix system (kernel 2.6.18-53.1.4.el5, Zaptel 1.4 SVN, Intel ICH7 chipset), ztdummy is loaded by the Zaptel init.d script and provides no timing. After unloading and reloading the module, it suddenly starts to provide timing.

By: yan83330 (yan83330) 2009-01-26 12:14:11.000-0600

Has anyone found a solution to this problem?

By: Shaun Ruffell (sruffell) 2009-01-26 13:17:37.000-0600

yan83330 do you have a system that exhibits this problem reliably even when dahdi_dummy is unloaded and reloaded?

By: yan83330 (yan83330) 2009-01-27 03:51:57.000-0600

On my system, when I run "/etc/init.d/dahdi start" before "/etc/init.d/asterisk start", Asterisk does not start, and I get the bad DAHDI configuration error message.
I have stopped and started Asterisk and DAHDI many times, and I still have the same problem.
nullpointer, do you still have the problem? If not, what should I do to make dahdi_dummy work correctly?
I have installed 3 more Asterisk/DAHDI servers without encountering this problem!
Thanks for your help.

By: thomas yates (nullpointer) 2009-01-27 04:38:18.000-0600

yan83330-
that is exactly what i experience; if i start asterisk without DAHDI running, it works fine. however, if DAHDI is running first, asterisk will not start, and will give the error messages you describe.

also worth noting, if i start asterisk, AND THEN START DAHDI, asterisk will work for about 30 seconds of a call, then will lock up (silence).

tzafrir -
i do still have the problem on my dual proc 64 bit AMD box.

i moved to a (literally) $20 32 bit INTEL dell desktop for development, just to keep moving.

i'm guessing its a hardware issue - i do, however, intend to circle back around, wipe the 64 bit box, and give it another shot to see if i can get it working (maybe i did something dumb in the install process?). i also have to wonder if using an even older SPANDSP might remedy the situation?

thanks for following up though, i do appreciate it.

By: Shaun Ruffell (sruffell) 2009-01-27 13:28:15.000-0600

nullpointer:  If it happens reliably on your dual proc 64 bit AMD box, would you be willing to provide me access to that box via SSH for troubleshooting?  If so, perhaps you can find me on IRC or email me directly to set up a reverse SSH login.

By: Adrien Laurent (adrienlaurent) 2009-02-02 21:41:11.000-0600

Same problem with Asterisk 1.4.23.1, Dahdi 2.1.0.3 on "Dual-Core AMD Opteron(tm) Processor 2212".

It looks to be an AMD-related problem...

By: Shaun Ruffell (sruffell) 2009-02-04 14:49:21.000-0600

I've finally been able to get onto a system that exhibits this problem.  It appears to specifically be a problem with the periodic interrupt from the real time clock.  If you disable the use of the real-time clock by commenting out the line that says "#define USE_RTC", then dahdi_dummy will fall back to the original method of using kernel timers, and tick once again.

What I still cannot answer is why the periodic interrupt stops being called on this system.  But perhaps what is needed is a parameter to allow the user to forcibly use the kernel timers instead of the real-time clock (or the high-res clock, for that matter, if that is what they want to do), and possibly even a check during startup to see whether the driver is ticking at a reasonable rate, falling back automatically to the kernel timers if it is not.
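
The startup check floated above could look roughly like the following userspace Python sketch. It is not driver code and does not reflect any of the attached patches; the function names, the 5% tolerance and the one-second sample window are all hypothetical. The nominal rate of 1024 is taken from the "RTC rate is 1024" log line earlier in the thread.

```python
def tick_rate_ok(ticks_seen, elapsed_seconds, nominal_hz, tolerance=0.05):
    """Return True if the observed tick rate is within tolerance of nominal."""
    if elapsed_seconds <= 0:
        return False
    observed_hz = ticks_seen / elapsed_seconds
    return abs(observed_hz - nominal_hz) <= nominal_hz * tolerance

def choose_timing_source(ticks_seen, elapsed_seconds, nominal_hz):
    """Keep the RTC if it ticks plausibly, otherwise use kernel timers."""
    if tick_rate_ok(ticks_seen, elapsed_seconds, nominal_hz):
        return "RTC"
    return "kernel timers"

# Hypothetical samples taken over a one-second window at startup.
healthy = choose_timing_source(1020, 1.0, nominal_hz=1024)
stalled = choose_timing_source(0, 1.0, nominal_hz=1024)
print(healthy, "/", stalled)
```

The stalled case corresponds to the symptom reported here, where the periodic interrupt is never delivered.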

Although, I would rather know why it is stopping on this system before spending any time working around it.

By: Shaun Ruffell (sruffell) 2009-02-04 14:56:12.000-0600

I uploaded the no_dahdi_dummy_rtc.patch in order to be a little more explicit about the change to get dahdi_dummy.c to tick again.

By: Shaun Ruffell (sruffell) 2009-02-04 15:13:05.000-0600

Although what should probably happen here is to make a change suggested by bmd.  Instead of depending on getting an interrupt 1000 times a second, run a timer as close to 1000 times a second as we can and calculate how many times dahdi_receive and dahdi_transmit should be called based on the processor's timestamp counter.  With CONFIG_HZ values of 250 and greater, this would still make sure that the dahdi_receive function is called 1000 times a second, even if the calls are not evenly distributed at 1ms intervals.
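
The accounting idea can be sketched in userspace Python as follows. This is only a simulation of the arithmetic, not the code from any attached patch: the class name `TickAccumulator` is hypothetical, and plain nanosecond timestamps stand in for whatever clock the driver would actually read.

```python
class TickAccumulator:
    """Convert irregular timer callbacks into a steady count of 1 ms ticks."""

    TICK_NS = 1_000_000  # one DAHDI tick corresponds to 1 ms of real time

    def __init__(self, start_ns):
        self.start_ns = start_ns
        self.ticks_done = 0

    def ticks_due(self, now_ns):
        """Return how many ticks to run now so the total matches elapsed time."""
        total_expected = (now_ns - self.start_ns) // self.TICK_NS
        due = max(0, total_expected - self.ticks_done)
        self.ticks_done += due
        return due

acc = TickAccumulator(start_ns=0)
# Timer callbacks arriving unevenly (timestamps in nanoseconds).
fire_times_ns = [4_000_000, 9_200_000, 12_100_000, 20_000_000]
batches = [acc.ticks_due(t) for t in fire_times_ns]
print(batches)       # [4, 5, 3, 8]
print(sum(batches))  # 20 ticks for 20 ms of elapsed time
```

Each callback would then call the receive and transmit routines `due` times, so a late callback is compensated by a larger batch on the next one and the long-run total tracks elapsed time.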

But then, there is movement to eliminate the need for dahdi_dummy for conferencing and asterisk timing anyway...

By: Rafael Angulo (rafuchoucv) 2009-03-25 16:26:07

I found the problem running AsteriskNOW beta2 1.5.0 with Asterisk 1.4.24 virtualized under Xen 3.2. I think it is not a bug, since a virtualized machine can be expected to have problems with the RTC, but anyway the solution of rebuilding DAHDI with "#define USE_RTC" commented out works just fine.

By: Shaun Ruffell (sruffell) 2009-03-27 18:58:59

I've uploaded no_dahdi_dummy_rtc-2.patch which implements essentially what bmd suggested.  It also completely eliminates any hint of support for the real time clock.  On my test system, over time it even produced more accurate results with timer test than dahdi_dummy which used the real time clock.

Could someone running asterisk under Xen try it out?

By: Shaun Ruffell (sruffell) 2009-03-27 19:07:50

Just uploaded no_dahdi_dummy_rtc-3.patch which removes a potential race condition on unload.

By: Tzafrir Cohen (tzafrir) 2009-03-28 15:10:21

1. Does the patch assume HZ=250? There are actually hosts with a 1000HZ kernel (some even have it especially built for Asterisk/ztdummy)

2. Is Linux26 with HZ=250 good enough?

By: Shaun Ruffell (sruffell) 2009-03-28 15:40:49

This patch doesn't make any assumptions about the HZ.  It will run if HZ is even lower than 100.  

And running the interval every 4ms is good enough I believe.  dahdi_dummy is used when there isn't another span from which to derive timing.  Therefore, it is used to mix channels from user space, which is typically dealing with 20ms audio chunks.  Therefore mixing those channels every 4ms is still a little overkill.  But at least this will still allow the processor to stay idle longer if there aren't any channels to mix.  Probably should be configured to run every HZ/100 (or every 10ms) for even more potential power savings / performance improvements.
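
To make the interval arithmetic concrete, here is a small Python sketch relating the kernel HZ value to the timer interval and to the 20ms audio chunks mentioned above. The helper names are hypothetical, and it assumes one timer firing per kernel tick.

```python
from fractions import Fraction

def timer_interval_ms(hz):
    """Length of one kernel timer tick (jiffy) in milliseconds."""
    return Fraction(1000, hz)

def firings_per_chunk(hz, chunk_ms=20):
    """How many timer firings fall within one audio chunk."""
    return Fraction(chunk_ms) / timer_interval_ms(hz)

for hz in (100, 250, 1000):
    print(hz, float(timer_interval_ms(hz)), float(firings_per_chunk(hz)))
```

At HZ=250 the timer fires every 4ms, which is five times per 20ms chunk; at HZ=100 it fires every 10ms, still twice per chunk.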

By: Shaun Ruffell (sruffell) 2009-03-30 10:43:01

On a poweredge 2600 (with a super long system management interrupt) with CentOS 5.2 and kernel version 2.6.18-92.1.22.el5, I ran the timertest from dahdi-tools with a dahdi_dummy in three configurations.

With USE_RTC commented out, timer test was off by 118ms after running for 97 seconds.
Timer Expired (97118 ms)!

Using the RTC on this system was better, as it was off 57ms after running for 15 minutes.
Timer Expired (925057 ms)!

Using no_dahdi_dummy_rtc-3.patch, it was off 38ms after running for 22 hours.
Timer Expired (81369038 ms)!
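
For comparison, the three results can be put on a common scale as drift in parts per million. The Python sketch below does that from the "Timer Expired" lines quoted above. The function name `timer_drift` is hypothetical, and it assumes the requested test duration was a whole number of seconds, so that the remainder in milliseconds is the drift.

```python
import re

def timer_drift(line):
    """Return (expected_ms, drift_ms, drift_ppm) from a timertest result line."""
    m = re.search(r'Timer Expired \((\d+) ms\)!', line)
    if m is None:
        raise ValueError("not a timertest result line: %r" % line)
    measured_ms = int(m.group(1))
    # Assumption: the requested duration was a whole number of seconds.
    expected_ms = (measured_ms // 1000) * 1000
    if expected_ms == 0:
        raise ValueError("run shorter than one second")
    drift_ms = measured_ms - expected_ms
    drift_ppm = drift_ms / expected_ms * 1_000_000
    return expected_ms, drift_ms, drift_ppm

runs = {
    "USE_RTC commented out": "Timer Expired (97118 ms)!",
    "RTC": "Timer Expired (925057 ms)!",
    "no_dahdi_dummy_rtc-3.patch": "Timer Expired (81369038 ms)!",
}
drifts = {name: timer_drift(line) for name, line in runs.items()}
for name, (expected_ms, drift_ms, ppm) in drifts.items():
    print("%-28s %9d ms  +%d ms  %.2f ppm" % (name, expected_ms, drift_ms, ppm))
```

Under that assumption the three configurations come out at roughly 1200 ppm, 60 ppm and under 1 ppm respectively.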

By: David Woolley (davidw) 2009-03-30 11:24:34

A couple of things to think about.

I've heard that the RTC doesn't co-exist well with HPET timers <http://groups.google.co.uk/group/comp.protocols.time.ntp/msg/dc9c9851d4c5aa50>.

The trend with recent Linux kernels is towards tickless timing, i.e. not to actually run the CTC timer at the nominal rate, but to only interrupt when the kernel thinks there is work to do.  I think it also rounds times so as to consolidate multiple events into one interrupt.  I believe it is optional, but distributors may well default it.

By: Jason Parker (jparker) 2009-04-13 11:01:32

Reporter of DAHLIN-91 has given positive confirmation that no_dahdi_dummy_rtc-3.patch works correctly for him in Xen.

So, +1 from me.

By: Ask Bjørn Hansen (ask) 2009-04-18 21:33:31

We've only done some limited use of it, but the patch continues to work fine.

- ask@develooper.com (reporter of ASTERISK-1472884)

By: David Backeberg (dbackeberg) 2009-04-27 16:39:29

I've been testing this out and it's working well. Using it as the backend for about twenty rooms, about 12 hours a day, about 5 days a week. This fixed our inability to access RTC while dahdi_dummy was loaded.

By: Digium Subversion (svnbot) 2009-04-29 12:48:29

Repository: dahdi
Revision: 6524

U   linux/trunk/drivers/dahdi/dahdi_dummy.c

------------------------------------------------------------------------
r6524 | sruffell | 2009-04-29 12:48:28 -0500 (Wed, 29 Apr 2009) | 12 lines

dahdi_dummy: Remove real-time clock support.

This removes support for using the real-time clock as a timing source in
dahdi_dummy.  Instead, the normal kernel timers method is now more accurate
since it keeps track of how much real time has passed to determine how many
times to call dahdi_receive and dahdi_transmit.  This method was originally
suggested by bmd.

(closes issue DAHLIN-60)
(closes issue DAHLIN-91)
Reported by: tzafrir
Tested by: dbackeberg, ask
------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=6524