Summary: | DAHLIN-00333: dahdi scratchy varying with system load
Reporter: | Thomas B. Clark (tbclark3) | Labels: |
Date Opened: | 2014-01-18 14:23:53.000-0600 | Date Closed: | 2014-04-09 11:25:19
Priority: | Major | Regression?: |
Status: | Closed/Complete | Components: | dahdi (the module)
Versions: | 2.8.0.1 | Frequency of Occurrence: |
Related Issues: |
Environment: | Fedora 20, kernel 3.12.6, Intel Xeon CPU E3-1275, 32 GB RAM, Digium single-span T1 PCI (not PCIx) card | Attachments: |
Description: | At low system load, call quality is so scratchy that it is almost impossible to hear the dial tone. dahdi_test confirms:
{noformat}
8192 samples in 8640.936 system clock sample intervals (94.520%)
8192 samples in 8637.144 system clock sample intervals (94.566%)
8192 samples in 8591.728 system clock sample intervals (95.121%)
8192 samples in 8596.584 system clock sample intervals (95.061%)
8192 samples in 8665.929 system clock sample intervals (94.215%)
--- Results after 7 passes ---
Best: 95.121% -- Worst: 94.215% -- Average: 94.661537%
Cummulative Accuracy (not per pass): 94.662
{noformat}

However, I remembered that on my last server (built with commodity components, including an i5 processor), the same thing would happen before starting MisterHouse, a home automation program that runs continuously. I never did figure out why, but today I wondered if it might be system load. So, repeating dahdi_test after starting boinc-client and bringing the load up to about 50%:

{noformat}
8192 samples in 8232.911 system clock sample intervals (99.501%)
8192 samples in 8518.016 system clock sample intervals (96.020%)
8192 samples in 8202.080 system clock sample intervals (99.877%)
8192 samples in 8539.712 system clock sample intervals (95.755%)
8192 samples in 8191.784 system clock sample intervals (99.997%)
8192 samples in 8558.552 system clock sample intervals (95.525%)
8192 samples in 8213.704 system clock sample intervals (99.735%)
8192 samples in 8443.520 system clock sample intervals (96.930%)^C
--- Results after 8 passes ---
Best: 99.997% -- Worst: 95.525% -- Average: 97.917618%
Cummulative Accuracy (not per pass): 97.918
{noformat}

It's still not usable, but it is much better. Now I can hear the dial tone, but with intermittent scratchiness that corresponds to the obvious cycling in the quality reported by dahdi_test. By varying the load on the system I can change the cumulative accuracy, but it never goes above about 98%.
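A minimal way to reproduce the load dependence without boinc-client, assuming the common stress utility is installed (the worker count, timeout, and pass count here are arbitrary):

{noformat}
# Sketch: measure timing accuracy near-idle, then again under artificial CPU load.
dahdi_test -c 10              # baseline on a lightly loaded system
stress -c 4 --timeout 120 &   # four CPU-bound workers for two minutes
dahdi_test -c 10              # repeat the measurement while the load runs
{noformat}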
Comments: |

By: Shaun Ruffell (sruffell) 2014-01-18 16:39:30.884-0600

Interesting, so on this system increasing the load improves the performance. This sounds like some sort of C-state / powersave feature; i.e., when the CPU thinks the system is under light load, it allows the processor to drop into a powersave mode from which it can't wake up quickly enough. I imagine dmesg is also indicating that there are "hard underruns" from the wcte12xp driver? Are there any BIOS settings related to performance/power or C-states which you can disable to see whether that changes the result?

By: Shaun Ruffell (sruffell) 2014-01-18 16:57:27.508-0600

And just to show that, generally, low system load doesn't correlate with poor dahdi_test results on a TE122:

{noformat}
# dahdi_scan | grep -e ^devicetype
devicetype=Wildcard TE122 (VPMADT032)
# uptime
 21:52:23 up 11:19, 1 user, load average: 0.29, 0.27, 0.11
# dahdi_test -c 10
Opened pseudo dahdi interface, measuring accuracy...
99.983% 99.979% 99.984% 99.984% 99.984% 99.984% 99.984% 99.984% 99.984% 99.983%
--- Results after 10 passes ---
Best: 99.984% -- Worst: 99.979% -- Average: 99.983085%
Cummulative Accuracy (not per pass): 99.983
# uptime
 21:52:51 up 11:20, 1 user, load average: 0.19, 0.25, 0.10
{noformat}

By: Thomas B. Clark (tbclark3) 2014-01-18 20:44:23.588-0600

Thanks Shaun! Good analysis, and you may be right. I just figured out that my fancy new Xeon processor has only two available governors, performance and powersave, and it defaults to powersave. My next window of opportunity will be tomorrow night. I will move the card back into the new server, bind its IRQ to cpu7, change the governor of cpu7 to performance, and limit its idle states. Hopefully that will fix the problem while requiring only one CPU to run full blast.

By: Thomas B. Clark (tbclark3) 2014-01-19 19:24:03.455-0600

After adding the following to the dahdi startup script:

{noformat}
# assign all interrupts from t1xxp to cpu7, and disable idle
IRQ=`grep t1xxp /proc/interrupts | awk '{ print $1 }' | sed 's/://'`
echo t1xxp is running on IRQ $IRQ
echo assigning IRQ $IRQ to cpu7
echo 80 > /proc/irq/$IRQ/smp_affinity   # 0x80 = bitmask for cpu7
echo disabling idle on cpu7
/usr/bin/cpupower -c 7 idle-set -d 5    # disable idle states 5, 4, and 3
/usr/bin/cpupower -c 7 idle-set -d 4
/usr/bin/cpupower -c 7 idle-set -d 3
{noformat}

my dahdi_test output now shows:

{noformat}
8192 samples in 8192.056 system clock sample intervals (99.999%)
8192 samples in 8191.559 system clock sample intervals (99.995%)
8192 samples in 8191.992 system clock sample intervals (100.000%)
8192 samples in 8191.873 system clock sample intervals (99.998%)
8192 samples in 8191.840 system clock sample intervals (99.998%)
8192 samples in 8191.720 system clock sample intervals (99.997%)
8192 samples in 8191.864 system clock sample intervals (99.998%)^C
--- Results after 7 passes ---
Best: 100.000% -- Worst: 99.995% -- Average: 99.997893%
Cummulative Accuracy (not per pass): 99.998
{noformat}

Problem solved. Thanks Shaun!!

By: Shaun Ruffell (sruffell) 2014-01-20 10:35:22.360-0600

Cool! Also, I see that you are using t1xxp, which is probably why you are hitting this harder: the newer cards don't *need* to always be serviced at 1 ms intervals, but the older ones do.

Just taking a note here: I do think there is still an opportunity for improvement. I believe the drivers should register their power-management quality-of-service requirements to prevent users from needing to make the same changes you did.
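Until drivers register that themselves, a userspace process can hold the equivalent request by keeping /dev/cpu_dma_latency open, per the pm_qos interface documented at the link below. A minimal sketch, assuming a bash shell; the 0-microsecond value is illustrative:

{noformat}
# Sketch: hold a cpu_dma_latency request so CPUs stay out of deep C-states.
# The kernel honors the request only while the file descriptor remains open.
exec 3> /dev/cpu_dma_latency
printf '0' >&3        # request 0 us wakeup latency (illustrative value)
# ... run DAHDI/Asterisk while fd 3 stays open ...
exec 3>&-             # closing the descriptor releases the request
{noformat}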
The kernel documentation for this interface is: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/power/pm_qos_interface.txt

By: Shaun Ruffell (sruffell) 2014-05-21 10:12:43.473-0500

Linking to DAHLIN-269. I think these issues might also be related to this recent mailing list discussion: http://thread.gmane.org/gmane.comp.telephony.pbx.asterisk.user/279990
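For reference, a quick way to confirm that affinity and idle-state changes like the ones in this ticket survived a reboot (the IRQ lookup mirrors the startup script above; the sysfs paths assume a cpuidle-enabled kernel):

{noformat}
# Sketch: verify the IRQ affinity and idle-state settings from the fix above.
IRQ=`grep t1xxp /proc/interrupts | awk '{ print $1 }' | sed 's/://'`
cat /proc/irq/$IRQ/smp_affinity                              # expect 80 (0x80 = cpu7)
grep . /sys/devices/system/cpu/cpu7/cpuidle/state*/disable   # 1 = state disabled
{noformat}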