[Home]

Summary:DAHLIN-00188: VPMADT032 is crashing after upgarde to 2.3.0
Reporter:Isaac Gal (isaacgal)Labels:
Date Opened:2010-05-05 03:09:37Date Closed:2010-07-25 19:30:45
Priority:MajorRegression?No
Status:Closed/CompleteComponents:wcte12xp
Versions:2.3.0 Frequency of
Occurrence
Related
Issues:
Environment:Attachments:
Description:just finished upgrading from 2.1.04 to 2.3.0 and calls strated to drop from time to time.

I noticed the following in the console log:

wcte12xp 0000:02:08.0: VPMADT032 is reenabled.
wcte12xp 0000:02:08.0: VPMADT032 is non-responsive.  Resetting.
wcte12xp 0000:02:08.0: VPM present and operational (Firmware version 120)
wcte12xp 0000:02:08.0: Error pinging DSP (2)
wcte12xp 0000:02:08.0: Failed to configure the ports.  Retrying.
wcte12xp 0000:02:08.0: VPM present and operational (Firmware version 120)
wcte12xp 0000:02:08.0: VPMADT032 is reenabled.
wcte12xp 0000:02:08.0: VPMADT032 is non-responsive.  Resetting.
wcte12xp 0000:02:08.0: VPM present and operational (Firmware version 120)
wcte12xp 0000:02:08.0: VPMADT032 is reenabled.
wcte12xp 0000:02:08.0: VPMADT032 is non-responsive.  Resetting.
wcte12xp 0000:02:08.0: VPM present and operational (Firmware version 120)
wcte12xp 0000:02:08.0: VPMADT032 is reenabled.
wcte12xp 0000:02:08.0: VPMADT032 is non-responsive.  Resetting.
wcte12xp 0000:02:08.0: VPM present and operational (Firmware version 120)
wcte12xp 0000:02:08.0: VPMADT032 is reenabled.
wcte12xp 0000:02:08.0: VPMADT032 is non-responsive.  Resetting.
wcte12xp 0000:02:08.0: VPM present and operational (Firmware version 120)
wcte12xp 0000:02:08.0: VPMADT032 is reenabled.
wcte12xp 0000:02:08.0: VPMADT032 is non-responsive.  Resetting.
wcte12xp 0000:02:08.0: VPM present and operational (Firmware version 120)
wcte12xp 0000:02:08.0: VPMADT032 is reenabled.
wcte12xp 0000:02:08.0: VPMADT032 is non-responsive.  Resetting.

i guess there is a bug in the 2.3.0 and VPMADT032 that causes it because going back to 2.2.0.2 and this issue doesnt happen anymore eventhough new issues arrived with faxing and echo cancellation, but atleast calls are not dropping anymore.

the VPMADT032 firmware is 120
asterisk 1.4.30

Comments:By: Isaac Gal (isaacgal) 2010-05-05 07:17:35

i also noticed when vpmsupport=0 on dahdi 2.3 issue doesnt happen.
so we can narrow this down to the vpm module support in dahdi 2.3

vpm module totaly functioning on dahdi 2.2.0.2 so this isnt a bad hardware case.


regards

By: grecco (grecco) 2010-05-05 08:47:05

I'm having the same messages after upgrade from version dahdi-linux-2.2.1 ===> dahdi-linux-2.3.0

By: Shaun Ruffell (sruffell) 2010-05-05 11:50:18

I believe I have code in trunk intended to address this. The commits are:

http://svn.asterisk.org/view/dahdi?view=revision&revision=8576

and

http://svn.asterisk.org/view/dahdi?view=revision&revision=8560


Would you be willing to update to the current trunk of dahdi-linux and report your results?

By: Kerin Millar (kerframil) 2010-05-07 15:04:45

I'm experiencing this problem too. I've applied commits 8560, 8575 (as applicable to "drivers/dahdi/voicebus/GpakCust.c") and 8576 to a stock dahdi-2.3.0 base for testing and will report back early next week.

In the meantime, I just wanted to point out something that I noticed in the kernel ring buffer after rebooting:

[    9.248084] wcte12xp 0000:03:02.0: PCI INT A -> GSI 35 (level, low) -> IRQ 35
[   10.537270] wcte12xp 0000:03:02.0: Setting up global serial parameters for E1
[   10.537760] wcte12xp 0000:03:02.0: Found a Wildcard TE122
[   25.379214] wcte12xp 0000:03:02.0: Missed interrupt. Increasing latency to 9 ms in order to compensate.
[   25.555003] wcte12xp 0000:03:02.0: Missed interrupt. Increasing latency to 15 ms in order to compensate.
[   25.693790] wcte12xp 0000:03:02.0: Missed interrupt. Increasing latency to 21 ms in order to compensate.
[   25.696741] wcte12xp 0000:03:02.0: Host failed to service card interrupt within 128 ms which is a hardunderun.
[   25.733952] wcte12xp 0000:03:02.0: Host failed to service card interrupt within 128 ms which is a hardunderun.
[   32.206230] wcte12xp 0000:03:02.0: Booting VPMADT032
[   32.561291] wcte12xp 0000:03:02.0: VPM present and operational (Firmware version 120)
[   55.362964] wcte12xp 0000:03:02.0: Span configured for CCS/HDB3/CRC4

The messages with the timestamps 25.696741 and 25.733952 complain of failing to service a card interrupt - a message which I have never seen before. By contrast, here's an excerpt from the prior boot:

[    9.221907] wcte12xp 0000:03:02.0: PCI INT A -> GSI 35 (level, low) -> IRQ 35
[   10.526093] wcte12xp 0000:03:02.0: Setting up global serial parameters for E1
[   10.528207] wcte12xp 0000:03:02.0: Found a Wildcard TE122
[   25.154052] wcte12xp 0000:03:02.0: Missed interrupt. Increasing latency to 9 ms in order to compensate.
[   25.314814] wcte12xp 0000:03:02.0: Missed interrupt. Increasing latency to 15 ms in order to compensate.
[   25.430977] wcte12xp 0000:03:02.0: Missed interrupt. Increasing latency to 21 ms in order to compensate.
[   31.926671] wcte12xp 0000:03:02.0: Booting VPMADT032
[   32.281685] wcte12xp 0000:03:02.0: VPM present and operational (Firmware version 120)

The only change made was the addition of the aforementioned three patches. I don't know if this is significant, but figured that I should mention it.

By: Kerin Millar (kerframil) 2010-05-10 14:32:50

One working day elapsed and over 449 calls outbound alone, notching up almost 12 hours of call time. At this point, I would have expected the "VPMADT032 is non-responsive" complaint to have arisen up to maybe a dozen times but it has not. Tentatively, I'd say that the aformentioned patches seem to have addressed the bug. The only thing that seems odd compared to the unpatched instance is the hardunderrun warning that occurs during initialisation, as mentioned in the previous comment.

By: Tom Chadwin (tomchadwin) 2010-05-17 10:05:58

I have two 2.3.0 boxes. On one, I have:

dahdi: Telephony Interface Registered on major 196
dahdi: Version: 2.3.0
wctdm24xxp 0000:00:0e.0: Port 1: Installed -- AUTO FXO (FCC mode)
wctdm24xxp 0000:00:0e.0: Port 2: Not installed
wctdm24xxp 0000:00:0e.0: Port 3: Not installed
wctdm24xxp 0000:00:0e.0: Port 4: Not installed
wctdm24xxp 0000:00:0e.0: VPM100: Not Present
wctdm24xxp 0000:00:0e.0: Host failed to service card interrupt within 128 ms which is a hardunderun.
wctdm24xxp 0000:00:0e.0: Booting VPMADT032
wctdm24xxp 0000:00:0e.0: VPM present and operational (Firmware version 120)
wctdm24xxp 0000:00:0e.0: Found a Wildcard TDM: Wildcard TDM410P (0 digital modules, 1 analog module)
wctdm24xxp 0000:00:0e.0: Missed interrupt. Increasing latency to 5 ms in order to compensate.
wctdm24xxp 0000:00:0e.0: Missed interrupt. Increasing latency to 11 ms in order to compensate.
wctdm24xxp 0000:00:0e.0: Missed interrupt. Increasing latency to 17 ms in order to compensate.
wctdm24xxp 0000:00:0e.0: Missed interrupt. Increasing latency to 23 ms in order to compensate.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
dahdi: Registered tone zone 4 (United Kingdom)
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.

This seems similar to my previously working 2.2.0.2 (2.2.1 crashed Asterisk in my set-up). The other box, however, has this:

dahdi: Telephony Interface Registered on major 196
dahdi: Version: 2.3.0
wctdm24xxp 0000:00:0e.0: Port 1: Installed -- AUTO FXO (FCC mode)
wctdm24xxp 0000:00:0e.0: Port 2: Installed -- AUTO FXO (FCC mode)
wctdm24xxp 0000:00:0e.0: Port 3: Not installed
wctdm24xxp 0000:00:0e.0: Port 4: Not installed
wctdm24xxp 0000:00:0e.0: VPM100: Not Present
wctdm24xxp 0000:00:0e.0: Host failed to service card interrupt within 128 ms which is a hardunderun.
wctdm24xxp 0000:00:0e.0: Booting VPMADT032
wctdm24xxp 0000:00:0e.0: VPMADT032 Failed! Unable to ping the DSP (2)!
wctdm24xxp 0000:00:0e.0: Found a Wildcard TDM: Wildcard TDM410P (0 digital modules, 2 analog modules)
wctdm24xxp 0000:00:0e.0: Missed interrupt. Increasing latency to 5 ms in order to compensate.
wctdm24xxp 0000:00:0e.0: Host failed to service card interrupt within 128 ms which is a hardunderun.
dahdi_echocan_mg2: Registered echo canceler 'MG2'
wctdm24xxp 0000:00:0e.0: Host failed to service card interrupt within 128 ms which is a hardunderun.
dahdi: Registered tone zone 4 (United Kingdom)
eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
wctdm24xxp 0000:00:0e.0: Missed interrupt. Increasing latency to 11 ms in order to compensate.
wctdm24xxp 0000:00:0e.0: Missed interrupt. Increasing latency to 17 ms in order to compensate.
wctdm24xxp 0000:00:0e.0: Missed interrupt. Increasing latency to 23 ms in order to compensate.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.
wctdm24xxp 0000:00:0e.0: ERROR: Unable to service card within 25 ms and unable to further increase latency.

The VPMADT032 fails to boot. Ignore the MG2 line - trying software e/c on this box, as I now have terrible echo without the hw e/c.

Is this related? Or do I just have one duff VPMADT032?

By: Shaun Ruffell (sruffell) 2010-05-21 13:03:54

tomchadwin:  That may be related.  I would update to trunk and still see if you have a problem.

However, I would be concerned about the latency growing to 25ms.  Typically, that means something on your system is holding off interrupts (slow disk controller, poorly implemented frame buffer, real slow serial console, etc..) and will all cause data integrity problems on the channel.

By: Alec Davis (alecdavis) 2010-06-16 04:52:18

I'm sure it's not the VPMADT032 module, as I've removed mine and still get the missed interrupts - 100 seconds after boot.

Hardware is a TE122 and a TDM800P.

Jun 16 21:42:41 asterix kernel: [   30.459448] dahdi: Telephony Interface Registered on major 196
Jun 16 21:42:41 asterix kernel: [   30.459448] dahdi: Version: SVN-trunk-r8762
Jun 16 21:42:41 asterix kernel: [   30.511672] ACPI: PCI Interrupt 0000:05:09.0[A] -> GSI 18 (level, low) -> IRQ 18
Jun 16 21:42:43 asterix kernel: [   32.981678] wcte12xp 0000:05:09.0: Setting up global serial parameters for E1
Jun 16 21:42:43 asterix kernel: [   32.982286] wcte12xp 0000:05:09.0: Found a Wildcard TE122
Jun 16 21:42:43 asterix kernel: [   33.043783] ACPI: PCI Interrupt 0000:05:04.0[A] -> GSI 16 (level, low) -> IRQ 16
Jun 16 21:42:46 asterix kernel: [   36.725403] wctdm24xxp 0000:05:04.0: Boosting ringer on slot 1 (89V peak)
Jun 16 21:42:46 asterix kernel: [   36.725462] wctdm24xxp 0000:05:04.0: Port 1: Installed -- AUTO FXS/DPO
Jun 16 21:42:48 asterix kernel: [   38.932572] wctdm24xxp 0000:05:04.0: Boosting ringer on slot 2 (89V peak)
Jun 16 21:42:48 asterix kernel: [   38.932572] wctdm24xxp 0000:05:04.0: Port 2: Installed -- AUTO FXS/DPO
Jun 16 21:42:50 asterix kernel: [   40.982725] wctdm24xxp 0000:05:04.0: Boosting ringer on slot 3 (89V peak)
Jun 16 21:42:50 asterix kernel: [   40.982725] wctdm24xxp 0000:05:04.0: Port 3: Installed -- AUTO FXS/DPO
Jun 16 21:42:51 asterix kernel: [   43.009703] wctdm24xxp 0000:05:04.0: Boosting ringer on slot 4 (89V peak)
Jun 16 21:42:51 asterix kernel: [   43.009703] wctdm24xxp 0000:05:04.0: Port 4: Installed -- AUTO FXS/DPO
Jun 16 21:42:52 asterix kernel: [   43.950906] wctdm24xxp 0000:05:04.0: Port 5: Installed -- AUTO FXO (FCC mode)
Jun 16 21:42:53 asterix kernel: [   44.771029] wctdm24xxp 0000:05:04.0: Port 6: Installed -- AUTO FXO (FCC mode)
Jun 16 21:42:53 asterix kernel: [   45.075401] wctdm24xxp 0000:05:04.0: Port 7: Not installed
Jun 16 21:42:53 asterix kernel: [   45.375994] wctdm24xxp 0000:05:04.0: Port 8: Not installed
Jun 16 21:42:53 asterix kernel: [   45.375994] wctdm24xxp 0000:05:04.0: ALAW override parameter detected.  Device will be operating in ALAW
Jun 16 21:42:53 asterix kernel: [   45.379994] wctdm24xxp 0000:05:04.0: VPM100: Not Present
Jun 16 21:42:53 asterix kernel: [   45.385776] wctdm24xxp 0000:05:04.0: Found a Wildcard TDM: Wildcard TDM800P (0 digital modules, 6 analog mo                          dules)
Jun 16 21:42:53 asterix kernel: [   45.445176] dahdi: Registered tone zone 10 (New Zealand)
Jun 16 21:42:53 asterix kernel: [   45.449910] wcte12xp 0000:05:09.0: Span configured for CCS/HDB3/CRC4
Jun 16 21:43:48 asterix kernel: [  100.616182] wctdm24xxp 0000:05:04.0: Missed interrupt. Increasing latency to 5 ms in order to compensate.
Jun 16 21:44:03 asterix kernel: [  115.958815] wctdm24xxp 0000:05:04.0: Missed interrupt. Increasing latency to 11 ms in order to compensate.
Jun 16 21:44:03 asterix kernel: [  115.958815] wcte12xp 0000:05:09.0: Missed interrupt. Increasing latency to 9 ms in order to compensate.
Jun 16 21:44:03 asterix kernel: [  115.958815] wctdm24xxp 0000:05:04.0: Missed interrupt. Increasing latency to 16 ms in order to compensate.
Jun 16 21:44:03 asterix kernel: [  115.958815] wcte12xp 0000:05:09.0: Missed interrupt. Increasing latency to 15 ms in order to compensate.
Jun 16 21:44:06 asterix kernel: [  120.414604] wctdm24xxp 0000:05:04.0: Missed interrupt. Increasing latency to 19 ms in order to compensate.
Jun 16 21:44:06 asterix kernel: [  120.414604] wcte12xp 0000:05:09.0: Missed interrupt. Increasing latency to 19 ms in order to compensate.
Jun 16 21:44:06 asterix kernel: [  120.430858] wctdm24xxp 0000:05:04.0: ERROR: Unable to service card within 25 ms and unable to further incre                          ase latency.
Jun 16 21:44:06 asterix kernel: [  120.431124] wcte12xp 0000:05:09.0: ERROR: Unable to service card within 25 ms and unable to further increas                          e latency.
Jun 16 21:44:12 asterix kernel: [  126.476151] wcte12xp 0000:05:09.0: ERROR: Unable to service card within 25 ms and unable to further increas                          e latency.
Jun 16 21:44:12 asterix kernel: [  126.476366] wctdm24xxp 0000:05:04.0: ERROR: Unable to service card within 25 ms and unable to further incre                          ase latency.
Jun 16 21:44:12 asterix kernel: [  126.490280] wcte12xp 0000:05:09.0: ERROR: Unable to service card within 25 ms and unable to further increas                          e latency.

By: Alec Davis (alecdavis) 2010-06-16 05:00:08

Previously I'd swapped the TE122 for a TE110P (just to keep configurations happy), and left the VPMADT032 on the TDM800P, all was well, never failed.

So have TE110P + TDM800P (with VPMADT032)

dmesg and /var/log showed no problems, had been running for 29 hours, before reload.

Jun 16 21:29:28 asterix kernel: [107407.150323] dahdi: Telephony Interface Registered on major 196
Jun 16 21:29:28 asterix kernel: [107407.150323] dahdi: Version: SVN-trunk-r8762
Jun 16 21:29:28 asterix kernel: [107407.172208] FALC version: 00000000
Jun 16 21:29:28 asterix kernel: [107407.172275] TE110P: Setting up global serial parameters for E1 FALC V1.2
Jun 16 21:29:28 asterix kernel: [107407.172332] TE110P: Successfully initialized serial bus for card
Jun 16 21:29:28 asterix kernel: [107407.172957] Found a Wildcard: Digium Wildcard TE110P T1/E1
Jun 16 21:29:28 asterix kernel: [107407.187556] ACPI: PCI Interrupt 0000:05:04.0[A] -> GSI 16 (level, low) ->                                  IRQ 16
Jun 16 21:29:30 asterix kernel: [107409.315987] wctdm24xxp 0000:05:04.0: Boosting ringer on slot 1 (89V peak)
Jun 16 21:29:30 asterix kernel: [107409.316016] wctdm24xxp 0000:05:04.0: Port 1: Installed -- AUTO FXS/DPO
Jun 16 21:29:31 asterix kernel: [107411.059664] wctdm24xxp 0000:05:04.0: Boosting ringer on slot 2 (89V peak)
Jun 16 21:29:31 asterix kernel: [107411.059692] wctdm24xxp 0000:05:04.0: Port 2: Installed -- AUTO FXS/DPO
Jun 16 21:29:33 asterix kernel: [107412.802708] wctdm24xxp 0000:05:04.0: Boosting ringer on slot 3 (89V peak)
Jun 16 21:29:33 asterix kernel: [107412.802737] wctdm24xxp 0000:05:04.0: Port 3: Installed -- AUTO FXS/DPO
Jun 16 21:29:35 asterix kernel: [107414.543050] wctdm24xxp 0000:05:04.0: Boosting ringer on slot 4 (89V peak)
Jun 16 21:29:35 asterix kernel: [107414.543079] wctdm24xxp 0000:05:04.0: Port 4: Installed -- AUTO FXS/DPO
Jun 16 21:29:35 asterix kernel: [107415.062874] wctdm24xxp 0000:05:04.0: Port 5: Installed -- AUTO FXO (FCC m                                 ode)
Jun 16 21:29:36 asterix kernel: [107415.884411] wctdm24xxp 0000:05:04.0: Port 6: Installed -- AUTO FXO (FCC m                                 ode)
Jun 16 21:29:36 asterix kernel: [107416.184662] wctdm24xxp 0000:05:04.0: Port 7: Not installed
Jun 16 21:29:37 asterix kernel: [107416.484663] wctdm24xxp 0000:05:04.0: Port 8: Not installed
Jun 16 21:29:37 asterix kernel: [107416.484663] wctdm24xxp 0000:05:04.0: ALAW override parameter detected.  D                                 evice will be operating in ALAW
Jun 16 21:29:37 asterix kernel: [107416.488663] wctdm24xxp 0000:05:04.0: VPM100: Not Present
Jun 16 21:29:42 asterix kernel: [107421.438156] wctdm24xxp 0000:05:04.0: Booting VPMADT032
Jun 16 21:29:42 asterix kernel: [107421.503174] wctdm24xxp 0000:05:04.0: VPM present and operational (Firmwar                                 e version 120)
Jun 16 21:29:44 asterix kernel: [107424.135991] wctdm24xxp 0000:05:04.0: Found a Wildcard TDM: Wildcard TDM80                                 0P (0 digital modules, 6 analog modules)
Jun 16 21:29:44 asterix kernel: [107424.144000] dahdi: Registered tone zone 10 (New Zealand)
Jun 16 21:29:44 asterix kernel: [107424.144000] TE110P: Span configured for CCS/HDB3/CRC4

By: Shaun Ruffell (sruffell) 2010-06-23 15:42:23

alecdavis:  In light of the fact that you could just swap out the TE122 for a TE110, this patch doesn't make sense.  But I did try to reproduce what you reported here without luck.

Just for a data point, could you try the http://svn.asterisk.org/svn/dahdi/linux/team/sruffell/dahdi-linux-wcte12xp-latency branch and see if you still have the issues with latency growing?

By: Alec Davis (alecdavis) 2010-06-23 21:42:46

using the branch specified in ~123800 I am now not seeing the missed interrupt's.
As the behaviour of the voicebus interrupt is like it was in dahdi 2.2, all processing done at interrupt time, not deffered to a tasklet.

Still, have both TE122 and TDM800P (with 1x 4port FXS, 2x 1port FXO's and VPMADT032) installed.



By: Digium Subversion (svnbot) 2010-07-25 19:30:43

Repository: dahdi
Revision: 8982

U   linux/trunk/drivers/dahdi/voicebus/voicebus.c
U   linux/trunk/drivers/dahdi/voicebus/voicebus.h

------------------------------------------------------------------------
r8982 | sruffell | 2010-07-25 19:30:42 -0500 (Sun, 25 Jul 2010) | 14 lines

wcte12xp, wctdm24xxp: Return buffer processing to interrupt handler.

In revision 8095, I had moved most of the buffer processing out of the
interrupt handler and into a tasklet. The intended result was to enable
multiple cards to interleave with one another.  But once again I was
bitten by the fact that there are some systems that for one reason or
another do not process their tasklets in a timely enough manner for the
real-time nature of TDM processing.  This commit moves this processing
back into the interrupt handler by default.  It also limits the number
of frames that the interrupt handler will process at any given time
which appears to achieve the same intended result.

(closes issue DAHLIN-188)
Tested by: alecdavis
------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=8982