[Home]

Summary:DAHLIN-00204: [patch] Race condition in handling writes to proslic LINEFEED register
Reporter:Alec Davis (alecdavis)Labels:
Date Opened:2010-07-27 06:32:36Date Closed:2010-07-30 13:48:27
Priority:MinorRegression?No
Status:Closed/CompleteComponents:wctdm24xxp
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) wctdm24xxp_fxs_answer.diff.txt
( 1) wctdm24xxp_fxs_answer.diff2.txt
( 2) wctdm24xxp_fxs_answer.diff3.txt
( 3) wctdm24xxp_fxs_answer.diff4.txt
Description:set_lasttxhook_interruptible times out waiting for SLIC_LF_OPPENDING on  fxs->lasttxhook to go.

Even changing the timeout in set_lasttxhook_interruptible to 2 Seconds, the bottom half of <b>if (SLIC_LF_OPPENDING & fxs->lasttxhook)  {</b> randomly get executed.





****** ADDITIONAL INFORMATION ******

This has become apparent with the recent FXS anwser/hangup polarityswitch changes. refer to https://issues.asterisk.org/view.php?id=17318

It can be seen by ringing an FXS port and monitoring the actual line reversal as you answer, some times the line reverses as expected, other time not.

Comments:By: Alec Davis (alecdavis) 2010-07-27 06:38:36

uploaded wctdm24xxp_fxs_answer.diff.txt which has the debug printing.

No fixes yet :(

example: /var/log/kern.log output just as answering the ringing phone.

Jul 27 23:36:04 asterix kernel: [502125.649772] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 0 (00)
Jul 27 23:36:05 asterix kernel: [502126.274793] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 2 (00)
Jul 27 23:36:05 asterix kernel: [502126.674785] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 1 (00)
Jul 27 23:36:05 asterix kernel: [502126.963805] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 2 (00)
Jul 27 23:36:06 asterix kernel: [502127.565772] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 1 (00)
Jul 27 23:36:06 asterix kernel: [502127.600222] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going off hook
Jul 27 23:36:07 asterix kernel: [502128.856725] wctdm24xxp 0000:05:04.0: Channel 3 TIMEOUT: Set Reverse Polarity

By: Alec Davis (alecdavis) 2010-07-28 07:00:30

uploaded wctdm24xxp_fxs_answer.diff2.txt which has minimal debug printing.

Works around the current problem, where a Proslic LF register is not set correctly (or is overwritten by a successive write), by forcibly rewriting it.

In addition I noticed a click after 3 seconds of answering the call on the FXS port. This is the ohttimer expiring, and messing with the line. Now if the ohttimer expires and we are offhook, well just leave the lines as is, now same as the TDM400 (wctdm.c).

debug output:
<b>Jul 28 23:44:08 asterix kernel: [10477.181154] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 1 (00)
Jul 28 23:44:09 asterix kernel: [10477.381995] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 2 (00)
Jul 28 23:44:09 asterix kernel: [10477.433995] wctdm24xxp 0000:05:04.0: SLIC_LF OK: intcount=65536 channel=3 shadow=44 lasttxhook=04</b>
Jul 28 23:44:09 asterix kernel: [10477.784457] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 1 (00)
Jul 28 23:44:09 asterix kernel: [10477.948457] wctdm24xxp 0000:05:04.0: SLIC_LF OK: intcount=66048 channel=3 shadow=22 lasttxhook=02
Jul 28 23:44:11 asterix kernel: [10479.792862] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going off hook
Jul 28 23:44:11 asterix kernel: [10479.792862] wctdm24xxp 0000:05:04.0: Channel 3 OnHookTransfer abort
Jul 28 23:44:11 asterix kernel: [10479.792862] wctdm24xxp 0000:05:04.0: Channel 3 Set Reverse Polarity
Jul 28 23:44:11 asterix kernel: [10479.885550] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going on hook
Jul 28 23:44:11 asterix kernel: [10479.957470] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going off hook
Jul 28 23:44:11 asterix kernel: [10480.072591] wctdm24xxp 0000:05:04.0: SLIC_LF RETRY: intcount=67840 channel=3 shadow=55 lasttxhook=16
Jul 28 23:44:11 asterix kernel: [10480.331673] wctdm24xxp 0000:05:04.0: SLIC_LF OK: intcount=68096 channel=3 shadow=66 lasttxhook=06
Jul 28 23:44:14 asterix kernel: [10483.293787] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going on hook
Jul 28 23:44:15 asterix kernel: [10484.546109] wctdm24xxp 0000:05:04.0: Channel 3 Set Normal Polarity
Jul 28 23:44:15 asterix kernel: [10484.546109] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 0 (00)
Jul 28 23:44:15 asterix kernel: [10484.678337] wctdm24xxp 0000:05:04.0: SLIC_LF OK: intcount=71936 channel=3 shadow=11 lasttxhook=01
Jul 28 23:44:19 asterix kernel: [10488.960417] wctdm24xxp 0000:05:04.0: Setting VMWI on channel 3, messages=0, lrev=0
Jul 28 23:44:19 asterix kernel: [10489.194835] wctdm24xxp 0000:05:04.0: SLIC_LF OK: intcount=76032 channel=3 shadow=11 lasttxhook=01
Jul 28 23:44:19 asterix kernel: [10489.595981] wctdm24xxp 0000:05:04.0: SLIC_LF OK: intcount=76288 channel=3 shadow=22 lasttxhook=02
Jul 28 23:44:22 asterix kernel: [10492.627399] wctdm24xxp 0000:05:04.0: Channel 3 OnHookTransfer stop
Jul 28 23:44:22 asterix kernel: [10492.662574] wctdm24xxp 0000:05:04.0: SLIC_LF OK: intcount=78848 channel=3 shadow=11 lasttxhook=01

The bolded lines above, still shows that the sanity check is missing the odd LF setting.



By: Alec Davis (alecdavis) 2010-07-29 05:18:42

uploaded wctdm24xxp_fxs_answer.diff3.txt
Self contained patch against DAHDI trunk, ignore previous patches, they are all included.

<b><u>Problem:</u></b>
'wctdm_proslic_recheck_sanity()' is executed (4 times??) every ~256ms which monitors and clears the SLIC_LF_OPPENDING on fxs->lasttxhook when the LF state has been reached.
 
'set_lasttxhook_interruptible()' times out after 100ms waiting from the previous write to the SLIC_LF register, by waiting for SLIC_LF_OPPENDING to be cleared, thus the write will fail.

<b><u>Solution:</u></b>
The monitoring and clearing of SLIC_LF_OPPENDING needs to be more frequently than 256ms, like every 1ms.

This patch adds wctdm_proslic_check_oppending which does the monitoring and clearing of SLIC_LF_OPPENDING, which is execute every 1ms. It also if the correct state is not reached within 100ms, resends the lasttxhook state.

<b><u>Testing:</u></b>
Now with a TDM800P, I never get a retry as I was with diff2 patch, and with the additional debug, printing the intcount, can see that the correct LF state is reached within a few ms.

<b><u>debug output:</u></b><pre>[101823.408182] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 2 (04) intcount=92596
[101823.417951] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=24 lasttxhook=04 intcount=92602
[101823.809558] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 1 (02) intcount=92996
[101823.817951] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=42 lasttxhook=02 intcount=93002
[101824.010533] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 2 (04) intcount=93196
[101824.026535] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=14 lasttxhook=04 intcount=93202
[101824.030194] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going off hook
[101824.031335] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 1 (02) intcount=93207
[101824.031335] wctdm24xxp 0000:05:04.0: Channel 3 OnHookTransfer abort
[101824.039336] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=42 lasttxhook=02 intcount=93214
[101824.044172] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going on hook
[101824.048170] wctdm24xxp 0000:05:04.0: Channel 3 Set Reverse Polarity
[101824.052172] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=66 lasttxhook=06 intcount=93226
[101824.196034] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going off hook
[101864.205656] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going on hook
[101865.481320] wctdm24xxp 0000:05:04.0: Channel 3 Set Normal Polarity
[101865.485320] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 0 (01) intcount=129293
[101865.493326] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=11 lasttxhook=01 intcount=129298
[101871.233452] wctdm24xxp 0000:05:04.0: Setting VMWI on channel 3, messages=0, lrev=0
[101871.233452] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=11 lasttxhook=01 intcount=133309
[101871.551413] wctdm24xxp 0000:05:04.0: Channel 3 OnHookTransfer start
[101871.559973] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=22 lasttxhook=02 intcount=133322
[101874.666063] wctdm24xxp 0000:05:04.0: Channel 3 OnHookTransfer stop
[101874.692390] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=11 lasttxhook=01 intcount=136314
</pre>



By: Shaun Ruffell (sruffell) 2010-07-29 12:58:34

Cool.  Patch passes my tests here as well.  One thing I did notice is that the CPU overhead for the interrupt handler increased by 1.5% (add about 1.5us to the processing time) on my debug kernel.  Probably means on a production kernel it wouldn't be that large of a change, but could the OPPENDING bit be checked at a slower rate than every millisecond?  But otherwise I think the checkpatch.pl errors could be cleaned up and it committed.

By: Alec Davis (alecdavis) 2010-07-29 13:18:28

[101824.010533] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 2 (04) intcount=93196
[101824.026535] wctdm24xxp 0000:05:04.0: SLIC_LF OK: card=3 shadow=14 lasttxhook=04 intcount=93202
[101824.030194] wctdm24xxp 0000:05:04.0: wctdm: Card 3 Going off hook
[101824.031335] wctdm24xxp 0000:05:04.0: Setting FXS hook state to 1 (02) intcount=93207

The above SLIC_LF times between Setting from 2 to 1 when going off hook, is 93196 -> 93207 = 11ms, so I'd think there's not too much room to increase.

I was however thinking, if the 1ms hook detector ran on 'odd' intcounts, and the oppending check ran on 'even' intcounts, the interrupt load would be distrubuted.

By: Digium Subversion (svnbot) 2010-07-30 13:48:26

Repository: dahdi
Revision: 9063

U   linux/trunk/drivers/dahdi/wctdm24xxp/base.c
U   linux/trunk/drivers/dahdi/wctdm24xxp/wctdm24xxp.h

------------------------------------------------------------------------
r9063 | alecdavis | 2010-07-30 13:48:26 -0500 (Fri, 30 Jul 2010) | 19 lines

SLIC_LF_OPPENDING confict between wctdm_proslic_recheck_sanity() and set_lasttxhook_interruptible()

Problem:

'wctdm_proslic_recheck_sanity()' is executed (4 times??) every ~256ms which monitors and clears the SLIC_LF_OPPENDING on fxs->lasttxhook when the LF state has been reached.
 
'set_lasttxhook_interruptible()' times out after 100ms waiting from the previous write to the SLIC_LF register, by waiting for SLIC_LF_OPPENDING to be cleared, thus the write will fail.

Solution:
adds wctdm_proslic_check_oppending which does the monitoring and clearing of SLIC_LF_OPPENDING, which is execute every 1ms. It also if the correct state is not reached within 100ms, resends the lasttxhook state.


(closes issue DAHLIN-204)
Reported by: alecdavis
Patches:
     wctdm24xxp_fxs_answer.diff4.txt uploaded by alecdavis (license 585)
Tested by: alecdavis, sruffell


------------------------------------------------------------------------

http://svn.digium.com/view/dahdi?view=rev&revision=9063