[Home]

Summary:DAHLIN-00241: Tone Dial Fails After Sleep
Reporter:Vladimir Mikhelson (vmikhelson)Labels:
Date Opened:2011-05-13 23:02:49Date Closed:2019-05-31 09:57:02
Priority:MajorRegression?No
Status:Closed/CompleteComponents:wctdm wctdm24xxp
Versions:2.4.1.2 2.6.0 Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 0001-Lock-all-memory-pages-to-prevent-delays-on-incoming-.patch
( 1) 0001-Lock-all-pages-when-running-with-elevated-priorities.patch
Description:After an analog phone was not used for a while first number dialed fails due to incorrect DTMF detection. If the same number is redialed immediately after the failure the call goes through with no problems.

Tested on Panasonic KX-T3175 and on Panasonic KX-TGA4000.

Asterisk 1.8.4, 1.8.11.0
FreePBX 2.9.0.1, 2.10.0.4


****** ADDITIONAL INFORMATION ******

The problem started manifesting itself at the time I switched to Asterisk 1.8.x. Never had this problem with all versions of Asterisk 1.6.x.

I observe some 'A' and 'D' DTMF detections in the logs.  I decided to show the following one as it was detected in the close proximity of the DTMF detection failure:
[May 13 21:28:37] DTMF[7712] channel.c: DTMF begin 'A' received on DAHDI/2-1
[May 13 21:28:37] DTMF[7712] channel.c: DTMF begin passthrough 'A' on DAHDI/2-1
[May 13 21:28:37] DTMF[7712] channel.c: DTMF end 'A' received on DAHDI/2-1, duration 0 ms
[May 13 21:28:37] DTMF[7712] channel.c: DTMF end accepted with begin 'A' on DAHDI/2-1
[May 13 21:28:37] DTMF[7712] channel.c: DTMF end passthrough 'A' on DAHDI/2-1

First attempt to dial "432":
[May 13 21:47:17] DTMF[10865] channel.c: DTMF begin '4' received on DAHDI/5-1
[May 13 21:47:17] DTMF[10865] channel.c: DTMF begin ignored '4' on DAHDI/5-1
[May 13 21:47:17] DTMF[10865] channel.c: DTMF end '4' received on DAHDI/5-1, duration 89 ms
[May 13 21:47:17] DTMF[10865] channel.c: DTMF end passthrough '4' on DAHDI/5-1

Second attempt to dial "432"
[May 13 21:48:28] DTMF[10948] channel.c: DTMF begin '4' received on DAHDI/5-1
[May 13 21:48:28] DTMF[10948] channel.c: DTMF begin ignored '4' on DAHDI/5-1
[May 13 21:48:28] DTMF[10948] channel.c: DTMF end '4' received on DAHDI/5-1, duration 89 ms
[May 13 21:48:28] DTMF[10948] channel.c: DTMF end passthrough '4' on DAHDI/5-1
[May 13 21:48:29] DTMF[10948] channel.c: DTMF begin '3' received on DAHDI/5-1
[May 13 21:48:29] DTMF[10948] channel.c: DTMF begin ignored '3' on DAHDI/5-1
[May 13 21:48:29] DTMF[10948] channel.c: DTMF end '3' received on DAHDI/5-1, duration 89 ms
[May 13 21:48:29] DTMF[10948] channel.c: DTMF end passthrough '3' on DAHDI/5-1
[May 13 21:48:29] DTMF[10948] channel.c: DTMF begin '2' received on DAHDI/5-1
[May 13 21:48:29] DTMF[10948] channel.c: DTMF begin ignored '2' on DAHDI/5-1
[May 13 21:48:29] DTMF[10948] channel.c: DTMF end '2' received on DAHDI/5-1, duration 89 ms
[May 13 21:48:29] DTMF[10948] channel.c: DTMF end passthrough '2' on DAHDI/5-1



[root@pbx 1.8.4]# dahdi_hardware
pci:0000:00:0e.0     wctdm+       e159:0001 Wildcard TDM400P REV I
pci:0000:00:10.0     wctdm24xxp+  d161:8005 Wildcard TDM410P

[root@pbx 1.8.4]# dahdi_scan
[1]
active=yes
alarms=OK
description=Wildcard TDM410P Board 1
name=WCTDM/0
manufacturer=Digium
devicetype=Wildcard TDM410P
location=PCI Bus 00 Slot 17
basechan=1
totchans=4
irq=9
type=analog
port=1,FXO
port=2,FXO
port=3,FXO
port=4,none
[2]
active=yes
alarms=OK
description=Wildcard TDM400P REV I Board 5
name=WCTDM/4
manufacturer=Digium
devicetype=Wildcard TDM400P REV I
location=PCI Bus 00 Slot 15
basechan=5
totchans=4
irq=10
type=analog
port=5,FXS
port=6,FXS
port=7,FXS
port=8,FXS
Comments:By: Shaun Ruffell (sruffell) 2012-04-04 12:55:17.631-0500

I'm sweeping through the old issues. I believe this is not a problem in the driver, but is most likely an issue with the server needing to page back in code to handle to handle the FXS port up in user space.

If there is still an issue or you believe I'm mistaken, please feel free to reopen.

By: Vladimir Mikhelson (vmikhelson) 2012-04-04 18:19:45.866-0500

Shaun,

It is still an issue.  I have tried multiple workaround and troubleshooting steps to no avail.

Often times multiple tones are being skipped.  The problem is intermittent though.

If your theory with paging is correct is there any way to prevent a specific "priority" module from being swapped out of RAM?

On the other hand I do not believe swapping is the cause as I sometimes observe initial DTMF tones being processed, then one, two, or three missing, then the rest of the number dialed being processed properly.

Thank you,
Vladimir


By: Shaun Ruffell (sruffell) 2012-04-05 00:33:23.723-0500

I attached a patch against the current head of Asterisk 1.8, 0001-Lock-all-memory-pages-to-prevent-delays-on-incoming-.patch, but I think you should be able to do something similar on any version.

You can apply it against a checkout of 1.8 like:
{code}
# curl "https://issues.asterisk.org/jira/secure/attachment/43151/0001-Lock-all-memory-pages-to-prevent-delays-on-incoming-.patch" | patch -p1
{code}

It makes sense that you might only get the first digit and then miss some other digits if asterisk, after receiving and decoding the first digit, starts paging in other applications or modules in order to run the dialplan.

Also, make sure you're running asterisk with the -p flag.

By: Vladimir Mikhelson (vmikhelson) 2012-04-05 01:23:26.063-0500

Shaun,

I have applied the patch you provided.

I will first test without adding the "-p" parameter to the asterisk as the system is low on resources and I am concerned about possible lock-ups.  Immediately after restart the memory usage is 91%, swap usage is 26%.  Both values are higher than normal.

Please let me know whether having the "-p" in place is critical for a successful testing.

Thank you,
Vladimir


By: Vladimir Mikhelson (vmikhelson) 2012-04-05 10:46:23.644-0500

Shaun,

Unfortunately negative.

The first outbound call placed today failed. The number dialed was 1-847-849-2100.  Following is the excerpt from the full log.

{noformat}
[2012-04-05 09:27:44] VERBOSE[24092] sig_analog.c:     -- Starting simple switch on 'DAHDI/5-1'
[2012-04-05 09:27:51] VERBOSE[24092] pbx.c:     -- Executing [148492100@from-internal:1] ResetCDR("DAHDI/5-1", "") in new stack
[2012-04-05 09:27:51] VERBOSE[24092] pbx.c:     -- Executing [148492100@from-internal:2] NoCDR("DAHDI/5-1", "") in new stack
[2012-04-05 09:27:51] VERBOSE[24092] pbx.c:     -- Executing [148492100@from-internal:3] Progress("DAHDI/5-1", "") in new stack
[2012-04-05 09:27:51] VERBOSE[24092] pbx.c:     -- Executing [148492100@from-internal:4] Wait("DAHDI/5-1", "1") in new stack
[2012-04-05 09:27:52] VERBOSE[24092] pbx.c:     -- Executing [148492100@from-internal:5] Progress("DAHDI/5-1", "") in new stack
[2012-04-05 09:27:52] VERBOSE[24092] pbx.c:     -- Executing [148492100@from-internal:6] Playback("DAHDI/5-1", "silence/1&cannot-complete-as-diale
d&check-number-dial-again,noanswer") in new stack
[2012-04-05 09:27:53] VERBOSE[24092] file.c:     -- <DAHDI/5-1> Playing 'silence/1.ulaw' (language 'en')
[2012-04-05 09:27:54] VERBOSE[24092] file.c:     -- <DAHDI/5-1> Playing 'cannot-complete-as-dialed.ulaw' (language 'en')
[2012-04-05 09:27:56] VERBOSE[24092] file.c:     -- <DAHDI/5-1> Playing 'check-number-dial-again.ulaw' (language 'en')
[2012-04-05 09:27:59] VERBOSE[24092] pbx.c:     -- Executing [148492100@from-internal:7] Wait("DAHDI/5-1", "1") in new stack
[2012-04-05 09:27:59] VERBOSE[24092] pbx.c:   == Spawn extension (from-internal, 148492100, 7) exited non-zero on 'DAHDI/5-1'
[2012-04-05 09:27:59] VERBOSE[24092] pbx.c:     -- Executing [h@from-internal:1] Hangup("DAHDI/5-1", "") in new stack
[2012-04-05 09:27:59] VERBOSE[24092] pbx.c:   == Spawn extension (from-internal, h, 1) exited non-zero on 'DAHDI/5-1'
[2012-04-05 09:27:59] VERBOSE[24092] sig_analog.c:     -- Hanging up on 'DAHDI/5-1'
[2012-04-05 09:27:59] VERBOSE[24092] chan_dahdi.c:     -- Hungup 'DAHDI/5-1'

{noformat}

And here is the excerpt from the dtmf log:

{noformat}
[2012-04-05 09:27:45] DTMF[24092] channel.c: DTMF begin '1' received on DAHDI/5-1
[2012-04-05 09:27:45] DTMF[24092] channel.c: DTMF begin ignored '1' on DAHDI/5-1
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF end '1' received on DAHDI/5-1, duration 63 ms
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF end passthrough '1' on DAHDI/5-1
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF begin '4' received on DAHDI/5-1
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF begin ignored '4' on DAHDI/5-1
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF end '4' received on DAHDI/5-1, duration 51 ms
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF end passthrough '4' on DAHDI/5-1
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF begin '8' received on DAHDI/5-1
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF begin ignored '8' on DAHDI/5-1
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF end '8' received on DAHDI/5-1, duration 38 ms
[2012-04-05 09:27:46] DTMF[24092] channel.c: DTMF end passthrough '8' on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF begin '4' received on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF begin ignored '4' on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF end '4' received on DAHDI/5-1, duration 51 ms
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF end passthrough '4' on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF begin '9' received on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF begin ignored '9' on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF end '9' received on DAHDI/5-1, duration 51 ms
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF end passthrough '9' on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF begin '2' received on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF begin ignored '2' on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF end '2' received on DAHDI/5-1, duration 63 ms
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF end passthrough '2' on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF begin '1' received on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF begin ignored '1' on DAHDI/5-1
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF end '1' received on DAHDI/5-1, duration 63 ms
[2012-04-05 09:27:47] DTMF[24092] channel.c: DTMF end passthrough '1' on DAHDI/5-1
[2012-04-05 09:27:48] DTMF[24092] channel.c: DTMF begin '0' received on DAHDI/5-1
[2012-04-05 09:27:48] DTMF[24092] channel.c: DTMF begin ignored '0' on DAHDI/5-1
[2012-04-05 09:27:48] DTMF[24092] channel.c: DTMF end '0' received on DAHDI/5-1, duration 63 ms
[2012-04-05 09:27:48] DTMF[24092] channel.c: DTMF end passthrough '0' on DAHDI/5-1
[2012-04-05 09:27:48] DTMF[24092] channel.c: DTMF begin '0' received on DAHDI/5-1
[2012-04-05 09:27:48] DTMF[24092] channel.c: DTMF begin ignored '0' on DAHDI/5-1
[2012-04-05 09:27:48] DTMF[24092] channel.c: DTMF end '0' received on DAHDI/5-1, duration 63 ms
[2012-04-05 09:27:48] DTMF[24092] channel.c: DTMF end passthrough '0' on DAHDI/5-1

{noformat}

As you see both logs show digits 8 and 7 (sequentially 2nd and 4th) missing.  Also there is a 3 second delay between the last digit being recognized and beginning of dialplan processing which I believe is normal.

The second call was placed from the same extension in about an hour by using a built-in "redial" function on the phone.  Here are the respective logs.

Full:

{noformat}

[2012-04-05 10:35:45] VERBOSE[24438] sig_analog.c:     -- Starting simple switch on 'DAHDI/5-1'
[2012-04-05 10:35:52] VERBOSE[24438] pbx.c:     -- Executing [18478492100@from-internal:1] Macro("DAHDI/5-1", "user-callerid,LIMIT,") in new stack
[2012-04-05 10:35:52] VERBOSE[24438] pbx.c:     -- Executing [s@macro-user-callerid:1] Set("DAHDI/5-1", "AMPUSER=421") in new stack
[2012-04-05 10:35:52] VERBOSE[24438] pbx.c:     -- Executing [s@macro-user-callerid:2] GotoIf("DAHDI/5-1", "0?report") in new stack
[2012-04-05 10:35:52] VERBOSE[24438] pbx.c:     -- Executing [s@macro-user-callerid:3] ExecIf("DAHDI/5-1", "1?Set(REALCALLERIDNUM=421)") in new st
ack

<skipped>

[2012-04-05 10:35:53] VERBOSE[24438] pbx.c:     -- Executing [s@macro-dialout-trunk:29] Set("DAHDI/5-1", "the_num=+18478492100") in new stack
[2012-04-05 10:35:53] VERBOSE[24438] pbx.c:     -- Executing [s@macro-dialout-trunk:30] Dial("DAHDI/5-1", "gtalk/GV8612/+18478492100@voice.google.com,300,") in new stack
[2012-04-05 10:35:53] VERBOSE[24438] app_dial.c:     -- Called gtalk/GV8612/+18478492100@voice.google.com
[2012-04-05 10:35:53] DEBUG[20293] chan_gtalk.c: redirect +18478492100@voice.google.com/srvres-MTAuMjI5LjEzNC4xNDo5ODg1
[2012-04-05 10:35:53] VERBOSE[24438] app_dial.c:     -- Gtalk/+18478492100@voice.google.com-d631 is ringing
[2012-04-05 10:36:13] DEBUG[20293] chan_gtalk.c: The client is guest
[2012-04-05 10:36:13] VERBOSE[24438] app_dial.c:     -- Gtalk/+18478492100@voice.google.com-d631 answered DAHDI/5-1
[2012-04-05 10:36:48] VERBOSE[24438] pbx.c:     -- Executing [h@macro-dialout-trunk:1] Macro("DAHDI/5-1", "hangupcall,") in new stack
[2012-04-05 10:36:48] VERBOSE[24438] pbx.c:     -- Executing [s@macro-hangupcall:1] GotoIf("DAHDI/5-1", "1?theend") in new stack
[2012-04-05 10:36:48] VERBOSE[24438] pbx.c:     -- Goto (macro-hangupcall,s,3)
[2012-04-05 10:36:48] VERBOSE[24438] pbx.c:     -- Executing [s@macro-hangupcall:3] Hangup("DAHDI/5-1", "") in new stack
[2012-04-05 10:36:48] VERBOSE[24438] app_macro.c:   == Spawn extension (macro-hangupcall, s, 3) exited non-zero on 'DAHDI/5-1' in macro 'hangupcall'
[2012-04-05 10:36:48] VERBOSE[24438] features.c:   == Spawn extension (macro-dialout-trunk, h, 1) exited non-zero on 'DAHDI/5-1'
[2012-04-05 10:36:48] ERROR[24438] cdr_odbc.c: Unable to retrieve database handle.  CDR failed.
[2012-04-05 10:36:48] VERBOSE[24438] app_macro.c:   == Spawn extension (macro-dialout-trunk, s, 30) exited non-zero on 'DAHDI/5-1' in macro 'dialout-trunk'
[2012-04-05 10:36:48] VERBOSE[24438] pbx.c:   == Spawn extension (from-internal, 18478492100, 6) exited non-zero on 'DAHDI/5-1'
[2012-04-05 10:36:48] VERBOSE[24438] sig_analog.c:     -- Hanging up on 'DAHDI/5-1'
[2012-04-05 10:36:48] VERBOSE[24438] chan_dahdi.c:     -- Hungup 'DAHDI/5-1'

{noformat}

DTMF:

{noformat}
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF begin '1' received on DAHDI/5-1
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF begin ignored '1' on DAHDI/5-1
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF end '1' received on DAHDI/5-1, duration 51 ms
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF end passthrough '1' on DAHDI/5-1
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF begin '8' received on DAHDI/5-1
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF begin ignored '8' on DAHDI/5-1
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF end '8' received on DAHDI/5-1, duration 63 ms
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF end passthrough '8' on DAHDI/5-1
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF begin '4' received on DAHDI/5-1
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF begin ignored '4' on DAHDI/5-1
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF end '4' received on DAHDI/5-1, duration 63 ms
[2012-04-05 10:35:47] DTMF[24438] channel.c: DTMF end passthrough '4' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin '7' received on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin ignored '7' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF end '7' received on DAHDI/5-1, duration 63 ms
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF end passthrough '7' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin '8' received on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin ignored '8' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF end '8' received on DAHDI/5-1, duration 63 ms
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF end passthrough '8' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin '4' received on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin ignored '4' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF end '4' received on DAHDI/5-1, duration 51 ms
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF end passthrough '4' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin '9' received on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin ignored '9' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF end '9' received on DAHDI/5-1, duration 63 ms
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF end passthrough '9' on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin '2' received on DAHDI/5-1
[2012-04-05 10:35:48] DTMF[24438] channel.c: DTMF begin ignored '2' on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF end '2' received on DAHDI/5-1, duration 63 ms
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF end passthrough '2' on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF begin '1' received on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF begin ignored '1' on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF end '1' received on DAHDI/5-1, duration 63 ms
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF end passthrough '1' on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF begin '0' received on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF begin ignored '0' on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF end '0' received on DAHDI/5-1, duration 63 ms
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF end passthrough '0' on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF begin '0' received on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF begin ignored '0' on DAHDI/5-1
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF end '0' received on DAHDI/5-1, duration 51 ms
[2012-04-05 10:35:49] DTMF[24438] channel.c: DTMF end passthrough '0' on DAHDI/5-1

{noformat}


By: Shaun Ruffell (sruffell) 2012-04-05 10:53:55.045-0500

The patch I attached will only change the behavior if the -p option is used when running. Also, the fact that your system resources are low supports the theory that the OS is paging out parts of Asterisk to free up resouces, hence the need to page them back in when a new call comes in.

Could you try with the -p flag? Also do you know why the resources are low on that system? Do you have Asterisk modules loaded which you do not need?

By: Vladimir Mikhelson (vmikhelson) 2012-04-05 11:29:09.999-0500

Shaun,

What I mean by low resources is the natural machine configuration.

It is a Pentium III box with 384M RAM.

Do you think it is safe to run with "-p"?

On the side note.  I also experience the following analog DAHDI related issues:

1. Intermittent Caller ID failure on FXO lines (Digium ticket opened)
2. Static noise on FXS lines if a genuine Digium TDM410P card is being used.  No problem with a generic TDM400P. (Digium opened RMA for this one)
3. Inconsistent DTMF forwarding to GTALK from DAHDI/FXS. The problem does not exist in SIP to GTALK or in DAHDI/FXS to SIP or IAX scenarios. https://issues.asterisk.org/jira/browse/ASTERISK-19576. I still need to submit what Matt requested.
4. Intermittent call supervision issue on incoming DAHDI/FXO bridged to DAHDI/FXS calls where Asterisk fails to detect a "caller dropped the call" condition and I receive a 50 second voice message with CO tones.

Do you think there is a possible relationship here?

To answer your second question I can probably spend some time with menuselect and pick the minimal configuration.

As I mentioned initially the problem was first noticed when I switched to 1.8.  It was also the time when I started compiling vs. using yum update as the depository does not contain gtalk / jabber.  Potentially I could have included more modules than the depository maintainers included.  I tried to get details on how they compile but hit a wall of silence.  Ideally I would like to compile just the portion I need and continue using the depository.

I hope I did not provide too much of irrelevant information.

Thank you,
Vladimir


By: Vladimir Mikhelson (vmikhelson) 2012-04-05 12:24:54.402-0500

Shaun,

Another loosely related observation.  After I upgraded to 1.8.11.0 I do not get the following debug output any more:

[2012-04-02 18:31:19] DEBUG[31621] chan_dahdi.c: CallerID number: 3197302568, name: UNIV ADMISSIONS, flags=0

Thank you,
Vladimir


By: Shaun Ruffell (sruffell) 2012-04-05 13:09:49.531-0500

It should be safe to run with -p, and on an underpowered box, you probably are even *more* interested in making sure that Asterisk is running at a higher priority than other things that may be running on the box.

With the exception of the static, I could imagine how all the things you described could be related.

Regarding resources, on my lowest powered system that I use (with FXS ports by the way) I can place calls between two FXS ports and I still have over 267M free. So you should be able to get your configuration to a state where you run reliably.

{noformat}
# free
            total       used       free     shared    buffers     cached
Mem:        318088     313544       4544          0      42136     221152
-/+ buffers/cache:      50256     267832
Swap:       327772       1176     326596

# top -b -n 1 | grep -e asterisk -e PID
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                          
1316 root      20   0 41964  13m 6876 S  0.0  4.2   4:15.20 asterisk  
{noformat}

By: Shaun Ruffell (sruffell) 2012-04-05 13:26:44.398-0500

Regarding the caller ID information, you will need to make sure you're running with the current head of the 2.6 branch of DAHDI. You'll probably need [r10481|http://svnview.digium.com/svn/dahdi?view=revision&revision=10481]. If you're using 2.5.0.2 of DAHDI-Linux this won't affect you.

By: Vladimir Mikhelson (vmikhelson) 2012-04-05 23:52:48.734-0500

Shaun,

It looks like I need some help here.

First I tried modifying /etc/init.d/asterisk by appending "-p" to the ASTARGS. Upon running "asterisk restart" I received a seg. fault. Removed "-p" and all was back to normal.

Then I decided to use freepbx_emgine script.  There I appended the "-p" parameter to the safe_asterisk call.  As a result Asterisk loaded but was not responsive, e.g. attempt to issue "asterisk -r" returned "Remote Asterisk cannot be contacted" or something close to that.

After I restored this script to the original and restarted Asterisk I am getting the following output on CLI every second or so:

   -- Remote UNIX connection
   -- Remote UNIX connection disconnected
   -- Remote UNIX connection
   -- Remote UNIX connection disconnected
   -- Remote UNIX connection
   -- Remote UNIX connection disconnected
   -- Remote UNIX connection

I am rebooting the box.

As I am running DAHDI-Linux 2.6.0 I am assuming the regression you mentioned does not affect me.

Thank you,
Vladimir


By: Vladimir Mikhelson (vmikhelson) 2012-04-06 01:20:14.559-0500

Shaun,

Please disregard my question regarding the missing "CallerID number: 3197302568, name: UNIV ADMISSIONS, flags=0" output.

I discovered that working ast_log was replaced with silent ast_debug. Restored, recompiled.  It looks like somebody made a change without thinking of backwards compatibility.

Thank you,
Vladimir


By: Shaun Ruffell (sruffell) 2012-04-06 19:15:19.021-0500

Vladimir, hmm..something definitely doesn't sound right on your system. There isn't any obvious reason that I can think of where the -p flag should make Asterisk unresponsive.

Which version of Asterisk are you running?

By: Vladimir Mikhelson (vmikhelson) 2012-04-07 01:02:08.324-0500

Shaun,

The version is 1.8.11.0.

I might have done something wrong with passing the parameter I think. Would you suggest to pass the parameter in some different way compared to what I did?

Thank you,
Vladimir


By: Shaun Ruffell (sruffell) 2012-04-09 12:33:51.956-0500

I just ran {{# asterisk -p}} from the command line. I could then see it running at a higher priority in the output of {{# top -b -n1 | grep asterisk}} then normal.  So I'm not sure what would be different in your case.

Are you able to provide some sort of back trace from the segfault?  Is there anything in the output of {{# dmesg}} when it occurs?

By: Shaun Ruffell (sruffell) 2012-04-09 12:37:52.990-0500

Ok..something else to try to see if the problem is with the priority call, or just that your system is unable to lock that much memory.

In main/asterisk.c of your asterisk source, try adding the following line early in the main function.

{code}
mlockall(MCL_CURRENT | MCL_FUTURE);
{code}

You will need to add {{#include <sys/mman.h>}} up near the top of the file as well.

By: Shaun Ruffell (sruffell) 2012-05-23 17:02:24.472-0500

Is this still a problem for you?

By: Vladimir Mikhelson (vmikhelson) 2012-05-23 20:09:20.672-0500

Shaun,

It definitely is.

Somehow I was not notified of your latest comment 09/Apr/12 12:37 PM.  I will try the changes you suggested and will let you know what happened.

Thank you,
Vladimir


By: Vladimir Mikhelson (vmikhelson) 2012-05-24 00:06:09.581-0500

Shaun,

I have added the #include in the portion of the program where other includes were located. The mlockall I added prior to "Remember original args for restart" comment.

Compiling now.

Just FYI, the system was migrated to a slightly more powerful box with 512M RAM.

No fun.  Received seg. fault upon the amportal restart.  It is very atypical.

/usr/sbin/safe_asterisk: line 145: 12052 Segmentation fault      (core dumped) nice -n $PRIORITY ${ASTSBINDIR}/asterisk -f ${CLIARGS} ${ASTARGS} > /dev/${TTY} 2>&1 < /dev/${TTY}
Asterisk ended with exit status 139
Asterisk exited on signal 11.
Automatically restarting Asterisk.
mpg123: no process killed

-----------------------------------------------------
Asterisk could not start!
Use 'tail /var/log/asterisk/full' to find out why.
-----------------------------------------------------

And here is the tail of the full log.

[2012-05-24 00:30:03] VERBOSE[11884] asterisk.c: Executing last minute cleanups
[2012-05-24 00:30:04] VERBOSE[11884] asterisk.c: Asterisk cleanly ending (0).
[2012-05-24 00:30:18] VERBOSE[12052] config.c:   == Parsing '/etc/asterisk/asterisk.conf': [2012-05-24 00:30:18] VERBOSE[12052] config.c:   == Found
[2012-05-24 00:30:18] WARNING[12052] xmldoc.c: Couldn't find manager DataGet in XML documentation
[2012-05-24 00:30:18] WARNING[12052] xmldoc.c: Couldn't find manager DataGet in XML documentation
[2012-05-24 00:30:18] VERBOSE[12052] manager.c:   == Manager registered action DataGet
[2012-05-24 00:30:18] ERROR[12052] /usr/src/1.8.12.0/include/asterisk/utils.h: Memory Allocation Failure in function internal_ao2_alloc at line 300 of astobj2.c
[2012-05-24 00:30:18] VERBOSE[12052] loader.c:  Asterisk Dynamic Loader Starting:
[2012-05-24 00:30:18] VERBOSE[12052] config.c:   == Parsing '/etc/asterisk/modules.conf': [2012-05-24 00:30:18] VERBOSE[12052] config.c:   == Found
[2012-05-24 00:30:18] NOTICE[12052] loader.c: 2 modules will be loaded.
[2012-05-24 00:30:18] WARNING[12052] loader.c: Error loading module 'pbx_config.so': /usr/lib/asterisk/modules/pbx_config.so: failed to map segment from shared object: Resource temporarily unavailable
[2012-05-24 00:30:18] WARNING[12052] loader.c: Error loading module 'chan_local.so': /usr/lib/asterisk/modules/chan_local.so: failed to map segment from shared object: Resource temporarily unavailable
[2012-05-24 00:30:18] WARNING[12052] loader.c: Error loading module 'pbx_config.so': /usr/lib/asterisk/modules/pbx_config.so: failed to map segment from shared object: Resource temporarily unavailable
[2012-05-24 00:30:18] WARNING[12052] loader.c: Module 'pbx_config.so' could not be loaded.
[2012-05-24 00:30:18] WARNING[12052] loader.c: Error loading module 'chan_local.so': /usr/lib/asterisk/modules/chan_local.so: failed to map segment from shared object: Resource temporarily unavailable
[2012-05-24 00:30:18] WARNING[12052] loader.c: Module 'chan_local.so' could not be loaded.
[2012-05-24 00:30:18] VERBOSE[12052] config.c:   == Parsing '/etc/asterisk/http.conf': [2012-05-24 00:30:18] VERBOSE[12052] config.c:   == Found
[2012-05-24 00:30:18] ERROR[12052] tcptls.c: Unable to launch thread for http server on 0.0.0.0:8088: Resource temporarily unavailable

Please let me know whether you need more details from the dump.  I have removed the changes, recompiled, and everything is back to normal.

Thank you,
Vladimir


By: Shaun Ruffell (sruffell) 2012-05-29 09:19:52.181-0500

So your system must be paging pretty heavily since on startup your dump was caused by a memory allocation failure.  Was that dumped caused on the box that had 512M of RAM?  Are there any other services running, or modules loaded that you're not using in order to reduce the memory footprint?

I think my original hypothesis is still a good one. When your machine is idle, some of the pages used by the Asterisk process are being swapped out, then when the call comes in, it takes time to swap them back in and that extra delay causes failures.

By: Vladimir Mikhelson (vmikhelson) 2012-05-29 10:10:53.573-0500

Shaun,

I do not dismiss your theory at all.  Now we need to prove or disprove it.  So far the troubleshooting attempts were catastrophic.

It probably will be a stretch to call a system "paging heavily" according to the following:

top - 10:06:38 up 19 days,  9:59,  1 user,  load average: 0.04, 0.07, 0.08
Tasks: 106 total,   1 running, 105 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.3%sy,  0.0%ni, 98.0%id,  0.0%wa,  0.3%hi,  0.3%si,  0.0%st
Mem:    514692k total,   488376k used,    26316k free,   164364k buffers
Swap:  1048568k total,    71020k used,   977548k free,   123000k cached


By: Shaun Ruffell (sruffell) 2012-05-29 10:13:55.699-0500

Do you find that the delay happens less frequently on this upgraded system?

By: Vladimir Mikhelson (vmikhelson) 2012-05-29 10:19:45.350-0500

Unfortunately it is hard to say as the event is intermittent.

I know I once reached 911 where as I dialed 263-9111.  It was bad.

-Vladimir


By: Shaun Ruffell (sruffell) 2012-05-29 10:44:16.060-0500

Are you running as root? (and are you on IRC? I'm sruffell on irc.freenode.net)

By: Vladimir Mikhelson (vmikhelson) 2012-05-29 14:07:07.727-0500

Shaun,

I run Asterisk as an Asterisk user.

Sorry, I never used IRC. I do use other messenger services.

Thank you,
Vladimir

PS. If you want to discuss the issue in person I can give you a call.
-V.

By: Shaun Ruffell (sruffell) 2012-05-30 10:10:49.116-0500

Vladimir, could you try starting the system up as root with the patch. Just to make sure that the problem is that the user asterisk is running as isn't allowed to lock that much memory. If it starts up normally, we can then see where to make the change to the limit that controls amount of memory the process is allowed to lock.

By: Shaun Ruffell (sruffell) 2012-06-17 12:00:12.188-0500

I updated the patch, since when forking the process locks were being dropped.

By: Vladimir Mikhelson (vmikhelson) 2012-06-17 15:54:04.717-0500

Shaun,

Is it safe to test on a 512MB RAM system?

Thank you,
Vladimir


By: Shaun Ruffell (sruffell) 2012-06-17 18:11:10.859-0500

To repeat what's on the asterisk-users mailing list [here|http://thread.gmane.org/gmane.comp.telephony.pbx.asterisk.user/269807/focus=269848], probably not if you're loading all modules. I would recommend adding the patch, and then editing /etc/asterisk/modules.conf to load only the modules you need for your application.

By: Vladimir Mikhelson (vmikhelson) 2012-06-17 18:20:12.276-0500

Shaun, in an attempt to keep the full copy of the conversation in one place let me repeat the question I posted on the list, "Would it be possible to lock just the specific "critical" modules in the RAM?"

In my opinion all modules needed for interrupts' processing must be loaded statically.

Thank you,
Vladimir



By: Shaun Ruffell (sruffell) 2012-06-17 18:46:15.142-0500

Not without more work. I think that it would be hard, generally, to know which Asterisk processes would *need* to be locked in memory in this case since there are many different use cases, and depending on the speed of secondary storage some may see more problems than others. Asterisk isn't needed to process any interrupts, but any module in Asterisk that results in paging could prevent Asterisk from reading all the data from the DAHDI channels.

I believe it's safest to just make sure you can lock the entire application into memory.

By: Vladimir Mikhelson (vmikhelson) 2012-06-17 18:57:33.165-0500

So Asterisk plus the drivers it relies on is as of now the "best effort" kind of an application.  It will process an interrupt if the storage is fast enough if not, then sorry....

Poor architectural choices in my humble opinion.

As I stated before, ALL drivers and dependencies should be permanently loaded in the memory if Asterisk is supposed to be reliably used as a real-time application.  Applications like Voice Mail can be safely loaded on demand.

Alternatively, a "lazy" approach of pushing the whole beast in the RAM can be taken.

I am really surprised these basic 101 features are not implemented in any 1.0 application. Sorry, I should have said, "10.0"

Do you think it is worth the effort to try to lock just the modules we need to fix my immediate problem for proving the concept?  If it works you guys can enhance the architecture based on our small experiment later.

Thank you,
Vladimir



By: Shaun Ruffell (sruffell) 2012-06-17 19:25:07.467-0500

In my opinion this isn't an Asterisk issue. It's a system issue. If you were running Asterisk on a computer with 1GB of RAM you most likely would have never hit this issue. If you only loaded the modules you actually needed you probably wouldn't have seen this issue either.

I think it would be much easier if you disable autoloading and figure out which modules you need, and then lock the complete Asterisk application into memory (with only what you need). This would prove the theory at least.

While voicemail isn't "critical", what should Asterisk do with audio frames that are backing up in chan_dahdi while loading the voicemail application pages back from disk? It can't just let them buffer up indefinitely. How many milliseconds of latency is "ok" to add? Different users of Asterisk will have different ideas about what is acceptable.

There are other system issues which cause these types of problems given the real-time nature of telephony. For example, framebuffers can cause problems because too much time is spent scrolling the screen. Hard drives in 'combined' mode (with 3MB/s transfer rates) can, when writing voicemails to the disk cause the system to spend so much time in interrupt handlers that audio is lost, system management interrupts on some servers, which happen underneath the OS can take up enough time that the driver will miss it's interrupts.  These are all system issues of a similar nature and not related to the Asterisk's architecture.

By: Vladimir Mikhelson (vmikhelson) 2012-06-17 23:21:40.329-0500

Shaun,

I understand your argument of the system configuration vs. the application complexity.

Still have to ask you, "Does any piece of documentation say that Asterisk needs a minimum of 1G of RAM to run a full set of modules?"

Also I have to agree my problem with missing DTMF started when I was forced to switch from "yum update" which delivered a limited build to a full build which I compile as maintainers still do not support Google Voice which I rely on.  That pretty much proves your theory.

As far as my problem goes it looks like I need to strip the application I compile off of the unnecessary modules which is a headache as I am not sure which ones I really need.  And even if I manage to figure that out then I will need to redo the selection for every new version I download as I did not figure out how to reliably move manuselect setup files from version to version especially when choices change.

Another option will be to jump on a Pentium IV system.  That can bring in new surprises as well as it will definitely break Digium licensing as virtually all P-IVs come with built-in NICs and Digium ties their licensing to all NICs in the system.

These are the choices I have to make as I understand it is too complex for you to lock the specific pair of modules in the RAM.

What needs to be done the minimum RAM requirements should be made public.

There is nothing here: http://www.voip-info.org/wiki/view/Asterisk+hardware+recommendations

And there is nothing here: http://astbook.asteriskdocs.org/en/2nd_Edition/asterisk-book-html-chunk/asterisk-CHP-2.html

When I first chose a PC to use back in 2008 I did not find any RAM requirements either. In fact I started with 192M and happily ran Asterisk 1.4.x with a bunch of other services running on the same machine. Then I upgraded to 1.6.0.x, 1.6.2.x, 1.8.x and never needed more than 384M. Upgrade to 512M was a side effect of troubleshooting I ran with Digium where their TDM410P appeared to produce substantial background noise on Dell Dimension systems.  They wanted me to try on another machine before admitting my theory was right.

Now back to the interrupts' processing.  In my opinion if a system is supposed to be used in a "real-time" mode all drivers responsible for IRQ and other real-time events processing should be permanently allocated in the RAM, all "slow" processes should be properly threaded and hopefully should not cause other "fast" processes wait for their completion.

Thank you,
Vladimir