[Home]

Summary:ASTERISK-30201: app_senddtmf: Repeat uses of PlayDTMF AMI action can cause infinite DTMF, or potential for crash
Reporter:Jonathan Rose (JonathanRRose)Labels:
Date Opened:2022-09-01 04:46:36Date Closed:
Priority:MinorRegression?Yes
Status:Open/NewComponents:Applications/app_senddtmf
Versions:16.3.0 18.9.0 18.12.1 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:System 1 - Ubuntu 22.04 - Intel core i7 6th generation NUC System 2 - Centos 7 - some dell server blade thing with server grade hardware, it's not important, I'm just including it for the purpose of indicating that I've reproduced this on multiple systemsAttachments:
Description:The PlayDTMF manager command results in a sticking tone intermittently. I haven't seen this happen the first time this command is used on a channel, but frequently (not always) if I use PlayDTMF several times on a channel during a call the tone will persist until I send another PlayDTMF command at which point it will stop. I'm generally able to reproduce this every other call, but depending on chance it may be more or less frequent. Presently I think this only happens for channels that use Inband DTMF modes, but I haven't fully investigated the effects in other instances since they are harder to discern (an endless noise is easier to notice than a strange behavior with an info message)

The bug was discovered on a test system in the lab. We have a browser 'softphone' that uses playdtmf to inject DTMF into a call... there are other circumstances where we use playdtmf, but that's not super important. Our prod/test systems use a heavily modified version of Asterisk anyway (plus they are still using chan_sip).

So I did a fresh clone of Asterisk 18 (commit bd821549af3bccb000c809121094adb5b84fec7f) tonight on a relatively clean Ubuntu 22.04 system and reproduced it with a very minimal setup while making sure everything is super recent and still supported.

PJSIP config:

[general]
allowoverlap = no

[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0

[testuser1]
type = aor
max_contacts = 1

[testuser1]
type = endpoint
context = testtext
disallow = all
allow = ulaw
allow = alaw

aors = testuser1
asymmetric_rtp_codec = true
direct_media = false
dtmf_mode = inband

extensions.conf

[testtext]
exten => 100,1,Answer()
exten => 100,n,Echo()

manager.conf
[general]
enabled = yes

[adminpizzapasta]
secret=itsasecrettoeveryone
deny=0.0.0.0/0.0.0.0
permit=127.0.0.1
read=system,call,log
write=system,call,log

Step 1:
Start a call using the phone/softphone of your choice (specifically a Polycom VVX 410 was the one it was reproduced on in the lab while I reproduced it with TipCon1 softphone at home -- doesn't really matter, I have also prodded deeply enough to figure out that the DTMF is being repeatedly generated by Asterisk even after the generator detach function was called) to extension 100 using the testuser1 account. This will play an echo which is mostly just used so we have some media.

Step 2:
Login to manager using adminpizzapasta

Step 3:
Start spamming the call with PlayDTMF messages until you get a sticky tone

Some examples of manager commands I sent:

Action: PlayDTMF
Channel: PJSIP/testuser1-00000000
Digit: 4

Action: PlayDTMF
Channel: PJSIP/testuser1-00000000
Digit: 5

Action: PlayDTMF
Channel: PJSIP/testuser1-00000000
Digit: 6

Action: PlayDTMF
Channel: PJSIP/testuser1-00000000
Digit: #

Note the lack of a specified duration. It makes no difference if I include a duration, but it will use the default lowest possible value as a result.

If you don't get it within 20 tones or so, I recommend starting over and trying again. Usually if I can get it to happen once on a call I can get it to happen again quite frequently.

It's technically a regression, but it started happening after we moved from 1.6.2 to 16.3, so I can't really help to isolate specifically when it started right now. That said, there was a patch by Richard Mudgett back in 2019 that may have been involved. It changed some channel API stuff around so that there is a specific 'ast_senddigit_external' function intended for use when invoking DTMF outside the main media thread of a channel.

Now onto the scary part...

In the event of these tones sticking, I noticed ast_playtones_stop is invoked on the channel which in turn invokes ast_deactivate_generator, which I think is then going to call the generator release function in ast_generator_playtones (indications.c) which will then also free the playtones_state which is used by the playtones_generator function which is still running after the fact. I don't know with certainy right now if anything is going to use this free'd memory and I haven't seen any errors coming out of Asterisk during these times just yet, so it might be insignificant, but I still found it a little spooky.

One other thing to note I suppose -- it might be better to move to schedule the stops in a task processor or something rather than having it block for the duration of the DTMF. Probably not super important and it isn't exactly hurting our use any, but I can imagine that would get annoying with long duration DTMF.

I'll be trying to isolate this more over the next week and hopefully get a patch written, I just figured I'd go ahead and let you guys know about it.
Comments:By: Asterisk Team (asteriskteam) 2022-09-01 04:46:40.704-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Jonathan Rose (JonathanRRose) 2022-09-06 18:33:46.501-0500

After further investigation, I no longer think there is a risk of free'd memory use when this failure occurs. What happens in each case is the generatordata is constantly flipped back and forth between NULL and the data that was initially set for the generator. If generatordata is null when deactivate_generator_nolock is called and hits the if (ast_channel_generatordata(chan) check, it'll effectively skip deactivating the generator all together.

I'm not sure about the exact details, but in a simple one on one dial example (PJSIP endpoint A calls PJSIP endpoint B) I've added some stuff to print the before and after value as well as a backtrace so we can have a rough idea of what invoked the change:

{code}
[Sep  6 18:13:33] NOTICE[52875]: channel_internal_api.c:573 ast_channel_generatordata_set: generatordata_set was (nil), setting to 0x7f1d40004c30
      > 0x7f1e2c01bb90 -- Strict RTP switching to RTP target address 192.168.1.7:1046 as source
[Sep  6 18:13:33] ERROR[52875]:   Got 13 backtrace records
# 0: asterisk ast_channel_generatordata_set()
# 1: asterisk ast_activate_generator()
# 2: asterisk ast_playtones_start()
# 3: asterisk ast_senddigit_begin()
# 4: asterisk ast_senddigit_external()
# 5: [0x7f1e30c9fb65] app_senddtmf.so app_senddtmf.c:183 manager_play_dtmf()
# 6: asterisk <unknown>()
# 7: asterisk <unknown>()
# 8: asterisk <unknown>()
# 9: asterisk <unknown>()
#10: asterisk <unknown>()
#11: [0x7f1e43047b43] libc.so.6 pthread_create.c:442 start_thread()
#12: [0x7f1e430d9a00] libc.so.6 clone3.S:83 __clone3()

[Sep  6 18:13:33] NOTICE[52875]: channel_internal_api.c:743 ast_channel_generator_set: generator was (nil), setting to 0x5627ed0bf820
[Sep  6 18:13:33] ERROR[52875]:   Got 13 backtrace records
# 0: asterisk ast_channel_generator_set()
# 1: asterisk ast_activate_generator()
# 2: asterisk ast_playtones_start()
# 3: asterisk ast_senddigit_begin()
# 4: asterisk ast_senddigit_external()
# 5: [0x7f1e30c9fb65] app_senddtmf.so app_senddtmf.c:183 manager_play_dtmf()
# 6: asterisk <unknown>()
# 7: asterisk <unknown>()
# 8: asterisk <unknown>()
# 9: asterisk <unknown>()
#10: asterisk <unknown>()
#11: [0x7f1e43047b43] libc.so.6 pthread_create.c:442 start_thread()
#12: [0x7f1e430d9a00] libc.so.6 clone3.S:83 __clone3()

[Sep  6 18:13:33] NOTICE[52879][C-00000001]: channel_internal_api.c:573 ast_channel_generatordata_set: generatordata_set was 0x7f1d40004c30, setting to (nil)
[Sep  6 18:13:33] ERROR[52879][C-00000001]:   Got 20 backtrace records
# 0: asterisk ast_channel_generatordata_set()
# 1: asterisk <unknown>()
# 2: asterisk <unknown>()
# 3: asterisk ast_read_stream()
# 4: asterisk <unknown>()
# 5: asterisk <unknown>()
# 6: asterisk bridge_channel_internal_join()
# 7: asterisk ast_bridge_join()
# 8: asterisk ast_bridge_call_with_flags()
# 9: asterisk ast_bridge_call()
#10: [0x7f1e3149886b] app_dial.so app_dial.c:3442 dial_exec_full()
#11: [0x7f1e314992ca] app_dial.so app_dial.c:3516 dial_exec()
#12: asterisk pbx_exec()
#13: asterisk <unknown>()
#14: asterisk ast_spawn_extension()
#15: asterisk <unknown>()
#16: asterisk <unknown>()
#17: asterisk <unknown>()
#18: [0x7f1e43047b43] libc.so.6 pthread_create.c:442 start_thread()
#19: [0x7f1e430d9a00] libc.so.6 clone3.S:83 __clone3()

[Sep  6 18:13:33] NOTICE[52879][C-00000001]: channel_internal_api.c:573 ast_channel_generatordata_set: generatordata_set was (nil), setting to 0x7f1d40004c30
[Sep  6 18:13:33] ERROR[52879][C-00000001]:   Got 20 backtrace records
# 0: asterisk ast_channel_generatordata_set()
# 1: asterisk <unknown>()
# 2: asterisk <unknown>()
# 3: asterisk ast_read_stream()
# 4: asterisk <unknown>()
# 5: asterisk <unknown>()
# 6: asterisk bridge_channel_internal_join()
# 7: asterisk ast_bridge_join()
# 8: asterisk ast_bridge_call_with_flags()
# 9: asterisk ast_bridge_call()
#10: [0x7f1e3149886b] app_dial.so app_dial.c:3442 dial_exec_full()
#11: [0x7f1e314992ca] app_dial.so app_dial.c:3516 dial_exec()
#12: asterisk pbx_exec()
#13: asterisk <unknown>()
#14: asterisk ast_spawn_extension()
#15: asterisk <unknown>()
#16: asterisk <unknown>()
#17: asterisk <unknown>()
#18: [0x7f1e43047b43] libc.so.6 pthread_create.c:442 start_thread()
#19: [0x7f1e430d9a00] libc.so.6 clone3.S:83 __clone3()

[Sep  6 18:13:33] NOTICE[52879][C-00000001]: channel_internal_api.c:573 ast_channel_generatordata_set: generatordata_set was 0x7f1d40004c30, setting to (nil)
[Sep  6 18:13:33] ERROR[52879][C-00000001]:   Got 20 backtrace records
# 0: asterisk ast_channel_generatordata_set()
# 1: asterisk <unknown>()
# 2: asterisk <unknown>()
# 3: asterisk ast_read_stream()
# 4: asterisk <unknown>()
# 5: asterisk <unknown>()
# 6: asterisk bridge_channel_internal_join()
# 7: asterisk ast_bridge_join()
# 8: asterisk ast_bridge_call_with_flags()
# 9: asterisk ast_bridge_call()
#10: [0x7f1e3149886b] app_dial.so app_dial.c:3442 dial_exec_full()
#11: [0x7f1e314992ca] app_dial.so app_dial.c:3516 dial_exec()
#12: asterisk pbx_exec()
#13: asterisk <unknown>()
#14: asterisk ast_spawn_extension()
#15: asterisk <unknown>()
#16: asterisk <unknown>()
#17: asterisk <unknown>()
#18: [0x7f1e43047b43] libc.so.6 pthread_create.c:442 start_thread()
#19: [0x7f1e430d9a00] libc.so.6 clone3.S:83 __clone3()

[Sep  6 18:13:33] NOTICE[52879][C-00000001]: channel_internal_api.c:573 ast_channel_generatordata_set: generatordata_set was (nil), setting to 0x7f1d40004c30
[Sep  6 18:13:33] ERROR[52879][C-00000001]:   Got 20 backtrace records
# 0: asterisk ast_channel_generatordata_set()
# 1: asterisk <unknown>()
# 2: asterisk <unknown>()
# 3: asterisk ast_read_stream()
# 4: asterisk <unknown>()
# 5: asterisk <unknown>()
# 6: asterisk bridge_channel_internal_join()
# 7: asterisk ast_bridge_join()
# 8: asterisk ast_bridge_call_with_flags()
# 9: asterisk ast_bridge_call()
#10: [0x7f1e3149886b] app_dial.so app_dial.c:3442 dial_exec_full()
#11: [0x7f1e314992ca] app_dial.so app_dial.c:3516 dial_exec()
#12: asterisk pbx_exec()
#13: asterisk <unknown>()
#14: asterisk ast_spawn_extension()
#15: asterisk <unknown>()
#16: asterisk <unknown>()
#17: asterisk <unknown>()
#18: [0x7f1e43047b43] libc.so.6 pthread_create.c:442 start_thread()
#19: [0x7f1e430d9a00] libc.so.6 clone3.S:83 __clone3()
{code}

this pattern will repeat rather frequently for the duration of the DTMF pulse.

I would guess at this point that the only way to make accessing the generator (or at least the generatordata) externally after it's been set already to be thread safe would be to require locking around the channel while ast_read has the generator temporarily set to null (see channel.c around line 3646 in current git 18)

{code}
       } else if (ast_channel_fd_isset(chan, AST_GENERATOR_FD) && ast_channel_fdno(chan) == AST_GENERATOR_FD) {
               /* if the AST_GENERATOR_FD is set, call the generator with args
                * set to -1 so it can do whatever it needs to.
                */
               void *tmp = ast_channel_generatordata(chan);
               ast_channel_generatordata_set(chan, NULL);     /* reset to let ast_write get through */                ast_channel_generator(chan)->generate(chan, tmp, -1, -1);
               ast_channel_generatordata_set(chan, tmp);
               f = &ast_null_frame;
               ast_channel_fdno_set(chan, -1);
               goto done;
       }
{code}

I'm guessing adding additional locking here might be questionable for performance, so an alternate proposal for how to do this...

1. ast_senddigit_begin gets a duration argument. If zero, it behaves the same way as it did before (since the external function is going to have a minimum of AST_EMULATE_DTMF_DURATION, it will always have a non-zero value). It also starts returning an enum that has a success value, a failure value, and a value that indicates that it is handled by the generator. (or an alternate api can be provided to avoid disrupting other code)

2. ast_playtones_start gets a duration argument as well (or an alternate api can be provided since that's probably less disruptive to other code)

3. playtones generator will set an end time based on the duration to the generatordata if it is provided in ast_playtones_start. From that point on it will perform a time check during its generate callback and end itself if the current time exceeds the stop time.

I'm still trying to work out a patch for this myself, but I may not be able to provide it (or at least not expediently) because I have to run anything I might contribute past legal for the immediate future.

By: Jonathan Rose (JonathanRRose) 2022-09-07 10:22:22.278-0500

Not depicted in the logs above, but I see generator_force() (channel.c:2925) also gets invoked a l between the start and stop of sending a digit and it is very clearly doing something risky / potentially racy by unlocking the channel after swapping the generator data with null value. Contrary to what I thought on the previous pass, ast_read does keep the channel locked for the full duration while it keeps the generatordata swapped to a tmp store, so this is more likely to be the culprit. Attempting to keep this lock on for the duration of generator_force's swap consistently results in read errors though.

Also deactivating the generator already locks the channel prior to trying to detect the generator data, so that end of the problem is fine.

Right now I'm leaning toward the opinion that trying to stop any generator by invoking ast_deactivate_generator() is going to be inherently prone to failure because of generator_force and I'm not sure if there is anything that can be done to resolve that aside from adding a read write lock to the generatordata specifically or something along those lines.

By: Jonathan Rose (JonathanRRose) 2022-09-07 14:52:38.079-0500

I have a functional patch that solves the problem now. Something occurred to me with reference to playtones that made this patch considerably easier.

ast_playtones_start actually already has a way to supply durations to the tones you send, you just need to include a "/<duration in ms>,0" with the set of frequencies to play.

I still can't provide a patch until I run it by legal, but I can tell you all the way I ended up solving this problem in loose detail.

All changes are in channel.c

1. I made a static variant of the ast_senddigit_begin function. It includes duration as an argument. I changed from invoking playtones directly to building a playtones string selecting the tone to play from the existing table that translates digits to frequency strings and then appending '/%u,0" where %u is the duration. I have a separate success return value for when the function completes by using indications to create an inband DTMF tone.

2. Where ast_senddigit_external previously invoked ast_senddigit_begin, I now invoke my new static variant. If the function returns with the value indicating that it used inband DTMF indications, I do an immediate return without doing the sleep and ast_senddigit_end.

Not only does this result in fixing the bug, but inband DTMF created via PlayDTMF also sounds a little cleaner and more reliable in the normal case... at least in my experiments so far. I presume it's because there is less channel locking necessary since I don't need to stop the generator in a disruptive fashion.