Summary:ASTERISK-30039: cli: Targeted debug on startup deadlocks and creates unstable system
Reporter:N A (InterLinked)Labels:
Date Opened:2022-04-30 13:21:16Date Closed:2022-06-06 16:42:16
Versions:18.9.0 Frequency of
Environment:Attachments:( 0) full.txt
Description:If, while Asterisk is starting, you attempt to use tab completion with "core set debug X <tab complete>", and you do this BEFORE Asterisk is fully ready, the CLI will deadlock and the system will become permanently unstable and fail to ever initialize. Not all commands do this but "core set debug X <tab complete>" does do it. Seems to be an "unsafe" operation if Asterisk is not ready for some reason.

e.g. if you use SIGQUIT to exit the CLI and then go back into a remote console, if you run "core restart now" it will say no command found because the system never fully started so Asterisk is still not (and will never be) ready.

Even ast_coredumper fails because it fails to get the PID and there isn't an option to manually specify it.

This does reproduce consistently.
Comments:By: Asterisk Team (asteriskteam) 2022-04-30 13:21:17.094-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: George Joseph (gjoseph) 2022-05-02 05:44:59.702-0500

I can't seem to reproduce with any version.  Probably timing dependent.

You can specify the asterisk pid using the {{PID}} envuronment variable.

$ PID=$(pidof asterisk) /var/lib/asterisk/scripts/ast_coredumper --running

By: N A (InterLinked) 2022-05-02 06:20:09.908-0500

I can reproduce on different machines, with different 18 versions.

It's unlikely to happen on faster machines because the restart is so quick. I tried on one machine and I just can't type the command fast enough before Asterisk initializes.

PID=20926 /var/lib/asterisk/scripts/ast_coredumper --running
Can't determine pid of the running asterisk instance

Am I doing this right? Same thing with --RUNNING

Funny thing is /var/run/asterisk/asterisk.pid has the right value here. Not sure why it's not reading that.

By: N A (InterLinked) 2022-05-02 06:32:48.656-0500

Okay, forget about ast_coredumper, I just did it manually with gdb and it worked fine:

gdb --batch -q -p 20926 -ex 'thread apply all bt full' -ex 'quit' > /tmp/full.txt

Attached is the full.txt. Something is seriously wrong. Literally every thread appears to be deadlocked. That would explain a lot!

By: Joshua C. Colp (jcolp) 2022-05-02 06:38:15.115-0500

Every thread is not deadlocked. CLI handling is deadlocked, which means CLI blocks, and CLI command registration blocks - thus blocking loading of modules.

By: N A (InterLinked) 2022-05-02 06:46:55.924-0500

I see, that would make sense. I guess calling ast_module_helper before the system is ready would lead to this deadlock then. In that same thread, there also appears to be a memory issue but not sure if that is related.

By: Joshua C. Colp (jcolp) 2022-05-02 06:49:02.905-0500

"Memory issue" is an incredibly vague statement.

This is just a classic deadlock as it's not safe to tab complete modules at startup. Loader has lock on things, module being loaded is registering a CLI command and waiting on CLI lock. Console has CLI lock, waiting on loader lock.

By: George Joseph (gjoseph) 2022-05-02 06:54:38.644-0500

You could put a check for "fully booted" before allowing CLI_GENERATE to run in cli.c/handlke_debug.

By: N A (InterLinked) 2022-05-02 07:17:20.794-0500

Wouldn't in handle_debug be too narrow? As I understand, no CLI_GENERATE at all, anywhere, can run until the system is fully booted. Or is it only certain modules?

Seems related to that other issue with the channel lists again, where running those commands before startup causes assertions. It seems like there is a broad set of things that shouldn't be allowed with the CLI at all until the system is fully booted.

By: Joshua C. Colp (jcolp) 2022-05-02 07:18:47.024-0500

In this particular case, it is specifically generating the list of modules that is the issue.

By: N A (InterLinked) 2022-05-02 07:24:13.428-0500

Yup, that makes sense. Not all commands do this, but module unload TAB for instance does.

So I think the right thing to do would be to sleep in ast_module_helper until the system is fully booted? I guess similar in a way to that other issue.

If nobody else is working on these, you can assign both issues to me. I can push a fix later this week.

By: George Joseph (gjoseph) 2022-05-02 07:27:01.144-0500

I think you have to be very careful here.  There may be internal uses of ast_module_helper that _need_ to run before the system is fully booted.

By: Friendly Automation (friendly-automation) 2022-06-06 16:42:17.429-0500

Change 18592 merged by Friendly Automation:
loader: Prevent deadlock using tab completion.


By: Friendly Automation (friendly-automation) 2022-06-06 16:51:37.249-0500

Change 18522 merged by Kevin Harwell:
loader: Prevent deadlock using tab completion.


By: Friendly Automation (friendly-automation) 2022-06-06 16:51:57.655-0500

Change 18593 merged by Kevin Harwell:
loader: Prevent deadlock using tab completion.


By: Friendly Automation (friendly-automation) 2022-06-06 16:52:08.401-0500

Change 18591 merged by Kevin Harwell:
loader: Prevent deadlock using tab completion.