Summary: ASTERISK-19487: AMI module reload causes deadlock
Reporter: Philippe Lindheimer (p_lindheimer)
Labels:
Date Opened: 2012-03-05 18:46:40.000-0600
Date Closed: 2012-03-20 12:21:19
Status: Closed/Complete
Components: Core/Configuration Core/ManagerInterface Core/PBX
Versions: Frequency of
must be completed before resolving ASTERISK-19271 Asterisk Blockers
must be completed before resolving ASTERISK-19272 Asterisk 10.3.0 Blockers
is caused by ASTERISK-18479 ast_manager_register_struct attempts to unlock an uninitialized rwlock
Environment: FreePBX issues but still a general issue
Attachments: ( 0) backtrace.txt
( 1) backtrace.txt
( 2) core-show-locks.txt
( 3) core-show-locks.txt
( 4) core-show-locks.txt
Description: When a PHP script is called with the #exec directive, a reload hangs because of an apparent "deadlock" if the called script accesses the manager and tries to do a "database show" through the AMI. The script appears to create the manager connection successfully but fails on its first call, which happens to be a database show.

Addition: The "module reload" that fails is called from the AMI. From the CLI, a "module reload" succeeds, so the problem only occurs when the reload is triggered from the AMI using the CLI module reload command.

Proper behavior has been verified to work on and before and the failure mode has been seen on, and 10.1.3.

The FreePBX script that reproduces this is called generate_hints.php. It is used during a reload to generate a set of hint instructions for the dialplan, which must be done dynamically if someone is to be able to do the 'core reload' from the CLI and not through the GUI.
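
For readers unfamiliar with the AMI wire format, the requests involved can be sketched as below. This is a minimal illustration and not the FreePBX code: generate_hints.php is written in PHP, the helper name build_ami_action is hypothetical, and the credentials are placeholders. The only protocol facts relied on are that an AMI action is a set of "Key: Value" lines terminated by CRLF, and that a blank line ends the action.

```python
def build_ami_action(action, **headers):
    """Serialize one AMI action as CRLF-terminated 'Key: Value' lines.

    Hypothetical helper for illustration only: it builds the bytes that
    would be written to the manager socket but does not open a connection.
    """
    lines = [f"Action: {action}"]
    lines.extend(f"{key}: {value}" for key, value in headers.items())
    # A blank line (a second CRLF) marks the end of the action.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Placeholder credentials; real ones are defined in manager.conf.
login = build_ami_action("Login", Username="admin", Secret="secret")

# The first request the #exec script makes after logging in.
db_show = build_ami_action("Command", Command="database show")

# The reload that hangs is the same CLI command, sent through the AMI.
reload_cmd = build_ami_action("Command", Command="module reload")
```

In the failing scenario, the "module reload" action is sent first by the management client, and the "database show" action is then sent by the #exec script while that reload is still in progress.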
Comments:
By: Richard Mudgett (rmudgett) 2012-03-05 20:04:55.839-0600

Debugging deadlocks: Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install).  This will then give you the console command "core show locks." When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:

# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt
# gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt
gdb> bt
gdb> bt full
gdb> thread apply all bt

Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!
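
If the interactive gdb session is inconvenient, the same data could be collected non-interactively. The sketch below is an assumption-laden variant rather than the procedure requested above: it uses gdb's -batch, -ex, and -p options instead of the interactive prompt, the function names are hypothetical, and it must run as a user allowed to attach to the Asterisk process.

```python
import subprocess


def lock_report_commands(pid, out_dir="/tmp"):
    """Map each output file to the argv that produces its contents.

    Hypothetical helper; the file names match those requested above.
    """
    locks = ["asterisk", "-rx", "core show locks"]
    # -batch exits after running the -ex commands; -p attaches to the PID.
    backtrace = [
        "gdb", "-batch",
        "-ex", "bt",
        "-ex", "bt full",
        "-ex", "thread apply all bt",
        "-p", str(pid),
    ]
    return {
        f"{out_dir}/core-show-locks.txt": locks,
        f"{out_dir}/backtrace.txt": backtrace,
    }


def collect(pid, out_dir="/tmp"):
    """Run both commands and write their combined output to the files."""
    for path, argv in lock_report_commands(pid, out_dir).items():
        with open(path, "w") as fh:
            subprocess.run(argv, stdout=fh, stderr=subprocess.STDOUT, check=False)
```

Calling collect() with the PID of the hung Asterisk process would write both files to /tmp, ready to attach.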

By: Marcio gomes (mpg) 2012-03-06 05:40:37.772-0600

This is the correct backtrace.txt

By: Marcio gomes (mpg) 2012-03-06 05:43:03.006-0600

Correct show locks

By: Marcio gomes (mpg) 2012-03-06 05:44:55.895-0600

In this one I can see the PHP scripts running with ps xa; I think this is the most important.

By: Philippe Lindheimer (p_lindheimer) 2012-03-06 09:33:32.548-0600

I originally said this can be reproduced on 10.1.3, per a report in the FreePBX ticket, but a further comment suggests it may not have been tested there. I don't know whether it can be reproduced on that version.

By: Philippe Lindheimer (p_lindheimer) 2012-03-07 12:17:23.532-0600

Addition: The "module reload" that fails is called from the AMI. From the CLI a "module reload" succeeds properly so this is only affected when triggered from the CLI.

The suspected Changeset that triggered this issue is: revision 340279 in the Asterisk 1.8 branch

By: Philippe Lindheimer (p_lindheimer) 2012-03-07 12:22:32.127-0600

Revision 340279 shows that it fixed ASTERISK-18479, which comments that one of the fixes is ASTERISK-13784, which was broken by ASTERISK-17785. Maybe some of these should be cross-linked to this ticket?

By: Jamuel Starkey (jamuel) 2012-03-07 12:46:06.141-0600

*Comment by Philippe Lindheimer added a comment - 07/Mar/12 12:17 PM*
{quote}Addition: The "module reload" that fails is called from the AMI. From the CLI a "module reload" succeeds properly so this is only affected when triggered from the CLI.{quote}

Did you really mean that it is only affected when triggered from the AMI?

By: Philippe Lindheimer (p_lindheimer) 2012-03-07 13:02:08.226-0600

Yes, that was a typo, now fixed. I can go to the CLI directly and do a 'module reload' and it works fine. When done through the AMI, it fails.

By: Philippe Lindheimer (p_lindheimer) 2012-03-13 10:38:04.280-0500

It appears that a connection to the manager from a #exec script, at initial Asterisk startup or with a restart command, results in AMI connections failing. This may or may not be related, but it was reported and confirmed on the linked FreePBX ticket and is therefore mentioned here. It has not yet been tested on a or prior system where we know this current issue does not exist to help assess if this is a separate bug and when it first manifested itself.

By: Philippe Lindheimer (p_lindheimer) 2012-03-13 12:50:41.637-0500

I have confirmed that the connection to the AMI fails on upon starting or restarting Asterisk, so it does not appear to be related to this issue. I'll try to see what happens on other releases of Asterisk (e.g. 1.4 / 1.6.X) to determine when the behavior may have started or how prevalent it is.

By: Richard Mudgett (rmudgett) 2012-03-15 18:14:37.764-0500

Please test the patch on reviewboard at: