Summary: ASTERISK-19848: Deadlock in Asterisk in ast_parse_device_state / Dial
Reporter: Sebastian Gutierrez (sum)
Labels:
Date Opened: 2012-05-07 11:32:48
Date Closed: 2012-09-17 14:33:34
Versions: Frequency of
Environment: CentOS 6.2, asterisk-1-8-13rc1, dahdi 2.6.1
Attachments:
( 0) Captura.PNG
( 1) coreshowlocks.txt
( 2) Debug.log
( 3) Debug.log
( 4) Error.txt
Description: [Edited desc - Rusty Newton & Matt Jordan] The 'core show locks' output appears to indicate that device_state is waiting for a lock that appears to be held by a thread that did not unlock it the correct number of times.

Orig desc: Asterisk crashes and hangs (deadlock and crash)
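
The edited description can be illustrated with a minimal sketch. It assumes the lock involved is a recursive mutex, which the ticket does not state, and it uses plain POSIX threads rather than Asterisk's own locking wrappers. The names demo_lock, probe and probe_after are hypothetical and exist only for this example. A thread that locks a recursive mutex N times must unlock it N times; if it unlocks fewer times, every other thread stays locked out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical demo lock; it is not an Asterisk data structure. */
static pthread_mutex_t demo_lock;

/* Second thread: try to take the lock once, without blocking. */
static void *probe(void *out)
{
    int rc = pthread_mutex_trylock(&demo_lock);

    if (rc == 0) {
        pthread_mutex_unlock(&demo_lock);
    }
    *(int *)out = rc; /* 0 if acquired, EBUSY if another thread still holds it */
    return NULL;
}

/*
 * Lock a recursive mutex `locks` times, unlock it `unlocks` times, then
 * let a second thread attempt a non-blocking lock.
 * Returns the second thread's pthread_mutex_trylock() result.
 */
int probe_after(int locks, int unlocks)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int result = -1;
    int rc;
    int i;

    assert(locks >= 0 && unlocks >= 0 && unlocks <= locks);

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    rc = pthread_mutex_init(&demo_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    assert(rc == 0);

    for (i = 0; i < locks; i++) {
        pthread_mutex_lock(&demo_lock);
    }
    for (i = 0; i < unlocks; i++) {
        pthread_mutex_unlock(&demo_lock);
    }

    rc = pthread_create(&tid, NULL, probe, &result);
    assert(rc == 0);
    pthread_join(tid, NULL);

    /* Release whatever is still held so the mutex can be destroyed safely. */
    for (i = unlocks; i < locks; i++) {
        pthread_mutex_unlock(&demo_lock);
    }
    pthread_mutex_destroy(&demo_lock);
    return result;
}
```

Called from a small main and built with -pthread, probe_after(2, 2) returns 0 because the lock was fully released, while probe_after(2, 1) returns EBUSY because the owner still holds one level of the lock. A blocking pthread_mutex_lock() in the second thread would wait forever in that case, which matches the kind of stall described above.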
Comments:

By: Sebastian Gutierrez (sum) 2012-05-07 12:09:49.489-0500

I also noticed that the phones all become unreachable even though I can still ping them (maybe a locking problem). I had to downgrade to 1.8.10 to see whether that version is free of the issue, since I also saw a report about crashing.

By: Matt Jordan (mjordan) 2012-05-07 12:12:47.407-0500

Debugging deadlocks: Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install).  This will then give you the console command "core show locks." When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:

# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt
# gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt
gdb> bt
gdb> bt full
gdb> thread apply all bt

Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!

Since your log states that 'core show locks' does not exist, I'm assuming this was compiled without DEBUG_THREADS.

Additionally, although it's difficult to tell, this does not appear to be a crash.  If it is, please generate a backtrace from the core file using the instructions on the wiki [1].  Make sure that Asterisk is compiled with DONT_OPTIMIZE at a minimum, and preferably BETTER_BACKTRACES as well.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Sebastian Gutierrez (sum) 2012-05-07 12:21:44.737-0500

I attached Debug.log, which has the full bt from the dump. I had to switch back to 1.8.10 because this is a production environment, and apart from the crash, all phones were massively unreachable for some time before they came back. I know this is an optimized backtrace, but it may guide you. I will be testing 1.8.10 in production; if the crash or deadlocks happen again, I will try to use DEBUG_THREADS and DONT_OPTIMIZE so we can get more information from the dump.

By: Matt Jordan (mjordan) 2012-05-07 12:45:47.020-0500

As I mentioned, the backtrace in your attached file does not indicate a crash, but a deadlock.  In that case, to move the issue forward, getting the 'core show locks' output using the instructions would be extremely helpful.

By: Sebastian Gutierrez (sum) 2012-05-21 16:57:20.340-0500

this is related to the core show locks entered in

By: Rusty Newton (rnewton) 2012-05-30 16:20:31.508-0500

Attaching coreshowlocks from ASTERISK-19851.

By: Sebastian Gutierrez (sum) 2012-05-30 16:39:14.784-0500

To give more information: this system uses ChanSpy to monitor the calls, and ASTERISK-19922 is also happening on this system; maybe it is related somehow.

By: Sebastian Gutierrez (sum) 2012-06-13 13:05:10.019-0500

You can see in Captura.PNG the UDP traffic on port 5060 queued and not being processed. We then dumped Asterisk and also got the 'core show locks' output and bt full, and then the error on the dump.

By: Sebastian Gutierrez (sum) 2012-09-17 14:33:34.666-0500

I moved to Asterisk 10 and this bug is not present.