Summary: ASTERISK-20437: Deadlock with ast_context_remove_extension_callerid and handle_request_do
Reporter: Jeff Hutchins (jhutchins)
Labels:
Date Opened: 2012-09-17 17:08:06
Date Closed: 2012-09-24 14:21:27
Versions: Frequency of
One Time
Environment:
Attachments: ( 0) backtrace.full.txt
( 1) backtrace.txt
Description: As can be seen in the attached backtrace files, thread 25 obtains a sip_pvt lock (0x2aaabc46b658) at chan_sip.c:25270 (#9 in thread 25) with a call to sip_pvt_lock_full and then, later in the stack trace (#5 in thread 25), tries to acquire a context lock at pbx.c:4302 with a call to ast_rdlock_context. At the same time, thread 2 obtains a context lock at pbx.c:5580 (#12 in thread 2) with a call to find_context_locked(context), which in turn calls ast_rdlock_context. Later in its stack trace, thread 2 attempts to obtain the same sip_pvt lock (0x2aaabc46b658) at chan_sip.c:14485 (#5 in thread 2) with a call to sip_pvt_lock_full. This causes a deadlock, as each thread holds the resource the other is trying to acquire.
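The lock-order inversion described above can be illustrated with a minimal sketch. It is not Asterisk code: pvt_lock and context_lock are hypothetical plain mutexes standing in for the sip_pvt lock and the context lock, and lock_both, unlock_both and run_demo are made-up names. It shows one common way to avoid this kind of deadlock, assuming the second lock can be taken with a try-lock: a thread that cannot get the second lock releases the first and retries, so neither thread ever blocks while holding the lock the other one needs.

```c
#include <pthread.h>
#include <sched.h>

#define ITERATIONS 1000

/* Hypothetical stand-ins for the two locks in the report (not Asterisk types). */
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER;     /* plays the sip_pvt lock */
static pthread_mutex_t context_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the context lock */
static int work_done; /* only modified while both locks are held */

/* Take `first`, then try `second`. If `second` is busy, drop `first`
 * and retry instead of blocking while still holding a lock. */
static void lock_both(pthread_mutex_t *first, pthread_mutex_t *second)
{
    for (;;) {
        pthread_mutex_lock(first);
        if (pthread_mutex_trylock(second) == 0) {
            return; /* both locks are now held */
        }
        pthread_mutex_unlock(first);
        sched_yield(); /* give the other thread a chance to finish */
    }
}

static void unlock_both(pthread_mutex_t *first, pthread_mutex_t *second)
{
    pthread_mutex_unlock(second);
    pthread_mutex_unlock(first);
}

/* Same acquisition order as thread 25 in the report: pvt, then context. */
static void *pvt_then_context(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_both(&pvt_lock, &context_lock);
        work_done++;
        unlock_both(&pvt_lock, &context_lock);
    }
    return NULL;
}

/* Same acquisition order as thread 2 in the report: context, then pvt. */
static void *context_then_pvt(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_both(&context_lock, &pvt_lock);
        work_done++;
        unlock_both(&context_lock, &pvt_lock);
    }
    return NULL;
}

/* Runs both threads to completion and returns the number of
 * critical sections executed, or -1 if a thread could not be started. */
int run_demo(void)
{
    pthread_t a, b;

    work_done = 0;
    if (pthread_create(&a, NULL, pvt_then_context, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&b, NULL, context_then_pvt, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return work_done;
}
```

If lock_both is replaced with two unconditional pthread_mutex_lock calls, the two threads can each end up holding one lock while waiting for the other, which is the situation shown in the backtraces. The sketch uses plain mutexes for brevity, whereas the context lock in the report is taken through ast_rdlock_context.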
Comments:
By: Michael L. Young (elguero) 2012-09-18 11:45:36.390-0500

Can you reproduce this on version 1.8.16? is really behind and there have been a lot of deadlock fixes since then based on a quick look at the changelog.

Also, I think the output of core show locks might be helpful in trying to troubleshoot this.

Debugging deadlocks: Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install). This will then give you the console command "core show locks". When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:

# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt
# gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt
gdb> bt
gdb> bt full
gdb> thread apply all bt

Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!

By: Jeff Hutchins (jhutchins) 2012-09-18 12:39:00.928-0500

For an uncommon deadlock like this, running with DEBUG_THREADS and DONT_OPTIMIZE on our production machines is not really an option, as it significantly degrades performance. However, the core-show-locks output should not actually be required, since I already did the legwork of finding the deadlock and identifying the exact lines where it occurs.

The newest version of Certified Asterisk is 1.8.11, and there have only been 3 deadlock fixes between and 1.8.11, none of which are pertinent to this issue. Which version of Asterisk would you suggest I try?

By: Michael L. Young (elguero) 2012-09-18 22:26:02.450-0500

The latest version in the 1.8 branch is 1.8.16.  I didn't take a detailed look to see if it was exactly what you described above.  It was just a thought to help narrow this down rather than looking for a bug that might already have been fixed.

Just trying to collect enough data for when someone is ready to look at this.