Summary:ASTERISK-17534: [patch] Deadlock: ast_taskprocessor_get and SIP
Reporter: Alec Davis (alecdavis)
Labels:
Date Opened: 2011-03-09 15:09:37.000-0600
Date Closed: 2011-06-03 23:44:48
Versions:
Frequency of
Environment:
Attachments:
( 0) backtrace-mar14.txt
( 1) bug18950.diff.txt
( 2) coreshowlocks-090311.txt
( 3) coreshowlocks-090311-summary.txt
( 4) coreshowlocks-mar14.txt
Description:It has locked up only once in the few weeks we have been running, and we are unable to reproduce it at this stage.


Deadlock summary:
=== Currently Held Locks ==============================================
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
=== Thread ID: -1227080816 (tps_processing_function started at [  451] taskprocessor.c ast_taskprocessor_get())
=== ---> Lock #0 (pbx.c): MUTEX 9911 ast_rdlock_contexts &conlock 0x8217820 (1)   <<<<<<<<1>>>>>>>>
=== ---> Lock #1 (pbx.c): MUTEX 4271 handle_statechange hints 0x9a2b550 (1)
=== ---> Lock #2 (pbx.c): MUTEX 4272 handle_statechange hint 0xb0c3b578 (1)
=== ---> Waiting for Lock #3 (chan_sip.c): MUTEX 13591 cb_extensionstate p 0xcef2e28 (1) <<<<<<<<2>>>>>>>>
=== --- ---> Locked Here: chan_sip.c line 7472 (find_call)
=== -------------------------------------------------------------------
=== Thread ID: -1294894192 (do_monitor           started at [24470] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 23964 handle_request_do &netlock 0xb67eb6c0 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 7472 find_call sip_pvt_ptr 0xcef2e28 (1) <<<<<<<<2>>>>>>>>
=== ---> Waiting for Lock #2 (pbx.c): MUTEX 9911 ast_rdlock_contexts &conlock 0x8217820 (1) <<<<<<<<1>>>>>>>>
=== --- ---> Locked Here: pbx.c line 9911 (ast_rdlock_contexts)
=== -------------------------------------------------------------------
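
The dump shows a classic lock-order inversion: the taskprocessor thread holds &conlock and waits for the sip_pvt lock, while do_monitor holds the sip_pvt lock and waits for &conlock. The following is a minimal sketch of that pattern in plain pthreads, with one common way of avoiding it (try-lock and back off on the second lock). Every name in it (conlock, pvt_lock, statechange_thread, request_thread, run_demo) is a hypothetical stand-in, not Asterisk code, and it is not necessarily what the eventual fix does.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins for the two locks in the dump above. */
static pthread_mutex_t conlock = PTHREAD_MUTEX_INITIALIZER;  /* role of &conlock */
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER; /* role of the sip_pvt lock */

#define ITERATIONS 10000

static int state_changes; /* only modified while both locks are held */
static int requests;      /* only modified while both locks are held */

/* Lock order: conlock, then pvt_lock (as in the taskprocessor thread). */
static void *statechange_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&conlock);
        pthread_mutex_lock(&pvt_lock);
        state_changes++;
        pthread_mutex_unlock(&pvt_lock);
        pthread_mutex_unlock(&conlock);
    }
    return NULL;
}

/* Opposite order: pvt_lock, then conlock (as in the do_monitor thread).
 * A blocking lock on conlock here could deadlock against the thread
 * above, so this sketch uses trylock and drops pvt_lock on failure. */
static void *request_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&pvt_lock);
        while (pthread_mutex_trylock(&conlock) != 0) {
            /* Back off: release our lock so the other thread can finish. */
            pthread_mutex_unlock(&pvt_lock);
            sched_yield();
            pthread_mutex_lock(&pvt_lock);
        }
        requests++;
        pthread_mutex_unlock(&conlock);
        pthread_mutex_unlock(&pvt_lock);
    }
    return NULL;
}

/* Runs both threads to completion; returns 0 on success, -1 on error. */
static int run_demo(void)
{
    pthread_t a, b;

    state_changes = 0;
    requests = 0;
    if (pthread_create(&a, NULL, statechange_thread, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, request_thread, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(state_changes == ITERATIONS && requests == ITERATIONS);
    return 0;
}
```

Calling run_demo() from a main() and building with -pthread should run both threads to completion. Replacing the trylock loop with a plain pthread_mutex_lock(&conlock) reintroduces the inversion and can hang in the same way as the dump above.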
Comments:
By: Paul Belanger (pabelanger) 2011-03-09 15:55:35.000-0600

We'll need a backtrace too
Debugging deadlocks:

Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install)

This will then give you the console command:

core show locks

When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:

# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt

# gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt

gdb> bt
gdb> bt full
gdb> thread apply all bt

Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!

By: Alec Davis (alecdavis) 2011-03-13 22:18:36

Uploaded the files as requested.

However, I had to run gdb as follows instead:
gdb /usr/sbin/asterisk <pid of asterisk> | tee /tmp/backtrace-mar14.txt

By: Irontec (irontec) 2011-03-28 05:24:28

This may be the same issue as ASTERISK-16961.

We also have problems with locks in the cb_extensionstate function, with the same Asterisk version.

The latest patch from 0018310 is applied in 1.8.4 (RC).

We are going to try this patch and see whether the deadlock happens again...

By: Alec Davis (alecdavis) 2011-03-28 05:42:22

Irontec, I just edited your note. If you put a # immediately before the issue number, it will link to the issue.

By: Alec Davis (alecdavis) 2011-03-28 06:01:24

Irontec, thanks for the association, I totally agree.

Using http://svnview.digium.com/svn/asterisk/branches/1.8/main/pbx.c?r1=302266&r2=302265&pathrev=302266

Just made a patch from the actual commit: bug18950.diff.txt
edit: ... or could have just used https://reviewboard.asterisk.org/r/1072/diff/raw/

By: Steve Davies (one47) 2011-03-28 06:17:20

In relation to this, I raised


a couple of days ago, and would appreciate if someone could check my logic.

By: Irontec (irontec) 2011-03-28 06:32:59

one47, 'core-show-locks' seems to be the same issue.

There is a thread waiting for MUTEX 12959 in the cb_extensionstate function.
But there isn't any other thread holding this MUTEX...

IMHO it is the same issue.

By: Steve Davies (one47) 2011-03-28 08:00:41

Irontec, the information in ASTERISK-17607 only applies after the patch in ASTERISK-16961 has been applied, as that patch changes how the locks are used.

Under load, we could cause locks in a few hours without ASTERISK-16961. We've had one lockup since, over a period of a couple of months, and I believe that ASTERISK-17607 might be the reason for it.

By: Steve Davies (one47) 2011-03-28 08:06:04

Apologies, I edited my earlier posts where I had posted the wrong issue tracker number. I should have been referring to ASTERISK-17607.

By: Alec Davis (alecdavis) 2011-06-03 23:44:47

Fixed by http://svnview.digium.com/svn/asterisk/branches/1.8/main/pbx.c?r1=302266&r2=302265&pathrev=302266