[Home]

Summary:ASTERISK-07007: [patch] deadlock between msglist_lock and loglock in logger.c
Reporter:Tony Mountifield (softins)Labels:
Date Opened:2006-05-21 16:19:50Date Closed:2006-05-24 14:16:35
Priority:BlockerRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) logger.c-trunk-patch.diff
( 1) logger.c-v1.2-patch.diff
Description:I have Asterisk 1.2 branch running on a dual-Xeon box, and for two weeks running (after a recent update) I have had Asterisk lock up at the same time. After the first time I recompiled with DEBUG_THREADS so that if it happened again I could see where the mutexes had been claimed.

It turns out that what triggers it is my logrotate job that does: asterisk -rx "logger reload"

It might possible only get triggered on a dual-CPU box, I'm not sure.

My initial investigation shows the following:

ast_verbose() claims msglist_lock and then while holding that mutex it calls ast_log(). ast_log() then claims loglock.

reload_logger() claims loglock, and then when processing logger.conf it calls ast_verbose() to say it is parsing it.

In my deadlock situation I have one process trying to log "remote unix connection", which has claimed msglist_lock and is now sitting in ast_log waiting for loglock. I have another process severals stack frames down from reload_logger(), holding locklog and sitting inside ast_verbose() waiting for msglist_lock.

At this point every process that wants to log anything just blocks.

I've gathered the following information from the system while in the blocked state.

Here are the two mutexes:

(gdb) p msglist_lock
$1 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x902c560, __m_kind = 1, __m_lock = {__status = -1233055615, __spinlock = 85}}, file = {0x812d9a6 "logger.c", 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, lineno = {846, 0, 0, 0, 0,
   0, 0, 0, 0, 0}, reentrancy = 1, func = {0x812e54a "ast_verbose", 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, thread = {32770, 0, 0, 0, 0, 0, 0, 0, 0, 0}}
(gdb) p loglock
$2 = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0xb6800b00, __m_kind = 1, __m_lock = {__status = 151176545, __spinlock = 1}}, file = {0x812d9a6 "logger.c", 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, lineno = {387, 0, 0, 0, 0, 0,
   0, 0, 0, 0}, reentrancy = 1, func = {0x812e58a "reload_logger", 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, thread = {92438547, 0, 0, 0, 0, 0, 0, 0, 0, 0}}

This is the backtrace from the thread executing reload_logger(), holding loglock and waiting on msglist_lock:

(gdb) bt
#0  0xb7ef40d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0xb7ef3708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0xb7ef558d in __pthread_lock () from /lib/i686/libpthread.so.0
#3  0xb7ef20dd in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x08057aef in ast_verbose (fmt=0x812f5e5 "  == Parsing '%s': ") at lock.h:246
ASTERISK-1  0x08066142 in config_text_file_load (database=0xb67ff680 "ÿÿÿÿ", table=0xb67ff580 "", filename=0x812e102 "logger.conf", cfg=0x9060a70) at config.c:583
ASTERISK-2  0x080652b0 in ast_config_internal_load (filename=0x812e102 "logger.conf", cfg=0x9060a70) at config.c:959
ASTERISK-3  0x08066f9c in ast_config_load (filename=0xfffffffc <Address 0xfffffffc out of bounds>) at config.c:976
ASTERISK-4  0x0805b114 in init_logger_chain () at logger.c:311
ASTERISK-5  0x0805853b in reload_logger (rotate=0) at logger.c:434
ASTERISK-6 0x0805c2ff in handle_logger_reload (fd=-4, argc=2, argv=0xb67ffb40) at logger.c:499
ASTERISK-7 0x080b83b8 in ast_cli_command (fd=13, s=0xfffffffc <Address 0xfffffffc out of bounds>) at cli.c:1364
ASTERISK-8 0x080e0e03 in netconsole (vconsole=0x8168e60) at asterisk.c:555
ASTERISK-9 0xb7ef0ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
ASTERISK-10 0xb7e6461a in clone () from /lib/i686/libc.so.6

This is the backtrace from the process holding msglist_lock and waiting for loglock inside ast_log():

(gdb) bt
#0  0xb7ef40d4 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0xb7ef3708 in __pthread_wait_for_restart_signal () from /lib/i686/libpthread.so.0
#2  0xb7ef558d in __pthread_lock () from /lib/i686/libpthread.so.0
#3  0xb7ef20dd in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x08058dbd in ast_log (level=5, file=0x812d9a6 "logger.c", line=898, function=0x812e54a "ast_verbose", fmt=0x813f98b "%s") at lock.h:246
ASTERISK-1  0x08057cfb in ast_verbose (fmt=0x8141cf0 "    -- Remote UNIX connection\n") at logger.c:898
ASTERISK-2  0x080e10ae in listener (unused=0x0) at asterisk.c:635
ASTERISK-3  0xb7ef0ce1 in pthread_start_thread () from /lib/i686/libpthread.so.0
ASTERISK-4  0xb7e6461a in clone () from /lib/i686/libc.so.6

I haven't worked out a fix as yet, and I'd be grateful for anyone's
input. It might be as simple as having reload_logger() claim msglist_lock
before it claims loglock. Or not :-)

For now I'll have to avoid rotating the logs.
Comments:By: Kai Militzer (kmilitzer) 2006-05-22 02:06:18

This also triggers on a single-CPU machine with HT, but is not always reproducible  on my system.

By: Tony Mountifield (softins) 2006-05-22 12:57:34

Well I've tried having reload_logger() claim msglist_lock before loglock, and releasing msglist_lock after loglock, and that does appear to cure the problem.

I've uploaded patches for v1.2 and trunk. The patches are identical apart from line numbers and svn revision number.

By: Serge Vecher (serge-v) 2006-05-22 13:02:00

kmilitzer: does the attached patch fix the issue for you too?

By: Kai Militzer (kmilitzer) 2006-05-22 14:13:39

It seems to have fixed the problem. My logcheck logrotate didn't kill the asterisk tonight. I will watch it ...



By: Joshua C. Colp (jcolp) 2006-05-24 14:16:34

Fixed in both 1.2 and trunk, revisions 30033 and 30034 respectively. Thanks!