Summary:ASTERISK-22854: [patch] - Deadlock between cel_pgsql unload and core_event_dispatcher taskprocessor thread
Reporter:Etienne Lessard (hexanol)
Labels:
Date Opened:2013-11-13 07:17:38.000-0600
Date Closed:2013-12-31 15:27:03.000-0600
Versions:11.6.0 Frequency of
Environment:
Attachments:( 0) cel_pgsql_fix_deadlock_event.patch
Description:A deadlock can happen between a thread unloading or reloading the cel_pgsql module and the core_event_dispatcher taskprocessor thread.

When the core_event_dispatcher taskprocessor thread is deadlocked, events are no longer dispatched, and bad things follow, like:
* queue member statuses are not updated
* BLF indicators on SIP phones are not updated
* etc., i.e. everything that uses the event system stops working

Observed and reproducible on Asterisk 11.6.0.

Description of what is happening:

Thread 1 (for example, a netconsole thread):
# a "module reload cel_pgsql" command is issued
# the thread enters the "my_unload_module" function (cel_pgsql.c)
# the thread acquires the write lock on psql_columns
# the thread enters the "ast_event_unsubscribe" function (event.c)
# the thread tries to acquire the write lock on ast_event_subs[sub->type] and blocks, because thread 2 holds the read lock

Thread 2 (core_event_dispatcher taskprocessor thread):
# the taskprocessor pops a CEL event
# the thread enters the "handle_event" function (event.c)
# the thread acquires the read lock on ast_event_subs[sub->type]
# the thread calls back the "pgsql_log" function (cel_pgsql.c), since it is a subscriber of CEL events
# the thread tries to acquire a read lock on psql_columns and blocks, because thread 1 holds the write lock
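
The two sequences above amount to a lock-order inversion. The following is only a minimal standalone sketch of that pattern using plain POSIX rwlocks, not Asterisk code: columns_lock, subs_lock, unload_path, dispatch_path and simulate_inversion are hypothetical names standing in for psql_columns, ast_event_subs[sub->type] and the two threads. It uses the non-blocking try variants for the second lock, so it reports the conflict instead of hanging, and it assumes a platform that provides pthread barriers (e.g. Linux with glibc).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins, not the real Asterisk objects:
 * columns_lock plays psql_columns (cel_pgsql.c),
 * subs_lock plays ast_event_subs[sub->type] (event.c). */
static pthread_rwlock_t columns_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_rwlock_t subs_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_barrier_t step;

/* Thread 1: the module unload/reload path. */
static void *unload_path(void *result)
{
    int *rc = result;

    pthread_rwlock_wrlock(&columns_lock);       /* as in my_unload_module */
    pthread_barrier_wait(&step);                /* both threads hold their first lock */
    *rc = pthread_rwlock_trywrlock(&subs_lock); /* as in ast_event_unsubscribe */
    pthread_barrier_wait(&step);                /* both threads have tried their second lock */
    if (*rc == 0) {
        pthread_rwlock_unlock(&subs_lock);
    }
    pthread_rwlock_unlock(&columns_lock);
    return NULL;
}

/* Thread 2: the event dispatcher path. */
static void *dispatch_path(void *result)
{
    int *rc = result;

    pthread_rwlock_rdlock(&subs_lock);             /* as in handle_event */
    pthread_barrier_wait(&step);
    *rc = pthread_rwlock_tryrdlock(&columns_lock); /* as in pgsql_log */
    pthread_barrier_wait(&step);
    if (*rc == 0) {
        pthread_rwlock_unlock(&columns_lock);
    }
    pthread_rwlock_unlock(&subs_lock);
    return NULL;
}

/* Runs both paths once and stores the result of each second-lock attempt. */
static void simulate_inversion(int *unload_rc, int *dispatch_rc)
{
    pthread_t t1, t2;
    int rc;

    rc = pthread_barrier_init(&step, NULL, 2);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, unload_path, unload_rc);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, dispatch_path, dispatch_rc);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&step);
}
```

Calling simulate_inversion(&a, &b) from a main function (built with -pthread) should leave both a and b set to EBUSY: each thread's second lock is unavailable while the other thread holds it. With the blocking calls used in the real code, neither thread can ever proceed.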

To reproduce the problem, I use sipp to generate calls on Asterisk and, at the same time, run 'while sleep 0.1; do echo "$(date) Reloading..."; asterisk -rx "module reload cel_pgsql.so"; done'
Comments:
By: Etienne Lessard (hexanol) 2013-11-13 07:19:29.140-0600

I've attached a patch fixing the problem.