Summary:ASTERISK-06038: chan_phone deadlocks
Reporter:matti (matti)
Labels:
Date Opened:2006-01-11 05:16:48.000-0600
Date Closed:2006-03-06 15:17:17.000-0600
Versions:
Frequency of
Environment:
Attachments:( 0) diff
( 1) diff2
( 2) diff3
( 3) gdb.txt
Description:chan_phone.so deadlocks because the monitoring thread is cancelled with a mutex locked.


The deadlock occurs in the do_monitor function, e.g. at the call ast_mutex_lock(&monlock).
Comments:By: matti (matti) 2006-01-11 05:29:18.000-0600

The patch calls pthread_kill, so signal.h may have to be included for it to compile. I have therefore uploaded diff2.

By: matti (matti) 2006-01-12 02:37:16.000-0600

I uploaded diff3 because no warning should be given when the select system call in function do_monitor is interrupted by signal SIGURG. The signal SIGURG is used to end the select system call so that the monitoring thread ends.

According to the POSIX threads documentation, asynchronous cancellation must only be used in code that does not lock or unlock mutexes. The chan_phone in Asterisk SVN violates that rule, and I suspect that is why the deadlocks occur.
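For comparison, the usual POSIX way to make cancellation safe around a mutex is deferred cancellation combined with a cleanup handler. The sketch below illustrates that rule only; it is not what the uploaded patches do (they use pthread_kill), and every name in it is hypothetical.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical lock standing in for a mutex such as monlock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs if the thread is cancelled while holding the lock. */
static void unlock_on_cancel(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *worker(void *arg)
{
    (void)arg;
    /* Deferred is the default type; it is set explicitly here to contrast
     * with PTHREAD_CANCEL_ASYNCHRONOUS. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
    for (;;) {
        pthread_mutex_lock(&demo_lock);   /* not a cancellation point */
        pthread_cleanup_push(unlock_on_cancel, &demo_lock);
        pthread_testcancel();             /* cancellation acts only here */
        pthread_cleanup_pop(1);           /* pop and run: unlocks demo_lock */
    }
    return NULL;
}

/* Returns 0 if the lock is free after the worker has been cancelled. */
int cancel_leaves_lock_free(void)
{
    struct timespec pause = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    nanosleep(&pause, NULL);   /* let the worker take the lock a few times */
    pthread_cancel(tid);
    if (pthread_join(tid, NULL) != 0)
        return -1;

    /* With asynchronous cancellation and no handler, the worker could have
     * died holding the lock and this trylock would fail. */
    if (pthread_mutex_trylock(&demo_lock) != 0)
        return 1;
    pthread_mutex_unlock(&demo_lock);
    return 0;
}
```

The cancellation request takes effect only at pthread_testcancel, where the cleanup handler is already registered, so the mutex is always released before the thread exits.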

By: matti (matti) 2006-01-19 05:16:54.000-0600

I have sent a fax containing the copyright waiver disclaimer.

By: Tilghman Lesher (tilghman) 2006-01-22 00:41:41.000-0600

What you're describing here is not possible. pthread_cancel() is only ever called once the caller has acquired monlock, which means that the do_monitor thread does NOT have a lock on monlock. So it can only be cancelled when it is not holding the mutex.

If you're seeing a deadlock, I advise you to connect to the running Asterisk process with gdb and get a 'thread apply all bt full', then upload a file containing that gdb output, so we can better understand what exactly is causing the deadlock that you're seeing.

By: matti (matti) 2006-01-23 03:24:23.000-0600

I, too, thought the deadlock was impossible. However, it happened several times. I am using a Linux 2.4 kernel that probably does not support native threads.
The following link describes the same problem:

By: matti (matti) 2006-01-24 01:44:41.000-0600

Another reason to patch the code is that a deadlock can happen when function phone_new fails and hangs up the channel. Function do_monitor locks iflock and calls function phone_check_exception. Function phone_check_exception calls function phone_new if an incoming call is matched in the dialplan. Function phone_new calls function ast_pbx_start. If function ast_pbx_start fails, function phone_new calls function ast_hangup. Function ast_hangup calls function phone_hangup. Function phone_hangup calls function restart_monitor. Function restart_monitor then tries to lock monlock while iflock is still held by do_monitor. This can cause a deadlock because restart_monitor locks monlock first and then iflock, which is the opposite order. Function restart_monitor can be called from phone_hangup in several threads at a time.
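The lock-order problem described above (one path takes monlock then iflock, another holds iflock and then wants monlock) is commonly avoided by never blocking on the first lock while holding the second. The following is a minimal sketch of that back-off technique with hypothetical stand-in mutexes and function names; it is not the change that was committed.

```c
#include <pthread.h>

/* Hypothetical stand-ins for monlock and iflock. */
static pthread_mutex_t mon = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ifl = PTHREAD_MUTEX_INITIALIZER;

/* Canonical order used everywhere: mon first, then ifl. */
static void lock_both(void)
{
    pthread_mutex_lock(&mon);
    pthread_mutex_lock(&ifl);
}

static void unlock_both(void)
{
    pthread_mutex_unlock(&ifl);
    pthread_mutex_unlock(&mon);
}

/* For a caller that already holds ifl and now needs mon as well.
 * It never blocks on mon while holding ifl: if mon is busy, it drops ifl
 * and reacquires both in the canonical order. Returns 1 if it backed off,
 * in which case state guarded by ifl must be revalidated by the caller. */
static int acquire_mon_holding_ifl(void)
{
    if (pthread_mutex_trylock(&mon) == 0)
        return 0;
    pthread_mutex_unlock(&ifl);
    lock_both();
    return 1;
}

/* Path A: like a restart routine, takes mon then ifl. */
static void *restart_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        lock_both();
        unlock_both();
    }
    return NULL;
}

/* Path B: like a monitor loop, holds ifl and then needs mon. */
static void *monitor_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        pthread_mutex_lock(&ifl);
        (void)acquire_mon_holding_ifl();
        unlock_both();
    }
    return NULL;
}

/* Runs both paths concurrently; returns 0 once both have finished. */
int run_lock_order_demo(void)
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, restart_path, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, monitor_path, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

If monitor_path instead called pthread_mutex_lock(&mon) directly while holding ifl, the two threads could each end up waiting for the lock the other one holds.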

By: Tilghman Lesher (tilghman) 2006-03-06 15:17:17.000-0600

Okay, although there were formatting problems in the patch, I've corrected them.  Committed to trunk.