Summary: ASTERISK-17740: Deadlock in handle_request_bye() and mutex error in handle_incoming()
Reporter: Kirill Katsnelson (kkm)
Labels:
Date Opened: 2011-04-22 01:55:14
Date Closed: 2011-11-01 16:42:59
Versions:
Frequency of
Environment:
Attachments: ( 0) locks-backtrace-relevant-redacted.txt
Description:

1. Asterisk locks up under heavy load. Getting a freeze every other day.

2. An error message was reported once:

ERROR[17987]: lock.c:384 in __ast_pthread_mutex_unlock: chan_sip.c line 23627 (handle_incoming): mutex 'owner' freed more times than we've locked!

****** STEPS TO REPRODUCE ******

Could not reproduce at will.


See the attached `core show locks' and gdb backtrace snippets showing the 2 interlocked threads.

The do_monitor thread takes locks in the wrong order: it holds the pvt lock and deadlocks while trying to lock the channel.

The code in handle_request_bye() is written on the assumption that it is entered with the channel lock held:

struct ast_channel *bridge = p->owner ? ast_bridged_channel(p->owner) : NULL;
/* We need to get the lock on bridge because ast_rtp_instance_set_stats_vars will attempt
 * to lock the bridge. This may get hairy...
 */
while (bridge && ast_channel_trylock(bridge)) {
    do {
        /* Can't use DEADLOCK_AVOIDANCE since p is an ao2 object */
    } while (p->owner && ast_channel_trylock(p->owner));
    bridge = p->owner ? ast_bridged_channel(p->owner) : NULL;
}

Apparently, the body of the outer while loop is not entered (otherwise there would be a warning about an attempt to release an unowned lock). Later,

if (p->owner) {
    ast_rtp_instance_set_stats_vars(p->owner, p->rtp);

and this is where the deadlock occurs, as p->owner is not locked.
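To illustrate the lock-order problem described above, here is a minimal sketch in plain pthreads of a thread that already holds the pvt lock and needs the channel lock, while another thread takes the two in the opposite order. It is not the chan_sip code and not dvossel's patch: struct channel, struct pvt, pvt_lock_owner(), channel_thread(), monitor_path() and run_demo() are hypothetical stand-ins, sched_yield() is an assumed back-off, and channel lifetime (reference counting) is deliberately left out.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-ins for the chan_sip objects; not Asterisk types. */
struct channel { pthread_mutex_t lock; };
struct pvt {
    pthread_mutex_t lock;
    struct channel *owner;      /* may be NULL; read only with pvt lock held */
};

static struct channel g_chan = { PTHREAD_MUTEX_INITIALIZER };
static struct pvt g_pvt = { PTHREAD_MUTEX_INITIALIZER, &g_chan };

/*
 * Call with p->lock held. Returns with p->lock held and, if p still has an
 * owner, with the owner's lock held as well. Never blocks on the channel
 * lock while holding the pvt lock: on contention it drops the pvt lock,
 * yields, and starts over, re-reading p->owner each time.
 */
static struct channel *pvt_lock_owner(struct pvt *p)
{
    for (;;) {
        struct channel *chan = p->owner;
        if (chan == NULL)
            return NULL;
        if (pthread_mutex_trylock(&chan->lock) == 0)
            return chan;
        pthread_mutex_unlock(&p->lock);   /* back off so the other side can finish */
        sched_yield();
        pthread_mutex_lock(&p->lock);
    }
}

/* "Channel side": takes the locks in the order channel, then pvt. */
static void *channel_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&g_chan.lock);
        pthread_mutex_lock(&g_pvt.lock);
        pthread_mutex_unlock(&g_pvt.lock);
        pthread_mutex_unlock(&g_chan.lock);
    }
    return NULL;
}

/* "Monitor side": starts from the pvt and then needs the channel. */
static int monitor_path(int iterations)
{
    int acquired = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&g_pvt.lock);
        struct channel *chan = pvt_lock_owner(&g_pvt);
        if (chan != NULL) {
            assert(chan == g_pvt.owner);  /* both locks are held here */
            acquired++;
            pthread_mutex_unlock(&chan->lock);
        }
        pthread_mutex_unlock(&g_pvt.lock);
    }
    return acquired;
}

/* Runs both sides concurrently; returns how often the monitor side got both locks. */
static int run_demo(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, channel_thread, NULL) != 0)
        return -1;
    int n = monitor_path(1000);
    pthread_join(t, NULL);
    return n;
}
```

Replacing the trylock in pvt_lock_owner() with a plain pthread_mutex_lock() would reproduce the kind of hang reported here: each thread would hold one lock and wait forever for the other.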

On a different day (and this is why I firmly relate the 2 issues), I received the following event on the same server, although no other ill effects ensued from it:

ERROR[17987]: lock.c:384 in __ast_pthread_mutex_unlock: chan_sip.c line 23627 (handle_incoming): mutex 'owner' freed more times than we've locked!

handle_incoming() is what calls handle_request_bye(), and it also fully assumes that the owner lock is held. Apparently, in rare cases the assumption does not hold.
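The error quoted above is produced by Asterisk's own lock tracking in lock.c. As a rough analogy only, plain POSIX can detect the same class of mistake (unlocking a mutex the caller does not hold) with an error-checking mutex. The sketch below assumes nothing about Asterisk internals; unlock_result() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Creates an error-checking mutex, optionally locks it, then unlocks it
 * and returns the result of that unlock.
 */
static int unlock_result(int lock_first)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    if (lock_first)
        pthread_mutex_lock(&m);

    /* With an error-checking mutex this fails with EPERM when the calling
     * thread does not hold the lock; with a default mutex the same call
     * would be undefined behaviour. */
    rc = pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return rc;
}
```

Here unlock_result(1) is the balanced case and returns 0, while unlock_result(0) mirrors a code path that releases the owner lock without ever having taken it and returns EPERM.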

I am tracking down why that could happen, but help is indeed appreciated.
Comments:

By: Kirill Katsnelson (kkm) 2011-04-22 06:00:42

See also https://reviewboard.asterisk.org/r/1182/ -- a patch by dvossel that changes deadlock avoidance logic there.

By: Leif Madsen (lmadsen) 2011-04-26 07:43:22

Thanks for the thorough bug report!

By: Russell Bryant (russell) 2011-04-26 15:25:37

dvossel's patch from that reviewboard link has been merged. Is this still a problem?

By: Kirill Katsnelson (kkm) 2011-04-26 16:16:12

I am running a patched server for the 3rd day in a row now, no freezes. But I think it is a bit early to tell. I'd wait for a week before concluding that it was the cure.

By: Kirill Katsnelson (kkm) 2011-05-04 04:17:49

The patch apparently resolves the issue. I have not observed it since applying the patch.

Is the patch targeted for 1.8.4?

By: David Vossel (dvossel) 2011-11-01 16:42:59.953-0500

Fixed in at least 1.8.7, possibly before.