Summary: | ASTERISK-17740: Deadlock in handle_request_bye() and mutex error in handle_incoming() | ||
Reporter: | Kirill Katsnelson (kkm) | Labels: | |
Date Opened: | 2011-04-22 01:55:14 | Date Closed: | 2011-11-01 16:42:59 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) locks-backtrace-relevant-redacted.txt | |
Description: | 1. Asterisk locks up under heavy load. I am getting a freeze every other day.
2. An error message was reported once:

    ERROR[17987]: lock.c:384 in __ast_pthread_mutex_unlock: chan_sip.c line 23627 (handle_incoming): mutex 'owner' freed more times than we've locked!

****** STEPS TO REPRODUCE ******
Could not reproduce at will.

****** ADDITIONAL INFORMATION ******
See the attached `core show locks' output and gdb backtrace snippets showing the 2 interlocked threads. The do_monitor thread takes locks in the wrong order: it holds the pvt lock and deadlocks while trying to lock the channel.

The code in handle_request_bye() is written on the assumption that it is entered with the channel lock held:

    struct ast_channel *bridge = p->owner ? ast_bridged_channel(p->owner) : NULL;
    /* We need to get the lock on bridge because ast_rtp_instance_set_stats_vars will attempt
     * to lock the bridge. This may get hairy... */
    while (bridge && ast_channel_trylock(bridge)) {
        ast_channel_unlock(p->owner);
        do {
            /* Can't use DEADLOCK_AVOIDANCE since p is an ao2 object */
            sip_pvt_unlock(p);
            usleep(1);
            sip_pvt_lock(p);
        } while (p->owner && ast_channel_trylock(p->owner));
        bridge = p->owner ? ast_bridged_channel(p->owner) : NULL;
    }

Apparently, the body of the outer while loop is not entered (otherwise there would be a warning about an attempt to release an unowned lock). Later,

    if (p->owner) {
        ast_rtp_instance_set_stats_vars(p->owner, p->rtp);
    }

and this is where the deadlock occurs, as p->owner is not locked.

On a different day (and this is why I firmly relate the 2 issues), I received the following error on the same server, although no other ill effects ensued from it:

    ERROR[17987]: lock.c:384 in __ast_pthread_mutex_unlock: chan_sip.c line 23627 (handle_incoming): mutex 'owner' freed more times than we've locked!

handle_incoming() is what calls handle_request_bye(), and it also fully assumes that the owner lock is held. Apparently, in rare cases the assumption is not true.

I am tracking down why that could happen, but help is indeed appreciated. | ||
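The back-off pattern discussed in the description can be sketched in isolation. This is only an illustration under stated assumptions, not the chan_sip code or the merged patch: plain pthread mutexes stand in for the Asterisk channel and pvt locks, `struct demo_pvt` and `demo_lock_owner()` are hypothetical names, and `sched_yield()` stands in for the `usleep(1)` pause. The helper is entered with only the pvt lock held and never blocks on the owner lock while holding it, so it does not rely on the caller having locked the owner first.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a sip_pvt: its own lock plus an optional owner lock. */
struct demo_pvt {
    pthread_mutex_t lock;        /* plays the role of the pvt lock */
    pthread_mutex_t *owner_lock; /* plays the role of the channel lock; NULL if no owner */
};

/*
 * Must be called with pvt->lock held.
 * Returns 1 with both pvt->lock and the owner lock held,
 * or 0 with only pvt->lock held when there is no owner.
 */
int demo_lock_owner(struct demo_pvt *pvt)
{
    int rc;

    /* Re-read owner_lock on every pass: it may change while pvt->lock is dropped. */
    while (pvt->owner_lock != NULL) {
        if (pthread_mutex_trylock(pvt->owner_lock) == 0) {
            return 1; /* got the owner lock without blocking */
        }
        /* Back off: release the pvt lock so the thread holding the owner lock can proceed. */
        rc = pthread_mutex_unlock(&pvt->lock);
        assert(rc == 0);
        sched_yield();
        rc = pthread_mutex_lock(&pvt->lock);
        assert(rc == 0);
    }
    return 0;
}
```

A caller in this sketch would lock `pvt->lock`, call `demo_lock_owner()`, and release the owner lock only when the function returned 1. Releasing it unconditionally is the kind of mismatch that produces a "freed more times than we've locked" report.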
Comments: | By: Kirill Katsnelson (kkm) 2011-04-22 06:00:42
See also https://reviewboard.asterisk.org/r/1182/ -- a patch by dvossel that changes the deadlock avoidance logic there.

By: Leif Madsen (lmadsen) 2011-04-26 07:43:22
Thanks for the thorough bug report!

By: Russell Bryant (russell) 2011-04-26 15:25:37
dvossel's patch from that reviewboard link has been merged. Is it still a problem?

By: Kirill Katsnelson (kkm) 2011-04-26 16:16:12
I have been running a patched server for the 3rd day in a row now with no freezes, but I think it is a bit early to tell. I'd wait a week before concluding that it was the cure.

By: Kirill Katsnelson (kkm) 2011-05-04 04:17:49
The patch apparently resolves the issue. I have not observed it since applying the patch. Is the patch targeted for 1.8.4?

By: David Vossel (dvossel) 2011-11-01 16:42:59.953-0500
Fixed in at least 1.8.7, possibly before. |