Summary:ASTERISK-16169: app_queue's compare_weight can be called in contexts that don't lock the global queues lock
Reporter:Tim Ringenbach at Asteria Solutions Group (tim_ringenbach)Labels:
Date Opened:2010-05-28 15:37:21Date Closed:
Versions:Frequency of
Description:I haven't experienced any crashes from this issue but I thought I would report it anyway.

I noticed in the code that ring_one() is called from both try_calling() and wait_for_answer(). The former goes through a lot of trouble to lock the queues lock if we're using weights. The queues lock is then unlocked right before wait_for_answer() is called. But wait_for_answer() can also call ring_one(), which calls ring_entry(), which calls compare_weight(). And compare_weight() walks the queues list (in 1.4) or iterates the queue ao2 (in trunk).

As a side note: ever thought about creating an ast_assert_locked() macro that could be used in places that compare_weight that assume locks, to output a warning if that assumption is ever violated?
Comments:By: Mark Michelson (mmichelson) 2010-06-02 10:37:29

Not much to say here other than that you're definitely right that the behavior is inconsistent and should be fixed. I'm fairly certain that the calls to ring_one in wait_for_answer are done with no locks held, so it would likely be easy enough to just grab the global queue container lock prior to calling ring_one if weights are being used. No fancy deadlock avoidance would be necessary.

Regarding ast_assert_locked, I certainly like the idea and it would be helpful. The problem is that I have no idea how to implement such a thing. A naive approach might be to try unlocking and see if that succeeds or fails. The problem is that if it succeeds, we've just unlocked the lock. While we would obviously re-lock before returning from the macro, the operation has destroyed the atomicity of the operations we were trying to accomplish. Such a macro could safely be made if DEBUG_THREADS is enabled, since we use our own mutex structure that could be inspected to determine its use count. However, this is not on by default and we'd need a way to implement it with standard pthread_mutex_t types. Any ideas?

By: Tim Ringenbach at Asteria Solutions Group (tim_ringenbach) 2010-06-02 11:16:47

Oh you're right. For some reason I thought wait_for_answer was called with some locks held.

As for as ast_assert_locked, I'm pretty sure it's impossible to implement without storing the pthread_t of the thread holding the lock. So it's something that could be implemented when DEBUG_THREADS is on and expand to nothing otherwise.

The rational section of
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html says "Mutex objects are intended to serve as a low-level primitive from which other thread synchronization functions can be built." and "Likewise, while being able to extract the thread ID of the owner of a mutex might be desirable, it would require storing the current thread ID when each mutex is locked, and this could incur unacceptable levels of overhead. Similar arguments apply to a mutex_tryunlock operation.".

If you only checked the use count, the pthread_mutex does have that internally for recursive locks, but accessing it is both nonportable and nasty.

Since asterisk uses recursive locks, you could just try locking the lock again. But that would usually work even when lock wasn't locked.