Summary:ASTERISK-05861: [patch] possible incomplete locking in sip_pvt's packets list
Reporter:Luigi Rizzo (rizzo)Labels:
Date Opened:2005-12-17 12:11:11.000-0600Date Closed:2006-03-06 18:08:18.000-0600
Versions:Frequency of
Environment:Attachments:( 0) packets_lock.diff
Description:I am not 100% sure, but i am under the impression that the "packets"
list in chan_sip.c ::  struct sip_pvt is not locked properly.

Specifically, __sip_ack() navigates the list without holding p->lock, and only
acquires that when removing the target from the list. There is however another
procedure, retrans_pkt(), which can remove records from the list (correctly
protected by the p->lock), which means that there might be a race condition
where __sip_ack() will try to dereference a freed entry.

The attached patch moves the lock outside so that list navigation is protected.

Of course I may be wrong, but it is not obvious at all what whold
prevent the above race, so if the current code is correct then a proper
comment explaining why it is correct should be committed.
Comments:By: Luigi Rizzo (rizzo) 2005-12-18 17:32:39.000-0600

followup with more details:
__sip_ack() most of the times is called with p already locked.
There is however one exception in the path
sip_reg_timeout() ==> __sip_pretend_ack() ==> __sip_ack()
where __sip_ack() is called without the lock.
Unfortunately in the above case __sip_pretend_ack() also accesses the
p->packets list without holding the lock, so  grabbing the lock in
__sip_ack() as in the original code or in my patch is still not sufficient.

I now think the proper fix would be to:
- adequately comment the assumptions and code paths that use __sip_ack()
- acquire and release the lock around the call to __sip_pretend_ack() in
- possibly remove the lock in __sip_ack() because it is now unnecessary
 (doesn't harm though);

Note, i am not very familiar with this code so it would be nice if
the original author or someone more expert than me could review the
above analysis.

By: Olle Johansson (oej) 2006-01-09 14:39:44.000-0600

I can't possibly comment on this, since the locking is still a bit of magic to me. Just happy someone tries to clear up that mess...

Anyone else that can make a comment?

By: Olle Johansson (oej) 2006-01-30 04:38:14.000-0600

We still need a locking-expert to look at this issue. /O

By: Tilghman Lesher (tilghman) 2006-03-06 18:07:58.000-0600

You're correct, although I can see where it could have been missed.  The issue is that there are only two threads which are allowed to access that list:  the channel thread and the monitor thread.  There exists a race condition right now where the monitor thread could queue a packet that could be lost, if the formerly-first packet on the list is to be ack'ed.

Committed the portion related to this race to 1.2, merged to trunk.