ASTERISK-07500: Incorrect locking causes a race condition

[Home]

Summary: ASTERISK-07500: Incorrect locking causes a race condition

Reporter: Bill Nesbitt (bn999) Labels:

Date Opened: 2006-08-09 10:41:14 Date Closed: 2006-09-07 11:34:46

Priority: Minor Regression? No

Status: Closed/Complete Components: CDR/cdr_manager

Versions: Frequency of
Occurrence

Related
Issues:

Environment: Attachments:

Description: When using cdr batching, a cond_timedwait is used to wait for certain event notification. The return code is not checked. The state of the mutex depends on the success or failure of cond_timedwait and it's return code. This often results in the thread continuing thinking it has a lock when it does not. The result is a race condition and potentially other bad stuff.

****** ADDITIONAL INFORMATION ******

The fix is simple, check the return code and reacquire the lock if necessary:

In cdr.c, around line 1058, change:

ast_mutex_lock(&cdr_pending_lock);
ast_cond_timedwait(&cdr_pending_cond, &cdr_pending_lock, &timeout);
numevents = ast_sched_runq(sched);
ast_mutex_unlock(&cdr_pending_lock);

To:

ast_mutex_lock(&cdr_pending_lock);
if (!ast_cond_timedwait(&cdr_pending_cond, &cdr_pending_lock, &timeout)) {
ast_mutex_lock(&cdr_pending_lock);
}
numevents = ast_sched_runq(sched);
ast_mutex_unlock(&cdr_pending_lock);

Comments: By: Serge Vecher (serge-v) 2006-08-09 11:10:13

1. Could you please get a disclaimer on file (see bottom of http://bugs.digium.com/main_page.php) and cofirm with a note here when done?
2. It would be great if you could upload your change as a patchfile. See http://www.asterisk.org/developers/Patch_Howto

Thanks!
By: Bill Nesbitt (bn999) 2006-08-09 11:52:19

Disclaimer faxed.
By: Bill Nesbitt (bn999) 2006-08-09 12:04:11

Patch attached.
By: Bill Nesbitt (bn999) 2006-08-21 17:58:52

I would like to withdraw this patch. After reading the man pages on cond_timedwait a few more times, although it is not completely clear, I have concluded that the lock is always retained, even in the event of an error. The race condition is cased by something else yet to be determined. Please delete the attached file.
By: Serge Vecher (serge-v) 2006-09-06 12:29:05

bn999: any further updates?
By: Bill Nesbitt (bn999) 2006-09-07 04:14:44

Hmmm... no.

I ended up configuring the CDR in a different way which does not utilize the extra thread (which is where the problem comes in.) There is still a problem here, as can be witnessed by setting the DEBUG > 2 and in cdr.conf:

enable=yes
batch=yes
size=100
time=5
scheduleronly=no

Hundreds of "Processed 0 scheduled CDR batches from the run queue" will be logged every 5 seconds.
By: Joshua C. Colp (jcolp) 2006-09-07 11:34:46

Fixed in 1.2 as of revision 42260 and trunk as of revision 42261. Thanks!