|Summary:||ASTERISK-07062: [patch] Race condition in app_meetme generates a SIGSEGV|
|Date Opened:||2006-05-31 04:11:34||Date Closed:||2011-06-07 14:00:48|
|Environment:||Attachments:||( 0) 20060622__meetme_race_fix.diff.txt|
A race condition in function "conf_run" happens when a caller is joining
a conference just before the last conference use it leaving it.
Here's the details:
Say, the last caller "A" is leaving the conference while another caller "B"
is joining it.
A's thread is between line 1565 in app_meetme.c and line 1617, "conflock" mutex
is locked. B's thread is waiting for "conflock" mutex on line 865 of app_meetme.c.
A's thread is seeing that it it the last user and on line 1582 it calls "conf_free" releasing the conference. Then A's thread proceeds to line
1617 to release "conflock".
Once "conflock"" is released, B's thread grabs it on line 865 and accesses "conf" structure that has just been released by A's thread.
Yesterday this race condition crashed my PBX. :(
I think the right solution is to keep conference reference counting at
the place where "conf_run" is called.
|Comments:||By: BJ Weschke (bweschke) 2006-05-31 06:21:31|
Wow. Ok. Talk about catching lightning in a bottle! :) It's certainly conceivable that the cnf memory structure could have gone away by the time it gets the lock as you describe, but timing has to be exact in order for this to happen.
I think probably the easiest way to handle this would be to add an integer to the conference structure called "numjoining" or something like that and we'd increment that once we've found a valid conference structure during the join process and only decrement it once the user is into the conference in conf_run (past line 890). That way, during the leave process, we can only conf_free if !numjoining. Make sense?
By: constfilin (constfilin) 2006-05-31 08:50:57
I think that once conf structure is passed to conf_run is should already have
ref count incremented. This way the code will not have to assume that "conf_run" will increment the count ASAP.
Basically each time when we put a pointer to conf into a variable (any variable!), the reference count to this object on static "confs" list should
increment. When this variable goes out of scopce the counter should decrement.
In C++ this can be done automatically, in C this has to be done manually.
In terms of app_meetme.c the right solution is to have "find_conf" increment
ref count (unless it returns NULL) and there should be a
function "release_conf" that decrements the refcount and gets called when
variable "cnf" initialized by "find_conf" goes out of scope. "release_conf"
also frees the conference and removes it from "confs" is the ref count reaches
We can still use "users" as the field to keep reference count but for clarity I
would have renamed this field to "refcount".
How about it?
By: Tony Mountifield (softins) 2006-05-31 09:01:50
I think you need to keep users and refcount separate, since they have different meanings:
refcount would be as constfilin described - a count of the referrers to the structure.
users should always match the number of nodes in the "firstuser" list, and should be incremented or decremented in the same critical section that adds to or removes from the users list.
By: Tony Mountifield (softins) 2006-05-31 09:12:50
Actually, the trunk version of app_meetme.c already has a refcount in the ast_conference structure, so I think this issue has already been fixed.
I've just looked at SVN for when the refcount was added, and it refers to bug ASTERISK-6351, of which this current bug may be a duplicate.
I guess the fix ought to be ported to the 1.2 branch.
By: Serge Vecher (serge-v) 2006-06-05 10:00:14
constfilin: do you want to produce a backported patch?
By: Tilghman Lesher (tilghman) 2006-06-22 14:47:15
A patch for a possible fix for 1.2
By: Serge Vecher (serge-v) 2006-06-28 10:12:37
constfilin, softins: any luck testing Corydon's patch?
By: constfilin (constfilin) 2006-06-28 13:22:48
I haven't tried Corydon76's patch. Instead I took the lastest app_meetme.c out of SVN and made it compile with my version of asterisk (I splitted from the main branch in March 2005). So far it has been working fine for me.
Thanks for patch anyway!
By: Serge Vecher (serge-v) 2006-06-28 13:40:20
alright, well we can't commit an untested patch. If anybody is interested in seeing this in 1.2 branch, please test the patch and ask the bug-marshall to reopen the issue with your feedback.