[Home]

Summary:ASTERISK-15495: [patch] segfault on chanspy due to race in main/channel.c
Reporter:Tim Ringenbach at Asteria Solutions Group (tim_ringenbach)Labels:
Date Opened:2010-01-22 17:52:38.000-0600Date Closed:2010-03-11 09:33:53.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Applications/app_chanspy
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) chanspy_backtrace_0016678.txt
( 1) chanspy_crash.diff
( 2) chanspy_crashv2.diff
( 3) datastore_destroy_race.diff
( 4) example_bt_datestore_destroy_race.txt
Description:When channel.c destroys the datastore on the channel, it doesn't hold the channel lock while calling the destroy callback. It really ought to, because otherwise it's accessing the datastore list without locking. I've gotten a segfault trying to lock the mutex in the ds destroy function in app_chanspy because of this race.

Holding the channel lock during the destroy should be safe because it is also held during the fixup callback, and app_chanspy has already been patched to avoid the possible deadlock from that locking order issue.
Comments:By: David Vossel (dvossel) 2010-02-12 10:29:27.000-0600

I've uploaded a patch that I believe will resolve the issue.  It doesn't look like the problem you are having has to do with locking.  The chanspy datastore is created on the stack, and the mutex is destroyed before exiting the main chanspy function. Regardless if we lock the datastore's channel during removal or not, if a channel still has the chanspy datastore after the chanspy app exits it will crash.  I believe the real fix is to put a couple of checks in chanspy to guarantee this never happens.  Please test my patch and let me know if it resolves the issue.

I'm posting this patch on reviewboard as well.  It will be at this link, https://reviewboard.asterisk.org/r/500/

By: Tim Ringenbach at Asteria Solutions Group (tim_ringenbach) 2010-02-12 13:19:02.000-0600

It looks like I already had added a chanspy_ds_free(&chanspy_ds); in the same place you have it, and it was still crashing with that.

I'll try testing your patch and post something to the review board.

I suspect that I will still be able to make it crash though.

If you look at the code in channel.c:
      while ((datastore = AST_LIST_REMOVE_HEAD(&chan->datastores, entry)))
               /* Free the data store */
               ast_channel_datastore_free(datastore);

After it removes the datastore from the linked list, but before it calls ast_channel_datastore_free(), chanspy might decide it's done, and call chanspy_ds_free(). chanspy_ds_free() won't find anything to remove, but it is ok with that. I assumed since it didn't find anything to remove, it's safe to exit and let the stack memory go away. channel.c can't know to tell it otherwise because it's not on the list anymore. But the other thread is going about to call the destroy method, which is going to access invalid stack memory and crash.

By: David Vossel (dvossel) 2010-02-12 14:18:49.000-0600

I uploaded a new patch with an additional improvement.  We should really be checking the result of ast_channel_datastore_remove().

By: Tim Ringenbach at Asteria Solutions Group (tim_ringenbach) 2010-02-12 16:06:42.000-0600

I applied your new patch to a stock 1.4.29, plus I added a "sleep(30)" where I think the race is happening to try to bring it out. Now I am running the same stress test I was using when I originally got the crash, and I'll post the backtrace if I can get it to crash.

I don't see what good checking the return value of ast_channel_datastore_remove() does. You know it didn't remove the channel datastore and really should have, but what are you going to do about it? You have to not free the memory (by returning) until the other thread finishes calling your callback, and you have no way of knowing when that will be. That's why I think you need to protect it from happening by making channel.c whole the channel lock while doing that.

By: Tim Ringenbach at Asteria Solutions Group (tim_ringenbach) 2010-02-12 16:12:35.000-0600

I just attached the backtrace I produced while running your patch plus changing main/channel.c to look like this:

       /* Get rid of each of the data stores on the channel */
       while ((datastore = AST_LIST_REMOVE_HEAD(&chan->datastores, entry))) {
               sleep(30); //Try to make the race happen
               /* Free the data store */
               ast_channel_datastore_free(datastore);
       }

Is there a good reason not to add the locking my patch adds?

By: David Vossel (dvossel) 2010-02-12 16:21:04.000-0600

You're right, locking is the problem. I discarded my review request.

I've investigated any possible issues that could arise from locking the channel during the datastore removal in 1.4, and it does not appear to be an issue.  I'll have to do a little more research before I feel comfortable with this in 1.6.x and trunk though.  Thanks for the feedback!

By: Tim Ringenbach at Asteria Solutions Group (tim_ringenbach) 2010-02-12 16:22:16.000-0600

You're welcome. Thanks for reviewing my patch!

By: Digium Subversion (svnbot) 2010-02-12 17:30:18.000-0600

Repository: asterisk
Revision: 246545

U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r246545 | dvossel | 2010-02-12 17:30:18 -0600 (Fri, 12 Feb 2010) | 16 lines

lock channel during datastore removal

On channel destruction the channel's datastores are removed and
destroyed.  Since there are public API calls to find and remove
datastores on a channel, a lock should be held whenever datastores are
removed and destroyed.  This resolves a crash caused by a race
condition in app_chanspy.c.

(closes issue ASTERISK-15495)
Reported by: tim_ringenbach
Patches:
     datastore_destroy_race.diff uploaded by tim ringenbach (license 540)
Tested by: dvossel



------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=246545

By: Digium Subversion (svnbot) 2010-02-12 17:32:34.000-0600

Repository: asterisk
Revision: 246546

_U  trunk/
U   trunk/main/channel.c

------------------------------------------------------------------------
r246546 | dvossel | 2010-02-12 17:32:34 -0600 (Fri, 12 Feb 2010) | 21 lines

Merged revisions 246545 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r246545 | dvossel | 2010-02-12 17:30:17 -0600 (Fri, 12 Feb 2010) | 16 lines
 
 lock channel during datastore removal
 
 On channel destruction the channel's datastores are removed and
 destroyed.  Since there are public API calls to find and remove
 datastores on a channel, a lock should be held whenever datastores are
 removed and destroyed.  This resolves a crash caused by a race
 condition in app_chanspy.c.
 
 (closes issue ASTERISK-15495)
 Reported by: tim_ringenbach
 Patches:
       datastore_destroy_race.diff uploaded by tim ringenbach (license 540)
 Tested by: dvossel
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=246546

By: Digium Subversion (svnbot) 2010-02-12 17:33:38.000-0600

Repository: asterisk
Revision: 246547

_U  branches/1.6.2/
U   branches/1.6.2/main/channel.c

------------------------------------------------------------------------
r246547 | dvossel | 2010-02-12 17:33:38 -0600 (Fri, 12 Feb 2010) | 28 lines

Merged revisions 246546 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r246546 | dvossel | 2010-02-12 17:32:33 -0600 (Fri, 12 Feb 2010) | 21 lines
 
 Merged revisions 246545 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r246545 | dvossel | 2010-02-12 17:30:17 -0600 (Fri, 12 Feb 2010) | 16 lines
   
   lock channel during datastore removal
   
   On channel destruction the channel's datastores are removed and
   destroyed.  Since there are public API calls to find and remove
   datastores on a channel, a lock should be held whenever datastores are
   removed and destroyed.  This resolves a crash caused by a race
   condition in app_chanspy.c.
   
   (closes issue ASTERISK-15495)
   Reported by: tim_ringenbach
   Patches:
         datastore_destroy_race.diff uploaded by tim ringenbach (license 540)
   Tested by: dvossel
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=246547

By: Digium Subversion (svnbot) 2010-02-12 17:34:10.000-0600

Repository: asterisk
Revision: 246548

_U  branches/1.6.1/
U   branches/1.6.1/main/channel.c

------------------------------------------------------------------------
r246548 | dvossel | 2010-02-12 17:34:10 -0600 (Fri, 12 Feb 2010) | 28 lines

Merged revisions 246546 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r246546 | dvossel | 2010-02-12 17:32:33 -0600 (Fri, 12 Feb 2010) | 21 lines
 
 Merged revisions 246545 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r246545 | dvossel | 2010-02-12 17:30:17 -0600 (Fri, 12 Feb 2010) | 16 lines
   
   lock channel during datastore removal
   
   On channel destruction the channel's datastores are removed and
   destroyed.  Since there are public API calls to find and remove
   datastores on a channel, a lock should be held whenever datastores are
   removed and destroyed.  This resolves a crash caused by a race
   condition in app_chanspy.c.
   
   (closes issue ASTERISK-15495)
   Reported by: tim_ringenbach
   Patches:
         datastore_destroy_race.diff uploaded by tim ringenbach (license 540)
   Tested by: dvossel
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=246548

By: Digium Subversion (svnbot) 2010-02-12 17:35:07.000-0600

Repository: asterisk
Revision: 246549

_U  branches/1.6.0/
U   branches/1.6.0/main/channel.c

------------------------------------------------------------------------
r246549 | dvossel | 2010-02-12 17:35:06 -0600 (Fri, 12 Feb 2010) | 28 lines

Merged revisions 246546 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r246546 | dvossel | 2010-02-12 17:32:33 -0600 (Fri, 12 Feb 2010) | 21 lines
 
 Merged revisions 246545 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r246545 | dvossel | 2010-02-12 17:30:17 -0600 (Fri, 12 Feb 2010) | 16 lines
   
   lock channel during datastore removal
   
   On channel destruction the channel's datastores are removed and
   destroyed.  Since there are public API calls to find and remove
   datastores on a channel, a lock should be held whenever datastores are
   removed and destroyed.  This resolves a crash caused by a race
   condition in app_chanspy.c.
   
   (closes issue ASTERISK-15495)
   Reported by: tim_ringenbach
   Patches:
         datastore_destroy_race.diff uploaded by tim ringenbach (license 540)
   Tested by: dvossel
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=246549