Summary: | ASTERISK-20175: Security Vulnerability: denial of service attack through exploitation of device state caching | ||||||||||||||
Reporter: | Matt Jordan (mjordan) | Labels: | |||||||||||||
Date Opened: | 2012-07-26 14:39:57 | Date Closed: | 2013-01-02 14:30:53.000-0600 | ||||||||||||
Priority: | Major | Regression? | No | ||||||||||||
Status: | Closed/Complete | Components: | Core/General | ||||||||||||
Versions: | 1.8.14.1 10.6.1 10.6.1-digiumphones | Frequency of Occurrence | |||||||||||||
Related Issues: |
| ||||||||||||||
Environment: | Attachments: | ( 0) event-cachability-3.diff | |||||||||||||
Description: | From Russell:

I have been working with someone on some performance issues with their Asterisk cluster that uses distributed device state. One of the problems we identified was that the size of the device state cache was growing out of control. To view the cache, you can run:

{noformat}
*CLI> event dump cache DeviceState
{noformat}

In particular, the states causing the problem on these systems were things like:

{noformat}
Local/12341234@whatever
DAHDI/i8/12341234
{noformat}

Certain "device states" like these are useless to cache. Imagine an outbound call center that uses Local channels in its dialplan and PRIs for outbound calls: it gets a cache entry for every number it dials. Ouch. That's a bug that needs to be addressed, and I'm not quite sure how to fix it in a good generic way yet. However, that's not the vulnerability; it's just the background that led me to the vulnerability.

I started thinking about how far this problem really reaches. I wondered: can I remotely grow the cache, causing performance problems and eventually running out of RAM? Unfortunately, yes. I have verified this with SIP, and I imagine the same issue exists with IAX2. In chan_sip, if you allow anonymous calls, the channel name is based on the domain in the From header. I verified this vulnerability by doing the following:

{noformat}
; sip.conf
[someserver]
type=peer
host=someserver.com
fromdomain=example.com
fromuser=foo
{noformat}

I then used a call file:

{noformat}
Channel: SIP/foo@someserver
CallerID: "My Name" <1111111>
Application: Playback
Data: beep
{noformat}

The domain in the From header should be "example.com", so the channel name on the remote server should be "SIP/example.com-<something>", and an entry will be added to the cache for "SIP/example.com". This means that I can very easily continue to send calls with different domains and fill up the cache. The public server that I tested this against happened to be running Asterisk 10.

I believe this affects all versions that have the device state cache, which would be 1.6.something and up. This is a nasty problem, and I'm not sure what the fix should be; it's an architectural problem. The cache needs to consist only of things that are defined locally, not things that are dynamically generated, but there's no good generic way to determine that given a "device" name. I'd be happy to brainstorm with others on this.

While the original report came from me, I'd like to credit Leif Madsen and Joshua Colp for their assistance with verifying the vulnerability.

Thanks,

-- Russell Bryant | ||||||||||||||
Comments: | By: Kinsey Moore (kmoore) 2012-08-02 13:59:08.375-0500

Consumers for further state distribution:
* res_xmpp
* res_jabber
* res_corosync

Direct consumers of device state:
* app_queue
* CCSS
* pbx hints
* devicestate

All of these seem to hook device state unconditionally except CCSS, which hooks information for specific devices as they require CCSS.

By: Kinsey Moore (kmoore) 2012-08-02 15:11:30.255-0500

It also appears that main/devicestate.c is the only consumer of cached device state events.

By: Kinsey Moore (kmoore) 2012-08-09 09:33:03.952-0500

I finally got a chance to talk with Russell, and this is going to be pretty nasty to fix. Only the channel driver that creates a channel can know whether its state should be cached. This information has to live with the channel (probably as a flag) as its state changes, so that the events it creates can be marked as cacheable. Such a flag already exists on ast_event_ref rather than on the event itself (see _ast_event_queue in main/event.c), but it is not used or exposed in an easily usable way and will require a small API change (or it may be easier to put it on the event itself via a new IE). Device state changes that are distributed must also carry this cacheable flag so that remote systems can know whether the new state should be cached or discarded after any receivers have taken appropriate action. I have not yet determined whether this can be backwards compatible with the existing event state distribution architecture.

By: Matt Jordan (mjordan) 2012-08-09 16:11:52.165-0500

First, keep in mind that whatever is done has to be done in the context of 1.8+. _ast_event_queue doesn't exist in 1.8, and the ast_event_ref object does not have a cache attribute in 1.8. That being said, it doesn't mean this can't be backported to 1.8. Something Kevin suggested was to consider making this configurable in each channel driver.
The default would be to 'save state' for each device, but then allow 'guest' devices to not have their state saved, as well as any particular configured device. In the case of Local channels, you'd probably never have their device state cached. Ideally, each device would mark whether or not it wants its event cached when it raises the event. This would give a system administrator the ability to configure the system so as to prevent the situation Russell ran into, while keeping the current behavior (cache everything) if they so desire. As far as the distributed architecture goes, if we have to convey the cache information in the event, then it won't be 'purely' backwards compatible. However, if all we've done is embed a new IE into the event, then 'old' systems should be okay, since they can pull information out of the event based on the identifier of each IE.

By: Kinsey Moore (kmoore) 2012-08-14 16:23:20.384-0500

List of items to complete:
* Identify all event generation that would need to determine cacheability of the event being generated, and determine in what cases these events are cacheable. (needs a lot of research)
* Per-channel-driver implementation of options to change caching behavior, with a cacheability flag on the channel.
** Maybe this would be better as a global option to make all events cacheable vs. some not cacheable? (would still require the per-channel flag)
* Change internal generation/usage of the cache flag to be an IE on the ast_event instead of a flag on the ast_event_ref.
* Update all instances of event generation to use this IE appropriately, in conjunction with the flag on the channel or the global option, whichever is chosen.
** Events without the IE should be considered cacheable, since they would be coming from a legacy system that expects them to be cached. Otherwise, all events should have the cacheability IE.
* Change distributed generation/usage of events to serialize/deserialize the IE describing cacheability.
** res_ais/res_corosync: Transparent, since the event is sent as binary data. An unknown field in the event should not cause problems for legacy Asterisk systems running res_ais/res_corosync.
** res_xmpp/res_jabber: The persist_items configuration field is a per-subscription configuration, not a per-event configuration. This will be inserted as an item in aji_build_publish_skeleton.
*** The cacheability item should always come last, so as not to disturb the parsing of legacy implementations. It should assume cacheability by default, with the additional information negating cacheability, to prevent incorrect interpretation of events from legacy systems.

Note 1: There do not appear to be any event comparison functions in event.h that would choke on an additional/unknown IE.

Note 2: Porting from 1.8 forward should not be much of an issue, but the existing cacheability flags should be removed from the ast_event_ref in 10, 11, and trunk, and the differences between 1.8 and 10 should be smoothed out as far as ast_event_queue vs. ast_event_queue_and_cache. Does this need to be done for core, core+extended, or all modules?

Edit: clarification that the configuration option is per-subscription, not per-event as published.

By: Kinsey Moore (kmoore) 2012-08-21 16:43:30.637-0500

Event-related areas needing evaluation:
* Code that uses ast_event_queue_and_cache with AST_EVENT_DEVICE_STATE[_CHANGE]
** res_jabber/res_xmpp
** res_corosync/res_ais
** devicestate.c
* Code that uses ast_event_subscribe[_new] with any cacheable event types
** res_jabber/res_xmpp
** res_corosync/res_ais
** devicestate.c
* Code that consumes cached AST_EVENT_DEVICE_STATE[_CHANGE] events
** devicestate.c
* Code that uses ast_devstate_changed or ast_devstate_changed_literal
** channels: dahdi, sip, agent, iax2, skinny
** main: channel, devicestate (needs to pass through the new cache parameter), features
** apps: confbridge, meetme
** res: calendar
** funcs: devstate |