|ASTERISK-20175: Security Vulnerability: denial of service attack through exploitation of device state caching
|Matt Jordan (mjordan)
|220.127.116.11 10.6.1 10.6.1-digiumphones
|( 0) event-cachability-3.diff
I have been working with someone on some performance issues with their
Asterisk cluster that uses distributed device state. One of the
problems that we identified was that the size of the device state
cache was growing out of control. To view the cache, you can do:
*CLI> event dump cache DeviceState
In particular, the states that were causing the problem on these
systems were things like:
Certain "device states" like this are useless to cache. Imagine an
outbound call center that uses Local channels in their dialplan and
PRIs for doing outbound calls. They get entries in the cache for
every number they dial. Ouch. That's a bug that needs to be
addressed and I'm not quite sure how to fix it in a good generic way
yet. However, that's not the vulnerability. It's just the background
that led me to the vulnerability.
I started thinking about how far this problem really reaches. I
wondered, can I remotely grow the cache, causing performance problems
and eventually running out of RAM? Unfortunately, yes. I have
verified this with SIP. I imagine the same issue exists with IAX2.
In chan_sip, if you allow anonymous calls, the channel name is based
on the domain in the From header. I verified this vulnerability by
doing the following:
I then used a call file:
CallerID: "My Name" <1111111>
The domain in the From header should be "example.com". The channel
name on the remote server should be "SIP/example.com-<something>". An
entry will be added to the cache for "SIP/example.com". This means
that I can very easily continue to send calls with different domains
and fill up the cache.
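To make the growth mechanism concrete, here is a minimal, self-contained sketch. It is not Asterisk code: `toy_cache`, `device_from_channel`, `toy_cache_add`, and `simulate_calls` are hypothetical names, and the rule of cutting the channel name at its last '-' is an assumption based on the "SIP/example.com-<something>" naming described above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEVICES 256  /* arbitrary cap for this sketch only */
#define NAME_LEN 80

/* Hypothetical stand-in for the device state cache: one slot per device name. */
struct toy_cache {
    char names[MAX_DEVICES][NAME_LEN];
    size_t count;
};

/* Copy the channel name up to (not including) its last '-' into device. */
static void device_from_channel(const char *channel, char *device, size_t len)
{
    const char *dash = strrchr(channel, '-');
    size_t n = dash ? (size_t)(dash - channel) : strlen(channel);

    if (n >= len) {
        n = len - 1;
    }
    memcpy(device, channel, n);
    device[n] = '\0';
}

/* Add the device if it is not already cached; return the entry count. */
static size_t toy_cache_add(struct toy_cache *c, const char *device)
{
    for (size_t i = 0; i < c->count; i++) {
        if (strcmp(c->names[i], device) == 0) {
            return c->count;
        }
    }
    if (c->count < MAX_DEVICES) {
        strncpy(c->names[c->count], device, NAME_LEN - 1);
        c->names[c->count][NAME_LEN - 1] = '\0';
        c->count++;
    }
    return c->count;
}

/* Simulate n anonymous calls, each using a different From domain. */
static size_t simulate_calls(struct toy_cache *c, int n)
{
    char channel[NAME_LEN];
    char device[NAME_LEN];

    for (int i = 0; i < n; i++) {
        snprintf(channel, sizeof(channel), "SIP/host%d.example.com-%08x",
                 i, (unsigned int)i);
        device_from_channel(channel, device, sizeof(device));
        toy_cache_add(c, device);
    }
    return c->count;
}
```

Every distinct domain yields a distinct device name, so the entry count tracks the number of distinct domains the sender chooses. Repeating a domain adds nothing, which is why the attack varies the domain on each call.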
The public server that I tested this against happened to be running
Asterisk 10. I believe that this affects all versions that have the
device state cache, which would be 1.6.something and up.
This is a nasty problem and I'm not sure what the fix should be. It's
an architectural problem. The cache needs to only consist of things
that are defined locally, and not things that are dynamically
generated, but there's not a good generic way to determine that given
a "device" name. I'd be happy to brainstorm with others on this.
While the original report came from me, I'd like to credit Leif Madsen
and Joshua Colp for their assistance with verifying the vulnerability.
|By: Kinsey Moore (kmoore) 2012-08-02 13:59:08.375-0500
Consumers for further state distribution:
Direct consumers of device state:
All of these seem to hook device state unconditionally except CCSS, which hooks information only for the specific devices that require CCSS.
By: Kinsey Moore (kmoore) 2012-08-02 15:11:30.255-0500
It also appears that main/devicestate.c is the only consumer of cached device state events.
By: Kinsey Moore (kmoore) 2012-08-09 09:33:03.952-0500
I finally got a chance to talk with Russell, and this is going to be pretty nasty to fix. Only the channel driver that creates the channel can know whether its state should be cached. This information has to live with the channel (probably as a flag) as its state changes, so that the events it generates can be marked as cacheable. A cache flag already exists, but it is on ast_event_ref rather than on the event itself (see _ast_event_queue in main/event.c). It is not exposed in an easily usable way, so using it will require a small API change (or it may be easier to put it on the event itself via a new IE). Device state changes that are distributed must also carry this cacheable flag so that remote systems know whether to cache the new state or discard it after any receivers have taken appropriate action. I have not yet determined whether this can be backwards compatible with the existing event state distribution architecture.
By: Matt Jordan (mjordan) 2012-08-09 16:11:52.165-0500
So, first you should probably keep in mind that whatever is done has to be done in the context of 1.8+. _ast_event_queue doesn't exist in 1.8, and the ast_event_ref object does not have a cache attribute in 1.8.
That being said, that doesn't mean it can't be backported to 1.8.
Something Kevin suggested was to think about making this configurable in each channel driver. The default would be to 'save state' for each device, but then allow for 'guest' devices to not have their state saved, as well as any particular configurable device. In the case of local channels, you'd probably never have their device state cached.
Ideally then, each device would mark whether or not it wants its event to be cached when it raises the event. This would let a system administrator configure the system to prevent the situation Russell ran into, while keeping the current behavior (cache everything) if they so desire.
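As a rough illustration of that policy, the decision could look something like the sketch below. Everything here is hypothetical: `toy_device`, `toy_driver_config`, the `cache_guest_state` option, and `device_state_is_cachable` are names invented for this example and do not correspond to existing Asterisk structures or configuration options.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-device override; CACHE_DEFAULT defers to the driver. */
enum cache_policy { CACHE_DEFAULT, CACHE_YES, CACHE_NO };

/* Hypothetical driver-wide settings. */
struct toy_driver_config {
    bool cache_guest_state;  /* whether guest devices get their state saved */
};

/* Hypothetical description of the device raising a state change. */
struct toy_device {
    const char *name;          /* e.g. "SIP/alice" or "Local/100@default" */
    bool is_guest;             /* created on the fly for an anonymous call */
    enum cache_policy policy;  /* explicit per-device configuration */
};

/* Decide, at the point the event is raised, whether it should be cached. */
static bool device_state_is_cachable(const struct toy_driver_config *cfg,
                                     const struct toy_device *dev)
{
    /* Local channels: assumed never worth caching. */
    if (strncmp(dev->name, "Local/", 6) == 0) {
        return false;
    }
    /* An explicit per-device setting wins over the driver default. */
    if (dev->policy == CACHE_YES) {
        return true;
    }
    if (dev->policy == CACHE_NO) {
        return false;
    }
    /* Guests follow the driver-wide option; everything else is cached. */
    if (dev->is_guest) {
        return cfg->cache_guest_state;
    }
    return true;
}
```

With `cache_guest_state` enabled and no per-device overrides, this reduces to the current behavior for everything except Local channels.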
As far as the distributed architecture goes, if we have to convey the cache information in the event, then it won't be 'purely' backwards compatible. However, if all we've done is embed a new IE into the event, then 'old' systems should be okay, since they can pull information out of the event based on the identifier of each IE.
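The reason an extra IE is tolerable can be sketched with a generic type-length-value walk. This is an illustration only: the wire layout (2-byte type, 2-byte length, both big-endian, then payload), the `IE_*` values, and the functions `find_ie` and `event_is_cachable` are assumptions made for this example, not the actual ast_event encoding or API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IE identifiers for this sketch. */
enum { IE_DEVICE = 1, IE_STATE = 2, IE_CACHABLE = 3 };

/*
 * Walk the buffer IE by IE and return a pointer to the payload of the
 * first IE of the requested type, storing its length in *len.
 * IEs of any other type, known or not, are simply skipped.
 */
static const uint8_t *find_ie(const uint8_t *buf, size_t buflen,
                              uint16_t type, uint16_t *len)
{
    size_t off = 0;

    while (off + 4 <= buflen) {
        uint16_t t = (uint16_t)((buf[off] << 8) | buf[off + 1]);
        uint16_t l = (uint16_t)((buf[off + 2] << 8) | buf[off + 3]);

        if (off + 4 + (size_t)l > buflen) {
            return NULL;  /* truncated IE: stop parsing */
        }
        if (t == type) {
            *len = l;
            return buf + off + 4;
        }
        off += 4 + (size_t)l;
    }
    return NULL;
}

/*
 * An event with no cachability IE is treated as cachable, since it
 * would have come from a sender that predates the IE.
 */
static int event_is_cachable(const uint8_t *buf, size_t buflen)
{
    uint16_t len = 0;
    const uint8_t *p = find_ie(buf, buflen, IE_CACHABLE, &len);

    if (p == NULL || len < 1) {
        return 1;
    }
    return p[0] != 0;
}
```

A receiver that only ever asks for `IE_DEVICE` and `IE_STATE` never notices the extra IE, while a newer receiver gets the flag and falls back to "cachable" when talking to an older sender.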
By: Kinsey Moore (kmoore) 2012-08-14 16:23:20.384-0500
List of items to complete:
* Identify all event generation that would need to determine cachability of the event being generated and determine in what cases these events are cachable. (needs a lot of research)
* Per-channel-driver implementation of options to change caching behavior with cachability flag on channel.
** Maybe this would be better as a global option to make all events cachable vs some not cachable? (would still require per-channel flag)
* Change internal generation/usage of cache flag to be an IE on the ast_event instead of a flag on the ast_event_ref.
* Update all instances of event generation to use this IE appropriately in conjunction with the flag on the channel or the global option, whichever is chosen.
** Events without the IE should be considered cachable since they would be coming from a legacy system that expects them to be cachable. Otherwise, all events should have the cachability IE.
* Change distributed generation/usage of events to serialize/deserialize the IE describing cachability.
** res_ais/res_corosync: Transparent since the event is sent as binary data. An unknown field in the event should not cause problems to legacy Asterisk systems running res_ais/res_corosync.
** res_xmpp/res_jabber: The persist_items configuration field is a per-subscription configuration and not a per-event configuration. This will be inserted as an item in aji_build_publish_skeleton.
*** The cachability item should always come last so as not to disturb the parsing of legacy implementations. This should assume cachability by default and the additional information should negate cachability to prevent incorrect interpretation of events from legacy systems.
Note 1: There do not appear to be any event comparison functions in event.h that would choke on an additional/unknown IE.
Note 2: Porting from 1.8 forward should not be much of an issue, but the existing cachability flags should be removed from the ast_event_ref in 10, 11, and trunk and the differences between 1.8 and 10 should be smoothed out as far as ast_event_queue vs ast_event_queue_and_cache.
Does this need to be done for core, core+extended, or all modules?
Edit: clarification on configuration option as per-subscription and not per-event as published.
By: Kinsey Moore (kmoore) 2012-08-21 16:43:30.637-0500
Event-related areas needing evaluation:
* Code that uses ast_event_queue_and_cache with AST_EVENT_DEVICE_STATE[_CHANGE]
* Code that uses ast_event_subscribe[_new] with any cachable event types
* Code that consumes cached AST_EVENT_DEVICE_STATE[_CHANGE] events
* Code that uses ast_devstate_changed or ast_devstate_changed_literal
** channels: dahdi, sip, agent, iax2, skinny
** main: channel, devicestate(needs to pass through the new cache parameter), features
** apps: confbridge, meetme
** res: calendar
** funcs: devstate