This project provides several middlewares that add data-protection capabilities to Swift. The goal is to let cluster operators automatically guard against accidental or malicious overwrites and deletions. This in turn lets IT administrators feel comfortable giving workstations easy, usable, direct access to Swift (e.g. by mounting a container as though it were a network drive) without worrying about malware or disgruntled users.

- The existing `versioned_writes` middleware now has the concept of a "versioning mode". Previously, it would always behave as a stack, with `PUT`s pushing a new version onto the stack and `DELETE`s popping the most recent version off. Now, there is a new option to behave as a history, with `PUT`s and `DELETE`s behaving normally while recording objects' previous states.

- A new `defaulter` middleware is introduced to allow operators and users to specify default header values to set (if not already present) during `PUT`s.

  - Containers may set defaults for objects.
  - Accounts may set defaults for containers and objects.
  - The filter config may set defaults for accounts, containers, and objects.

  This allows the operator to automatically enable versioning on all new containers, and to do so with the new "history" mode.

  Note that this required changes to `versioned_writes` so that subrequests would have their defaults populated.

- The existing `versioned_writes` middleware will now attempt to auto-vivify the versions container if it does not exist. Otherwise, users would still need to manually create the versions container for their primary containers; with this, they don't even have to know about it.

- A new `data_protection` middleware is introduced to guard against unsafe actions (`PUT`s, `POST`s, `DELETE`s) in versions locations, as well as attempts to modify the versioning status of containers. This ensures that malware, etc. cannot truly destroy data, only move it to the versions container.

- Since versions containers would otherwise grow without bound, the `data_protection` middleware may also be used to specify a default retention window that should be used for new versions containers. This uses the `defaulter` infrastructure to add `X-Delete-After` headers to the objects copied into versions containers.

- Since the `defaulter` infrastructure may otherwise be used to subvert the protection, the `data_protection` middleware prevents (non-admin) users from being able to set the following headers on their accounts:

  - `X-Default-Container-X-Data-Protection`
  - `X-Default-Container-X-Versions-Location`
  - `X-Default-Container-X-Versions-Mode`

  Note that `X-Default-Object-X-Delete-At` and `X-Default-Object-X-Delete-After` are fine, as they would be overridden by the container-level `X-Default-Object-X-Delete-After` (and `X-Delete-After` takes precedence over `X-Delete-At`).

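
The difference between the two versioning modes described above can be illustrated with a small in-memory model. This is only a sketch, not the middleware's code: the `VersionedContainer` class and its attributes are hypothetical names invented for illustration, and details such as delete markers, timestamps, and the separate versions container are deliberately left out.

```python
class VersionedContainer:
    """Toy model of one container plus its archived versions (hypothetical)."""

    def __init__(self, mode):
        if mode not in ("stack", "history"):
            raise ValueError("mode must be 'stack' or 'history'")
        self.mode = mode
        self.current = {}   # object name -> current data
        self.versions = {}  # object name -> list of previous states, oldest first

    def put(self, name, data):
        # In both modes, an overwrite archives the state being replaced.
        if name in self.current:
            self.versions.setdefault(name, []).append(self.current[name])
        self.current[name] = data

    def delete(self, name):
        if name not in self.current:
            raise KeyError(name)
        archived = self.versions.setdefault(name, [])
        if self.mode == "stack":
            # Stack: the current data is discarded and the most recent
            # archived version (if any) is popped back into place.
            if archived:
                self.current[name] = archived.pop()
            else:
                del self.current[name]
        else:
            # History: the object really goes away, but its last state
            # is recorded alongside the earlier ones.
            archived.append(self.current.pop(name))
```

After `put("o", "v1")`, `put("o", "v2")`, `delete("o")`, the stack model is left serving `v1` with `v2` gone for good, while the history model serves nothing but still holds both `v1` and `v2` in its archive.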
The `versioned_writes` filter config must include `use = egg:data_protection#versioned_writes`; using `paste.filter_factory = ...` will cause Swift to auto-insert its own `versioned_writes`, which will likely lead to bad/weird behavior.

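
A quick way to catch this mistake before deploying is to inspect the config file. The following is a minimal sketch using Python's standard `configparser`; the helper name `check_versioned_writes` is hypothetical, and the section name `filter:versioned_writes` is an assumption about how the filter is named in your `proxy-server.conf`, so adjust it if yours differs.

```python
import configparser

EXPECTED_USE = "egg:data_protection#versioned_writes"


def check_versioned_writes(conf_text, section="filter:versioned_writes"):
    """Return a list of problems found in the given filter section.

    `section` is an assumed name; pass your own if the filter is
    declared under a different section header.
    """
    # Interpolation is disabled so literal '%' characters do not raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    if not parser.has_section(section):
        return ["missing section [%s]" % section]

    options = parser[section]
    problems = []
    if "paste.filter_factory" in options:
        problems.append(
            "[%s] uses paste.filter_factory; Swift may auto-insert "
            "its own versioned_writes" % section
        )
    if options.get("use") != EXPECTED_USE:
        problems.append("[%s] should set use = %s" % (section, EXPECTED_USE))
    return problems
```

An empty list means the section declares the filter in the form this README requires; it does not check pipeline ordering or anything else.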
The example proxy-server.conf describes a recommended setup, not the defaults of the middlewares. In particular, operators should be sure to:

- Enable `use_formatting` in the `defaulter` filter config. Otherwise, all object versions for all containers will be stored in a single container.
- Be sure to include `default-container-x-versions-mode = history` in the `defaulter` filter config. Otherwise, Swift will default to the stack-based versioning, where `DELETE`s actually destroy data.
- Configure the `auto_enable_prefix` in the `data_protection` filter config and use that prefix when configuring `default-container-x-versions-location`. Otherwise, users may create the versions container before it is auto-vivified, and it won't have the protection flag set.
- Choose an appropriate value for `default_versions_retention`; by default, all versions are retained indefinitely.
- Disable the `owner_can_protect` option in the `data_protection` filter config. This is enabled by default in hopes of later submitting the middleware upstream, where account owners are expected to have full control over all data within the account.

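
The checklist above lends itself to an automated sanity check. The sketch below is one possible approach, not something shipped with this project. It assumes the filters live in sections named `filter:defaulter` and `filter:data_protection`, that boolean options use the truthy strings listed in `TRUE_VALUES`, and that "use that prefix" means the versions location starts with `auto_enable_prefix`; all three are assumptions, and `lint_recommended` is a hypothetical helper name.

```python
import configparser

# Assumed set of strings treated as "true"; adjust to match your deployment.
TRUE_VALUES = {"true", "1", "yes", "on", "t", "y"}


def _truthy(value):
    return value is not None and value.strip().lower() in TRUE_VALUES


def lint_recommended(conf_text,
                     defaulter="filter:defaulter",
                     protection="filter:data_protection"):
    """Return warnings for each recommendation the config does not follow."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    d = dict(parser[defaulter]) if parser.has_section(defaulter) else {}
    p = dict(parser[protection]) if parser.has_section(protection) else {}

    warnings = []
    if not _truthy(d.get("use_formatting")):
        warnings.append("use_formatting is not enabled")
    if d.get("default-container-x-versions-mode") != "history":
        warnings.append("default-container-x-versions-mode is not 'history'")

    prefix = p.get("auto_enable_prefix")
    location = d.get("default-container-x-versions-location", "")
    if not prefix:
        warnings.append("auto_enable_prefix is not set")
    elif not location.startswith(prefix):
        warnings.append("versions location does not use auto_enable_prefix")

    if not p.get("default_versions_retention"):
        warnings.append("default_versions_retention is not set")
    # The option is enabled by default, so it must be disabled explicitly.
    if "owner_can_protect" not in p or _truthy(p["owner_can_protect"]):
        warnings.append("owner_can_protect is not disabled")
    return warnings
```

Running it against the example `proxy-server.conf` should return an empty list if the assumptions above match that file; a config with neither section present produces all five warnings.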
The recommended setup restricts account owners' ability to manage the data within their account, including their own data usage. This may be tolerable in private deployments, but would be wholly inappropriate for public clouds.
The `owner_can_protect` option may make the `data_protection` middleware more appropriate for public clouds (and allow account owners to protect against accidental data loss from read/write users), but it remains largely untested.