tipabu/data-protection.md Secret

## data-protection.md

      
### Data Protection Middleware for OpenStack Swift

This project provides several middlewares that add data-protection
capabilities to Swift. The goal is to let cluster operators
automatically guard against accidental or malicious
overwrites and deletions. This in turn allows IT administrators
to feel comfortable giving workstations easy, usable, direct access
to Swift (e.g. by mounting a container as though it were a network
drive) without worrying about malware or disgruntled users.

#### Changes and Features


The existing versioned_writes middleware now has the concept
of a "versioning mode". Previously, it would always behave as
a stack, with PUTs pushing a new version onto the stack and
DELETEs popping the most recent version off. Now, there is a
new option to behave as a history, with PUTs and DELETEs
behaving normally while recording objects' previous states.
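
The difference between the two modes can be illustrated with a toy in-memory model. This is only a sketch: the class and its method names are hypothetical, and the real middleware operates on Swift containers and may record more than is shown here.

```python
class VersionedContainer:
    """Toy model of a container plus its versions container (hypothetical)."""

    def __init__(self, mode):
        if mode not in ("stack", "history"):
            raise ValueError("unknown versioning mode: %r" % mode)
        self.mode = mode
        self.current = {}   # object name -> current body
        self.versions = {}  # object name -> archived bodies, oldest first

    def put(self, name, body):
        # In both modes the previous state is archived before the overwrite.
        if name in self.current:
            self.versions.setdefault(name, []).append(self.current[name])
        self.current[name] = body

    def delete(self, name):
        if name not in self.current:
            return
        old = self.current.pop(name)
        if self.mode == "stack":
            # Stack: the current data is discarded and the most recent
            # archived version (if any) is popped back into place.
            archived = self.versions.get(name)
            if archived:
                self.current[name] = archived.pop()
        else:
            # History: the DELETE behaves normally, but the state that
            # was just removed is recorded first.
            self.versions.setdefault(name, []).append(old)
```

After two PUTs and one DELETE of the same object, the stack model has restored the first body and lost the second, while the history model has no current object but still holds both bodies in its versions.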


A new defaulter middleware is introduced to allow operators
and users to specify default header values to set (if not
already present) during PUTs.

- Containers may set defaults for objects.
- Accounts may set defaults for containers and objects.
- The filter config may set defaults for accounts, containers,
  and objects.

This allows the operator to automatically enable versioning on
all new containers, and to do so with the new "history" mode.
Note that this required changes to versioned_writes so that
subrequests would have their defaults populated.
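
The layering of defaults can be sketched as a small lookup. The function names below are hypothetical, and the precedence is partly an assumption: the text says container-level defaults override account-level ones, while account-level defaults overriding the filter config is assumed here. Headers already present on the request are never replaced.

```python
def _defaults_for(source, target):
    """Extract the defaults that one source defines for `target`.

    Accepts both header-style keys (X-Default-Object-...) and
    config-style keys (default-object-...).
    """
    prefixes = ("x-default-%s-" % target, "default-%s-" % target)
    found = {}
    for key, value in (source or {}).items():
        key = key.lower()
        for prefix in prefixes:
            if key.startswith(prefix):
                found[key[len(prefix):]] = value
                break
    return found


def apply_defaults(request_headers, target,
                   config=None, account=None, container=None):
    """Return a copy of the request headers with missing defaults filled in.

    `target` is "account", "container", or "object".
    """
    result = {name.lower(): value for name, value in request_headers.items()}
    # Most specific source first, so that it wins (assumed precedence).
    for source in (container, account, config):
        for name, value in _defaults_for(source, target).items():
            result.setdefault(name, value)
    return result
```

For example, with `default-container-x-versions-mode = history` in the filter config, a container PUT that carries no versioning headers would come out with `x-versions-mode` set to `history`.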


The existing versioned_writes middleware will now attempt to
auto-vivify the versions container if it does not exist.
Without this, users would still need to manually create the versions
container for each of their primary containers; with it, they don't
even have to know about it.


A new data_protection middleware is introduced to guard against
unsafe actions (PUTs, POSTs, DELETEs) in versions locations,
as well as attempts to modify the versioning status of containers.
This ensures that malware, etc. cannot truly destroy data, only
move it to the versions container.


Since versions containers would otherwise grow without bound, the
data_protection middleware may also be used to specify a default
retention window that should be used for new versions containers.
This uses the defaulter infrastructure to add X-Delete-After
headers to the objects copied into versions containers.
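
As a worked example of the retention arithmetic: Swift treats X-Delete-After as a number of seconds relative to the time of the request, so a retention window in days has to be converted to seconds. The helper names below are hypothetical and only illustrate the calculation.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def retention_seconds(days):
    """Convert a retention window in days to an X-Delete-After value."""
    return days * SECONDS_PER_DAY


def expiry_timestamp(delete_after, now=None):
    """Absolute Unix time at which a version archived at `now` would expire."""
    if now is None:
        now = time.time()
    return int(now) + int(delete_after)
```

A 90-day window gives `retention_seconds(90) == 7776000`, the value used for default_versions_retention in the sample config.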


Since the defaulter infrastructure may otherwise be used to
subvert the protection, the data_protection middleware prevents
(non-admin) users from being able to set the following headers on
their accounts:

- X-Default-Container-X-Data-Protection
- X-Default-Container-X-Versions-Location
- X-Default-Container-X-Versions-Mode

Note that X-Default-Object-X-Delete-At and
X-Default-Object-X-Delete-After are fine, as they would be
overridden by the container-level X-Default-Object-X-Delete-After
(and X-Delete-After takes precedence over X-Delete-At).
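
The header restriction can be sketched as a simple check on an account request. The function name and the `is_admin` flag are hypothetical; how the real middleware identifies admin users and what response it returns are not covered here.

```python
PROTECTED_ACCOUNT_DEFAULTS = frozenset(
    name.lower() for name in (
        "X-Default-Container-X-Data-Protection",
        "X-Default-Container-X-Versions-Location",
        "X-Default-Container-X-Versions-Mode",
    )
)


def forbidden_account_headers(headers, is_admin=False):
    """Return the request headers that a non-admin user may not set."""
    if is_admin:
        return []
    return sorted(
        name for name in headers
        if name.lower() in PROTECTED_ACCOUNT_DEFAULTS
    )
```

A request would be refused whenever this returns a non-empty list. Object-level expiry defaults such as X-Default-Object-X-Delete-After pass through untouched.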


#### Caveats

The versioned_writes filter config must include
use = egg:data_protection#versioned_writes; using
paste.filter_factory = ... will cause Swift to auto-insert its own
versioned_writes, which will likely lead to bad/weird behavior.

The example proxy-server.conf describes a recommended setup, not the
defaults of the middlewares. In particular, operators should be sure to:

- Enable use_formatting in the defaulter filter config. Otherwise,
  all object versions for all containers will be stored in a single
  container.
- Include default-container-x-versions-mode = history in
  the defaulter filter config. Otherwise, Swift will default to
  stack-based versioning, where DELETEs actually destroy data.
- Configure the auto_enable_prefix in the data_protection filter
  config and use that prefix when configuring
  default-container-x-versions-location. Otherwise, users may create
  the versions container before it is auto-vivified, and it won't have
  the protection flag set.
- Choose an appropriate value for default_versions_retention; by
  default, all versions are retained indefinitely.
- Disable the owner_can_protect option in the data_protection filter
  config. This is enabled by default in hopes of later submitting the
  middleware upstream, where account owners are expected to have full
  control over all data within the account.
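
How use_formatting and auto_enable_prefix fit together can be sketched as follows. This assumes the template is expanded in the style of Python's `str.format` with a `container` field, which matches the `.trash-{container}` value in the sample config but is not confirmed by the text; both function names are hypothetical.

```python
def versions_location(template, container):
    """Expand a versions-location template for one primary container."""
    return template.format(container=container)


def is_auto_protected(container, auto_enable_prefix):
    """True if a container name falls under the auto-protection prefix."""
    return bool(auto_enable_prefix) and container.startswith(auto_enable_prefix)


template = ".trash-{container}"
prefix = ".trash-"

# With formatting, every primary container gets its own versions container,
# and each one falls under the protected prefix.
photos_versions = versions_location(template, "photos")
docs_versions = versions_location(template, "docs")
```

Without formatting, the template would be used literally, so every container would share one versions container, which is the first problem described in the list above.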


The recommended setup restricts account owners' ability to manage the
data within their account, including their own data usage. This may be
tolerable in private deployments, but would be wholly inappropriate
for public clouds.
The owner_can_protect option may make the data_protection middleware
more appropriate for public clouds (and allow account owners to protect
against accidental data loss from read/write users), but it remains largely
untested.

  
## proxy-server.conf-sample
[pipeline:main]
# A few notes on the pipeline and pipeline placement:
#
#   * defaulter should be as far left as possible while still right of our
#     sane-WSGI-environment middlewares (gatekeeper, proxy-logging, cache).
#
#   * versioned_writes must be explicitly put into the pipeline; if you allow
#     Swift to insert it, it won't be the history-capable fork.
#
#   * data_protection must be after versioned_writes; they go hand-in-hand.

pipeline = catch_errors gatekeeper healthcheck proxy-logging cache defaulter
 container_sync bulk tempurl ratelimit tempauth container-quotas account-quotas
 slo dlo versioned_writes data_protection proxy-logging proxy-server

[filter:defaulter]
use = egg:swift_data_protection#defaulter
use_formatting = true
default-container-x-versions-location = .trash-{container}
default-container-x-versions-mode = history

[filter:versioned_writes]
use = egg:swift_data_protection#versioned_writes
allow_versioned_writes = true

[filter:data_protection]
use = egg:swift_data_protection#data_protection
auto_enable_prefix = .trash-
owner_can_protect = false
# 90 days, in seconds
default_versions_retention = 7776000