nagydani/swarm-encryption.md Secret

## swarm-encryption.md

      
Symmetric Encryption for Swarm Content

Motivation

Many use cases require storing private information in Swarm so that it is accessible only to authorized parties.
Since nothing stored in Swarm can be expected to stay hidden from other nodes, the only way to prevent
unauthorized parties from accessing it is encryption. Authorized parties must then be in possession of the
corresponding decryption key, while unauthorized parties must not.
The objective of this document is to extend Swarm with an encryption suite that would allow the development of decentralized
applications that need to store and manipulate large amounts of private data the same way as they are currently able to
store and manipulate public data. The only difference between accessing private and public data will be the presence of a
decryption/encryption key and the computational overhead related to encryption.
Cryptographic Design

Requirements

Encryption is to be implemented on top of the DPA layer, with the underlying infrastructure for storing, locating and retrieving
chunks unchanged. In particular, this implies that ciphertext chunks fit in the same 4096 + 8 bytes as plaintext chunks. However,
in order not to disclose the length of the plaintext, all ciphertext chunks must be padded to full size and the length must also
be encrypted.
The computational and storage costs of various operations, such as storing and retrieving full or partial plaintext as well as
retrieving or changing parts of raw binary data, manifests and file collections must remain the same as their unencrypted
counterparts except for a constant or linear overhead (same big O).
Security-wise, even under chosen-plaintext attacks (including adaptive ones), an attacker must not be able to distinguish
ciphertext chunks from random data (independent, uniformly distributed random bits) or from ciphertext chunks resulting from
encrypting other data.
The API should be as close to the unencrypted API as possible. In particular, the only change is that the Swarm hash h_p
of the plaintext is replaced by the pair (h_c, k), consisting of the Swarm hash of the ciphertext and the symmetric decryption
key.
Implementation Guidelines

Padding

Plaintext chunks consist of an 8-byte length field encoding the size of the binary blob accessible through the Merkle structure
for which this chunk is the root, followed by the chunk payload, which is at most 4096 bytes. Before encryption, each chunk
payload is padded to exactly 4096 bytes; its actual length can be deduced from the length field as follows:
payloadLength := length
while payloadLength > 4096
        payloadLength := payloadLength + 4095
        payloadLength := payloadLength / 4096
        payloadLength := payloadLength * refSize

Here length is the content of the length field, the division is integer division (rounding down), and refSize is the combined
size of the referencing hash value and the decryption key, which is currently 64 bytes, as we use 256-bit hashes and 256-bit keys.
This procedure can be used to remove the padding after decryption, before returning the plaintext chunk. To frustrate keyspace
search, the padding must be random. This way, the only distinguishing feature of a plaintext chunk is that its length field is
much smaller than 2⁶⁴. Since only this 64-bit field constrains the 256-bit key, any ciphertext has a large number of keys
(well over 2¹⁹²) that decrypt it to such a plaintext.
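
The padding rules above can be sketched in Python as follows. This is a minimal illustration rather than the Swarm implementation: the function names are hypothetical, and the little-endian encoding of the length field is an assumption not stated in this document.

```python
import os

CHUNK_SIZE = 4096   # maximum payload size in bytes
LENGTH_SIZE = 8     # size of the length field in bytes
REF_SIZE = 64       # 32-byte hash plus 32-byte key per reference


def payload_length(length: int) -> int:
    """Payload size of the chunk whose length field holds `length`."""
    n = length
    while n > CHUNK_SIZE:
        # Chunks needed one level down to cover n bytes, one reference each.
        n = (n + CHUNK_SIZE - 1) // CHUNK_SIZE * REF_SIZE
    return n


def pad_chunk(chunk: bytes) -> bytes:
    """Fill the payload up to 4096 bytes with random bytes."""
    payload_size = len(chunk) - LENGTH_SIZE
    if not 0 <= payload_size <= CHUNK_SIZE:
        raise ValueError("chunk must hold a length field and at most 4096 payload bytes")
    return chunk + os.urandom(CHUNK_SIZE - payload_size)


def unpad_chunk(padded: bytes) -> bytes:
    """Drop the padding, relying only on the length field."""
    # Assumption: the length field is a little-endian unsigned integer.
    length = int.from_bytes(padded[:LENGTH_SIZE], "little")
    return padded[:LENGTH_SIZE + payload_length(length)]
```

For example, a root chunk whose length field is 4097 covers two leaf chunks, so `payload_length(4097)` is 2 × 64 = 128 bytes.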
Chunk Encryption

Chunks are encrypted and decrypted using a stream cipher seeded with the corresponding symmetric key. In order not to increase the
attack surface by introducing additional cryptographic primitives, the stream cipher of choice can be SHAKE256, as defined in
FIPS-202, since it relies on the security of the same Keccak sponge function as the one
used in the Swarm hash. An attractive alternative is to use SHA3 in CTR mode (i.e. hashing the key together with a counter), which
is considerably slower than SHAKE256, but has the desirable property of being easier (and cheaper) to implement in the EVM, lending
itself to use in smart contracts that constrain the plaintext of encrypted Swarm content.
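
Both keystream constructions can be illustrated with Python's `hashlib`. The sketch below rests on assumptions this document does not fix: the key is fed to SHAKE256 as its only input, the CTR variant appends a 4-byte big-endian counter to the key, and `hashlib`'s FIPS-202 SHA3-256 stands in for whichever Keccak variant an actual implementation would use. All function names are hypothetical.

```python
import hashlib


def shake256_keystream(key: bytes, n: int) -> bytes:
    """n keystream bytes squeezed from SHAKE256 seeded with the key."""
    return hashlib.shake_256(key).digest(n)


def sha3_ctr_keystream(key: bytes, n: int) -> bytes:
    """n keystream bytes built from 32-byte blocks hash(key || counter)."""
    blocks = []
    for counter in range((n + 31) // 32):
        # Assumption: the counter is a 4-byte big-endian suffix.
        blocks.append(hashlib.sha3_256(key + counter.to_bytes(4, "big")).digest())
    return b"".join(blocks)[:n]


def encrypt_chunk(key: bytes, padded_chunk: bytes, keystream=shake256_keystream) -> bytes:
    """XOR the padded chunk (length field included) with the keystream."""
    stream = keystream(key, len(padded_chunk))
    return bytes(a ^ b for a, b in zip(padded_chunk, stream))


# XOR with the same keystream is its own inverse.
decrypt_chunk = encrypt_chunk
```

Because the whole 4104-byte chunk is XORed with the keystream, the length field is encrypted along with the payload, and the ciphertext has the same size as the padded plaintext.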
API Design

It is important to emphasize that encrypted Swarm chunks are no different from plaintext chunks, and therefore there is
no change whatsoever at the P2P protocol level. The proposed encryption scheme is end-to-end, meaning that encryption and
decryption are done at endpoints or protocol gateways.
Libraries

DPA Put and Get

At the DPA layer, the encrypted version of Put has the same argument (the plaintext chunk) as its unencrypted counterpart. It
generates a random encryption key, pads and encrypts the plaintext chunk with it and submits the ciphertext to unencrypted Put,
returning both the Swarm hash returned by it and the encryption key. In order to guarantee the uniqueness of encryption keys as
well as to ease the load on the OS's entropy pool, it is recommended (but not required) to generate the key as the MAC of the
plaintext using a (semi-)permanent random key stored in memory.
The encrypted version of Get takes a reference (consisting of a ciphertext hash and a decryption key) as its argument instead of
just the hash in the unencrypted version. It calls the unencrypted Get with the ciphertext hash, retrieves the ciphertext chunk
from the DPA and decrypts it using the supplied decryption key, returning the plaintext chunk.
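
The Put and Get wrappers described above might look like the following sketch. It makes several assumptions: an in-memory dictionary stands in for the unencrypted DPA, SHA3-256 over the ciphertext is a placeholder for the real Swarm hash, the key is derived with HMAC-SHA3-256, the cipher is SHAKE256 keyed directly with that key, and the length field is little-endian. The class and method names are hypothetical.

```python
import hashlib
import hmac
import os

CHUNK_SIZE, LENGTH_SIZE, REF_SIZE = 4096, 8, 64


class EncryptedStore:
    def __init__(self):
        self._chunks = {}               # stand-in for the unencrypted DPA
        self._mac_key = os.urandom(32)  # (semi-)permanent key held in memory

    def _put_raw(self, chunk: bytes) -> bytes:
        digest = hashlib.sha3_256(chunk).digest()  # placeholder for the Swarm hash
        self._chunks[digest] = chunk
        return digest

    def _get_raw(self, digest: bytes) -> bytes:
        return self._chunks[digest]

    @staticmethod
    def _xor(key: bytes, data: bytes) -> bytes:
        stream = hashlib.shake_256(key).digest(len(data))
        return bytes(a ^ b for a, b in zip(data, stream))

    def put(self, plaintext_chunk: bytes) -> bytes:
        """Encrypt and store a chunk; return the 64-byte reference hash || key."""
        # Derive the key as a MAC of the plaintext instead of drawing fresh entropy.
        key = hmac.new(self._mac_key, plaintext_chunk, hashlib.sha3_256).digest()
        # Random padding up to the full 8 + 4096 bytes.
        padding = os.urandom(LENGTH_SIZE + CHUNK_SIZE - len(plaintext_chunk))
        digest = self._put_raw(self._xor(key, plaintext_chunk + padding))
        return digest + key

    def get(self, reference: bytes) -> bytes:
        """Fetch by ciphertext hash, decrypt with the key, strip the padding."""
        digest, key = reference[:32], reference[32:]
        padded = self._xor(key, self._get_raw(digest))
        n = int.from_bytes(padded[:LENGTH_SIZE], "little")  # byte order assumed
        while n > CHUNK_SIZE:
            n = (n + CHUNK_SIZE - 1) // CHUNK_SIZE * REF_SIZE
        return padded[:LENGTH_SIZE + n]
```

In this sketch, storing the same plaintext twice yields the same key but, because of the random padding, a different ciphertext hash.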
Higher protocol layers

The APIs of the higher protocol layers do not change, except that they accept 512-bit references (consisting of a Swarm hash and
a decryption key) in place of the 256-bit Swarm hash used in the unencrypted version.
Command-line utilities

There will be two alternative ways of passing the encryption-decryption key to command-line utilities:

Directly in a command-line argument (unsafe, but useful for testing)
By passing the path to a file containing the key in a command-line argument, in which case the file's access control
will take care of security.
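
As a rough sketch of the two options, the argument handling could be written with Python's `argparse` as below. The flag names `--key` and `--key-file`, the hex encoding of the key, and the function name are all assumptions made for illustration. They are not options of any existing Swarm utility.

```python
import argparse
from pathlib import Path


def parse_key(argv):
    """Return the 256-bit key given either inline or through a key file."""
    parser = argparse.ArgumentParser(description="hypothetical key-handling sketch")
    group = parser.add_mutually_exclusive_group(required=True)
    # Unsafe: the key appears on the command line, but handy for testing.
    group.add_argument("--key", help="hex-encoded key")
    # Safer: access to the key is governed by the file's permissions.
    group.add_argument("--key-file", help="path to a file holding the hex-encoded key")
    args = parser.parse_args(argv)

    if args.key is not None:
        text = args.key
    else:
        text = Path(args.key_file).read_text()
    key = bytes.fromhex(text.strip())
    if len(key) != 32:
        raise ValueError("expected a 256-bit key")
    return key
```

Making the two options mutually exclusive keeps it unambiguous which key source is in effect.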