@nagydani
Created January 18, 2018 13:33

Symmetric Encryption for Swarm Content

Motivation

Many use cases naturally require storing private information in Swarm that is accessible only to authorized parties. Since Swarm offers no guarantee that stored information is withheld from other nodes, encryption is the only way to prevent unauthorized parties from accessing it. Authorized parties must then possess the corresponding decryption key, while unauthorized parties must not.

The objective of this document is to extend Swarm with an encryption suite that would allow the development of decentralized applications that need to store and manipulate large amounts of private data the same way as they are currently able to store and manipulate public data. The only difference between accessing private and public data will be the presence of a decryption/encryption key and the computational overhead related to encryption.

Cryptographic Design

Requirements

Encryption is to be implemented on top of the DPA layer, leaving the underlying infrastructure for storing, locating and retrieving chunks unchanged. In particular, this implies that ciphertext chunks must fit in the same 4096 + 8 bytes as plaintext chunks. However, in order not to disclose the length of the plaintext, every chunk must be padded to full size before encryption, and the length field must be encrypted as well.

The computational and storage costs of operations such as storing and retrieving full or partial plaintext, as well as retrieving or changing parts of raw binary data, manifests and file collections, must remain the same as those of their unencrypted counterparts up to a constant or linear overhead (i.e., the same big O).

In terms of security, even under (possibly adaptive) chosen-plaintext attacks, an attacker must not be able to distinguish ciphertext chunks from random data (independent, uniformly distributed random bits), nor from ciphertext chunks resulting from the encryption of other data.
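One standard way to state this requirement precisely, given here as an illustrative formalization rather than as part of the design, is a real-or-random chosen-plaintext game. An adversary A adaptively submits plaintext chunks to an oracle that is either E, which pads the chunk and encrypts it under a fresh key, or R, which ignores its input and returns 4104 uniformly random bytes. The symbols A, E, R and the bound ε are introduced here for illustration; the requirement is that A's advantage stays below a negligible ε:

```latex
\mathrm{Adv}(\mathcal{A}) =
\left|\, \Pr\!\left[\mathcal{A}^{E(\cdot)} = 1\right]
       - \Pr\!\left[\mathcal{A}^{R(\cdot)} = 1\right] \right| \le \varepsilon
```

If this holds, ciphertexts of any two chosen plaintexts are also indistinguishable from each other: each is within ε of random, so by the triangle inequality no adversary tells them apart with advantage above 2ε.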

The API should be as close to the unencrypted API as possible. In particular, the only change is that the Swarm hash h_p of the plaintext is replaced by the pair (h_c, k), consisting of the Swarm hash h_c of the ciphertext and the symmetric decryption key k.

Implementation Guidelines

Padding

Plaintext chunks consist of an 8-byte length field encoding the size of the binary blob accessible through the Merkle structure of which this chunk is the root, followed by the chunk payload, which is at most 4096 bytes. Before encryption, each chunk payload is padded to exactly 4096 bytes; its actual length can be deduced from the length field as follows:

payloadLength := length
while payloadLength > 4096
        payloadLength := payloadLength + 4095
        payloadLength := payloadLength / 4096
        payloadLength := payloadLength * refSize

Where length is the value of the length field and refSize is the combined size of a reference hash and a decryption key: currently 64 bytes, as we use 256-bit hashes and 256-bit keys.
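The pseudocode translates directly into Python. The names payload_length, CHUNK_SIZE and REF_SIZE are illustrative; ref_size defaults to 64 as stated above, and passing 32 would give the figure for plain 256-bit hash references (an assumption about the unencrypted layout):

```python
CHUNK_SIZE = 4096  # maximum payload bytes per chunk
REF_SIZE = 64      # 32-byte hash + 32-byte decryption key


def payload_length(length: int, ref_size: int = REF_SIZE) -> int:
    """Actual payload size of a chunk whose length field holds `length`."""
    n = length
    while n > CHUNK_SIZE:
        # ceil(n / 4096): number of chunks needed to hold these n bytes
        n = (n + CHUNK_SIZE - 1) // CHUNK_SIZE
        # references to those chunks occupy n * ref_size bytes one level up
        n *= ref_size
    return n
```

For example, a length of 10000 spans three leaf chunks, so the root carries 3 × 64 = 192 bytes of references. A length of 262145 (one byte more than 64 full leaves) needs 65 leaf references, which no longer fit in one chunk, so the root holds two references (128 bytes).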

This procedure can be used to remove the padding after decryption, before returning the plaintext chunk. To frustrate keyspace search, the padding must be random. This way, the only distinguishing feature of a plaintext chunk is a length field value much smaller than 2^64, but for any ciphertext there will be a large number of keys (well over 2^192) yielding such plaintexts.
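Padding and its removal can be sketched as a pair of functions. The names pad_chunk and strip_padding are hypothetical, and the little-endian encoding of the length field is an assumption, since the document does not specify byte order. The padding comes from os.urandom, and strip_padding repeats the payload-length loop so the snippet stands on its own:

```python
import os
import struct

CHUNK_SIZE = 4096  # payload bytes per chunk
REF_SIZE = 64      # 32-byte hash + 32-byte decryption key


def pad_chunk(length: int, payload: bytes) -> bytes:
    """Prefix the 8-byte length field and pad the payload with random bytes to 4096."""
    if len(payload) > CHUNK_SIZE:
        raise ValueError("payload exceeds 4096 bytes")
    # Assumption: the length field is a little-endian unsigned 64-bit integer.
    return struct.pack("<Q", length) + payload + os.urandom(CHUNK_SIZE - len(payload))


def strip_padding(chunk: bytes, ref_size: int = REF_SIZE) -> bytes:
    """Return the length field plus the real payload of a decrypted 4104-byte chunk."""
    if len(chunk) != 8 + CHUNK_SIZE:
        raise ValueError("expected a 4104-byte chunk")
    (length,) = struct.unpack("<Q", chunk[:8])
    n = length
    while n > CHUNK_SIZE:  # same payload-length computation as the pseudocode
        n = (n + CHUNK_SIZE - 1) // CHUNK_SIZE * ref_size
    return chunk[: 8 + n]
```

A leaf chunk holding b"hello" has a length field of 5, so strip_padding keeps the first 8 + 5 bytes and discards the 4091 random padding bytes.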

Chunk Encryption

Chunks are encrypted and decrypted using a stream cipher seeded with the corresponding symmetric key. In order not to increase the attack surface by introducing additional cryptographic primitives, the stream cipher of choice can be SHAKE256 as defined in FIPS-202, as it relies on the security of the same Keccak sponge function that underlies the Swarm hash. Another attractive alternative is SHA3 in CTR mode (i.e., hashing the key together with a counter), which is considerably slower than SHAKE256 but has the desirable property of being easier (and cheaper) to implement in the EVM, lending itself to use in smart contracts that constrain the plaintext of encrypted Swarm content.
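Both options can be sketched with Python's hashlib. These are illustrative assumptions rather than a specified format: the function names are hypothetical, the CTR counter is assumed to be a 4-byte big-endian integer appended to the key, and hashlib.sha3_256 is the FIPS-202 SHA3-256, whereas the EVM's hashing opcode computes the original Keccak-256, which the Python standard library does not provide. In both cases the whole padded 4104-byte chunk, length field included, would be XOR-ed with the keystream, so encryption and decryption are the same operation:

```python
import hashlib


def shake256_xor(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHAKE256 keystream seeded by `key`."""
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def sha3_ctr_xor(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHA3-256 CTR keystream: block i is SHA3-256(key || i).

    Assumption: i is encoded as a 4-byte big-endian counter.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(4, "big")
        block = hashlib.sha3_256(key + counter).digest()
        segment = data[offset:offset + 32]
        # zip stops at the shorter input, so a short final segment is handled
        out.extend(a ^ b for a, b in zip(segment, block))
    return bytes(out)
```

Because the keystream depends only on the key, each chunk needs its own key: reusing a key for two different chunks would reveal the XOR of their plaintexts.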

API Design

It is important to emphasize that encrypted Swarm chunks are no different from plaintext chunks, and therefore there is no change whatsoever at the P2P protocol level. The proposed encryption scheme is end-to-end, meaning that encryption and decryption are done at endpoints or protocol gateways.

Libraries

DPA Put and Get

At the DPA layer, the encrypted version of Put takes the same argument (the plaintext chunk) as its unencrypted counterpart. It generates a random encryption key, pads the plaintext chunk and encrypts it with that key, then submits the ciphertext to the unencrypted Put, returning both the Swarm hash that call returns and the encryption key. In order to guarantee the uniqueness of encryption keys and to ease the load on the OS's entropy pool, it is recommended (but not required) to generate the key as a MAC of the plaintext, using a (semi-)permanent random key stored in memory.
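A minimal sketch of the recommended key generation, assuming HMAC with SHA3-256 as the MAC (the document does not fix a particular MAC) and a hypothetical module-level NODE_SECRET as the (semi-)permanent in-memory key. The random alternative is shown for comparison:

```python
import hashlib
import hmac
import os

# Hypothetical (semi-)permanent secret, kept only in memory.
NODE_SECRET = os.urandom(32)


def derive_chunk_key(plaintext_chunk: bytes, secret: bytes = NODE_SECRET) -> bytes:
    """256-bit chunk key computed as HMAC-SHA3-256(secret, plaintext)."""
    return hmac.new(secret, plaintext_chunk, hashlib.sha3_256).digest()


def random_chunk_key() -> bytes:
    """Alternative: a fresh 256-bit key drawn from the OS entropy pool per chunk."""
    return os.urandom(32)
```

Since the MAC key never leaves memory, derived keys are unpredictable to other parties, while distinct plaintexts receive distinct keys except with negligible probability, and only one entropy draw is needed per secret rather than per chunk.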

The encrypted version of Get takes a reference (consisting of a ciphertext hash and a decryption key) as its argument instead of just the hash in the unencrypted version. It calls the unencrypted Get with the ciphertext hash, retrieves the ciphertext chunk from the DPA and decrypts it using the supplied decryption key, returning the plaintext chunk.
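The encrypted Put and Get can be sketched end to end against a toy store. Everything here is an illustrative assumption: MemoryDPA is a hypothetical in-memory stand-in for the unencrypted DPA, plain SHA3-256 stands in for the actual Swarm hash, the keystream is SHAKE256 (one of the ciphers suggested above), and the length field is assumed little-endian. Put uses a random key; the MAC-derived key would slot in at the same place:

```python
import hashlib
import os
import struct
from typing import Dict, Tuple

CHUNK_SIZE = 4096
REF_SIZE = 64  # 32-byte hash + 32-byte key


class MemoryDPA:
    """Hypothetical in-memory stand-in for the unencrypted DPA Put/Get."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, bytes] = {}

    def put(self, chunk: bytes) -> bytes:
        # Placeholder content address, not the real Swarm hash.
        h = hashlib.sha3_256(chunk).digest()
        self.chunks[h] = chunk
        return h

    def get(self, h: bytes) -> bytes:
        return self.chunks[h]


def xor_keystream(key: bytes, data: bytes) -> bytes:
    # SHAKE256 keystream; applying it twice with the same key restores the input.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypted_put(dpa: MemoryDPA, length: int, payload: bytes) -> Tuple[bytes, bytes]:
    """Pad, encrypt under a fresh key, store; return (ciphertext hash, key)."""
    key = os.urandom(32)
    padded = struct.pack("<Q", length) + payload + os.urandom(CHUNK_SIZE - len(payload))
    return dpa.put(xor_keystream(key, padded)), key


def encrypted_get(dpa: MemoryDPA, ref: Tuple[bytes, bytes]) -> bytes:
    """Fetch by ciphertext hash, decrypt with the key, and strip the padding."""
    h, key = ref
    plain = xor_keystream(key, dpa.get(h))
    (length,) = struct.unpack("<Q", plain[:8])
    n = length
    while n > CHUNK_SIZE:  # payload length, as in the Padding section
        n = (n + CHUNK_SIZE - 1) // CHUNK_SIZE * REF_SIZE
    return plain[: 8 + n]
```

Only the fixed-size ciphertext ever reaches the store; the plaintext length is recoverable solely by whoever holds the key.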

Higher protocol layers

The APIs of these layers do not change, except that they accept 512-bit references (consisting of a Swarm hash and a decryption key) in place of the 256-bit Swarm hash used in the unencrypted version.
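A sketch of how a client might represent 512-bit references, assuming the hash comes first and the key second, with hex encoding for display. The class name EncryptedRef and its methods are hypothetical:

```python
from dataclasses import dataclass

HASH_SIZE = 32  # 256-bit Swarm hash of the ciphertext
KEY_SIZE = 32   # 256-bit symmetric decryption key


@dataclass(frozen=True)
class EncryptedRef:
    """Hypothetical 512-bit reference: ciphertext hash followed by decryption key."""

    ciphertext_hash: bytes
    key: bytes

    def to_bytes(self) -> bytes:
        return self.ciphertext_hash + self.key

    def to_hex(self) -> str:
        return self.to_bytes().hex()

    @classmethod
    def from_bytes(cls, raw: bytes) -> "EncryptedRef":
        if len(raw) != HASH_SIZE + KEY_SIZE:
            raise ValueError("an encrypted reference must be 64 bytes")
        return cls(raw[:HASH_SIZE], raw[HASH_SIZE:])

    @classmethod
    def from_hex(cls, text: str) -> "EncryptedRef":
        return cls.from_bytes(bytes.fromhex(text))
```

In this sketch, an unencrypted 32-byte hash passed to from_bytes is rejected, so a caller could use the length alone to decide which kind of reference it holds.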

Command-line utilities

There will be two alternative ways of passing the encryption-decryption key to command-line utilities:

  • Directly in a command-line argument (unsafe, but useful for testing)
  • By passing the path to a file containing the key in a command-line argument, in which case the file's access control will take care of security.
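A sketch of the two key-passing options using argparse. The flag names --key and --key-file, the hex encoding of the key, and the permission warning are assumptions for illustration, not an existing Swarm CLI:

```python
import argparse
import os
import stat
import sys


def read_key(argv=None) -> bytes:
    """Return the 32-byte key given via --key (hex) or --key-file (path)."""
    parser = argparse.ArgumentParser(description="hypothetical key handling")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--key", help="hex-encoded key (unsafe, for testing only)")
    group.add_argument("--key-file", help="path to a file holding the hex-encoded key")
    args = parser.parse_args(argv)

    if args.key is not None:
        key_hex = args.key
    else:
        # Rely on file access control, but warn if group or others have any access.
        mode = os.stat(args.key_file).st_mode
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            print("warning: key file is accessible to group/others", file=sys.stderr)
        with open(args.key_file) as f:
            key_hex = f.read().strip()

    key = bytes.fromhex(key_hex)
    if len(key) != 32:
        raise ValueError("key must be 32 bytes")
    return key
```

A key passed directly ends up in shell history and process listings, which is why that option is described as suitable only for testing.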