calmh/encryption.md

## encryption.md

Goals

The goal is to support Syncthing devices that store data in encrypted form. Such a device can participate fully in a cluster, but the information stored on it is not directly usable.

Assumptions and Definitions


There exists a secret key per folder, known only by devices that can access the unencrypted data. We call these "secure devices".


Devices that do not know the secret key can only access encrypted data. We call these "insecure devices".


An insecure device is assumed to be under the control of an adversary. As such, we cannot trust either disk or memory contents to be safe from prying eyes or malicious modification.


A new secure device must be able to fully initialize from just knowing the secret key and connecting to an insecure device.


Protected Information

We would like to protect the following information from unauthorized access and modification:

File contents
File content hashes
File and directory names
File sizes
File metadata (modification times and permissions)

For the solution to remain somewhat human-manageable, we do not propose encrypting the folder name/label. This can instead be chosen to be inconspicuous or meaningless as required.

Protocol Impact

Background

Devices exchange an Index describing the contents of the folder. This index contains (see BEPv1 spec):

Folder name
List of files/directories, with:

Name
Flags (permission bits etc)
Modification time
Version information (a version vector)
Local version (an incrementing integer)
List of blocks, with:

Size
Hash (SHA-256)


To remain up to date with the cluster, a device performs Requests to get new file data. The request message contains:

Folder name
File name
Offset
Size
Hash (SHA-256)
Flags and options (currently zero/empty)

The Response to a request contains:

Data (up to 128 KiB)
Error code (integer)
Error message (string)

Most of this information must be covered by encryption.

Proposed Changes

Encrypting or decrypting something requires the secret key, the plaintext (or ciphertext) and an IV/nonce. The IV/nonce must be unique per encrypted plaintext (i.e. it cannot be reused with the same key), but it is not by itself a secret. The index would change as follows:

Folder name (plaintext)
List of files/directories, with:

Name (encrypted)
Nonce for encrypted name (new field)
Flags (permission bits etc) (set to default, stored in encrypted metadata)
Modification time (set to default, stored in encrypted metadata)
Version information (a version vector) (plaintext)
Local version (an incrementing integer) (plaintext)
Encrypted metadata (new field)
Encrypted metadata nonce (new field)
List of blocks, with:

Size (plaintext)
Hash (SHA-256) (contains hash of encrypted data)
Encrypted hash of plaintext (SHA-256) (new field)
Nonce for encrypted hash of plaintext (new field)
Nonce for encrypted block data (new field)
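
As a rough illustration of the field list above, the index entries could be modelled along these lines. All type and field names here are hypothetical and not taken from the BEPv1 spec or the Syncthing code base, and the 24-byte nonce size assumes the secretbox construction suggested later in this document.

```go
package main

import "fmt"

// nonceSize is an assumption: 24 bytes matches NaCl secretbox.
const nonceSize = 24

// Counter is a hypothetical version vector entry.
type Counter struct {
	ID    uint64
	Value uint64
}

// EncryptedBlockInfo sketches one block entry with the proposed new fields.
type EncryptedBlockInfo struct {
	Size                   int32           // sent in the clear
	Hash                   [32]byte        // SHA-256 of the encrypted data
	EncryptedPlaintextHash []byte          // new: encrypted SHA-256 of the plaintext
	PlaintextHashNonce     [nonceSize]byte // new: nonce for the field above
	DataNonce              [nonceSize]byte // new: nonce for the block data itself
}

// EncryptedFileInfo sketches one file or directory entry.
type EncryptedFileInfo struct {
	Name              []byte          // encrypted
	NameNonce         [nonceSize]byte // new
	Flags             uint32          // set to default; real value lives in EncryptedMetadata
	Modified          int64           // set to default; real value lives in EncryptedMetadata
	Version           []Counter       // plaintext
	LocalVersion      int64           // plaintext
	EncryptedMetadata []byte          // new
	MetadataNonce     [nonceSize]byte // new
	Blocks            []EncryptedBlockInfo
}

// TotalSize adds up the block sizes, which any device can do without the key.
func (f EncryptedFileInfo) TotalSize() int64 {
	var total int64
	for _, b := range f.Blocks {
		total += int64(b.Size)
	}
	return total
}

func main() {
	f := EncryptedFileInfo{
		Blocks: []EncryptedBlockInfo{{Size: 128 << 10}, {Size: 100}},
	}
	fmt.Println(f.TotalSize())
}
```

Because block sizes stay in the clear, anyone holding the index can compute the total, which is the file size leak raised under Potential Issues.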


The encrypted file name is base64 encoded and has slashes added in appropriate places to create a hierarchy and avoid storing tens of thousands of files per directory.

A flag in the index exchange and cluster config messages is introduced to indicate that encryption is in effect.

The request message remains unchanged. However, the file name is the encrypted name from the index above, and the content hash is that of the encrypted data.

The response message remains unchanged. The actual data sent is encrypted, using the nonce stored in the index.

This scheme places some additional work and responsibilities on the originating secure device. It must compute hashes for both the encrypted and unencrypted versions of the data, and additional information must be stored in the index and transmitted to other devices. The communication between secure devices is identical to that between a secure and an insecure device, but a number of fields must be encrypted and decrypted during communication, adding overhead.
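
The file name encoding described above (base64 plus slashes) could look roughly like this. It is a sketch under assumptions: the proposal does not say which base64 alphabet to use or where the slashes go. The URL-safe alphabet is assumed here because standard base64 can itself produce `/`, and the 1/2/3 character split simply mirrors the "a/bc/def/..." example used elsewhere in this document. The function names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// pathFromEncryptedName turns an already encrypted file name into a
// relative path with a few directory levels.
func pathFromEncryptedName(encryptedName []byte) string {
	// URL-safe, unpadded base64 never emits '/', '+' or '='.
	enc := base64.RawURLEncoding.EncodeToString(encryptedName)

	// Assumed layout: the first 1, 2 and 3 characters become directories.
	var parts []string
	for _, n := range []int{1, 2, 3} {
		if len(enc) <= n {
			break
		}
		parts = append(parts, enc[:n])
		enc = enc[n:]
	}
	parts = append(parts, enc)
	return strings.Join(parts, "/")
}

// encryptedNameFromPath reverses pathFromEncryptedName.
func encryptedNameFromPath(path string) ([]byte, error) {
	return base64.RawURLEncoding.DecodeString(strings.ReplaceAll(path, "/", ""))
}

func main() {
	ciphertext := []byte("stand-in for an encrypted name")
	p := pathFromEncryptedName(ciphertext)
	fmt.Println(p)

	back, err := encryptedNameFromPath(p)
	fmt.Println(string(back) == string(ciphertext), err)
}
```
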
An insecure device mostly just needs to ignore the new fields. It uses the file name and metadata given and can verify the stored and transmitted data against the hash as usual.
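
The split of hashing work between the two kinds of device might be sketched as follows. The helper names are hypothetical, and the ciphertext is a stand-in byte slice rather than real encrypted output. The point is only that the existing Hash field covers the encrypted bytes, so a device without the key can still verify what it stores and transmits.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// blockHashes holds what the originating secure device computes per block.
type blockHashes struct {
	Ciphertext [sha256.Size]byte // goes into the existing Hash field
	Plaintext  [sha256.Size]byte // encrypted before going into the new field
}

// hashBoth hashes the unencrypted and encrypted versions of one block.
func hashBoth(plaintext, ciphertext []byte) blockHashes {
	return blockHashes{
		Ciphertext: sha256.Sum256(ciphertext),
		Plaintext:  sha256.Sum256(plaintext),
	}
}

// verifyBlock is the check any device can do, with or without the key:
// compare the received bytes against the hash from the index.
func verifyBlock(data []byte, want [sha256.Size]byte) bool {
	return sha256.Sum256(data) == want
}

func main() {
	plaintext := []byte("hello")
	ciphertext := []byte("stand-in for the encrypted block")

	h := hashBoth(plaintext, ciphertext)
	fmt.Println(verifyBlock(ciphertext, h.Ciphertext)) // true
	fmt.Println(verifyBlock(plaintext, h.Ciphertext))  // false
}
```
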
Encryption

I suggest using the NaCl secretbox (https://godoc.org/golang.org/x/crypto/nacl/secretbox). They've thought about most things so that we don't have to, apart from generating unique nonces.

Secretbox uses XSalsa20 and Poly1305 to encrypt and authenticate messages with secret-key cryptography.

Potential Issues


This does not in fact hide file sizes. How important is that? If it is important, can we solve it without splitting each file into all its constituent blocks (which is inefficient storage-wise)?


How do we handle the initial merge between two secure devices that have chosen different nonces etc. for the same data?


Encrypting file names means files will no longer be contained in their parent directories. Directories need not be represented on the insecure device. However things will be confused if we have a file with the encrypted file name "a/bc/def/ghijklmnopqrstuvw" and we don't have directory entries for "a", "a/bc" etc... This will require handling on the insecure device.
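
To make the missing directory entries concrete, here is a small sketch of how an insecure device might work out which directories an encrypted path implies. The function name is hypothetical, and whether the device should synthesize these entries at all is one of the open questions here.

```go
package main

import "fmt"

// parentDirs lists every directory that must exist before a file at
// the given slash-separated path can be created, shortest first.
func parentDirs(path string) []string {
	var dirs []string
	for i, c := range path {
		if c == '/' && i > 0 {
			dirs = append(dirs, path[:i])
		}
	}
	return dirs
}

func main() {
	fmt.Println(parentDirs("a/bc/def/ghijklmnopqrstuvw")) // [a a/bc a/bc/def]
}
```
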


The insecure device must be taught not to do scans etc.