@calmh
Last active January 6, 2019 14:38

Goals

Support Syncthing devices that store data in encrypted form. Such a device can participate fully in a cluster, but the information on the device is not directly usable.

Assumptions and Definitions

  • There exists a secret key per folder, known only by devices that can access the unencrypted data. We call these "secure devices".

  • Devices that do not know the secret key can only access encrypted data. We call these "insecure devices".

  • An insecure device is assumed to be under the control of an adversary. As such, we cannot trust either disk or memory contents to be safe from prying eyes or malicious modification.

  • A new secure device must be able to fully initialize from just knowing the secret key and connecting to an insecure device.

Protected Information

We would like to protect the following information from unauthorized access and modification:

  • File contents
  • File content hashes
  • File and directory names
  • File sizes
  • File metadata (modification times and permissions)

For the solution to remain somewhat human-manageable, we do not propose encrypting the folder name/label. This can instead be chosen to be inconspicuous or meaningless as required.

Protocol Impact

Background

Devices exchange an Index describing the contents of the folder. This index contains (see BEPv1 spec):

  • Folder name
  • List of files/directories, with:
    • Name
    • Flags (permission bits etc)
    • Modification time
    • Version information (a version vector)
    • Local version (an incrementing integer)
    • List of blocks, with:
      • Size
      • Hash (SHA-256)

To remain up to date with the cluster, a device performs Requests to get new file data. The request message contains:

  • Folder name
  • File name
  • Offset
  • Size
  • Hash (SHA-256)
  • Flags and options (currently zero/empty)

The Response to a request contains:

  • Data (up to 128 KiB)
  • Error code (integer)
  • Error message (string)

Most of this information must be covered by encryption.
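
To make the request/response exchange above concrete, here is a minimal Python sketch. It is not the BEP wire format: the class names, the in-memory file store and the non-zero error codes are all hypothetical and chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass

MAX_BLOCK_SIZE = 128 * 1024  # responses carry up to 128 KiB of data


@dataclass
class Request:
    folder: str
    name: str
    offset: int
    size: int
    hash: bytes  # SHA-256 of the requested block


@dataclass
class Response:
    data: bytes = b""
    error_code: int = 0      # assumption: 0 means success in this sketch
    error_message: str = ""


def handle_request(files: dict, req: Request) -> Response:
    """Serve one block from an in-memory store keyed by (folder, name)."""
    content = files.get((req.folder, req.name))
    if content is None:
        return Response(error_code=1, error_message="no such file")
    if req.size > MAX_BLOCK_SIZE:
        return Response(error_code=2, error_message="block too large")
    return Response(data=content[req.offset:req.offset + req.size])


def verify_response(req: Request, resp: Response) -> bool:
    """The requester checks the received data against the hash it asked for."""
    return resp.error_code == 0 and hashlib.sha256(resp.data).digest() == req.hash
```

Every field of `Request` and the `data` field of `Response` reveal something about the folder contents, which is why most of them need to be covered by encryption.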

Proposed Changes

Encrypting or decrypting something requires the secret key, the plaintext (or ciphertext) and an IV/nonce. The IV/nonce must be unique per encrypted plaintext (i.e. it cannot be reused under the same key), but is not by itself a secret. With this in place, the index would contain:

  • Folder name (plaintext)
  • List of files/directories, with:
    • Name (encrypted)
    • Nonce for encrypted name (new field)
    • Flags (permission bits etc) (set to default, stored in encrypted metadata)
    • Modification time (set to default, stored in encrypted metadata)
    • Version information (a version vector) (plaintext)
    • Local version (an incrementing integer) (plaintext)
    • Encrypted metadata (new field)
    • Nonce for encrypted metadata (new field)
    • List of blocks, with:
      • Size (plaintext)
      • Hash (SHA-256) (contains hash of encrypted data)
      • Encrypted hash of plaintext (SHA-256) (new field)
      • Nonce for encrypted hash of plaintext (new field)
      • Nonce for encrypted block data (new field)
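
As a rough illustration of the list above, the index entries could be modelled like this in Python. The class and field names are hypothetical, `0` as the "default" value for flags and modification time is an assumption, and the last block field is read here as the nonce used for the block data itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockInfo:
    size: int                    # plaintext
    hash: bytes                  # SHA-256 of the *encrypted* block data
    encrypted_plain_hash: bytes  # new: encrypted SHA-256 of the plaintext
    plain_hash_nonce: bytes      # new: nonce used for encrypted_plain_hash
    data_nonce: bytes            # new: nonce used for the block data (assumed reading)


@dataclass
class FileInfo:
    name: str                    # encrypted name
    name_nonce: bytes            # new: nonce used for the encrypted name
    version: dict                # version vector, plaintext
    local_version: int           # plaintext
    encrypted_metadata: bytes    # new: holds the real flags and modification time
    metadata_nonce: bytes        # new: nonce used for encrypted_metadata
    flags: int = 0               # set to default (assumed to be 0 here)
    modified: int = 0            # set to default (assumed to be 0 here)
    blocks: List[BlockInfo] = field(default_factory=list)


def insecure_view(f: FileInfo) -> dict:
    """The fields an insecure device can act on without the secret key."""
    return {
        "name": f.name,
        "version": f.version,
        "local_version": f.local_version,
        "blocks": [(b.size, b.hash) for b in f.blocks],
    }
```

An insecure device would only ever use what `insecure_view` returns; the remaining fields are opaque to it and are simply stored and forwarded.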

The encrypted file name is base64 encoded and has slashes added in appropriate places to create a hierarchy and avoid storing tens of thousands of files per directory. A flag in the index exchange and cluster config messages is introduced to indicate that encryption is in effect.

The request message remains unchanged. However, the file name is the encrypted name from the index above, and the content hash is that of the encrypted data.

The response message remains unchanged. The actual data sent is encrypted, using the nonce stored in the index.
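
The file name encoding described above (base64 with slashes inserted) might look roughly like the following Python sketch. Two things here are assumptions rather than part of the proposal: the URL-safe base64 alphabet, used because standard base64 itself contains "/", and the 1/2/3-character split, which is borrowed from the "a/bc/def/..." example under Potential Issues.

```python
import base64


def encrypted_name_to_path(ciphertext: bytes, split=(1, 2, 3)) -> str:
    """Encode an encrypted name and spread it over a directory hierarchy."""
    # URL-safe alphabet: no '/' in the encoding, so inserted slashes are unambiguous.
    encoded = base64.urlsafe_b64encode(ciphertext).decode("ascii").rstrip("=")
    parts, pos = [], 0
    for n in split:
        if len(encoded) - pos <= n:
            break  # keep at least one character for the final component
        parts.append(encoded[pos:pos + n])
        pos += n
    parts.append(encoded[pos:])
    return "/".join(parts)


def path_to_encrypted_name(path: str) -> bytes:
    """Reverse the mapping: drop the slashes, restore padding, decode."""
    encoded = path.replace("/", "")
    padding = "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded + padding)
```

Note that base64 is case sensitive, so a scheme like this would need extra care on case-insensitive file systems.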

This places some additional work and responsibilities on the originating secure device. It must compute hashes for both the encrypted and unencrypted versions of the data, and additional information must be stored in the index and transmitted to other devices. The communication between secure devices is identical to that between a secure and an insecure device, but a number of fields must be encrypted and decrypted during communication, adding overhead.

An insecure device mostly just needs to ignore the new fields. It uses the file name and metadata given and can verify the stored and transmitted data against the hash as usual.

Encryption

I suggest using the NaCl secretbox (https://godoc.org/golang.org/x/crypto/nacl/secretbox). They've thought about most things so that we don't have to, apart from generating unique nonces.

Secretbox uses XSalsa20 and Poly1305 to encrypt and authenticate messages with secret-key cryptography.
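
Since nonce generation is the part left to us, here is a minimal Python sketch of the simplest option, a random nonce per encrypted item. Secretbox nonces are 24 bytes, which is large enough that randomly chosen values are not expected to collide in practice. The function name is hypothetical, and this is only one possible approach.

```python
import secrets

NONCE_SIZE = 24  # secretbox (XSalsa20) nonces are 24 bytes


def new_nonce() -> bytes:
    """Return a fresh random nonce from the OS CSPRNG."""
    return secrets.token_bytes(NONCE_SIZE)
```

With random nonces, two secure devices encrypting the same plaintext independently would produce different ciphertexts.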

Potential Issues

  • This does not in fact hide file sizes. How important is that? If it is important, can we solve it without splitting each file into all its constituent blocks (which is inefficient storage-wise)?

  • Initial merge between two secure devices who have chosen different nonces etc?

  • Encrypting file names means files will no longer be contained in their parent directories. Directories need not be represented on the insecure device. However, things will get confused if we have a file with the encrypted file name "a/bc/def/ghijklmnopqrstuvw" and we don't have directory entries for "a", "a/bc", etc. This will require handling on the insecure device.

  • The insecure device must be taught not to do scans etc.
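
For the missing-directory issue in the list above, the handling on the insecure device could start from something like this Python sketch, which works out which directories an encrypted path implies. The function names are hypothetical, and what the device then does with the result (create the directories silently, keep them out of the index, and so on) is left open.

```python
def implied_directories(path: str) -> list:
    """All parent directories implied by a slash-separated encrypted path."""
    parts = path.split("/")[:-1]
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def missing_directories(file_paths, known_dirs) -> list:
    """Directories that must exist for file_paths but have no entry yet."""
    seen = set(known_dirs)
    needed = []
    for p in file_paths:
        for d in implied_directories(p):
            if d not in seen:
                seen.add(d)
                needed.append(d)
    return needed
```

For the example path "a/bc/def/ghijklmnopqrstuvw", `implied_directories` yields "a", "a/bc" and "a/bc/def".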

@stevenroose

I have exams coming up, so I just ran over it real quick.

One thing though, I'd like to say.

Initial merge between two secure devices who have chosen different nonces etc?

I don't know XSalsa20 and Poly1305, but I have experience with ECDSA. In ECDSA, it is common to generate nonces deterministically to avoid collisions. That would also resolve your issue. Take, for example:
nonce = SHA-256( SHA-256(message) + SHA-256(key) )
Actors that don't know the key won't be able to regenerate the nonce, and all actors with the key and the (encrypted) hash of the message (which is passed along with the encrypted message) will be able to generate the nonce unambiguously. The nonce will only collide for the exact same message, which will also generate the same encrypted message, so that's ok and won't leak any data.
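
The formula above could be sketched in Python as follows. Two points are assumptions: "+" is read as byte concatenation, and the 32-byte SHA-256 output is truncated to 24 bytes, since that is the nonce size secretbox expects. The function name is hypothetical.

```python
import hashlib

NONCE_SIZE = 24  # assumption: truncate to the 24-byte secretbox nonce size


def derive_nonce(message: bytes, key: bytes) -> bytes:
    """nonce = SHA-256(SHA-256(message) + SHA-256(key)), truncated."""
    inner = hashlib.sha256(message).digest() + hashlib.sha256(key).digest()
    return hashlib.sha256(inner).digest()[:NONCE_SIZE]
```

Two secure devices that share the key derive the same nonce for the same message, so they would also produce the same ciphertext for it.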

@calmh
Author

calmh commented Jul 14, 2015

Yes, smart. We should do that.

@oceanofsolaris

In this case, it would probably be easiest to derive the nonce from the filename/path, since it must be unique for every file (using the filename+path for 'message' in stevenroose's proposed solution).

Since I don't really know a lot about the Syncthing protocol, this might be a stupid question, but anyways:
How would two secure devices handle conflicting changes? As far as I see, they would not have access to any information that would help them resolve it, so they would need to leave it to an insecure device to finally handle it.
