Sats4Files is a protocol for decentralized file hosting. It allows users to request data from untrusted servers and, upon receiving the encrypted data, to pay for the decryption key via Lightning. The exchange is atomic: the client obtains the data only if they pay, and the seller gets paid only if they deliver the data. The protocol is an efficient form of verifiable encryption, similar in spirit to verifiable secret sharing with Shamir's scheme.
This scheme is simplified and does not fully solve the problem, as discussed in the Limitations section. This writeup is intended to spark interest in solving the remaining issues beyond the fixes we suggest.
The client wants to buy from the server the file corresponding to a particular `file_id`.
Here, we assume we have PTLCs on Lightning instead of HTLCs. That means we can buy a discrete logarithm over Lightning: a payment atomically reveals the scalar `t` behind a public point `tG` (a toy sketch follows).
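To make that assumption concrete, here is a minimal sketch of "buying a discrete log": the seller's secret scalar plays the role of the preimage, and the point it generates plays the role of the payment hash. The group below is a small multiplicative stand-in for secp256k1; all names and parameters are illustrative, not part of the protocol above.

```python
import secrets

# Toy stand-in group: "points" are powers of g mod p, "scalars" live mod q.
# A real deployment would use secp256k1; these parameters are toy values.
q = 1019           # group order (toy size!)
p = 2 * q + 1      # 2039, a safe prime
g = 4              # generator of the order-q subgroup of Z_p*

def G(x: int) -> int:
    """Stand-in for the elliptic-curve operation x*G."""
    return pow(g, x % q, p)

# The seller picks a secret scalar t and publishes the payment point T = tG.
t = secrets.randbelow(q)
T = G(t)

# The buyer locks a payment to T. Settling the PTLC forces the seller to
# reveal t, and the buyer can check that they really learned the dlog of T.
revealed_t = t                 # what PTLC settlement hands to the buyer
assert G(revealed_t) == T
```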
- Chunk up the `file` into `n` 32-byte chunks `chunk_1`, `chunk_2`, ..., `chunk_n`.
- Compute the Lagrange polynomial `f(x)` using `f(1) = chunk_1`, `f(2) = chunk_2`, ..., `f(n) = chunk_n`.
- Evaluate `f` at `n` out-of-domain (OOD) points to obtain `f(n+1)`, `f(n+2)`, ..., `f(n+n)`.
- Send to the client `n-1` of those points: `f(n+1)`, `f(n+2)`, ..., `f(n+n-1)`. The client learns nothing about `f` yet, because they need one more point to reconstruct the polynomial.
- Send the last OOD point encrypted: `f(n+n)G`. This is the 'preimage' used in our PTLC.
- The client computes the other `n-1` encrypted OOD points `f(n+1)G`, `f(n+2)G`, ..., `f(n+n-1)G` from the unencrypted points.
- Use Lagrange interpolation with the `n` encrypted OOD points to compute the `n` encrypted chunks `f(1)G`, ..., `f(n)G`. This works because interpolation is a linear combination with publicly computable coefficients, `f(i)G = sum_j lambda_j(i) * (f(n+j)G)`, so it can be carried out directly on the encrypted points (a minimal sketch follows this list).
- Hash the encrypted chunks and verify that the result is the `file_id`.
- The client buys the remaining unencrypted value `f(n+n)` of the encrypted point `f(n+n)G` via Lightning in a PTLC.
- Use Lagrange interpolation with the `n` unencrypted OOD points to compute the `n` unencrypted chunks `f(1)`, ..., `f(n)`.
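Here is a minimal sketch of the interpolation-on-encrypted-points step for `n = 2`, using the same small multiplicative group as above as a stand-in for the elliptic curve (all parameters are toy values; `G` mimics `x*G`). For `n = 2`, the Lagrange coefficients for recovering `f(1)` from `f(3)` and `f(4)` are `3` and `-2`:

```python
import secrets

# Toy group: "points" are powers of g mod p; a stand-in for x*G on secp256k1.
q = 1019; p = 2 * q + 1; g = 4
def G(x): return pow(g, x % q, p)

# A 2-chunk file: f is the line through f(1) = c1 and f(2) = c2.
c1, c2 = secrets.randbelow(q), secrets.randbelow(q)
y3 = (2 * c2 - c1) % q           # f(3), first OOD point
y4 = (3 * c2 - 2 * c1) % q       # f(4), second OOD point

# Interpolating back to x = 1: f(1) = 3*f(3) - 2*f(4)
assert (3 * y3 - 2 * y4) % q == c1

# The same linear combination works on *encrypted* points, in the group:
enc_y3, enc_y4 = G(y3), G(y4)
# exponent q - 2 is -2 mod q, so this computes G(3*y3 - 2*y4)
enc_c1 = pow(enc_y3, 3, p) * pow(enc_y4, q - 2, p) % p
assert enc_c1 == G(c1)           # client recovers f(1)G without learning f(1)
```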
A malicious client who knows even a single chunk can decrypt the file without paying: one known chunk plus the `n-1` unencrypted OOD points gives them `n` points, enough to reconstruct the polynomial. The following modifications fix this and related issues.
- In Lightning, a preimage can safely be used only once. So the server has to reblind the polynomial for each client and each request. Therefore, they always add another random point `f(n+1)` to the domain. (This increases the degree of the polynomial by 1.)
- The chunks `chunk_1`, ..., `chunk_n` are not uniformly random, because they represent, e.g., text files. We can solve that by adding upfront another point to the domain, `f(0) = chunk_0`, which is used as a seed to obfuscate the chunks as `chunk_1 xor H(chunk_0 || 1)`, ..., `chunk_n xor H(chunk_0 || n)`, hiding all structure and making them look uniformly random (see the sketch after this list). The new `file_id` then becomes the hash of `chunk_0`, ..., `chunk_n`.
- The connection between the PTLC and the polynomial should be one-way. That means the client can buy a dlog via the PTLC, but if they can decrypt the polynomial (for example, because they already know the file), they should learn nothing about the PTLC. That ensures they can't tamper with the payment route.
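The obfuscation fix lends itself to a short sketch. A minimal version, assuming SHA-256 as `H`, 32-byte chunks, and an 8-byte big-endian encoding of the index `i` (the index encoding is an illustrative choice, not specified above):

```python
import hashlib
import secrets

CHUNK = 32  # bytes

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def obfuscate(chunks: list[bytes]) -> tuple[bytes, list[bytes]]:
    """Prepend a random seed chunk_0 and mask chunk_i with H(chunk_0 || i)."""
    chunk_0 = secrets.token_bytes(CHUNK)
    masked = [xor(c, H(chunk_0 + i.to_bytes(8, "big")))  # index encoding is illustrative
              for i, c in enumerate(chunks, start=1)]
    return chunk_0, masked

def deobfuscate(chunk_0: bytes, masked: list[bytes]) -> list[bytes]:
    return [xor(m, H(chunk_0 + i.to_bytes(8, "big")))
            for i, m in enumerate(masked, start=1)]

chunks = [b"A" * CHUNK, b"B" * CHUNK]            # highly structured plaintext
chunk_0, masked = obfuscate(chunks)
assert deobfuscate(chunk_0, masked) == chunks    # round-trips
# The new file_id would now commit to chunk_0, ..., chunk_n.
```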
- In general, it makes sense to publish the files in a compressed format to minimize the amount of data that has to be processed by the verifiable encryption.
- It is possible to encrypt multiple files with the same key, such that they can be batched into a single Lightning payment buying them all at once. However, if a malicious client already knows any one of the files, they can reconstruct that file's polynomial and thereby compute the shared decryption key for all of them.
- Larger files can be split into smaller pieces such that the maximum polynomial degree stays bounded and fits consumer hardware easily.
Here's a basic implementation of the scheme.
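Note that the following is a minimal, self-contained Python sketch rather than a production implementation: it uses a tiny multiplicative group as a stand-in for secp256k1, treats chunks as field elements directly (real chunks would be 32-byte values reduced mod the curve order), and simulates the Lightning payment by handing over `f(2n)`. All helper names (`G`, `interp`, `interp_encrypted`) are illustrative assumptions.

```python
import hashlib
import secrets

# --- toy parameters (a real deployment would use secp256k1) ---------------
q = 1019               # scalar field / group order (toy size!)
p = 2 * q + 1          # safe prime: Z_p* has a subgroup of order q
g = 4                  # generator of that subgroup

def G(x: int) -> int:
    """Stand-in for x*G on an elliptic curve."""
    return pow(g, x % q, p)

def lagrange_coeffs(xs, x):
    """Lagrange coefficients lambda_j(x) over the scalar field."""
    out = []
    for j, xj in enumerate(xs):
        num, den = 1, 1
        for k, xk in enumerate(xs):
            if k != j:
                num = num * (x - xk) % q
                den = den * (xj - xk) % q
        out.append(num * pow(den, -1, q) % q)
    return out

def interp(xs, ys, x):
    """Evaluate the polynomial defined by the points (xs, ys) at x."""
    return sum(l * y for l, y in zip(lagrange_coeffs(xs, x), ys)) % q

def interp_encrypted(xs, enc_ys, x):
    """The same interpolation on encrypted points; works because it is linear."""
    r = 1
    for l, ey in zip(lagrange_coeffs(xs, x), enc_ys):
        r = r * pow(ey, l, p) % p
    return r

# --- server ----------------------------------------------------------------
n = 4
chunks = [secrets.randbelow(q) for _ in range(n)]   # the file, as field elements
dom = list(range(1, n + 1))                         # f(i) = chunk_i
ood_xs = list(range(n + 1, 2 * n + 1))              # out-of-domain points
ood_ys = [interp(dom, chunks, x) for x in ood_xs]

# file_id = hash of the encrypted chunks f(1)G, ..., f(n)G
file_id = hashlib.sha256(
    b"".join(G(c).to_bytes(2, "big") for c in chunks)).hexdigest()

to_client_plain = ood_ys[:-1]                       # f(n+1) .. f(2n-1)
to_client_enc = G(ood_ys[-1])                       # f(2n)G, the PTLC payment point

# --- client: verify before paying --------------------------------------------
enc_ood = [G(y) for y in to_client_plain] + [to_client_enc]
enc_chunks = [interp_encrypted(ood_xs, enc_ood, i) for i in dom]
check = hashlib.sha256(
    b"".join(e.to_bytes(2, "big") for e in enc_chunks)).hexdigest()
assert check == file_id                             # ciphertext matches file_id

# --- payment (simulated): settling the PTLC reveals f(2n) --------------------
f_2n = ood_ys[-1]
assert G(f_2n) == to_client_enc                     # client got what they paid for

# --- client: decrypt ----------------------------------------------------------
recovered = [interp(ood_xs, to_client_plain + [f_2n], i) for i in dom]
assert recovered == chunks
```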