laughinghan/double-blind_redeem.md

## double-blind_redeem.md

      
    Raw
  

              double-blind_redeem.md
            
          
    Two Ways to do Interactive Double-Blind Trust Token Redemption

By "double-blind" I mean that not only does the issuer not know who the client is, but the server doesn't know exactly which issuers the client redeemed trust tokens from.
Summary: Two ways for an untrusted client to prove to an untrusted server that the client is trusted by X out of Y trust token issuers, without revealing exactly which issuers:

One way uses hash functions as its only crypto primitive, but requires revealing some of the issuers to the server, which is a leak of fingerprintable info. It's also probabilistic, and the chances of a malicious client's lie being caught depends on the degree of the lie as well as how many issuers are revealed to the server, which is an undesirable tradeoff.
The other way, if it works, reveals none of the issuers to the server, and the client cannot lie at all. But this assumes I understand elliptic curve cryptography, which I learned entirely from the Privacy Pass explainer, so it probably has holes ¯\_(ツ)_/¯.

Use Case: So that a site can score the trustworthiness of a client based on many trust token issuers, while minimizing the fingerprintability of the information that is revealed by the client to the server. (Straw Proposal on WICG/trust-token-api) (Concerns with allowing too few trust token issuers per-site)
Okay, so details of the two ways:
Hash-based Commitment and Probabilistic Reveal


site asks for trust tokens from list of accepted issuers
nonce per accepted issuer: client picks a different nonce for each accepted issuer, regardless of whether the client has a trust token for that issuer, and hashes just the nonce by itself (hash(nonce))
client sends to server:

nonce commitment: a map from each issuer domain to the hash of the corresponding nonce (hash(nonce))
issuer & token commitments: a list of pairs of hashes, hash(nonce || issuer domain) and hash(nonce || issuer domain || trust token), for each issuer the client does have a trust token for
(|| is concatenation)


challenge: server verifies that every hash(nonce || issuer domain) is unique, and picks a random pair of hashes from the list to ask the client to reveal
reveal: client reveals nonce, trust token, and issuer domain
server verifies each hash, hash(nonce), hash(nonce || issuer domain), hash(nonce || issuer domain || trust token), as well as redeems the trust token

Security properties:

Server can't find out which issuer domains the client has trust tokens for other than the one revealed as long as hash(nonce || issuer domain) is irreversible. Note that domain names are low entropy so just hash(issuer domain) by itself is easily reversible with brute force.
Client definitely can't pretend that one valid trust token for one issuer is for many issuers and just rehash with many different nonces, because they committed to the nonce for that issuer, in the map.
Client probably would be caught if they have fewer trust tokens than they claim, although if they inflated the amount by 2x they would only have a half/half chance of getting caught, which is maybe worse than we'd like. Obviously, their ability to inflate the number of trust tokens they have decreases exponentially with the number of pairs of hashes the server can ask the client to reveal, but also reveals to the server more fingerprinting information about honest clients, so that's a tradeoff.

(Is each revealed issuer domain >1 or <1 bit of fingerprinting? By itself it's >1, but it overlaps in information with already-revealed count of how many issuers the client has trust tokens from, so I think it's <1 additional bit? Idea: the count is already log(number of issuers queried) bits of info, so the server may as well be allowed to request up to floor(log(count)) pairs of hashes revealed, or minimum 2?)


I Blind, You Encrypt, I Decrypt and Unblind

Basically, the client sends a blinded nonce per accepted issuer, the server signs then encrypts each with the issuer's public key, the client uses its trust tokens to ask issuers to decrypt them, and finally the client unblinds the decrypted signed nonces and reveals them to the server, thus proving how many issuers trust it without revealing which issuers.
Diagram, in the style of the Privacy Pass explainer, but with T replaced with N to avoid confusion between the nonces in this protocol and the original trusted tokens:
b1N1 ... b9N9 ->
                E1 = encrypt(sb1N1, issuer1.publicKey)
                ...
                E9 = encrypt(sb9N9, issuer9.publicKey)
                <- E1 ... E9

Between client and trust token issuers:
(client has trust tokens for issuer3 thru issuer7)

    E3 -> (to issuer3)
                    sb3N3 = decrypt(E3, issuer3.privateKey)
                    <- (from issuer3) sb3N3
    ...
    E7 -> (to issuer7)
                    sb7N7 = decrypt(E7, issuer7.privateKey)
                    <- (from issuer7) sb7N7

Back to client and first-party server:
(can now unblind to anonymize)

sN3 = sb3N3 / b3
...
sN7 = sb7N7 / b7
hash([sN3 ... sN7]) ->
                <- sb1N1 ... sb9N9,
                   DLEQ(b1N1:sb1N1 == (b2N2 + ... + b9N9):(sb2N2 + ... + sb9N9))

(finally, 5 anonymized signed nonces, proves client trusted by 5 issuers)
sN3 ... sN7 ->


site asks for trust tokens from list of say, 9 issuers
send blinded nonces: client sends 9 blinded nonces to the server, b₁N₁ ... b₉N₉, and (optional) says "I have 5 of those issuers, btw"
return, signed and encrypted: server signs them all with the same signature (sb₁N₁ ... sb₉N₉, all with the same s), but then encrypts each with a different issuer's public key, and sends the encrypted signed nonces E₁ ... E₉ back to the client (along with who they're encrypted for)
decrypt: client forwards 5 of them to the 5 issuers it has trust tokens for, issuer3 ... issuer7, who decrypt them and return them to the client
send commitment: client now has 5 decrypted signed blinded nonces sb₃N₃ ... sb₇N₇. Client unblinds them (divides by b₃ ... b₇ to get s₃N₃ ... s₇N₇), hashes them together, and sends the hash to the server
batch DLEQ: server responds with all 9 unencrypted signed tokens sb₁N₁ ... sb₉N₉, and a batch DLEQ to prove they were all signed with the same signature
redeem anonymized signed nonces: client finally sends the 5 unblinded (anonymized) signed nonces s₃N₃ ... s₇N₇ (after verifying DLEQ to ensure they weren't tagged)
server can now verify client decrypted 5 of the 9 encrypted signed nonces, but can't tell which

Concerns:

issuer and site could collude to de-anonymize? I think this is inherent in trust token redemption and none of this makes it any worse, though?
the batch DLEQ adds a round-trip, can that be eliminated? (Other than by sending individual DLEQs for each encrypted signed blinded token? Or can that be made more compact?)

also, does the batch DLEQ need to be a pseudorandom linear combination, like in Privacy Pass? The explainer says that's necessary without expanding on why, I haven't read the paper yet


Enhancement: Weighted Sum Thresholds

All this just lets the server verify the count the client claimed, that it indeed has trust tokens for 5 out of 9 accepted issuers. But what if the server trusts some issuers more than others?
The server could request multiple blinded nonces for some issuers, then the total number of unblinded nonces the client is able to redeem would be a weighted sum of the issuers with available trust tokens. But that would leak unlimited fingerprinting bits to the server.
Idea  Round down to the nearest threshold in a list of thresholds provided by the server, where the client limits how many thresholds are supported, e.g. 7 thresholds divides the number line into 8 regions and thus leaks exactly 3 bits of fingerprintable info:
max_thresholds = 7 ->
                <- thresholds = [3, 5, 8]
                   issuer_weights = { issuer1: 2,
                                      issuer2: 3,
                                      issuer3: 3,
                                      issuer4: 4 }

(client sends 2+3+3+4 = 12 blinded nonces)
b1N1 ...  b12N12 ->
                (2 for issuer1) E1, E2 = encrypt([sb1N1, sb2N2], issuer1.publicKey)
                (3 for issuer2) E3, E4, E5 = encrypt([sb3N3, sb4N4, sb5N5], issuer2.publicKey)
                (3 for issuer3) E6, E7, E8 = encrypt([sb6N6, sb7N7, sb8N8], issuer3.publicKey)
                (4 for issuer4) E9, E10, E11, E12 = encrypt([sb9N9, sb10N10, sb11N11, sb12N12], issuer4.publicKey)
                <- E1 ... E12

(client has tokens for issuer2 and issuer3,
 weighted sum = 6, round down to nearest threshold = 5)

Between client and trust token issuers:

    E3, E4, E5 -> (to issuer2)
                    sb3N3, sb4N4, sb5N5 = decrypt([E3, E4, E5], issuer2.privateKey)
                    <- (from issuer2) sb3N3, sb4N4, sb5N5
    E6, E7 -> (to issuer3)
                    sb6N6, sb7N7 = decrypt([E6, E7], issuer3.privateKey)
                    <- (from issuer3) sb6N6, sb7N7

Back to client and first-party server:
(can now unblind to anonymize)

sN3 = sb3N3 / b3
...
sN7 = sb7N7 / b7
hash([sN3 ... sN7]) ->
                <- sb1N1, ... , sb12N12,
                   DLEQ(b1N1:sb1N1 == (b2N2 + ... + b12N12):(sb2N2 + ... + sb12N12))

(finally, 5 anonymized signed nonces, proves weighted sum of issuers met threshold of 5)
sN3 ... sN7 ->

Enhancement: What About SRRs? Use Certificateless Public Key Cryptography

So far this protocol is necessarily interactive. What about something like Signed Redemption Records, which are obtained upfront then used and possibly reused? In particular, for weighted sum thresholds, does an issuer trusted with weight 100 really have to decrypt 100 nonces, or can they just send one thing that can be used to decrypt many nonces?
I think there's a way, using something called Certificateless Public Key Cryptography, where a "key generating center" (KGC) has "public parameters" and corresponding private "master key". Anyone can generate a public key from any "identity" (e.g. email address) by combining it with the public parameters and a secret. To compute the corresponding private key, you need the secret and the "partial private key" from the KGC, computed from your identity and the private master key.
The KGC is kinda like a certificate authority (CA): the KGC is supposed to verify your identity before sending you your partial private key, just like a CA is supposed to do before signing your public key. The big difference: you can't certify the link between your public key and identity without talking to the CA, but by using the KGC's public parameters you can generate a public key that verifiably corresponds to an identity without talking to the KGC. You won't be able to read any messages encrypted with that public key until you get the partial private key from the KGC, but in this case we don't need to.
Idea: If every trust token issuer is a KGC, the client can generate valid public keys for every issuer accepted by a site (e.g. all 9 of them), then compute private keys for just the issuers that the client actually has trust tokens for (e.g. 5 of them). Like SRRs, these public and private keys are obtained upfront, they encode an "identity" that can contain e.g. a timestamp, and by decrypting and unblinding the encrypted signed blinded nonces exchanged with the server, the client can prove trust from 5 out of the 9 accepted issuers, without revealing which ones. The client can even decrypt-and-unblind 100 such nonces encrypted for a particular issuer with a high trust weight, without the issuer having to worry about it.
In more detail:

every trust token issuer also acts as a KGC, and publishes public parameters the same way it publishes that public DLEQ reference point, sG
out of 9 accepted issuers requested by the site, let's say client has trust tokens available for 5 of them

the site supplies accepted public parameters for each of the 9 accepted issuers

(if it could tell which issuers the client already knew the public parameters for, that's fingerprintable; and the client shouldn't be required to make HTTP requests to arbitrary issuers it might not trust)


for the 5 available issuers, client should ensure the public parameters match remembered values. Who knows what could be revealed by maliciously-chosen "public parameters"


first, client generates "identities" consisting of the first-party origin (e.g. publisher.com), timestamp, and nonces for all 9 issuers. With a per-issuer secret nonce, they can then be used to generate verifiable public keys for all 9 issuers

is it important that different nonces be used for different issuers, to make it harder for them to collude and track the client? Probably not?


instead of getting SRRs from the 5 available issuers, client redeems its trust tokens for partial private keys from each issuer for the corresponding generated identity, which can then be combined with the per-issuer secret nonce to get the private keys

the issuer should verify that the timestamp encoded into the identity is reasonable

(a potential variant has the issuer generate the identity using the issuer's clock, but then the client would have to fake it for other 4 unknown issuers, which might be detectable to the server)


client saves all of this in first-party (non-JS-accessible) storage (such that it persists until this site's data is cleared; for compactness, a seed to generate these nonces from each issuer domain could be stored instead)
from now on, whenever the site asks for trust tokens from those 9 issuers:

client sends the identities and public keys to the server alongside the blinded nonces
server verifies:

the supplied public keys against the issuer KGC public parameters
the timestamp encoded into the supplied identities, for freshness
the publisher origin encoded into the supplied identities, for token reuse


server, instead of encrypting the signed blinded nonces with each issuer's own public key, encrypts with the supplied public keys
client can now decrypt with the encrypted signed blinded nonces with the 5 available private keys, without needing to talk to the issuers again


(The proposal currently describes SRRs as important for mitigating token exhaustion attacks, although to be honest I don't really get why. Whatever site (and duration) that you're caching SRRs for, you should also be perfectly happy to store (first-party) cookies for, so the site should be able to remember that you already authenticated with them, so shouldn't the client just refuse to redeem new tokens while possessing first-party cookies?)