ara4n/msc2134.md

## msc2134.md

      
    Raw
  

              msc2134.md
            
          
    We consider three attackers:

A malicious third party trying to discover the identity server mappings in the homeserver.

The malicious third party scenario can only be protected against by rate limiting lookups, given otherwise it looks identical to legitimate traffic.

An attacker who has stolen an IS db

In theory the 3PIDs could be stored hashed with a static salt to protect a stolen DB. This has been descoped from this MSC, and is largely an orthogonal problem. XXX: is this the right thing to have done?

A compromised or malicious identity server, who may be trying to determine the contents of a user's addressbook (including non-Matrix users)

Our approaches for protecting against a malicious identity server are:


We resign ourselves to the IS knowing the 3PIDs at point of bind, as otherwise it can't validate them.


To protect the 3PIDs of non-Matrix users:

We could hash the uploaded 3PIDs with a static pepper; however, a malicious IS could pre-generate a rainbow table to reverse these hashes.
We could hash the uploaded 3PIDs with a slowly rotating pepper; a malicious IS could generate a rainbow table in retrospect to reverse these hashes (but wouldn't be able to reuse the table)
We could send partial hashes of the uploaded 3PIDs (with full salted hashes to disambiguate the 3PIDs), have the IS respond with anonymised partial results, to allow the IS to avoid reversing the 3PIDs (a k-anonymity approach).  However, the IS could still claim to have mappings for all 3PIDs, and so receive all the salted hashes, and be able to reverse them via rainbow tables for that salt.


So, in terms computational complexity for the attacker, respectively:

The attacker has to generate a rainbow table over all possible IDs once, which can then be reused for subsequent attacks.
The attacker has to generate a rainbow table over all possible IDs for a given lookup timeframe, which cannot be reused for subsequent attacks.
The attacker has to generate multiple but partial rainbow tables, one per every ID being looked up (but limited to the preimages which match the partial hash), which cannot then be reused for any other attack.

For making life hardest for an attacker, option 3 (k-anon) wins.  However, it also makes things harder for the client & server:

The client has to calculate new salted hashes for all 3PIDs every time it uploads.
The server has to calculate new salted hashes for all partially-matching 3PIDs hashes as it looks them up.

It's worth noting that one could always just go and load up a malicious IS DB with a huge pre-image set of mappings and thus see what uploaded 3PIDs match, no matter what algorithm is used.
For k-anon this would put the most computational onus on the server (as it would effectively be creating a partial rainbow table for every lookup), but this is probably not infeasible - so we've gone and added a lot of complexity and computational cost for not much benefit, given the system can still be trivially attacked.
Finally, as more and more users come onto Matrix, their contact lists will get more and more exposed anyway given the IS server has to be able to identity Matrix-enabled 3PIDs to perform the lookup.
Conclusion: k-anon makes it harder to attack, but it's unclear that this is actually enough of an obstacle to meaningfully stop a malicious IS. Therefore we should KISS and go for a simple hash lookup with a rotating pepper (which is not much harder than a static pepper, especially if our initial implementation doesn't bother rotating the pepper. Rather than trying to make the k-anon approach work, we'd be better off spending that time figuring out how to store 3pids as hashes in the DB (and in 3pid bindings etc), or how to decentralise ISes in general.