cpacia/bip37.md

## bip37.md

      
    Here's an overview of the privacy issues with BIP37.
If you create a bloom filter and pack it with a bunch of addresses, it's not really possible for a third party to know what addresses
are in the filter. At best they can say an address is in the filter with a given probability.
But everything changes if you create two (or more) separate filters with different sizes and/or fp rates containing the same addresses.
Now the probability that a transaction would match both filters and still be a false positive is the false positive rate of
filter A * the false positive rate of filter B (and filter C, D, E, etc.), assuming the filters match independently. In other words, if someone creates more than one filter,
you can say with near certainty which addresses are in their filter.
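To make the intersection argument concrete, here is a small Python simulation. It is a sketch under stated assumptions: the hypothetical `ToyBloomFilter` hashes with SHA-256 plus a per-filter tweak rather than BIP37's MurmurHash3, and the address strings and filter sizes are made up. It builds two differently sized filters over the same addresses and measures how often non-wallet items match A, B, and both.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter. SHA-256 with a per-filter tweak stands in for
    BIP37's MurmurHash3 hashing (an assumption made for brevity)."""

    def __init__(self, n_bits: int, n_hashes: int, tweak: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.tweak = tweak
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: bytes):
        # One bit position per hash function, derived from tweak + index + item.
        for i in range(self.n_hashes):
            seed = self.tweak.to_bytes(4, "little") + i.to_bytes(4, "little")
            digest = hashlib.sha256(seed + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def insert(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def measure_false_positives(n_addresses: int = 100, n_probes: int = 20000):
    """Build two differently sized filters over the same (hypothetical)
    addresses and measure how often non-wallet items match each, and both."""
    wallet = [b"wallet-address-%d" % i for i in range(n_addresses)]
    filter_a = ToyBloomFilter(n_bits=500, n_hashes=3, tweak=1)
    filter_b = ToyBloomFilter(n_bits=700, n_hashes=4, tweak=2)
    for addr in wallet:
        filter_a.insert(addr)
        filter_b.insert(addr)

    # Probe with items that are NOT in the wallet.
    probes = [b"other-address-%d" % i for i in range(n_probes)]
    in_a = [filter_a.contains(p) for p in probes]
    in_b = [filter_b.contains(p) for p in probes]
    fp_a = sum(in_a) / n_probes
    fp_b = sum(in_b) / n_probes
    fp_both = sum(a and b for a, b in zip(in_a, in_b)) / n_probes
    return fp_a, fp_b, fp_both


fp_a, fp_b, fp_both = measure_false_positives()
print(f"A: {fp_a:.4f}  B: {fp_b:.4f}  both: {fp_both:.4f}  A*B: {fp_a * fp_b:.4f}")
```

With these parameters the rate of matching both filters should land close to the product of the two individual rates, far below either one alone.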
This is actually how all bloom filtering wallets work. Every time you receive a new payment, the wallet needs to generate a fresh address
to extend its keychain, and that fresh address needs to go into the filter if you want to get matches. But adding it to the filter without
resizing the filter increases the false positive rate. Eventually your false positive rate will approach 100% if you keep adding new
addresses without resizing. So all SPV wallets are programmed to eventually resize the filter. And when they do that, they blow their
privacy, because now the remote peers have two (or more) filters created by the same node.
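The growth described above follows from the standard Bloom filter approximation p ≈ (1 - e^(-k*n/m))^k, where m is the number of bits, k the number of hash functions and n the number of inserted items. The sketch below holds m and k fixed (hypothetical values for a filter sized for about 100 addresses) and keeps adding addresses.

```python
import math


def estimated_fp_rate(n_bits: int, n_hashes: int, n_items: int) -> float:
    """Standard Bloom filter approximation: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-n_hashes * n_items / n_bits)) ** n_hashes


# Hypothetical filter sized once for about 100 addresses and never resized.
N_BITS, N_HASHES = 2000, 10

rates = []
for n_addresses in (100, 200, 500, 1000, 5000):
    rate = estimated_fp_rate(N_BITS, N_HASHES, n_addresses)
    rates.append(rate)
    print(f"{n_addresses:5d} addresses -> false positive rate ~ {rate:.4f}")
```

The rate climbs monotonically toward 1 as addresses are added, which is what forces the wallet to resize.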
So maybe you could just pre-generate all the addresses the wallet will ever use up front and just pack a filter with all these addresses.
That way you never need to resize the filter. The wallet just persists the filter and always uses the same one.
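If the wallet pre-generates N addresses, it can size the filter once up front. Below is a sketch of BIP37's sizing formulas in Python. The constants (36,000-byte maximum filter size, 50 maximum hash functions) come from BIP37, but the rounding here may differ slightly from Bitcoin Core's implementation, and `achieved_fp_rate` uses the standard approximation rather than anything in the spec.

```python
import math

# Limits defined in BIP37.
MAX_BLOOM_FILTER_SIZE = 36000  # bytes
MAX_HASH_FUNCS = 50
LN2 = math.log(2)


def bip37_filter_params(n_elements: int, fp_rate: float) -> tuple[int, int]:
    """Return (nFilterBytes, nHashFuncs) following BIP37's sizing formulas,
    including its caps on filter size and hash function count."""
    n_bits = min(int(-1.0 / (LN2 * LN2) * n_elements * math.log(fp_rate)),
                 MAX_BLOOM_FILTER_SIZE * 8)
    n_bytes = n_bits // 8
    n_hash_funcs = min(int(n_bytes * 8 / n_elements * LN2), MAX_HASH_FUNCS)
    return n_bytes, n_hash_funcs


def achieved_fp_rate(n_bytes: int, n_hash_funcs: int, n_elements: int) -> float:
    """Approximate false positive rate of the resulting filter."""
    m = n_bytes * 8
    return (1.0 - math.exp(-n_hash_funcs * n_elements / m)) ** n_hash_funcs


for n in (1000, 100000):
    n_bytes, n_hashes = bip37_filter_params(n, 0.0001)
    print(n, n_bytes, n_hashes, achieved_fp_rate(n_bytes, n_hashes, n))
```

Because of the size cap, a very large pre-generated address set hits the 36,000-byte limit and ends up with a much higher false positive rate than requested.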
This could plausibly work if not for inputs. To build a functioning wallet, you also need to know when your outgoing spends confirm.
The way BIP37 handles this is that whenever an output matches the filter, the outpoint (txid, output index) of that output is serialized and added
into the filter (the remote node does this itself when the filter's nFlags is BLOOM_UPDATE_ALL). Then the filter matching algorithm the full nodes run tests the outpoint of each input against the filter.
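The update-on-match behavior can be sketched as follows. This is a simplified illustration, not BIP37's full matching algorithm: a Python `set` stands in for the Bloom filter (so there are no false positives), each output is reduced to a single data element, and the checks BIP37 also performs on the txid and on scriptSig data elements are omitted. `TxOut`, `TxIn`, `Tx` and `match_and_update` are hypothetical names. The outpoint serialization (32-byte txid in internal byte order, then a 4-byte little-endian index) follows Bitcoin's standard outpoint format.

```python
import struct
from dataclasses import dataclass


def serialize_outpoint(txid_hex: str, index: int) -> bytes:
    """Serialize an outpoint: the 32-byte txid in internal byte order
    (the reverse of the usual hex display) followed by the output index
    as a 4-byte little-endian integer."""
    return bytes.fromhex(txid_hex)[::-1] + struct.pack("<I", index)


@dataclass
class TxOut:
    data_element: bytes  # hypothetical: e.g. the pubkey hash pushed by the script


@dataclass
class TxIn:
    prev_txid: str   # txid of the output being spent (display hex)
    prev_index: int  # index of that output


@dataclass
class Tx:
    txid: str
    inputs: list[TxIn]
    outputs: list[TxOut]


def match_and_update(tx: Tx, filt: set) -> bool:
    """Simplified BIP37-style matching with BLOOM_UPDATE_ALL semantics."""
    matched = False
    for index, out in enumerate(tx.outputs):
        if out.data_element in filt:
            matched = True
            # Insert this output's outpoint so a later spend of it also matches.
            filt.add(serialize_outpoint(tx.txid, index))
    for txin in tx.inputs:
        # Inputs match when the outpoint they spend is in the filter.
        if serialize_outpoint(txin.prev_txid, txin.prev_index) in filt:
            matched = True
    return matched


wallet_filter = {b"my-pubkey-hash"}
funding = Tx("11" * 32, [], [TxOut(b"someone-else"), TxOut(b"my-pubkey-hash")])
spend = Tx("22" * 32, [TxIn("11" * 32, 1)], [TxOut(b"merchant")])
print(match_and_update(funding, wallet_filter))  # True: output 1 matches, its outpoint is added
print(match_and_update(spend, wallet_filter))    # True: matches only via the added outpoint
```

Note that every match makes the filter bigger, which is the side effect the next paragraph is about.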
What this means is that even if you do nothing and don't add any more addresses into your filter, the filter's false positive rate will grow
on its own as outpoints from filter matches are serialized and added into the filter. Thus even in the "one big filter" case, you still need
to periodically resize the filter to get the false positive rate back down, and blow your privacy in the process.
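A small simulation shows the filter degrading even though no new addresses are added. Assumptions: SHA-256 replaces MurmurHash3, the filter parameters are hypothetical, the fake outpoints are built from a counter rather than real txids, and the false positive estimate is the standard (fraction of bits set)^k.

```python
import hashlib


class ToyBloom:
    """Toy Bloom filter; SHA-256 stands in for BIP37's MurmurHash3 (assumption)."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = [False] * n_bits

    def insert(self, item: bytes) -> None:
        for i in range(self.n_hashes):
            d = hashlib.sha256(bytes([i]) + item).digest()
            self.bits[int.from_bytes(d[:8], "little") % self.n_bits] = True

    def fp_estimate(self) -> float:
        # Chance a random non-member hits only set bits: (fraction set)^k.
        return (sum(self.bits) / self.n_bits) ** self.n_hashes


filt = ToyBloom(n_bits=4000, n_hashes=10)
for i in range(200):  # the wallet's pre-generated addresses, inserted once
    filt.insert(b"address-%d" % i)

history = [filt.fp_estimate()]
for payment in range(1, 601):
    # Each matched output adds a 36-byte outpoint (fake txid + index 0);
    # no new addresses are ever inserted.
    filt.insert(payment.to_bytes(32, "little") + (0).to_bytes(4, "little"))
    if payment % 200 == 0:
        history.append(filt.fp_estimate())

print([round(p, 4) for p in history])
```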
My recommendation would be to scrap the current BIP37 filter matching algorithm and instead have full nodes test the scripts (or data elements)
of all the outputs as well as the linked scriptPubKey for each input (the scriptPubKey of the output it spends). This would mean that a wallet could create the "one big filter" containing
all of its addresses and still get filter matches on incoming payments as well as spends, and never have to resize or create a
new filter.
EDIT: Actually, I don't think this works easily, because historical blocks don't store the pkScripts of the outputs their inputs spend. Those are stored in the stxo (spent transaction output) index, but loading and parsing the stxos every time the node wants to build a merkle block would put a pretty heavy burden on it.
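For comparison, the proposed matching rule might look like this. It is a hypothetical sketch: a `set` stands in for the Bloom filter, and `prevout_scripts` is an assumed mapping from each spent outpoint to its scriptPubKey, which is exactly the lookup the EDIT identifies as expensive for historical blocks. Because the rule never writes to the filter, the wallet's filter stays the same no matter how many payments it sends or receives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutPoint:
    txid: str
    index: int


@dataclass
class Tx:
    txid: str
    inputs: list[OutPoint]       # outpoints being spent
    output_scripts: list[bytes]  # scriptPubKeys (or data elements) of outputs


def proposed_match(tx: Tx, wallet_filter: set, prevout_scripts: dict) -> bool:
    """Hypothetical rule: test each output's script and, for each input,
    the scriptPubKey of the output it spends. The filter is never modified."""
    if any(script in wallet_filter for script in tx.output_scripts):
        return True  # incoming payment
    # prevout_scripts stands in for a spent-output lookup (the costly step).
    return any(prevout_scripts[op] in wallet_filter for op in tx.inputs)


wallet = {b"script-A", b"script-B"}
prev = {OutPoint("aa" * 32, 0): b"script-A", OutPoint("bb" * 32, 1): b"script-Z"}
spend = Tx("dd" * 32, [OutPoint("aa" * 32, 0)], [b"script-Q"])
print(proposed_match(spend, wallet, prev))  # True: the spent output paid script-A
```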
There's still another possible privacy leak, however: merging inputs. The probability that more than one input would match the filter and all be false positives is
again a product: the false positive rate of the filter raised to the power of the number of matching inputs. Thus if someone merges UTXOs, the nodes they connect to
could say with near certainty that those transactions are part of the wallet. So it seems like we can improve on BIP37, but only to an extent.
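The merged-inputs arithmetic is short. Assuming each input matches independently with false positive rate p, the chance that all k matching inputs are false positives is p^k, so it shrinks geometrically with every input merged. The function name is illustrative.

```python
def all_inputs_false_positive(fp_rate: float, n_matching_inputs: int) -> float:
    """Probability that every one of n matching inputs is a false positive,
    assuming independent matches: fp_rate ** n."""
    return fp_rate ** n_matching_inputs


for k in range(1, 5):
    print(k, all_inputs_false_positive(0.01, k))
```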