Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save chris-belcher/7e92810f07328fdfdef2ce444aad0968 to your computer and use it in GitHub Desktop.
Save chris-belcher/7e92810f07328fdfdef2ce444aad0968 to your computer and use it in GitHub Desktop.
Plan to improve the privacy of JoinMarket's tumbler script

Plan to improve the privacy of JoinMarket's tumbler script

24/02/2019

JoinMarket has a tumbler application which aims to send bitcoins in a way that delinks the origin and destination.

I have some thoughts on how and why to improve the tumbler algorithm.

Feel free to bikeshed some of these parameters (averages, counts, etc), as my important points are about other stuff.

Many of the thoughts in this post came to me while writing the big bitcoin privacy literature review article of winter-2018.

Following review from other people interested in JoinMarket, I'll try to implement these ideas soonish.

Wait longer between coinjoins and switch to uniform distribution

Waiting longer is free in money but costly in time. It increases the anonymity set by forcing an adversary to consider the longer time period.

The maximum wait time should be (finger in the air time) 2 hours, which is an average wait time of 1 hour.

Right now the tumbler waits a number of minutes drawn from an exponential distribution with an average of 30 minutes.

But the exponential distribution clumps up the txes towards the beginning of the time interval. In hindsight the exponential distribution was just cargo-culting; I copied the distribution of bitcoin block intervals.

Generate amount fractions from the uniform distribution

Right now amount fractions are generated from a power law with a default exponent of 100(!) That exponent makes no sense. Using a power law doesn't have a justification either.

The power law distribution is essentially cargo culting from me back in 2015, it came from a vague idea that everything in money and economics is a power law.

Also the random variable is a fraction so I'm not sure power law makes sense there.

Uniformly chopping up the interval [0, 1] is probably best. For example, to split up the interval [0, 1] into 3 chunks, generate 2 random numbers X and Y in [0, 1] then the three chunks are (X, Y-X, 1-Y) i.e. the random numbers are knives that tell you where to cut the real line.

Increase the number of coinjoin counterparties

This modification increases privacy by involving more entities. Consider the case where a taker creates 10 coinjoins each with 2 other counterparties, and compare that to creating 3 coinjoins each with 10 counterparties. The former case can have the same makers used many times for coinjoins, not so in the latter case which has more maker entities involved. So it's reasonable to say it has more privacy. Therefore leaning towards higher maker counts instead of more transactions seems that it would improve privacy.

This modification also makes coinjoins harder to untangle into their constituent makers and taker inputs. That analysis depends on solving a subset-sum problem using the amounts of the inputs and outputs. It can lead to clustering together inputs which are owned by individual coinjoin peers and linking them with their change addresses.

The paper (Möser, Malte and Rainer Böhme. “Join Me on a Market for Anonymity.” (2016).) studying JoinMarket transactions tried to do this subset-sum analysis. It studied the coinjoins in the years 2015-16 when JoinMarket mostly created 3-party coinjoins, yet it was still unable to fully solve for the makers and taker's inputs in about a third of all coinjoins. The Appendix shows an email exchange between me and an author of that paper who confirms that increasing the maker count would improve the privacy situation with subset-sum solving. Someone on the bitcointalk forum posted a script which also attempts to unmix inputs into makers and taker. Their script also from 2016 succeeds 24% of the time.

The sendpayment script should also have its default number of counterparties increased to match what tumbler does.

If N counterparties cant be found then search for N-1 counterparties

One thing stopping the counterparty-count parameter from being increased is that sometimes there may not be enough counterparties in the orderbook. If so then the tumbler script will stall.

Instead we should use an algorithm that searches for N counterparties and if it can't find them then search for N-1, recursively, down to a minimum counterparty count (probably 3) below which the function will return fail.

I think the default maximum number of counterparties to search for should be 15 (but as before, feel free to bikeshed). If the tumbler cant find that many counterparties and settles for 12 or 10 instead then that's fine.

Reduce number of coinjoins

The previous modication of increasing the counterparty count has the effect of using more block space and hence costing more in miner fees.

We can partially counteract that by reducing the total number of transactions. And also as previously examined: more counterparties beats more transactions.

I think the number of coinjoins per mixdepth should be 2 or 3.

Add a routine at the start of the tumbler schedule where all mixdepths are fully-spent with a no-change-coinjoin

Equal-output coinjoins reveal their change addresses. A way to avoid this is to create no-change-coinjoins also called sweep-coinjoins. A naïve question might be why doesn't the tumbler script only create such no-change-coinjoins? The answer is then it would be easy to follow the trail of coins, because makers virtually-never create no-change-coinjoins. So to unmix such a tumbler script only requires following the coinjoin outputs which are later used to create no-change-coinjoin transactions. What tumbler should aim to do is create coinjoins that are in the same anonymity set as yield-generator coinjoins (and to a lesser extent send-payment coinjoins), and that requires that they create change addresses.

However without loss of privacy we can create no-change-coinjoins using coins that are sent into the wallet from outside JoinMarket. Consider when a user buys coins from an exchange where they give up all their personal dox because of AML/KYC, and they send the coins straight into a JoinMarket wallet. The exchange will know that the spending transactions are JoinMarket coinjoins, and if there was a change address then the exchange could still link it to the user. So no-change-coinjoins are appropriate here.

Therefore the proposal is that stage 1 of the tumbler script will sweep every mixdepth into one no-change-coinjoin, and stage 2 will be the same as today with with-change-coinjoins of randomly-generated amounts being sent to the next mixdepth and ultimately to the destination.

The tumbler already makes sweep coinjoins for the last coinjoin that deals with a given mixdepth, this proposal then is not too different from the status quo. It can also be seen as a way to increase the number of mixdepths used but in a relatively cheap way with a single coinjoins that uses only one coinjoin's worth of block space.

The no-change-coinjoin proposal will also help with amount-based tracking. Imagine if the user bought 1.78213974 BTC from an exchange and they create a no-change-coinjoin with 15 other counterparties, that would result in 16 UTXOs having a face value of (slightly less than) 1.78213974 BTC. An adversary who uses tracking based on amounts now has to contend with 15 other UTXOs worth the same amount.

The wait time between all the no-change-coinjoins should be made quite long so they're not linked together (i suggest max 4 hours, with an average of 2 hours), and the wait time between with change-coinjoins can be shorter.

Sometimes create coinjoins with round number amounts

When people use the sendpayment script they choose the coinjoin amount themselves. Humans choosing payment amounts gives rise to round numbers, either in bitcoin (eg. 1 BTC, 0.1 BTC, 0.01 BTC) or in another currency (2.24159873 BTC which when converted to USD may be close to $100). Tumbler should replicate this behaviour.

It should be done in two ways: 1) Occasionally chop off some decimal places (ie. uses the round() function) so that there are only 1 to 3 significant figures 2) Query bitcoin exchange rates and choose a round number amount of some fiat currency which is near the randomly-generated number being sent.

There is a paper that analyses the blockchain with round numbers to try to detect the geographical economic location of bitcoin users ("Gervais, Arthur & Ritzdorf, Hubert & Lucic, Mario & Lenders, Vincent & Capkun, Srdjan. (2016). Quantifying Location Privacy Leakage from Transaction Prices. 9879. 382-405. 10.1007/978-3-319-45741-3_20. "). The paper has some measurements which we could copy for choosing which fiat currencies to use in which proportion, but its probably good enough to guess the weights ourselves eg. USD 70%, EUR 20%, JPY 3% and so on.

Appendix

>
> Hello,
>
> I'm reading your 2016 paper Join Me On A Market For Anonymity.
> This is written:
>
> "To tell apart takers’ inputs from makers’ inputs, we use subset
> matching (cf. [5, 26]). We could unanimously identify the taker’s subset
> of inputs and outputs for 5,788 transactions (67 %). In these cases, the
> taker’s subset is the combination that loses value. The makers’ subsets
> usually increase as they collect a maker fee from the taker."
>
> I'm very interested in the 67% figure, why couldn't you separate the
> taker's and maker's UTXOs in 100% of cases? What happened in the 33% of
> transactions where it wasn't possible to find the taker's UTXOs?
>
> Thanks in advance,
>
> Regards
> Chris Belcher

Hi Chris,
if I remember correctly, there were two cases where we couldn’t tell them apart:

a) there were multiple possibilities to group inputs/outputs into subsets
b) there were too many inputs and outputs, making subset matching computationally infeasible (at least with my algorithm)

Best,
Malte
@chris-belcher
Copy link
Author

Not convinced a power law isn't a good idea ... I'll have a think about it.

One thing worth noting is that in nature we usually only see power laws that span many different orders of magnitude. While in tumbler the amount_fraction has an upper bound of 1. And it doesn't make sense to create many coinjoins with amount_fractions such as 0.00001 (or 10^-N where N is bigger than about 3) because thats not very efficient with block space.

Just to flesh out the last point more. Imagine a tumbler user was tumbling 20btc, it might make sense that they create a few thousand coinjoin transactions with output amounts around 0.01btc, in order to better blend in with other more numerious (and presumably poorer) joinmarket users who transact 0.01btc. But those ~thousand transactions would be very costly in block space. Joinmarket works by having the tumbler user pay makers for their anonymity set, as that's a more efficient way to build up an anonymity set rather than trying to blend in with other takers. Therefore its not worth using amount_fractions with values such as 0.000001 and so power laws aren't a very good idea.

Things to remember: IRC throttling is a fairly meaningful limitation in our infrastructure (and we have to be conservative, we can't fine tune as we're using various), so as you know, numbers above 10 get unrealistic quickly (we have several timeouts that may need fiddling around with, which isn't realistic).

Thanks for pointing this out. I may spend some time playing around with it. In theory it should be acceptable for the joinmarket peers to wait a little bit longer to smooth out bandwidth spikes. But if that's not possible then a value of around 10 should be fine too.

I kinda like this idea, though I'm not sure if it helps very much.

Round numbers could sometimes be used to distinguish between coinjoins created by sendpayment and by tumbler, because it's likely that sendpayment coinjoins use round number amounts more often (for example if the sendpayment user is buying something with bitcoin) while tumbler's are completely randomly generated.

There is also that paper where round numbers are used to estimate the geographical distribution of bitcoin users. Also using round numbers to deduce change outputs is fairly well established.

@chris-belcher
Copy link
Author

One thing I have maybe changed my mind about is this:

But the exponential distribution clumps up the txes towards the beginning of the time interval. In hindsight the exponential distribution was just cargo-culting; I copied the distribution of bitcoin block intervals.

Exponential distribution arises from poisson processes. In theory uniform leaks information that possion doesn't.

Regardless of which distribution we decide to use, the average wait time should definitely be longer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment