Peer review of "Quantitative Analysis of the Full Bitcoin Transaction Graph"

This is a review of "Quantitative Analysis of the Full Bitcoin Transaction Graph" by Dorit Ron and Adi Shamir.

There are some incorrect details and analyses that warrant attention.

Oct. 31 UPDATE

The authors have introduced several revisions to their paper, available at the same URL as before.

The criticism below may be outdated in part or in full.

tl;dr (the short [edit: old] version)

The Ron/Shamir paper contains provably-false key assumptions. Further, their data source (website scraping) is a secondary data source known to have served invalid data in the past.

We do not claim this wholly invalidates their statistical results, but the web wallet and cold storage examples below seem likely to change those results in a statistically significant way.

Appearance of data and conclusions obtained by a web crawl rather than close analysis of the actual bitcoin system.

Quote #1: "On May 13th 2012 we downloaded the full public record of this system, which consisted of about 180,000 HTML files."

Quote #2: "Nodes broadcast transactions to this network, which records them in publicly available web pages, called block chains, after validating them with a proof-of-work system."

Quote #3: "The entire activity in the Bitcoin network is publicly available through the internet and is recorded in the form of a block chain, starting at block 0 (created back on the 3rd of January 2009). Each block reports on as little as a single transaction to as much as over a thousand transactions, and provides hyperlinks to other blocks and to other activities of each address."

While the authors do appear to be aware that bitcoin is based on public/private key cryptographic signatures, these quotes do not seem to indicate an understanding that the block chain, singular, is a globally shared binary structure based on distributed consensus. Blocks are not web pages containing hyperlinks, even though http://blockexplorer.com/ and http://blockchain.info/ present them as such for display purposes.

Further, while it may not be material for the results of this particular study, web block explorers are not authoritative sources for bitcoin data and have sometimes been known to display wildly false information.

Fundamental assumptions of transaction address ownership appear flawed

Quote #1: "A very important feature of the Bitcoin network is that a transaction involving multiple sending addresses can only be carried out by the common owner of all those addresses, as it is demanded by the Bitcoin system that “Whoever sent this transaction owns all of these addresses”. This legal requirement is also technically ensured by the fact that each received amount must have a cryptographic digital signature that unlocks it from the prior transaction."

This unconditional equating of "multiple sending addresses" with a "common owner" is false. We can demonstrate this from a theoretical perspective, and also with practical examples from today's block chain.

Each bitcoin transaction contains a number of inputs, and a number of outputs. Ron and Shamir assume that "multiple sending addresses can only be carried out by the common owner of all those addresses", when in fact bitcoin is explicitly designed to permit multiple owners, individually and independently adding signatures to a single transaction.

Read the source code, for the canonical signature checking details: https://github.com/bitcoin/bitcoin/blob/master/src/script.cpp#L1064

This wiki link describes signature checking detail: https://en.bitcoin.it/wiki/Contracts#Theory

This forum post provides a concrete example of multiple owners coordinating to create a single transaction containing "multiple sending addresses": https://bitcointalk.org/index.php?topic=112007.0
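To make the objection concrete, here is a minimal Python sketch of the kind of address clustering that the "common owner" assumption implies. It is not the authors' code: the transaction format, the address labels, and the function name `cluster_by_shared_inputs` are all assumptions made for illustration. It shows that a single jointly signed transaction is enough to merge two independent owners into one apparent "user".

```python
from collections import defaultdict


def cluster_by_shared_inputs(transactions):
    """Group addresses under the heuristic 'all inputs of a transaction share one owner'.

    `transactions` is a hypothetical format: a list of dicts with an "inputs" list
    of address labels. Returns a sorted list of sorted address clusters.
    """
    parent = {}

    def find(addr):
        # Union-find lookup with path halving.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # register every input address, even if it stands alone
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical data: Alice and Bob are distinct owners who each sign their own
# input of one joint transaction.
transactions = [
    {"inputs": ["alice_1", "alice_2"]},  # ordinary single-owner spend
    {"inputs": ["alice_2", "bob_1"]},    # jointly signed by two owners
]
clusters = cluster_by_shared_inputs(transactions)
print(clusters)
```

Under the heuristic, all three addresses end up in one cluster, although by construction they belong to two different people.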

It is acknowledged that these multi-owner transactions are rare at this time. However, there are two existing use cases that are statistically very significant: shared coin pools (web wallets) and change transactions.

Web wallets provide an easy counter-example of the "multiple sending addresses == common owner" assumption. Websites dubbed "web wallets" provide a centralized, HTTP-based web interface to the otherwise decentralized P2P bitcoin network. Web wallets typically pool the bitcoins from all their web users into two large pools, a "hot wallet" and a "cold wallet."

Transactions sent to web wallet websites, and sent from web wallet websites, will clearly appear as clusters of bitcoins within the blockchain dataset.

Simplified example: Alice, Bob and Carla each deposit 10 BTC in Wallet.Example.Com. Wallet.Example.Com now controls a single shared pool of 30 BTC. Anyone who makes a withdrawal from Wallet.Example.Com, including new users David, Rick and James, will receive coins from that 30 BTC pool.
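The pooling effect in this example can be sketched in a few lines of Python. This is a deliberately simplified model, not how any particular web wallet is implemented: the pool is a list of (depositor, amount) pairs, withdrawals draw on the oldest deposits first, and the function name `withdraw` and the 15 BTC withdrawal are assumptions for illustration.

```python
def withdraw(pool, amount):
    """Pay `amount` out of a shared pool, oldest deposits first.

    `pool` is a list of (depositor, value) pairs. Returns the list of depositors
    whose coins funded the withdrawal and the pool that remains afterwards.
    """
    used, remaining, needed = [], [], amount
    for depositor, value in pool:
        if needed > 0:
            take = min(value, needed)
            needed -= take
            used.append(depositor)
            if value > take:
                # The unspent part stays in the wallet's pool.
                remaining.append((depositor, value - take))
        else:
            remaining.append((depositor, value))
    if needed > 0:
        raise ValueError("pool too small for this withdrawal")
    return used, remaining


# Alice, Bob and Carla each deposit 10 BTC; David then withdraws 15 BTC.
pool = [("Alice", 10), ("Bob", 10), ("Carla", 10)]
sources, pool_after = withdraw(pool, 15)
print(sources, pool_after)
```

David's withdrawal is funded by coins that Alice and Bob deposited, so a graph built from the block chain alone links all three parties through the wallet's addresses.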

Another common practice seen in the field (and therefore, in any blockchain data analysis) is "wallet cold storage." Bitcoin exchanges, merchants and users often keep the majority of their bitcoins offline, a technique called "cold wallet" or "cold storage." A "hot wallet" connected to the Internet is then used to store the remaining bitcoins. If a thief steals the hot wallet, the damage is limited.

This is recommended -- and for large sites, necessary -- security practice. The cold wallet of a large exchange will indeed appear as coins that have not been spent in a long time.

Any bitcoins that are permanently lost, due to wallet deletion, also appear within the data as old, unmoved coins. One cannot distinguish between unspent "coins under the mattress" and lost/destroyed coins.

Finally, this analysis does not appear to account for "change transactions." When someone sends bitcoins, the system will potentially create two outputs: (1) the bitcoins sent to the recipient, and (2) bitcoins sent back to the sender. This preserves the rule that 100% of a bitcoin transaction's inputs are spent. You can never spend half of a 100 BTC transaction output: you must spend the full 100 BTC, even if that means sending some bitcoins back to yourself.
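As a worked illustration of the change rule, the following Python sketch computes the outputs of a payment. It is an assumption-laden toy: amounts are whole BTC, the transaction fee defaults to zero, and the name `build_payment` and the output labels are hypothetical.

```python
def build_payment(input_values, amount, fee=0):
    """Return the outputs of a transaction that spends every input in full.

    Whatever is not paid to the recipient (or left as a fee) goes back to the
    sender as a change output.
    """
    total = sum(input_values)
    change = total - amount - fee
    if change < 0:
        raise ValueError("inputs do not cover amount plus fee")
    outputs = [("recipient", amount)]
    if change > 0:
        outputs.append(("change", change))
    return outputs


# Paying 30 BTC out of a single 100 BTC output.
outputs = build_payment([100], 30)
print(outputs)
```

The 100 BTC input is consumed entirely; the sender still controls 70 BTC afterwards, but at whatever address received the change.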

The Ron/Shamir paper does mention Deepbit and MtGox as large "users", but does not indicate that these are essentially multi-owner or multi-user sites. Building graphs that ignore the multi-user aspect of MtGox will produce conclusions different from those that take it into account.

Additional resources

davout emailed the authors, and got a response: https://bitcointalk.org/index.php?topic=118797.msg1280496#msg1280496

Meni Rosenfeld also emailed the authors: https://bitcointalk.org/index.php?topic=118797.msg1281470#msg1281470

The authors are also missing the concept of change in transactions, which is the most probable explanation for the "Long Chains" and other described phenomena.

On top of that they talk about Bitcoin in terms of Bitcoins, while for the period they studied the value changed dramatically. Their example of the 90k transaction in Nov 2010 with no descendants is extremely misleading when that was only about a month after MtGox trading picked up to the point where coins really had any nominal value at all. For all we know that was just someone experimenting with some transaction making code, and accidentally losing their wallet.

The key problem IMHO is the 78% observation that is grossly misinterpreted and now broadcast (in this misinterpretation) by the mass media: When a bitcoin address is used for spending, ALL THE MONEY in that address is always spent by design (see https://en.bitcoin.it/wiki/Transactions), and it is common practice to send the change to a brand new address, which, by being brand new, cannot be associated with any other addresses through the methods used in this paper. So if everybody kept to the standard practices, 100% of bitcoins would be in addresses that have never been used for spending!

This means that the 78% the article talks about are by no means out of circulation, stored under some mattress or anything like that. The astonishing thing is that 22% of bitcoins are in RE-USED addresses!

ALL THE MONEY in that address is always spent by design

Not all the Bitcoins, just whole numbers of inputs. If I send two lots of 10 BTC to the same address and then try and spend 5 BTC, there will be 10 BTC left, and 5 BTC in a new change address.
Recent example of inputs being selectively used: http://blockchain.info/address/1Ccv5aRo37sWQZ6hxXTebGtm7oENmTGKfV
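The 10 + 10 BTC example can be traced with a small Python sketch. The data model is an assumption for illustration only: unspent outputs are (address, value) pairs, outputs are selected in list order, fees are ignored, and the names `spend`, `balances`, and the address labels are hypothetical.

```python
def spend(utxos, from_address, amount, to_address, change_address):
    """Spend whole outputs held at `from_address` until `amount` is covered."""
    kept, total = [], 0
    for address, value in utxos:
        if address == from_address and total < amount:
            total += value  # this output is consumed in full
        else:
            kept.append((address, value))
    if total < amount:
        raise ValueError("insufficient funds at this address")
    new_outputs = [(to_address, amount)]
    if total > amount:
        new_outputs.append((change_address, total - amount))
    return kept + new_outputs


def balances(utxos):
    """Sum the unspent outputs per address."""
    result = {}
    for address, value in utxos:
        result[address] = result.get(address, 0) + value
    return result


# Two lots of 10 BTC were sent to the same address; the owner then spends 5 BTC.
utxos = [("addr_A", 10), ("addr_A", 10)]
after = spend(utxos, "addr_A", 5, "addr_payee", "addr_change")
print(balances(after))
```

Only one of the two 10 BTC outputs is consumed, so 10 BTC remain at the original address and 5 BTC land at the new change address.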

That would explain the high number of re-used addresses, yet still leaves valid the argument that non-spent addresses do not imply savings accounts.

xpost from reddit: I think jgarzik had a reading comprehension problem here. It was clear to me after reading the paper that "users" could be a business or an individual - it referred to a single controlling entity, not a natural person. They even went and named MtGox and Deepbit as examples of "users" and both are clearly not a single person.
The most damning statistic in that paper (and the one not being talked about at all) is that there are only 75 active users/businesses of the blockchain with any kind of volume - and that's a liberal estimate!

Web wallets are clearly multi-owner.

Updating gist to reflect that distinction.

Updated gist to reflect authors' revisions.
