This is a review of "Quantitative Analysis of the Full Bitcoin Transaction Graph" by Dorit Ron and Adi Shamir.
There are some incorrect details and analyses that warrant attention.
Oct. 31 UPDATE
The authors have introduced several revisions to their paper, available at the same URL as before.
The criticism below may be outdated in part or in full.
tl;dr (the short [edit: old] version)
The Ron/Shamir paper contains provably false key assumptions. Further, their data source (website scraping) is a secondary data source known to have served invalid data in the past.
We do not claim this wholly invalidates their statistical results, but given the web wallet and cold storage examples, it seems likely to introduce statistically significant changes in the results.
Appearance of data and conclusions obtained by a web crawl rather than close analysis of the actual bitcoin system
Quote #1: "On May 13th 2012 we downloaded the full public record of this system, which consisted of about 180,000 HTML files."
Quote #2: "Nodes broadcast transactions to this network, which records them in publicly available web pages, called block chains, after validating them with a proof-of-work system."
Quote #3: "The entire activity in the Bitcoin network is publicly available through the internet and is recorded in the form of a block chain, starting at block 0 (created back on the 3rd of January 2009). Each block reports on as little as a single transaction to as much as over a thousand transactions, and provides hyperlinks to other blocks and to other activities of each address."
While the authors do appear to be aware that bitcoin is based on public/private key cryptographic signatures, these quotes do not seem to indicate an understanding that the block chain, singular, is a globally shared binary structure, based on distributed consensus. Blocks are not web pages containing hyperlinks, even though http://blockexplorer.com/ and http://blockchain.info/ present them as such, for display purposes.
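To make the "binary structure" point concrete, here is a minimal sketch of how one block refers to the previous one. It assumes the standard 80-byte block header layout (version, previous block hash, merkle root, time, bits, nonce) and double SHA-256 hashing. The two headers are toy data invented for illustration, not real chain data, and the helper names are hypothetical.

```python
import hashlib
import struct

# 80-byte header, little-endian:
# version (4) | previous block hash (32) | merkle root (32) | time (4) | bits (4) | nonce (4)
HEADER_FORMAT = "<I32s32sIII"


def pack_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Serialize header fields into the 80-byte binary form."""
    return struct.pack(HEADER_FORMAT, version, prev_hash, merkle_root, timestamp, bits, nonce)


def block_hash(header):
    """Double SHA-256 of the serialized header; the next block stores this value."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def parse_header(header):
    """Decode an 80-byte header back into named fields."""
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(HEADER_FORMAT, header)
    return {
        "version": version,
        "prev_hash": prev_hash,
        "merkle_root": merkle_root,
        "time": timestamp,
        "bits": bits,
        "nonce": nonce,
    }


# Toy headers (assumed values, no proof-of-work performed).
first = pack_header(1, bytes(32), hashlib.sha256(b"toy tx set 1").digest(), 1231006505, 0x1D00FFFF, 0)
second = pack_header(1, block_hash(first), hashlib.sha256(b"toy tx set 2").digest(), 1231007105, 0x1D00FFFF, 0)

parsed = parse_header(second)
# The "link" between blocks is a 32-byte hash inside the header, not a hyperlink.
links_to_first = parsed["prev_hash"] == block_hash(first)
```

A block explorer's "previous block" hyperlink is a rendering of that 32-byte field, which is why the HTML pages are a secondary view of the data rather than the record itself.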
Further, while it may not be material for the results of this particular study, web block explorers are not authoritative sources for bitcoin data and have sometimes been known to display wildly false information.
Fundamental assumptions of transaction address ownership appear flawed
Quote #1: "A very important feature of the Bitcoin network is that a transaction involving multiple sending addresses can only be carried out by the common owner of all those addresses, as it is demanded by the Bitcoin system that “Whoever sent this transaction owns all of these addresses”. This legal requirement is also technically ensured by the fact that each received amount must have a cryptographic digital signature that unlocks it from the prior transaction."
This unconditional equating of "multiple sending addresses" with a "common owner" is false. We may demonstrate this from a theoretical perspective, and also with practical examples from today's block chain.
Each bitcoin transaction contains a number of inputs, and a number of outputs. Ron and Shamir assume that "multiple sending addresses can only be carried out by the common owner of all those addresses", when in fact bitcoin is explicitly designed to permit multiple owners, individually and independently adding signatures to a single transaction.
Read the source code for the canonical signature checking details: https://github.com/bitcoin/bitcoin/blob/master/src/script.cpp#L1064
This wiki link describes signature checking in detail: https://en.bitcoin.it/wiki/Contracts#Theory
This forum post provides a concrete example of multiple owners coordinating to create a single transaction containing "multiple sending addresses": https://bitcointalk.org/index.php?topic=112007.0
It is acknowledged that these multi-owner transactions are rare at this time. However, there are two existing use cases that are statistically very significant: shared coin pools (web wallets) and change transactions.
Web wallets provide an easy counter-example of the "multiple sending addresses == common owner" assumption. Websites dubbed "web wallets" provide a centralized, HTTP-based web interface to the otherwise decentralized P2P bitcoin network. Web wallets typically pool the bitcoins from all their web users into two large pools, a "hot wallet" and a "cold wallet."
Transactions sent to web wallet websites, and sent from web wallet websites, will clearly appear as clusters of bitcoins within the blockchain dataset.
Simplified example: Alice, Bob and Carla each deposit 10 BTC in Wallet.Example.Com. Wallet.Example.Com now controls a single shared pool of 30 BTC. Anyone who makes a withdrawal from Wallet.Example.Com, including new users David, Rick and James, will receive coins from that 30 BTC pool.
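The effect of this example on an ownership analysis can be sketched in a few lines. The code below is an illustration only and not necessarily how the authors implemented their analysis: it applies the "multiple sending addresses == common owner" rule with a simple union-find structure. The address names and the two withdrawal transactions are hypothetical.

```python
class UnionFind:
    """Minimal union-find over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_common_inputs(transactions):
    """Merge all input addresses of each transaction into one presumed 'owner'."""
    uf = UnionFind()
    for input_addresses in transactions:
        first = input_addresses[0]
        uf.find(first)  # register single-input transactions too
        for address in input_addresses[1:]:
            uf.union(first, address)
    clusters = {}
    for address in list(uf.parent):
        clusters.setdefault(uf.find(address), set()).add(address)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical deposit addresses at Wallet.Example.Com, one per depositor.
depositor_of = {"dep_alice": "Alice", "dep_bob": "Bob", "dep_carla": "Carla"}

# Hypothetical withdrawals: the site pays David and Rick out of the shared pool.
withdrawals = [
    ["dep_alice", "dep_bob"],   # inputs used to pay David
    ["dep_bob", "dep_carla"],   # inputs used to pay Rick
]

clusters = cluster_by_common_inputs(withdrawals)
people_behind_cluster = {depositor_of[a] for a in clusters[0]}
```

Here the heuristic reports a single "owner" holding all three addresses, while the coins in that cluster were deposited by three different people.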
Another common practice seen in the field (and therefore, in any blockchain data analysis) is "wallet cold storage." Bitcoin exchanges, merchants and users often keep the majority of their bitcoins offline, a technique called "cold wallet" or "cold storage." A "hot wallet" connected to the Internet is then used to store the remaining bitcoins. If a thief steals the hot wallet, the damage is limited.
This is a recommended -- and for large sites, necessary -- security practice. The cold wallet of a large exchange will indeed appear as coins that have not been spent in a long time.
Any bitcoins that are permanently lost, due to wallet deletion, also appear within the data as old, unmoved coins. One cannot distinguish between unspent "coins under the mattress" and lost/destroyed coins.
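A small sketch of why these cases look identical in the data. The record type, the block heights and the "true_status" labels are all assumptions made up for illustration. The point is that the labels exist only off-chain, so a classifier restricted to block chain fields cannot use them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnspentOutput:
    address: str
    value_btc: int
    last_moved_height: int
    true_status: str  # NOT recorded in the block chain; shown here only for illustration


def is_dormant(utxo, current_height, threshold_blocks):
    """Classify using on-chain information only: how long ago the coins last moved."""
    return current_height - utxo.last_moved_height >= threshold_blocks


# Hypothetical unspent outputs, all last moved at about the same time.
utxos = [
    UnspentOutput("addr_exchange_cold", 50000, 100000, "exchange cold storage"),
    UnspentOutput("addr_saver", 500, 100010, "coins under the mattress"),
    UnspentOutput("addr_deleted_wallet", 500, 100020, "lost, wallet deleted"),
]

dormant_flags = [is_dormant(u, current_height=180000, threshold_blocks=50000) for u in utxos]
```

All three outputs are flagged as dormant, and nothing in the on-chain fields separates actively secured funds from savings or from destroyed coins.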
Finally, this analysis does not appear to include "change transactions." When someone sends bitcoin, the system will potentially create two outputs: (1) the bitcoins sent to the recipient, (2) bitcoins sent back to yourself. This preserves the rule that 100% of a bitcoin transaction's inputs are spent. You can never spend half of a 100BTC transaction output: you must spend 100BTC... even if that means sending some bitcoins back to yourself.
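The arithmetic of a change transaction, as a minimal sketch. The function name, the output labels and the optional fee parameter are assumptions for illustration. The example spends a single 100 BTC output to pay 30 BTC.

```python
from decimal import Decimal


def build_outputs(input_values, payment, fee=Decimal("0")):
    """Split fully spent inputs into a payment output and, if needed, a change output."""
    total_in = sum(input_values, Decimal("0"))
    change = total_in - payment - fee
    if change < 0:
        raise ValueError("inputs do not cover payment plus fee")
    outputs = [("recipient", payment)]
    if change > 0:
        # The remainder goes to an address controlled by the sender.
        outputs.append(("change_to_sender", change))
    return outputs


outputs = build_outputs([Decimal("100")], Decimal("30"))
total_out = sum(value for _, value in outputs)
```

The transaction moves 100 BTC on paper, but only 30 BTC changes hands. An analysis that treats both outputs as payments to other parties would overstate the amount transferred and miss that the change address belongs to the sender.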
The Ron/Shamir paper does mention Deepbit and MtGox as large "users", but does not indicate that these are essentially multi-owner or multi-user sites. Building graphs that ignore the multi-user aspect of MtGox will produce conclusions different from those that take it into account.
davout emailed the authors, and got a response: https://bitcointalk.org/index.php?topic=118797.msg1280496#msg1280496
Meni Rosenfeld also emailed the authors: https://bitcointalk.org/index.php?topic=118797.msg1281470#msg1281470