@gavinandresen
Unifying validation and storage costs

Single cost metric for transactions and blocks

Building on the work of Jonas Nick, Greg Sanders, and Mark Friedenbach (for background, see the video: https://youtu.be/37LiYOOevqs?t=2h16m7s and slides: https://scalingbitcoin.org/hongkong2015/presentations/DAY2/3_tweaking_the_chain_3_nick.pdf)

The goal: create a consensus rule or rules that reflect all we have learned since 2010, replacing the hard-coded limits Satoshi hurriedly put into place back then.

Desirable properties for the rule or rules:

  • Simple (because simple designs are less likely to lead to bugs or security flaws)
  • Dynamic, adjusting as the ecosystem evolves (not set arbitrarily by developers or a committee of experts)
  • Safe, not subject to attack

Nick et al. suggest a "full cost metric" that takes into account transaction size (bandwidth cost) and signature hashing and verification (CPU cost), based on measurements of actual hardware/bandwidth costs, with an upper threshold set at the average cost metric of some number of recent blocks (TODO: check with Jonas; is this actually what was being proposed?)

They suggest that miners be allowed to build blocks up to half that threshold, and larger blocks only if those blocks decrease the size of the UTXO set: the more a block decreases the UTXO set, the larger it is allowed to be, up to the full threshold. (TODO: check... this can't be right, because it would mean a constant downward spiral in allowed validation cost.)

This is an interesting proposal, but it ignores two critical, related issues: how do miners decide which transactions to put into their blocks, and how do wallets decide how much to pay in fees? The non-linear behavior of the allowed block size at the half-threshold level makes selecting which transactions to include in a block a hard problem, and makes the wallet's task of choosing an appropriate transaction fee even harder.

I think it makes sense to start from basic principles, and realize that there are three relevant costs:

  • bandwidth cost (related to transaction size, in bytes)
  • cpu cost (dominated by signature hash creation and ECDSA validation; see Nick et al.'s presentation)
  • storage cost (dominated by UTXO-in-fast-memory storage costs)

The actual magnitude of those costs does not matter if the consensus rule is set based on the average (or median) of recent blocks; the consensus rule would not be "block must cost less than 0.00011 cents to validate and store" but might be "block must cost less to validate and store than twice the average cost of blocks in the last difficulty adjustment period." As long as the same cost metric is used to measure historical blocks, new blocks being tested for consensus validity, and individual transactions, everything works out. Wallets will use the cost metric to decide what transaction fee is necessary, and miners will sort transactions by greatest fee per unit of cost to maximize the fees they collect.
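
For concreteness, here is a minimal Python sketch of that relative rule; the transaction and block representations are hypothetical, and the factor of two and the averaging window are just the example values above:

```python
# Minimal sketch of the relative consensus rule; tx.cost is the unified
# cost metric defined below. The factor of 2 and the one-difficulty-period
# window are the example values from the text, not fixed proposals.

def block_cost(block):
    # 'block' is assumed to be an iterable of transaction objects.
    return sum(tx.cost for tx in block)

def block_is_cost_valid(new_block, last_period_blocks):
    """Block must cost less to validate and store than twice the average
    cost of blocks in the last difficulty adjustment period."""
    avg = sum(block_cost(b) for b in last_period_blocks) / len(last_period_blocks)
    return block_cost(new_block) < 2 * avg
```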

Using real-world money as "cost" allows CPU/memory/bandwidth to be combined into a single unified cost metric (whose units would be "reference hardware 2016 dollars").

Calculating a baseline bandwidth cost is straightforward. If the average worldwide broadband internet user pays the equivalent of $50 per month for 300GB of data, and we assume each node will receive and then transmit each transaction once, on average (so each transaction byte consumes two bytes of that allowance), the cost per byte of transaction is $50/150,000,000,000. Alternatively, bandwidth costs for a cloud service provider like Amazon Web Services could be used to determine the "cost metric ruler."
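
Spelling out that arithmetic (all inputs are the assumed example numbers above):

```python
# Worked version of the bandwidth arithmetic; assumed numbers, not data.
MONTHLY_COST_USD = 50.0
MONTHLY_DATA_BYTES = 300e9  # 300GB per month
# Each transaction byte is received once and transmitted once, so it
# consumes two bytes of the monthly allowance:
usd_per_tx_byte = MONTHLY_COST_USD / (MONTHLY_DATA_BYTES / 2)
# == $50 / 150,000,000,000, about 3.3e-10 dollars per transaction byte
```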

Calculating a baseline CPU cost is a little less straightforward. CPU costs for a cloud service provider could be used after running benchmarks to see how many ECDSA validations or signature hashes can be performed for some amount of money. Alternatively, benchmarks could be run on a typical personal computer, and a CPU cost calculated based on how many validations could be done over the typical useful lifetime of a personal computer (e.g. 4 or 5 years).
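
A sketch of that benchmark-to-dollars conversion; the hourly price and verification rate below are placeholders, not measured values:

```python
# Hypothetical inputs throughout; only the method is the point.
USD_PER_CPU_HOUR = 0.05          # assumed cloud price per core-hour
ECDSA_VERIFIES_PER_SEC = 2000    # assumed single-core benchmark result
usd_per_ecdsa_verify = USD_PER_CPU_HOUR / (ECDSA_VERIFIES_PER_SEC * 3600)
# about 7e-9 dollars per signature validation under these assumptions
```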

Calculating a baseline UTXO storage cost is the most difficult, but it is also possible. We can measure the size of the average UTXO, measure how long the median transaction output remains in the UTXO set (assigning a time of infinity to any currently-unspent output), and use the same technique to estimate how much it costs to rent or own that much high-speed memory for that amount of time. TODO: a more accurate accounting would use a discount rate, based on the drop in memory cost over time, for transactions that NEVER leave the UTXO set.
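
One way to turn those measurements into a per-byte storage constant (every number below is an assumed placeholder):

```python
# Assumed placeholders; the method (memory price x median lifetime) is
# what the paragraph above describes.
USD_PER_BYTE_YEAR = 1e-8        # assumed cost of fast memory, per byte-year
MEDIAN_LIFETIME_YEARS = 0.1     # assumed median time an output stays unspent
usd_per_utxo_byte = USD_PER_BYTE_YEAR * MEDIAN_LIFETIME_YEARS
# == 1e-9 dollars per UTXO byte; a typical 40-byte output would cost
# about 4e-8 dollars to store under these assumptions
```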

Calculating all of the above for a single reference point in time will produce constants for a cost function for a transaction or a block of transactions:

Cost(size_in_bytes, ecdsa_validations, sighash_bytes, utxo_change_in_bytes)
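
Assuming the simplest (linear) combination of the baselines sketched above (the exact functional form is not pinned down here), the function might look like this, with the placeholder constants carried over from the earlier sketches:

```python
# Placeholder constants in "reference hardware 2016 dollars"; the linear
# form is an assumption, not a settled design.
USD_PER_TX_BYTE      = 3.3e-10   # bandwidth (broadband example above)
USD_PER_ECDSA_VERIFY = 7e-9      # cpu: signature validation (assumed)
USD_PER_SIGHASH_BYTE = 1e-12     # cpu: signature hashing (assumed)
USD_PER_UTXO_BYTE    = 1e-9      # storage (assumed, from the sketch above)

def cost(size_in_bytes, ecdsa_validations, sighash_bytes, utxo_change_in_bytes):
    """Unified cost of a transaction or block. A transaction that shrinks
    the UTXO set has a negative utxo_change_in_bytes, earning a storage
    credit against its bandwidth and cpu costs."""
    return (size_in_bytes        * USD_PER_TX_BYTE
          + ecdsa_validations    * USD_PER_ECDSA_VERIFY
          + sighash_bytes        * USD_PER_SIGHASH_BYTE
          + utxo_change_in_bytes * USD_PER_UTXO_BYTE)
```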

The cost function will never be perfect, and it may become less perfect over time if (for example) CPU costs drop much faster than memory or bandwidth costs. Software optimizations may also change the actual costs. Fortunately, the cost metric does not have to be perfect; as long as it is consistent, it will serve as a coordination mechanism between transaction creators and miners, and as an anti-denial-of-service mechanism in the unlikely event of a "rogue miner" trying to produce an extremely expensive-to-validate block.

If the benefits of resetting the cost metric formula ever outweighed the costs, it could be changed via hard fork. However, since the benefits are small (resetting the metric might just make transactions with lots of signature hashing pay a few satoshis less in fees, while making transactions with OP_RETURN data pay a few satoshis more) and the costs are large (all transaction-creating software would have to be updated to calculate fees using the new cost metric), that is unlikely.

TODO: work out an actual cost metric formula, then apply it to recent blocks and transactions and try to estimate how much disruption there might be if we switched overnight and wallets were just using today's simple fee-per-kilobyte metric.

@instagibbs

Simple (because simple designs are less likely to lead to bugs or security flaws)
Dynamic, adjusting as the ecosystem evolves (not set arbitrarily by developers or a committee of experts)
Safe, not subject to attack

I think there may be some sort of Zooko's-triangle tradeoff going on here: dynamic often means gameable, a more complex dynamic rule won't be simple, etc. For "version 1" we might think of tackling "simple and safe," and then perhaps think about what "dynamic" would entail, but as a second-class citizen.

This is an interesting proposal, but ignores two critical related issues: how do miners decide which transactions to put into their blocks?
In my version I'd rather just have a linear combination of factors that estimate how long propagation takes (assuming modest, not crazy, amounts of parallelism), rather than calculating an explicit "computer cost". We would estimate fairly conservative numbers for how much each attribute contributes to validation/propagation latency, then plug those into the overall cost function. This makes block-packing a simple 0-1 weighted knapsack problem, just like today; I suppose even easier, since we drop the "&& sigops <=" criterion. I think this cost metric formulation has lots of support, and we're taking our first steps with SegWit.
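
A minimal sketch of that packing (the transaction attributes are hypothetical, and in-mempool parent/child dependencies are ignored):

```python
# Greedy 0-1 knapsack fill by fee-per-cost; tx.fee and tx.cost are
# assumed attributes, and parent/child dependencies are ignored.
def pack_block(mempool, cost_limit):
    block, total = [], 0.0
    for tx in sorted(mempool, key=lambda t: t.fee / t.cost, reverse=True):
        if total + tx.cost <= cost_limit:
            block.append(tx)
            total += tx.cost
    return block
```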

The one "dimension" to this problem I'm not sold on yet is UTXO set stuff wrt to my "validation time" metric. Previous blocks can severely impact performance of follow-up blocks, and I don't want a block cost to be dependent on how exactly the UTXO set is stored. My inclination is to go along with a capped UTXO set size(with some slow growth over time), or forgetting old ones, ala http://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-December/011952.html . It might make sense to keep around the SegWit discounting, or some variant of it, to encourage people to not kick too much data in the UTXO set.

If we ever manage to get memristors working for all-in-memory databases, we can revisit that point. :)

That hopefully crosses off "simple" and "safe," by forcing people to internalize costs to the system as well as capping inherent propagation delays.

For "dynamic", I haven't thought a lot about it beyond what maaku has thought about it re:flexcap. I think getting the first 2 parts right makes the 3rd much easier, so I'm optimistic.
