Ayms/fullnodes.md

Proof of Participation (PoP)

The basic problem

Although the title does not make it clear, this is a continuation of [1] bitcoin/bitcoin#8738, which was closed, the answer suggesting that 8 to 10 days to resync the whole blockchain is normal for configurations that are not very recent.
Whether or not this is normal, that answer does not address what will happen in the future: full nodes run by "normal users" will effectively no longer be welcome to join the network, which increases the recentralization of the Bitcoin network.
At the time of writing, the blocks plus the chain state represent ~100 GB, so with a typical 10 Mbps connection it should take less than a day to download them and resync from the downloaded state, far less than the 10 days needed to download the blocks and rebuild the chain state.
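The "less than a day" figure can be checked with a little arithmetic. The helper below is only an illustration; it assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bit/s) and ignores protocol overhead and peer availability.

```python
def download_hours(size_gb: float, link_mbps: float) -> float:
    """Hours to transfer size_gb decimal gigabytes over a link_mbps link, ignoring overhead."""
    bits = size_gb * 1e9 * 8             # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)   # bits / (bits per second)
    return seconds / 3600


print(round(download_hours(100, 10), 1))  # 22.2
```

That is about 22 hours for a ~100 GB snapshot, against roughly 240 hours (10 days) for downloading the blocks and rebuilding the chain state.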
Rationale

This proposal is inspired by [Proof of Bitcoin node](http://spreadcoin.info/news/wp-content/uploads/2015/10/Proof-of-Bitcoin-Node-DRAFTv1-1.pdf) and by Proof of Activity (PoA) (and the references therein), which explain the general situation of cryptocurrency networks very well. It is also inspired by some ideas from the ongoing Convergence project and its specs.
We concentrate on the Bitcoin network here, but this is a priori valid for most cryptocurrency networks.
While the recentralization of mining power is already a reality, the recentralization of full nodes is clearly under way.
The trend is unlikely to reverse, since there is absolutely no incentive for people to run a full node (especially for free), and the difficulty of building the chain state is increasing so fast that it will soon become impossible for normal peers to run one (see blockchain download speed). Even the 2015 incentive program only led to about 10K nodes.
As stated in the thread mentioned above, the difficulty has nothing to do (for now) with bandwidth or storage, the latter not being a problem, but only with building the chain state from scratch.
As subsidies decrease over time and mining becomes less and less profitable, some mining pools will disappear. It seems unlikely that this will lead to a redecentralization of mining power; more probably the opposite will happen, with the few that have invested heavily in mining concentrating it.
Today there are only ~5000 full nodes for Bitcoin (and maybe ~7000 for Ethereum), both ridiculously small numbers for p2p networks, and the number is decreasing for Bitcoin. Of course there are other peers, such as partial full nodes or SPV clients, but they cannot participate in the network (ie validate transactions, relay them and validate blocks), so they are of no use in this respect. Unfortunately (or fortunately, for some), Ethereum showed that the blockchain is not immutable and can be controlled when it implemented the hard fork to defeat the DAO hacker, which resulted in two different Ethereum blockchains (one that followed the fork and one that did not; the future will tell what happens). Contrary to common belief, this proves that blockchains can currently be controlled and that we are not really dealing with decentralized systems.
Is this phenomenon a priority? Maybe not, but we think it should be. Even Satoshi's paper states that full nodes are not supposed to be operated by "normal" peers; with all due respect for that proposal, we think this vision is wrong. The direct consequence of this and of the above is that, in the end, full nodes too might be controlled by miners and/or centralized entities,
because until they are done with the network (ie until the fees compensating for the decrease in subsidies are too low to be profitable, and/or until they have made enough money before the network collapses), we can be sure that they will keep sustaining it, ie running full nodes.
This may be an overly pessimistic vision, but it is probably not unrealistic. We will not detail here the well-known attack vectors that recentralization opens up.
Even miners are not really obliged to run a full node: they could be lazy and just prune. They would need the full state at the beginning, but could then remove old blocks and concentrate only on new ones, the only risk for them being a rollback of the blockchain to a state prior to the blocks they kept.
In the Bitcoin Core client, pruning is not the default but is likely to become so; this does not remove the need to have the full blockchain and state when you start it.
This proposal intends to define solutions that motivate peers to run a full node and/or give them efficient means to retrieve the blocks and chain state. The target is probably not to have millions of full nodes, but to make sure there are enough of them and that they are independent.
Brief comparison with the Tor network

The comparison might look irrelevant but it is not: the order of magnitude is the same. The Tor network is basically 1000 Guards, 1000 Relays and 1000 Exits working at a given time, knowing that some nodes can perform all three functions. You can multiply these numbers by 5 if you like, but the order of magnitude remains the same.
But unlike cryptocurrency networks, the Tor network does not claim to be decentralized, because it is a centralized network: a node that is not registered with/approved by the authority nodes cannot be used to extend a circuit, so basically it cannot do anything.
The reason is probably that the Tor project wants to be able to control/detect/monitor attacks (and maybe users), so the Tor network will most likely never scale and does not intend to. The project no longer encourages peers to run their own Tor node, because their limited upload bandwidth would just degrade the circuits' efficiency.
We can see that the most famous cryptocurrency networks are not even bigger than the Tor network, even after years, although running a full node does not face the constraints of running a Tor node described above, and although they claim to be decentralized networks, not centralized like Tor.
Brief comparison with the bittorrent network

The bittorrent network is estimated at ~200 M peers, and the incentive for its peers is of course to download. As we all know, almost all of them freeride, but they must be online while they download and therefore participate in the network. Even this is not mandatory: they can freeride completely, do nothing and just benefit from the network (see https://github.com/Ayms/torrent-live), just as it is not mandatory for mining pools to run full nodes or for miners/coin owners to participate. So an incentive probably has to be found for running full nodes and for coin holders to participate; even the incentive program produced only 10K full nodes.
Background

Running a full node does not only mean maintaining the blocks and chain state, but also participating in the network.
Running one is not trivial for everybody: UPnP or port forwarding must be enabled on the router so the node can receive incoming connections. Users might think they are running a full node while they are not, which worsens the problem since others might also take them for full nodes and attempt to connect to them. But unlike bittorrent clients, Bitcoin Core is not intrusive; for example, UPnP is not enabled by default, which is good.
The Bitcoin ecosystem is very fragile: any faulty implementation can be catastrophic and potentially unrecoverable, which is why every change is tested again and again before being introduced into the network. An incentive to run full nodes should therefore require only slight modifications, which is not the case for the solutions described in [Proof of Bitcoin node](http://spreadcoin.info/news/wp-content/uploads/2015/10/Proof-of-Bitcoin-Node-DRAFTv1-1.pdf) (not to mention the spreadcoin principles, which look abandoned).
PoA also (partially?) addresses the "total freeriders" issue, ie (for Bitcoin) peers that just send transactions and participate in nothing, by analogy with torrent-live for bittorrent.
But it does not completely address two problems:

the incentive for newcomers to run full nodes
downloading the blockchain and state in an acceptable time frame

This is because PoA would only reward peers that already hold some satoshis (and although it claims the contrary, it still seems to encourage a "rich get richer" policy, since with a basic distribution a peer with more satoshis has a better chance of being chosen), and it does not propose anything to let new peers obtain the blockchain and state efficiently.
It was proposed in [3] to test PoA in Litecoin, but Litecoin declined, citing unclear centralization concerns [4]; maybe the problem is that peers can collude inside pools to drain the PoA rewards.
But maybe the difficulty of starting a full node can be turned into an advantage. If today it takes something like 10 days for a normal peer to start one, it is not hard to foresee how fast this will grow, until it becomes completely impossible for normal peers to run a full node.
Therefore, the proposal is to allow peers to download all the information faster, run full nodes and be rewarded for it. The difficulty will still increase in the future: even if more efficient, the download will take longer, storage will grow, and an accidental desync of the blockchain state (shutting down the computer or the Bitcoin client, a crash, etc) will take more and more time to recover from. So the best way to save computing resources and to get rewarded is simply to be online all the time, participate, and take the necessary precautions for this (the "advantage" mentioned above).
The various solutions proposed (see Fast bootstrapping with a pre-generated UTXO-set database) seem to involve the same mechanisms: freeze the blockchain state; process/store it in a given format (issue 1: this needs some computation and storage) that is not yet defined (issue 2: it must be defined/standardized); validate it by some means (issue 3: this needs to be defined too, and it is not clear that solutions involving the core team's keys are really adequate); distribute it via torrents and/or Bitcoin peers (issue 4: blocks plus state will become a very large torrent, and in fact already are).
And those solutions still do not explain why anyone would have an interest in running a full node.
Proposal

In what follows, "chainstate" means the blockchain state frozen at a certain height in a given format, as mentioned above, or simply the chainstate directory of Bitcoin Core (assuming it can be frozen at a given height and is identical for everybody, which apparently is not the case). A "torrent" is not necessarily the same as a bittorrent torrent, although making the two compatible seems to offer some advantages.
The suggestion is simple and would be the following (a "peer" below is a full node peer):

bitcoin peers freeze the blockchain state every X blocks (2016 for example)
bitcoin peers create a "torrent" for this state, ie if we follow the bittorrent model they create the related metadata and infohash
in fact they create two "torrents": one containing the delta of the blocks directory since the previous torrent, and one containing the chainstate
so bitcoin peers do not have to duplicate data, compute anything to serve the pieces, or store pieces for intermediate states (except to maintain the chainstate format); they serve the related torrents directly from their blockchain directory, and the torrents can also be seeded in the bittorrent network
if they don't have the chainstate torrent (ie someone requests an older chainstate), they just don't serve it
they all end up with the same infohashes and metadata, so the torrents cannot be faked (first because forging a matching infohash+metadata is supposed to be infeasible, second because all peers have the same ones)
this creates a chain of tuples of incremental/mutable "torrents"
the peers maintain a "rolling table" and every X blocks they reach a consensus on which N peers to reward; the process must ensure that, over a period, all full node peers are rewarded
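As a rough illustration of the first steps of this list, the sketch below builds BitTorrent v1-style metadata (an `info` dictionary holding the SHA-1 of every piece) for the two "torrents" of a frozen height and derives their infohashes, following the bittorrent model mentioned above. The byte strings stand in for the blocks delta and the chainstate; the `blocks-<height>` / `chainstate-<height>` names and the 16 kB piece length are hypothetical choices, and only single-file torrents are handled. Because the infohash is a deterministic function of the data, two peers holding the same blocks and chainstate derive the same tuple.

```python
import hashlib

PIECE_LENGTH = 16 * 1024  # hypothetical piece length (16 kB)


def bencode(value) -> bytes:
    """Minimal bencoding (BitTorrent BEP 3) for int, str, bytes, list and dict."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_dict(name: str, data: bytes) -> dict:
    """Single-file 'info' dictionary: concatenated SHA-1 of every piece."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + PIECE_LENGTH]).digest()
        for i in range(0, len(data), PIECE_LENGTH)
    )
    return {"name": name, "length": len(data), "piece length": PIECE_LENGTH, "pieces": pieces}


def infohash(info: dict) -> str:
    """BitTorrent v1 infohash: SHA-1 of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).hexdigest()


def freeze(height: int, blocks_delta: bytes, chainstate: bytes) -> tuple:
    """Tuple for one frozen height: (height, blocks-delta infohash, chainstate infohash)."""
    return (
        height,
        infohash(info_dict(f"blocks-{height}", blocks_delta)),
        infohash(info_dict(f"chainstate-{height}", chainstate)),
    )


# Usage: the chain of tuples for three consecutive frozen heights (X = 2016).
chain = [freeze(2016 * k, b"blocks delta %d" % k, b"chainstate %d" % k) for k in (1, 2, 3)]
for height, blocks_ih, chainstate_ih in chain:
    print(height, blocks_ih[:8], chainstate_ih[:8])
```

Anyone can recompute these tuples from their own copy of the data, which is why a forged torrent would simply not match what the other peers advertise.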

To get rewarded, peers must prove three things:
1- they have the blocks and chainstate for a given height
To check this, when they are elected to be rewarded, they must prove that they can compute the infohash of the blocks and chainstate for the frozen height concatenated with a random set of bytes, so that the result cannot be precomputed.
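One possible reading of this challenge is sketched below. A hash of the (public) infohash plus random bytes alone could be computed by anyone without holding the data, so this sketch assumes instead that the random bytes are hashed together with the frozen data itself; the verifier, being a full node too, recomputes the answer from its own copy. The function names and the choice of SHA-256 are illustrative assumptions, not part of the proposal.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Verifier side: fresh random bytes, so the answer cannot be precomputed."""
    return secrets.token_bytes(32)


def prove(nonce: bytes, blocks: bytes, chainstate: bytes) -> bytes:
    """Prover side: hash the nonce followed by the frozen blocks and chainstate."""
    h = hashlib.sha256(nonce)
    h.update(blocks)
    h.update(chainstate)
    return h.digest()


def verify(nonce: bytes, answer: bytes, blocks: bytes, chainstate: bytes) -> bool:
    """Verifier side: recompute from its own copy and compare in constant time."""
    return hmac.compare_digest(answer, prove(nonce, blocks, chainstate))
```

Hashing the whole dataset (~100 GB) for every challenge is expensive; a cheaper variant could cover only a few randomly selected pieces, at the cost of a probabilistic rather than complete guarantee.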
2- they participate in the network
Unlike in usual systems, peers do not register themselves in the table but are registered by others: when a peer continuously receives valid transactions and blocks from another peer (not originated by that peer), it registers this peer (its IP:port) in the table; if the peer closes the connection or misbehaves, it is removed from the table (same concepts as Convergence).
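The registration rule can be sketched as a small data structure. Everything here is hypothetical: `min_relays` is an arbitrary threshold standing in for "continuously", peers are keyed by their "IP:port" string, and transactions and blocks are treated alike as relayed items.

```python
from dataclasses import dataclass, field


@dataclass
class RollingTable:
    """Peers are registered by the peers they relay to, never by themselves."""

    min_relays: int = 10  # hypothetical threshold standing in for "continuously"
    relay_counts: dict = field(default_factory=dict)
    registered: set = field(default_factory=set)

    def on_relay(self, peer: str, valid: bool, originated_by_peer: bool) -> None:
        """Called for each transaction or block received from `peer` ("IP:port")."""
        if not valid:
            self.remove(peer)  # misbehaving peers are dropped
            return
        if originated_by_peer:
            return  # sending only its own items does not count as participation
        self.relay_counts[peer] = self.relay_counts.get(peer, 0) + 1
        if self.relay_counts[peer] >= self.min_relays:
            self.registered.add(peer)

    def remove(self, peer: str) -> None:
        """Connection closed or misbehaviour: forget the peer entirely."""
        self.relay_counts.pop(peer, None)
        self.registered.discard(peer)
```

A "total freerider" that only sends its own transactions never reaches the threshold and therefore never appears in anyone's table.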
3- they serve the blocks and chainstate with an acceptable upload bandwidth
Peers ask other peers for a random block piece and a random chainstate piece (a piece can be a 16 kB part of the pieces described in the metadata) and check that what they receive is correct. Some peers could fake this by fetching the requested data from someone else, but this still proves that they serve pieces. Peers must check that the resulting upload bandwidth is above 500 kbps.
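A minimal sketch of this spot check follows. It assumes each requested unit is a whole piece listed in the torrent metadata (for example 16 kB pieces), since a sub-part of a piece cannot be verified against the per-piece SHA-1 hashes alone. `fetch` is a hypothetical callback standing in for the network request to the remote peer; the 500 kbps threshold is the one stated above.

```python
import hashlib
import random
import time
from typing import Callable

MIN_UPLOAD_BPS = 500_000  # 500 kbps threshold stated in the proposal
SHA1_LEN = 20             # each piece hash in the metadata is a 20-byte SHA-1


def check_piece(pieces: bytes, fetch: Callable[[int], bytes], rng: random.Random) -> bool:
    """Ask a peer for one random piece, then verify its hash and the observed throughput."""
    index = rng.randrange(len(pieces) // SHA1_LEN)
    start = time.monotonic()
    data = fetch(index)  # hypothetical stand-in for the network request
    elapsed = max(time.monotonic() - start, 1e-9)  # avoid dividing by zero
    expected = pieces[SHA1_LEN * index:SHA1_LEN * (index + 1)]
    if hashlib.sha1(data).digest() != expected:
        return False  # wrong, corrupted or fabricated piece
    return len(data) * 8 / elapsed >= MIN_UPLOAD_BPS
```

A single 16 kB transfer is a noisy bandwidth estimate (at 500 kbps it lasts about a quarter of a second, so latency weighs heavily), so in practice several pieces would probably need to be timed together.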
(To make sure they are up to date, peers exchange their tuples and check that they match, a bit like they already compare the block heights advertised by other peers to check that they are in sync with the blockchain.)
What follows is one implementation suggestion; others can probably be devised.
Peers implement a DHT similar to the bittorrent one (simple and efficient, but insecure). In what follows we do not specify the hash length, which is not fundamental at this stage.
Their nodeID is the Nth hash of their public key (IP address), where N is the number given by the first n bytes (TBD) of the last PoP block mined. They renew it every X (TBD) new blocks and discard their routing tables and all associated information. This prevents them from choosing or predicting their nodeID and positioning themselves where they like (see Convergence, where it is a bit different: the nodeIDs are the fingerprints of the temporary public keys used as onion keys).
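The nodeID derivation can be illustrated as follows. The sketch assumes SHA-1 (160-bit IDs, as in the bittorrent DHT), treats the public key as opaque bytes, uses a hypothetical `N_BYTES = 2` for the "first n bytes (TBD)", and adds 1 to N so that at least one hash is always applied. Since N comes from a block that is not known in advance, a peer cannot pick its position ahead of time, and anyone can recompute the ID to check it.

```python
import hashlib

N_BYTES = 2  # hypothetical value for the "first n bytes (TBD)"


def rounds_from_block(last_pop_block_hash: bytes) -> int:
    """N read from the first N_BYTES of the last PoP block, plus 1 so that N >= 1."""
    return int.from_bytes(last_pop_block_hash[:N_BYTES], "big") + 1


def node_id(public_key: bytes, last_pop_block_hash: bytes) -> bytes:
    """Apply SHA-1 N times to the public key to get a 160-bit DHT node ID."""
    digest = public_key
    for _ in range(rounds_from_block(last_pop_block_hash)):
        digest = hashlib.sha1(digest).digest()
    return digest


def check_node_id(advertised: bytes, public_key: bytes, last_pop_block_hash: bytes) -> bool:
    """Check done by the closest peers: a non-matching nodeID gets its owner eliminated."""
    return advertised == node_id(public_key, last_pop_block_hash)
```

With `N_BYTES = 2`, checking an ID costs at most 65536 hash iterations, which is negligible for a full node.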
Peers store the record [tuple(?):IP:port:nodeId:public bitcoin key]:(signature of [previous]) in the nodes closest to their [public key, or the hash of tuple + block_id of the last mined block (hash of block infohash | chain state infohash)].
The reward mechanism would be roughly the same as for PoA:

miners mine (no: this should not depend on miners) one empty block (see 1 and 2 page 6 in []) every X (2016) blocks
on receiving the block, peers perform a DHT lookup and get the K (TBD) peers closest to block_ID (ie the mined hash in reverse order)
peers that find themselves among the closest check, by sending a message to the other closest peers, that the stored nodeIDs match the advertised ones and are indeed the Nth hash of the public key (so anyone cheating about its nodeID is eliminated)
closest peers check that they all advertise the same tuples
closest peers ask each other for a random block below the last frozen block height
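The selection of the K closest peers and the tuple check can be sketched with Kademlia-style XOR distance, as used by the bittorrent DHT. The sketch assumes 20-byte node IDs and simply truncates the reversed block hash to that length to form block_ID; `K = 3` and the record format (an "IP:port" string) are hypothetical.

```python
K = 3  # hypothetical number of closest peers


def block_id(block_hash: bytes, id_len: int = 20) -> bytes:
    """block_ID: mined hash in reverse byte order, truncated to the DHT ID length (assumption)."""
    return block_hash[::-1][:id_len]


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia-style distance between two IDs of equal length."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_peers(target: bytes, peers: dict, k: int = K) -> list:
    """Return the k node IDs (keys of `peers`) closest to target."""
    return sorted(peers, key=lambda nid: xor_distance(nid, target))[:k]


def same_tuples(advertised: list) -> bool:
    """True if every closest peer advertises the same tuple (False if none were received)."""
    return bool(advertised) and all(t == advertised[0] for t in advertised)
```

Sorting the whole table is fine for a sketch; a real DHT would reach the closest nodes through iterative lookups instead of holding every record locally.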

progressive hash of all transactions since the beginning
If the blockchain rolls back, so do the infohashes.
pruning
Dash incentive: order nodes by time since last paid, new nodes at the end, select the first 10%, and reward the one whose hash is closest to the double SHA-256 (PoW hash) of the block 100 blocks ago
proof of service
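The Dash-style note above can be read as the following selection rule; this is our interpretation, not a specification. Nodes sit in a queue from least recently paid to most recently paid, with new nodes appended at the end; the first 10% are candidates, and the winner is the candidate whose double SHA-256 hash is numerically closest to the hash of the block 100 blocks earlier (the double SHA-256 of its header, ie its PoW hash). Node identifiers as raw bytes and the absolute-difference notion of "closest" are assumptions.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def pick_winner(queue: list, header_100_ago: bytes) -> bytes:
    """queue holds node identifiers, least recently paid first, newest nodes last."""
    candidates = queue[:max(1, len(queue) // 10)]  # first 10%, at least one node
    target = int.from_bytes(sha256d(header_100_ago), "big")  # PoW hash of that block
    return min(candidates, key=lambda n: abs(int.from_bytes(sha256d(n), "big") - target))


def after_payment(queue: list, winner: bytes) -> list:
    """The paid node moves to the back of the queue, like a newly registered node."""
    return [n for n in queue if n != winner] + [winner]
```

Using a block from 100 blocks ago means every peer already agrees on the target, while the winner still cannot be chosen far in advance.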
Unlike most cryptocurrency processes, we rely here on IP addresses and not on computing power, because we believe it is much harder to control millions of IPs (see Study, where monitors appear to number in the millions but certainly do not in reality) than to have huge processing capabilities behind a few IPs; this protects PoP from being organized and centralized in pools.
A large attacker could attempt to control millions of IPs but would probably not be able to compete with the rest of the network, because the basic idea here is for people to run a node and get paid for doing almost nothing except maintaining it, which is already some work.
This could perhaps constitute another blockchain and be checked like this, or by storing the trusted state in something like the bittorrent DHT (see https://torrentfreak.com/mutable-torrents-proposal-makes-bittorrent-resilient-160813/); this is to be defined if necessary, since the trusted state is supposed to be stored in the Bitcoin network first.
Attacks are still possible, but they would mean that the entire Bitcoin network is already compromised.
Basically, this is somewhat equivalent to adding a trustworthy getChainState message that requests chain state pieces for a given height, as is done for blocks.
Newcomers that lack the required computing capacity will just follow the trusted torrent chain, download all the block torrents and the latest state torrent, then compute the rest.
Taking the same numbers as in https://bitcointalk.org/index.php?topic=109467.0: if the block size does not change for 200 years, we would need ~100 days to download everything with a 10 Mbps connection (this approximation does not include the chainstate size). In 200 years 10 Mbps will probably be ridiculously slow, so this still works, and probably still works if the block size increases reasonably.
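The ~100 days figure can be reproduced under the assumption that blocks stay at 1 MB and arrive every 10 minutes on average (144 blocks per day); as in the text, the chainstate is not counted, and protocol overhead is ignored.

```python
BLOCK_SIZE_BYTES = 1_000_000  # assumption: 1 MB blocks, unchanged for 200 years
BLOCKS_PER_DAY = 24 * 6       # one block every ~10 minutes


def chain_size_bytes(years: int) -> int:
    """Approximate size of the blocks alone (chainstate excluded) after `years` years."""
    return BLOCK_SIZE_BYTES * BLOCKS_PER_DAY * 365 * years


def download_days(size_bytes: float, link_mbps: float) -> float:
    """Days needed to download size_bytes over a link_mbps link, ignoring overhead."""
    return size_bytes * 8 / (link_mbps * 1e6) / 86400


size = chain_size_bytes(200)
print(size / 1e12, round(download_days(size, 10)))  # 10.512 97
```

That is about 10.5 TB and roughly 97 days, consistent with the ~100 days quoted above; at 100 Mbps it would be about ten times shorter.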
To reach the required bandwidth, the 8-connection model will have to be changed too.
rollback
proof of activity
freeriding miners
exchange for real don't care bitcoin
[1] http://spreadcoin.info/news/wp-content/uploads/2015/10/Proof-of-Bitcoin-Node-DRAFTv1-1.pdf
[2] https://eprint.iacr.org/2014/452.pdf