A 1TB HDD is now available for under $40 USD. How is the 100GB storage
requirement preventing anyone from setting up full nodes?

On Apr 16, 2017 11:55 PM, "David Vorick via bitcoin-dev"
<bitcoin-dev@lists.linuxfoundation.org> wrote:

> *Rationale:*
>
> A node that stores the full blockchain (I will use the term archival node)
> requires over 100GB of disk space, which I believe is one of the most
> significant barriers to more people running full nodes. And I believe the
> ecosystem would benefit substantially if more users were running full nodes.
>
> The best alternative today to storing the full blockchain is to run a
> pruned node, which keeps only the UTXO set and throws away already verified
> blocks. The operator of the pruned node is able to enjoy the full security
> benefits of a full node, but is essentially leeching off the network, as they
> performed a large download likely without contributing anything back.
>
> This puts more pressure on the archival nodes, as the archival nodes need
> to pick up the slack and help new nodes bootstrap to the network. As the
> pressure on archival nodes grows, fewer people will be able to actually run
> archival nodes, and the situation will degrade. The situation would likely
> become problematic quickly if bitcoin-core were to ship with the defaults
> set to a pruned node.
>
> Even further, the people most likely to care about saving 100GB of disk
> space are also the people least likely to care about some extra bandwidth
> usage. For datacenter nodes, and for nodes serving lots of bandwidth, the
> bandwidth is usually the biggest cost of running the node. For home users
> however, as long as they stay under their bandwidth cap, the bandwidth is
> actually free. Ideally, new nodes would be able to bootstrap from nodes
> that do not have to pay for their bandwidth, instead of needing to rely on
> a decreasing percentage of heavy-duty archival nodes.
>
> I have (perhaps incorrectly) identified disk space consumption as the most
> significant factor in your average user choosing to run a pruned node or a
> lite client instead of a full node. The average user is not typically too
> worried about bandwidth, and is also not typically too worried about
> initial blockchain download time. But the 100GB hit to your disk space can
> be a huge psychological factor, especially if your hard drive only has
> 500GB available in the first place, and 250+ GB is already consumed by
> other files you have.
>
> I believe that improving the disk usage situation would greatly benefit
> decentralization, especially if it could be done without putting pressure
> on archival nodes.
>
> *Small Nodes Proposal:*
>
> I propose an alternative to the pruned node that does not put undue
> pressure on archival nodes, and would be acceptable and non-risky to ship
> as a default in bitcoin-core. For lack of a better name, I'll call this new
> type of node a 'small node'. The intention is that bitcoin-core would
> eventually ship 'small nodes' by default, such that the expected amount of
> disk consumption drops from today's 100+ GB to less than 30 GB.
>
> My alternative proposal has the following properties:
>
> + Full nodes only need to store ~20% of the blockchain.
> + With very high probability, a new node will be able to recover the
>   entire blockchain by connecting to 6 random small node peers.
> + An attacker that can eliminate a chosen 95% of the full nodes running
>   today will be unable to prevent new nodes from downloading the full
>   blockchain, even if the attacker is also able to eliminate all archival
>   nodes (assuming all nodes today were small nodes instead of archival
>   nodes).
>
> *Method:*
>
> A small node will pick an index in the range [5, 256). This index is that
> node's permanent index. When storing a block, instead of storing the full
> block, the node will use Reed-Solomon coding to erasure code the block
> using a 5-of-256 scheme.
> The result will be 256 pieces, each 20% of the size of the block. The node
> picks the piece that corresponds to its index, and stores that instead.
> (Indexes 0-4 are reserved for archival nodes, as explained later.)
>
> The node is now storing a fragment of every block. Alone, this fragment
> cannot be used to recover any piece of the blockchain. However, when
> combined into a set of any 5 unique fragments (fragments of the same index
> do not count as unique), the full block can be recovered.
>
> Nodes can optionally store more than 1 fragment each. At 5 fragments, the
> node becomes a full archival node, and the chosen indexes should be 0-4.
> This is advantageous for the archival node as the encoded data for the
> first 5 indexes will actually be identical to the block itself - there is
> no computational overhead for selecting the first indexes. There is also no
> need to choose random indexes, because the full block can be recovered no
> matter which indexes are chosen.
>
> When connecting to new peers, the index of each peer needs to be known.
> Once peers totaling 5 unique indexes are discovered, blockchain download
> can begin. Connecting to just 5 small node peers provides a >95% chance of
> getting 5 uniques, with exponentially improving odds of success as you
> connect to more peers. Connecting to a single archive node guarantees that
> any gaps can be filled.
>
> A good encoder should be able to turn a block into a 5-of-256 piece set in
> under 10 milliseconds using a single core on a standard consumer desktop.
> This should not slow down initial blockchain download substantially, though
> the overhead is more than a rounding error.
>
> *DoS Prevention:*
>
> A malicious node may provide garbage data instead of the actual piece.
> Given just the garbage data and 4 other correct pieces, it is impossible
> (as best I know, anyway) to tell which piece is the garbage piece.
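
An aside on the peer-count claim quoted above: the ">95% with 5 peers" figure can be checked with a few lines of arithmetic. This is only a sketch, and it assumes each small node picks its index uniformly and independently from the 251 values in [5, 256), which the proposal implies but does not state. The function name and parameters are made up for illustration.

```python
from fractions import Fraction

def prob_at_least_k_distinct(peers: int, k: int = 5, indexes: int = 251) -> float:
    """Probability that `peers` independent, uniform picks from `indexes`
    possible values contain at least `k` distinct values."""
    # dist[d] = probability that exactly d distinct indexes have been seen so far
    dist = {0: Fraction(1)}
    for _ in range(peers):
        nxt = {}
        for d, p in dist.items():
            repeat = Fraction(d, indexes)  # chance the next peer repeats a seen index
            nxt[d] = nxt.get(d, 0) + p * repeat
            nxt[d + 1] = nxt.get(d + 1, 0) + p * (1 - repeat)
        dist = nxt
    return float(sum(p for d, p in dist.items() if d >= k))

if __name__ == "__main__":
    for n in (5, 6, 8):
        print(n, "peers:", prob_at_least_k_distinct(n))
```

Under that assumption, 5 peers come out at roughly 0.96, and 6 peers at above 0.99, which is consistent with the numbers in the proposal.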
>
> One option in this case would be to seek out an archival node that could
> verify the correctness of the pieces, and identify the malicious node.
>
> Another option would be to have the small nodes store a cryptographic
> checksum of each piece. Obtaining the cryptographic checksum for all 256
> pieces would incur a nontrivial amount of hashing (post segwit, as much as
> 100MB of extra hashing per block), and would require an additional ~4 KB of
> storage per block. The hashing overhead here may be prohibitive.
>
> Another solution would be to find additional pieces and brute-force
> combinations of 5 until a working combination is discovered. Though this
> sounds nasty, it should take less than five seconds of computation to find
> the working combination given 5 correct pieces and 2 incorrect pieces. This
> computation only needs to be performed once to identify the malicious peers.
>
> I also believe that alternative erasure coding schemes exist which
> actually are able to identify the bad pieces given sufficient good pieces,
> however I don't know if they have the same computational performance as the
> best Reed-Solomon coding implementations.
>
> *Deployment:*
>
> Small nodes are completely useless unless the critical mass of 5 pieces
> can be obtained. The first version that supports small node block downloads
> should default everyone to an archival node (meaning indexes 0-4 are used).
>
> Once there are enough small-node-enabled archive nodes, the default can be
> switched so that nodes only have a single index by default. In the first
> few days, when there are only a few small nodes, the previously-deployed
> archival nodes can help fill in the gaps, and the small nodes can be useful
> for blockchain download right away.
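
For what it's worth, the brute-force option from the DoS section is easy to sketch. The code below is a toy, not the proposed encoder: it uses a tiny polynomial code over the prime field GF(257) with one symbol per piece, where a real implementation would use Reed-Solomon over whole block fragments. It also assumes the downloader already holds some commitment to the block (stood in for here by a SHA-256 digest) to tell a working combination from a garbage one. All function names are hypothetical.

```python
import hashlib
from itertools import combinations

P = 257  # toy prime field; an assumption, not the field the proposal would use

def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # needs Python 3.8+
    return total

def encode(data, index):
    """Piece for `index`. Data symbols sit at x = 0..4, so pieces 0-4 equal the data."""
    return interpolate_at(list(enumerate(data)), index)

def decode(pieces):
    """Recover the 5 data symbols from any 5 (index, value) pieces with distinct indexes."""
    return [interpolate_at(pieces, x) for x in range(5)]

def digest(data):
    """Stand-in for whatever commitment the downloader checks a decoded block against."""
    return hashlib.sha256(repr(list(data)).encode()).hexdigest()

def find_good_pieces(pieces, expected_digest):
    """Try every 5-piece subset; return (data, bad_indexes) for the first one that verifies."""
    for subset in combinations(pieces, 5):
        data = decode(list(subset))
        if digest(data) == expected_digest:
            # Re-encode to see which peers sent pieces that do not match the block.
            bad = [i for i, v in pieces if encode(data, i) != v]
            return data, bad
    return None, None
```

With 5 correct and 2 incorrect pieces there are only 21 subsets of size 5 to try, so the search itself is small; the cost is dominated by how long one decode and one verification take on a real block.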
>
> ----------------------------------
>
> This represents a non-trivial amount of code, but I believe that the
> result would be a non-trivial increase in the percentage of users running
> full nodes, and a healthier overall network.
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev