Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes

From: Aymeric Vitte <vitteaymeric@gmail•com>
To: bitcoin-dev@lists•linuxfoundation.org
Subject: Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
Date: Mon, 17 Apr 2017 12:14:25 +0200	[thread overview]
Message-ID: <19dbfef2-3791-8fe7-1c00-c4052c3d6c45@gmail.com> (raw)
In-Reply-To: <CAFVRnypbQQ-vsSLqv48cYaqTCty4R1DmFRqfAvxe4mAqyQNXxQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9474 bytes --]

While I fully agree with the intent (increasing full nodes so a big
miner waking up in a bad mood can't threaten the world any longer every
day as it is now) I am not sure to get the interest of this proposal,
because:

- it's probably not a good idea to encourage the home users to run full
nodes, there are many people running servers far from their capacity
that could easily run efficient full nodes

- if someone can't allocate 100 GB today to run a full node, then we
can't expect him to allocate more in the future

- the download time is a real concern

- this proposal is a kind of reinventing torrents, while limiting the
number of connections to something not efficient at all, I don't see why
something that is proven to be super efficient (torrents) would be
needed to be reinvented, I am not saying that it should be used as the
bittorrent network is doing but the concepts can be reused

- I don't get at all the concept of "archival" nodes since it's another
useless step toward centralization

I think the only way to increase full nodes it to design an incentive
for people to run them


Le 17/04/2017 à 08:54, David Vorick via bitcoin-dev a écrit :
> *Rationale:*
>
> A node that stores the full blockchain (I will use the term archival
> node) requires over 100GB of disk space, which I believe is one of the
> most significant barriers to more people running full nodes. And I
> believe the ecosystem would benefit substantially if more users were
> running full nodes.
>
> The best alternative today to storing the full blockchain is to run a
> pruned node, which keeps only the UTXO set and throws away already
> verified blocks. The operator of the pruned node is able to enjoy the
> full security benefits of a full node, but is essentially leeching the
> network, as they performed a large download likely without
> contributing anything back.
>
> This puts more pressure on the archival nodes, as the archival nodes
> need to pick up the slack and help new nodes bootstrap to the network.
> As the pressure on archival nodes grows, fewer people will be able to
> actually run archival nodes, and the situation will degrade. The
> situation would likely become problematic quickly if bitcoin-core were
> to ship with the defaults set to a pruned node.
>
> Even further, the people most likely to care about saving 100GB of
> disk space are also the people least likely to care about some extra
> bandwidth usage. For datacenter nodes, and for nodes doing lots of
> bandwidth, the bandwidth is usually the biggest cost of running the
> node. For home users however, as long as they stay under their
> bandwidth cap, the bandwidth is actually free. Ideally, new nodes
> would be able to bootstrap from nodes that do not have to pay for
> their bandwidth, instead of needing to rely on a decreasing percentage
> of heavy-duty archival nodes.
>
> I have (perhaps incorrectly) identified disk space consumption as the
> most significant factor in your average user choosing to run a pruned
> node or a lite client instead of a full node. The average user is not
> typically too worried about bandwidth, and is also not typically too
> worried about initial blockchain download time. But the 100GB hit to
> your disk space can be a huge psychological factor, especially if your
> hard drive only has 500GB available in the first place, and 250+ GB is
> already consumed by other files you have.
>
> I believe that improving the disk usage situation would greatly
> benefit decentralization, especially if it could be done without
> putting pressure on archival nodes.
>
> *Small Nodes Proposal:*
>
> I propose an alternative to the pruned node that does not put undue
> pressure on archival nodes, and would be acceptable and non-risky to
> ship as a default in bitcoin-core. For lack of a better name, I'll
> call this new type of node a 'small node'. The intention is that
> bitcoin-core would eventually ship 'small nodes' by default, such that
> the expected amount of disk consumption drops from today's 100+ GB to
> less than 30 GB.
>
> My alternative proposal has the following properties:
>
> + Full nodes only need to store ~20% of the blockchain
> + With very high probability, a new node will be able to recover the
> entire blockchain by connecting to 6 random small node peers.
> + An attacker that can eliminate a chosen+ 95% of the full nodes
> running today will be unable to prevent new nodes from downloading the
> full blockchain, even if the attacker is also able to eliminate all
> archival nodes. (assuming all nodes today were small nodes instead of
> archival nodes)
>
> Method:
>
> A small node will pick an index [5, 256). This index is that node's
> permanent index. When storing a block, instead of storing the full
> block, the node will use Reed-Solomon coding to erasure code the block
> using a 5-of-256 scheme. The result will be 256 pieces that are 20% of
> the size of the block each. The node picks the piece that corresponds
> to its index, and stores that instead. (Indexes 0-4 are reserved for
> archival nodes - explained later)
>
> The node is now storing a fragment of every block. Alone, this
> fragment cannot be used to recover any piece of the blockchain.
> However, when paired with any 5 unique fragments (fragments of the
> same index will not be unique), the full block can be recovered.
>
> Nodes can optionally store more than 1 fragment each. At 5 fragments,
> the node becomes a full archival node, and the chosen indexes should
> be 0-4. This is advantageous for the archival node as the encoded data
> for the first 5 indexes will actually be identical to the block itself
> - there is no computational overhead for selecting the first indexes.
> There is also no need to choose random indexes, because the full block
> can be recovered no matter which indexes are chosen.
>
> When connecting to new peers, the indexes of each peer needs to be
> known. Once peers totaling 5 unique indexes are discovered, blockchain
> download can begin. Connecting to just 5 small node peers provides a
> >95% chance of getting 5 uniques, with exponentially improving odds of
> success as you connect to more peers. Connecting to a single archive
> node guarantees that any gaps can be filled.
>
> A good encoder should be able to turn a block into a 5-of-256 piece
> set in under 10 milliseconds using a single core on a standard
> consumer desktop. This should not slow down initial blockchain
> download substantially, though the overhead is more than a rounding error.
>
> *DoS Prevention:*
>
> A malicious node may provide garbage data instead of the actual piece.
> Given just the garbage data and 4 other correct pieces, it is
> impossible (best I know anyway) to tell which piece is the garbage piece.
>
> One option in this case would be to seek out an archival node that
> could verify the correctness of the pieces, and identify the malicious
> node.
>
> Another option would be to have the small nodes store a cryptographic
> checksum of each piece. Obtaining the cryptographic checksum for all
> 256 pieces would incur a nontrivial amount of hashing (post segwit, as
> much as 100MB of extra hashing per block), and would require an
> additional ~4kb of storage per block. The hashing overhead here may be
> prohibitive.
>
> Another solution would be to find additional pieces and brute-force
> combinations of 5 until a working combination was discovered. Though
> this sounds nasty, it should take less than five seconds of
> computation to find the working combination given 5 correct pieces and
> 2 incorrect pieces. This computation only needs to be performed once
> to identify the malicious peers.
>
> I also believe that alternative erasure coding schemes exist which
> actually are able to identify the bad pieces given sufficient good
> pieces, however I don't know if they have the same computational
> performance as the best Reed-Solomon coding implementations.
>
> *Deployment:*
>
> Small nodes are completely useless unless the critical mass of 5
> pieces can be obtained. The first version that supports small node
> block downloads should default everyone to an archival node (meaning
> indexes 0-4 are used)
>
> Once there are enough small-node-enabled archive nodes, the default
> can be switched so that nodes only have a single index by default. In
> the first few days, when there are only a few small nodes, the
> previously-deployed archival nodes can help fill in the gaps, and the
> small nodes can be useful for blockchain download right away.
>
> ----------------------------------
>
> This represents a non-trivial amount of code, but I believe that the
> result would be a non-trivial increase in the percentage of users
> running full nodes, and a healthier overall network.
>
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists•linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

-- 
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms


[-- Attachment #2: Type: text/html, Size: 13275 bytes --]