public inbox for bitcoindev@googlegroups.com
* [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
@ 2017-04-17  6:54 David Vorick
  2017-04-17  7:11 ` Danny Thorpe
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: David Vorick @ 2017-04-17  6:54 UTC (permalink / raw)
  To: Bitcoin Dev


*Rationale:*

A node that stores the full blockchain (I will use the term archival node)
requires over 100GB of disk space, which I believe is one of the most
significant barriers to more people running full nodes. And I believe the
ecosystem would benefit substantially if more users were running full nodes.

The best alternative today to storing the full blockchain is to run a
pruned node, which keeps only the UTXO set and throws away already verified
blocks. The operator of the pruned node is able to enjoy the full security
benefits of a full node, but is essentially leeching the network, as they
performed a large download likely without contributing anything back.

This puts more pressure on the archival nodes, as the archival nodes need
to pick up the slack and help new nodes bootstrap to the network. As the
pressure on archival nodes grows, fewer people will be able to actually run
archival nodes, and the situation will degrade. The situation would likely
become problematic quickly if bitcoin-core were to ship with the defaults
set to a pruned node.

Even further, the people most likely to care about saving 100GB of disk
space are also the people least likely to care about some extra bandwidth
usage. For datacenter nodes, and for nodes doing lots of bandwidth, the
bandwidth is usually the biggest cost of running the node. For home users
however, as long as they stay under their bandwidth cap, the bandwidth is
actually free. Ideally, new nodes would be able to bootstrap from nodes
that do not have to pay for their bandwidth, instead of needing to rely on
a decreasing percentage of heavy-duty archival nodes.

I have (perhaps incorrectly) identified disk space consumption as the most
significant factor in your average user choosing to run a pruned node or a
lite client instead of a full node. The average user is not typically too
worried about bandwidth, and is also not typically too worried about
initial blockchain download time. But the 100GB hit to your disk space can
be a huge psychological factor, especially if your hard drive only has
500GB available in the first place, and 250+ GB is already consumed by
other files you have.

I believe that improving the disk usage situation would greatly benefit
decentralization, especially if it could be done without putting pressure
on archival nodes.

*Small Nodes Proposal:*

I propose an alternative to the pruned node that does not put undue
pressure on archival nodes, and would be acceptable and non-risky to ship
as a default in bitcoin-core. For lack of a better name, I'll call this new
type of node a 'small node'. The intention is that bitcoin-core would
eventually ship 'small nodes' by default, such that the expected amount of
disk consumption drops from today's 100+ GB to less than 30 GB.

My alternative proposal has the following properties:

+ Full nodes only need to store ~20% of the blockchain
+ With very high probability, a new node will be able to recover the entire
blockchain by connecting to 6 random small node peers.
+ An attacker that can eliminate a chosen 95% of the full nodes running
today will be unable to prevent new nodes from downloading the full
blockchain, even if the attacker is also able to eliminate all archival
nodes. (assuming all nodes today were small nodes instead of archival nodes)

Method:

A small node will pick an index in [5, 256). This index is that node's
permanent index. When storing a block, instead of storing the full block,
the node will use Reed-Solomon coding to erasure code the block using a
5-of-256 scheme. The result will be 256 pieces that are 20% of the size of
the block each. The node picks the piece that corresponds to its index, and
stores that instead. (Indexes 0-4 are reserved for archival nodes -
explained later)

The node is now storing a fragment of every block. Alone, this fragment
cannot be used to recover any piece of the blockchain. However, when paired
with any 5 unique fragments (fragments of the same index will not be
unique), the full block can be recovered.

Nodes can optionally store more than 1 fragment each. At 5 fragments, the
node becomes a full archival node, and the chosen indexes should be 0-4.
This is advantageous for the archival node as the encoded data for the
first 5 indexes will actually be identical to the block itself - there is
no computational overhead for selecting the first indexes. There is also no
need to choose random indexes, because the full block can be recovered no
matter which indexes are chosen.
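
To make this concrete, here is a minimal sketch of such a code: stdlib-only
Python 3, my own illustration rather than a proposed implementation (the
field, the chunking, and the names are all assumptions; a production encoder
would use vectorized field arithmetic).

# A systematic 5-of-256 Reed-Solomon sketch over GF(2^8). Pieces 0-4 are
# raw fifths of the block; any 5 distinct pieces reconstruct it.

GF_EXP = [0] * 512          # anti-log table, doubled to avoid a mod 255
GF_LOG = [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D          # primitive polynomial x^8 + x^4 + x^3 + x^2 + 1
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else GF_EXP[GF_LOG[a] + GF_LOG[b]]

def gf_div(a, b):
    return 0 if a == 0 else GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255]

K, N = 5, 256               # any K of the N distinct piece indexes suffice

def interp_eval(xs, ys, x):
    # Evaluate at x the degree-(K-1) polynomial through (xs, ys),
    # in Lagrange form. Subtraction in GF(2^8) is XOR.
    acc = 0
    for i, xi in enumerate(xs):
        num = den = 1
        for xm in xs:
            if xm != xi:
                num = gf_mul(num, x ^ xm)
                den = gf_mul(den, xi ^ xm)
        acc ^= gf_mul(ys[i], gf_div(num, den))
    return acc

def encode_piece(block, index):
    # Chunk i of the block is data symbol i, so pieces 0-4 concatenated
    # are exactly the block: the free archival indexes described above.
    if len(block) % K:
        block += b'\x00' * (K - len(block) % K)   # pad to a multiple of K
    step = len(block) // K
    chunks = [block[i * step:(i + 1) * step] for i in range(K)]
    if index < K:
        return chunks[index]
    return bytes(interp_eval(list(range(K)), [c[g] for c in chunks], index)
                 for g in range(step))

def recover_block(pieces):
    # pieces: {index: piece_bytes} with exactly K distinct indexes.
    xs, cols = list(pieces), list(pieces.values())
    step = len(cols[0])
    return b''.join(bytes(interp_eval(xs, [c[g] for c in cols], d)
                          for g in range(step)) for d in range(K))

# Round trip: any 5 distinct pieces rebuild the block.
blk = b'example block bytes' * 100
frags = {i: encode_piece(blk, i) for i in (0, 42, 99, 200, 255)}
assert recover_block(frags) == blk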

When connecting to new peers, the index of each peer needs to be known.
Once peers totaling 5 unique indexes are discovered, blockchain download
can begin. Connecting to just 5 small node peers provides a >95% chance of
getting 5 unique indexes, with exponentially improving odds of success as you
connect to more peers. Connecting to a single archive node guarantees that
any gaps can be filled.
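
(Both figures are easy to check; a quick stdlib-Python sketch of mine, where
251 is the number of indexes 5-255 available to small nodes:)

import random
from math import prod

SMALL_INDEXES = 256 - 5      # indexes 5..255 are available to small nodes

def p_all_unique(n, pool=SMALL_INDEXES):
    # Probability that n uniformly chosen indexes are pairwise distinct.
    return prod(1 - i / pool for i in range(n))

print(p_all_unique(5))       # ~0.961: 5 peers give a >95% shot at 5 uniques

# With 6 peers, only 5 *distinct* indexes among the 6 are needed:
trials = 200_000
hits = sum(len({random.randrange(SMALL_INDEXES) for _ in range(6)}) >= 5
           for _ in range(trials))
print(hits / trials)         # ~0.999: "very high probability" with 6 peers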

A good encoder should be able to turn a block into a 5-of-256 piece set in
under 10 milliseconds using a single core on a standard consumer desktop.
This should not slow down initial blockchain download substantially, though
the overhead is more than a rounding error.

*DoS Prevention:*

A malicious node may provide garbage data instead of the actual piece.
Given just the garbage data and 4 other correct pieces, it is impossible
(best I know anyway) to tell which piece is the garbage piece.

One option in this case would be to seek out an archival node that could
verify the correctness of the pieces, and identify the malicious node.

Another option would be to have the small nodes store a cryptographic
checksum of each piece. Obtaining the cryptographic checksum for all 256
pieces would incur a nontrivial amount of hashing (post segwit, as much as
100MB of extra hashing per block), and would require an additional ~4kb of
storage per block. The hashing overhead here may be prohibitive.
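
(As a sketch of the storage math, reusing encode_piece and N from the Method
sketch above; the 16-byte truncation is my assumption, chosen so that 256
checksums come to exactly 4096 bytes:)

import hashlib

def piece_checksums(block):
    # Truncated SHA-256 per piece: 256 * 16 = 4096 bytes (~4kb) stored,
    # at the cost of hashing all 256 pieces (~51x the block size).
    return b''.join(hashlib.sha256(encode_piece(block, i)).digest()[:16]
                    for i in range(N))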

Another solution would be to find additional pieces and brute-force
combinations of 5 until a working combination was discovered. Though this
sounds nasty, it should take less than five seconds of computation to find
the working combination given 5 correct pieces and 2 incorrect pieces. This
computation only needs to be performed once to identify the malicious peers.
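
(A sketch of that search, again reusing the Method sketch's helpers;
block_hash is a stand-in for whatever full-block validity check the node
already performs:)

from itertools import combinations

def find_good_pieces(pieces, block_hash, expected):
    # With 7 pieces of which 2 are garbage, at most C(7,5) = 21
    # reconstructions are tried; the subset whose reconstruction matches
    # the known block hash is all-good, and the pieces outside it came
    # from the malicious peers.
    for subset in combinations(pieces.items(), K):
        candidate = dict(subset)
        if block_hash(recover_block(candidate)) == expected:
            return candidate
    return None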

I also believe that alternative erasure coding schemes exist which actually
are able to identify the bad pieces given sufficient good pieces, however I
don't know if they have the same computational performance as the best
Reed-Solomon coding implementations.

*Deployment:*

Small nodes are completely useless unless the critical mass of 5 pieces can
be obtained. The first version that supports small node block downloads
should default everyone to an archival node (meaning indexes 0-4 are used).

Once there are enough small-node-enabled archive nodes, the default can be
switched so that nodes only have a single index by default. In the first
few days, when there are only a few small nodes, the previously-deployed
archival nodes can help fill in the gaps, and the small nodes can be useful
for blockchain download right away.

----------------------------------

This represents a non-trivial amount of code, but I believe that the result
would be a non-trivial increase in the percentage of users running full
nodes, and a healthier overall network.


* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17  6:54 [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes David Vorick
@ 2017-04-17  7:11 ` Danny Thorpe
  2017-04-17  7:27   ` David Vorick
                     ` (2 more replies)
  2017-04-17 10:14 ` Aymeric Vitte
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 24+ messages in thread
From: Danny Thorpe @ 2017-04-17  7:11 UTC (permalink / raw)
  To: David Vorick; +Cc: Bitcoin Dev


A 1TB HDD is now available for under $40 USD. How is the 100GB storage
requirement preventing anyone from setting up full nodes?

On Apr 16, 2017 11:55 PM, "David Vorick via bitcoin-dev" <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> [snip]


* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17  7:11 ` Danny Thorpe
@ 2017-04-17  7:27   ` David Vorick
  2017-04-20 15:50   ` Erik Aronesty
  2017-04-21 13:35   ` David Kaufman
  2 siblings, 0 replies; 24+ messages in thread
From: David Vorick @ 2017-04-17  7:27 UTC (permalink / raw)
  To: Danny Thorpe; +Cc: Bitcoin Dev


Most people do not want to go out and buy new hardware to run a Bitcoin
node. They want to use the hardware that they already own, and usually that
hardware is going to have a non-generous amount of disk space. A 500GB SSD
with no HDD is common in computers today.

But really, the best test is to go out and talk to people. Ask them if they
run a full node, and if they say no, ask them why not. In my experience,
the most common answer by a significant margin is that they don't want to
lose the disk space. That psychology is far more important than any example
of cheap hard drives. People don't want to go out and buy a hard drive so
that they can run Bitcoin. It's a non-starter.


* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17  6:54 [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes David Vorick
  2017-04-17  7:11 ` Danny Thorpe
@ 2017-04-17 10:14 ` Aymeric Vitte
  2017-04-19 17:30   ` David Vorick
  2017-04-18  7:43 ` Jonas Schnelli
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Aymeric Vitte @ 2017-04-17 10:14 UTC (permalink / raw)
  To: bitcoin-dev


While I fully agree with the intent (increasing full nodes so a big
miner waking up in a bad mood can't threaten the world any longer every
day, as it is now), I am not sure I see the benefit of this proposal,
because:

- it's probably not a good idea to encourage the home users to run full
nodes, there are many people running servers far from their capacity
that could easily run efficient full nodes

- if someone can't allocate 100 GB today to run a full node, then we
can't expect him to allocate more in the future

- the download time is a real concern

- this proposal is a kind of reinvention of torrents, while limiting the
number of connections to something not efficient at all; I don't see why
something that is proven to be super efficient (torrents) would need to
be reinvented. I am not saying that it should be used exactly as the
bittorrent network does, but the concepts can be reused

- I don't get at all the concept of "archival" nodes since it's another
useless step toward centralization

I think the only way to increase full nodes is to design an incentive
for people to run them.


On 17/04/2017 at 08:54, David Vorick via bitcoin-dev wrote:
> [snip]

-- 
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms



* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17  6:54 [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes David Vorick
  2017-04-17  7:11 ` Danny Thorpe
  2017-04-17 10:14 ` Aymeric Vitte
@ 2017-04-18  7:43 ` Jonas Schnelli
  2017-04-18 10:50 ` Tom Zander
  2017-04-21 20:38 ` Gregory Maxwell
  4 siblings, 0 replies; 24+ messages in thread
From: Jonas Schnelli @ 2017-04-18  7:43 UTC (permalink / raw)
  To: David Vorick; +Cc: Bitcoin Dev



Hi Dave

> A node that stores the full blockchain (I will use the term archival node) requires over 100GB of disk space, which I believe is one of the most significant barriers to more people running full nodes. And I believe the ecosystem would benefit substantially if more users were running full nodes.


Thanks for your proposal.

I agree that 100GB of data may be cumbersome for some systems, especially if you target end user systems (Laptops/Desktops). Though, in my opinion, for those systems, CPU consumption is the biggest UX blocker.
Bootstrapping a full node on a decent consumer system with default parameters takes days, and, during this period, you probably run at full CPU capacity and you will be disturbed by constant fan noise. Standard tasks may be impossible because your system will be slowed down to a point where even word processing may get difficult.
This is because Core (with its default settings) is made to sync as fast as possible.

Once you have verified the chain and you reach the chain tip, indeed, it will be much better (until you shut down for a couple of days/hours and have to re-sync/catch-up).

1. I agree that we need to have a way for pruned nodes to partially serve historical blocks.
My personal measurements told me that around ~80% of historical block serving is between tip and -1’000 blocks.
Currently, Core nodes have only two modes of operations, „server all historical blocks“ or „none“.
This makes little sense especially if you prune to a target size of, let's say, 80GB (~80% of the chain).
Ideally, there would be a mode where your full node can signal a third mode „I keep the last 1000 blocks“ (or make this more dynamic).

2. Bootstrapping new peers
I’m not sure if full nodes must be the single point of historical data storage. Full nodes provide a valuable service (verification, relay, filtering, etc.). I’m not sure if serving historical blocks is one of them. Historical blocks could be made available on CDN’s or other file storage networks. You are going to verify them anyways,... the serving part is pure data storage.
I’m also pretty sure that some users have stopped running full nodes because their upstream bandwidth consumption (because of serving historical blocks) was getting intolerable.
Especially „consumer“ peers must have been hit by this (little experience in how to reduce traffic, upstream in general is bad for consumers-connections, little resources in general).

Having a second option built into full nodes (or as an external bootstrap service/app) for downloading historical blocks during bootstrapping could probably be a relief for "small nodes“.
It could be a little daemon that downloads historical blocks from CDN’s, etc. and feeds them into your full node over p2p/8333 and kickstarts your bootstrapping without bothering valuable peers.
Or, the alternative download, could be built into the full nodes main logic.
And, if it wasn’t obvious, this must not bypass the verification!

I’m also aware of the downsides of this. This can eventually reduce decentralisation of the storage of historical bitcoin blockchain data and – eventually – increase the upstream bandwidth of peers willing to serve historical blocks (especially in a transition phase to a second „download“-option).
Maybe it’s a tradeoff between reducing decentralisation by killing low resource nodes because serving historical blocks is getting too resource-intense _or_ reducing decentralisation by moving some percentage of the historical data storage away from the bitcoin p2p network.
The latter seems more promising to me.


To your proposal:
- Isn’t there a tiny finger-printing element if peers have to pick a segmentation index?
- SPV bloom filter clients can’t use fragmented blocks to filter txns? Right? How could they avoid connecting to those peers?

</jonas>


* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17  6:54 [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes David Vorick
                   ` (2 preceding siblings ...)
  2017-04-18  7:43 ` Jonas Schnelli
@ 2017-04-18 10:50 ` Tom Zander
  2017-04-18 13:07   ` Tier Nolan
  2017-04-21 20:38 ` Gregory Maxwell
  4 siblings, 1 reply; 24+ messages in thread
From: Tom Zander @ 2017-04-18 10:50 UTC (permalink / raw)
  To: bitcoin-dev, David Vorick

On Monday, 17 April 2017 08:54:49 CEST David Vorick via bitcoin-dev wrote:
> The best alternative today to storing the full blockchain is to run a
> pruned node

The idea looks a little overly complex to me.

I suggested something similar which is a much simpler version;
https://zander.github.io/scaling/Pruning/

> # Random pruning mode
> 
> There is a large gap between the two current modes of everything
> (currently 75GB) and only what we need (2GB or so).
> 
> This mode would have two areas; it would keep a day's worth of blocks to
> make sure that any reorgs etc. would not cause a re-download, but it would
> additionally have an area that can be used to store historical data
> to be shared on the network. Maybe 20 or 50GB.
> 
> One main feature of Bitcoin is that we have massive replication. Each node
> currently holds all the same data that every other node holds. But this
> doesn't have to be the case with pruned nodes. A node itself has no need
> for historic data at all.
> 
> The suggestion is that a node stores a random set of blocks. Dropping
> random blocks as the node runs out of disk-space. Additionally, we would
> introduce a new way to download blocks from other nodes which allows the
> node to say it doesn't actually have the block requested.
> 
> The effect of this setup is that many different nodes together end up
> having the total amount of blocks, even though each node only has a
> fraction of the total amount.

-- 
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel



* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-18 10:50 ` Tom Zander
@ 2017-04-18 13:07   ` Tier Nolan
  2017-04-18 23:19     ` Aymeric Vitte
  0 siblings, 1 reply; 24+ messages in thread
From: Tier Nolan @ 2017-04-18 13:07 UTC (permalink / raw)
  Cc: Bitcoin Dev


This has been discussed before.

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-May/008101.html

including a list of nice to have features by Maxwell

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-May/008110.html

You meet most of these rules, though you do have to download blocks from
multiple peers.

The suggestion in that thread was for a way to compactly indicate which
blocks a node has.  Each node would then store a sub-set of all the
blocks.  You just download the blocks you want from the node that has them.

Each node would be recommended to store the last few days' worth anyway.


* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-18 13:07   ` Tier Nolan
@ 2017-04-18 23:19     ` Aymeric Vitte
  2017-04-19  4:28       ` udevNull
  0 siblings, 1 reply; 24+ messages in thread
From: Aymeric Vitte @ 2017-04-18 23:19 UTC (permalink / raw)
  To: bitcoin-dev


From the initial post: "The situation would likely become problematic
quickly if bitcoin-core were to ship with the defaults set to a pruned
node."

Sorry to be straight: I read the (painful) thread below, and most of
what is in there is inept, wrong, obsolete... or biased, cf the first
sentence above. If the idea is to invent a workaround to the fact that
pruning might/will become the default, or might/will be set by the users
as the default so full nodes might/will disappear, then just say it
clearly instead of proposing this kind of non-solution as a solution to
secure the blockchain.

I can't believe this is serious: people now are supposed to prune but
will be forced to host a part of the blockchain. How do you expect this
to work, and why would people do this? Knowing that to start pruning they
need a full node, then since we are there, why not continue with a
full node... but indeed, why should they continue with a full node, and
therefore why should they accept to host a part of the blockchain if
they decline the first proposal?

This is absurd; you are not addressing the first priority given the
context, which is to quickly increase the number of full nodes and which
obviously includes an incentive for people to run them.

It also gives the feeling that bitcoin wants to reinvent everything
without capitalizing on/knowing what exists. Sorry again, but the
concepts of the proposal and others like archival nodes are just funny.


On 18/04/2017 at 15:07, Tier Nolan via bitcoin-dev wrote:
> [snip]

-- 
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms



* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-18 23:19     ` Aymeric Vitte
@ 2017-04-19  4:28       ` udevNull
  2017-04-19 13:47         ` Angel Leon
  0 siblings, 1 reply; 24+ messages in thread
From: udevNull @ 2017-04-19  4:28 UTC (permalink / raw)
  To: bitcoin-dev


I'd like to add to this. There is definitely a barrier to entry with regards to setting up a full node. Unless you're living in a first world country, the bandwidth requirements alone will outright prevent you from even setting up a full node (sync since genesis).

Maintaining that also becomes a sunk cost, as there is no financial incentive to run a node, only an ideological one. Most of the people who benefit and will benefit from Bitcoin are the unbanked, whom you will find in third-world countries that don't have ISPs providing the data packages to cater for the requirements of running a full node. I'm sure many would like to, but simply cannot afford it.

A user may not want to run a node at home, but rather on a DigitalOcean or AWS server, which they cannot afford to do either, considering the bandwidth and storage costs associated with it. However, I don't think they should be excluded from participating in the network (supporting proposals, voicing their opinions, running their own wallets, writing their own applications on top of Bitcoin [which I think is extremely important]).

So I would definitely be in favour of a small node of sorts. It will present us with some interesting technical challenges along the way, but it's definitely worthwhile looking into.

Financially incentivising nodes is a really weird area, because it would allow someone to essentially automate the deployment of nodes, i.e. if a node can pay for itself 100% (even at a lesser value, it just becomes cheaper overall), you could write an application that uses an AWS API or a DigitalOcean API to automatically deploy hundreds of nodes. Which sounds great, but not if that person is malicious and wants to prevent the community adopting proposals.

Just my 2 cents worth.

Sent with [ProtonMail](https://protonmail.com) Secure Email.


* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-19  4:28       ` udevNull
@ 2017-04-19 13:47         ` Angel Leon
  0 siblings, 0 replies; 24+ messages in thread
From: Angel Leon @ 2017-04-19 13:47 UTC (permalink / raw)
  To: udevNull, bitcoin-dev


> Financially incentivising nodes is a really weird area, because it would
> allow someone to essentially automate the deployment of nodes, i.e. if a
> node can pay for itself 100% (even at a lesser value, it just becomes
> cheaper overall), you could write an application that uses an AWS API or a
> DigitalOcean API to automatically deploy hundreds of nodes. Which sounds
> great, but not if that person is malicious and wants to prevent the
> community adopting proposals.

What other projects have done to avoid such attacks (while economically
incentivizing running full nodes) is to only distribute part of the block
rewards back to such nodes if the node has committed/frozen a predetermined
amount of coins that can't be spent. This also leaves less liquidity for
market speculation, and an incentive for long-term commitment.

On Wed, Apr 19, 2017 at 5:14 AM udevNull via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> [snip]


* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17 10:14 ` Aymeric Vitte
@ 2017-04-19 17:30   ` David Vorick
  2017-04-20  9:46     ` Tom Zander
  2017-04-20 11:27     ` Aymeric Vitte
  0 siblings, 2 replies; 24+ messages in thread
From: David Vorick @ 2017-04-19 17:30 UTC (permalink / raw)
  Cc: Bitcoin Dev


On Tue, Apr 18, 2017 at 3:43 AM, Jonas Schnelli <dev@jonasschnelli•ch>
wrote:

>
> Hi Dave
>
> *1. I agree that we need to have a way for pruned nodes to partially serve
> historical blocks.*
> My personal measurements told me that around ~80% of historical block
> serving are between tip and -1’000 blocks.
> Currently, Core nodes have only two modes of operations, „server all
> historical blocks“ or „none“.
> This makes little sense especially if you prune to a target size of, let's
> say, 80GB (~80% of the chain).
> Ideally, there would be a mode where your full node can signal a third
> mode „I keep the last 1000 blocks“ (or make this more dynamic).
>

That probably makes sense with small nodes too. The past 1000 blocks are
such a small footprint compared to the rest of the chain.


>
> *2. Bootstrapping new peers*
> I’m not sure if full nodes must be the single point of historical data
> storage. Full nodes provide a valuable service (verification, relay,
> filtering, etc.). I’m not sure if serving historical blocks is one of them.
> Historical blocks could be made available on CDN’s or other file storage
> networks. You are going to verify them anyways,... the serving part is pure
> data storage.
> I’m also pretty sure that some users have stopped running full nodes
> because their upstream bandwidth consumption (because of serving historical
> blocks) was getting intolerable.
> Especially „consumer“ peers must have been hit by this (little experience
> in how to reduce traffic, upstream in general is bad for
> consumers-connections, little resources in general).
>

Perhaps it is not, but I would think that it would be pretty
straightforward to configure a global bandwidth limit within Bitcoin. I
know that many torrent clients, and clients for protocols like Tor and i2p,
include the ability to set both speed limits and monthly bandwidth limits.
Shipping core with sane default limits is probably sufficient to solve
bandwidth issues for most users. I don't know if default limits may result
in today's archive nodes pulling less weight though - changing the defaults
to have limits may slow the network down as a whole.

In my experience (living in a city where most people have uncapped
connections), disk usage is usually the bigger issue, but certainly
bandwidth is a known problem (especially for rural users) as well.


>
> Having a second option built into full nodes (or as an external bootstrap
> service/app) for downloading historical blocks during bootstrapping could
> probably be a relief for "small nodes“.
> It could be a little daemon that downloads historical blocks from CDN’s,
> etc. and feeds them into your full node over p2p/8333 and kickstarts your
> bootstrapping without bothering valuable peers.
> Or, the alternative download, could be built into the full nodes main
> logic.
> And, if it wasn’t obvious, this must not bypass the verification!
>

I worry about any type of CDN being a central point of failure. CDNs cost
money, which means someone is footing the bill. Torrenting typically relies
on a DHT, which is much easier to attack than Bitcoin's peer network. It's
possible that a decentralized CDN could be used, but I don't think any yet
exist (though I am building one for unrelated reasons) which are both
sufficiently secure and incentive-compatible to be considered as an
alternative to using full nodes to bootstrap.

I just don't want to end up in a situation where 90% of users are getting
their blockchain from the same 3 websites or centralized services. And I
also don't want to rely on any p2p solution which would not stand up to a
serious adversary. Right now, I think the bitcoin p2p network is by
significant margin the best we've got. The landscape for decentralized data
distribution is evolving rapidly though, perhaps in a few years there will
be a superior alternative.


> *To your proposal:*
> - Isn’t there a tiny finger-printing element if peers have to pick a
> segmentation index?
> - SPV bloom filter clients can’t use fragmented blocks to filter txns?
> Right? How could they avoid connecting to those peers?
>
> </jonas>
>

Yes, there is a fingerprint that happens if you have nodes pick an index.
And the fingerprint gets a lot worse if you have a node pick multiple
indexes. Though, isn't it already required that nodes have some sort of IP
address or hidden service domain? I want to say that the fingerprint
created by picking an index is not a big deal, because it can be separated
from activity like transaction relaying and mining. Though, I am not
certain and perhaps it is a problem.

To be honest, I hadn't really considered SPV nodes at the time of writing.
Small nodes would still be seeing all of the new blocks, and per your
suggestion may also be storing the 1000 or so most recent blocks, but SPV
nodes would still need some way to find all of their historical
transactions. The problem is not fetching blocks, it's figuring out which
blocks are worth fetching. It may be sufficient to have nodes store a bloom
filter for each block indicating which addresses had activity. The bloom
filter would probably only need to be about 1% of the size of the full
block. That's not too much overhead (now you are storing 21% of the block
instead of just 20%), and would retain SPV compatibility.
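
(Rough numbers behind the 1% guess, with assumed inputs: ~5,000 output
scripts in a full block and a one-in-a-million false positive rate:)

import math

def bloom_bytes(n_items, fp_rate):
    # Optimal Bloom filter size: m = -n * ln(p) / (ln 2)^2 bits.
    return math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2 / 8)

print(bloom_bytes(5_000, 1e-6))   # ~18,000 bytes, about 1% of a 2 MB block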

On Mon, Apr 17, 2017 at 12:17 PM, praxeology_guy <
praxeology_guy@protonmail•com> wrote:

>
> FYI With unspent coin snapshots, needing archive data becomes less
> important.  People can synchronize from a later snapshot instead of the
> genesis block.  See https://petertodd.org/2016/delayed-txo-commitments
> for my current favorite idea.
>
>
This is something that I think could help a lot too. If the build processes
included verifying the chain and then creating a utxo snapshot at, say,
block 400,000, then nodes would no longer need to download, store, upload,
or share blocks prior to that height. It means that a reorg deeper than
that point would hardfork the network. But a reorg 60k blocks deep is going
to cause problems worse than a network split. Then, the only people who
would ever need to care about the early blocks are developers, and it's
more reasonable to expect developers to go through a longer process and
have more resources than everyday users.

On Mon, Apr 17, 2017 at 6:14 AM, Aymeric Vitte via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> While I fully agree with the intent (increasing full nodes so a big miner
> waking up in a bad mood can't threaten the world any longer every day as it
> is now) I am not sure to get the interest of this proposal, because:
>
> - it's probably not a good idea to encourage the home users to run full
> nodes, there are many people running servers far from their capacity that
> could easily run efficient full nodes
>
Running a full node is the only way to avoid needing to trust others. It's
also how you make your opinion worthwhile for events like hard forks and
UASFs. If decentralization is the primary motivation, it absolutely makes
sense to encourage people to run their own full nodes. Without a full node,
you are at the mercy of the governance decisions by those who do have full
nodes. But if you have a full node, you can choose to opt-out of any upgrade
(example: ethereum classic nodes).


> - if someone can't allocate 100 GB today to run a full node, then we can't
> expect him to allocate more in the future
>
That's why I'm proposing something to decrease the storage requirements.


> - this proposal is a kind of reinventing torrents, while limiting the
> number of connections to something not efficient at all, I don't see why
> something that is proven to be super efficient (torrents) would be needed
> to be reinvented, I am not saying that it should be used as the bittorrent
> network is doing but the concepts can be reused
>
It's different from torrents in that it uses specialized erasure coding to
make sure that every block is always available, even if an adversary is
running around targeting all the nodes with a particular piece.


> - I don't get at all the concept of "archival" nodes since it's another
> useless step toward centralization
>
"archival" nodes are simply nodes with the full blockchain. Nobody can
bootstrap on the network without them. Today, every bitcoin-core node is an
archival node by default.

> I think the only way to increase full nodes it to design an incentive for
> people to run them
>
The primary incentive is the sovereignty that it gives you. Running a
Bitcoin full node gives you security and protection against political
garbage that you can't get any other way. The network does currently depend
on altruism to allow people to download the blockchain, but as long as we
can keep the resource requirements of this altruism low, I think we can
expect it to continue. This proposal attempts to keep those requirements
low.



On Tue, Apr 18, 2017 at 6:50 AM, Tom Zander <tomz@freedommail•ch> wrote:

> On Monday, 17 April 2017 08:54:49 CEST David Vorick via bitcoin-dev wrote:
> > The best alternative today to storing the full blockchain is to run a
> > pruned node
>
> The idea looks a little overly complex to me.
>
> I suggested something similar which is a much simpler version;
> https://zander.github.io/scaling/Pruning/
>
> > [snip]
>

Your proposal has a significant disadvantage: If every peer is dropping 75%
of all blocks randomly, then you need to connect to a large number of peers
to download the whole blockchain. Any given peer has a 25% chance of
holding a block, or rather, a 75% chance of not holding a block. If you
have n peers, your probability of not being able to download a given block
is 0.75^n. If you are downloading 450,000 blocks, you will need to connect
to an expected 46 peers to download the whole blockchain.
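
(That 46 is the smallest n for which 450,000 * 0.75^n drops below one
expected unreachable block:)

import math

blocks, keep = 450_000, 0.25
n = math.ceil(math.log(blocks) / -math.log(1 - keep))
print(n)   # 46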

Your proposal is also a lot less able to handle active adversaries: if
nodes are randomly dropping blocks, the probability that one block in
particular is dropped by everyone goes up significantly. And the problem
gets a lot worse in the presence of an active adversary. If there are 8000
nodes each dropping 75% of the blocks, then each block on average will only
be held by 2000 nodes. Probabilistically, some unlucky blocks will be held
by fewer than 2000 nodes. An active adversary needs only to eliminate about
2000 nodes (a chosen 2000 nodes) to knock a block off of the network. But
missing even a single block is a significant problem.
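
To put rough numbers on the worst case, a quick sketch using a normal
approximation (illustrative only):

    import math

    NODES, KEEP, BLOCKS = 8000, 0.25, 450_000
    mean = NODES * KEEP                           # 2000 holders on average
    sigma = math.sqrt(NODES * KEEP * (1 - KEEP))  # ~38.7
    # Approximate expected minimum over BLOCKS independent binomial draws:
    worst = mean - sigma * math.sqrt(2 * math.log(BLOCKS))
    print(round(worst))  # ~1800 holders for the unluckiest block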

Your proposal essentially requires that archive nodes still exist and be a
part of a typical blockchain download. Given that, I don't think it would
be a sound idea to ship as a default in bitcoin core.



On Tue, Apr 18, 2017 at 9:07 AM, Tier Nolan via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> This has been discussed before.
>
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/
> 2015-May/008101.html
>
> including a list of nice to have features by Maxwell
>
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/
> 2015-May/008110.html
>
> You meet most of these rules, though you do have to download blocks from
> multiple peers.
>
> The suggestion in that thread was for a way to compactly indicate which
> blocks a node has.  Each node would then store a sub-set of all the
> blocks.  You just download the blocks you want from the node that has them.
>
> Each node would be recommended to store the last few days worth anyway.
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists•linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
>
I believe that my proposal does meet all of the requirements listed by
Maxwell. Having a set of 8 random peers gives you a very high probability
of being able to recover every single block. You would need to connect to
at least 5 peers (and this is already >90% likely to be sufficient to
recover every block), but if you cannot connect to 5 random peers your node
is probably in trouble anyway. Highly parallel, high speed downloads are
just as possible with small nodes as with archive nodes. It only takes a
few bytes to indicate which part of the blockchain you have, and any 2
peers have a less than 1% chance of overlapping.
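
A quick sketch of that arithmetic (indexes drawn uniformly from [5, 256),
i.e. 251 possible values):

    INDICES = 251

    p_unique = 1.0
    for i in range(5):
        p_unique *= (INDICES - i) / INDICES
    print(p_unique)     # ~0.96: five random peers usually give five uniques
    print(1 / INDICES)  # ~0.004: chance that two given peers share an index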

[-- Attachment #2: Type: text/html, Size: 18496 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-19 17:30   ` David Vorick
@ 2017-04-20  9:46     ` Tom Zander
  2017-04-20 20:32       ` Andrew Poelstra
  2017-04-20 11:27     ` Aymeric Vitte
  1 sibling, 1 reply; 24+ messages in thread
From: Tom Zander @ 2017-04-20  9:46 UTC (permalink / raw)
  To: bitcoin-dev

On Wednesday, 19 April 2017 19:30:30 CEST David Vorick via bitcoin-dev 
wrote:
> > I suggested something similar which is a much simpler version;
> > https://zander.github.io/scaling/Pruning/

> Your proposal has a significant disadvantage: If every peer is dropping
> 75% of all blocks randomly, then you need to connect to a large number of
> peers to download the whole blockchain.
...
> If you are downloading 450,000 blocks, you will need to
> connect to an expected 46 peers to download the whole blockchain.

I don’t really see the problem here, even if your math is a bit off.
(Statistics is difficult, I know). Connecting to many nodes to download
faster is really not an issue and already happens.

> Your proposal is also a lot less able to handle active adversaries: if
> nodes are randomly dropping blocks, the probability that one block in
> particular is dropped by everyone goes up significantly. 

You make the assumption that this new mode of pruning will be used by 100% 
of the network, this is not how distributed systems work.

-- 
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-19 17:30   ` David Vorick
  2017-04-20  9:46     ` Tom Zander
@ 2017-04-20 11:27     ` Aymeric Vitte
  1 sibling, 0 replies; 24+ messages in thread
From: Aymeric Vitte @ 2017-04-20 11:27 UTC (permalink / raw)
  To: David Vorick, Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 9434 bytes --]

Thanks, but you did not answer all the points, and some of your statements
look wrong, as do, from my standpoint, the global ideas behind this
proposal: it basically invents strange things instead of reusing what is
already proven to work well and could provide the same result. And the end
result is not the expected one (i.e. increasing the number of full nodes);
it sounds like a strange workaround to prevent the centralization of the
blockchain once pruning becomes the default.

To answer some other comments in this thread: giving an incentive to run
full nodes does not mean that someone setting up 10K nodes tomorrow will
become rich and/or be able to control the network (the latter being not
unlikely at all in the current situation). The idea is more to motivate
people who already have the resources to run full nodes, mixing the concept
of optimizing resources at no additional cost (or even at decreasing cost,
since you get rewarded for the part you have already paid for but don't
use) with that of running nodes to protect one's business.

For example,
https://gist.github.com/Ayms/aab6f8e08fef0792ab3448f542a826bf#proposal
shows some concepts where nodes can't position themselves where they like
and are registered in the system by the others (but forget the
proof-of-something as written in this gist, since I think the rewards
should not depend on the usual miners). It then becomes quite difficult for
nodes to position themselves where they like in order to get the rewards,
fake the system, freeride, cheat, collude in pools, or set up plenty of
nodes.

Comments below


Le 19/04/2017 à 19:30, David Vorick via bitcoin-dev a écrit :
> On Tue, Apr 18, 2017 at 3:43 AM, Jonas Schnelli <dev@jonasschnelli•ch
> <mailto:dev@jonasschnelli•ch>> wrote:
>
>     I know many torrent clients, and clients for protocols like Tor
>     and i2p include the ability to set both speed limits and monthly
>     bandwidth limits.
>

Yes, that's the easy part; the harder issue is for the network to check
that users have sufficient bandwidth and don't cheat.

> I worry about any type of CDN being a central point of failure.

Of course

>  Torrenting typically relies on a DHT, which is much easier to attack
> than Bitcoin's peer network.

Then please explain how you would attack the bittorrent DHT, and why it's
"much easier" than attacking the btc network today. Bittorrent is not
designed for security/privacy, including its DHT (which, by the way, is
great); it's a common sign of misinformation to conclude that all DHTs are
necessarily insecure.

> I think the bitcoin p2p network is by significant margin the best
> we've got.

The btc network can't be considered a p2p network in its current form, so
it can't be the best one for now (and if it were, we would not be in
today's situation).

>
> Yes, there is finger-print that happens if you have nodes pick an
> index. And the fingerprint gets a lot worse if you have a node pick
> multiple indexes.

This is another problem of your proposal, as is the fingerprinting/tracking
of peers based on what they have.

> Though, isn't it already required that nodes have some sort of IP
> address or hidden service domain? I want to say that the fingerprint
> created by picking an index is not a big deal, because it can be
> separated from activity like transaction relaying and mining. Though,
> I am not certain and perhaps it is a problem.

Are you suggesting that the btc "p2p" network should be using the Tor
network, especially the nodes hosting the/(a part of) the blockchain? This
is of course a very bad idea, and you would not eliminate the tracking
issue. A simple example: despite the size of the network, it's not
difficult to track peers on the bittorrent network; you might not know who
a peer is, but you can follow whatever they are doing, and hiding behind
Tor or a VPN does not prevent this.

>
>
>
> On Mon, Apr 17, 2017 at 6:14 AM, Aymeric Vitte via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org
> <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>
>     While I fully agree with the intent (increasing full nodes so a
>     big miner waking up in a bad mood can't threaten the world any
>     longer every day as it is now) I am not sure to get the interest
>     of this proposal, because:
>
>     - it's probably not a good idea to encourage the home users to run
>     full nodes, there are many people running servers far from their
>     capacity that could easily run efficient full nodes
>
> Running a full node is the only way to avoid needing to trust others.
> It's also how you make your opinion worthwhile for events like hard
> forks and UASFs. If decentralization is the primary motivation, it
> absolutely makes sense to encourage people to run their own full
> nodes. Without a full node, you are at the mercy of the governance
> decisions by those who do have full nodes. But if you have a full
> node, you can chose to opt-out of any upgrade (example: ethereum
> classic nodes).

If you really know the Tor network, then you know why encouraging home
users to run full nodes is probably not a good idea

"Probably" because the situation is not the same for btc and indeed UASF
for example is referring to "users" who today are not really "users" but
intermediate nodes, so the decision finally is not made by the users

>  
>
>     - if someone can't allocate 100 GB today to run a full node, then
>     we can't expect him to allocate more in the future
>
> That's why I'm proposing something to decrease the storage requirements.

This is just delaying the problem. You are proposing to store some parts of
the blockchain without explaining how the peers will first set up their
nodes, and those parts will of course keep growing... falling back into the
very issue that you are trying to address.

>  
>
>     - this proposal is a kind of reinventing torrents, while limiting
>     the number of connections to something not efficient at all, I
>     don't see why something that is proven to be super efficient
>     (torrents) would be needed to be reinvented, I am not saying that
>     it should be used as the bittorrent network is doing but the
>     concepts can be reused
>
> It's different from torrents in that it uses specialized erasure
> coding to make sure that every block is always available, even if an
> adversary is running around targeting all the nodes with a particular
> piece.

You are reinventing something that could be achieved easily using the
concepts of torrents, or incremental ones (i.e. someone would seed the
whole thing, some others parts of it, etc.).

>  
>
>     - I don't get at all the concept of "archival" nodes since it's
>     another useless step toward centralization
>
> "archival" nodes are simply nodes with the full blockchain. Nobody can
> bootstrap on the network without them. Today, every bitcoin-core node
> is an archival node by default.

What I meant is that you can't build a hierarchy of btc nodes (big nodes,
medium nodes, small nodes); each node is free to decide how/if it
participates in the network. So the wording "archival" nodes is not well
suited, since it immediately makes one think of centralized entities, big
organizations hosting the blockchain, etc.

>     I think the only way to increase full nodes it to design an
>     incentive for people to run them
>
> The primary incentive is the sovereignty that it gives you. Running a
> Bitcoin full node gives you security and protection against political
> garbage that you can't get any other way. The network does currently
> depend on altruism to allow people to download the blockchain, but as
> long as we can keep the resource requirements of this altruism low, I
> think we can expect it to continue. This proposal attempts to keep
> those requirements low.

This is the usual answer, but I don't believe it; people will rely on
others to run full nodes and secure them, and so on...

>
>
>
> I believe that my proposal does meet all of the requirements listed by
> Maxwell.

I did not read them again; some others are listed in the link above.

> Having a set of 8 random peers gives you a very high probability of
> being able to recover every single block. You would need to connect to
> at least 5 peers (and this is already >90% likely to be sufficient to
> recover every block), but if you cannot connect to 5 random peers your
> node is probably in trouble anyway. Highly parallel, high speed
> downloads are just as possible with small nodes as with archive nodes.
> It only takes a few bytes to indicate which part of the blockchain you
> have, and any 2 peers have a less than 1% chance of overlapping.

Again, you don't explain how you bootstrap the full nodes first, which is
the main issue. And if the idea is that the pruning nodes will never
desync, then you should try downloading x GB to resync while connecting to
5/8 peers possibly operating from home in different countries.

-- 
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms


[-- Attachment #2: Type: text/html, Size: 19075 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17  7:11 ` Danny Thorpe
  2017-04-17  7:27   ` David Vorick
@ 2017-04-20 15:50   ` Erik Aronesty
  2017-04-20 23:42     ` Aymeric Vitte
  2017-04-21 13:35   ` David Kaufman
  2 siblings, 1 reply; 24+ messages in thread
From: Erik Aronesty @ 2017-04-20 15:50 UTC (permalink / raw)
  To: Danny Thorpe; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 8628 bytes --]

Try to find 1TB dedicated server hosting ...

If you want to set up an ecommerce site somewhere besides your living room,
storage costs are still a concern.

On Mon, Apr 17, 2017 at 3:11 AM, Danny Thorpe via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> 1TB HDD is now available for under $40 USD.  How is the 100GB storage
> requirement preventing anyone from setting up full nodes?
>
> On Apr 16, 2017 11:55 PM, "David Vorick via bitcoin-dev" <
> bitcoin-dev@lists•linuxfoundation.org> wrote:
>
>> [David Vorick's original proposal quoted in full; trimmed -- see the
>> start of this thread]

[-- Attachment #2: Type: text/html, Size: 9997 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-20  9:46     ` Tom Zander
@ 2017-04-20 20:32       ` Andrew Poelstra
  2017-04-21  8:27         ` Tom Zander
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Poelstra @ 2017-04-20 20:32 UTC (permalink / raw)
  To: Tom Zander; +Cc: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 2433 bytes --]

On Thu, Apr 20, 2017 at 11:46:33AM +0200, Tom Zander via bitcoin-dev wrote:
> On Wednesday, 19 April 2017 19:30:30 CEST David Vorick via bitcoin-dev 
> wrote:
> > > I suggested something similar which is a much simpler version;
> > > https://zander.github.io/scaling/Pruning/
> 
> > Your proposal has a significant disadvantage: If every peer is dropping
> > 75% of all blocks randomly, then you need to connect to a large number of
> > peers to download the whole blockchain.
> ...
> > If you are downloading 450,000 blocks, you will need to
> > connect to an expected 46 peers to download the whole blockchain.
> 
> I don’t really see the problem here, even if your math is a bit off.
> (Statistics is difficult, I know). Connecting to many nodes to download
> faster is really not an issue and already happens.
>

I think the expected number of peers is actually ~47.75, which is pretty
close to David's estimate; his error was in the direction that is actually
more favorable to the "everyone stores random blocks" scheme than the
truth.
Even assuming no archival nodes, and all nodes storing only one random
index between 5 and 255 inclusive, the chance of five arbitrary nodes
giving unique indices by chance is about 98.4%. To get the same probability
from a scheme where each peer has only 25% of the blocks, you need to
connect to 59.59 nodes.

This is over a ten-times increase in the number of nodes required to
download the entire chain, and requires participating nodes to use 25%
more space than David's proposal.
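
(The expectation can be checked numerically; a small sketch, summing the
probability that coverage is still incomplete after n peers:)

    B = 450_000  # blocks; each peer holds any given block with probability 0.25

    expected, n, incomplete = 0.0, 0, 1.0
    while incomplete > 1e-12:
        incomplete = 1.0 - (1.0 - 0.75 ** n) ** B
        expected += incomplete
        n += 1
    print(round(expected, 2))  # ~47.75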

> > Your proposal is also a lot less able to handle active adversaries: if
> > nodes are randomly dropping blocks, the probability that one block in
> > particular is dropped by everyone goes up significantly. 
> 
> You make the assumption that this new mode of pruning will be used by 100% 
> of the network, this is not how distributed systems work.
>

Storing random but complete blocks requires the assumption this is _not_ the
case; David's does not make any assumptions. So on top of the performance
considerations there is this potential DoS vector.
 

-- 
Andrew Poelstra
Mathematics Department, Blockstream
Email: apoelstra at wpsoftware.net
Web:   https://www.wpsoftware.net/andrew

"A goose alone, I suppose, can know the loneliness of geese
 who can never find their peace,
 whether north or south or west or east"
       --Joanna Newsom


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-20 15:50   ` Erik Aronesty
@ 2017-04-20 23:42     ` Aymeric Vitte
  0 siblings, 0 replies; 24+ messages in thread
From: Aymeric Vitte @ 2017-04-20 23:42 UTC (permalink / raw)
  To: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 10884 bytes --]

??? what do you mean? (https://www.soyoustart.com/fr/serveurs-essential/)


Le 20/04/2017 à 17:50, Erik Aronesty via bitcoin-dev a écrit :
> Try to find 1TB dedicated server hosting ...
>
> If you want to set up an ecommerce site somewhere besides your living
> room, storage costs are still a concern.
>
> On Mon, Apr 17, 2017 at 3:11 AM, Danny Thorpe via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org
> <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>
>     1TB HDD is now available for under $40 USD.  How is the 100GB
>     storage requirement preventing anyone from setting up full nodes?
>
>     On Apr 16, 2017 11:55 PM, "David Vorick via bitcoin-dev"
>     <bitcoin-dev@lists•linuxfoundation.org
>     <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>
>         [David Vorick's original proposal quoted in full; trimmed -- see
>         the start of this thread]

-- 
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms


[-- Attachment #2: Type: text/html, Size: 19603 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-20 20:32       ` Andrew Poelstra
@ 2017-04-21  8:27         ` Tom Zander
  0 siblings, 0 replies; 24+ messages in thread
From: Tom Zander @ 2017-04-21  8:27 UTC (permalink / raw)
  To: bitcoin-dev

On Thursday, 20 April 2017 22:32:12 CEST Andrew Poelstra wrote:
> > > If you are downloading 450,000 blocks, you will need to
> > > connect to an expected 46 peers to download the whole blockchain.
> > 
> > I don’t really see the problem here, even if your math is a bit off.
> > (Statistics is difficult, I know). Connecting to many nodes to download
> > faster is really not an issue and already happens.
> 
> I think the expected number of peers is actually ~47.75

Nice to join bitcoin-dev, Andrew. Haven’t seen you post here before.

I’m not sure how you reached that strange number, but I have to point out 
your number is quite useless.

The actual number of nodes you need in order to be 100% sure you find all
the blocks, when you know each node holds a completely random 25% of the
blocks, is not a maths problem that leads to a single answer, because of
the randomness involved.
The actual answer is a series of probabilities.

Same as the answer to the age-old question: how many coin flips does it
take to be 100% certain I have at least one “Heads”?

In our block-retrieval scenario: with num-nodes < 4, the probability is
zero. There is a really, really small chance you will get 100% of the
blocks with 4 nodes (the actual figure depends on the total number of
blocks you are looking for).
And this goes up as you add more nodes, but never reaches 100%.

At the other end of this question you can ask what the chance is of at
least one block being lost when there are N nodes: a block nobody has. That
chance is small with the current > 6000 nodes, but not zero (a second
reason why the previous paragraph never reaches 100%).
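
Concretely, a quick sketch of that series of probabilities, assuming
450,000 blocks and peers that each keep a random 25%:

    B = 450_000
    for n in (4, 20, 40, 46, 60, 80):
        print(n, (1 - 0.75 ** n) ** B)
    # ~0 for small n, ~45% at 46 peers, ~99% at 60, approaching but never 100%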

Bottom line: it is silly to assume 100% of the nodes would be
partial-pruning, and if you continue on that path you will only have
probabilities to predict how many nodes it takes to have 100% coverage.
Exact numbers are worse than useless; they are misleading.

As I said in my initial email, statistics is hard. Crypto is much easier in 
that it is absolute. Either correct or false. Never in between.

To repeat, the goal of this pruning method is not to replace a full
“archival” node; the goal is to provide an improvement over the current
pruning node, which stops any and all serving of historical blocks.
Anyone who feels the need to talk about pruning modes as if 100% of the
full nodes will run them is in actual fact not talking about the real
world. Distributed systems will never (and should never) end up being a
monoculture. Diversity is the essential thing you aim for.

I would suggest we focus on the real world and not on irrelevant maths
experiments that only lead to confusion.
-- 
Tom Zander
Blog: https://zander.github.io
Vlog: https://vimeo.com/channels/tomscryptochannel


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17  7:11 ` Danny Thorpe
  2017-04-17  7:27   ` David Vorick
  2017-04-20 15:50   ` Erik Aronesty
@ 2017-04-21 13:35   ` David Kaufman
  2017-04-21 15:58     ` Leandro Coutinho
  2 siblings, 1 reply; 24+ messages in thread
From: David Kaufman @ 2017-04-21 13:35 UTC (permalink / raw)
  To: Danny Thorpe; +Cc: Bitcoin Dev

Hi Danny,

On Mon, Apr 17, 2017 at 3:11 AM, Danny Thorpe wrote:
>
> 1TB HDD is now available for under $40 USD.  How is the 100GB storage
> requirement preventing anyone from setting up full nodes?

Yeah, but that's because most people (well, using myself as the
"target market" anyway) are upgrading to SSDs for the faster boot and
response times.  Modern consumer OSes run incredibly slowly on
non-SSD drives!  And since the vast majority of consumer laptops sold
today fall into the $400 to $700 range, a 200 - 500GB SSD is about the
most storage upgrade people can afford.

And so I think David's premise, that having to devote only 30GB to
running a full node instead of 100, would remove a major obstacle that
prevents many more people running full bitcoin nodes.

My only question is, does it scale?  I mean, if the bitcoin network
volume grows exponentially and in 2 years the blockchain is 500GB, can
the "small node" be adjusted down from one fifth of the blockchain to
just one-tenth, or one-twentieth?  Can different smallnesses
interoperate? Can I choose to store a small node with 20 - 30% of the
blockchain, while others choose to share just 5% or 10% of it? Can I run
a "less small" node today that's 50GB?

Can the default install be a "small node" that requires about 30GB of
storage (if that is indeed the sweet spot for enticing many more users to
bringing nodes online), but allow the user at install time, to choose *how*
small? To, say, drag a slider anywhere up and down the range from
10GB to 100GB?

If not, then it will have to be revisited constantly as the blockchain
grows, and disk storage prices drop.  I suspect the blockchain will
grow in size, at some point in the not too distant future, much faster
than storage prices drop, so making small, smaller and smallest nodes
that can be configured to store more or less of it will be necessary
to motivate most users to run nodes at all.  But when that happens,
there is likely to be exponentially *more* people using bitcoin, too!
So an exponentially growing number of users running (smaller and
smaller) nodes would take up the slack.

Then, the blockchain would begin to look a lot more like a bittorrent,
right? ;-) but -- happily -- one that you never need to download fully.

-dave


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-21 13:35   ` David Kaufman
@ 2017-04-21 15:58     ` Leandro Coutinho
  0 siblings, 0 replies; 24+ messages in thread
From: Leandro Coutinho @ 2017-04-21 15:58 UTC (permalink / raw)
  To: David Kaufman; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 3628 bytes --]

Maybe it already exists ...

#9484 (https://github.com/bitcoin/bitcoin/pull/9484, commit 812714f):
Introduce assumevalid setting to skip validation of presumed-valid
scripts (gmaxwell)

..., but ...
It would be very interesting if a new node could decide to be a pruned node:
  - it would need to trust one or more peers for the initial blockchain
download, because the blocks downloaded would not be validated
  - it would decide a time from when to get the blocks, like a week before
  - once a day a routine would run that would prune blocks older than the
chosen time

"

*The unspent transaction outputs (which is the only essential piece ofdata
necessary for validation) are already kept in a separate database,so
technically removing old blocks is perfectly possible.*" Pieter Wuille
https://bitcoin.stackexchange.com/questions/11170/why-is-pruning-not-considered-already-at-the-moment
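
For reference, both knobs exist in bitcoin.conf today; a sketch (the block
hash is a placeholder, not a real value):

    prune=20000        # keep roughly the most recent 20 GB of blocks (value in MiB)
    assumevalid=<hash> # assume ancestors of this block have valid scripts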


On Fri, Apr 21, 2017 at 10:35 AM, David Kaufman via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> [David Kaufman's message quoted in full; trimmed -- see above]

[-- Attachment #2: Type: text/html, Size: 4702 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-17  6:54 [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes David Vorick
                   ` (3 preceding siblings ...)
  2017-04-18 10:50 ` Tom Zander
@ 2017-04-21 20:38 ` Gregory Maxwell
  2017-04-23 16:27   ` Aymeric Vitte
  2017-05-03 14:03   ` Erik Aronesty
  4 siblings, 2 replies; 24+ messages in thread
From: Gregory Maxwell @ 2017-04-21 20:38 UTC (permalink / raw)
  To: David Vorick; +Cc: Bitcoin Dev

On Mon, Apr 17, 2017 at 6:54 AM, David Vorick via bitcoin-dev
<bitcoin-dev@lists•linuxfoundation.org> wrote:
> Rationale:
>
> A node that stores the full blockchain (I will use the term archival node)
> requires over 100GB of disk space, which I believe is one of the most
> significant barriers to more people running full nodes. And I believe the
> ecosystem would benefit substantially if more users were running full nodes.

Hi David,

I've been thinking about this subject for a long time (Tier Nolan
linked some of the threads; see also the coloration part of
https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding) and
about using FEC with the history for the last year or so. (We use FEC
in practice in FIBRE for relay at the tip now, and that has a design
history going back to 2013.)

As Jonas points out, we're all set to also offer the 'recent blocks'
separately, so that is obviously going to happen and will help. The
free design parameter is the definition of recent, but we have good
measurements for setting that now. But about history...

I think I might have designed myself into a corner and perhaps you've
shown a way out-- I think there are some serious limits in your
proposal but perhaps we can fix them.  Let me give you what I had been
thinking about, then hand you at least a couple improvements to your
thinking:

As you hopefully now know (per Tier Nolan's post): I'd previously been
thinking about nodes keeping a deterministic random, independently
selected subset which is self-leveling based on a small seed.  The
connection overhead can be mitigated by working with chunks of blocks
rather than single blocks.

But as you've observed, the failure probabilities are rather high,
especially if an active attacker targets nodes carrying less commonly
available blocks.  So I thought, okay we can use FEC to help improve
the recovery odds.

So I considered using a sparse random code (e.g. an LDPC erasure code
like in RFC 5170). The advantage of these codes is that they are very
fast to decode and they support enormous codewords, so you can have a
high probability of every peer having a unique ID; there would
effectively never need to be any duplication.

But then I ran into the problem that I had no reasonable way to
recover from bad data, short of pulling a single group from an archive
then banning whatever peers gave you bad chunks.

So that is kind of where I got stuck.

I didn't even consider the advantage of only being able to use a few
peers total, as I still assumed that it would be doing the random
groups thing as well. That's a great point.

So on your design:

Being able to have a lower bound of 20% seems like a serious
limitation to me: it would be fine today, but the chain will quite
possibly be twice the current size in a year... and in four years
we're back to where we are now. What I'd been thinking about would be
able to scale arbitrarily low.

20% is small, but is it small enough? -- today that would get us back
into common SSDs being able to hold the whole chain, but that property
will probably be lost again in a couple years. I think with any fixed
fraction we'll constantly be fighting against the opportunity cost of
upgrading storage, if not the cost of the storage itself. (and I agree
this is the vast majority of the response from actual users,  sync
time and ongoing bandwidth usage seeming to tie for second; the latter
of which can be mitigated in other ways but for the former see my
remarks at the end)

The fact that having only a few required blocks lets you brute-force
out the decode is a great point.  I hadn't considered that, and the
schemes I'd been considering are not communications-efficient with
only a few blocks required, e.g. they sometimes require a couple extra
blocks to successfully decode, which is a lot of overhead if you're
only splitting 5 ways.

> I also believe that alternative erasure coding schemes exist which actually
> are able to identify the bad pieces given sufficient good pieces, however I
> don't know if they have the same computational performance as the best
> Reed-Solomon coding implementations.

Actually RS codes are _the_ codes you want to use for decoding with
errors, but the 'computationally efficient' error-correcting decoding
(which is itself heinously slow) requires data + 2*errors blocks to
decode. Not so useful if you're only looking at 5 parts.

RS decoding is pretty slow generally, esp. compared to binary sparse
codes.  We were unable to make RS coding make sense for use in fast
block relay for this reason-- decoding time bottlenecked
reconstruction. The most highly optimized RS codes make a special
optimization which is not useful for your proposal: they're much
faster to decode if you're decoding from the first couple correction
blocks.  Without these optimizations speeds drop from 1GB/s to more
like 100MB/s or worse.  Though I suppose with assumevalid, initial sync
now is pretty CPU-idle on most hardware.  (If 100MB/s sounds fast,
keep in mind that time spent decoding is time that can't be spent
hashing/verifying/etc., and on a lot of hardware with fast broadband,
with signature validation enabled we're CPU-bound already.)

> Another option would be to have the small nodes store a cryptographic
> checksum of each piece. Obtaining the cryptographic checksum for all 256
> pieces would incur a nontrivial amount of hashing (post segwit, as much as
> 100MB of extra hashing per block), and would require an additional ~4kb of
> storage per block. The hashing overhead here may be prohibitive.

This was a complete non-starter when thinking about using these sparse
codes where the number of possible blocks is effectively unlimited.

Your 4kb assumption isn't correct though. How you do it is that after
downloading a block you compute the 255 fragments (as an aside, you're
saying 256, but the most you can get is 255 for an 8-bit RS code).
Then you compute a hashtree over them. Every node remembers the root
and the membership proof for its chunk(s)... this is 256 bytes of
extra storage.
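
A sketch of that tree (my illustration of the idea; 255 leaves pad up to
256, so a membership path is 8 hashes, 8 * 32 = 256 bytes):

    import hashlib

    def h(x: bytes) -> bytes:
        return hashlib.sha256(x).digest()

    def root_and_path(fragments, index):
        # Hash all 255 fragments, then fold up a binary Merkle tree,
        # collecting the sibling of our fragment's node at each level.
        layer, path = [h(f) for f in fragments], []
        while len(layer) > 1:
            if len(layer) % 2:
                layer.append(layer[-1])  # duplicate the odd node out
            path.append(layer[index ^ 1])
            layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
            index //= 2
        return layer[0], path  # 32-byte root plus an 8-hash membership path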

When you decode, you use a majority to decide which root you are trying
to decode. If it fails to result in valid blocks, you blacklist that
root, ban all those peers, and try the next.  Worst-case cost is then
proportional to the number of invalid roots rather than to (number of
peers choose 5).

I'm not sure if the combinatorial decode or the above would be easier to implement.

> This represents a non-trivial amount of code, but I believe that the result
> would be a non-trivial increase in the percentage of users running full
> nodes, and a healthier overall network.

The RS coder stuff is small, even doing the fancy w/ error decodes it
isn't huge. I expect complexity mostly will show up in the garbage
input handling.
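
For a sense of scale, the coding call itself is tiny with an off-the-shelf
systematic RS erasure library. A sketch using the zfec package (my choice
for illustration only; any k-of-n RS library would do), with the
parameters from David's proposal:

    import zfec

    K, M = 5, 255  # any 5 of 255 shares reconstruct the block

    def encode_block(block: bytes):
        pad = -len(block) % K
        padded = block + bytes(pad)  # zfec wants K equal-size input chunks
        size = len(padded) // K
        chunks = [padded[i * size:(i + 1) * size] for i in range(K)]
        # Systematic code: shares 0-4 are the chunks themselves,
        # matching the 'indexes 0-4 are archival' property.
        return zfec.Encoder(K, M).encode(chunks)

    def decode_block(shares, share_ids, orig_len):
        chunks = zfec.Decoder(K, M).decode(list(shares), list(share_ids))
        return b"".join(chunks)[:orig_len]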

A couple other thoughts:

The coded blocks are not useful for things like bloom scanning or
other lookup.  With the committed filter proposals you could still
keep the committed filters (even 5 way shared themselves...) so
perhaps not that much of a concern.

Coding the blocks will make their encoding normative. The current P2P
encoding we use is by no means the most efficient: Pieter and I have an
encoding that works on a transaction-by-transaction basis and
decreases the size of the whole chain by ~28%, and block-wide entropy
encoding could reduce the size even further.  We'd hoped to at least
start using this per-transaction coding locally, converting on the fly
to the original serialization where needed (the idea would be to
eventually support the compacted serialization on the wire too).
Maybe the answer there is to just get in whatever improvements we
think are reasonable before doing the coded block.

I think the 20% floor is still too high, and too many nodes will be
forced to just do the recent-blocks thing, perhaps not today, but
eventually.   I suppose we could just assume that the block is split
10 ways, and the default is two indexes, but there is flexibility to
go down to one in the future, at the cost of needing more peers...
could run numbers on the decode failure odds for the 10-of-255 case...
But that would only improve it to 10%.   I suppose the proposal could
be combined with sparse chain storage like I was thinking years ago,
but much less sparsity would be needed, so the downsides would be less
severe.

E.g. if you want to store <10% of the chain you'd keep some 10% blocks
for random groups.  Such a feature could be introduced later, when it
becomes more important to keep less than 10 or 20 percent.

Recovery odds could be improved with a second level of coding. E.g. if
your ID was not a constant but a seed into PRNG(height)%250+5, and
you also had a fraction of nodes store the XOR between adjacent pairs,
then you could fill in unrecoverable groups (the limit of this thinking
is exactly the sparse random code ECC schemes). Probably not worth the
complexity, but just a thing to keep in mind...
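
(A sketch of the height-dependent index idea, purely illustrative:)

    import hashlib

    def index_for(seed: bytes, height: int) -> int:
        # Deterministic per node, but varying per block height.
        dig = hashlib.sha256(seed + height.to_bytes(4, "little")).digest()
        return int.from_bytes(dig[:4], "little") % 250 + 5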

Probably the next step is to do some more focused benchmarks. I have
some code I can recommend offline.

I can't help but feel that this might be a little bit of a waste of
time. I think the long-term survival of the system is likely going to
ultimately depend on doing an assumevalid-like gated sync of a UTXO
set.  Ethereum already does something far more reckless (they let
miners pick the state and blindly trust it, with almost no depth
required) and no one cares.  If that is done then all these concerns
are greatly reduced, along with the 100(+++?)GB/history-year transfer
costs.  You'd still want to have archives, but do you care about 20%
archives?
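
For reference, the script-validation shortcut that already exists is
Bitcoin Core's -assumevalid option (0.14+); the UTXO-set variant
discussed here would go further and gate the history download itself.
Roughly (the hash is a placeholder, and the second part is hypothetical):

    # Today: skip script checks for blocks buried under a known-good block.
    bitcoind -assumevalid=<blockhash>
    # Hypothetical UTXO-set analogue: start from a committed UTXO snapshot
    # and skip fetching/validating the history below it.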

Replying to some other comments in the thread:

> A user may not want to run a node at home, but rather on a digital ocean or AWS server

This is almost, if not quite, entirely pointless. There are already many
nodes on AWS and DO; adding more does not improve your security (you
must trust DO/AWS), does not improve your reliability (again, DO/AWS),
does not improve the network's capacity, etc.  About the only arguable
benefit I can see there, beyond making more money for these hosts, is
feel-good points for (incorrectly) thinking you're helping the
network, and negligibly reducing the density of spy nodes (but wait:
AWS/DO may well just be spying on your connections too...). And it
isn't like fast storage on these services is cheap.

> Perhaps it is not, but I would think that it would be pretty straightforward to configure a global bandwidth limit within Bitcoin.

We have outbound transmission limits, though not by default. Setting
them sensibly is something of a black art. Obviously these will
improve over time... and they are more reasons that I agree with your
relative cost assumptions, except for the sync-time part.
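
(The limit in question is Bitcoin Core's -maxuploadtarget; e.g. in
bitcoin.conf, with the value in MiB per 24-hour window. Exact semantics
may vary by version:)

    # Try to keep outbound traffic under ~5 GiB/day (0 = no limit, the default):
    maxuploadtarget=5000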

> Fingerprinting?

Unavoidable, but should be made inconsequential by making transaction
announcement more private independent of any of this. There are
already _MANY_ ways to fingerprint nodes, please don't think that
existing software has any real immunity here. We avoid adding new
fingerprinting where we can... but they're very fingerprintable
already. Transaction announcement privacy MUST be fixed.  I assume if
any of this is worth doing, it will also be worth the increase in
fingerprinting. And at least this proposal would 'only' create 255
node classes.

> SPV?

As I pointed out above, Vorick's idea is totally compatible with
committed filters, and the filters can even be shared like the blocks
are.

> [People who fail at math]

The SNR on this list really sucks.  If you aren't spending a couple
hours thinking about your responses or at least citing research that
took days of effort or more then you are probably wasting people's
time. Please respect the other users of the list.

> Could there be a slider?

Absolutely. I assume that if Vorick's proposal were implemented, nodes
would have the following options: Pruned [UTXO + recent two weeks of
blocks], 20%, 40%, 60%, 80%, 100% (archive).
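
A sketch of how such a slider might map to node behavior (entirely
hypothetical; none of these knobs exist in Bitcoin Core today):

    # The UTXO set and recent blocks are always kept; 'history_share' is
    # the fraction of the coded history this node stores and serves.
    STORAGE_MODES = {
        "pruned":  {"recent_weeks": 2,    "history_share": 0.0},
        "small":   {"recent_weeks": 2,    "history_share": 0.2},
        # ... 40%, 60%, 80% settings would be analogous ...
        "archive": {"recent_weeks": None, "history_share": 1.0},
    }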


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-21 20:38 ` Gregory Maxwell
@ 2017-04-23 16:27   ` Aymeric Vitte
  2017-05-03 14:03   ` Erik Aronesty
  1 sibling, 0 replies; 24+ messages in thread
From: Aymeric Vitte @ 2017-04-23 16:27 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Dev

"Absolutely, I assume if Vorick's proposal were implemented that nodes
would have the follow options: Pruned [UTXO + recent two weeks of
blocks], 20%, 40%, 60%, 80%, 100% (archive)."

Yes, and I think that they could hold this "out of order" (for example,
only 20% of the blocks, taken from the middle of the blockchain).

There are many reasons why I dislike David's proposal as it stands. You
mention some below: why 20%? Too much? Too small? What will happen
tomorrow? Etc. I gave others, and this still does not explain why people
should go for more than two weeks of blocks.

Maybe what I suggested here could be considered again:
https://gist.github.com/Ayms/aab6f8e08fef0792ab3448f542a826bf#proposal

This is just a suggestion based on incremental "torrents". Fortunately
it comes from much more than days of work, as you require below, and is
the concatenation of thoughts from different projects/studies.

It does follow your 8 rules (although I am not sure what you mean in
rule 2, "The decision to contact a node should need O(1) communications";
would a 1 M limit work?) and proposes others, and it solves the issues
mentioned below, or please, someone tell me why it does not (I know
people here dislike DHTs, though I am not sure why).

Except for fingerprinting (and I don't know exactly what the SPV issue
is), but it is still better: if the nodes behave like the groups with
the most numerous peers (i.e. the group seeding 20%, or the one seeding
40%, or the one seeding about nothing (sic... subliminal message
here...), etc.), then they just can't be fingerprinted (at least not
based on this feature).

I think the main concepts are detailed enough, but maybe that's not the
case; it is really a draft. If you look closely, the pruning case is
considered but not explained, like some other points, because continuing
this work with no incentive solution for the seeders looks useless to
me.


On 21/04/2017 at 22:38, Gregory Maxwell via bitcoin-dev wrote:
> On Mon, Apr 17, 2017 at 6:54 AM, David Vorick via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org> wrote:
>> Rationale:
>>
>> A node that stores the full blockchain (I will use the term archival node)
>> requires over 100GB of disk space, which I believe is one of the most
>> significant barriers to more people running full nodes. And I believe the
>> ecosystem would benefit substantially if more users were running full nodes.
> Hi David,
>
> I've been thinking about this subject for a long time (Tier Nolan
> linked some of the threads; see also the coloration part of
> https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding) and
> about using FEC with the history for the last year or so. (We use FEC
> in practice in FIBRE for relay at the tip now, and that has a design
> history going back to 2013.)
>
> As Jonas points out, we're all set to also offer the 'recent blocks'
> separately, so that is obviously going to happen and will help. The
> free design parameter is the definition of recent, but we have good
> measurements for setting that now. But about history...
>
> I think I might have designed myself into a corner and perhaps you've
> shown a way out-- I think there are some serious limits in your
> proposal but perhaps we can fix them.  Let me give you what I had been
> thinking about, then hand you at least a couple improvements to your
> thinking:
>
> As you hopefully now know (per Tier Nolan's post): I'd previously been
> thinking about nodes keeping a deterministic random, independently
> selected subset which is self-leveling based on a small seed.   The
> connection overhead can be mitigated by working with chunks of
> blocks rather than single blocks.
>
> But as you've observed, the failure probabilities are rather high,
> especially if an active attacker targets nodes carrying less commonly
> available blocks.  So I thought, okay we can use FEC to help improve
> the recovery odds.
>
> So I considered using a sparse random code (e.g. an LDPC erasure code
> like in RFC 5170); the advantage of these codes is that they are very
> fast to decode and they support having enormous codewords, so you can
> have a high probability of every peer having a unique ID, and there
> would effectively never need to be any duplication.
>
> But then I ran into the problem that I had no reasonable way to
> recover from bad data, short of pulling a single group from an archive
> then banning whatever peers gave you bad chunks.
>
> So that is kind of where I got stuck.
>
> I didn't even consider the advantage of being able to use only a few
> peers total, as I still assumed that it would be doing the random
> groups thing as well. That's a great point.
>
> So on your design:
>
> Being able to have a lower bound of 20% seems like a serious
> limitation to me: it would be fine today, but the chain will quite
> possibly be twice the current size in a year.... and in four years
> we're back to where we are now. What I'd been thinking of would be
> able to scale arbitrarily low.
>
> 20% is small, but is it small enough? -- today that would get us back
> into common SSDs being able to hold the whole chain, but that property
> will probably be lost again in a couple years. I think with any fixed
> fraction we'll constantly be fighting against the opportunity cost of
> upgrading storage, if not the cost of the storage itself. (and I agree
> this is the vast majority of the response from actual users,  sync
> time and ongoing bandwidth usage seeming to tie for second; the latter
> of which can be mitigated in other ways but for the former see my
> remarks at the end)
>
> The fact that needing only a few blocks lets you brute
> force out the decode is a great point.  I hadn't considered that, and
> the schemes I'd been considering are not communications-efficient with
> only a few blocks required, e.g. they sometimes require a couple extra
> blocks to successfully decode, which is a lot of overhead if you're
> only splitting 5 ways.
>
>> I also believe that alternative erasure coding schemes exist which actually
>> are able to identify the bad pieces given sufficient good pieces, however I
>> don't know if they have the same computational performance as the best
>> Reed-Solomon coding implementations.
> Actually RS codes are _the_ codes you want to use for decoding
> with errors, but the 'computationally efficient' error-correcting
> decoding (which is itself heinously slow) requires 2*errors + data
> number of blocks to decode. Not so useful if you're only looking at 5
> parts.
>
> RS decoding is pretty slow generally, esp compared to binary sparse
> codes.  We were unable to make RS coding make sense for use in fast
> block relay for this reason: decoding time bottlenecked
> reconstruction. The most highly optimized RS codes make a special
> optimization which is not useful for your proposal: they're much
> faster to decode if you're decoding from the first couple correction
> blocks.  Without these optimizations, speeds drop from 1GB/s to more
> like 100MB/s or worse.  Though I suppose with assumevalid, initial sync
> is now pretty cpu-idle on most hardware.  (If 100MB/s sounds fast,
> keep in mind that time spent decoding is time that can't be spent
> hashing/verifying/etc., and on a lot of hardware with fast broadband,
> with signature validation enabled, we're cpu bound already.)
>
>> Another option would be to have the small nodes store a cryptographic
>> checksum of each piece. Obtaining the cryptographic checksum for all 256
>> pieces would incur a nontrivial amount of hashing (post segwit, as much as
>> 100MB of extra hashing per block), and would require an additional ~4kb of
>> storage per block. The hashing overhead here may be prohibitive.
> This was a complete non-starter when thinking about using these sparse
> codes where the number of possible blocks is effectively unlimited.
>
> Your 4kb assumption isn't correct though: how you do it is that after
> downloading a block you compute the 255 fragments (as an aside, you're
> saying 256, but the most you can get is 255 for an 8-bit RS code).
> Then you compute a hashtree over them. Every node remembers the root,
> and the membership proofs for their chunk(s)... this is 256 bytes of
> extra storage.
>
> When you decode you use a majority to decide which root you are trying
> to decode. If it fails to result in valid blocks, you blacklist that
> root, ban all the peers that served it, and try the next.  Worst-case
> cost is the number of invalid roots rather than (peers choose 5)
> combinations.
>
> I'm not sure whether the combinatorial decode or the above would be easier to implement.
>
>> This represents a non-trivial amount of code, but I believe that the result
>> would be a non-trivial increase in the percentage of users running full
>> nodes, and a healthier overall network.
> The RS coder stuff is small, even doing the fancy decode-with-errors it
> isn't huge. I expect the complexity will mostly show up in the garbage
> input handling.
>
> A couple other thoughts:
>
> The coded blocks are not useful for things like bloom scanning or
> other lookup.  With the committed filter proposals you could still
> keep the committed filters (even 5 way shared themselves...) so
> perhaps not that much of a concern.
>
> Coding the blocks will make their encoding normative. The current P2P
> encoding we use is by no means the most efficient: Pieter and I have an
> encoding that works on a transaction-by-transaction basis and
> decreases the size of the whole chain by ~28%, and block-wide entropy
> encoding could reduce the size even further.  We'd hoped to at least
> start using this per-transaction coding locally, converting on the fly
> to the original serialization where needed (the idea would be to
> eventually support the compacted serialization on the wire too).
> Maybe the answer there is to just get in whatever improvements we
> think are reasonable before doing the coded block.
>
> I think the 20% floor is still too high, and too many nodes will be
> forced to just do the recent-blocks thing; perhaps not today, but
> eventually.  I suppose we could just assume that the block is split
> 10 ways, with a default of two indexes but the flexibility to go down
> to one in the future, at the cost of needing more peers... one could
> run numbers on the decode failure odds for the 10-of-255 case... But
> that would only improve it to 10%.  I suppose the proposal could be
> combined with sparse chain storage like I was thinking about years
> ago, but much less sparsity would be needed, so the downsides would be
> less severe.
>
> E.g. if you want to store <10% of the chain you'd keep 10% shares for
> only some random groups.  Such a feature could be introduced later,
> when keeping less than 10 or 20 percent becomes more important.
>
> Recovery odds could be improved with a second level of coding. E.g. if
> your ID were not a constant but a seed into PRNG(height)%250+5, and you
> also had a fraction of nodes store the XOR between adjacent pairs, then
> you could fill in unrecoverable groups. (The limit of this line of
> thinking is exactly the sparse random code ECC schemes.) Probably not
> worth the complexity, but just a thing to keep in mind...
>
> Probably the next step is to do some more focused benchmarks. I have
> some code I can recommend offline.
>
> I can't help but feel that this might be a little bit of a waste of
> time. I think the long-term survival of the system is likely going to
> ultimately depend on doing an assumevalid-like gated sync of a UTXO
> set.  Ethereum already does something far more reckless (they let
> miners pick the state and blindly trust it, with almost no depth
> required) and no one cares.  If that is done then all these concerns
> are greatly reduced, along with the 100(+++?)GB/history-year transfer
> costs.  You'd still want to have archives, but do you care about 20%
> archives?
>
> Replying to some other comments in the thread:
>
>> A user may not want to run a node at home, but rather on a digital ocean or AWS server
> This is almost, if not quite, entirely pointless. There are already many
> nodes on AWS and DO; adding more does not improve your security (you
> must trust DO/AWS), does not improve your reliability (again, DO/AWS),
> does not improve the network's capacity, etc.  About the only arguable
> benefit I can see there, beyond making more money for these hosts, is
> feel-good points for (incorrectly) thinking you're helping the
> network, and negligibly reducing the density of spy nodes (but wait:
> AWS/DO may well just be spying on your connections too...). And it
> isn't like fast storage on these services is cheap.
>
>> Perhaps it is not, but I would think that it would be pretty straightforward to configure a global bandwidth limit within Bitcoin.
> We have outbound transmission limits, though not by default. Setting
> them sensibly is something of a black art. Obviously these will
> improve over time... and they are more reasons that I agree with your
> relative cost assumptions, except for the sync-time part.
>
>> Fingerprinting?
> Unavoidable, but should be made inconsequential by making transaction
> announcement more private independent of any of this. There are
> already _MANY_ ways to fingerprint nodes, please don't think that
> existing software has any real immunity here. We avoid adding new
> fingerprinting where we can... but they're very fingerprintable
> already. Transaction announcement privacy MUST be fixed.  I assume if
> any of this is worth doing, it will also be worth the increase in
> fingerprinting. And at least this proposal would 'only' create 255
> node classes.
>
>> SPV?
> As I pointed out above, Vorick's idea is totally compatible with
> committed filters, and the filters can even be shared like the blocks
> are.
>
>> [People who fail at math]
> The SNR on this list really sucks.  If you aren't spending a couple
> hours thinking about your responses or at least citing research that
> took days of effort or more then you are probably wasting people's
> time. Please respect the other users of the list.
>
>> Could there be a slider?
> Absolutely. I assume that if Vorick's proposal were implemented, nodes
> would have the following options: Pruned [UTXO + recent two weeks of
> blocks], 20%, 40%, 60%, 80%, 100% (archive).
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists•linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

-- 
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-04-21 20:38 ` Gregory Maxwell
  2017-04-23 16:27   ` Aymeric Vitte
@ 2017-05-03 14:03   ` Erik Aronesty
  2017-05-03 19:10     ` Natanael
  1 sibling, 1 reply; 24+ messages in thread
From: Erik Aronesty @ 2017-05-03 14:03 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]

> But as you've observed, the failure probabilities are rather high,
> especially if an active attacker targets nodes carrying less commonly
> available blocks.

Wouldn't the solution be for nodes to use whatever mechanism an attacker
uses to determine less commonly available blocks and choose to store a
random percentage of them as well as their deterministic random set?

I.e., X blocks at the end of the chain (SPV bootstrap), a Y% deterministic
random set, and a Z% patch/fill set to deter attacks.
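
A rough sketch of that selection rule (purely illustrative; the names,
fractions, and hashing scheme are mine):

    import hashlib, random

    def keep_block(height, pub_seed, det_frac=0.20, fill_frac=0.05):
        # Deterministic pseudo-random set derived from a public seed, plus
        # a private, truly random "fill" set to frustrate attacks that
        # target whichever blocks the deterministic rule leaves rare.
        digest = hashlib.sha256(f"{pub_seed}:{height}".encode()).digest()
        deterministic = int.from_bytes(digest[:8], "big") < det_frac * 2**64
        return deterministic or random.random() < fill_frac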

[-- Attachment #2: Type: text/html, Size: 592 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-05-03 14:03   ` Erik Aronesty
@ 2017-05-03 19:10     ` Natanael
  2017-05-03 22:45       ` Aymeric Vitte
  0 siblings, 1 reply; 24+ messages in thread
From: Natanael @ 2017-05-03 19:10 UTC (permalink / raw)
  To: Erik Aronesty; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 887 bytes --]

On 3 May 2017 at 16:05, "Erik Aronesty via bitcoin-dev" <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> But as you've observed, the failure probabilities are rather high,
> especially if an active attacker targets nodes carrying less commonly
> available blocks.

Wouldn't the solution be for nodes to use whatever mechanism an attacker
uses to determine less commonly available blocks and choose to store a
random percentage of them as well as their deterministic random set?

I.e., X blocks at the end of the chain (SPV bootstrap), a Y% deterministic
random set, and a Z% patch/fill set to deter attacks.


Then the attacker uses Sybil attacks to obscure what's actually rare and
what's not. Even proof of storage isn't enough; you need proof of
INDEPENDENT storage, which is essentially impossible, as well as a way of
determining which nodes are run by the same people (all the AWS nodes
should essentially count as one).

[-- Attachment #2: Type: text/html, Size: 1635 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes
  2017-05-03 19:10     ` Natanael
@ 2017-05-03 22:45       ` Aymeric Vitte
  0 siblings, 0 replies; 24+ messages in thread
From: Aymeric Vitte @ 2017-05-03 22:45 UTC (permalink / raw)
  To: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]



On 03/05/2017 at 21:10, Natanael via bitcoin-dev wrote:
>
> On 3 May 2017 at 16:05, "Erik Aronesty via bitcoin-dev"
> <bitcoin-dev@lists•linuxfoundation.org> wrote:
>
>     > But as you've observed, the failure probabilities are rather high,
>     > especially if an active attacker targets nodes carrying less
>     commonly
>     > available blocks. 
>
>     Wouldn't the solution be for nodes to use whatever mechanism an
>     attacker uses to determine less commonly available blocks and
>     choose to store a random percentage of them as well as their
>     deterministic random set? 
>
>     I.e., X blocks at the end of the chain (SPV bootstrap), a Y%
>     deterministic random set, and a Z% patch/fill set to deter attacks
>
>
> Then the attacker uses Sybil attacks to obscure what's actually rare and
> what's not.

> Even proof of storage isn't enough; you need proof of INDEPENDENT storage

Yes

> , which is essentially impossible

No, the BitTorrent network is a good example.

> , as well as a way of determining which nodes are run by the same
> people (all the AWS nodes should essentially count as one).

No, this one is impossible, and in fact you don't care: as long as the
system forbids nodes from positioning themselves where they like and can
check that the nodes are behaving correctly, nodes/IPs run by the same
people would just do the job.

And if you add to this a reward system that is not necessarily
profitable (like the "tip" proposal today), then you eliminate the
incentive for Sybil-attacking the network while motivating those who
have the resources to run full nodes, thereby increasing independence.


>
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists•linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

-- 
Zcash wallets made simple: https://github.com/Ayms/zcash-wallets
Bitcoin wallets made simple: https://github.com/Ayms/bitcoin-wallets
Get the torrent dynamic blocklist: http://peersm.com/getblocklist
Check the 10 M passwords list: http://peersm.com/findmyass
Anti-spies and private torrents, dynamic blocklist: http://torrent-live.org
Peersm : http://www.peersm.com
torrent-live: https://github.com/Ayms/torrent-live
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms


[-- Attachment #2: Type: text/html, Size: 5648 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-05-03 22:45 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-17  6:54 [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes David Vorick
2017-04-17  7:11 ` Danny Thorpe
2017-04-17  7:27   ` David Vorick
2017-04-20 15:50   ` Erik Aronesty
2017-04-20 23:42     ` Aymeric Vitte
2017-04-21 13:35   ` David Kaufman
2017-04-21 15:58     ` Leandro Coutinho
2017-04-17 10:14 ` Aymeric Vitte
2017-04-19 17:30   ` David Vorick
2017-04-20  9:46     ` Tom Zander
2017-04-20 20:32       ` Andrew Poelstra
2017-04-21  8:27         ` Tom Zander
2017-04-20 11:27     ` Aymeric Vitte
2017-04-18  7:43 ` Jonas Schnelli
2017-04-18 10:50 ` Tom Zander
2017-04-18 13:07   ` Tier Nolan
2017-04-18 23:19     ` Aymeric Vitte
2017-04-19  4:28       ` udevNull
2017-04-19 13:47         ` Angel Leon
2017-04-21 20:38 ` Gregory Maxwell
2017-04-23 16:27   ` Aymeric Vitte
2017-05-03 14:03   ` Erik Aronesty
2017-05-03 19:10     ` Natanael
2017-05-03 22:45       ` Aymeric Vitte

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox