public inbox for bitcoindev@googlegroups.com
* [Bitcoin-development] Proposed additional options for pruned nodes
       [not found] <CANJO25J1WRHtfQLVXUB2s_sjj39pTPWmixAcXNJ3t-5os8RPmQ@mail.gmail.com>
@ 2015-05-12 15:26 ` gabe appleton
  2015-05-12 16:05   ` Jeff Garzik
  0 siblings, 1 reply; 19+ messages in thread
From: gabe appleton @ 2015-05-12 15:26 UTC (permalink / raw)
  To: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 1415 bytes --]

Hi,

There's been a lot of talk in the rest of the community about how the 20MB
step would increase storage needs, and that switching to pruned nodes
(partially) would reduce network security. I think I may have a solution.

There could be a hybrid option in nodes. Selecting this would do the
following:

1. Flip the --no-wallet toggle
2. Select a section of the blockchain to store fully (percentage based,
   possibly on hash % sections?)
3. Begin pruning all sections not included in 2
The idea is that this could be implemented similarly to a Koorde, in that
the network decides which sections each node retrieves. So if the user
prompts it to store 50% of the blockchain, it would look at its peers, and
at their peers (if that can be done securely), and choose the sections that
occur least often among them.
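
A rough sketch of that least-occurring selection step (the helper below is
hypothetical -- a real version would need a new P2P message advertising
which sections each peer holds):

    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Given, for each peer, the list of section indices it claims to store,
    // pick the sections that occur least often among them.
    std::vector<int> ChooseSections(const std::vector<std::vector<int>>& peerSections,
                                    int numSections, std::size_t sectionsToKeep)
    {
        std::vector<int> occurrences(numSections, 0);
        for (const auto& sections : peerSections)
            for (int s : sections)
                if (s >= 0 && s < numSections) occurrences[s]++;

        std::vector<int> order(numSections);
        std::iota(order.begin(), order.end(), 0);
        // Rarest sections first.
        std::stable_sort(order.begin(), order.end(),
                         [&](int a, int b) { return occurrences[a] < occurrences[b]; });
        if (order.size() > sectionsToKeep) order.resize(sectionsToKeep);
        return order;
    }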

This would allow such nodes to continue validating all transactions, while
the network as a whole still stores a full copy, just distributed among
many nodes. It should have little overall impact on security (unless I'm
mistaken), and it would significantly reduce the storage needed on each
node.

It would also allow for a retroactive --max-size flag, where the node
prunes until it reaches the specified size, and continues to prune over
time, while keeping to the sections defined by the network.

What sort of side effects or network vulnerabilities would this introduce?
I know some said it wouldn't be Sybil resistant, but how would this be less
so than a fully pruned node?

[-- Attachment #2: Type: text/html, Size: 1552 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 15:26 ` [Bitcoin-development] Proposed additional options for pruned nodes gabe appleton
@ 2015-05-12 16:05   ` Jeff Garzik
  2015-05-12 16:56     ` gabe appleton
  2015-05-12 17:16     ` Peter Todd
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff Garzik @ 2015-05-12 16:05 UTC (permalink / raw)
  To: gabe appleton; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2449 bytes --]

A general assumption is that you will have a few archive nodes with the
full blockchain, and a majority of nodes are pruned, able to serve only the
tail of the chains.


On Tue, May 12, 2015 at 8:26 AM, gabe appleton <gappleto97@gmail•com> wrote:

> Hi,
>
> There's been a lot of talk in the rest of the community about how the 20MB
> step would increase storage needs, and that switching to pruned nodes
> (partially) would reduce network security. I think I may have a solution.
>
> There could be a hybrid option in nodes. Selecting this would do the
> following:
> Flip the --no-wallet toggle
> Select a section of the blockchain to store fully (percentage based,
> possibly on hash % sections?)
> Begin pruning all sections not included in 2
> The idea is that you can implement it similar to how a Koorde is done, in
> that the network will decide which sections it retrieves. So if the user
> prompts it to store 50% of the blockchain, it would look at its peers, and
> at their peers (if secure), and choose the least-occurring options from
> them.
>
> This would allow them to continue validating all transactions, and still
> store a full copy, just distributed among many nodes. It should overall
> have little impact on security (unless I'm mistaken), and it would
> significantly reduce storage needs on a node.
>
> It would also allow for a retroactive --max-size flag, where it will prune
> until it is at the specified size, and continue to prune over time, while
> keeping to the sections defined by the network.
>
> What sort of side effects or network vulnerabilities would this introduce?
> I know some said it wouldn't be Sybil resistant, but how would this be less
> so than a fully pruned node?
>
>
>
>


-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.      https://bitpay.com/

[-- Attachment #2: Type: text/html, Size: 3239 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 16:05   ` Jeff Garzik
@ 2015-05-12 16:56     ` gabe appleton
  2015-05-12 17:16     ` Peter Todd
  1 sibling, 0 replies; 19+ messages in thread
From: gabe appleton @ 2015-05-12 16:56 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2745 bytes --]

Yes, but that just increases the incentive for partially-full nodes. It
would add to the assumed-small number of full nodes.

Or am I misunderstanding?

On Tue, May 12, 2015 at 12:05 PM, Jeff Garzik <jgarzik@bitpay•com> wrote:

> A general assumption is that you will have a few archive nodes with the
> full blockchain, and a majority of nodes are pruned, able to serve only the
> tail of the chains.
>
>
> On Tue, May 12, 2015 at 8:26 AM, gabe appleton <gappleto97@gmail•com>
> wrote:
>
>> Hi,
>>
>> There's been a lot of talk in the rest of the community about how the
>> 20MB step would increase storage needs, and that switching to pruned nodes
>> (partially) would reduce network security. I think I may have a solution.
>>
>> There could be a hybrid option in nodes. Selecting this would do the
>> following:
>> Flip the --no-wallet toggle
>> Select a section of the blockchain to store fully (percentage based,
>> possibly on hash % sections?)
>> Begin pruning all sections not included in 2
>> The idea is that you can implement it similar to how a Koorde is done, in
>> that the network will decide which sections it retrieves. So if the user
>> prompts it to store 50% of the blockchain, it would look at its peers, and
>> at their peers (if secure), and choose the least-occurring options from
>> them.
>>
>> This would allow them to continue validating all transactions, and still
>> store a full copy, just distributed among many nodes. It should overall
>> have little impact on security (unless I'm mistaken), and it would
>> significantly reduce storage needs on a node.
>>
>> It would also allow for a retroactive --max-size flag, where it will
>> prune until it is at the specified size, and continue to prune over time,
>> while keeping to the sections defined by the network.
>>
>> What sort of side effects or network vulnerabilities would this
>> introduce? I know some said it wouldn't be Sybil resistant, but how would
>> this be less so than a fully pruned node?
>>
>>
>>
>>
>
>
> --
> Jeff Garzik
> Bitcoin core developer and open source evangelist
> BitPay, Inc.      https://bitpay.com/
>

[-- Attachment #2: Type: text/html, Size: 3899 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 16:05   ` Jeff Garzik
  2015-05-12 16:56     ` gabe appleton
@ 2015-05-12 17:16     ` Peter Todd
  2015-05-12 18:23       ` Tier Nolan
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Todd @ 2015-05-12 17:16 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 562 bytes --]

On Tue, May 12, 2015 at 09:05:44AM -0700, Jeff Garzik wrote:
> A general assumption is that you will have a few archive nodes with the
> full blockchain, and a majority of nodes are pruned, able to serve only the
> tail of the chains.

Hmm?

Lots of people are tossing around ideas for partial archival nodes that
would store a subset of blocks, such that collectively the whole
blockchain would be available even if no one node had the entire chain.

-- 
'peter'[:-1]@petertodd.org
0000000000000000156d2069eeebb3309455f526cfe50efbf8a85ec630df7f7c

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 650 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 17:16     ` Peter Todd
@ 2015-05-12 18:23       ` Tier Nolan
  2015-05-12 19:03         ` Gregory Maxwell
  0 siblings, 1 reply; 19+ messages in thread
From: Tier Nolan @ 2015-05-12 18:23 UTC (permalink / raw)
  Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2361 bytes --]

On Tue, May 12, 2015 at 6:16 PM, Peter Todd <pete@petertodd•org> wrote:

>
> Lots of people are tossing around ideas for partial archival nodes that
> would store a subset of blocks, such that collectively the whole
> blockchain would be available even if no one node had the entire chain.
>

A compact way to describe which blocks are stored helps to mitigate against
fingerprint attacks.

It also means that a node could compactly indicate which blocks it stores
with service bits.

The node could pick two numbers:

W = window = a power of 2
P = position = a random value less than W

The node would store all blocks where height mod W == P.  The block hash
could be used instead of the height.

This has the nice feature that the node can throw away half of its data and
still describe what is stored:

W_new = W * 2
P_new = random_bool() ? P + W : P    (where W is the old window, i.e. W_new/2)

Half of the stored blocks would still match height mod W_new == P_new and
the other half could be deleted.  This means that the store would use up
between 50% and 100% of the allocated size.
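
A minimal sketch of that scheme, assuming the (W, P) pair above (the type
and function names are mine, purely for illustration):

    #include <cstdint>
    #include <random>

    struct BlockCoverage {
        uint32_t W;  // window, a power of 2
        uint32_t P;  // position, 0 <= P < W

        // Does this node store the block at the given height?
        bool Stores(uint32_t height) const { return height % W == P; }

        // Throw away half of the stored data: double the window and keep one
        // of the two classes the old blocks now fall into (P or P + old W).
        void Halve(std::mt19937& rng)
        {
            uint32_t oldW = W;
            W *= 2;
            if (rng() & 1) P += oldW;
        }
    };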

Another benefit is that it increases the probability that at least someone
has every block.

If N nodes each store a random 1% of the blocks, then the probability that
a given block is stored by no one is pow(0.99, N).  For 1000 nodes, that
gives odds of 1 in 23,164 that a particular block will be missing.  That
means that around 13 out of 300,000 blocks would be missing.  There would
likely be more nodes than that, and also storage nodes, so it is not a
major risk.

If everyone is storing 1% of blocks, then they would set W to 128.  As long
as each of the 128 buckets is covered by at least one node, all blocks are
stored.  With 1000 nodes, the chance that at least one bucket is missed is
around 5%.  That is better than the roughly 13 blocks expected to go
missing in the independent-selection case.

Nodes could inform peers of their W and P parameters on connection.  The
version message could be amended or a "getparams" message of some kind
could be added.

W could be encoded with 4 bits and P with 16 bits, for 20 bits in total:
W = 1 << bits[19:16] and P = bits[15:0].  That gives a maximum W of 32768,
which is probably more bits than P will ever need.
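
A tiny sketch of that packing (purely illustrative; this is not an existing
protocol field):

    #include <cstdint>

    // Pack/unpack the 20-bit (W, P) description sketched above.
    uint32_t EncodeWP(uint32_t wLog2, uint32_t p) { return (wLog2 << 16) | (p & 0xFFFF); }
    uint32_t DecodeW(uint32_t bits) { return 1u << (bits >> 16); }
    uint32_t DecodeP(uint32_t bits) { return bits & 0xFFFF; }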

Initial download would be harder, since new nodes would have to connect to
enough different nodes to cover every bucket (at least 100 in the example
above).  They could download most blocks from random nodes, and fetch the
ones they are missing from storage nodes.  Even storage nodes could have a
range of W values.

[-- Attachment #2: Type: text/html, Size: 3273 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 18:23       ` Tier Nolan
@ 2015-05-12 19:03         ` Gregory Maxwell
  2015-05-12 19:24           ` gabe appleton
                             ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Gregory Maxwell @ 2015-05-12 19:03 UTC (permalink / raw)
  To: Tier Nolan; +Cc: Bitcoin Dev

It's a little frustrating to see this just repeated without even
paying attention to the desirable characteristics from the prior
discussions.

Summarizing from memory:

(0) Block coverage should have locality; historical blocks are
(almost) always needed in contiguous ranges.   Having random peers
with totally random blocks would be horrific for performance; as you'd
have to hunt down a working peer and make a connection for each block
with high probability.

(1) Block storage on nodes with a fraction of the history should not
depend on believing random peers, because listening to peers can
easily create attacks (e.g. someone could break the network by
convincing nodes to become unbalanced), and it isn't useful-- it's not
like the blockchain is substantially different for anyone; if you're at
the point of needing to know coverage in order to fill gaps, then
something is wrong.  Gaps would be handled by archive nodes, so there is
no reason to increase vulnerability by doing anything but behaving
uniformly.

(2) The decision to contact a node should need O(1) communications,
not just because of the delay of chasing around just to find who has
what; but because that chasing process usually makes the protocol
_highly_ sybil vulnerable.

(3) The expression of what blocks a node has should be compact (e.g.
not a dense list of blocks) so it can be rumored efficiently.

(4) Figuring out what block (ranges) a peer has, given its compact
description, should be computationally efficient.

(5) The communication about what blocks a node has should be compact.

(6) The coverage created by the network should be uniform, and should
remain uniform as the blockchain grows; ideally you shouldn't need
to update your state to know what blocks a peer will store in the
future, assuming that it doesn't change the amount of data it's
planning to use. (What Tier Nolan proposes sounds like it fails this
point)

(7) Growth of the blockchain shouldn't cause much (or any) need to
refetch old blocks.

I've previously proposed schemes which come close but fail one of the above.

(e.g. a scheme based on reservoir sampling that gives uniform
selection of contiguous ranges, communicating only 64 bits of data to
know what blocks a node claims to have, remaining totally uniform as
the chain grows, without any need to refetch -- but needing O(height)
work to figure out what blocks a peer has from the data it
communicated; or another scheme based on consistent hashes that has
log(height) computation, but sometimes may result in a node needing to
go refetch an old block range it previously didn't store-- creating
re-balancing traffic.)

So far something that meets all those criteria (and/or whatever ones
I'm not remembering) has not been discovered; but I don't really think
much time has been spent on it.  I think it's very likely possible.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 19:03         ` Gregory Maxwell
@ 2015-05-12 19:24           ` gabe appleton
  2015-05-12 19:38             ` Jeff Garzik
  2015-05-12 22:00           ` [Bitcoin-development] " Tier Nolan
  2015-05-13  5:19           ` Daniel Kraft
  2 siblings, 1 reply; 19+ messages in thread
From: gabe appleton @ 2015-05-12 19:24 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 4070 bytes --]

Points 0, 1, 3, 4, 5 and 6 can be solved by looking at chunks
chronologically, i.e., give the (sender-signed) hash of the first and last
block in your range.  This is less data-dense than the idea above, but it
might work better.

That said, this is likely a less secure way to do it. To improve upon that,
a node could request a block of random height within that range and verify
it, but that violates point 2. And the scheme in itself definitely violates
point 7.
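
A tiny sketch of the kind of range descriptor that implies (the field names
are purely illustrative):

    #include <array>
    #include <cstdint>
    #include <vector>

    // A chronological chunk described by its endpoints; the sender signs the
    // descriptor so third parties can relay it without being able to forge it.
    struct ChunkDescriptor {
        std::array<uint8_t, 32> firstBlockHash;   // first block in the range
        std::array<uint8_t, 32> lastBlockHash;    // last block in the range
        std::vector<uint8_t>    senderSignature;  // signature over the two hashes
    };
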
On May 12, 2015 3:07 PM, "Gregory Maxwell" <gmaxwell@gmail•com> wrote:

> It's a little frustrating to see this just repeated without even
> paying attention to the desirable characteristics from the prior
> discussions.
>
> Summarizing from memory:
>
> (0) Block coverage should have locality; historical blocks are
> (almost) always needed in contiguous ranges.   Having random peers
> with totally random blocks would be horrific for performance; as you'd
> have to hunt down a working peer and make a connection for each block
> with high probability.
>
> (1) Block storage on nodes with a fraction of the history should not
> depend on believing random peers; because listening to peers can
> easily create attacks (e.g. someone could break the network; by
> convincing nodes to become unbalanced) and not useful-- it's not like
> the blockchain is substantially different for anyone; if you're to the
> point of needing to know coverage to fill then something is wrong.
> Gaps would be handled by archive nodes, so there is no reason to
> increase vulnerability by doing anything but behaving uniformly.
>
> (2) The decision to contact a node should need O(1) communications,
> not just because of the delay of chasing around just to find who has
> someone; but because that chasing process usually makes the process
> _highly_ sybil vulnerable.
>
> (3) The expression of what blocks a node has should be compact (e.g.
> not a dense list of blocks) so it can be rumored efficiently.
>
> (4) Figuring out what block (ranges) a peer has given should be
> computationally efficient.
>
> (5) The communication about what blocks a node has should be compact.
>
> (6) The coverage created by the network should be uniform, and should
> remain uniform as the blockchain grows; ideally it you shouldn't need
> to update your state to know what blocks a peer will store in the
> future, assuming that it doesn't change the amount of data its
> planning to use. (What Tier Nolan proposes sounds like it fails this
> point)
>
> (7) Growth of the blockchain shouldn't cause much (or any) need to
> refetch old blocks.
>
> I've previously proposed schemes which come close but fail one of the
> above.
>
> (e.g. a scheme based on reservoir sampling that gives uniform
> selection of contiguous ranges, communicating only 64 bits of data to
> know what blocks a node claims to have, remaining totally uniform as
> the chain grows, without any need to refetch -- but needs O(height)
> work to figure out what blocks a peer has from the data it
> communicated.;   or another scheme based on consistent hashes that has
> log(height) computation; but sometimes may result in a node needing to
> go refetch an old block range it previously didn't store-- creating
> re-balancing traffic.)
>
> So far something that meets all those criteria (and/or whatever ones
> I'm not remembering) has not been discovered; but I don't really think
> much time has been spent on it. I think its very likely possible.
>
>
>

[-- Attachment #2: Type: text/html, Size: 4840 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 19:24           ` gabe appleton
@ 2015-05-12 19:38             ` Jeff Garzik
  2015-05-12 19:43               ` gabe appleton
                                 ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jeff Garzik @ 2015-05-12 19:38 UTC (permalink / raw)
  To: gabe appleton; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 5183 bytes --]

One general problem is that security is weakened when an attacker can DoS a
small part of the chain by DoS'ing a small number of nodes - yet the impact
is a network-wide DoS because nobody can complete a sync.


On Tue, May 12, 2015 at 12:24 PM, gabe appleton <gappleto97@gmail•com>
wrote:

> 0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. Ie,
> give the signed (by sender) hash of the first and last block in your range.
> This is less data dense than the idea above, but it might work better.
>
> That said, this is likely a less secure way to do it. To improve upon
> that, a node could request a block of random height within that range and
> verify it, but that violates point 2. And the scheme in itself definitely
> violates point 7.
> On May 12, 2015 3:07 PM, "Gregory Maxwell" <gmaxwell@gmail•com> wrote:
>
>> It's a little frustrating to see this just repeated without even
>> paying attention to the desirable characteristics from the prior
>> discussions.
>>
>> Summarizing from memory:
>>
>> (0) Block coverage should have locality; historical blocks are
>> (almost) always needed in contiguous ranges.   Having random peers
>> with totally random blocks would be horrific for performance; as you'd
>> have to hunt down a working peer and make a connection for each block
>> with high probability.
>>
>> (1) Block storage on nodes with a fraction of the history should not
>> depend on believing random peers; because listening to peers can
>> easily create attacks (e.g. someone could break the network; by
>> convincing nodes to become unbalanced) and not useful-- it's not like
>> the blockchain is substantially different for anyone; if you're to the
>> point of needing to know coverage to fill then something is wrong.
>> Gaps would be handled by archive nodes, so there is no reason to
>> increase vulnerability by doing anything but behaving uniformly.
>>
>> (2) The decision to contact a node should need O(1) communications,
>> not just because of the delay of chasing around just to find who has
>> someone; but because that chasing process usually makes the process
>> _highly_ sybil vulnerable.
>>
>> (3) The expression of what blocks a node has should be compact (e.g.
>> not a dense list of blocks) so it can be rumored efficiently.
>>
>> (4) Figuring out what block (ranges) a peer has given should be
>> computationally efficient.
>>
>> (5) The communication about what blocks a node has should be compact.
>>
>> (6) The coverage created by the network should be uniform, and should
>> remain uniform as the blockchain grows; ideally it you shouldn't need
>> to update your state to know what blocks a peer will store in the
>> future, assuming that it doesn't change the amount of data its
>> planning to use. (What Tier Nolan proposes sounds like it fails this
>> point)
>>
>> (7) Growth of the blockchain shouldn't cause much (or any) need to
>> refetch old blocks.
>>
>> I've previously proposed schemes which come close but fail one of the
>> above.
>>
>> (e.g. a scheme based on reservoir sampling that gives uniform
>> selection of contiguous ranges, communicating only 64 bits of data to
>> know what blocks a node claims to have, remaining totally uniform as
>> the chain grows, without any need to refetch -- but needs O(height)
>> work to figure out what blocks a peer has from the data it
>> communicated.;   or another scheme based on consistent hashes that has
>> log(height) computation; but sometimes may result in a node needing to
>> go refetch an old block range it previously didn't store-- creating
>> re-balancing traffic.)
>>
>> So far something that meets all those criteria (and/or whatever ones
>> I'm not remembering) has not been discovered; but I don't really think
>> much time has been spent on it. I think its very likely possible.
>>
>>
>>
>
>
>
>


-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.      https://bitpay.com/

[-- Attachment #2: Type: text/html, Size: 6654 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 19:38             ` Jeff Garzik
@ 2015-05-12 19:43               ` gabe appleton
  2015-05-12 21:30                 ` [Bitcoin-development] [Bulk] " gb
  2015-05-12 20:02               ` [Bitcoin-development] " Gregory Maxwell
       [not found]               ` <CAFVoEQTdmCSRAy3u26q5oHdfvFEytZDBfQb_fs_qttK15fiRmg@mail.gmail.com>
  2 siblings, 1 reply; 19+ messages in thread
From: gabe appleton @ 2015-05-12 19:43 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 5627 bytes --]

Yet this holds true in our current assumptions of the network as well: that
it will become a collection of pruned nodes with a few storage nodes.

A hybrid option makes this better, because it spreads the risk, rather than
concentrating it in full nodes.
On May 12, 2015 3:38 PM, "Jeff Garzik" <jgarzik@bitpay•com> wrote:

> One general problem is that security is weakened when an attacker can DoS
> a small part of the chain by DoS'ing a small number of nodes - yet the
> impact is a network-wide DoS because nobody can complete a sync.
>
>
> On Tue, May 12, 2015 at 12:24 PM, gabe appleton <gappleto97@gmail•com>
> wrote:
>
>> 0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. Ie,
>> give the signed (by sender) hash of the first and last block in your range.
>> This is less data dense than the idea above, but it might work better.
>>
>> That said, this is likely a less secure way to do it. To improve upon
>> that, a node could request a block of random height within that range and
>> verify it, but that violates point 2. And the scheme in itself definitely
>> violates point 7.
>> On May 12, 2015 3:07 PM, "Gregory Maxwell" <gmaxwell@gmail•com> wrote:
>>
>>> It's a little frustrating to see this just repeated without even
>>> paying attention to the desirable characteristics from the prior
>>> discussions.
>>>
>>> Summarizing from memory:
>>>
>>> (0) Block coverage should have locality; historical blocks are
>>> (almost) always needed in contiguous ranges.   Having random peers
>>> with totally random blocks would be horrific for performance; as you'd
>>> have to hunt down a working peer and make a connection for each block
>>> with high probability.
>>>
>>> (1) Block storage on nodes with a fraction of the history should not
>>> depend on believing random peers; because listening to peers can
>>> easily create attacks (e.g. someone could break the network; by
>>> convincing nodes to become unbalanced) and not useful-- it's not like
>>> the blockchain is substantially different for anyone; if you're to the
>>> point of needing to know coverage to fill then something is wrong.
>>> Gaps would be handled by archive nodes, so there is no reason to
>>> increase vulnerability by doing anything but behaving uniformly.
>>>
>>> (2) The decision to contact a node should need O(1) communications,
>>> not just because of the delay of chasing around just to find who has
>>> someone; but because that chasing process usually makes the process
>>> _highly_ sybil vulnerable.
>>>
>>> (3) The expression of what blocks a node has should be compact (e.g.
>>> not a dense list of blocks) so it can be rumored efficiently.
>>>
>>> (4) Figuring out what block (ranges) a peer has given should be
>>> computationally efficient.
>>>
>>> (5) The communication about what blocks a node has should be compact.
>>>
>>> (6) The coverage created by the network should be uniform, and should
>>> remain uniform as the blockchain grows; ideally it you shouldn't need
>>> to update your state to know what blocks a peer will store in the
>>> future, assuming that it doesn't change the amount of data its
>>> planning to use. (What Tier Nolan proposes sounds like it fails this
>>> point)
>>>
>>> (7) Growth of the blockchain shouldn't cause much (or any) need to
>>> refetch old blocks.
>>>
>>> I've previously proposed schemes which come close but fail one of the
>>> above.
>>>
>>> (e.g. a scheme based on reservoir sampling that gives uniform
>>> selection of contiguous ranges, communicating only 64 bits of data to
>>> know what blocks a node claims to have, remaining totally uniform as
>>> the chain grows, without any need to refetch -- but needs O(height)
>>> work to figure out what blocks a peer has from the data it
>>> communicated.;   or another scheme based on consistent hashes that has
>>> log(height) computation; but sometimes may result in a node needing to
>>> go refetch an old block range it previously didn't store-- creating
>>> re-balancing traffic.)
>>>
>>> So far something that meets all those criteria (and/or whatever ones
>>> I'm not remembering) has not been discovered; but I don't really think
>>> much time has been spent on it. I think its very likely possible.
>>>
>>>
>>>
>>
>>
>>
>>
>
>
> --
> Jeff Garzik
> Bitcoin core developer and open source evangelist
> BitPay, Inc.      https://bitpay.com/
>

[-- Attachment #2: Type: text/html, Size: 7205 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 19:38             ` Jeff Garzik
  2015-05-12 19:43               ` gabe appleton
@ 2015-05-12 20:02               ` Gregory Maxwell
  2015-05-12 20:10                 ` Jeff Garzik
       [not found]               ` <CAFVoEQTdmCSRAy3u26q5oHdfvFEytZDBfQb_fs_qttK15fiRmg@mail.gmail.com>
  2 siblings, 1 reply; 19+ messages in thread
From: Gregory Maxwell @ 2015-05-12 20:02 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

On Tue, May 12, 2015 at 7:38 PM, Jeff Garzik <jgarzik@bitpay•com> wrote:
> One general problem is that security is weakened when an attacker can DoS a
> small part of the chain by DoS'ing a small number of nodes - yet the impact
> is a network-wide DoS because nobody can complete a sync.

It might be more interesting to think of that attack as a bandwidth
exhaustion DOS attack on the archive nodes... if you can't get a copy
without them, that's where you'll go.

So the question arises: does the option make some nodes that would
have been archive nodes not be?  Probably some-- but would it do so much
that it would offset the gain of additional copies of the data when those
attacks are not going on?  I suspect not.

It's also useful to give people incremental ways to participate even
when they can't swallow the whole pill, or to let them provide the
resource that's cheap for them to provide.  In particular, if there are
only two kinds of full nodes-- archive and pruned-- then the archive
nodes take both a huge disk and bandwidth cost; whereas if there are
fractional nodes, then archives take low(er) bandwidth unless the
fractionals get DOS attacked.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 20:02               ` [Bitcoin-development] " Gregory Maxwell
@ 2015-05-12 20:10                 ` Jeff Garzik
  2015-05-12 20:41                   ` gabe appleton
  2015-05-12 20:47                   ` Gregory Maxwell
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff Garzik @ 2015-05-12 20:10 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]

True.  Part of the issue rests on the block sync horizon/cliff.  There is a
value X which is the average number of blocks the 90th percentile of nodes
need in order to sync.  It is sufficient for the [semi-]pruned nodes to
keep X blocks, after which nodes must fall back to archive nodes for older
data.

There is simply far, far more demand for recent blocks, and the demand for
old blocks very rapidly falls off.

There was even a more radical suggestion years ago - refuse to sync if too
old (>2 weeks?), and force the user to download ancient data via torrent.



On Tue, May 12, 2015 at 1:02 PM, Gregory Maxwell <gmaxwell@gmail•com> wrote:

> On Tue, May 12, 2015 at 7:38 PM, Jeff Garzik <jgarzik@bitpay•com> wrote:
> > One general problem is that security is weakened when an attacker can
> DoS a
> > small part of the chain by DoS'ing a small number of nodes - yet the
> impact
> > is a network-wide DoS because nobody can complete a sync.
>
> It might be more interesting to think of that attack as a bandwidth
> exhaustion DOS attack on the archive nodes... if you can't get a copy
> without them, thats where you'll go.
>
> So the question arises: does the option make some nodes that would
> have been archive not be? Probably some-- but would it do so much that
> it would offset the gain of additional copies of the data when those
> attacks are not going no. I suspect not.
>
> It's also useful to give people incremental ways to participate even
> when they can't swollow the whole pill; or choose to provide the
> resource thats cheap for them to provide.  In particular, if there is
> only two kinds of full nodes-- archive and pruned; then the archive
> nodes take both a huge disk and bandwidth cost; where as if there are
> fractional then archives take low(er) bandwidth unless the fractionals
> get DOS attacked.
>



-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.      https://bitpay.com/

[-- Attachment #2: Type: text/html, Size: 2607 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 20:10                 ` Jeff Garzik
@ 2015-05-12 20:41                   ` gabe appleton
  2015-05-12 20:47                   ` Gregory Maxwell
  1 sibling, 0 replies; 19+ messages in thread
From: gabe appleton @ 2015-05-12 20:41 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2355 bytes --]

I suppose this raises two questions:

1) why not have a partial archive store the most recent X% of the
blockchain by default?

2) why not include some sort of torrent in QT, to mitigate this risk? I
don't think this is necessarily a good idea, but I'd like to hear the
reasoning.
On May 12, 2015 4:11 PM, "Jeff Garzik" <jgarzik@bitpay•com> wrote:

> True.  Part of the issue rests on the block sync horizon/cliff.  There is
> a value X which is the average number of blocks the 90th percentile of
> nodes need in order to sync.  It is sufficient for the [semi-]pruned nodes
> to keep X blocks, after which nodes must fall back to archive nodes for
> older data.
>
> There is simply far, far more demand for recent blocks, and the demand for
> old blocks very rapidly falls off.
>
> There was even a more radical suggestion years ago - refuse to sync if too
> old (>2 weeks?), and force the user to download ancient data via torrent.
>
>
>
> On Tue, May 12, 2015 at 1:02 PM, Gregory Maxwell <gmaxwell@gmail•com>
> wrote:
>
>> On Tue, May 12, 2015 at 7:38 PM, Jeff Garzik <jgarzik@bitpay•com> wrote:
>> > One general problem is that security is weakened when an attacker can
>> DoS a
>> > small part of the chain by DoS'ing a small number of nodes - yet the
>> impact
>> > is a network-wide DoS because nobody can complete a sync.
>>
>> It might be more interesting to think of that attack as a bandwidth
>> exhaustion DOS attack on the archive nodes... if you can't get a copy
>> without them, thats where you'll go.
>>
>> So the question arises: does the option make some nodes that would
>> have been archive not be? Probably some-- but would it do so much that
>> it would offset the gain of additional copies of the data when those
>> attacks are not going no. I suspect not.
>>
>> It's also useful to give people incremental ways to participate even
>> when they can't swollow the whole pill; or choose to provide the
>> resource thats cheap for them to provide.  In particular, if there is
>> only two kinds of full nodes-- archive and pruned; then the archive
>> nodes take both a huge disk and bandwidth cost; where as if there are
>> fractional then archives take low(er) bandwidth unless the fractionals
>> get DOS attacked.
>>
>
>
>
> --
> Jeff Garzik
> Bitcoin core developer and open source evangelist
> BitPay, Inc.      https://bitpay.com/
>

[-- Attachment #2: Type: text/html, Size: 3223 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 20:10                 ` Jeff Garzik
  2015-05-12 20:41                   ` gabe appleton
@ 2015-05-12 20:47                   ` Gregory Maxwell
  1 sibling, 0 replies; 19+ messages in thread
From: Gregory Maxwell @ 2015-05-12 20:47 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

On Tue, May 12, 2015 at 8:10 PM, Jeff Garzik <jgarzik@bitpay•com> wrote:
> True.  Part of the issue rests on the block sync horizon/cliff.  There is a
> value X which is the average number of blocks the 90th percentile of nodes
> need in order to sync.  It is sufficient for the [semi-]pruned nodes to keep
> X blocks, after which nodes must fall back to archive nodes for older data.


Prior discussion had things like "the definition of pruned means you
have and will serve at least the last 288 blocks from your tip" (which is
what I put in the pruned service bip text); and another flag for "I
have at least the last 2016".  (2016 should be reevaluated-- it was
just a round number near where sipa's old data showed the fetch
probability flatlined.)

That data was old, but what it showed was that the probability of a
block being fetched vs. depth looked like an exponential drop-off (I
think with a 50% point at around 3 days), plus a constant low
probability -- which is probably what we should have expected.
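
For concreteness, the shape being described is roughly the following (the
constants are made-up placeholders, not sipa's actual numbers):

    #include <cmath>

    // Rough shape of the fetch-probability curve described above: an
    // exponential drop-off with a ~3-day half-life plus a small constant floor.
    double FetchProbability(double depthInBlocks)
    {
        const double halfLifeBlocks = 3 * 144;  // ~3 days of blocks (placeholder)
        const double floorProb      = 0.001;    // constant low probability (placeholder)
        return (1.0 - floorProb) * std::exp2(-depthInBlocks / halfLifeBlocks) + floorProb;
    }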

> There was even a more radical suggestion years ago - refuse to sync if too
> old (>2 weeks?), and force the user to download ancient data via torrent.

I'm not fond of this; it makes the system dependent on centralized
services (e.g. trackers and sources of torrents).  A torrent also
cannot very efficiently handle fractional copies, nor can it
efficiently grow over time.  Bitcoin should be complete-- plus, many
nodes already have the data.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bitcoin-development] Fwd: Proposed additional options for pruned nodes
       [not found]                   ` <CAJHLa0MYSpVBD4VE65LVbADb2daOvE=N83G8F_zDSHy3AQ5DAQ@mail.gmail.com>
@ 2015-05-12 21:17                     ` Adam Weiss
  0 siblings, 0 replies; 19+ messages in thread
From: Adam Weiss @ 2015-05-12 21:17 UTC (permalink / raw)
  To: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 696 bytes --]

FYI on behalf of jgarzik...

---------- Forwarded message ----------
From: Jeff Garzik <jgarzik@bitpay•com>
Date: Tue, May 12, 2015 at 4:48 PM
Subject: Re: [Bitcoin-development] Proposed additional options for pruned
nodes
To: Adam Weiss <adam@signal11•com>


Maybe you could forward my response to the list as an FYI?


On Tue, May 12, 2015 at 12:43 PM, Jeff Garzik <jgarzik@bitpay•com> wrote:

> You are the 12th person to report this.  It is SF, not bitpay, rewriting
> email headers and breaking authentication.
>
>
> On Tue, May 12, 2015 at 12:40 PM, Adam Weiss <adam@signal11•com> wrote:
>
>> fyi, your email to bitcoin-dev is still generating google spam warnings...
>>
>> --adam
>>
>>
>>

[-- Attachment #2: Type: text/html, Size: 1734 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] [Bulk] Re: Proposed additional options for pruned nodes
  2015-05-12 19:43               ` gabe appleton
@ 2015-05-12 21:30                 ` gb
  0 siblings, 0 replies; 19+ messages in thread
From: gb @ 2015-05-12 21:30 UTC (permalink / raw)
  To: gabe appleton; +Cc: Bitcoin Dev

This seems like a good place to add an idea I had about
partially-connected nodes that are able to throttle their bandwidth
demands.  While we will have partial-blockchain nodes with a spectrum of
storage options, the requirement to be connected is somewhat binary.  I
think many users already throttle manually by turning nodes on and off,
with just enough uptime to keep the chain up to date.  A throttling option
would leverage bitcoin's asynchronous design to reduce bandwidth demands
for weaker nodes.

So throttling to allow for a spectrum of bandwidth connectivity:

1) an option for the user, -throttle=XXX, that would allow the user to
specify a desired total bandwidth of XXX GB/day that the bitcoin client
can use.

2) the client reduces the number of continuous connections, and the amount
of transaction or block relaying, to achieve the desired throttling rate

3) it could do this by being partially connected throughout the duty
cycle, or by cycling the node on/off for a percentage of a 24(?) hr period
(a rough sketch follows below)

4) an auto setting where some smart traffic management 'just takes
care of it', plus manual settings that can be user configured

5) the minimum requirement to remain fully validating would be that, over
any 24(?) hr period, the node has received a full copy of all new blocks

Not sure if anyone has brought such an idea forward or if there are
obvious holes, so pre-emptive apologies for time-wasting if so.
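
A minimal sketch of the duty-cycle idea in (3), assuming a simple daily
byte budget (none of these names exist in the client; they are purely
illustrative):

    #include <cstdint>

    // Decide whether the node should keep its connections open right now,
    // given a -throttle budget (bytes/day) and what has been used so far today.
    bool ShouldStayConnected(uint64_t budgetPerDay, uint64_t usedToday,
                             double fractionOfDayElapsed)
    {
        // Let usage run slightly ahead of the clock; otherwise drop
        // connections until the budget catches up.
        const double burstMargin = 0.10;
        double allowed = budgetPerDay * (fractionOfDayElapsed + burstMargin);
        return usedToday < allowed;
    }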

On Tue, 2015-05-12 at 15:43 -0400, gabe appleton wrote:
> Yet this holds true in our current assumptions of the network as well:
> that it will become a collection of pruned nodes with a few storage
> nodes. 
> 
> A hybrid option makes this better, because it spreads the risk, rather
> than concentrating it in full nodes. 
> 
> On May 12, 2015 3:38 PM, "Jeff Garzik" <jgarzik@bitpay•com> wrote:
>         One general problem is that security is weakened when an
>         attacker can DoS a small part of the chain by DoS'ing a small
>         number of nodes - yet the impact is a network-wide DoS because
>         nobody can complete a sync.
>         
>         
>         
>         On Tue, May 12, 2015 at 12:24 PM, gabe appleton
>         <gappleto97@gmail•com> wrote:
>                 0, 1, 3, 4, 5, 6 can be solved by looking at chunks
>                 chronologically. Ie, give the signed (by sender) hash
>                 of the first and last block in your range. This is
>                 less data dense than the idea above, but it might work
>                 better. 
>                 
>                 That said, this is likely a less secure way to do it.
>                 To improve upon that, a node could request a block of
>                 random height within that range and verify it, but
>                 that violates point 2. And the scheme in itself
>                 definitely violates point 7.
>                 
>                 On May 12, 2015 3:07 PM, "Gregory Maxwell"
>                 <gmaxwell@gmail•com> wrote:
>                         It's a little frustrating to see this just
>                         repeated without even
>                         paying attention to the desirable
>                         characteristics from the prior
>                         discussions.
>                         
>                         Summarizing from memory:
>                         
>                         (0) Block coverage should have locality;
>                         historical blocks are
>                         (almost) always needed in contiguous ranges.
>                          Having random peers
>                         with totally random blocks would be horrific
>                         for performance; as you'd
>                         have to hunt down a working peer and make a
>                         connection for each block
>                         with high probability.
>                         
>                         (1) Block storage on nodes with a fraction of
>                         the history should not
>                         depend on believing random peers; because
>                         listening to peers can
>                         easily create attacks (e.g. someone could
>                         break the network; by
>                         convincing nodes to become unbalanced) and not
>                         useful-- it's not like
>                         the blockchain is substantially different for
>                         anyone; if you're to the
>                         point of needing to know coverage to fill then
>                         something is wrong.
>                         Gaps would be handled by archive nodes, so
>                         there is no reason to
>                         increase vulnerability by doing anything but
>                         behaving uniformly.
>                         
>                         (2) The decision to contact a node should need
>                         O(1) communications,
>                         not just because of the delay of chasing
>                         around just to find who has
>                         someone; but because that chasing process
>                         usually makes the process
>                         _highly_ sybil vulnerable.
>                         
>                         (3) The expression of what blocks a node has
>                         should be compact (e.g.
>                         not a dense list of blocks) so it can be
>                         rumored efficiently.
>                         
>                         (4) Figuring out what block (ranges) a peer
>                         has given should be
>                         computationally efficient.
>                         
>                         (5) The communication about what blocks a node
>                         has should be compact.
>                         
>                         (6) The coverage created by the network should
>                         be uniform, and should
>                         remain uniform as the blockchain grows;
>                         ideally it you shouldn't need
>                         to update your state to know what blocks a
>                         peer will store in the
>                         future, assuming that it doesn't change the
>                         amount of data its
>                         planning to use. (What Tier Nolan proposes
>                         sounds like it fails this
>                         point)
>                         
>                         (7) Growth of the blockchain shouldn't cause
>                         much (or any) need to
>                         refetch old blocks.
>                         
>                         I've previously proposed schemes which come
>                         close but fail one of the above.
>                         
>                         (e.g. a scheme based on reservoir sampling
>                         that gives uniform
>                         selection of contiguous ranges, communicating
>                         only 64 bits of data to
>                         know what blocks a node claims to have,
>                         remaining totally uniform as
>                         the chain grows, without any need to refetch
>                         -- but needs O(height)
>                         work to figure out what blocks a peer has from
>                         the data it
>                         communicated.;   or another scheme based on
>                         consistent hashes that has
>                         log(height) computation; but sometimes may
>                         result in a node needing to
>                         go refetch an old block range it previously
>                         didn't store-- creating
>                         re-balancing traffic.)
>                         
>                         So far something that meets all those criteria
>                         (and/or whatever ones
>                         I'm not remembering) has not been discovered;
>                         but I don't really think
>                         much time has been spent on it. I think its
>                         very likely possible.
>                         
>                 
>                 
>         
>         
>         
>         
>         -- 
>         Jeff Garzik
>         Bitcoin core developer and open source evangelist
>         BitPay, Inc.      https://bitpay.com/





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 19:03         ` Gregory Maxwell
  2015-05-12 19:24           ` gabe appleton
@ 2015-05-12 22:00           ` Tier Nolan
  2015-05-12 22:09             ` gabe appleton
  2015-05-13  5:19           ` Daniel Kraft
  2 siblings, 1 reply; 19+ messages in thread
From: Tier Nolan @ 2015-05-12 22:00 UTC (permalink / raw)
  Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 3512 bytes --]

On Tue, May 12, 2015 at 8:03 PM, Gregory Maxwell <gmaxwell@gmail•com> wrote:

>
> (0) Block coverage should have locality; historical blocks are
> (almost) always needed in contiguous ranges.   Having random peers
> with totally random blocks would be horrific for performance; as you'd
> have to hunt down a working peer and make a connection for each block
> with high probability.
>
> (1) Block storage on nodes with a fraction of the history should not
> depend on believing random peers; because listening to peers can
> easily create attacks (e.g. someone could break the network; by
> convincing nodes to become unbalanced) and not useful-- it's not like
> the blockchain is substantially different for anyone; if you're to the
> point of needing to know coverage to fill then something is wrong.
> Gaps would be handled by archive nodes, so there is no reason to
> increase vulnerability by doing anything but behaving uniformly.
>
> (2) The decision to contact a node should need O(1) communications,
> not just because of the delay of chasing around just to find who has
> someone; but because that chasing process usually makes the process
> _highly_ sybil vulnerable.
>
> (3) The expression of what blocks a node has should be compact (e.g.
> not a dense list of blocks) so it can be rumored efficiently.
>
> (4) Figuring out what block (ranges) a peer has given should be
> computationally efficient.
>
> (5) The communication about what blocks a node has should be compact.
>
> (6) The coverage created by the network should be uniform, and should
> remain uniform as the blockchain grows; ideally it you shouldn't need
> to update your state to know what blocks a peer will store in the
> future, assuming that it doesn't change the amount of data its
> planning to use. (What Tier Nolan proposes sounds like it fails this
> point)
>
> (7) Growth of the blockchain shouldn't cause much (or any) need to
> refetch old blocks.
>

M = 1,000,000
N = number of "starts"

S(0) = hash(seed) mod M
...
S(n) = hash(S(n-1)) mod M

This generates a sequence of start points.  If a start point is less than
the current chain height, then it counts as a hit.

For each hit S(n), the node stores the 50MB of data starting at the block
at height S(n).

As the blockchain increases in size, new start points will drop below the
block height and become hits.  To stay within the node's storage budget,
some other runs would then be deleted.

A weakness is that it is random with regard to block heights.  Tiny blocks
have the same priority as larger blocks.
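
(A rough Python sketch of the start-point generation.  SHA-256 and the
decimal encoding of the hashed value are illustrative assumptions; the
proposal does not fix a particular hash or serialisation.)

    import hashlib

    M = 1000000  # fixed network-wide modulus, as above

    def next_start(value):
        # hash(x) mod M; SHA-256 over a decimal encoding is assumed here
        digest = hashlib.sha256(str(value).encode()).digest()
        return int.from_bytes(digest, 'big') % M

    def start_points(seed, n):
        # S(0) = hash(seed) mod M, S(k) = hash(S(k-1)) mod M
        starts = []
        s = next_start(seed)
        for _ in range(n):
            starts.append(s)
            s = next_start(s)
        return starts

    def hits(seed, n, chain_height):
        # start points below the current height are "hits"; the node keeps
        # roughly 50MB of blocks beginning at each hit
        return [s for s in start_points(seed, n) if s < chain_height]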

0) Blocks are local, in 50MB runs
1) Agreed, nodes should download headers-first (or some other compact way
of finding the highest POW chain)
2) M could be fixed, N and the seed are all that is required.  The seed
doesn't have to be that large.  If 1% of the blockchain is stored, then 16
bits should be sufficient so that every block is covered by seeds.
3) N is likely to be less than 2 bytes and the seed can be 2 bytes
4) A 1% cover of 50GB of blockchain would have 10 starts @ 50MB per run.
That is 10 hashes.  They don't even necessarily need to be cryptographic hashes.
5) Isn't this the same as 3?
6) Every block has the same odds of being included.  There inherently needs
to be an update when a node deletes some info due to exceeding its cap.  N
can be dropped one run at a time.
7) When new starts drop below the tip height, N can be decremented and that
one run is deleted.

There would need to be a special rule to ensure the low height blocks are
covered.  Nodes should keep the first 50MB of blocks with some probability
(10%?)
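
(To illustrate points 2 to 5, a self-contained sketch of how anyone can
decide from a peer's rumoured (seed, N) pair, with a handful of hashes and
no inventory exchange, whether that peer should hold a given block.  The
fixed BLOCKS_PER_RUN figure is a purely illustrative stand-in for "however
many blocks fit in a 50MB run".)

    import hashlib

    M = 1000000
    BLOCKS_PER_RUN = 100  # stand-in for the blocks covered by a 50MB run

    def _next(value):
        digest = hashlib.sha256(str(value).encode()).digest()
        return int.from_bytes(digest, 'big') % M

    def peer_should_have(seed, n_runs, height, chain_height):
        # Walk the peer's chained start points S(0)..S(N-1) and check
        # whether the wanted height falls inside any stored run.
        s = _next(seed)                  # S(0) = hash(seed) mod M
        for _ in range(n_runs):
            if s < chain_height and s <= height < s + BLOCKS_PER_RUN:
                return True
            s = _next(s)                 # S(k) = hash(S(k-1)) mod M
        return False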

[-- Attachment #2: Type: text/html, Size: 4609 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 22:00           ` [Bitcoin-development] " Tier Nolan
@ 2015-05-12 22:09             ` gabe appleton
  0 siblings, 0 replies; 19+ messages in thread
From: gabe appleton @ 2015-05-12 22:09 UTC (permalink / raw)
  To: Tier Nolan; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 4693 bytes --]

This is exactly the sort of solution I was hoping for. It seems to be the
minimal modification needed to make this work, and if someone is willing to
work with me, I would love to help implement it.

My only concern is that if the --max-size flag is not included, this
delivers significantly less benefit to the end user. Still a good chunk,
but possibly not enough.
On May 12, 2015 6:03 PM, "Tier Nolan" <tier.nolan@gmail•com> wrote:

>
>
> On Tue, May 12, 2015 at 8:03 PM, Gregory Maxwell <gmaxwell@gmail•com>
> wrote:
>
>>
>> (0) Block coverage should have locality; historical blocks are
>> (almost) always needed in contiguous ranges.   Having random peers
>> with totally random blocks would be horrific for performance; as you'd
>> have to hunt down a working peer and make a connection for each block
>> with high probability.
>>
>> (1) Block storage on nodes with a fraction of the history should not
>> depend on believing random peers; because listening to peers can
>> easily create attacks (e.g. someone could break the network by
>> convincing nodes to become unbalanced) and is not useful-- it's not like
>> the blockchain is substantially different for anyone; if you're to the
>> point of needing to know coverage to fill then something is wrong.
>> Gaps would be handled by archive nodes, so there is no reason to
>> increase vulnerability by doing anything but behaving uniformly.
>>
>> (2) The decision to contact a node should need O(1) communications,
>> not just because of the delay of chasing around just to find who has
>> someone; but because that chasing process usually makes the process
>> _highly_ sybil vulnerable.
>>
>> (3) The expression of what blocks a node has should be compact (e.g.
>> not a dense list of blocks) so it can be rumored efficiently.
>>
>> (4) Figuring out what block (ranges) a given peer has should be
>> computationally efficient.
>>
>> (5) The communication about what blocks a node has should be compact.
>>
>> (6) The coverage created by the network should be uniform, and should
>> remain uniform as the blockchain grows; ideally you shouldn't need
>> to update your state to know what blocks a peer will store in the
>> future, assuming that it doesn't change the amount of data it's
>> planning to use. (What Tier Nolan proposes sounds like it fails this
>> point)
>>
>> (7) Growth of the blockchain shouldn't cause much (or any) need to
>> refetch old blocks.
>>
>
> M = 1,000,000
> N = number of "starts"
>
> S(0) = hash(seed) mod M
> ...
> S(n) = hash(S(n-1)) mod M
>
> This generates a sequence of start points.  If the start point is less
> than the block height, then it counts as a hit.
>
> The node stores the 50MB of data starting at the block at height S(n).
>
> As the blockchain increases in size, new start points will drop below the
> block height and become hits.  To stay within the node's storage budget,
> some other runs would then be deleted.
>
> A weakness is that it is random with regard to block heights.  Tiny
> blocks have the same priority as larger blocks.
>
> 0) Blocks are local, in 50MB runs
> 1) Agreed, nodes should download headers-first (or some other compact way
> of finding the highest POW chain)
> 2) M could be fixed, N and the seed are all that is required.  The seed
> doesn't have to be that large.  If 1% of the blockchain is stored, then 16
> bits should be sufficient so that every block is covered by seeds.
> 3) N is likely to be less than 2 bytes and the seed can be 2 bytes
> 4) A 1% cover of 50GB of blockchain would have 10 starts @ 50MB per run.
> That is 10 hashes.  They don't even necessarily need to be cryptographic hashes.
> 5) Isn't this the same as 3?
> 6) Every block has the same odds of being included.  There inherently
> needs to be an update when a node deletes some info due to exceeding its
> cap.  N can be dropped one run at a time.
> 7) When new starts drop below the tip height, N can be decremented and
> that one run is deleted.
>
> There would need to be a special rule to ensure the low height blocks are
> covered.  Nodes should keep the first 50MB of blocks with some probability
> (10%?)
>
>

[-- Attachment #2: Type: text/html, Size: 6210 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-12 19:03         ` Gregory Maxwell
  2015-05-12 19:24           ` gabe appleton
  2015-05-12 22:00           ` [Bitcoin-development] " Tier Nolan
@ 2015-05-13  5:19           ` Daniel Kraft
  2015-05-13  9:34             ` Tier Nolan
  2 siblings, 1 reply; 19+ messages in thread
From: Daniel Kraft @ 2015-05-13  5:19 UTC (permalink / raw)
  To: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 5131 bytes --]

Hi all!

On 2015-05-12 21:03, Gregory Maxwell wrote:
> Summarizing from memory:

In the context of this discussion, let me also restate an idea I've
proposed on Bitcointalk for this.  It is probably not perfect and could
surely be adapted (I'm interested in that), but I think it meets
most/all of the criteria stated below.  It is similar to the idea with
"start points", but gives O(log height) instead of O(height) for
determining which blocks a node has.

Let me for simplicity assume that the node wants to store 50% of all
blocks.  It is straightforward to extend the scheme so that this is
configurable:

1) Create some kind of "seed" that can be compact and will be sent to
other peers to define which blocks the node has.  Use it to initialise a
PRNG of some sort.

2) Divide the range of all blocks into intervals with exponentially
growing size.  I.e., something like this:

1, 1, 2, 2, 4, 4, 8, 8, 16, 16, ...

With this, only O(log height) intervals are necessary to cover height
blocks.

3) Using the PRNG, *one* of the two intervals of each length is
selected.  The node stores these blocks and discards the others.
(Possibly keeping the last 200 or 2,016 or whatever blocks additionally.)
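
(A small Python sketch of the 50% case.  random.Random stands in for "a PRNG
of some sort", and laying the pairs out from the genesis block upwards,
smallest first, is one reading of the layout rather than something fixed
above.)

    import random

    def stored_ranges(seed, chain_height):
        # Pairs of equal-sized intervals of sizes 1, 1, 2, 2, 4, 4, ...;
        # the seeded PRNG keeps exactly one interval out of each pair.
        rng = random.Random(seed)
        ranges = []
        pos, size = 0, 1
        while pos < chain_height:
            pick = rng.randrange(2)    # first or second interval of the pair
            start = pos + pick * size
            if start < chain_height:
                ranges.append((start, min(start + size, chain_height)))
            pos += 2 * size            # move past both intervals of the pair
            size *= 2                  # the next pair doubles in size
        return ranges                  # only O(log height) entries

    def has_block(seed, chain_height, height):
        # Anyone holding the seed can answer this locally in O(log height).
        return any(lo <= height < hi
                   for lo, hi in stored_ranges(seed, chain_height))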

> (0) Block coverage should have locality; historical blocks are
> (almost) always needed in contiguous ranges.   Having random peers
> with totally random blocks would be horrific for performance; as you'd
> have to hunt down a working peer and make a connection for each block
> with high probability.

You get contiguous block ranges (with at most O(log height) "breaks").
Also ranges of newer blocks are longer, which may be an advantage if
those blocks are needed more often.

> (1) Block storage on nodes with a fraction of the history should not
> depend on believing random peers; because listening to peers can
> easily create attacks (e.g. someone could break the network by
> convincing nodes to become unbalanced) and is not useful-- it's not like
> the blockchain is substantially different for anyone; if you're to the
> point of needing to know coverage to fill then something is wrong.
> Gaps would be handled by archive nodes, so there is no reason to
> increase vulnerability by doing anything but behaving uniformly.

With my proposal, each node determines randomly and on its own which
blocks to store.  No believing anyone.

> (2) The decision to contact a node should need O(1) communications,
> not just because of the delay of chasing around just to find who has
> someone; but because that chasing process usually makes the process
> _highly_ sybil vulnerable.

Not exactly sure what you mean by that, but I think that's fulfilled.
You can (locally) compute in O(log height) from a node's seed whether or
not it has the blocks you need.  The only communication required is the
node's seed itself.

> (3) The expression of what blocks a node has should be compact (e.g.
> not a dense list of blocks) so it can be rumored efficiently.

See above.

> (4) Figuring out what block (ranges) a given peer has should be
> computationally efficient.

O(log height).  Not O(1), but that's probably not a big issue.

> (5) The communication about what blocks a node has should be compact.

See above.

> (6) The coverage created by the network should be uniform, and should
> remain uniform as the blockchain grows; ideally you shouldn't need
> to update your state to know what blocks a peer will store in the
> future, assuming that it doesn't change the amount of data it's
> planning to use. (What Tier Nolan proposes sounds like it fails this
> point)

Coverage will be uniform if the seed is created randomly and the PRNG
has good properties.  No need to update the seed if the other node's
fraction is unchanged.  (Not sure whether you suggest that nodes define a
"fraction" or rather an "absolute size".)

> (7) Growth of the blockchain shouldn't cause much (or any) need to
> refetch old blocks.

No need to do that with the scheme.

What do you think about this idea?  Some random thoughts from myself:

*) I need to formulate it in a more general way so that the fraction can
be arbitrary and not just 50%.  This should be easy to do, and I can do
it if there's interest.

*) It is O(log height) and not O(1), but that should not be too
different for the heights that are relevant.

*) Maybe it would be better / easier to not use the PRNG at all; just
decide to *always* use the first or the second interval with a given
size.  Not sure about that.

*) With the proposed scheme, the node's actual fraction of stored blocks
will vary between 1/2 and 2/3 (if I got the mathematics right, it is
still early) as the blocks come in.  Not sure if that's a problem.  I
can do a precise analysis of this property for an extended scheme if you
are interested in it.

Yours,
Daniel

-- 
http://www.domob.eu/
OpenPGP: 1142 850E 6DFF 65BA 63D6  88A8 B249 2AC4 A733 0737
Namecoin: id/domob -> https://nameid.org/?name=domob
--
Done:  Arc-Bar-Cav-Hea-Kni-Ran-Rog-Sam-Tou-Val-Wiz
To go: Mon-Pri


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bitcoin-development] Proposed additional options for pruned nodes
  2015-05-13  5:19           ` Daniel Kraft
@ 2015-05-13  9:34             ` Tier Nolan
  0 siblings, 0 replies; 19+ messages in thread
From: Tier Nolan @ 2015-05-13  9:34 UTC (permalink / raw)
  Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

On Wed, May 13, 2015 at 6:19 AM, Daniel Kraft <d@domob•eu> wrote:

> 2) Divide the range of all blocks into intervals with exponentially
> growing size.  I. e., something like this:
>
> 1, 1, 2, 2, 4, 4, 8, 8, 16, 16, ...
>

Interesting.  This can be combined with the system I suggested.

A node broadcasts 3 pieces of information:

Seed (16 bits): This is the seed
M_bits_lsb (1 bit):  Used to indicate M during a transition
N (7 bits):  This is the count of the last range held (or partially held)

M = 1 << M_bits

M should be set to the lowest power of 2 greater than double the block
chain height, so M_bits is implied by the chain height and only its least
significant bit needs to be broadcast.

That gives M = 1 million at the moment.  While M is changing, some nodes
will be using the higher M and others will be using the lower M.

The M_bits_lsb field allows those to be distinguished.
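
(As a concrete reading, with the exact rounding rule being an assumption, M
and the broadcast bit can be derived from the chain height like this:)

    def m_bits(chain_height):
        # exponent of the lowest power of 2 strictly greater than
        # double the chain height
        return (2 * chain_height).bit_length()

    def choose_m(chain_height):
        return 1 << m_bits(chain_height)   # M = 1 << M_bits

    def m_bits_lsb(chain_height):
        # the single bit a node broadcasts; near a transition some nodes
        # are still on the previous M, and this bit tells peers which one
        return m_bits(chain_height) & 1

    print(choose_m(355000))  # roughly the May 2015 height -> 1048576 (~1 million)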

As the block height approaches 512k, nodes can begin to upgrade.  For a
period around block 512k, some nodes could use M = 1 million and others
could use M = 2 million.

Assuming M is around 3 times the block height, the odds of a given start
being less than the block height are around 35%.  Since run sizes grow by
roughly 25% per step, each successive hit is roughly double the size of the
previous one.

Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5MB

This gives an exponential increase, but groups of 4 are linearly
interpolated.


*Size(0) = 10 MB*
Size(1) = 12.5 MB
Size(2) = 15 MB
Size(3) = 17.5 MB
Size(4) = 20 MB

*Size(5) = 25 MB*
Size(6) = 30 MB
Size(7) = 35 MB

*Size(8) = 40 MB*

Start(n) = Hash(seed + n) mod M

A node should store as much of its last run as possible.  Suppose starts
0, 5, and 8 were "hits" but the node has a max size of 60MB.  It can store
runs 0 and 5 (35MB) and have 25MB left.  That isn't enough to store all of
run 8, but it should store 25MB of the blocks in run 8 anyway.
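
(Putting the run sizes and starts together in Python.  As before, the
concrete hash and encoding for Start(n) are assumptions, and the greedy fill
of the last run simply follows the 60MB example above.)

    import hashlib

    def size_mb(n):
        # Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5MB
        return ((4 + (n & 0x3)) << (n >> 2)) * 2.5

    def start(seed, n, m):
        # Start(n) = Hash(seed + n) mod M; SHA-256 over "seed:n" is illustrative
        digest = hashlib.sha256('{}:{}'.format(seed, n).encode()).digest()
        return int.from_bytes(digest, 'big') % m

    def allocate(hit_indices, cap_mb):
        # Keep whole runs while they fit, then put whatever budget is left
        # into a partial copy of the last run.
        kept = []
        for n in hit_indices:
            take = min(size_mb(n), cap_mb)
            kept.append((n, take))
            cap_mb -= take
            if cap_mb <= 0:
                break
        return kept

    print([size_mb(n) for n in range(9)])  # matches the table above: 10 .. 40
    print(allocate([0, 5, 8], 60))         # [(0, 10.0), (5, 25.0), (8, 25.0)]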

Size(127) = pow(2, 31) * 17.5MB = 35,840 TB (the maximum, since N is 7 bits).

Decreasing N only causes previously accepted runs to be invalidated.

When a node approaches a transition point for M, it would pick a block
height within 25,000 blocks of the transition point.  Once the chain
reaches that height, it will begin downloading the new runs that it needs.
When updating, it can set N to zero.  This spreads the upgrade out (over
around a year), with only a small number of nodes upgrading at any time.

New nodes should use the higher M if they are near a transition point (say
within 100,000 blocks).

[-- Attachment #2: Type: text/html, Size: 3186 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-05-13  9:34 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CANJO25J1WRHtfQLVXUB2s_sjj39pTPWmixAcXNJ3t-5os8RPmQ@mail.gmail.com>
2015-05-12 15:26 ` [Bitcoin-development] Proposed additional options for pruned nodes gabe appleton
2015-05-12 16:05   ` Jeff Garzik
2015-05-12 16:56     ` gabe appleton
2015-05-12 17:16     ` Peter Todd
2015-05-12 18:23       ` Tier Nolan
2015-05-12 19:03         ` Gregory Maxwell
2015-05-12 19:24           ` gabe appleton
2015-05-12 19:38             ` Jeff Garzik
2015-05-12 19:43               ` gabe appleton
2015-05-12 21:30                 ` [Bitcoin-development] [Bulk] " gb
2015-05-12 20:02               ` [Bitcoin-development] " Gregory Maxwell
2015-05-12 20:10                 ` Jeff Garzik
2015-05-12 20:41                   ` gabe appleton
2015-05-12 20:47                   ` Gregory Maxwell
     [not found]               ` <CAFVoEQTdmCSRAy3u26q5oHdfvFEytZDBfQb_fs_qttK15fiRmg@mail.gmail.com>
     [not found]                 ` <CAJHLa0OxxxiVd3JOp8SDvbF8dHj6KHdUNGb9L_GvTe93z3Z8mg@mail.gmail.com>
     [not found]                   ` <CAJHLa0MYSpVBD4VE65LVbADb2daOvE=N83G8F_zDSHy3AQ5DAQ@mail.gmail.com>
2015-05-12 21:17                     ` [Bitcoin-development] Fwd: " Adam Weiss
2015-05-12 22:00           ` [Bitcoin-development] " Tier Nolan
2015-05-12 22:09             ` gabe appleton
2015-05-13  5:19           ` Daniel Kraft
2015-05-13  9:34             ` Tier Nolan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox