[bitcoin-dev] A design for Probabilistic Partial Pruning

From: Keagan McClelland @ 2021-02-26 18:40 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion


Hi all,

I've been thinking for quite some time about the problem of pruned nodes
and ongoing storage costs for full nodes. One of the things that strikes me
as odd is that we only really have two settings.

A. Prune everything except the most recent blocks, down to the cache size
B. Keep everything since genesis

From my observations and conversations with various folks in the community,
they would like to be able to run a "partially" pruned node to help bear
the load of bootstrapping other nodes and helping with data redundancy in
the network, but would prefer not to dedicate hundreds of gigabytes of
storage space to the cause.

This led me to the idea that a node could pseudo-randomly prune some of the
blocks from history, deciding block by block according to some predicate. A
rough sketch of this would look as follows.

1. At node startup, it would generate a random seed. The seed would be
unique to the node, but it need not be cryptographically secure.
2. In the node configuration it would also carry a "threshold" expressed as
some percentage of blocks it wanted to keep.
3. As IBD occurs, the node would decide whether to prune or keep each
block based on the threshold, the block hash, and the node's unique seed.
The uniqueness of the node's seed should ensure that no block is
systematically overrepresented in the set of nodes choosing this storage
scheme.
4. Once the node's IBD is complete it would advertise this as a peer
service, advertising its seed and threshold, so that nodes could
deterministically deduce which of its peers had which blocks.
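
To make steps 1 through 4 concrete, here is a minimal sketch of one
possible keep/prune predicate. The specifics are assumptions made for
illustration and are not part of the proposal: SHA-256 over the seed
concatenated with the block hash, a threshold expressed as a fraction in
[0, 1], and the hypothetical names keeps_block and peers_with_block.

```python
import hashlib


def keeps_block(seed: bytes, block_hash: bytes, threshold: float) -> bool:
    """Return True if a node with this seed and threshold keeps the block.

    Assumption: the predicate is SHA-256(seed || block_hash), with the first
    8 bytes read as an integer and compared against threshold * 2**64.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0, 1]")
    digest = hashlib.sha256(seed + block_hash).digest()
    value = int.from_bytes(digest[:8], "big")
    # Integer comparison avoids float rounding at the top of the range.
    return value < int(threshold * 2**64)


def peers_with_block(peers: dict, block_hash: bytes) -> list:
    """Given advertised (seed, threshold) pairs, list peers expected to hold a block.

    peers maps a peer name to its advertised (seed, threshold) tuple.
    """
    return [
        name
        for name, (seed, threshold) in peers.items()
        if keeps_block(seed, block_hash, threshold)
    ]
```

Because the decision depends only on the seed, the block hash, and the
threshold, any peer that learns the advertised pair can recompute it
without asking. With this particular construction, raising the threshold
only adds blocks to the kept set, so a node could increase its share later
without invalidating what it already advertised.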

The goals are to increase data redundancy in a way that more uniformly
shares the load across nodes, alleviating some of the pressure that IBD
places on full archive nodes. I am working on a draft BIP for this
proposal but figured I would submit it as a high-level idea in case anyone
had any feedback on the initial design before I go into specification-level
detail.
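
As a rough back-of-the-envelope check on the redundancy goal, the sketch
below estimates how many copies of a given block a set of partially pruned
nodes would hold. It assumes each node keeps a block independently with
probability equal to its threshold, which is an idealization, and the
function names are hypothetical.

```python
from math import prod


def expected_copies(thresholds):
    """Expected number of nodes holding any given block."""
    return sum(thresholds)


def probability_unavailable(thresholds):
    """Probability that none of these nodes holds a given block."""
    return prod(1.0 - t for t in thresholds)
```

For example, under these assumptions 100 nodes that each keep 10% of
blocks would hold about 10 copies of a block on average, and the chance
that none of them holds a particular block is 0.9**100, roughly 2.7e-5.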

If you have thoughts on

A. The protocol design itself
B. The barriers to put this kind of functionality into Core

I would love to hear from you,

Cheers,
Keagan
