public inbox for bitcoindev@googlegroups.com
From: Keagan McClelland <keagan.mcclelland@gmail•com>
To: Bitcoin Protocol Discussion <bitcoin-dev@lists•linuxfoundation.org>
Subject: [bitcoin-dev] A design for Probabilistic Partial Pruning
Date: Fri, 26 Feb 2021 11:40:35 -0700
Message-ID: <CALeFGL1WSSA69ARvJW3di-UC_gz7NV9q7=6zd7s=CHnmttdQFg@mail.gmail.com>

Hi all,

I've been thinking for quite some time about the problem of pruned nodes
and ongoing storage costs for full nodes. One of the things that strikes me
as odd is that we only really have two settings.

A. Prune everything except the most recent blocks, down to the cache size
B. Keep everything since genesis

From my observations and conversations with various folks in the community,
many would like to run a "partially" pruned node to help bear the load of
bootstrapping other nodes and to add data redundancy to the network, but
would prefer not to dedicate hundreds of gigabytes of storage space to the
cause.

This led me to the idea that a node could pseudorandomly prune blocks from
history, keeping only those that pass some predicate. A rough sketch of
this would look as follows.

1. At startup, the node would generate a random seed. The seed would be
unique to the node, but it need not be cryptographically secure.
2. In the node configuration it would also carry a "threshold" expressed as
some percentage of blocks it wanted to keep.
3. As IBD occurs, the node would decide whether to prune or keep each
block based on the threshold, the block hash, and the node's unique seed.
The uniqueness of the node's seed should ensure that no block is
systematically overrepresented in the set of nodes choosing this storage
scheme.
4. Once the node's IBD is complete it would advertise this as a peer
service, advertising its seed and threshold, so that nodes could
deterministically deduce which of its peers had which blocks.
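
To make the steps above concrete, here is a minimal sketch of one possible
predicate. It is only an illustration under stated assumptions: the idea
does not yet fix a hash function or an encoding, so the use of SHA-256 over
seed || block hash, the name should_keep, and its parameters are all
hypothetical. The same function serves the node deciding what to keep and a
peer deducing what that node has from its advertised seed and threshold.

```python
import hashlib
import os


def should_keep(seed: bytes, threshold_percent: float, block_hash: bytes) -> bool:
    """Decide whether a node with this seed and threshold keeps a block.

    Hypothetical predicate (an assumption, not part of any spec): hash the
    seed together with the block hash, read the first 8 bytes as an integer
    in [0, 2**64), and keep the block if it falls below the threshold's
    share of that range.
    """
    if not 0 <= threshold_percent <= 100:
        raise ValueError("threshold_percent must be between 0 and 100")
    digest = hashlib.sha256(seed + block_hash).digest()
    sample = int.from_bytes(digest[:8], "big")
    cutoff = int(threshold_percent / 100 * 2**64)
    return sample < cutoff


if __name__ == "__main__":
    # Step 1: a per-node seed. os.urandom is stronger than required here.
    seed = os.urandom(8)
    # Step 2: keep roughly a quarter of all blocks.
    threshold = 25.0
    # Step 3: stand-in block hashes; a real node would use actual block hashes.
    blocks = [hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(1000)]
    kept = [h for h in blocks if should_keep(seed, threshold, h)]
    print(f"kept {len(kept)} of {len(blocks)} blocks")
```

Because the decision depends only on (seed, threshold, block hash), a peer
that learns the first two values can compute the same answer for any block
without asking the node.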

The goals are to increase data redundancy in a way that shares the load
more uniformly across nodes, alleviating some of the pressure that IBD
places on full archive nodes. I am working on a draft BIP for this
proposal but figured I would submit it as a high-level idea in case anyone
had feedback on the initial design before I go into specification-level
detail.

If you have thoughts on

A. The protocol design itself
B. The barriers to put this kind of functionality into Core

I would love to hear from you,

Cheers,
Keagan
