public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
* [Bitcoin-development] Chain pruning
@ 2014-04-10 11:37 Mike Hearn
  2014-04-10 11:57 ` Wladimir
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Hearn @ 2014-04-10 11:37 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 731 bytes --]

Chain pruning is probably a separate thread, changing subject.


> Reason is that the actual blocks available are likely to change frequently
> (if
> you keep the last week of blocks


I doubt anyone would specify blocks to keep in terms of time. More likely
it'd be in terms of megabytes, as that's the actual resource constraint on
nodes. Given a block size average it's easy to go from megabytes to
num_blocks, so I had imagined it'd be a new addr field that specifies how
many blocks from the chain head are stored. Then you'd connect to some
nodes and if they indicate their chain head - num_blocks_stored is higher
than your current chain height, you'd do a getaddr and go looking for nodes
that are storing far enough back.
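A minimal sketch of the coverage check this implies (Python; the field names are illustrative, no such addr field existed at the time):

```python
def peer_covers_height(peer_chain_head: int, num_blocks_stored: int,
                       wanted_height: int) -> bool:
    """True if a peer advertising that it stores `num_blocks_stored`
    blocks back from its chain head can serve the block at
    `wanted_height`.  Field names are hypothetical, not protocol."""
    oldest_stored = peer_chain_head - num_blocks_stored + 1
    return oldest_stored <= wanted_height <= peer_chain_head
```

If every currently connected peer fails this check for the height you need, you would do a getaddr and keep looking, as described above.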

[-- Attachment #2: Type: text/html, Size: 1032 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 11:37 [Bitcoin-development] Chain pruning Mike Hearn
@ 2014-04-10 11:57 ` Wladimir
  2014-04-10 12:10   ` Gregory Maxwell
  0 siblings, 1 reply; 21+ messages in thread
From: Wladimir @ 2014-04-10 11:57 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1419 bytes --]

On Thu, Apr 10, 2014 at 1:37 PM, Mike Hearn <mike@plan99•net> wrote:

> Chain pruning is probably a separate thread, changing subject.
>
>
>> Reason is that the actual blocks available are likely to change
>> frequently (if
>> you keep the last week of blocks
>
>
> I doubt anyone would specify blocks to keep in terms of time. More likely
> it'd be in terms of megabytes, as that's the actual resource constraint on
> nodes.
>

Well, with Bitcoin, (average) time, number of blocks and (maximum) size are
all related to each other, so it doesn't matter how it is specified; it's
always possible to give estimates of all three.

As for implementation, it indeed makes the most sense to work with block ranges.
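As a rough illustration, converting between the three is simple arithmetic (assuming the 10-minutes-per-block target and the 1 MB maximum block size in force at the time):

```python
BLOCK_INTERVAL_MINUTES = 10   # average target spacing between blocks
MAX_BLOCK_MB = 1.0            # maximum block size (2014 consensus rule)

def days_to_blocks(days: float) -> int:
    """Expected number of blocks produced in `days` days."""
    return int(days * 24 * 60 // BLOCK_INTERVAL_MINUTES)

def blocks_to_max_mb(num_blocks: int) -> float:
    """Upper bound on disk usage for `num_blocks` blocks."""
    return num_blocks * MAX_BLOCK_MB
```

So "the last week of blocks" and "the last ~1008 blocks, at most ~1 GB" describe the same retention policy.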


> Given a block size average it's easy to go from megabytes to num_blocks,
> so I had imagined it'd be a new addr field that specifies how many blocks
> from the chain head are stored. Then you'd connect to some nodes and if
> they indicate their chain head - num_blocks_stored is higher than your
> current chain height, you'd do a getaddr and go looking for nodes that are
> storing far enough back.
>

This assumes that nodes will always be storing the latest blocks. For
dynamic nodes that take part in the consensus this makes sense.

Just wondering: Would there be a use for a [static] node that, say, always
serves only the first 100000 blocks? Or, even, a static range like block
100000 - 200000?

Wladimir

[-- Attachment #2: Type: text/html, Size: 2296 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 11:57 ` Wladimir
@ 2014-04-10 12:10   ` Gregory Maxwell
  2014-04-10 14:19     ` Wladimir
  0 siblings, 1 reply; 21+ messages in thread
From: Gregory Maxwell @ 2014-04-10 12:10 UTC (permalink / raw)
  To: Wladimir; +Cc: Bitcoin Dev

On Thu, Apr 10, 2014 at 4:57 AM, Wladimir <laanwj@gmail•com> wrote:
> Just wondering: Would there be a use for a [static] node that, say, always
> serves only the first 100000 blocks? Or, even, a static range like block
> 100000 - 200000?

The last time we discussed this sipa collected data on how often
blocks were fetched as a function of their depth, and there was a huge
increase for recent blocks that didn't really level out until 2000
blocks or so; presumably it's not uncommon for nodes to be offline for
a week or two at a time.

But sure, I could see a fixed range also being a useful contribution,
though I'm struggling to figure out what set of constraints would
leave a node not following the consensus. Obviously it has
bandwidth if you're expecting it to contribute much in serving those
historic blocks... and verifying is reasonably cpu-cheap with fast
ecdsa code.   Maybe it has a lot of read-only storage?

I think it should be possible to express and use such a thing in the
protocol even if I'm currently unsure as to why you wouldn't do 100000
- 200000  _plus_ the most recent 144 that you were already keeping
around for reorgs.

In terms of peer selection, if the blocks you need aren't covered by
the nodes you're currently connected to, I think you'd prefer to seek
new nodes which have the least rare-ness in the ranges they offer.
E.g. if you're looking for a block 50 from the tip, you should
probably not prefer to fetch it from someone with blocks 100000-150000
if it's one of only 100 nodes that has that range.
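A toy sketch of that preference (Python; the per-range rarity counts are an assumption here, a real client would have to estimate them from addr gossip):

```python
def pick_peer(peers, wanted_height):
    """peers: list of (peer_id, range_start, range_end, nodes_with_range),
    where nodes_with_range estimates how many nodes on the network serve
    the same block range.  Among peers that cover `wanted_height`,
    prefer the one whose range is most common, so rare archival ranges
    stay free to serve the blocks only they can provide."""
    covering = [p for p in peers if p[1] <= wanted_height <= p[2]]
    if not covering:
        return None
    return max(covering, key=lambda p: p[3])[0]
```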



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 12:10   ` Gregory Maxwell
@ 2014-04-10 14:19     ` Wladimir
  2014-04-10 16:23       ` Brian Hoffman
  0 siblings, 1 reply; 21+ messages in thread
From: Wladimir @ 2014-04-10 14:19 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1979 bytes --]

On Thu, Apr 10, 2014 at 2:10 PM, Gregory Maxwell <gmaxwell@gmail•com> wrote:

> But sure I could see a fixed range as also being a useful contribution
> though I'm struggling to figure out what set of constraints would
> leave a node without following the consensus?   Obviously it has
> bandwidth if you're expecting to contribute much in serving those
> historic blocks... and verifying is reasonably cpu cheap with fast
> ecdsa code.   Maybe it has a lot of read only storage?
>

The use case is that you could burn the node implementation + block data +
a live operating system on a read-only medium. This could be set in stone
for a long time.

There would be no consensus code to keep up to date with protocol
developments, because it doesn't take active part in it.

I don't think it would be terribly useful right now, but it could be useful
when nodes that host all history become rare. It'd allow distributing
'pieces of history' in a self-contained form.


> I think it should be possible to express and use such a thing in the
> protocol even if I'm currently unsure as to why you wouldn't do 100000
> - 200000  _plus_ the most recent 144 that you were already keeping
> around for reorgs.
>

Yes, it would be nice to at least be able to express it, if it doesn't make
the protocol too finicky.

> In terms of peer selection, if the blocks you need aren't covered by
> the nodes you're currently connected to I think you'd prefer to seek
> new nodes which have the least rare-ness in the ranges they offer.
> E.g. if you're looking for a block 50 from the tip, you should
> probably not prefer to fetch it from someone with blocks 100000-150000
> if it's one of only 100 nodes that has that range.
>

That makes sense.

In general, if you want a block 50 from the tip, it would be best to
request it from a node that only serves the last N (N>~50) blocks, and not
a history node that could use the same bandwidth to serve earlier, rarer
blocks to others.

Wladimir

[-- Attachment #2: Type: text/html, Size: 2795 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 14:19     ` Wladimir
@ 2014-04-10 16:23       ` Brian Hoffman
  2014-04-10 16:28         ` Mike Hearn
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Hoffman @ 2014-04-10 16:23 UTC (permalink / raw)
  To: Wladimir; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 3678 bytes --]

This is probably just noise, but what if nodes could compress and store
earlier transaction sets (archive sets) and serve them up conditionally? So
if there were, say, 100 archive sets (of 10,000 blocks each), you might have
5 open at any time when you're an active archive node, while the others sit
on your disk compressed and unavailable to the network. This would allow
nodes to have all full transactions but conserve disk space and network
activity, since they wouldn't ever respond about every possible transaction.

This could be based on a rotational request period, on request count,
or done periodically. Once they're considered active, nodes would be expected
to uncompress a set and make it available to the network. Clients would
have to piece together archive sets from different nodes, but if there
weren't enough archive nodes to cover the chain, the network could ratchet
up the number of required open archive sets per active node.
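For concreteness, the bookkeeping for fixed-size archive sets might look like this (a sketch; the 10,000-block set size is just the example figure above):

```python
ARCHIVE_SET_SIZE = 10_000  # blocks per archive set, per the example above

def archive_set_index(height: int) -> int:
    """Archive set that the block at `height` falls into."""
    return height // ARCHIVE_SET_SIZE

def archive_set_range(index: int) -> tuple:
    """Inclusive block-height range covered by archive set `index`."""
    start = index * ARCHIVE_SET_SIZE
    return (start, start + ARCHIVE_SET_SIZE - 1)
```

A client reassembling history would then fetch each index 0..99 from whichever node currently has that set open.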

I fully expect to have my idea trashed, but I'm dipping toes in the waters
of contribution.




On Thu, Apr 10, 2014 at 10:19 AM, Wladimir <laanwj@gmail•com> wrote:

>
> On Thu, Apr 10, 2014 at 2:10 PM, Gregory Maxwell <gmaxwell@gmail•com>wrote:
>
>> But sure I could see a fixed range as also being a useful contribution
>> though I'm struggling to figure out what set of constraints would
>> leave a node without following the consensus?   Obviously it has
>> bandwidth if you're expecting to contribute much in serving those
>> historic blocks... and verifying is reasonably cpu cheap with fast
>> ecdsa code.   Maybe it has a lot of read only storage?
>>
>
> The use case is that you could burn the node implementation + block data +
> a live operating system on a read-only medium. This could be set in stone
> for a long time.
>
> There would be no consensus code to keep up to date with protocol
> developments, because it doesn't take active part in it.
>
> I don't think it would be terribly useful right now, but it could be
> useful when nodes that host all history become rare. It'd allow
> distributing 'pieces of history' in a self-contained form.
>
>
>> I think it should be possible to express and use such a thing in the
>> protocol even if I'm currently unsure as to why you wouldn't do 100000
>> - 200000  _plus_ the most recent 144 that you were already keeping
>> around for reorgs.
>>
>
> Yes, it would be nice to at least be able to express it, if it doesn't
> make the protocol too finicky.
>
>> In terms of peer selection, if the blocks you need aren't covered by
>> the nodes you're currently connected to I think you'd prefer to seek
>> new nodes which have the least rare-ness in the ranges they offer.
>> E.g. if you're looking for a block 50 from the tip, you should
>> probably not prefer to fetch it from someone with blocks 100000-150000
>> if it's one of only 100 nodes that has that range.
>>
>
> That makes sense.
>
> In general, if you want a block 50 from the tip, it would be best to
> request it from a node that only serves the last N (N>~50) blocks, and not
> a history node that could use the same bandwidth to serve earlier, rarer
> blocks to others.
>
> Wladimir
>
>
>
> ------------------------------------------------------------------------------
> Put Bad Developers to Shame
> Dominate Development with Jenkins Continuous Integration
> Continuously Automate Build, Test & Deployment
> Start a new project now. Try Jenkins in the cloud.
> http://p.sf.net/sfu/13600_Cloudbees
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists•sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>
>

[-- Attachment #2: Type: text/html, Size: 5202 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:23       ` Brian Hoffman
@ 2014-04-10 16:28         ` Mike Hearn
  2014-04-10 16:47           ` Brian Hoffman
  2014-04-10 16:52           ` Ricardo Filipe
  0 siblings, 2 replies; 21+ messages in thread
From: Mike Hearn @ 2014-04-10 16:28 UTC (permalink / raw)
  To: Brian Hoffman; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 531 bytes --]

Suggestions always welcome!

The main problem with this is that the block chain is mostly random bytes
(hashes, keys) so it doesn't compress that well. It compresses a bit, but
not enough to change the fundamental physics.

However, that does not mean the entire chain has to be stored on expensive
rotating platters. I've suggested that in some star trek future where the
chain really is gigantic, it could be stored on tape and spooled off at
high speed. Literally a direct DMA from tape drive to NIC. But we're not
there yet :)

[-- Attachment #2: Type: text/html, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:28         ` Mike Hearn
@ 2014-04-10 16:47           ` Brian Hoffman
  2014-04-10 16:54             ` Ricardo Filipe
  2014-04-10 16:59             ` Pieter Wuille
  2014-04-10 16:52           ` Ricardo Filipe
  1 sibling, 2 replies; 21+ messages in thread
From: Brian Hoffman @ 2014-04-10 16:47 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1067 bytes --]

Looks like only about ~30% disk space savings so I see your point. Is there
a critical reason why blocks couldn't be formed into "superblocks" that are
chained together and nodes could serve a specific superblock, which could
be pieced together from different nodes to get the full blockchain? This
would allow participants with limited resources to serve full portions of
the blockchain rather than limited pieces of the entire blockchain.


On Thu, Apr 10, 2014 at 12:28 PM, Mike Hearn <mike@plan99•net> wrote:

> Suggestions always welcome!
>
> The main problem with this is that the block chain is mostly random bytes
> (hashes, keys) so it doesn't compress that well. It compresses a bit, but
> not enough to change the fundamental physics.
>
> However, that does not mean the entire chain has to be stored on expensive
> rotating platters. I've suggested that in some star trek future where the
> chain really is gigantic, it could be stored on tape and spooled off at
> high speed. Literally a direct DMA from tape drive to NIC. But we're not
> there yet :)
>

[-- Attachment #2: Type: text/html, Size: 1446 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:28         ` Mike Hearn
  2014-04-10 16:47           ` Brian Hoffman
@ 2014-04-10 16:52           ` Ricardo Filipe
  1 sibling, 0 replies; 21+ messages in thread
From: Ricardo Filipe @ 2014-04-10 16:52 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Dev

anyway, any kind of compression that comes to the blockchain is
orthogonal to pruning.

I agree that you will probably want some kind of replication on more
recent nodes than on older ones. However, nodes with older blocks
don't need to be "static", get the block distribution algorithm to
sort it out.

2014-04-10 17:28 GMT+01:00 Mike Hearn <mike@plan99•net>:
> Suggestions always welcome!
>
> The main problem with this is that the block chain is mostly random bytes
> (hashes, keys) so it doesn't compress that well. It compresses a bit, but
> not enough to change the fundamental physics.
>
> However, that does not mean the entire chain has to be stored on expensive
> rotating platters. I've suggested that in some star trek future where the
> chain really is gigantic, it could be stored on tape and spooled off at high
> speed. Literally a direct DMA from tape drive to NIC. But we're not there
> yet :)
>
>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:47           ` Brian Hoffman
@ 2014-04-10 16:54             ` Ricardo Filipe
  2014-04-10 16:56               ` Brian Hoffman
  2014-04-10 16:59             ` Pieter Wuille
  1 sibling, 1 reply; 21+ messages in thread
From: Ricardo Filipe @ 2014-04-10 16:54 UTC (permalink / raw)
  To: Brian Hoffman; +Cc: Bitcoin Dev

that's what blockchain pruning is all about :)

2014-04-10 17:47 GMT+01:00 Brian Hoffman <brianchoffman@gmail•com>:
> Looks like only about ~30% disk space savings so I see your point. Is there
> a critical reason why blocks couldn't be formed into "superblocks" that are
> chained together and nodes could serve a specific superblock, which could be
> pieced together from different nodes to get the full blockchain? This would
> allow participants with limited resources to serve full portions of the
> blockchain rather than limited pieces of the entire blockchain.
>
>
> On Thu, Apr 10, 2014 at 12:28 PM, Mike Hearn <mike@plan99•net> wrote:
>>
>> Suggestions always welcome!
>>
>> The main problem with this is that the block chain is mostly random bytes
>> (hashes, keys) so it doesn't compress that well. It compresses a bit, but
>> not enough to change the fundamental physics.
>>
>> However, that does not mean the entire chain has to be stored on expensive
>> rotating platters. I've suggested that in some star trek future where the
>> chain really is gigantic, it could be stored on tape and spooled off at high
>> speed. Literally a direct DMA from tape drive to NIC. But we're not there
>> yet :)
>
>
>
>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:54             ` Ricardo Filipe
@ 2014-04-10 16:56               ` Brian Hoffman
  0 siblings, 0 replies; 21+ messages in thread
From: Brian Hoffman @ 2014-04-10 16:56 UTC (permalink / raw)
  To: Ricardo Filipe; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1943 bytes --]

Okay...will let myself out now ;P


On Thu, Apr 10, 2014 at 12:54 PM, Ricardo Filipe
<ricardojdfilipe@gmail•com>wrote:

> that's what blockchain pruning is all about :)
>
> 2014-04-10 17:47 GMT+01:00 Brian Hoffman <brianchoffman@gmail•com>:
> > Looks like only about ~30% disk space savings so I see your point. Is
> there
> > a critical reason why blocks couldn't be formed into "superblocks" that
> are
> > chained together and nodes could serve a specific superblock, which
> could be
> > pieced together from different nodes to get the full blockchain? This
> would
> > allow participants with limited resources to serve full portions of the
> > blockchain rather than limited pieces of the entire blockchain.
> >
> >
> > On Thu, Apr 10, 2014 at 12:28 PM, Mike Hearn <mike@plan99•net> wrote:
> >>
> >> Suggestions always welcome!
> >>
> >> The main problem with this is that the block chain is mostly random
> bytes
> >> (hashes, keys) so it doesn't compress that well. It compresses a bit,
> but
> >> not enough to change the fundamental physics.
> >>
> >> However, that does not mean the entire chain has to be stored on
> expensive
> >> rotating platters. I've suggested that in some star trek future where
> the
> >> chain really is gigantic, it could be stored on tape and spooled off at
> high
> >> speed. Literally a direct DMA from tape drive to NIC. But we're not
> there
> >> yet :)
> >
> >
> >
> >
> >
>

[-- Attachment #2: Type: text/html, Size: 2833 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:47           ` Brian Hoffman
  2014-04-10 16:54             ` Ricardo Filipe
@ 2014-04-10 16:59             ` Pieter Wuille
  2014-04-10 17:06               ` Brian Hoffman
                                 ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Pieter Wuille @ 2014-04-10 16:59 UTC (permalink / raw)
  To: Brian Hoffman; +Cc: Bitcoin Dev

On Thu, Apr 10, 2014 at 6:47 PM, Brian Hoffman <brianchoffman@gmail•com> wrote:
> Looks like only about ~30% disk space savings so I see your point. Is there
> a critical reason why blocks couldn't be formed into "superblocks" that are
> chained together and nodes could serve a specific superblock, which could be
> pieced together from different nodes to get the full blockchain? This would
> allow participants with limited resources to serve full portions of the
> blockchain rather than limited pieces of the entire blockchain.

As this is a suggestion that I think I've seen come up once a month
for the past 3 years, let's try to answer it thoroughly.

The actual "state" of the blockchain is the UTXO set (stored in
chainstate/ by the reference client). It's the set of all unspent
transaction outputs at the currently active point in the block chain.
It is all you need for validating future blocks.

The problem is, you can't just give someone the UTXO set and expect
them to trust it, as there is no way to prove that it was the result
of processing the actual blocks.

As Bitcoin's full node uses a "zero trust" model, where (apart from
one detail: the order of otherwise valid transactions) it never
assumes any data received from the outside is valid, it HAS to see the
previous blocks in order to establish the validity of the current UTXO
set. This is what initial block syncing does. Nothing but the actual
blocks can provide this data, and it is why the actual blocks need to
be available. It does not require everyone to have all blocks, though
- they just need to have seen them during processing.
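The per-block state transition being described can be sketched as follows (heavily simplified: no scripts, signatures or amounts are checked, and the coinbase is modeled as a transaction with no inputs):

```python
def apply_block(utxo, block):
    """Apply one block to the UTXO set.
    utxo:  dict mapping (txid, output_index) -> output value.
    block: list of transactions, each {'txid', 'inputs', 'outputs'},
           where 'inputs' lists the (txid, output_index) outpoints spent."""
    for tx in block:
        for outpoint in tx['inputs']:
            if outpoint not in utxo:
                raise ValueError("block spends a missing or already-spent output")
            del utxo[outpoint]          # spent outputs leave the set
        for i, value in enumerate(tx['outputs']):
            utxo[(tx['txid'], i)] = value   # new outputs enter the set
    return utxo
```

Running this over every block from genesis is exactly what initial block sync does; the resulting dict is the chain state.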

A related, but not identical evolution is merkle UTXO commitments.
This means that we shape the UTXO set as a merkle tree, compute its
root after every block, and require that the block commits to this
root hash (by putting it in the coinbase, for example). This means a
full node can copy the chain state from someone else, and check that
its hash matches what the block chain commits to. It's important to
note that this is a strict reduction in security: we're now trusting
that the longest chain (with most proof of work) commits to a valid
UTXO set (at some point in the past).

In essence, combining both ideas means you get "superblocks" (the UTXO
set is essentially the summary of the result of all past blocks), in a
way that is less-than-currently-but-perhaps-still-acceptably-validated.
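A toy version of the commitment check (Python; real proposals define a canonical tree shape and leaf serialization, which this hand-waves with repr()):

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def utxo_merkle_root(utxo: dict) -> bytes:
    """Merkle root over a UTXO set (outpoint -> value).  The leaf
    encoding and tree construction here are placeholders, not any
    actual proposal's format."""
    leaves = sorted(_h(repr(item).encode()) for item in utxo.items())
    if not leaves:
        return _h(b'')
    while len(leaves) > 1:
        if len(leaves) % 2:                 # duplicate last leaf on odd levels
            leaves.append(leaves[-1])
        leaves = [_h(leaves[i] + leaves[i + 1])
                  for i in range(0, len(leaves), 2)]
    return leaves[0]

def matches_commitment(utxo: dict, committed_root: bytes) -> bool:
    """Check a chain state copied from a peer against the root hash
    the block chain commits to."""
    return utxo_merkle_root(utxo) == committed_root
```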

-- 
Pieter



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:59             ` Pieter Wuille
@ 2014-04-10 17:06               ` Brian Hoffman
  2014-04-10 18:19               ` Paul Rabahy
  2014-04-10 21:34               ` Jesus Cea
  2 siblings, 0 replies; 21+ messages in thread
From: Brian Hoffman @ 2014-04-10 17:06 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 2931 bytes --]

Ok I think I've got a good understanding of where we're at now. I can
promise that the next person to waste your time in 30 days will not be me.
I'm pleasantly surprised to see a community that doesn't kickban newcomers
and takes the time to explain (re-explain) concepts.

Hoping to add *beneficial* thoughts in the future!


On Thu, Apr 10, 2014 at 12:59 PM, Pieter Wuille <pieter.wuille@gmail•com>wrote:

> On Thu, Apr 10, 2014 at 6:47 PM, Brian Hoffman <brianchoffman@gmail•com>
> wrote:
> > Looks like only about ~30% disk space savings so I see your point. Is
> there
> > a critical reason why blocks couldn't be formed into "superblocks" that
> are
> > chained together and nodes could serve a specific superblock, which
> could be
> > pieced together from different nodes to get the full blockchain? This
> would
> > allow participants with limited resources to serve full portions of the
> > blockchain rather than limited pieces of the entire blockchain.
>
> As this is a suggestion that I think I've seen come up once a month
> for the past 3 years, let's try to answer it thoroughly.
>
> The actual "state" of the blockchain is the UTXO set (stored in
> chainstate/ by the reference client). It's the set of all unspent
> transaction outputs at the currently active point in the block chain.
> It is all you need for validating future blocks.
>
> The problem is, you can't just give someone the UTXO set and expect
> them to trust it, as there is no way to prove that it was the result
> of processing the actual blocks.
>
> As Bitcoin's full node uses a "zero trust" model, where (apart from
> one detail: the order of otherwise valid transactions) it never
> assumes any data received from the outside is valid, it HAS to see the
> previous blocks in order to establish the validity of the current UTXO
> set. This is what initial block syncing does. Nothing but the actual
> blocks can provide this data, and it is why the actual blocks need to
> be available. It does not require everyone to have all blocks, though
> - they just need to have seen them during processing.
>
> A related, but not identical evolution is merkle UTXO commitments.
> This means that we shape the UTXO set as a merkle tree, compute its
> root after every block, and require that the block commits to this
> root hash (by putting it in the coinbase, for example). This means a
> full node can copy the chain state from someone else, and check that
> its hash matches what the block chain commits to. It's important to
> note that this is a strict reduction in security: we're now trusting
> that the longest chain (with most proof of work) commits to a valid
> UTXO set (at some point in the past).
>
> In essence, combining both ideas means you get "superblocks" (the UTXO
> set is essentially the summary of the result of all past blocks), in a
> way that is less-than-currently-but-perhaps-still-acceptably-validated.
>
> --
> Pieter
>

[-- Attachment #2: Type: text/html, Size: 3609 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:59             ` Pieter Wuille
  2014-04-10 17:06               ` Brian Hoffman
@ 2014-04-10 18:19               ` Paul Rabahy
  2014-04-10 18:32                 ` Pieter Wuille
  2014-04-10 19:36                 ` Mark Friedenbach
  2014-04-10 21:34               ` Jesus Cea
  2 siblings, 2 replies; 21+ messages in thread
From: Paul Rabahy @ 2014-04-10 18:19 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 3876 bytes --]

You say UTXO commitments are "a strict reduction in security". If UTXO
commitments were rolled in as a soft fork, I do not see any new attacks
that could happen to a person trusting the committed UTXO + any remaining
blocks to catch up to the head.

I would imagine the soft fork to proceed similar to the following.
1. Miners begin including UTXO commitments.
2. Miners begin rejecting blocks with invalid UTXO commitments.
3. Miners begin rejecting blocks with no UTXO commitments.
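A sketch of the phased validation rule (Python; the phase numbers correspond to the list above):

```python
def block_acceptable(phase: int, has_commitment: bool,
                     commitment_valid: bool) -> bool:
    """Whether a block passes the UTXO-commitment rule at each rollout
    phase: (1) commitments optional, (2) invalid commitments rejected,
    (3) commitments mandatory."""
    if phase >= 3 and not has_commitment:
        return False
    if phase >= 2 and has_commitment and not commitment_valid:
        return False
    return True
```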

To start up a fresh client it would follow the following.
1. Sync headers.
2. Pick a committed UTXO that is deep enough to not get orphaned.
3. Sync blocks from commitment to head.
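Step 2 of that startup sequence might be sketched like this (the 144-block margin is an assumed figure, roughly one day of blocks):

```python
REORG_SAFETY = 144  # assumed burial depth before a commitment is trusted

def choose_sync_start(header_height: int, commitment_heights: list) -> int:
    """Pick the highest committed UTXO snapshot buried deeply enough
    that it is unlikely to be orphaned; fall back to genesis (0) if
    none qualifies.  The client then syncs full blocks from the chosen
    height up to the header tip."""
    safe = [h for h in commitment_heights
            if h <= header_height - REORG_SAFETY]
    return max(safe) if safe else 0
```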

I would argue that a client following this methodology is strictly more
secure than SPV, and I don't see any attacks that make it less secure than
a full client. It is obviously still susceptible to a 51% attack, but so is
the traditional block chain. I also do not see any sybil attacks that are
strengthened by this change because it is not modifying the networking code.

I guess if, after the soft fork happened, miners stopped including the
UTXO commitment, it would lower the overall network hash rate; but this
would be self-harming to the miners, so they have an incentive not to
do it.

Please let me know if I have missed something.


On Thu, Apr 10, 2014 at 12:59 PM, Pieter Wuille <pieter.wuille@gmail•com>wrote:

>
> As this is a suggestion that I think I've seen come up once a month
> for the past 3 years, let's try to answer it thoroughly.
>
> The actual "state" of the blockchain is the UTXO set (stored in
> chainstate/ by the reference client). It's the set of all unspent
> transaction outputs at the currently active point in the block chain.
> It is all you need for validating future blocks.
>
> The problem is, you can't just give someone the UTXO set and expect
> them to trust it, as there is no way to prove that it was the result
> of processing the actual blocks.
>
> As Bitcoin's full node uses a "zero trust" model, where (apart from
> one detail: the order of otherwise valid transactions) it never
> assumes any data received from the outside is valid, it HAS to see the
> previous blocks in order to establish the validity of the current UTXO
> set. This is what initial block syncing does. Nothing but the actual
> blocks can provide this data, and it is why the actual blocks need to
> be available. It does not require everyone to have all blocks, though
> - they just need to have seen them during processing.
>
> A related, but not identical evolution is merkle UTXO commitments.
> This means that we shape the UTXO set as a merkle tree, compute its
> root after every block, and require that the block commits to this
> root hash (by putting it in the coinbase, for example). This means a
> full node can copy the chain state from someone else, and check that
> its hash matches what the block chain commits to. It's important to
> note that this is a strict reduction in security: we're now trusting
> that the longest chain (with most proof of work) commits to a valid
> UTXO set (at some point in the past).
>
> In essence, combining both ideas means you get "superblocks" (the UTXO
> set is essentially the summary of the result of all past blocks), in a
> way that is less-than-currently-but-perhaps-still-acceptably-validated.
>
> --
> Pieter
>
>
>

[-- Attachment #2: Type: text/html, Size: 4975 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 18:19               ` Paul Rabahy
@ 2014-04-10 18:32                 ` Pieter Wuille
  2014-04-10 20:12                   ` Tier Nolan
  2014-04-10 19:36                 ` Mark Friedenbach
  1 sibling, 1 reply; 21+ messages in thread
From: Pieter Wuille @ 2014-04-10 18:32 UTC (permalink / raw)
  To: Paul Rabahy; +Cc: Bitcoin Dev

On Thu, Apr 10, 2014 at 8:19 PM, Paul Rabahy <prabahy@gmail•com> wrote:
> Please let me know if I have missed something.

A 51% attack can make you believe you were paid when you weren't.

Full node security right now validates everything - there is no way
you can ever be made to believe something invalid. The only attacks
against it are about which version of valid history eventually gets
chosen.

If you trust hashrate for determining which UTXO set is valid, a 51%
attack becomes worse in that you can be made to believe a version of
history which is in fact invalid.

-- 
Pieter



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 18:19               ` Paul Rabahy
  2014-04-10 18:32                 ` Pieter Wuille
@ 2014-04-10 19:36                 ` Mark Friedenbach
  1 sibling, 0 replies; 21+ messages in thread
From: Mark Friedenbach @ 2014-04-10 19:36 UTC (permalink / raw)
  To: bitcoin-development

You took the quote out of context:

"a full node can copy the chain state from someone else, and check that
its hash matches what the block chain commits to. It's important to
note that this is a strict reduction in security: we're now trusting
that the longest chain (with most proof of work) commits to a valid
UTXO set (at some point in the past)."

The described synchronization mechanism would be to determine the
most-work block header (SPV level of security!), and then sync the UTXO
set committed to within that block. This provides strictly less security
than building the UTXO set yourself, because it is susceptible to a 51%
attack which violates protocol rules.

On 04/10/2014 11:19 AM, Paul Rabahy wrote:
> You say UTXO commitments is "a strict reduction in security". If UTXO
> commitments were rolled in as a soft fork, I do not see any new attacks
> that could happen to a person trusting the committed UTXO + any
> remaining blocks to catch up to the head.
> 
> I would imagine the soft fork proceeding similarly to the following.
> 1. Miners begin including UTXO commitments.
> 2. Miners begin rejecting blocks with invalid UTXO commitments.
> 3. Miners begin rejecting blocks with no UTXO commitments.
> 
> To start up a fresh client it would follow the following.
> 1. Sync headers.
> 2. Pick a committed UTXO that is deep enough to not get orphaned.
> 3. Sync blocks from commitment to head.
> 
> I would argue that a client following this methodology is strictly more
> secure than SPV, and I don't see any attacks that make it less secure
> than a full client. It is obviously still susceptible to a 51% attack,
> but so is the traditional block chain. I also do not see any sybil
> attacks that are strengthened by this change because it is not modifying
> the networking code.
> 
> I guess if, after the soft fork happened, miners stopped including the
> UTXO commitment, it would lower the overall network hash rate, but this
> would be self-harming to the miners, so they have an incentive not to
> do it.
> 
> Please let me know if I have missed something.
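Paul's three startup steps above can be sketched as a toy simulation. Everything here is made up for illustration: a real design would commit a merkle root (so individual entries can be proven), not the flat hash used below, and step 1 (header sync) is assumed to have already produced a sufficiently deep anchor.

```python
import hashlib

def commit(utxos: set) -> bytes:
    """Toy UTXO-set commitment: hash of the sorted outpoint list.
    A flat hash keeps the sketch short; a real design would use a
    merkle root."""
    return hashlib.sha256(repr(sorted(utxos)).encode()).digest()

class Block:
    """Illustrative block: outpoints it spends, outpoints it creates,
    and a commitment to the UTXO set *after* applying it."""
    def __init__(self, spends, creates, utxo_after):
        self.spends = set(spends)
        self.creates = set(creates)
        self.commitment = commit(utxo_after)

def bootstrap(anchor_utxos, anchor_block, later_blocks):
    # Step 2: check the copied UTXO set against the deep anchor's commitment.
    if commit(anchor_utxos) != anchor_block.commitment:
        raise ValueError("UTXO set does not match anchor commitment")
    # Step 3: replay only the blocks after the anchor to reach the head,
    # fully validating each one against its own commitment.
    utxos = set(anchor_utxos)
    for blk in later_blocks:
        utxos = (utxos - blk.spends) | blk.creates
        if commit(utxos) != blk.commitment:
            raise ValueError("block commits to an inconsistent UTXO set")
    return utxos
```

A fresh client built this way never processes blocks older than the anchor, which is exactly where the trust in hashrate (rather than validation) enters.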



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 18:32                 ` Pieter Wuille
@ 2014-04-10 20:12                   ` Tier Nolan
  2014-04-10 20:29                     ` Pieter Wuille
  0 siblings, 1 reply; 21+ messages in thread
From: Tier Nolan @ 2014-04-10 20:12 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev, Paul Rabahy

[-- Attachment #1: Type: text/plain, Size: 1775 bytes --]

On Thu, Apr 10, 2014 at 7:32 PM, Pieter Wuille <pieter.wuille@gmail•com> wrote:

> If you trust hashrate for determining which UTXO set is valid, a 51%
> attack becomes worse in that you can be made to believe a version of
> history which is in fact invalid.
>

If there are invalidation proofs, then this isn't strictly true.

If you are connected to 10 nodes and only 1 is honest, it can send you the
proof that your main chain is invalid.

For bad scripts, it shows you the input transaction for the invalid input
along with the merkle path to prove it is in a previous block.

For double spends, it could show the transaction which spent the output.

Double spends are pretty much the same as trying to spend non-existent
outputs anyway.

If the UTXO set commitment were actually a merkle tree, then all updates
could be included.

Blocks could have extra data with the proofs that the UTXO set is being
updated correctly.

To update the UTXO set, you need the paths for all spent inputs.

It puts a large load on miners to keep things working, since they have to
run a full node.

If they commit the data to the chain, then SPV nodes can do local checking.

One of them will find invalid blocks eventually (even if some of the
other miners don't).
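The merkle-path half of such a fraud proof (showing that a transaction really appears in a previous block) can be sketched as below. The path encoding, a list of (sibling hash, side) pairs, is an illustrative choice, not an existing wire format.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Bitcoin-style double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_merkle_path(leaf: bytes, path, root: bytes) -> bool:
    """Check that `leaf` is included under `root`.

    `path` is a list of (sibling_hash, sibling_is_right) pairs from
    the leaf up to the root: the compact proof an honest peer would
    send to show a transaction sits in a previous block.
    """
    node = sha256d(leaf)
    for sibling, sibling_is_right in path:
        node = sha256d(node + sibling if sibling_is_right else sibling + node)
    return node == root
```

For the bad-script case described above, the leaf would be the serialized input transaction, and the root would come from the header of the earlier block, so even a node that never saw that block can check the proof.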

>
> --
> Pieter
>
>

[-- Attachment #2: Type: text/html, Size: 2742 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 20:12                   ` Tier Nolan
@ 2014-04-10 20:29                     ` Pieter Wuille
  0 siblings, 0 replies; 21+ messages in thread
From: Pieter Wuille @ 2014-04-10 20:29 UTC (permalink / raw)
  To: Tier Nolan; +Cc: Bitcoin Dev, Paul Rabahy

On Thu, Apr 10, 2014 at 10:12 PM, Tier Nolan <tier.nolan@gmail•com> wrote:
> On Thu, Apr 10, 2014 at 7:32 PM, Pieter Wuille <pieter.wuille@gmail•com>
> wrote:
>>
>> If you trust hashrate for determining which UTXO set is valid, a 51%
>> attack becomes worse in that you can be made to believe a version of
>> history which is in fact invalid.
>
>
> If there are invalidation proofs, then this isn't strictly true.

I'm aware of fraud proofs, and they're a very cool idea. They allow
you to leverage some "herd immunity" in the system (assuming you'll be
told about invalid data you received without actually validating it).
However, they are certainly not the same thing as the zero-trust
security a fully validating node offers.

For example, consider a sybil attack that hides the actual best chain
and its fraud proofs from you, while feeding you a chain that commits
to an invalid UTXO set.

There are many ideas that make attacks harder, and they're probably
good ideas to deploy, but there is little that achieves the security
of a full node. (well, perhaps a zero-knowledge proof of having run
the validation code against the claimed chain tip to produce the known
UTXO set...).
-- 
Pieter



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 16:59             ` Pieter Wuille
  2014-04-10 17:06               ` Brian Hoffman
  2014-04-10 18:19               ` Paul Rabahy
@ 2014-04-10 21:34               ` Jesus Cea
  2014-04-10 22:15                 ` Mark Friedenbach
  2 siblings, 1 reply; 21+ messages in thread
From: Jesus Cea @ 2014-04-10 21:34 UTC (permalink / raw)
  To: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 1391 bytes --]

On 10/04/14 18:59, Pieter Wuille wrote:
> It's important to
> note that this is a strict reduction in security: we're now trusting
> that the longest chain (with most proof of work) commits to a valid
> UTXO set (at some point in the past).

AFAIK, the current Bitcoin code already sets blockchain checkpoints from
time to time. It is guaranteed that a longer chain forking off before the
checkpoint is not going to be suddenly accepted. See
<https://bitcointalk.org/index.php?topic=194078.0>.

It could be perfectly valid to store only unspent outputs from before the
last checkpoint, if the node did all the checks during the blockchain
download.

It would be interesting, of course, to be able to verify the "unspent
output accounting" having only that checkpoint data (the merkle tree can
do that, I guess). That way you could detect data corruption or
manipulation on your local hard disk.

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea@jcea•es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea@jabber•org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 538 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 21:34               ` Jesus Cea
@ 2014-04-10 22:15                 ` Mark Friedenbach
  2014-04-10 22:24                   ` Jesus Cea
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Friedenbach @ 2014-04-10 22:15 UTC (permalink / raw)
  To: bitcoin-development

Checkpoints will go away, eventually.

On 04/10/2014 02:34 PM, Jesus Cea wrote:
> On 10/04/14 18:59, Pieter Wuille wrote:
>> It's important to
>> note that this is a strict reduction in security: we're now trusting
>> that the longest chain (with most proof of work) commits to a valid
>> UTXO set (at some point in the past).
> 
> AFAIK, the current Bitcoin code already sets blockchain checkpoints from
> time to time. It is guaranteed that a longer chain forking off before the
> checkpoint is not going to be suddenly accepted. See
> <https://bitcointalk.org/index.php?topic=194078.0>.
> 
> It could be perfectly valid to store only unspent outputs from before the
> last checkpoint, if the node did all the checks during the blockchain
> download.
> 
> It would be interesting, of course, to be able to verify the "unspent
> output accounting" having only that checkpoint data (the merkle tree can
> do that, I guess). That way you could detect data corruption or
> manipulation on your local hard disk.
> 
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 22:15                 ` Mark Friedenbach
@ 2014-04-10 22:24                   ` Jesus Cea
  2014-04-10 22:33                     ` Gregory Maxwell
  0 siblings, 1 reply; 21+ messages in thread
From: Jesus Cea @ 2014-04-10 22:24 UTC (permalink / raw)
  To: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 650 bytes --]

On 11/04/14 00:15, Mark Friedenbach wrote:
> Checkpoints will go away, eventually.

Why? The points in the forum thread seem pretty sensible.

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea@jcea•es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea@jabber•org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 538 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] Chain pruning
  2014-04-10 22:24                   ` Jesus Cea
@ 2014-04-10 22:33                     ` Gregory Maxwell
  0 siblings, 0 replies; 21+ messages in thread
From: Gregory Maxwell @ 2014-04-10 22:33 UTC (permalink / raw)
  To: Jesus Cea; +Cc: Bitcoin Development

On Thu, Apr 10, 2014 at 3:24 PM, Jesus Cea <jcea@jcea•es> wrote:
> On 11/04/14 00:15, Mark Friedenbach wrote:
>> Checkpoints will go away, eventually.
> Why?. The points in the forum thread seem pretty sensible.

Because with headers-first synchronization, the major problems they
solve (e.g. block-flooding DoS attacks, weak-chain isolation, and
check shortcutting) can be addressed in other, more efficient ways
that don't result in putting trust in third parties.

They also cause really severe confusion about the security model.

Instead you can embed in the software knowledge that the longest chain
is "at least this long" to prevent isolation attacks, which is a lot
simpler and less trusting. You can also do randomized validation of
the deeply buried old history for performance, instead of constantly
depending on 'trusted parties' to update the software (or it gets slower
over time), and locally save your own validation fingerprints so that if
you need to reinitialize data you can remember what you've checked so
far by hash.
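The "at least this long" idea can be sketched as a minimum cumulative-work check. The names here (`Header`, `MIN_CHAIN_WORK`, the numeric value) are illustrative only, though a similar minimum-chain-work parameter later landed in Bitcoin Core.

```python
from collections import namedtuple

# Illustrative only: a header here is reduced to its proof-of-work
# contribution (in reality, work is derived from the header's target).
Header = namedtuple("Header", "work")

# Shipped with the software at release time: the honest chain is known
# to have accumulated at least this much work. Unlike a checkpoint, no
# specific block hash is pinned, so no trust is placed in *which* chain
# is valid -- only in how much work the real one has.
MIN_CHAIN_WORK = 10_000

def chain_work(headers) -> int:
    """Total work is the sum of per-header work."""
    return sum(h.work for h in headers)

def accept_chain(headers) -> bool:
    """Reject chains that cannot possibly be the real network's history."""
    return chain_work(headers) >= MIN_CHAIN_WORK
```

An isolated node fed a fake low-work chain fails this check and keeps looking for better peers, without anyone having pre-blessed a particular block.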



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-04-10 22:33 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-10 11:37 [Bitcoin-development] Chain pruning Mike Hearn
2014-04-10 11:57 ` Wladimir
2014-04-10 12:10   ` Gregory Maxwell
2014-04-10 14:19     ` Wladimir
2014-04-10 16:23       ` Brian Hoffman
2014-04-10 16:28         ` Mike Hearn
2014-04-10 16:47           ` Brian Hoffman
2014-04-10 16:54             ` Ricardo Filipe
2014-04-10 16:56               ` Brian Hoffman
2014-04-10 16:59             ` Pieter Wuille
2014-04-10 17:06               ` Brian Hoffman
2014-04-10 18:19               ` Paul Rabahy
2014-04-10 18:32                 ` Pieter Wuille
2014-04-10 20:12                   ` Tier Nolan
2014-04-10 20:29                     ` Pieter Wuille
2014-04-10 19:36                 ` Mark Friedenbach
2014-04-10 21:34               ` Jesus Cea
2014-04-10 22:15                 ` Mark Friedenbach
2014-04-10 22:24                   ` Jesus Cea
2014-04-10 22:33                     ` Gregory Maxwell
2014-04-10 16:52           ` Ricardo Filipe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox