Re: [bitcoin-dev] TXO commitments do not need a soft-fork to be useful

public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed

From: Peter Todd <pete@petertodd•org>
To: Bitcoin Protocol Discussion <bitcoin-dev@lists•linuxfoundation.org>
Subject: Re: [bitcoin-dev] TXO commitments do not need a soft-fork to be useful
Date: Thu, 23 Feb 2017 02:23:10 -0500	[thread overview]
Message-ID: <20170223072310.GA3122@savin.petertodd.org> (raw)
In-Reply-To: <20170223011147.GB905@savin.petertodd.org>

[-- Attachment #1: Type: text/plain, Size: 6653 bytes --]

On Wed, Feb 22, 2017 at 08:11:47PM -0500, Peter Todd via bitcoin-dev wrote:
> 5) By *not* committing the TXO commitment in the block itself, we obsolete my
> concept of delayed TXO commitments: you don't need to have calculated the TXO
> commitment digest to validate a block anyway!

Thinking about this a bit more, by not being forced to calculate a TXO
commitment for every block, we may be able to do significantly better than
delayed TXO commitments by lazily hashing.

Suppose we have the following perfect merkle tree, which we're using as a
key-value map. We'll represent inner nodes for which we've calculated digests
with "0"'s to represent what version of the tree they correspond too:

               0
              / \
             /   \
            /     \
           /       \
          /         \
         0           0
        / \         / \
       /   \       /   \
      0     0     0     0
     / \   / \   / \   / \
    a   b c   d e   f g   h

If a value is updated, digests above it become out of date and need to be
recalculated:

               1
              / \
             /   \
            /     \
           /       \
          /         \
         0           1
        / \         / \
       /   \       /   \
      0     0     0     1
     / \   / \   / \   / \
    a   b c   d e   f g   H

               2
              / \
             /   \
            /     \
           /       \
          /         \
         0           2
        / \         / \
       /   \       /   \
      0     0     2     1
     / \   / \   / \   / \
    A   b c   d e   F g   H

               3
              / \
             /   \
            /     \
           /       \
          /         \
         0           3
        / \         / \
       /   \       /   \
      0     0     2     3
     / \   / \   / \   / \
    a   b c   d e   F G   H

Suppose however that your implementation does lazy hashing; after the 3rd
update your state will be:

               .
              / \
             /   \
            /     \
           /       \
          /         \
         0           .
        / \         / \
       /   \       /   \
      0     0     .     .
     / \   / \   / \   / \
    a   b c   d e   F G   H

Basically all the digests on the right side is out of date and need to be
recalculated. Now, first of all it's obviously possible for your implementation
to keep updating values in the tree given their keys - you've essentially
regressed to a bog standard binary tree.

But what happens if you discard part of your dataset? Let's suppose you've
discarded the left half:

               .
              / \
             /   \
            /     \
           /       \
          /         \
         0           .
                    / \
                   /   \
                  .     .
                 / \   / \
                e   F G   H

Note how you still have sufficient information to calculate the current merkle
tip commitment: the left side hasn't changed yet. But what happens when someone
gives you an update proof? Specifically, suppose they want to change b -> B.
That requires them to provide you with the part of the merkle tree proving that
position #1 is b. Now you might think that's this data:

               3
              / \
             /   \
            /     \
           /       \
          /         \
         0           3
        / \
       /   \
      0     0
     / \
    a   b

But the inner node digests marked "3" are useless to you: you haven't
calculated those digests yet so you can't compare them to anything. What you
can compare is the following:

         0
        / \
       /   \
      0     0
     / \
    a   b

With that extra data your local knowledge is now:

               .
              / \
             /   \
            /     \
           /       \
          /         \
         0           .
        / \         / \
       /   \       /   \
      0     0     .     .
     / \         / \   / \
    a   b       e   F G   H

Allowing you to apply the update:

               .
              / \
             /   \
            /     \
           /       \
          /         \
         .           .
        / \         / \
       /   \       /   \
      .     0     .     .
     / \         / \   / \
    a   B       e   F G   H

If you want to again prune that data, simply recalculate the digests so you
can verify a copy given to you by a peer in the future:

               .
              / \
             /   \
            /     \
           /       \
          /         \
         4           .
        / \         / \
       /   \       /   \
      4     0     .     .
     / \         / \   / \
    a   B       e   F G   H

And prune, leaving you with:

               .
              / \
             /   \
            /     \
           /       \
          /         \
         4           .
                    / \
                   /   \
                  .     .
                 / \   / \
                e   F G   H

So tl;dr: the reason this works is that we can substitute commitments for
pointers: our merkle tree can also be viewed as a binary tree. So a reasonable
real-world implementation would be to delay computation of digests for anything
we have in RAM, and only compute digests as in-RAM data is flushed to disk.
Equally, on disk we can use standard time-space tradeoffs to only store a
subset of the digests, recalculating the rest on the fly. Given that'd we could
effectively combine both a cryptographic data structure and a standard
pointer-based data structure in one, I suspect we can get good performance out
of this.

The main subtlety of this approach will be how exactly to handle the proofs:
the level of verification possible depends on what digests a given node has
calculated, and we want to avoid making network splitting attacks possible by
attackers deliberately giving nodes proofs with upper digests that are
incorrect, something only some nodes can detect. Not sure yet exactly what's
the right approach there.

Finally, notice how this entire approach depends on schemes like MMR's where
the overall structure of the tree does not change as nodes are added and
updated; it would be much harder to implement this idea for something like a
merklized red-black tree where the structure changes as the tree is rebalanced.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

next prev parent reply	other threads:[~2017-02-23  7:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-23  1:11 Peter Todd
2017-02-23  3:30 ` Eric Lombrozo
2017-02-23  7:23 ` Peter Todd [this message]
2017-05-16 12:15 ` Alex Mizrahi
2017-05-16 12:23   ` Peter Todd
2017-02-28 16:26 praxeology_guy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170223072310.GA3122@savin.petertodd.org \
    --to=pete@petertodd$(echo .)org \
    --cc=bitcoin-dev@lists$(echo .)linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox