On Wed, Feb 22, 2017 at 07:07:08PM -0800, Bram Cohen wrote:
> On Wed, Feb 22, 2017 at 5:15 PM, Peter Todd via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> >
> > With that, notice how proving the soundness of the proofs becomes trivial:
> > if
> > validation is deterministic, it is obviously impossible to construct two
> > different proofs that prove contradictory statements, because a proof is
> > simply
> > part of the data structure itself. Contradiction would imply that the two
> > proofs are different, but that's easily rejected by simply checking the
> > hash of
> > the data.
> >
> 
> My code works this way. Proofs are serialization of a subset of the tree,
> and to validate a proof you ask a single function whether a particular
> value is included in that tree subset, and it answers yes or no, so
> obviously it's impossible for a single value to both validate and not
> validate. The proof code was quite terrifying before I made this change
> (which I did on your suggestion), and it's much cleaner and simpler now. It
> also in principle supports compact proofs of multiple inclusions and
> exclusions in the same serialization of a subset of the tree because the
> upper branches won't have to be repeated. I haven't written code for
> generating those, but the validation code will happily accept them.

That's an improvement, but I think we can do even better if we think of missing
pruned data as analogous to virtual memory: pruned data is the same as a page
that has been swapped to disk, with the magical property that hashing allows us
to swap it back in from an untrusted source.

Thus a proof should actually be whatever data we expect our counterparty to
have flushed, ranging from none at all, to 100% (modulo a root hash). An
implementation should then do operations as normal, using parts of the proof on
an as-needed basis where pruned data is encountered.

Thus if you have a key-value map and do a get() operation, you'd expect the
proof to *not* be what the get operates on, but rather be a *context* argument
to the get() operation. The other way around is actually an example of doing
computations on untrusted data, and bad API design!

> I'm not sure what you mean by MMRs though. Are you talking about MMRs where
> each mountain is a set of diffs to the old things and are periodically
> consolidated? Or do later mountains refer to internals of earlier ones? Or
> do they have 'maybe' values which mean that the earlier mountain should be
> referred to? Are these patricia tries or something flatter and more fixed
> depth?

I'm talking about these MMR's: https://github.com/proofchains/python-proofmarshal/blob/master/proofmarshal/mmr.py

Notably I'm talking about an insertion ordered list, indexed by position, that
supports append and update operations, but *not* insertions; this is different
than what you've recently published re: UTXO commitments. That's a full
key-value map, something MMR's are delibrately are not doing.

Draw out a MMR based on the formal definition you're replying too and you'll
see the new structure.

> My code doesn't keep track of tree size, by the way. It would be trivial to
> add that functionality to the library, and including it in the hashing
> creates complexity and doesn't seem to have any benefit over sending that
> data in a side channel.

Like I say above, you're solving a different problem than MMR's solve.

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org