public inbox for bitcoindev@googlegroups.com
* [bitcoin-dev] Rolling UTXO set hashes
@ 2017-05-15 20:01 Pieter Wuille
  2017-05-15 20:53 ` Peter R
  2017-05-15 23:04 ` ZmnSCPxj
  0 siblings, 2 replies; 10+ messages in thread
From: Pieter Wuille @ 2017-05-15 20:01 UTC (permalink / raw)
  To: Bitcoin Dev

Hello all,

I would like to discuss a way of computing a UTXO set hash that is
very efficient to update, but does not support any compact proofs of
existence or non-existence.

Much has been written on the topic of various data structures and
derived hashes for the UTXO/TXO set before (including Alan Reiner's
trust-free lite nodes [1], Peter Todd's TXO MMR commitments [2] [3],
or Bram Cohen's TXO bitfield [4]). They all provide interesting extra
functionality or tradeoffs, but require invasive changes to the P2P
protocol or how wallets work, or force nodes to maintain their
database in a normative fashion. Instead, here I focus on an efficient
hash that supports nothing but comparing two UTXO sets. However, it is
not incompatible with any of those other approaches, so we can gain
some of the advantages of a UTXO hash without adopting something that
may be incompatible with future protocol enhancements.

1. Incremental hashing

Computing a hash of the UTXO set is easy when it does not need
efficient updates, and when we can assume a fixed serialization with a
normative ordering for the data in it - just serialize the whole thing
and hash it. As different software or releases may use different
database models for the UTXO set, a solution that is order-independent
would seem preferable.

This brings us to the problem of computing a hash of unordered data.
Several approaches that accomplish this through incremental hashing
were suggested in [5], including XHASH, AdHash, and MuHash. XHASH
consists of first hashing all the set elements independently, and
XORing all those hashes together. This is insecure, as Gaussian
elimination can easily find a subset of random hashes that XOR to a
given value. AdHash/MuHash are similar, except addition/multiplication
modulo a large prime are used instead of XOR. Wagner [6] showed that
attacking XHASH or AdHash is an instance of a generalized birthday
problem (called the k-sum problem in his paper, with unrestricted k),
and gives an O(2^(2*sqrt(n)-1)) algorithm to attack it (for n-bit
hashes). As a result, AdHash with 256-bit hashes only has 31 bits of
security.
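
To make these constructions concrete, here is a toy Python sketch of
the three combiners (the 256-bit element hash and the modulus are
illustrative only, and not the parameters proposed below):

    import hashlib

    M = 2**255 - 19  # an illustrative large prime; the actual proposal
                     # below uses a 3072-bit safe prime

    def element_hash(data: bytes) -> int:
        # Hash a set element to a 256-bit integer.
        return int.from_bytes(hashlib.sha256(data).digest(), 'big')

    def xhash(elements):
        # XHASH: XOR of the element hashes (insecure).
        acc = 0
        for e in elements:
            acc ^= element_hash(e)
        return acc

    def adhash(elements):
        # AdHash: sum of the element hashes modulo a large prime.
        acc = 0
        for e in elements:
            acc = (acc + element_hash(e)) % M
        return acc

    def muhash(elements):
        # MuHash: product of the element hashes modulo a large prime.
        acc = 1
        for e in elements:
            acc = (acc * element_hash(e)) % M
        return acc

All three are order-independent (e.g. muhash([a, b]) == muhash([b, a]));
they differ only in how hard it is to forge a set matching a given hash.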

Thankfully, [6] also shows that the k-sum problem cannot be
efficiently solved in groups in which the discrete logarithm problem
is hard, as an efficient k-sum solver can be used to compute discrete
logarithms. As a result, MuHash modulo a sufficiently large safe prime
is provably secure under the DL assumption. Common guidelines on
security parameters [7] say that 3072-bit DL has about 128 bits of
security. A final 256-bit hash can be applied to the 3072-bit result
without loss of security to reduce the final size.
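
As a Python sketch of that finalization (the modulus is the safe prime
2^3072 - 1103717 used in the benchmarks below; mapping a serialized
UTXO to a 3072-bit number is done here with SHAKE256 as a placeholder,
whereas the benchmarked variant uses SHA512 + ChaCha20, described
further down):

    import hashlib

    MODULUS = 2**3072 - 1103717  # largest 3072-bit safe prime (see below)

    def element_to_num3072(data: bytes) -> int:
        # Placeholder mapping of a serialized UTXO to the group.
        return int.from_bytes(hashlib.shake_256(data).digest(384),
                              'big') % MODULUS

    def muhash_3072(elements) -> bytes:
        acc = 1
        for e in elements:
            acc = (acc * element_to_num3072(e)) % MODULUS
        # Compress the 3072-bit result to a final 256-bit hash.
        return hashlib.sha256(acc.to_bytes(384, 'big')).digest()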

An alternative to multiplication modulo a prime is using an elliptic
curve group. Due to the ECDLP assumption, which the security of
Bitcoin signatures already relies on, this also results in security
against k-sum solving. This approach is used in the Elliptic Curve
Multiset Hash (ECMH) in [8]. For this to work, we must "hash onto a
curve point" in a way that results in points without known discrete
logarithm. The paper suggests using (controversial) binary elliptic
curves to make that operation efficient. If we only consider
secp256k1, one approach is just reading potential X coordinates from a
PRNG until one is found that has a corresponding Y coordinate
according to the curve equation. On average, 2 iterations are needed.
A constant time algorithm to hash onto the curve exists as well [9],
but it is only slightly faster and is much more complicated to
implement.
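
A minimal Python sketch of that try-until-valid approach for secp256k1
(the counter-based candidate generation and the even-Y convention are
illustrative choices, not part of the proposal):

    import hashlib

    # secp256k1 field prime; the curve is y^2 = x^3 + 7 over GF(P).
    P = 2**256 - 2**32 - 977

    def hash_to_curve(data: bytes):
        # Try candidate X coordinates until one lies on the curve;
        # on average 2 iterations are needed.
        counter = 0
        while True:
            x = int.from_bytes(
                hashlib.sha256(data + counter.to_bytes(4, 'big')).digest(),
                'big') % P
            y2 = (pow(x, 3, P) + 7) % P
            y = pow(y2, (P + 1) // 4, P)  # candidate sqrt (P % 4 == 3)
            if pow(y, 2, P) == y2:        # x has a valid Y coordinate
                return (x, y if y % 2 == 0 else P - y)  # pick even Y
            counter += 1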

AdHash-like constructions with a sufficiently large intermediate hash
can be made secure against Wagner's algorithm, as suggested in [10].
4160-bit hashes would be needed for 128 bits of security. When
repetition is allowed, [8] gives a stronger attack against AdHash,
suggesting that as much as 400000 bits are needed. While repetition is
not directly an issue for our use case, it would be nice if
verification software would not be required to check for duplicated
entries.

2. Efficient addition and deletion

Interestingly, both ECMH and MuHash not only support adding set
elements in any order but also deleting in any order. As a result, we
can simply maintain a running sum for the UTXO set as a whole, and
add/subtract when creating/spending an output in it. In the case of
MuHash it is slightly more complicated, as computing an inverse is
relatively expensive. This can be solved by representing the running
value as a fraction, and multiplying created elements into the
numerator and spent elements into the denominator. Only when the final
hash is desired is a single modular inverse and multiplication needed
to combine the two.
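
A Python sketch of that running fraction (again with SHAKE256 as a
stand-in for the element-to-group mapping; the modular inverse uses
Python 3.8+'s pow(x, -1, m)):

    import hashlib

    MODULUS = 2**3072 - 1103717

    def element_to_num3072(utxo: bytes) -> int:
        # Stand-in mapping of a serialized UTXO to the group.
        return int.from_bytes(hashlib.shake_256(utxo).digest(384),
                              'big') % MODULUS

    class RollingMuHash:
        def __init__(self):
            self.numerator = 1    # product of created outputs
            self.denominator = 1  # product of spent outputs

        def add(self, utxo: bytes):     # output created
            self.numerator = (self.numerator
                              * element_to_num3072(utxo)) % MODULUS

        def remove(self, utxo: bytes):  # output spent
            self.denominator = (self.denominator
                                * element_to_num3072(utxo)) % MODULUS

        def digest(self) -> bytes:
            # Only one modular inverse, when the final hash is requested.
            acc = (self.numerator
                   * pow(self.denominator, -1, MODULUS)) % MODULUS
            return hashlib.sha256(acc.to_bytes(384, 'big')).digest()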

As the update operations are also associative, H(a)+H(b)+H(c)+H(d) can
in fact be computed as (H(a)+H(b)) + (H(c)+H(d)). This implies that
all of this is perfectly parallelizable: each thread can process an
arbitrary subset of the update operations, allowing them to be
efficiently combined later.
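
Continuing the RollingMuHash sketch above, combining two partial states
accumulated by independent workers is just a pair of modular
multiplications:

    def combine(a: RollingMuHash, b: RollingMuHash) -> RollingMuHash:
        # Each worker folds an arbitrary subset of the updates into its
        # own state; the combined state is independent of how the work
        # was split.
        c = RollingMuHash()
        c.numerator = (a.numerator * b.numerator) % MODULUS
        c.denominator = (a.denominator * b.denominator) % MODULUS
        return c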

3. Comparison of approaches

Numbers below are based on preliminary benchmarks on a single thread
of an i7-6820HQ CPU running at 3.4GHz.

(1) (MuHash) Multiplying 3072-bit hashes mod 2^3072 - 1103717 (the
largest 3072-bit safe prime).
    * Needs a fast modular multiplication/inverse implementation.
    * Using SHA512 + ChaCha20 for generating the hashes takes 1.2us per element.
    * Modular multiplication using GMP takes 1.5us per element (2.5us
with a 60-line C+asm implementation).
    * 768 bytes for maintaining a running sum (384 for the numerator,
384 for the denominator).
    * Very common security assumption. Even if the DL assumption were
broken (but no k-sum algorithm faster than Wagner's is found), this
still maintains 110 bits of security.

(2) (ECMH) Adding secp256k1 EC points
    * Much more complicated than the previous approaches when
implementing from scratch, but almost no extra complexity when ECDSA
secp256k1 signature validation is already implemented.
    * Using SHA512 + libsecp256k1's point decompression for generating
the points takes 11us per element on average.
    * Adding/subtracting N points takes 5.25us + 0.25us*N.
    * 64 bytes for a running sum.
    * The same security assumption as Bitcoin's signatures.

Using the numbers above, we find that:
* Computing the hash from just the UTXO set takes (1) 2m15s (2) 9m20s
* Processing all creations and spends in an average block takes (1)
24ms (2) 100ms
* Processing precomputed per-transaction aggregates in an average
block takes (1) 3ms (2) 0.5ms
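
(As a rough cross-check, the full-set figures above correspond to a
UTXO set of roughly 50 million entries: for (1), 50M x (1.2us + 1.5us)
is about 135s, i.e. 2m15s; for (2), 50M x ~11.25us is about 560s,
i.e. 9m20s.)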

Note that while (2) has higher CPU usage than (1) in general, it has
lower latency when using precomputed per-transaction aggregates. Using
such aggregates is also more feasible as they're only 64 bytes rather
than 768. Because of its simplicity, (1) has my preference.

Overall, these numbers are sufficiently low (note that they can be
parallelized) that it would be reasonable for full nodes and/or other
software to always maintain one of them, and effectively have a
rolling cryptographic checksum of the UTXO set at all times.

4. Use cases

* Replacement for Bitcoin Core's gettxoutsetinfo RPC's hash
computation. This currently requires minutes of I/O and CPU, as it
serializes and hashes the entire UTXO set. A rolling set hash would
make this instant, making the whole RPC much more usable for sanity
checking.
* Assisting in implementation of fast sync methods with known good
blocks/UTXO sets.
* Database consistency checking: by remembering the UTXO set hash of
the past few blocks (computed on the fly), a consistency check can be
done that recomputes it based on the database.


  [1] https://bitcointalk.org/index.php?topic=88208.0
  [2] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012715.html
  [3] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-February/013591.html
  [4] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-March/013928.html
  [5] https://cseweb.ucsd.edu/~mihir/papers/inchash.pdf
  [6] https://people.eecs.berkeley.edu/~daw/papers/genbday.html
  [7] https://www.keylength.com/
  [8] https://arxiv.org/pdf/1601.06502.pdf
  [9] https://www.di.ens.fr/~fouque/pub/latincrypt12.pdf
  [10] http://csrc.nist.gov/groups/ST/hash/sha-3/Aug2014/documents/gligoroski_paper_sha3_2014_workshop.pdf

Cheers,

-- 
Pieter



* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-15 20:01 [bitcoin-dev] Rolling UTXO set hashes Pieter Wuille
@ 2017-05-15 20:53 ` Peter R
  2017-05-15 23:04 ` ZmnSCPxj
  1 sibling, 0 replies; 10+ messages in thread
From: Peter R @ 2017-05-15 20:53 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev

Hi Pieter,

I wanted to say that I thought this write-up was excellent!  And efficiently hashing the UTXO set in this rolling fashion is a very exciting idea!! 

Peter R

> On May 15, 2017, at 1:01 PM, Pieter Wuille via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> wrote:
> 
> [...]




* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-15 20:01 [bitcoin-dev] Rolling UTXO set hashes Pieter Wuille
  2017-05-15 20:53 ` Peter R
@ 2017-05-15 23:04 ` ZmnSCPxj
  2017-05-15 23:59   ` Gregory Maxwell
  1 sibling, 1 reply; 10+ messages in thread
From: ZmnSCPxj @ 2017-05-15 23:04 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev


Good morning Pieter,

>4. Use cases
>
>* Replacement for Bitcoin Core's gettxoutsetinfo RPC's hash
>computation. This currently requires minutes of I/O and CPU, as it
>serializes and hashes the entire UTXO set. A rolling set hash would
>make this instant, making the whole RPC much more usable for sanity
>checking.
>* Assisting in implementation of fast sync methods with known good
>blocks/UTXO sets.
>* Database consistency checking: by remembering the UTXO set hash of
>the past few blocks (computed on the fly), a consistency check can be
>done that recomputes it based on the database.

Another use case I can think of is a potential "chain-flip" hard fork of block header formats, where the UTXO hash rather than merkle tree root of transactions is in the header, which would let lite nodes download a UTXO set from any full node and verify it by verifying only block headers starting from genesis.

Regards,
ZmnSCPxj



* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-15 23:04 ` ZmnSCPxj
@ 2017-05-15 23:59   ` Gregory Maxwell
  2017-05-16  0:15     ` ZmnSCPxj
  2017-05-16 11:01     ` Peter Todd
  0 siblings, 2 replies; 10+ messages in thread
From: Gregory Maxwell @ 2017-05-15 23:59 UTC (permalink / raw)
  To: ZmnSCPxj; +Cc: Bitcoin Dev

On Mon, May 15, 2017 at 11:04 PM, ZmnSCPxj via bitcoin-dev
<bitcoin-dev@lists•linuxfoundation.org> wrote:
> transactions is in the header, which would let lite nodes download a UTXO
> set from any full node and verify it by verifying only block headers
> starting from genesis.

Ya, lite nodes with UTXO sets are one of the oldest observed
advantages of a commitment to the UTXO data:

https://bitcointalk.org/index.php?topic=21995.0

But it requires a commitment. And for most of the arguments for those
you really want compact membership proofs.  The recent rise in
interest in full block lite clients (for privacy reasons), perhaps
complements the membership proofless usage.

Pieter describes some uses for doing something like this without a
commitment.  In my view, it's more interesting to first gain
experience with an operation without committing to it (which is a
consensus change and requires more care and consideration, which are
easier if people have implementation experience).

> rather than merkle tree root of transactions is in the header,

For auditability and engineering reasons it would need to be in
addition to, rather than instead of, the transaction merkle root,
because the proof of work needs to commit to the witness data (in that
kind of flip, the transactions themselves become witnesses for UTXO
deltas) or you get trivial DoS attacks where people provide malleated
blocks that have invalid witnesses.



* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-15 23:59   ` Gregory Maxwell
@ 2017-05-16  0:15     ` ZmnSCPxj
  2017-05-16 11:01     ` Peter Todd
  1 sibling, 0 replies; 10+ messages in thread
From: ZmnSCPxj @ 2017-05-16  0:15 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev


>On Mon, May 15, 2017 at 11:04 PM, ZmnSCPxj via bitcoin-dev
><bitcoin-dev@lists•linuxfoundation.org> wrote:
>> transactions is in the header, which would let lite nodes download a UTXO
>> set from any full node and verify it by verifying only block headers
>> starting from genesis.
>
>Ya, lite nodes with UTXO sets are one of the oldest observed
>advantages of a commitment to the UTXO data:
>
>https://bitcointalk.org/index.php?topic=21995.0
>
>But it requires a commitment. And for most of the arguments for those
>you really want compact membership proofs. The recent rise in
>interest in full block lite clients (for privacy reasons), perhaps
>complements the membership proofless usage.
>
>Pieter describes some uses for doing something like this without a
>commitment. In my view, it's more interesting to first gain
>experience with an operation without committing to it (which is a
>consensus change and requires more care and consideration, which are
>easier if people have implementation experience).

I understand. Thank you for your explanation.

>> rather than merkle tree root of transactions is in the header,
>
>For auditability and engineering reasons it would need to be in
>addition to, rather than instead of, the transaction merkle root,
>because the proof of work needs to commit to the witness data (in that
>kind of flip, the transactions themselves become witnesses for UTXO
>deltas) or you get trivial DoS attacks where people provide malleated
>blocks that have invalid witnesses.

Another thought I have is that instead of committing to the UTXO set of the current block, the header could commit to the UTXO set of the previous block and the merkle tree root of the transactions in the current block.

My thought is that this would help reduce SPV mining, as a miner would need to actually scan any received new blocks in order to create the UTXO set of the previous block. An empty block would make things easier for the next block's miner, not the current block's miner. However, I'm not sure if my understanding is correct, or if there is some subtlety I missed in this regard.

Regards,
ZmnSCPxj



* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-15 23:59   ` Gregory Maxwell
  2017-05-16  0:15     ` ZmnSCPxj
@ 2017-05-16 11:01     ` Peter Todd
  2017-05-16 18:17       ` Pieter Wuille
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Todd @ 2017-05-16 11:01 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev


On Mon, May 15, 2017 at 11:59:58PM +0000, Gregory Maxwell via bitcoin-dev wrote:
> On Mon, May 15, 2017 at 11:04 PM, ZmnSCPxj via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org> wrote:
> > transactions is in the header, which would let lite nodes download a UTXO
> > set from any full node and verify it by verifying only block headers
> > starting from genesis.
> 
> Ya, lite nodes with UTXO sets are one of the oldest observed
> advantages of a commitment to the UTXO data:
> 
> https://bitcointalk.org/index.php?topic=21995.0
> 
> But it requires a commitment. And for most of the arguments for those
> you really want compact membership proofs.  The recent rise in
> interest in full block lite clients (for privacy reasons), perhaps
> complements the membership proofless usage.
> 
> Pieter describes some uses for doing something like this without a
> commitment.  In my view, it's more interesting to first gain
> experience with an operation without committing to it (which is a
> consensus change and requires more care and consideration, which are
> easier if people have implementation experience).

To be clear, *none* of the previous (U)TXO commitment schemes require *miners*
to participate in generating a commitment. While that was previously thought to
be true by many, I've seen no counter-arguments to the argument I published a
few months ago(1) that (U)TXO commitments did not require a soft-fork to
deploy.

1) "[bitcoin-dev] TXO commitments do not need a soft-fork to be useful",
   Peter Todd, Feb 23 2017,
   https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-February/013591.html

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org



* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-16 11:01     ` Peter Todd
@ 2017-05-16 18:17       ` Pieter Wuille
  2017-05-16 18:20         ` Gregory Maxwell
  0 siblings, 1 reply; 10+ messages in thread
From: Pieter Wuille @ 2017-05-16 18:17 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Dev

On Tue, May 16, 2017 at 4:01 AM, Peter Todd via bitcoin-dev
<bitcoin-dev@lists•linuxfoundation.org> wrote:
> To be clear, *none* of the previous (U)TXO commitment schemes require *miners*
> to participate in generating a commitment. While that was previously thought to
> be true by many, I've seen no counter-arguments to the argument I published a
> few months ago(1) that (U)TXO commitments did not require a soft-fork to
> deploy.
>
> 1) "[bitcoin-dev] TXO commitments do not need a soft-fork to be useful",
>    Peter Todd, Feb 23 2017,
>    https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-February/013591.html

I'm aware, I agree, and I even referenced that mail in my original post.

However, all of those approaches still require a network wide choice
to be useful. A validating node that does not maintain a UTXO X must
get a proof of its unspentness from somewhere for at least the block
which contains a spend of X. In a world where such a model is deployed
network-wide, that proof information is generated by the wallet and
relayed wherever needed. In a partial deployment however, you need
nodes that can produce the proof for other nodes, and the ability to
produce a proof is significantly more expensive than running either an
old or a new full node.

This ability to produce proofs becomes even harder when there are
different models deployed at once. Even just having a different
criterion for which UTXOs need a proof (eg. "only outputs created more
than 1000 blocks ago") may already cause compatibility issues. Combine
that with the multitude of ideas about this (insertion-ordered TXO
trees, txid-ordered UTXO Patricia tries, AVL+ trees, append-only
bitfield, ...) and their different trade-offs (in CPU, RAM for
validators, complexity for wallets/index services, ...), and I don't
think we're quite ready to make that choice.

To be clear: I'm very much in favor of moving to a model where the
responsibilities of full nodes are reduced in the long term. But
before that can happen there will need to be implementations,
experiments, analysis, ...

Because of that, I think it is worthwhile to investigate solutions to
the "how can we efficiently compare UTXO sets" problem separately from
the "how do we reduce full node costs by sending proofs instead of
having them maintain the data" problem. And rolling UTXO set hashes
are a solution for
just the first - and one that has very low costs and no normative
datastructures at all.

-- 
Pieter



* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-16 18:17       ` Pieter Wuille
@ 2017-05-16 18:20         ` Gregory Maxwell
  2017-05-23  4:47           ` Rusty Russell
  0 siblings, 1 reply; 10+ messages in thread
From: Gregory Maxwell @ 2017-05-16 18:20 UTC (permalink / raw)
  To: Pieter Wuille; +Cc: Bitcoin Dev

On Tue, May 16, 2017 at 6:17 PM, Pieter Wuille <pieter.wuille@gmail•com> wrote:
> just the first - and one that has very low costs and no normative
> datastructures at all.

The serialization of the txout itself is normative, but very minimal.



* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-16 18:20         ` Gregory Maxwell
@ 2017-05-23  4:47           ` Rusty Russell
  2017-05-23 20:43             ` Pieter Wuille
  0 siblings, 1 reply; 10+ messages in thread
From: Rusty Russell @ 2017-05-23  4:47 UTC (permalink / raw)
  To: Gregory Maxwell, Pieter Wuille; +Cc: Bitcoin Dev

Gregory Maxwell via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> writes:
> On Tue, May 16, 2017 at 6:17 PM, Pieter Wuille <pieter.wuille@gmail•com> wrote:
>> just the first - and one that has very low costs and no normative
>> datastructures at all.
>
> The serialization of the txout itself is normative, but very minimal.

I do prefer the (2) approach, BTW, as it reuses existing primitives, but
I know "simpler" means a different thing to mathier brains :)

Since it wasn't explicit in the proposal, I think the txout information
placed in the hash here is worth discussing.

I prefer a simple txid||outnumber[1], because it allows simple validation
without knowing the UTXO set itself; even a lightweight node can assert
that UTXOhash for block N+1 is valid if the UTXOhash for block N is
valid (and vice versa!) given block N+1.  And miners can't really use
that even if they were to try not validating against UTXO (!) because
they need to know input amounts for fees (which are becoming
significant).

If I want to hand you the complete validatable UTXO set, I need to hand
you all the txs with any unspent output, and some bitfield to indicate
which ones are unspent.

OTOH, if you serialize more (eg. ...||amount||scriptPubKey ?), then the UTXO
set size needed to validate the utxohash is a little smaller: you need
to send the txid, but not the tx nVersion, nLocktime or inputs.  But in a
SegWit world, that's actually *bigger* AFAICT.

Thanks,
Rusty.

[1] I think you could actually use txid^outnumber, and if that's not a
    curve point SHA256() again, etc.  But I don't think that saves any
    real time, and may cause other issues.



* Re: [bitcoin-dev] Rolling UTXO set hashes
  2017-05-23  4:47           ` Rusty Russell
@ 2017-05-23 20:43             ` Pieter Wuille
  0 siblings, 0 replies; 10+ messages in thread
From: Pieter Wuille @ 2017-05-23 20:43 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Bitcoin Dev

On Mon, May 22, 2017 at 9:47 PM, Rusty Russell <rusty@rustcorp•com.au> wrote:
> Gregory Maxwell via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> writes:
>> On Tue, May 16, 2017 at 6:17 PM, Pieter Wuille <pieter.wuille@gmail•com> wrote:
>>> just the first - and one that has very low costs and no normative
>>> datastructures at all.
>>
>> The serialization of the txout itself is normative, but very minimal.
>
> I do prefer the (2) approach, BTW, as it reuses existing primitives, but
> I know "simpler" means a different thing to mathier brains :)

Oh, I didn't mean it that way at all. (1) is simpler to get decent
performance out of. Implementing (1) using any language that has big
integer support or can link against GMP is likely going to be faster
than the fastest possible implementation of (2).

> Since it wasn't explicit in the proposal, I think the txout information
> placed in the hash here is worth discussing.
>
> I prefer a simple txid||outnumber[1], because it allows simple validation
> without knowing the UTXO set itself; even a lightweight node can assert
> that UTXOhash for block N+1 is valid if the UTXOhash for block N is
> valid (and vice versa!) given block N+1.  And miners can't really use
> that even if they were to try not validating against UTXO (!) because
> they need to know input amounts for fees (which are becoming
> significant).
>
> If I want to hand you the complete validatable UTXO set, I need to hand
> you all the txs with any unspent output, and some bitfield to indicate
> which ones are unspent.

That seems to completely defeat the purpose... if I want to give you a
UTXO set, and prove its correctness wrt the hash you know... I need to
remember the full transactions those outputs came from?

> OTOH, if you serialize more (eg. ...||amount||scriptPubKey ?), then the UTXO
> set size needed to validate the utxohash is a little smaller: you need
> to send the txid, but not the tx nVersion, nLocktime or inputs.  But in a
> SegWit world, that's actually *bigger* AFAICT.

That's an interesting idea, but I believe you're forgetting:
* The size of txin prevout/nsequence, which is typically larger than
txouts (even when excluding scriptSig/witness data).
* The size of spent txouts for transactions with unspent outputs left.
* The fact that you can deduplicate the txids for txn that have
multiple unspent outputs in the UTXO set serialization, even if that
txid is repeated in the rolling hash computation.

The construction I was considering and benchmarking uses the 256-bit
truncated SHA512(256bit txid || 32bit voutindex || 1bit coinbase ||
31bit height || CTxOut output) as a secp256k1 X coordinate, or as the
key to seed a ChaCha20 PRNG whose output is the 3072-bit MuHash
number. The reason for using SHA512 is that it can process most UTXOs
in a single transformation (as opposed to SHA256, which will almost
always need 2). The reason for using ChaCha20 is that it's incredibly
fast at producing a lot of output once the key is known. An
alternative is using SHAKE256 for the whole construction (as it both
takes an arbitrary amount of data and produces an arbitrary-length
hash) - but it's a bit slower.
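
To illustrate roughly what that looks like in Python (a sketch only:
the packing of the coinbase bit and height into one 32-bit field and
the byte order are assumptions, the CTxOut serialization is treated as
an opaque byte string, and SHAKE256 stands in for the ChaCha20
expansion in the MuHash case):

    import hashlib

    def utxo_element_hash(txid: bytes, vout: int, coinbase: bool,
                          height: int, txout_ser: bytes) -> bytes:
        # 256-bit truncated SHA512 over:
        #   txid || 32-bit vout || (1-bit coinbase, 31-bit height) || CTxOut
        assert len(txid) == 32
        packed = (height << 1) | int(coinbase)  # hypothetical packing
        data = (txid + vout.to_bytes(4, 'little')
                + packed.to_bytes(4, 'little') + txout_ser)
        return hashlib.sha512(data).digest()[:32]

    def utxo_to_num3072(element_hash: bytes) -> int:
        # MuHash case: expand the 256-bit element hash to 3072 bits.
        # SHAKE256 is the alternative mentioned above; the benchmarked
        # construction keys a ChaCha20 stream instead.
        return int.from_bytes(hashlib.shake_256(element_hash).digest(384),
                              'big')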

> Thanks,
> Rusty.
>
> [1] I think you could actually use txid^outnumber, and if that's not a
>     curve point SHA256() again, etc.  But I don't think that saves any
>     real time, and may cause other issues.

That just seems scary to me...

-- 
Pieter


