Big Concept ACK. I think this would be one of the biggest usability improvements for Bitcoin and I see no security issues with the assumevalid approach. I also agree that it's important to start work on this even before the ultimate, perfect accumulator has been designed and tested; the commitment scheme can always be upgraded later on. assumeutxo syncing actually seems pretty orthogonal to the accumulator research.

I have a few questions:

- So any nodes that do an initial sync will stop at the assumeutxo height, serialize a snapshot of the chain state and store it? How many nodes are expected to do this? Any idea how long this takes? Should it be enabled by default?

- Would pruned nodes still download all historic blocks to double-check the snapshot or only full nodes that intend to serve block data?

- How long are old snapshots retained? Presumably, around a new release, nodes should keep the snapshot from at least one version back. Without P2P signalling of which snapshots are available, they may have to keep all old snapshots or even download old ones.

and a comment:

- The snapshot should probably be chunked up to minimize the amount of bandwidth/IO/memory a malicious node could waste before you realize. Also, it would make parallel downloading easier.
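
To make the chunking idea concrete, here is a minimal sketch, assuming a scheme where the software hard-codes one hash that commits to an ordered list of per-chunk hashes. The chunk size, function names and commitment layout are all hypothetical and not part of the proposal or the draft PR; the point is only that a bad chunk can be rejected as soon as it arrives, and that chunks can be fetched from different peers independently.

```python
import hashlib
from typing import List

CHUNK_SIZE = 4  # bytes; hypothetical and tiny so the example is easy to follow


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a serialized snapshot into fixed-size chunks (last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """SHA-256 of every chunk, in order."""
    return [hashlib.sha256(chunk).digest() for chunk in split_chunks(data, size)]


def commitment(hashes: List[bytes]) -> bytes:
    """Single hash committing to the ordered list of chunk hashes.

    Assumption: this value is what would be hard-coded in the software.
    """
    return hashlib.sha256(b"".join(hashes)).digest()


def verify_chunk(index: int, chunk: bytes, hashes: List[bytes]) -> bool:
    """Check one downloaded chunk against the already-verified hash list."""
    return 0 <= index < len(hashes) and hashlib.sha256(chunk).digest() == hashes[index]
```

A downloader would first fetch the (small) hash list, check `commitment(hashes)` against the hard-coded value, and then call `verify_chunk` on each chunk as it arrives, so a malicious peer can waste at most one chunk's worth of bandwidth before being caught.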

On Tue, Apr 2, 2019 at 4:43 PM James O'Beirne via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:

Hi,

I'd like to discuss assumeutxo, which is an appealing and simple
optimization in the spirit of assumevalid[0].

# Motivation

To start a fully validating bitcoin client from scratch, that client currently
needs to perform an initial block download. To the surprise of no one, IBD
takes a linear amount of time based on the length of the chain's history. For
clients running on modest hardware under limited bandwidth constraints,
say a mobile device, completing IBD takes a considerable amount of time
and thus poses serious usability challenges.

As a result, having fully validating clients run on such hardware is rare and
basically unrealistic. Clients with even moderate resource constraints
are encouraged to rely on the SPV trust model. Though we have promising
improvements to existing SPV modes pending deployment[1], it's worth
thinking about a mechanism that would allow such clients to use trust
models closer to full validation.

The subject of this mail is a proposal for a complementary alternative to SPV
modes, one that is in the spirit of an existing default, `assumevalid`. It may
help modest clients transact under a security model that closely resembles
full validation within minutes instead of hours or days.

# assumeutxo

The basic idea is to allow nodes to initialize using a serialized version of the
UTXO set rendered by another node at some predetermined height. The
initializing node syncs the headers chain from the network, then obtains and
loads one of these UTXO snapshots (i.e. a serialized version of the UTXO set
bundled with the block header indicating its "base" and some other metadata).

Based upon the snapshot, the node is able to quickly reconstruct its chainstate,
and compares a hash of the resulting UTXO set to a preordained hash hard-coded
in the software a la assumevalid. This all takes ~23 minutes, not accounting for
download of the 3.2GB snapshot[2].
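
The hash check described above can be sketched roughly as follows. This is an illustration only: the toy serialization, the type aliases and the function names are assumptions made for the example and do not reflect Bitcoin Core's actual snapshot format or the code in the draft PR.

```python
import hashlib
from typing import Dict, Tuple

Outpoint = Tuple[str, int]  # (txid hex, output index)
Coin = Tuple[int, str]      # (amount in satoshis, scriptPubKey hex)


def utxo_set_hash(utxos: Dict[Outpoint, Coin]) -> str:
    """Hash a UTXO set in a deterministic (sorted) order.

    Assumption: a toy text serialization, chosen only for readability.
    """
    h = hashlib.sha256()
    for (txid, vout), (amount, script) in sorted(utxos.items()):
        h.update(f"{txid}:{vout}:{amount}:{script}\n".encode())
    return h.hexdigest()


def load_snapshot(utxos: Dict[Outpoint, Coin], expected_hash: str) -> Dict[Outpoint, Coin]:
    """Accept the snapshot only if its content hash matches the hard-coded one."""
    if utxo_set_hash(utxos) != expected_hash:
        raise ValueError("snapshot hash mismatch, refusing to load")
    return dict(utxos)
```

Because acceptance depends only on the content hash, a snapshot with a single altered coin is rejected regardless of which peer supplied it.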

The node then syncs to the network tip and afterwards begins a simultaneous
background validation (i.e., a conventional IBD) up to the base height of the
snapshot in order to achieve full validation. Crucially, even while the
background validation is happening the node can validate incoming blocks and
transact with the benefit of the full (assumed-valid) UTXO set.
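
As a rough sketch of what the background validation has to establish: replaying every block from genesis up to the snapshot's base height should reproduce exactly the UTXO set that was assumed. The block model below (a list of spent outpoints plus a map of created coins) and all names are hypothetical simplifications; real validation also checks scripts, proof of work and much else.

```python
import hashlib
from typing import Dict, List, Tuple

# Toy block: (outpoints spent, {outpoint created: amount}). Hypothetical model.
Block = Tuple[List[str], Dict[str, int]]


def apply_block(utxos: Dict[str, int], block: Block) -> None:
    """Remove spent coins and add created coins, rejecting spends of unknown coins."""
    spent, created = block
    for outpoint in spent:
        if outpoint not in utxos:
            raise ValueError(f"spend of missing coin {outpoint}")
        del utxos[outpoint]
    utxos.update(created)


def utxo_hash(utxos: Dict[str, int]) -> str:
    """Deterministic hash of a toy UTXO set."""
    payload = "\n".join(f"{k}:{v}" for k, v in sorted(utxos.items()))
    return hashlib.sha256(payload.encode()).hexdigest()


def background_validate(blocks: List[Block], base_height: int, snapshot_hash: str) -> bool:
    """Replay heights 0..base_height and compare against the assumed snapshot hash."""
    utxos: Dict[str, int] = {}
    for block in blocks[:base_height + 1]:
        apply_block(utxos, block)
    return utxo_hash(utxos) == snapshot_hash
```

In this model the node keeps using the snapshot-based chainstate for new blocks while `background_validate` runs separately; a `True` result is what would upgrade the node from assumed-valid to fully validated.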

Snapshots could be obtained from multiple separate peers in the same manner as
block download, but I haven't put much thought into this. In concept it doesn't
matter too much where the snapshots come from since their validity is
determined via content hash.

# Security

Obviously there are some security implications that are due consideration. While this
proposal is in the spirit of assumevalid, practical attacks may become easier.
Under assumevalid, a user can be tricked into transacting under a false history
if an attacker convinces them to start bitcoind with a malicious `-assumevalid`
parameter, sybils their node, and then feeds them a bogus chain encompassing
all of the hard-coded checkpoints[3].

The same attack is made easier in assumeutxo because, unlike in assumevalid,
the attacker need not construct a valid PoW chain to get the victim's node into
a false state; they simply need to get the user to accept a bad `-assumeutxo`
parameter and then supply them an easily made UTXO snapshot containing, say, a
false coin assignment.

For this reason, I recommend that if we were to implement assumeutxo, we not
allow its specification via commandline argument[4].

Beyond this risk, I can't think of material differences in security relative to
assumevalid, though I appeal to the list for help with this.

# More fully validating clients

A particularly exciting use-case for assumeutxo is the possibility of mobile
devices functioning as fully validating nodes with access to the complete UTXO
set (as an alternative to SPV models). The total resource burden needed to start a node
from scratch based on a snapshot is, at time of writing, a ~(3.2GB
+ blocks_to_tip * 4MB) download and a few minutes of processing time, which sounds
manageable for many mobile devices currently in use.
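
The download estimate above is simple enough to write out as arithmetic. A small sketch, assuming decimal units (1GB = 10^9 bytes, 1MB = 10^6 bytes) and treating 4MB as a per-block upper bound; the constant and function names are made up for the example.

```python
SNAPSHOT_BYTES = 3_200_000_000  # ~3.2GB snapshot, as quoted above (decimal GB assumed)
MAX_BLOCK_BYTES = 4_000_000     # 4MB per block, used as a worst-case bound


def estimated_download_bytes(blocks_to_tip: int) -> int:
    """Upper-bound download: snapshot plus every block from its base to the tip."""
    if blocks_to_tip < 0:
        raise ValueError("blocks_to_tip must be non-negative")
    return SNAPSHOT_BYTES + blocks_to_tip * MAX_BLOCK_BYTES


# Example: a snapshot that is 1,000 blocks (roughly a week) behind the tip.
print(estimated_download_bytes(1_000) / 1e9, "GB")  # 7.2 GB
```

Since most blocks are smaller than the 4MB bound, the real figure would typically be lower than this estimate.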

A mobile user could initialize an assumed-valid bitcoin node within an hour,
transact immediately, and complete a pruned full validation of their
assumed-valid chain over the next few days, perhaps only doing the background
IBD when their device has access to suitable high-bandwidth connections.

If we end up implementing an accumulator-based UTXO scaling design[5][6] down
the road, it's easy to imagine an analogous process that would allow very fast
startup using an accumulator of a few kilobytes in lieu of a multi-GB snapshot.

---

I've created a related issue at our Github repository here:
https://github.com/bitcoin/bitcoin/issues/15605

and have submitted a draft implementation of snapshot usage via RPC here:
https://github.com/bitcoin/bitcoin/pull/15606

I'd like to discuss here whether this is a good fit for Bitcoin conceptually. Concrete
plans for deployment steps should be discussed in the Github issue, and after all
that my implementation may be reviewed as a sketch of the specific software
changes necessary.

Regards,
James

[0]: https://bitcoincore.org/en/2017/03/08/release-0.14.0/#assumed-valid-blocks
[1]: https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki
[2]: as tested at height 569895, on a 12 core Intel Xeon Silver 4116 CPU @ 2.10GHz
[3]: https://github.com/bitcoin/bitcoin/blob/84d0fdc/src/chainparams.cpp#L145-L161
[4]: Marco Falke is due credit for this point
[5]: utreexo: https://www.youtube.com/watch?v=edRun-6ubCc
[6]: Boneh, Bunz, Fisch on accumulators: https://eprint.iacr.org/2018/1188

_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev