You should read this: https://bitcointalk.org/index.php?topic=153662.10 On Wed, May 16, 2018 at 7:36 PM, Cory Fields via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > Tl;dr: Rather than storing all unspent outputs, store their hashes. > Untrusted > peers can supply the full outputs when needed, with very little overhead. > Any attempt to spoof those outputs would be apparent, as their hashes > would not > be present in the hash set. There are many advantages to this, most > apparently > in disk and memory savings, as well as a validation speedup. The primary > disadvantage is a small increase in network traffic. I believe that the > advantages outweigh the disadvantages. > > -- > > Bitcoin’s unspent transaction output set (usually referred to as “The UTXO > set”) has two primary roles: providing proof that previous outputs exist > to be > spent, and providing the actual previous output data for verification when > new > transactions attempts to spend them. These roles are not usually discussed > independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are > compelling reasons to consider them this way. > > To see why, consider running a node with the following changes: > > - For each new output, gather all extra data that will be needed for > verification when spending it later as an input: the amount, > scriptPubKey, > creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, > etc.). > Call this the Dereferenced Prevout data. > - Create a hash from the concatenation of the new outpoint and the > dereferenced > prevout data. Call this a Unspent Transaction Output Hash. > - Rather than storing the full dereferenced prevout entries in a UTXO set > as is > currently done, instead store their hashes to an Unspent Transaction > Output > Hash Set, or UHS. > - When relaying a transaction, append the dereferenced prevout for each > input. > > Now when a transaction is received, it contains everything needed for > verification, including the input amount, height, and coinbaseness, which > would > have otherwise required a lookup the UTXO set. > > To verify an input's unspentness, again create a hash from the > concatenation of > the referenced outpoint and the provided dereferenced prevout, and check > for > its presence in the UHS. The hash will only be present if a hash of the > exact > same data was previously added to (and not since removed from) the UHS. As > such, we are protected from a peer attempting to lie about the dereferenced > prevout data. > > ### Some benefits of the UHS model > > - Requires no consensus changes, purely a p2p/implementation change. > > - UHS is substantially smaller than a full UTXO set (just over half for the > main chain, see below). In-memory caching can be much more effective as a > result. > > - A block’s transactions can be fully verified before doing a potentially > expensive database lookup for the previous output data. The UHS can be > queried afterwards (or in parallel) to verify previous output inclusion. > > - Entire blocks could potentially be verified out-of-order because all > input > data is provided; only the inclusion checks have to be in-order. > Admittedly > this is likely too complicated to be realistic. > > - pay-to-pubkey outputs are less burdensome on full nodes, since they use > no > more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. > Taproot and > Graftroot outputs may share the same benefits. > > - The burden of holding UTXO data is technically shifted from the > verifiers to > the spender. In reality, full nodes will likely always have a copy as > well, > but conceptually it's a slight improvement to the incentive model. > > - Block data from peers can also be used to roll backwards during a reorg. > This > potentially enables an even more aggressive pruning mode. > > - UTXO storage size grows exactly linearly with UTXO count, as opposed to > growing linearly with UTXO data size. This may be relevant when > considering > new larger output types which would otherwise cause the UTXO Set size to > increase more quickly. > > - The UHS is a simple set, no need for a key-value database. LevelDB could > potentially be dropped as a dependency in some distant future. > > - Potentially integrates nicely with Pieter Wuille's Rolling UTXO set > hashes > [1]. Unspent Transaction Output Hashes would simply be mapped to points > on a > curve before adding them to the set. > > - With the help of inclusion proofs and rolling hashes, libbitcoinconsensus > could potentially safely verify entire blocks. The size of the required > proofs would be largely irrelevant as they would be consumed locally. > > - Others? > > ### TxIn De-duplication > > Setting aside the potential benefits, the obvious drawback of using a UHS > is a > significant network traffic increase. Fortunately, some properties of > transactions can be exploited to offset most of the difference. > > For quick reference: > > p2pkh scriptPubKey: DUP HASH160 [pubkey hash] EQUALVERIFY CHECKSIG > p2pkh scriptSig: [signature] [pubkey] > > p2sh scriptPubKey: HASH160 [script hash] EQUAL > p2sh scriptSig: [signature(s)] [script] > > Notice that if a peer is sending a scriptPubKey and a scriptSig together, > as > they would when using a UHS, there would likely be some redundancy. Using a > p2sh output for example, the scriptPubKey contains the script hash, and the > scriptSig contains the script itself. Therefore when sending dereferenced > prevouts over the wire, any hash which can be computed can be omitted and > only > the preimages sent. > > Non-standard output scripts must be sent in-full, though because they > account > for only ~1% of all current UTXOs, they are rare enough to be ignored here. > > ### Intra-block Script De-duplication > > When transactions are chained together in the same block, dereferenced > prevout > data for these inputs would be redundant, as the full output data is > already > present. For that reason, these dereferenced prevouts can be omitted when > sending over the wire. > > The downside would be a new reconstruction pass requirement prior to > validation. > > ### Data > > Here's some preliminary testing with a naive POC implementation patched > into > Bitcoin Core. Note that the final sizes will depend on optimization of the > serialization format. The format used for these tests is suboptimal for > sure. > Syncing mainnet to block 516346: > > UTXO Node UHS Node > IBD Network Data: 153G 157G > Block disk space: 175G 157G > UTXO disk space : 2.8G 1.6G > Total disk space: 177.8G 158.6G > > The on-disk block-space reduction comes from the elimination of the Undo > data > that Bitcoin Core uses to roll back orphaned blocks. For UHS Nodes, this > data > is merged into to the block data and de-duplicated. > > Compared to the UXTO model, using a UHS reduces disk space by ~12%, yet > only > requires ~5% more data over the wire. > > Experimentation shows intra-block de-duplication to be of little help in > practice, as it only reduces overhead by ~0.2% on mainnet. It could become > more > useful if, for example, CPFP usage increases substantially in the future. > > ### Safety > > Assuming sha256 for the UHS's hash function, I don't believe any > fundamental > changes to Bitcoin's security model are introduced. Because the unspent > transaction output hashes commit to all necessary data, including output > types, > it should not be possible to be tricked into validating using mutated or > forged > inputs. > > ### Compatibility > > Transitioning from the current UTXO model would be annoying, but not > particularly painful. I'll briefly describe my current preferred approach, > but > it makes sense to largely ignore this until there's been some discussion > about > UHS in general. > > A new service-bit should be allocated to indicate that a node is willing to > serve blocks and transactions with dereferenced prevout data appended. Once > connected to a node advertising this feature, nodes would use a new getdata > flag, creating MSG_PREVDATA_BLOCK and MSG_PREVDATA_TX. > > Because current full nodes already have this data readily available in the > block-undo files, it is trivial to append on-the-fly. For that reason, it > would > be easy to backport patches to the current stable version of Bitcoin Core > in > order to enable serving these blocks even before they could be consumed. > This > would avoid an awkward bootstrapping phase where there may only be a few > nodes > available to serve to all new nodes. > > Admittedly I haven't put much thought into the on-disk format, I'd rather > leave > that to a database person. Though it does seem like a reasonable excuse to > consider moving away from LevelDB. > > Wallets would begin holding full prevout data for their unspent outputs, > though > they could probably back-into the data as-is. > > ### Serialization > > I would prefer to delay this discussion until a more high-level discussion > has > been had, otherwise this would be a magnet for nits. The format used to > gather > the data above can be seen in the implementation below. > > It should be noted, though, that the size of a UHS is directly dependent > on the > chosen hash function. Smaller hashes could potentially be used, but I > believe > that to be unwise. > > ### Drawbacks > > The primary drawback of this approach is the ~5% network ovhead. > > Additionally, there is the possibility that a few "bridge nodes" may be > needed > for some time. In a future with a network of only pruned UHS nodes, an old > wallet with no knowledge of its dereferenced prevout data would need to > broadcast an old-style transaction, and have a bridge node append the extra > data before forwarding it along the network. > > I won't speculate further there, except to say that I can't imagine a > transition problem that wouldn't have a straightforward solution. > > Migration issues aside, am I missing any obvious drawbacks? > > ### Implementation > > This code [2] was a quick hack-job, just enough to gather some initial > data. It > builds a UHS in memory and never flushes to disk. Only a single run works, > nasty things will happen upon restart. It should only be viewed in order > to get > an idea of what changes are needed. Only enough for IBD is implemented, > mempool/wallet/rpc are likely all broken. It is definitely not > consensus-safe. > > ### Acknowledgement > > I consider the UHS concept to be an evolution of Bram Cohen's TXO bitfield > idea. Bram also provided invaluable input when initially walking through > the > feasibility of a UHS. > > Pieter Wuille's work on Rolling UTXO set hashes served as a catalyst for > rethinking how the UTXO set may be represented. > > Additional thanks to those at at Financial Crypto and the CoreDev event > afterwards who helped to flesh out the idea: > > Tadge Dryja > Greg Sanders > John Newbery > Neha Narula > Jeremy Rubin > Jim Posen > ...and everyone else who has chimed in. > > > [0] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/ > 2017-March/013928.html > [1] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/ > 2017-May/014337.html > [2] https://github.com/theuni/bitcoin/tree/utxo-set-hash3 > _______________________________________________ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev >