Matt: That's a good point. I'll put together a chart comparing UTXO set size at each block, as well as over-the-wire size for each block. I think the period of coalescing earlier this year should be a good example of what you're describing.

Greg: heh, I was wondering how long it would take for someone to point out that I'm cheating. I avoided using the word "compression", mostly to side-step having the discussion turn into reworking the wire serialization. But you're absolutely right, the de-duping benefits are independent of the UHS use-case.

Cory

On Thu, May 17, 2018, 12:56 PM Gregory Maxwell <greg@xiph.org> wrote:
On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> Tl;dr: Rather than storing all unspent outputs, store their hashes.
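
For anyone skimming the thread, a toy Python sketch of the idea might
look like the following (the serialization and hash choice here are
illustrative assumptions, not the proposal's actual format):

    import hashlib

    def utxo_hash(txid: bytes, vout: int, amount: int,
                  script_pubkey: bytes) -> bytes:
        # Commit to the outpoint plus everything needed to validate a
        # spend. This serialization is illustrative, not a defined format.
        data = (txid + vout.to_bytes(4, "little")
                + amount.to_bytes(8, "little") + script_pubkey)
        return hashlib.sha256(data).digest()

    uhs = set()  # the node's entire UTXO state: just 32-byte hashes

    def add_output(txid, vout, amount, script_pubkey):
        uhs.add(utxo_hash(txid, vout, amount, script_pubkey))

    def spend_output(txid, vout, amount, script_pubkey):
        # The spender relays amount and script_pubkey alongside the tx;
        # that re-supplied data is the bandwidth cost weighed below.
        h = utxo_hash(txid, vout, amount, script_pubkey)
        if h not in uhs:
            raise ValueError("unknown or already-spent output")
        uhs.remove(h)

The node's state shrinks to fixed-size hashes, while spenders must
re-supply the output data those hashes commit to.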

My initial thought is that it's not _completely_ obvious to me that a
5% ongoing bandwidth increase is actually a win in exchange for
something like a 40% reduction in the size of a pruned node (and less
than a 1% reduction in an archive node), primarily because I've not
seen the size of a pruned node cited as a usage-limiting factor
basically anywhere. I would assume it is a win, but I wouldn't be
shocked to see a careful analysis conclude that it wasn't.
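
To put rough numbers on the shape of that tradeoff (the percentages
are the ones above; the absolute sizes are made-up round figures, not
measurements):

    pruned_node_gb = 5.0     # assumed disk footprint of a pruned node
    archive_node_gb = 200.0  # assumed disk footprint of an archive node
    monthly_relay_gb = 20.0  # assumed monthly tx relay bandwidth

    print(f"disk saved (pruned):  {0.40 * pruned_node_gb:.1f} GB")
    print(f"disk saved (archive): {0.01 * archive_node_gb:.1f} GB")
    print(f"extra bandwidth:      {0.05 * monthly_relay_gb:.1f} GB/month")

Under these placeholder figures, a one-time disk saving of a couple
of GB is bought with a comparable recurring monthly bandwidth cost,
so the answer hinges on which resource actually constrains operators.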

But perhaps more interestingly, I think the overhead is not really 5%;
it's 5% measured in the context of the phenomenally inefficient tx
relay mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ).
Napkin math on the size of a txn alone tells me it's more like a 25%
increase if you just consider the size of a tx vs. the size of a tx
plus its scriptpubkeys and amounts. If I'm not missing something
there, I think that would get it into a very clear not-win range.
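
One way to redo that napkin math, using assumed P2PKH byte sizes
(rough conventions, not figures from the thread):

    SPK = 25     # bytes: scriptPubKey of a spent P2PKH output (assumed)
    AMOUNT = 8   # bytes: value of a spent output
    TXIN = 148   # bytes: typical P2PKH input incl. signature (assumed)
    TXOUT = 34   # bytes: typical P2PKH output (assumed)
    FIXED = 10   # bytes: version, locktime, counts (assumed)

    for n_in in (1, 2, 5):
        tx = FIXED + n_in * TXIN + 2 * TXOUT
        extra = n_in * (SPK + AMOUNT)  # data a spender must now relay
        print(f"{n_in} input(s): tx={tx}B extra={extra}B +{extra/tx:.0%}")

This prints roughly +15% for a one-input tx, rising toward ~22% as
inputs dominate, the same ballpark as the ~25% estimate above.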

On the positive side, it doesn't change the blockchain data
structure, so it's something implementations could do without
marrying the network to it forever.