Hi Fabian,

We did consider indexing all txos -- even, amusingly, by using ordinals --
but decided that the extra index requirements for the decompressor (which
otherwise just requires a bit of extra CPU cycles but nothing beyond a
normal Core node).

A while ago we looked into putting the whole UTXOset into a trie so that
we could do prefix lookups. I think we discarded this idea for the same
reason, and because it could lead to surprising behavior for users since
a compressed tx might get invalidated by some UTXO showing up whose
prefix is too close to one of its inputs. Where "prefix" likely means
some special-purpose hash of the prevout that users will never otherwise
encounter.

We were also a bit put off by the data structure complexity since the
UTXO set no longer fits in RAM so it takes nontrivial effort to
implement a new index :) plus it drops our chances of getting code into
Core by a very large factor.

We can swag what the space savings would be: there are 122MM utxos right
now, which is a bit under 2^27. So assuming a uniform distribution of
prefixes we'd need to specify 28 bits to identify a UTXO. To contrast,
to identify a blockheight we need 20 bits and then maybe 12 more bits to
specify a TXO within a block. Plus whatever varint overhead we have.
(I've been working on this project but busy with family stuff and don't
remember exactly where we landed on the varints for this. I think we
agreed that there was room for improvement but didn't want to hold up
posting the rest of the concept because of it.)


The TL;DR is that we probably save a little less than a byte per input,
on average, which is not trivial but probably not worth the decreased
UX and greatly increased implementation complexity.


Best
Andrew



On Fri, Sep 01, 2023 at 10:24:54AM +0000, Fabian via bitcoin-dev wrote:
> Hi Tom,
> 
> without having gone into the details yet, thanks for the great effort you have put into this research and implementation already!
> 
> > The bulk of our size savings come from replacing the prevout of each input by a block height and index.
> 
> Have you also considered using just an index from a sorted UTXO set instead? The potential additional space saving might be minor but this would make the scheme compatible with pruning. I had this on my list as a future research topic but didn't get around to it yet.
> 
> Thanks,
> Fabian
> ------- Original Message -------
> On Thursday, August 31st, 2023 at 11:30 PM, Tom Briar via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> > Hey everyone,
> >
> > I've been working on a way to compress bitcoin transactions for transmission throughsteganography, satellite broadcasting,
> > and other low bandwidth channels with high CPU availability on decompression.
> >
> > [compressed_transactions.md](https://github.com/TomBriar/bitcoin/blob/2023-05--tx-compression/doc/compressed_transactions.md)
> >
> > In the document I describe a compression schema that's tailored for the most common transactions single parties are likely to make.
> > In every case it falls back such that no transaction will become malformed or corrupted.
> > Here's a PR for implementing this schema.
> >
> > [2023 05 tx compression](https://github.com/TomBriar/bitcoin/pull/3)
> > Thanks-
> > Tom.

> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


-- 
Andrew Poelstra
Director of Research, Blockstream
Email: apoelstra at wpsoftware.net
Web:   https://www.wpsoftware.net/andrew

The sun is always shining in space
    -Justin Lewis-Webster