public inbox for bitcoindev@googlegroups.com
* [bitcoin-dev] TXO bitfield size graphs
@ 2018-05-23 23:48 Jim Posen
  2018-05-24  2:43 ` Bram Cohen
  0 siblings, 1 reply; 3+ messages in thread
From: Jim Posen @ 2018-05-23 23:48 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion


[-- Attachment #1.1: Type: text/plain, Size: 2715 bytes --]

I decided to look into the compression ratios achievable on TXO
bitfields, as proposed by Bram Cohen [1]. I'm specifically interested in
the feasibility of committing to them in block headers. In combination
with block commitments to the TXOs themselves, this would enable UTXO
inclusion/exclusion proofs for light clients.

First, looking just at proofs of inclusion in the UTXO set, each block
needs what Bram calls a "proof of position." Concretely, one such
construction is a Merkle root over all of the block's newly created coins,
including their output data (scriptPubKey + amount), the outpoint (txid +
index), and an absolute index of the output in the entire blockchain. A
Merkle branch in this tree constitutes a proof of position. Alternatively,
the "position", rather than being an absolute index in the chain, could be
a block hash plus an output index within the block.
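
To make the first variant concrete, here is a rough Python sketch of one
such leaf commitment and Merkle root (the serialization and hash choices
are placeholders, not a concrete proposal):

    import hashlib

    def h(data):
        # Double-SHA256, as used elsewhere in Bitcoin.
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()

    def leaf(txid, index, script_pubkey, amount, absolute_index):
        # One leaf per newly created output: the outpoint, the output
        # data, and the output's absolute index in the chain.
        return h(txid
                 + index.to_bytes(4, 'little')
                 + amount.to_bytes(8, 'little')
                 + script_pubkey
                 + absolute_index.to_bytes(8, 'little'))

    def merkle_root(leaves):
        # Plain binary Merkle tree over a non-empty list of leaves,
        # duplicating the last node on levels of odd length.
        level = list(leaves)
        while len(level) > 1:
            if len(level) % 2 == 1:
                level.append(level[-1])
            level = [h(level[i] + level[i + 1])
                     for i in range(0, len(level), 2)]
        return level[0]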

Let's say we use the absolute index in the chain as the position. A TXO
spentness bitfield can be constructed for the entire chain, which is
appended to when new coins are created and updated when they are spent. In
order to compactly prove spentness in this bitfield to a client, one could
chunk up the bitfield and construct a Merkle Mountain Range [2] over the
chunks. Compared to an MMR over the outputs themselves, as proposed by
Peter Todd [3], an MMR constructed over bitfield chunks grows far more
slowly, by a large constant factor. Slower growth means faster updates.
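
For illustration, the bookkeeping could look roughly like this (chunk size
and hashing are placeholders, and the MMR over the chunk hashes is
omitted):

    import hashlib

    CHUNK_BYTES = 4096  # 4 KiB chunks, as in the measurements below

    class SpentnessBitfield:
        def __init__(self):
            self.bits = bytearray()
            self.num_outputs = 0

        def append_output(self):
            # New outputs start out unspent (bit = 0).
            if self.num_outputs % 8 == 0:
                self.bits.append(0)
            self.num_outputs += 1
            return self.num_outputs - 1  # the output's absolute index

        def mark_spent(self, absolute_index):
            self.bits[absolute_index // 8] |= 1 << (absolute_index % 8)

        def chunk_hashes(self):
            # Leaves for an MMR built over fixed-size chunks of the bitfield.
            chunks = [bytes(self.bits[i:i + CHUNK_BYTES])
                      for i in range(0, len(self.bits), CHUNK_BYTES)]
            return [hashlib.sha256(c).digest() for c in chunks]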

So there's the question of how much these bitfields can be compressed. We
expect a decent amount of compression, since patterns of spending coins are
very non-random.

The top graph in the attached figure shows the compression ratios possible
on a TXO bitfield split into 4 KiB chunks, using gzip (level=9), lz4, and
bz2. Data was collected at block height 523,303. You can see that the
compression ratio is much lower (better) for older chunks and worse for
more recent blocks. Over the entire history, gzip achieves 34.4%, lz4
54.8%, and bz2 37.6%. I'm kind of surprised that the ratios are not lower
with off-the-shelf algorithms, and that gzip performs better than bz2 (it
seems to be a function of the chunk size?).
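
The measurement itself was simple; roughly the following (stdlib gzip/bz2
plus the python lz4 bindings; this is the shape of it rather than the
exact script):

    import bz2, gzip
    import lz4.frame  # pip install lz4

    def chunk_ratios(bitfield, chunk_size=4096):
        # Compressed size divided by original size, per chunk.
        results = []
        for i in range(0, len(bitfield), chunk_size):
            chunk = bytes(bitfield[i:i + chunk_size])
            results.append({
                'gzip': len(gzip.compress(chunk, 9)) / len(chunk),
                'lz4': len(lz4.frame.compress(chunk)) / len(chunk),
                'bz2': len(bz2.compress(chunk)) / len(chunk),
            })
        return results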

Alternatively, we can look at bitfields stored separately by block, which
is more compatible with constructions where an output's position is its
block hash plus relative index. The per-block bitfield sizes are shown in
the bottom graph. The compression ratios overall are 50% for gzip, 70% for
lz4, and 61.5% for bz2.
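
A sketch of that per-block layout, just for illustration:

    class PerBlockBitfields:
        # One spentness bitfield per block, keyed by block hash; an
        # output's position is (block hash, index within the block).
        def __init__(self):
            self.by_block = {}

        def add_block(self, block_hash, num_outputs):
            self.by_block[block_hash] = bytearray((num_outputs + 7) // 8)

        def mark_spent(self, block_hash, index):
            self.by_block[block_hash][index // 8] |= 1 << (index % 8)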

[1]
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-March/013928.html
[2]
https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md
[3] https://petertodd.org/2016/delayed-txo-commitments

[-- Attachment #1.2: Type: text/html, Size: 3656 bytes --]

[-- Attachment #2: bitfield_sizes.svg --]
[-- Type: image/svg+xml, Size: 1788185 bytes --]


* Re: [bitcoin-dev] TXO bitfield size graphs
  2018-05-23 23:48 [bitcoin-dev] TXO bitfield size graphs Jim Posen
@ 2018-05-24  2:43 ` Bram Cohen
  2018-05-24  4:02   ` Jim Posen
  0 siblings, 1 reply; 3+ messages in thread
From: Bram Cohen @ 2018-05-24  2:43 UTC (permalink / raw)
  To: Jim Posen, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 3954 bytes --]

You compressed something which is natively a bitfield using general-purpose
compression algorithms? That is expected to give poor results. Much better
would be something which handles it natively, say run-length encoding the
numbers of repeated bits and compressing the run lengths with Elias omega
coding. That is suboptimal in a few ways, but it has the advantage of
working well on data that is mostly zeros as well as data that is mostly
ones, and it only performs badly on truly random bits.
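
Something along these lines, roughly (encoding only; decoding and framing
left out):

    def elias_omega(n):
        # Elias omega code for a positive integer n, as a string of bits.
        code = '0'
        while n > 1:
            b = bin(n)[2:]
            code = b + code
            n = len(b) - 1
        return code

    def rle_omega_encode(bits):
        # Record the value of the first run, then Elias-omega-code the
        # lengths of successive runs of identical bits.
        if not bits:
            return ''
        out = [str(bits[0])]
        run = 1
        for prev, cur in zip(bits, bits[1:]):
            if cur == prev:
                run += 1
            else:
                out.append(elias_omega(run))
                run = 1
        out.append(elias_omega(run))
        return ''.join(out)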

It isn't super clear how relevant this information is. The TXO bitfield is
fairly small to begin with, and compressing the data in real time would
require a special data structure which gets worse compression than
compressing the whole thing outright and has slower lookups than an
uncompressed version. Writing such a thing sounds like an interesting
project, though.


[-- Attachment #2: Type: text/html, Size: 5426 bytes --]


* Re: [bitcoin-dev] TXO bitfield size graphs
  2018-05-24  2:43 ` Bram Cohen
@ 2018-05-24  4:02   ` Jim Posen
  0 siblings, 0 replies; 3+ messages in thread
From: Jim Posen @ 2018-05-24  4:02 UTC (permalink / raw)
  To: bram; +Cc: Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 4768 bytes --]

Yes, certainly RLE-style compression would work better in this instance,
but I wanted to see how well standard compression algorithms would do
without building something custom. If there are other standard compression
schemes better suited to this, please let me know.

As for relevance, I'll clarify that the intention is to compress the
bitfields when sending proofs of spentness/unspentness to light clients,
where bandwidth is a concern. As you note, the bitfields are small enough
that it's probably not necessary to store the compressed versions on full
nodes, though lz4 is fast enough that it may be worthwhile to compress
before saving to disk.


[-- Attachment #2: Type: text/html, Size: 6381 bytes --]


end of thread, other threads:[~2018-05-24  4:02 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
2018-05-23 23:48 [bitcoin-dev] TXO bitfield size graphs Jim Posen
2018-05-24  2:43 ` Bram Cohen
2018-05-24  4:02   ` Jim Posen
