public inbox for bitcoindev@googlegroups.com
* [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
@ 2015-11-09 19:18 Peter Tschipper
  2015-11-09 20:41 ` Johnathan Corgan
  2015-11-09 21:04 ` Bob McElrath
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Tschipper @ 2015-11-09 19:18 UTC (permalink / raw)
  To: Bitcoin Dev

This is my first time through this process so please bear with me. 

I opened PR #6973 this morning for zlib block compression for block
relay, and at the request of @sipa it should have a BIP associated
with it.  The idea is simple: compress the datastream before
sending, initially for blocks only, though it could in principle be done
for transactions as well.  Initial results show an average of 20% block
compression, taking 90 milliseconds to compress a full block (on a very
slow laptop).  The savings will mostly be in bandwidth used, but I would
expect a small performance gain during the transmission of blocks,
particularly where network latency is higher.
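
As a rough illustration of the kind of measurement described above, the
following sketch compresses a synthetic, block-like payload with zlib and
reports the size reduction and the time taken.  The payload generator
(make_fake_block) and the helper name (measure) are assumptions made for
this example only; the payload is not real block data, so the numbers it
prints say nothing about mainnet blocks.

```python
import os
import time
import zlib


def make_fake_block(n_tx: int = 2000) -> bytes:
    """Build a synthetic payload: random 'hash' and 'signature' bytes
    separated by fixed, repetitive framing bytes.
    This is NOT the Bitcoin block format, only a stand-in."""
    parts = []
    for _ in range(n_tx):
        parts.append(b"\x01\x00\x00\x00\x01")            # repeated framing
        parts.append(os.urandom(32))                     # hash-like, incompressible
        parts.append(b"\xff\xff\xff\xff")                # repeated framing
        parts.append(os.urandom(72))                     # signature-like, incompressible
        parts.append(b"\x00" * 8 + b"\x19\x76\xa9\x14")  # repeated framing
    return b"".join(parts)


def measure(payload: bytes, level: int = 6):
    """Return (compressed_size, percent_saved, seconds_spent_compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    saved = 100.0 * (1 - len(compressed) / len(payload))
    return len(compressed), saved, elapsed


if __name__ == "__main__":
    block = make_fake_block()
    size, saved, secs = measure(block)
    print(f"{len(block)} -> {size} bytes, {saved:.1f}% saved, {secs * 1000:.1f} ms")
```

Running the same measurement over real serialized blocks would be the
way to confirm or refute the figures quoted in this thread.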

I think the BIP title, if accepted, should be the more generic "Support
for Datastream Compression" rather than the PR title of "Zlib
Compression for block relay", since it could also be used for
transactions at a later time.

Thanks for your time...



* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-09 19:18 [bitcoin-dev] request BIP number for: "Support for Datastream Compression" Peter Tschipper
@ 2015-11-09 20:41 ` Johnathan Corgan
  2015-11-09 21:04 ` Bob McElrath
  1 sibling, 0 replies; 21+ messages in thread
From: Johnathan Corgan @ 2015-11-09 20:41 UTC (permalink / raw)
  To: Peter Tschipper; +Cc: Bitcoin Dev


On Mon, Nov 9, 2015 at 11:18 AM, Peter Tschipper via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:


> I opened a PR #6973 this morning for Zlib Block Compression for block
> relay and at the request of @sipa  this should have a BIP associated
> with it.   The idea is simple, to compress the datastream before
> sending, initially for blocks only but it could theoretically be done
> for transactions as well.  Initial results show an average of 20% block
> compression and taking 90 milliseconds for a full block (on a very slow
> laptop) to compress.  The savings will be mostly in terms of less
> bandwidth used, but I would expect there to be a small performance gain
> during the transmission of the blocks particularly where network latency
> is higher.
>

The trade-off decisions among bandwidth savings, CPU performance, and
latency are local, and I think it shouldn't be assumed that any particular
node will want to support it.  I recommend that if P2P message compression
is implemented, it should be negotiated via the services field at
connection time.
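
A minimal sketch of what negotiation via the services field could look
like: each peer advertises a bitfield in its version message, and
compression is used only if both sides set the relevant bit.  The
NODE_COMPRESS bit and its value are hypothetical and chosen purely for
illustration; no such service bit is assigned by this thread.

```python
NODE_NETWORK = 1 << 0     # existing service bit: peer can serve full blocks
NODE_COMPRESS = 1 << 20   # HYPOTHETICAL bit, invented for this example


def use_compression(local_services: int, remote_services: int) -> bool:
    """Compress only when both peers advertised the (hypothetical) bit."""
    return bool(local_services & remote_services & NODE_COMPRESS)


if __name__ == "__main__":
    me = NODE_NETWORK | NODE_COMPRESS
    print(use_compression(me, NODE_NETWORK | NODE_COMPRESS))  # both opted in
    print(use_compression(me, NODE_NETWORK))                  # peer did not
```

A node that never sets the bit keeps receiving uncompressed messages, so
the decision stays local to each operator.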

-- 
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com


* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-09 19:18 [bitcoin-dev] request BIP number for: "Support for Datastream Compression" Peter Tschipper
  2015-11-09 20:41 ` Johnathan Corgan
@ 2015-11-09 21:04 ` Bob McElrath
  2015-11-10  1:58   ` gladoscc
  1 sibling, 1 reply; 21+ messages in thread
From: Bob McElrath @ 2015-11-09 21:04 UTC (permalink / raw)
  To: Peter Tschipper; +Cc: Bitcoin Dev

I would expect that since a block contains mostly hashes and crypto signatures,
it would be almost totally incompressible.  I just calculated compression ratios:

zlib    -15%    (file is LARGER)
gzip     28%
bzip2    25%

So zlib compression is right out.  How much is ~25% bandwidth savings worth to
people?  This seems not worth it to me.  :-/
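
For anyone who wants to cross-check ratios like these, here is a small
sketch using Python's standard library bindings.  Note that the zlib and
gzip formats both wrap the same DEFLATE algorithm, so on identical input
and settings they would normally differ only by a few bytes of header
and trailer.  The sample payload below is synthetic (an assumption for
illustration, not real block data), and the helper names are made up.

```python
import bz2
import gzip
import os
import zlib


def percent_saved(original: bytes, compressed: bytes) -> float:
    """Positive means the output is smaller; negative means it grew."""
    return 100.0 * (1 - len(compressed) / len(original))


def compare(payload: bytes, level: int = 6) -> dict:
    """Compress the same bytes with three stdlib codecs at the same level."""
    return {
        "zlib": percent_saved(payload, zlib.compress(payload, level)),
        "gzip": percent_saved(payload, gzip.compress(payload, compresslevel=level)),
        "bzip2": percent_saved(payload, bz2.compress(payload, level)),
    }


if __name__ == "__main__":
    # Synthetic stand-in: random 'hash' bytes mixed with runs of zeroes.
    sample = b"".join(os.urandom(32) + b"\x00" * 16 for _ in range(500))
    for name, saved in compare(sample).items():
        print(f"{name:6s} {saved:6.2f}%")
```

Feeding a real serialized block to compare() in place of the synthetic
sample would give figures directly comparable to the ones above.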

Peter Tschipper via bitcoin-dev [bitcoin-dev@lists•linuxfoundation.org] wrote:
> This is my first time through this process so please bear with me. 
> 
> I opened a PR #6973 this morning for Zlib Block Compression for block
> relay and at the request of @sipa  this should have a BIP associated
> with it.   The idea is simple, to compress the datastream before
> sending, initially for blocks only but it could theoretically be done
> for transactions as well.  Initial results show an average of 20% block
> compression and taking 90 milliseconds for a full block (on a very slow
> laptop) to compress.  The savings will be mostly in terms of less
> bandwidth used, but I would expect there to be a small performance gain
> during the transmission of the blocks particularly where network latency
> is higher. 
> 
> I think the BIP title, if accepted should be the more generic, "Support
> for Datastream Compression"  rather than the PR title of "Zlib
> Compression for block relay" since it could also be used for
> transactions as well at a later time.
> 
> Thanks for your time...
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists•linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
--
Cheers, Bob McElrath

"For every complex problem, there is a solution that is simple, neat, and wrong."
    -- H. L. Mencken 




* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-09 21:04 ` Bob McElrath
@ 2015-11-10  1:58   ` gladoscc
  2015-11-10  5:40     ` Johnathan Corgan
  0 siblings, 1 reply; 21+ messages in thread
From: gladoscc @ 2015-11-10  1:58 UTC (permalink / raw)
  To: Bob McElrath; +Cc: Bitcoin Dev


I think 25% bandwidth savings is certainly considerable, especially for
people running full nodes in countries like Australia where internet
bandwidth is lower and there are data caps.

I absolutely would not dismiss 25% compression. gzip and bzip2 compression
are relatively standard, and I'd put the point where the implementation
complexity stops being worth it somewhere around 5-10%.


* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-10  1:58   ` gladoscc
@ 2015-11-10  5:40     ` Johnathan Corgan
  2015-11-10  9:44       ` Tier Nolan
  0 siblings, 1 reply; 21+ messages in thread
From: Johnathan Corgan @ 2015-11-10  5:40 UTC (permalink / raw)
  To: gladoscc; +Cc: Bitcoin Dev


On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:


> I think 25% bandwidth savings is certainly considerable, especially for
> people running full nodes in countries like Australia where internet
> bandwidth is lower and there are data caps.
>

This reinforces the idea that such trade-off decisions should be local
and negotiated between peers, not a required feature of the P2P network.


-- 
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com


* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-10  5:40     ` Johnathan Corgan
@ 2015-11-10  9:44       ` Tier Nolan
       [not found]         ` <5642172C.701@gmail.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Tier Nolan @ 2015-11-10  9:44 UTC (permalink / raw)
  Cc: Bitcoin Dev


The network protocol is not quite consensus critical, but it is important.

Two implementations of the decompressor might not be bug for bug
compatible.  This (potentially) means that a block could be designed that
won't decode properly for some version of the client but would work for
another.  This would fork the network.

A "raw" network library is unlikely to have the same problem.

Rather than compressing the whole stream, you could compress block
messages only.  A new "cblock" message could be created that is a
compressed block.  This shouldn't reduce efficiency by much.

If a client fails to decode a cblock, then it can ask for the block to be
re-sent as a standard "block" message.

This means that it is a pure performance improvement.  If problems occur,
then the client can just switch back to uncompressed mode for that block.
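
The fallback described above could be sketched roughly as follows.  The
function names, the request_plain_block callback, and the 1 MB output
cap are assumptions for illustration; the thread does not define a wire
format for "cblock".

```python
import zlib
from typing import Callable, Optional


def decode_cblock(payload: bytes, max_size: int = 1_000_000) -> Optional[bytes]:
    """Try to decompress a hypothetical "cblock" payload.

    Returns the raw block bytes, or None if the data is corrupt,
    truncated, followed by junk, or would expand beyond max_size.
    """
    d = zlib.decompressobj()
    try:
        raw = d.decompress(payload, max_size)  # cap the output size
    except zlib.error:
        return None
    # Unfinished stream, leftover input, or trailing bytes all count as failure.
    if not d.eof or d.unconsumed_tail or d.unused_data:
        return None
    return raw


def receive_block(payload: bytes, request_plain_block: Callable[[], bytes]) -> bytes:
    """Use the compressed payload if it decodes; otherwise fall back to
    asking the peer for a standard, uncompressed "block" message."""
    raw = decode_cblock(payload)
    if raw is None:
        return request_plain_block()
    return raw
```

Capping the decompressed size matters here because a small malicious
payload could otherwise expand to a very large buffer.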

You should look into the block relay system.  This gives a larger
improvement than simply compressing the stream.  The main benefit is
latency, but it also means that actual blocks don't have to be sent:
normally a node has already received all the transactions that are later
included in the block.  This gives a potential 50% compression ratio.




* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
       [not found]         ` <5642172C.701@gmail.com>
@ 2015-11-10 16:17           ` Peter Tschipper
  2015-11-10 16:21             ` Jonathan Toomim
  2015-11-10 16:30           ` Tier Nolan
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Tschipper @ 2015-11-10 16:17 UTC (permalink / raw)
  To: Bitcoin Dev


On 10/11/2015 8:11 AM, Peter Tschipper wrote:
> On 10/11/2015 1:44 AM, Tier Nolan via bitcoin-dev wrote:
>> The network protocol is not quite consensus critical, but it is
>> important.
>>
>> Two implementations of the decompressor might not be bug for bug
>> compatible.  This (potentially) means that a block could be designed
>> that won't decode properly for some version of the client but would
>> work for another.  This would fork the network.
>>
>> A "raw" network library is unlikely to have the same problem.
>>
>> Rather than just compress the stream, you could compress only block
>> messages only.  A new "cblock" message could be created that is a
>> compressed block.  This shouldn't reduce efficiency by much.
>>
> I chose the more generic datastream compression so that in the
> future we could possibly apply it to transactions, but currently all
> that is planned is to compress blocks.  That was really my only
> original intent until I saw that there might be some bandwidth savings
> for transactions as well.
>
> The compression could however be applied to any datastream, but it is
> not *forced*.  Basically it would just be a method call in CDatastream,
> so we could do ss.compress and ss.decompress and apply that to blocks
> and possibly transactions if worthwhile, and only IF compression is
> turned on.  But there is no intent to apply this to every type of
> message, since most would be too small to benefit from compression.
>
> Here are some results of using the code in the PR to
> compress/decompress blocks using zlib compression level = 6.  This
> data was taken from the first 275K blocks in the mainnet blockchain. 
> Clearly once we get past 10KB we get pretty decent compression but
> even below that there is some benefit.  I'm still collecting data and
> will get the same for the whole blockchain.
>
> range = block size range
> ubytes = average size of uncompressed blocks
> cbytes = average size of compressed blocks
> ctime = average time to compress (seconds)
> dtime = average time to decompress (seconds)
> cmp_ratio% = compression ratio
> datapoints = number of datapoints taken
>
> range        ubytes  cbytes  ctime  dtime  cmp_ratio%  datapoints
> 0-250b          215     189  0.001  0.000       12.41       79498
> 250-500b        440     405  0.001  0.000        7.82       11903
> 500-1KB         762     702  0.001  0.000        7.83       10448
> 1KB-10KB       4166    3561  0.001  0.000       14.51       50572
> 10KB-100KB    40820   31597  0.005  0.001       22.59       75555
> 100KB-200KB  146238  106320  0.015  0.001       27.30       25024
> 200KB-300KB  242913  175482  0.025  0.002       27.76       20450
> 300KB-400KB  343430  251760  0.034  0.003       26.69        2069
> 400KB-500KB  457448  343495  0.045  0.004       24.91        1889
> 500KB-600KB  540736  424255  0.056  0.007       21.54          90
> 600KB-700KB  647851  506888  0.063  0.007       21.76          59
> 700KB-800KB  749513  586551  0.073  0.007       21.74          48
> 800KB-900KB  859439  652166  0.086  0.008       24.12          39
> 900KB-1MB    952333  725191  0.089  0.009       23.85          78
>
>> If a client fails to decode a cblock, then it can ask for the block
>> to be re-sent as a standard "block" message. 
> interesting idea.
>>
>> This means that it is a pure performance improvement.  If problems
>> occur, then the client can just switch back to uncompressed mode for
>> that block.
>>
>> You should look into the block relay system.  This gives a larger
>> improvement than simply compressing the stream.  The main benefit is
>> latency but it means that actual blocks don't have to be sent, so
>> gives a potential 50% compression ratio.  Normally, a node receives
>> all the transactions and then those transactions are included later
>> in the block.
>>
> There are better ways of sending new blocks, that's certainly true, but
> for sending historical blocks and sending transactions I don't think
> so.  This PR is really designed to save bandwidth and is not intended to
> be a huge performance improvement in terms of time spent sending.

* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-10 16:17           ` Peter Tschipper
@ 2015-11-10 16:21             ` Jonathan Toomim
  0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Toomim @ 2015-11-10 16:21 UTC (permalink / raw)
  To: Peter Tschipper; +Cc: Bitcoin Dev


Quick observation: block transmission would be compress-once, send-multiple-times, which makes the tradeoff a little better.
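
One way to picture compress-once, send-multiple-times is a small cache
keyed by block hash, so the CPU cost is paid a single time no matter how
many peers request the block.  The class below is a hypothetical sketch,
not code from the PR; the cache has no eviction policy and the key used
in the demo is just a stand-in for a real block hash.

```python
import hashlib
import zlib
from typing import Dict


class CompressedBlockCache:
    """Compress each block at most once and reuse the result for every peer."""

    def __init__(self) -> None:
        self._cache: Dict[bytes, bytes] = {}
        self.compress_calls = 0  # counts how often zlib actually ran

    def get(self, block_hash: bytes, raw_block: bytes) -> bytes:
        cached = self._cache.get(block_hash)
        if cached is None:
            self.compress_calls += 1
            cached = zlib.compress(raw_block, 6)
            self._cache[block_hash] = cached
        return cached


if __name__ == "__main__":
    cache = CompressedBlockCache()
    raw = b"\x00" * 1000 + b"abc" * 100
    key = hashlib.sha256(raw).digest()  # stand-in key, not a real block hash
    for _peer in range(8):
        cache.get(key, raw)
    print("peers served: 8, compressions performed:", cache.compress_calls)
```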



* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
       [not found]         ` <5642172C.701@gmail.com>
  2015-11-10 16:17           ` Peter Tschipper
@ 2015-11-10 16:30           ` Tier Nolan
  2015-11-10 16:46             ` Jeff Garzik
       [not found]             ` <56421F1E.4050302@gmail.com>
  1 sibling, 2 replies; 21+ messages in thread
From: Tier Nolan @ 2015-11-10 16:30 UTC (permalink / raw)
  To: Bitcoin Dev


On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper <peter.tschipper@gmail•com>
wrote:

> There are better ways of sending new blocks, that's certainly true but for
> sending historical blocks and sending transactions I don't think so.  This
> PR is really designed to save bandwidth and not intended to be a huge
> performance improvement in terms of time spent sending.
>

If the main point is for historical data, then sticking to just blocks is
the best plan.

Since small blocks don't compress well, you could define a "cblocks"
message that handles multiple blocks (just concatenate the block messages
as payload before compression).

The sending peer could combine blocks so that each cblock is compressing at
least 10kB of block data (or whatever is optimal).  It is probably worth
specifying a maximum size for network buffer reasons (either 1MB or 1 block
maximum).
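
A rough sketch of the batching idea: group consecutive blocks until a
batch holds at least 10 kB, never exceed 1 MB, then compress the
concatenation as one payload.  The 4-byte little-endian length prefix
used to separate blocks inside a batch is a made-up framing for this
example, as are all the function names.

```python
import zlib
from typing import Iterable, Iterator, List

MIN_BATCH = 10_000      # "at least 10kB" of block data per cblocks message
MAX_BATCH = 1_000_000   # "1MB" cap for network buffer reasons


def batch_blocks(blocks: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Group consecutive raw blocks into batches of at least MIN_BATCH bytes
    where possible and at most MAX_BATCH bytes; a single block larger than
    the cap is emitted on its own."""
    batch: List[bytes] = []
    size = 0
    for blk in blocks:
        if batch and size + len(blk) > MAX_BATCH:
            yield batch
            batch, size = [], 0
        batch.append(blk)
        size += len(blk)
        if size >= MIN_BATCH:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch  # final, possibly undersized, batch


def encode_cblocks(batch: List[bytes]) -> bytes:
    """Length-prefix each block, concatenate, and compress as one payload."""
    framed = b"".join(len(b).to_bytes(4, "little") + b for b in batch)
    return zlib.compress(framed, 6)


def decode_cblocks(payload: bytes) -> List[bytes]:
    """Inverse of encode_cblocks."""
    framed = zlib.decompress(payload)
    out, pos = [], 0
    while pos < len(framed):
        n = int.from_bytes(framed[pos:pos + 4], "little")
        out.append(framed[pos + 4:pos + 4 + n])
        pos += 4 + n
    return out
```

Because the blocks in a batch share one compression context, patterns
repeated across small blocks can be matched, which is the point of
combining them.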

Similarly, transactions could be combined together and compressed "ctxs".
The inv messages could be modified so that you can request groups of 10-20
transactions.  That would depend on how much of an improvement compressed
transactions would represent.

More generally, you could define a message which is a compressed message
holder.  That is probably too complex to be worth the effort though.




* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-10 16:30           ` Tier Nolan
@ 2015-11-10 16:46             ` Jeff Garzik
  2015-11-10 17:09               ` Peter Tschipper
                                 ` (2 more replies)
       [not found]             ` <56421F1E.4050302@gmail.com>
  1 sibling, 3 replies; 21+ messages in thread
From: Jeff Garzik @ 2015-11-10 16:46 UTC (permalink / raw)
  To: Tier Nolan; +Cc: Bitcoin Dev


Comments:

1) cblock seems a reasonable way to extend the protocol.  Further wrapping
should probably be done at the stream level.

2) zlib has a crappy security track record.

3) A fallback path to non-compressed is required, should compression fail
or crash.

4) Most blocks and transactions have runs of zeroes and/or highly common
bit-patterns, which contributes to useful compression even at smaller
sizes.  Peter Ts's most recent numbers bear this out.  zlib has a
dictionary (32K?) which works well with repeated patterns such as those you
see with concatenated runs of transactions.

5) LZO should provide much better compression, at a cost of CPU performance
and using a less-reviewed, less-field-tested library.
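
Point 4 can be illustrated with a small experiment.  zlib's DEFLATE
window is at most 32 KiB, so a byte pattern can be replaced by a
back-reference when it recurs within that distance.  The sketch below
compares compressing transaction-like items one at a time against
compressing them as one concatenated run.  fake_tx builds synthetic
bytes with shared framing around random data; it is an assumption for
illustration and not the real transaction format.

```python
import os
import zlib
from typing import List


def fake_tx() -> bytes:
    """Synthetic transaction-like bytes: common framing around random data."""
    return (
        b"\x01\x00\x00\x00\x01" + os.urandom(32)
        + b"\x00\x00\x00\x00\x6b\x48\x30\x45\x02\x21" + os.urandom(64)
        + b"\xff\xff\xff\xff\x01" + b"\x19\x76\xa9\x14" + os.urandom(20)
        + b"\x88\xac\x00\x00\x00\x00"
    )


def individually(txs: List[bytes]) -> int:
    """Total size when each item is compressed as its own zlib stream."""
    return sum(len(zlib.compress(tx, 6)) for tx in txs)


def concatenated(txs: List[bytes]) -> int:
    """Size when all items are compressed together as one zlib stream."""
    return len(zlib.compress(b"".join(txs), 6))


if __name__ == "__main__":
    txs = [fake_tx() for _ in range(200)]
    raw = sum(len(tx) for tx in txs)
    print(f"raw {raw}, one-by-one {individually(txs)}, together {concatenated(txs)}")
```

In the one-by-one case every stream starts with an empty window and pays
its own header, so the shared framing cannot be exploited.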






* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
       [not found]             ` <56421F1E.4050302@gmail.com>
@ 2015-11-10 16:46               ` Peter Tschipper
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Tschipper @ 2015-11-10 16:46 UTC (permalink / raw)
  To: Bitcoin Dev


On 10/11/2015 8:45 AM, Peter Tschipper wrote:
> On 10/11/2015 8:30 AM, Tier Nolan via bitcoin-dev wrote:
>>
>>
>> On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
>> <peter.tschipper@gmail•com> wrote:
>>
>>     There are better ways of sending new blocks, that's certainly
>>     true but for sending historical blocks and sending transactions I
>>     don't think so.  This PR is really designed to save bandwidth and
>>     not intended to be a huge performance improvement in terms of
>>     time spent sending.
>>
>>
>> If the main point is for historical data, then sticking to just
>> blocks is the best plan.
>>
> at the beginning yes.
>> Since small blocks don't compress well, you could define a "cblocks"
>> message that handles multiple blocks (just concatenate the block
>> messages as payload before compression). 
>>
> Small blocks are rare these days (though there are plenty of historical
> ones), but they still get a 10% compression, which is not bad and I
> think worthwhile, and the time it takes to compress a small block is
> less than a millisecond, so no loss there in time.  But still you have
> a good point, and it is something worth doing after getting compression
> to work.  I think it's wise to keep it simple at first and build on the
> success later.
>> The sending peer could combine blocks so that each cblock is
>> compressing at least 10kB of block data (or whatever is optimal).  It
>> is probably worth specifying a maximum size for network buffer
>> reasons (either 1MB or 1 block maximum).
> Good idea. Same answer as above.
>> Similarly, transactions could be combined together and compressed
>> "ctxs".  The inv messages could be modified so that you can request
>> groups of 10-20 transactions.  That would depend on how much of an
>> improvement compressed transactions would represent.
>>
> Good idea. Same answer as above.
>> More generally, you could define a message which is a compressed
>> message holder.  That is probably too complex to be worth the effort
>> though.
> That's actually pretty easy to do and part of the plan.  Sending a
> cmp_block rather than a block makes it all easier to implement.  It's
> just a matter of doing pnode->pushmessage("cmp_block",
> compressed_block); and handling the "cmp_block" command string at the
> other end.

* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-10 16:46             ` Jeff Garzik
@ 2015-11-10 17:09               ` Peter Tschipper
  2015-11-11 18:35               ` Peter Tschipper
  2015-11-28 14:48               ` [bitcoin-dev] further test results for : "Datastream Compression of Blocks and Tx's" Peter Tschipper
  2 siblings, 0 replies; 21+ messages in thread
From: Peter Tschipper @ 2015-11-10 17:09 UTC (permalink / raw)
  To: bitcoin-dev


On 10/11/2015 8:46 AM, Jeff Garzik via bitcoin-dev wrote:
> Comments:
>
> 1) cblock seems a reasonable way to extend the protocol.  Further
> wrapping should probably be done at the stream level.
agreed.
>
> 2) zlib has crappy security track record.
>
Zlib had a bad buffer overflow bug, but that was in 2005 and it got a lot
of press at the time.  It was fixed in version 1.2.3; we're on 1.2.8
now.  I'm not aware of any other current issues with zlib.  Do you have a
citation?

> 3) A fallback path to non-compressed is required, should compression
> fail or crash.
agreed.
>
> 4) Most blocks and transactions have runs of zeroes and/or highly
> common bit-patterns, which contributes to useful compression even at
> smaller sizes.  Peter Ts's most recent numbers bear this out.  zlib
> has a dictionary (32K?) which works well with repeated patterns such
> as those you see with concatenated runs of transactions.
>
> 5) LZO should provide much better compression, at a cost of CPU
> performance and using a less-reviewed, less-field-tested library.
I don't think LZO will give as good compression here but I will do some
benchmarking when I can.

>
>
>
>
>
> On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org
> <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>
>
>
>     On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
>     <peter.tschipper@gmail•com <mailto:peter.tschipper@gmail•com>> wrote:
>
>         There are better ways of sending new blocks, that's certainly
>         true but for sending historical blocks and seding transactions
>         I don't think so.  This PR is really designed to save
>         bandwidth and not intended to be a huge performance
>         improvement in terms of time spent sending.
>
>
>     If the main point is for historical data, then sticking to just
>     blocks is the best plan.
>
>     Since small blocks don't compress well, you could define a
>     "cblocks" message that handles multiple blocks (just concatenate
>     the block messages as payload before compression). 
>
>     The sending peer could combine blocks so that each cblock is
>     compressing at least 10kB of block data (or whatever is optimal). 
>     It is probably worth specifying a maximum size for network buffer
>     reasons (either 1MB or 1 block maximum).
>
>     Similarly, transactions could be combined together and compressed
>     "ctxs".  The inv messages could be modified so that you can
>     request groups of 10-20 transactions.  That would depend on how
>     much of an improvement compressed transactions would represent.
>
>     More generally, you could define a message which is a compressed
>     message holder.  That is probably to complex to be worth the
>     effort though.
>
>      
>
>>
>>         On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via
>>         bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org
>>         <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>
>>             On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev
>>             <bitcoin-dev@lists•linuxfoundation.org
>>             <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>              
>>
>>                 I think 25% bandwidth savings is certainly
>>                 considerable, especially for people running full
>>                 nodes in countries like Australia where internet
>>                 bandwidth is lower and there are data caps.
>>
>>
>>             This reinforces the idea that such trade-off decisions
>>             should be be local and negotiated between peers, not a
>>             required feature of the network P2P.
>>              
>>
>>             -- 
>>             Johnathan Corgan
>>             Corgan Labs - SDR Training and Development Services
>>             http://corganlabs.com


[-- Attachment #2: Type: text/html, Size: 15022 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-10 16:46             ` Jeff Garzik
  2015-11-10 17:09               ` Peter Tschipper
@ 2015-11-11 18:35               ` Peter Tschipper
  2015-11-11 18:49                 ` Marco Pontello
  2015-11-28 14:48               ` [bitcoin-dev] further test results for : "Datastream Compression of Blocks and Tx's" Peter Tschipper
  2 siblings, 1 reply; 21+ messages in thread
From: Peter Tschipper @ 2015-11-11 18:35 UTC (permalink / raw)
  To: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 6772 bytes --]

Here are the latest results on compression ratios for the first 295,000
blocks, compressionlevel=6.  I think there are more than enough
datapoints for statistical significance. 

Results are very similar to the previous test.  I'll work on getting a
comparison of how much time is saved or lost when syncing the
blockchain, compressed vs uncompressed.  Still, I think it's clear that
serving up compressed blocks, at least historical blocks, will benefit
those who have bandwidth caps on their internet connections.

The proposal so far is fairly simple:
1) Compress blocks with some compression library: currently zlib, but I
can investigate other possibilities.
2) As a fallback we need to advertise compression as a service.  That
way we can turn off compression AND decompression completely if needed.
3) Do the compression at the datastream level in the code.  CDataStream
is the obvious place.
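
The three points above can be sketched roughly as follows.  This is a
minimal illustration in Python, not the code from PR #6973: the function
names, the peer_supports_compression flag (standing in for an advertised
service) and the max_size limit are all assumptions made for the example.

```python
import zlib
from typing import Tuple

COMPRESSION_LEVEL = 6  # same level as compressionlevel=6 in the tests


def encode_block(raw: bytes, peer_supports_compression: bool) -> Tuple[bool, bytes]:
    """Return (is_compressed, payload) for a serialized block.

    Hypothetical helper: falls back to the uncompressed bytes when the
    peer did not advertise compression, when zlib fails, or when
    compression does not actually shrink the data.
    """
    if not peer_supports_compression:
        return False, raw
    try:
        packed = zlib.compress(raw, COMPRESSION_LEVEL)
    except zlib.error:
        return False, raw
    if len(packed) >= len(raw):
        return False, raw
    return True, packed


def decode_block(is_compressed: bool, payload: bytes, max_size: int = 1_000_000) -> bytes:
    """Inverse of encode_block, with an assumed cap on the inflated size."""
    if not is_compressed:
        return payload
    inflater = zlib.decompressobj()
    # Ask for one byte more than the cap so oversized output is detectable.
    raw = inflater.decompress(payload, max_size + 1)
    if len(raw) > max_size or not inflater.eof:
        raise ValueError("compressed payload is truncated or too large")
    return raw
```

Capping the inflated size on the receiving side is one way to limit the
damage from a malicious payload; the exact limit would be a design choice.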


Test Results:

range = block size range
ubytes = average size of uncompressed blocks
cbytes = average size of compressed blocks
ctime = average time to compress (seconds)
dtime = average time to decompress (seconds)
cmp_ratio% = compression ratio (percentage reduction in size)
datapoints = number of datapoints taken

range        ubytes  cbytes  ctime  dtime  cmp_ratio%  datapoints
0-250b          215     189  0.001  0.000       12.40       91280
250-500b        438     404  0.001  0.000        7.85       13217
500-1KB         761     701  0.001  0.000        7.86       11434
1KB-10KB       4149    3547  0.001  0.000       14.51       52180
10KB-100KB    41934   32604  0.005  0.001       22.25       82890
100KB-200KB  146303  108080  0.016  0.001       26.13       29886
200KB-300KB  243299  179281  0.025  0.002       26.31       25066
300KB-400KB  344636  266177  0.036  0.003       22.77        4956
400KB-500KB  463201  356862  0.046  0.004       22.96        3167
500KB-600KB  545123  429854  0.056  0.005       21.15         366
600KB-700KB  647736  510931  0.065  0.006       21.12         254
700KB-800KB  746540  587287  0.073  0.008       21.33         294
800KB-900KB  868121  682650  0.087  0.008       21.36         199
900KB-1MB    945747  726307  0.091  0.010       23.20         304
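
As a worked check on the table, the sketch below recomputes the
percentage saved per range from the ubytes and cbytes columns and then
weights each range by its datapoints to estimate the overall bytes saved
across all sampled blocks.  The function names are made up for this
example, and the per-range values for the smallest ranges can differ
slightly from the cmp_ratio% column, presumably because the table's
averages are rounded.

```python
# (range, ubytes, cbytes, datapoints), copied from the table above
ROWS = [
    ("0-250b", 215, 189, 91280),
    ("250-500b", 438, 404, 13217),
    ("500-1KB", 761, 701, 11434),
    ("1KB-10KB", 4149, 3547, 52180),
    ("10KB-100KB", 41934, 32604, 82890),
    ("100KB-200KB", 146303, 108080, 29886),
    ("200KB-300KB", 243299, 179281, 25066),
    ("300KB-400KB", 344636, 266177, 4956),
    ("400KB-500KB", 463201, 356862, 3167),
    ("500KB-600KB", 545123, 429854, 366),
    ("600KB-700KB", 647736, 510931, 254),
    ("700KB-800KB", 746540, 587287, 294),
    ("800KB-900KB", 868121, 682650, 199),
    ("900KB-1MB", 945747, 726307, 304),
]


def savings_percent(ubytes: float, cbytes: float) -> float:
    """Percentage reduction in size: 100 * (ubytes - cbytes) / ubytes."""
    return 100.0 * (ubytes - cbytes) / ubytes


def weighted_savings_percent(rows) -> float:
    """Overall savings, weighting each range by its number of blocks."""
    total_u = sum(u * n for _, u, _, n in rows)
    total_c = sum(c * n for _, _, c, n in rows)
    return savings_percent(total_u, total_c)


if __name__ == "__main__":
    for name, u, c, _ in ROWS:
        print(f"{name:<12} {savings_percent(u, c):6.2f}%")
    print(f"overall      {weighted_savings_percent(ROWS):6.2f}%")
```

Because the larger ranges carry most of the bytes, the weighted figure is
dominated by the 10KB-400KB rows rather than by the many tiny blocks.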

On 10/11/2015 8:46 AM, Jeff Garzik via bitcoin-dev wrote:
> Comments:
>
> 1) cblock seems a reasonable way to extend the protocol.  Further
> wrapping should probably be done at the stream level.
>
> 2) zlib has crappy security track record.
>
> 3) A fallback path to non-compressed is required, should compression
> fail or crash.
>
> 4) Most blocks and transactions have runs of zeroes and/or highly
> common bit-patterns, which contributes to useful compression even at
> smaller sizes.  Peter Ts's most recent numbers bear this out.  zlib
> has a dictionary (32K?) which works well with repeated patterns such
> as those you see with concatenated runs of transactions.
>
> 5) LZO should provide much better compression, at a cost of CPU
> performance and using a less-reviewed, less-field-tested library.
>
>
>
>
>
> On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org
> <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>
>
>
>     On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
>     <peter.tschipper@gmail•com <mailto:peter.tschipper@gmail•com>> wrote:
>
>         There are better ways of sending new blocks, that's certainly
>         true but for sending historical blocks and seding transactions
>         I don't think so.  This PR is really designed to save
>         bandwidth and not intended to be a huge performance
>         improvement in terms of time spent sending.
>
>
>     If the main point is for historical data, then sticking to just
>     blocks is the best plan.
>
>     Since small blocks don't compress well, you could define a
>     "cblocks" message that handles multiple blocks (just concatenate
>     the block messages as payload before compression). 
>
>     The sending peer could combine blocks so that each cblock is
>     compressing at least 10kB of block data (or whatever is optimal). 
>     It is probably worth specifying a maximum size for network buffer
>     reasons (either 1MB or 1 block maximum).
>
>     Similarly, transactions could be combined together and compressed
>     "ctxs".  The inv messages could be modified so that you can
>     request groups of 10-20 transactions.  That would depend on how
>     much of an improvement compressed transactions would represent.
>
>     More generally, you could define a message which is a compressed
>     message holder.  That is probably to complex to be worth the
>     effort though.
>
>      
>
>>
>>         On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via
>>         bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org
>>         <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>
>>             On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev
>>             <bitcoin-dev@lists•linuxfoundation.org
>>             <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>              
>>
>>                 I think 25% bandwidth savings is certainly
>>                 considerable, especially for people running full
>>                 nodes in countries like Australia where internet
>>                 bandwidth is lower and there are data caps.
>>
>>
>>             This reinforces the idea that such trade-off decisions
>>             should be be local and negotiated between peers, not a
>>             required feature of the network P2P.
>>              
>>
>>             -- 
>>             Johnathan Corgan
>>             Corgan Labs - SDR Training and Development Services
>>             http://corganlabs.com


[-- Attachment #2: Type: text/html, Size: 16946 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-11 18:35               ` Peter Tschipper
@ 2015-11-11 18:49                 ` Marco Pontello
  2015-11-11 19:05                   ` Jonathan Toomim
  2015-11-11 19:11                   ` [bitcoin-dev] request BIP number for: "Support for Datastream Compression" Peter Tschipper
  0 siblings, 2 replies; 21+ messages in thread
From: Marco Pontello @ 2015-11-11 18:49 UTC (permalink / raw)
  To: Peter Tschipper; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 7383 bytes --]

A random thought: isn't most communication over a data link already
compressed at some point?
When I used a modem, we had the V.42bis protocol.  Now, nearly all ADSL
connections using PPPoE surely are compressed too.  And so on.
I'm not sure another level of generic, data-agnostic compression
will really give us any real-life practical advantage over that.

Something that could take advantage of special knowledge of the specific
data, instead, would be an entirely different matter.

Just my 2c.

On Wed, Nov 11, 2015 at 7:35 PM, Peter Tschipper via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> Here are the latest results on compression ratios for the first 295,000
> blocks, compressionlevel=6.  I think there are more than enough datapoints
> for statistical significance.
>
> Results are very much similar to the previous test.   I'll work on getting
> a comparison between how much time savings/loss in time there is when
> syncing the blockchains: compressed vs uncompressed.  Still, I think it's
> clear that serving up compressed blocks, at least historical blocks, will
> be of benefit for those that have bandwidth caps on their internet
> connections.
>
> The proposal, so far is fairly simple:
> 1) compress blocks with some compression library: currently zlib but I can
> investigate other possiblities
> 2) As a fall back we need to advertise compression as a service.  That way
> we can turn off compression AND decompression completely if needed.
> 3) Do the compression at the datastream level in the code.  CDataStream is
> the obvious place.
>
>
> Test Results:
>
> range = block size range
> ubytes = average size of uncompressed blocks
> cbytes = average size of compressed blocks
> ctime = average time to compress
> dtime = average time to decompress
> cmp_ratio% = compression ratio
> datapoints = number of datapoints taken
>
> range        ubytes  cbytes  ctime  dtime  cmp_ratio%  datapoints
> 0-250b          215     189  0.001  0.000       12.40       91280
> 250-500b        438     404  0.001  0.000        7.85       13217
> 500-1KB         761     701  0.001  0.000        7.86       11434
> 1KB-10KB       4149    3547  0.001  0.000       14.51       52180
> 10KB-100KB    41934   32604  0.005  0.001       22.25       82890
> 100KB-200KB  146303  108080  0.016  0.001       26.13       29886
> 200KB-300KB  243299  179281  0.025  0.002       26.31       25066
> 300KB-400KB  344636  266177  0.036  0.003       22.77        4956
> 400KB-500KB  463201  356862  0.046  0.004       22.96        3167
> 500KB-600KB  545123  429854  0.056  0.005       21.15         366
> 600KB-700KB  647736  510931  0.065  0.006       21.12         254
> 700KB-800KB  746540  587287  0.073  0.008       21.33         294
> 800KB-900KB  868121  682650  0.087  0.008       21.36         199
> 900KB-1MB    945747  726307  0.091  0.010       23.20         304
>
> On 10/11/2015 8:46 AM, Jeff Garzik via bitcoin-dev wrote:
>
> Comments:
>
> 1) cblock seems a reasonable way to extend the protocol.  Further wrapping
> should probably be done at the stream level.
>
> 2) zlib has crappy security track record.
>
> 3) A fallback path to non-compressed is required, should compression fail
> or crash.
>
> 4) Most blocks and transactions have runs of zeroes and/or highly common
> bit-patterns, which contributes to useful compression even at smaller
> sizes.  Peter Ts's most recent numbers bear this out.  zlib has a
> dictionary (32K?) which works well with repeated patterns such as those you
> see with concatenated runs of transactions.
>
> 5) LZO should provide much better compression, at a cost of CPU
> performance and using a less-reviewed, less-field-tested library.
>
>
>
>
>
> On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev <
> <bitcoin-dev@lists•linuxfoundation.org>
> bitcoin-dev@lists•linuxfoundation.org> wrote:
>
>>
>>
>> On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper <
>> <peter.tschipper@gmail.com>peter.tschipper@gmail•com> wrote:
>>
>>> There are better ways of sending new blocks, that's certainly true but
>>> for sending historical blocks and seding transactions I don't think so.
>>> This PR is really designed to save bandwidth and not intended to be a huge
>>> performance improvement in terms of time spent sending.
>>>
>>
>> If the main point is for historical data, then sticking to just blocks is
>> the best plan.
>>
>> Since small blocks don't compress well, you could define a "cblocks"
>> message that handles multiple blocks (just concatenate the block messages
>> as payload before compression).
>>
>> The sending peer could combine blocks so that each cblock is compressing
>> at least 10kB of block data (or whatever is optimal).  It is probably worth
>> specifying a maximum size for network buffer reasons (either 1MB or 1 block
>> maximum).
>>
>> Similarly, transactions could be combined together and compressed
>> "ctxs".  The inv messages could be modified so that you can request groups
>> of 10-20 transactions.  That would depend on how much of an improvement
>> compressed transactions would represent.
>>
>> More generally, you could define a message which is a compressed message
>> holder.  That is probably to complex to be worth the effort though.
>>
>>
>>
>>>
>>> On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via bitcoin-dev <
>>> <bitcoin-dev@lists•linuxfoundation.org>
>>> bitcoin-dev@lists•linuxfoundation.org> wrote:
>>>
>>>> On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev <
>>>> <bitcoin-dev@lists•linuxfoundation.org>
>>>> bitcoin-dev@lists•linuxfoundation.org> wrote:
>>>>
>>>>
>>>>> I think 25% bandwidth savings is certainly considerable, especially
>>>>> for people running full nodes in countries like Australia where internet
>>>>> bandwidth is lower and there are data caps.
>>>>>
>>>>
>>>> This reinforces the idea that such trade-off decisions should be be
>>>> local and negotiated between peers, not a required feature of the network
>>>> P2P.
>>>>
>>>>
>>>> --
>>>> Johnathan Corgan
>>>> Corgan Labs - SDR Training and Development Services
>>>> <http://corganlabs.com>http://corganlabs.com


-- 
Try the Online TrID File Identifier
http://mark0.net/onlinetrid.aspx

[-- Attachment #2: Type: text/html, Size: 17296 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-11 18:49                 ` Marco Pontello
@ 2015-11-11 19:05                   ` Jonathan Toomim
  2015-11-13 21:58                     ` [bitcoin-dev] Block Compression (Datastream Compression) test results using the PR#6973 compression prototype Peter Tschipper
  2015-11-11 19:11                   ` [bitcoin-dev] request BIP number for: "Support for Datastream Compression" Peter Tschipper
  1 sibling, 1 reply; 21+ messages in thread
From: Jonathan Toomim @ 2015-11-11 19:05 UTC (permalink / raw)
  To: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 870 bytes --]

Data compression adds latency and reduces predictability, so engineers have decided to leave compression to application layers instead of transport layer or lower in order to let the application designer decide what tradeoffs to make.

On Nov 11, 2015, at 10:49 AM, Marco Pontello via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> wrote:

> A random thought: aren't most communication over a data link already compressed, at some point?
> When I used a modem, we had the V.42bis protocol. Now, nearly all ADSL connections using PPPoE, surely are. And so on.
> I'm not sure another level of generic, data agnostic kind of compression will really give us some real-life practical advantage over that.
> 
> Something that could take advantage of of special knowledge of the specific data, instead, would be an entirely different matter.
> 
> Just my 2c.


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 496 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"
  2015-11-11 18:49                 ` Marco Pontello
  2015-11-11 19:05                   ` Jonathan Toomim
@ 2015-11-11 19:11                   ` Peter Tschipper
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Tschipper @ 2015-11-11 19:11 UTC (permalink / raw)
  To: Marco Pontello; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 9113 bytes --]

If that were true then we wouldn't need to gzip large files before
sending them over the internet.  Data compression generally helps
transmission speed as long as the amount of compression is high enough
and the time it takes is low enough to make it worthwhile.  On a
corporate LAN it's generally not worthwhile unless you're dealing with
very large files, but over a corporate WAN or the internet where network
latency can be high it is IMO a worthwhile endeavor.
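
To make the "high enough compression, low enough time" condition
concrete, here is a deliberately simple model in Python.  It assumes the
sender compresses, sends, and the receiver decompresses strictly one
after the other, and it ignores round-trip latency and protocol overhead
entirely; the function names are invented for the example.  The sample
figures are the 900KB-1MB row of the test results.

```python
def transfer_time(size_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Seconds to push size_bytes through a link, ignoring latency."""
    return size_bytes / bandwidth_bytes_per_sec


def compressed_transfer_time(cbytes: float, ctime: float, dtime: float,
                             bandwidth_bytes_per_sec: float) -> float:
    """Compress, send the smaller payload, then decompress (no overlap)."""
    return ctime + transfer_time(cbytes, bandwidth_bytes_per_sec) + dtime


def break_even_bandwidth(ubytes: float, cbytes: float,
                         ctime: float, dtime: float) -> float:
    """Bandwidth (bytes/sec) below which compression is the faster path.

    Solves ubytes / B = ctime + cbytes / B + dtime for B.
    """
    return (ubytes - cbytes) / (ctime + dtime)


if __name__ == "__main__":
    # 900KB-1MB row: ubytes, cbytes, ctime, dtime
    b = break_even_bandwidth(945747, 726307, 0.091, 0.010)
    print(f"break-even at about {b * 8 / 1e6:.1f} Mbit/s")
```

Under these assumptions compression only speeds up the transfer on links
slower than the break-even figure; on faster links the benefit is the
bandwidth saved rather than the time.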



On 11/11/2015 10:49 AM, Marco Pontello wrote:
> A random thought: aren't most communication over a data link already
> compressed, at some point?
> When I used a modem, we had the V.42bis protocol. Now, nearly all ADSL
> connections using PPPoE, surely are. And so on.
> I'm not sure another level of generic, data agnostic kind of
> compression will really give us some real-life practical advantage
> over that.
>
> Something that could take advantage of of special knowledge of the
> specific data, instead, would be an entirely different matter.
>
> Just my 2c.
>
> On Wed, Nov 11, 2015 at 7:35 PM, Peter Tschipper via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org
> <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>
>     Here are the latest results on compression ratios for the first
>     295,000 blocks, compressionlevel=6.  I think there are more than
>     enough datapoints for statistical significance. 
>
>     Results are very much similar to the previous test.   I'll work on
>     getting a comparison between how much time savings/loss in time
>     there is when syncing the blockchains: compressed vs
>     uncompressed.  Still, I think it's clear that serving up
>     compressed blocks, at least historical blocks, will be of benefit
>     for those that have bandwidth caps on their internet connections.
>
>     The proposal, so far is fairly simple:
>     1) compress blocks with some compression library: currently zlib
>     but I can investigate other possiblities
>     2) As a fall back we need to advertise compression as a service. 
>     That way we can turn off compression AND decompression completely
>     if needed.
>     3) Do the compression at the datastream level in the code. 
>     CDataStream is the obvious place.
>
>
>     Test Results:
>
>     range = block size range
>     ubytes = average size of uncompressed blocks
>     cbytes = average size of compressed blocks
>     ctime = average time to compress
>     dtime = average time to decompress
>     cmp_ratio% = compression ratio
>     datapoints = number of datapoints taken
>
>     range        ubytes  cbytes  ctime  dtime  cmp_ratio%  datapoints
>     0-250b          215     189  0.001  0.000       12.40       91280
>     250-500b        438     404  0.001  0.000        7.85       13217
>     500-1KB         761     701  0.001  0.000        7.86       11434
>     1KB-10KB       4149    3547  0.001  0.000       14.51       52180
>     10KB-100KB    41934   32604  0.005  0.001       22.25       82890
>     100KB-200KB  146303  108080  0.016  0.001       26.13       29886
>     200KB-300KB  243299  179281  0.025  0.002       26.31       25066
>     300KB-400KB  344636  266177  0.036  0.003       22.77        4956
>     400KB-500KB  463201  356862  0.046  0.004       22.96        3167
>     500KB-600KB  545123  429854  0.056  0.005       21.15         366
>     600KB-700KB  647736  510931  0.065  0.006       21.12         254
>     700KB-800KB  746540  587287  0.073  0.008       21.33         294
>     800KB-900KB  868121  682650  0.087  0.008       21.36         199
>     900KB-1MB    945747  726307  0.091  0.010       23.20         304
>
>     On 10/11/2015 8:46 AM, Jeff Garzik via bitcoin-dev wrote:
>>     Comments:
>>
>>     1) cblock seems a reasonable way to extend the protocol.  Further
>>     wrapping should probably be done at the stream level.
>>
>>     2) zlib has crappy security track record.
>>
>>     3) A fallback path to non-compressed is required, should
>>     compression fail or crash.
>>
>>     4) Most blocks and transactions have runs of zeroes and/or highly
>>     common bit-patterns, which contributes to useful compression even
>>     at smaller sizes.  Peter Ts's most recent numbers bear this out.
>>      zlib has a dictionary (32K?) which works well with repeated
>>     patterns such as those you see with concatenated runs of
>>     transactions.
>>
>>     5) LZO should provide much better compression, at a cost of CPU
>>     performance and using a less-reviewed, less-field-tested library.
>>
>>
>>
>>
>>
>>     On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev
>>     <bitcoin-dev@lists•linuxfoundation.org
>>     <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>
>>
>>
>>         On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
>>         <peter.tschipper@gmail•com
>>         <mailto:peter.tschipper@gmail•com>> wrote:
>>
>>             There are better ways of sending new blocks, that's
>>             certainly true but for sending historical blocks and
>>             seding transactions I don't think so.  This PR is really
>>             designed to save bandwidth and not intended to be a huge
>>             performance improvement in terms of time spent sending.
>>
>>
>>         If the main point is for historical data, then sticking to
>>         just blocks is the best plan.
>>
>>         Since small blocks don't compress well, you could define a
>>         "cblocks" message that handles multiple blocks (just
>>         concatenate the block messages as payload before compression). 
>>
>>         The sending peer could combine blocks so that each cblock is
>>         compressing at least 10kB of block data (or whatever is
>>         optimal).  It is probably worth specifying a maximum size for
>>         network buffer reasons (either 1MB or 1 block maximum).
>>
>>         Similarly, transactions could be combined together and
>>         compressed "ctxs".  The inv messages could be modified so
>>         that you can request groups of 10-20 transactions.  That
>>         would depend on how much of an improvement compressed
>>         transactions would represent.
>>
>>         More generally, you could define a message which is a
>>         compressed message holder.  That is probably to complex to be
>>         worth the effort though.
>>
>>          
>>
>>>
>>>             On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via
>>>             bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org
>>>             <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>>
>>>                 On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via
>>>                 bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org
>>>                 <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>>                  
>>>
>>>                     I think 25% bandwidth savings is certainly
>>>                     considerable, especially for people running full
>>>                     nodes in countries like Australia where internet
>>>                     bandwidth is lower and there are data caps.
>>>
>>>
>>>                 This reinforces the idea that such trade-off
>>>                 decisions should be be local and negotiated between
>>>                 peers, not a required feature of the network P2P.
>>>                  
>>>
>>>                 -- 
>>>                 Johnathan Corgan
>>>                 Corgan Labs - SDR Training and Development Services
>>>                 http://corganlabs.com
>
> -- 
> Try the Online TrID File Identifier
> http://mark0.net/onlinetrid.aspx


[-- Attachment #2: Type: text/html, Size: 25638 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [bitcoin-dev] Block Compression (Datastream Compression) test results using the PR#6973 compression prototype
  2015-11-11 19:05                   ` Jonathan Toomim
@ 2015-11-13 21:58                     ` Peter Tschipper
  2015-11-18 14:00                       ` [bitcoin-dev] More findings: " Peter Tschipper
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Tschipper @ 2015-11-13 21:58 UTC (permalink / raw)
  To: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 1931 bytes --]

Some further Block Compression test results that compare performance
when network latency is added to the mix.

Running two nodes, Windows 7, compressionlevel=6, syncing the first
200000 blocks from one node to another.  Running on a high-speed wireless
LAN with no connections to the outside world.
Network latency was added by using Netbalancer to induce the 30ms and
60ms latencies.

From the data, not only are bandwidth savings seen but also a small
performance savings.  However, the overall value in compressing blocks
appears to be in terms of saving bandwidth.

I was also surprised to see that there was no real difference in
performance when no latency was present; apparently the time it takes to
compress is about equal to the performance savings in such a situation.


The following results compare the tests in terms of how long it takes to
sync the blockchain, compressed vs uncompressed and with varying latencies.
uncmp = uncompressed
cmp = compressed

num blocks sync'd 	uncmp (secs) 	cmp (secs) 	uncmp 30ms (secs) 	cmp 30ms (secs) 	uncmp 60ms (secs) 	cmp 60ms (secs)
10000 	264 	269 	265 	257 	274 	275
20000 	482 	492 	479 	467 	499 	497
30000 	703 	717 	693 	676 	724 	724
40000 	918 	939 	902 	886 	947 	944
50000 	1140 	1157 	1114 	1094 	1171 	1167
60000 	1362 	1380 	1329 	1310 	1400 	1395
70000 	1583 	1597 	1547 	1526 	1637 	1627
80000 	1810 	1817 	1767 	1745 	1872 	1862
90000 	2031 	2036 	1985 	1958 	2109 	2098
100000 	2257 	2260 	2223 	2184 	2385 	2355
110000 	2553 	2486 	2478 	2422 	2755 	2696
120000 	2800 	2724 	2849 	2771 	3345 	3254
130000 	3078 	2994 	3356 	3257 	4125 	4006
140000 	3442 	3365 	3979 	3870 	5032 	4904
150000 	3803 	3729 	4586 	4464 	5928 	5797
160000 	4148 	4075 	5168 	5034 	6801 	6661
170000 	4509 	4479 	5768 	5619 	7711 	7557
180000 	4947 	4924 	6389 	6227 	8653 	8479
190000 	5858 	5855 	7302 	7107 	9768 	9566
200000 	6980 	6969 	8469 	8220 	10944 	10724
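
To make the size of the timing difference concrete, the following is a
small sketch (not part of PR #6973) that takes the final row of the table
above and expresses the time saved by compression as a percentage of the
uncompressed sync time.  The names SYNC_TIMES, time_saved_percent and
summarize are made up for this illustration.

```python
# Final-row sync times in seconds at 200000 blocks, copied from the table:
# (uncompressed, compressed) for each latency setting.
SYNC_TIMES = {
    "no added latency": (6980, 6969),
    "30ms": (8469, 8220),
    "60ms": (10944, 10724),
}


def time_saved_percent(uncompressed_secs, compressed_secs):
    """Percentage of sync time saved by compression (positive means faster)."""
    return 100.0 * (uncompressed_secs - compressed_secs) / uncompressed_secs


def summarize(times=SYNC_TIMES):
    """Map each latency setting to its rounded percentage of time saved."""
    return {
        label: round(time_saved_percent(u, c), 2)
        for label, (u, c) in times.items()
    }


if __name__ == "__main__":
    for label, pct in summarize().items():
        print(f"{label}: {pct}% less sync time with compression")
```

On these figures the saving is a fraction of a percent with no added
latency and roughly 2 to 3 percent with latency added, which is consistent
with the value of compression being mainly in bandwidth rather than time.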



[-- Attachment #2: Type: text/html, Size: 10768 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [bitcoin-dev] More findings: Block Compression (Datastream Compression) test results using the PR#6973 compression prototype
  2015-11-13 21:58                     ` [bitcoin-dev] Block Compression (Datastream Compression) test results using the PR#6973 compression prototype Peter Tschipper
@ 2015-11-18 14:00                       ` Peter Tschipper
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Tschipper @ 2015-11-18 14:00 UTC (permalink / raw)
  To: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 5198 bytes --]

Hi all,

I'm still doing a little more investigation before opening up a formal
BIP PR, but getting close.  Here are some more findings.

After moving the compression from main.cpp to streams.h (CDataStream) it
was a simple matter to add compression to transactions as well. Results
as follows:

range = transaction size range
ubytes = average size of uncompressed transactions
cbytes = average size of compressed transactions
cmp_ratio% = compression ratio
datapoints = number of datapoints taken

range 	ubytes 	cbytes 	cmp_ratio% 	datapoints
0-250b 	220 	227 	-3.16 	23780
250-500b 	356 	354 	0.68 	20882
500-600 	534 	505 	5.29 	2772
600-700 	653 	608 	6.95 	1853
700-800 	757 	649 	14.22 	578
800-900  	822 	758 	7.77 	661
900-1KB 	954 	862 	9.69 	906
1KB-10KB  	2698 	2222 	17.64 	3370
10KB-100KB 	15463 	12092 	21.8 	15429


A couple of obvious observations.  Transactions don't compress well
below 500 bytes but do very well beyond 1KB, where there are a great
many of those large spam-type transactions.  However, most transactions
happen to be in the < 500 byte range.  So the next step was to apply
bundling, that is, creating a "blob" out of those smaller transactions,
if and only if there are multiple tx's in the getdata receive queue for
a peer.  Doing that yields some very good compression ratios.  Some
examples follow:

The best one I've seen so far was the following, where 175 transactions
were bundled into one blob before being compressed.  That yielded a 20%
compression ratio, but that doesn't take into account the savings from
the 174 message headers (24 bytes each) that are no longer needed, as
well as 174 TCP ACKs of 52 bytes each, which yields an additional
76*174=13224 bytes, making the overall bandwidth savings 32% in this
particular case.

*2015-11-18 01:09:09.002061 compressed blob from 79890 to 67426 txcount:175*

To be sure, this was an extreme example.  Most transaction blobs were in
the 2 to 10 transaction range, such as the following:

*2015-11-17 21:08:28.469313 compressed blob from 3199 to 2876 txcount:10*

But even here the savings are 10%, far better than the "nothing" we
would get without bundling.  Add to that the savings of 76 bytes * 9
transactions and we have a total 20% savings in bandwidth for
transactions that otherwise would not be compressible.
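
The overhead arithmetic in the two examples above can be written out as a
short sketch.  The 24-byte message header and 52-byte TCP ACK are the
figures quoted in the text; the function name bundle_savings and the
choice of the uncompressed blob size as the baseline are assumptions made
for this illustration.  The resulting percentage depends on that baseline,
so it need not match the rounded figures quoted above.

```python
MESSAGE_HEADER_BYTES = 24  # per-message header size quoted in the text
TCP_ACK_BYTES = 52         # per-ACK size quoted in the text


def bundle_savings(uncompressed, compressed, txcount):
    """Return (payload_bytes_saved, overhead_bytes_saved, fraction_saved).

    fraction_saved uses the uncompressed blob size as the baseline; a
    different baseline (for example one that also counts the avoided
    overhead) gives a different percentage.
    """
    payload_saved = uncompressed - compressed
    # Bundling N transactions into one message avoids N-1 headers and ACKs.
    overhead_saved = (MESSAGE_HEADER_BYTES + TCP_ACK_BYTES) * (txcount - 1)
    fraction_saved = (payload_saved + overhead_saved) / uncompressed
    return payload_saved, overhead_saved, fraction_saved


if __name__ == "__main__":
    # Sizes and tx counts taken from the two log lines above.
    for blob in [(79890, 67426, 175), (3199, 2876, 10)]:
        payload, overhead, fraction = bundle_savings(*blob)
        print(blob, payload, overhead, f"{100 * fraction:.1f}%")
```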

The same bundling was applied to blocks and very good compression ratios
are seen when sync'ing the blockchain.

Overall the bundling or blobbing of tx's and blocks seems to be a good
idea for improving bandwidth use.  There is also a scalability factor
here: when the system is busy, transactions are bundled more often,
compressed, and sent faster, keeping message queue and network chatter
to a minimum.
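
Why bundling helps can be illustrated with a minimal sketch using
Python's standard zlib module.  This is not the code from PR #6973: the
fake_tx layout below is a hypothetical, transaction-like byte string in
which fixed fields repeat across transactions and hash-derived bytes stand
in for txids, signatures and public-key hashes.  Compressing each small
message on its own pays the zlib framing cost every time and finds little
to match against, while one concatenated blob pays that cost once and can
reuse the repeated fields.

```python
import hashlib
import zlib


def fake_tx(i):
    """Build a synthetic, transaction-like byte string (hypothetical layout)."""

    def rnd(tag, n):
        # Deterministic pseudo-random bytes derived from SHA-256.
        out = b""
        counter = 0
        while len(out) < n:
            out += hashlib.sha256(f"{tag}-{i}-{counter}".encode()).digest()
            counter += 1
        return out[:n]

    return (
        b"\x01\x00\x00\x00\x01" + rnd("prev", 32) + b"\x00\x00\x00\x00"
        + b"\x6b\x48\x30\x45\x02\x21\x00" + rnd("sig", 100)
        + b"\xff\xff\xff\xff\x01" + (50000).to_bytes(8, "little")
        + b"\x19\x76\xa9\x14" + rnd("pkh", 20) + b"\x88\xac"
        + b"\x00\x00\x00\x00"
    )


def compare(txcount=10, level=6):
    """Return (raw_bytes, sum_of_individually_compressed, bundled_compressed)."""
    txs = [fake_tx(i) for i in range(txcount)]
    individual = sum(len(zlib.compress(tx, level)) for tx in txs)
    blob = b"".join(txs)
    bundled = len(zlib.compress(blob, level))
    return len(blob), individual, bundled


if __name__ == "__main__":
    raw, individual, bundled = compare()
    print(f"raw={raw} individually compressed={individual} bundled={bundled}")
```

Real transactions will behave differently from this synthetic data, so the
sizes printed here say nothing about the ratios measured above; the sketch
only shows the direction of the effect.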

I think I have enough information to put together a formal BIP with the
exception of which compression library to implement.  These tests were
done using ZLib but I'll also be running tests in the coming days with
LZO (Jeff Garzik's suggestion) and perhaps Snappy.  If there are any
other libraries that people would like me to get results for please let
me know and I'll pick maybe the top 2 or 3 and get results back to the
group.





[-- Attachment #2: Type: text/html, Size: 18590 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [bitcoin-dev] further test results for : "Datastream Compression of Blocks and Tx's"
  2015-11-10 16:46             ` Jeff Garzik
  2015-11-10 17:09               ` Peter Tschipper
  2015-11-11 18:35               ` Peter Tschipper
@ 2015-11-28 14:48               ` Peter Tschipper
  2015-11-29  0:30                 ` Jonathan Toomim
  2 siblings, 1 reply; 21+ messages in thread
From: Peter Tschipper @ 2015-11-28 14:48 UTC (permalink / raw)
  To: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 8938 bytes --]

Hi All,

Here are some final results of testing with the reference implementation
for compressing blocks and transactions. This implementation also
concatenates blocks and transactions when possible so you'll see data
sizes in the 1-2MB ranges.

Results below show the time it takes to sync the first part of the
blockchain, comparing Zlib to the LZOx library.  (LZOf was also tried
but wasn't found to be as good as LZOx).  The following shows tests run
with and without latency.  With latency on the network, all compression
libraries performed much better than without compression.

I don't think it's entirely obvious which is better, Zlib or LZO.
Although I prefer the higher compression of Zlib, overall I would have
to give the edge to LZO.  At its lowest compression setting, LZO is the
fastest and most scalable option, which will be a boost in performance
for users that want performance over compression.  At the high end, LZO
provides decent compression that approaches Zlib's (although at a higher
cost), which is good for those that want to save more bandwidth.

Uncompressed 60ms 	Zlib-1 (60ms) 	Zlib-6 (60ms) 	LZOx-1 (60ms) 	LZOx-999 (60ms)
219 	299 	296 	294 	291
432 	568 	565 	558 	548
652 	835 	836 	819 	811
866 	1106 	1107 	1081 	1071
1082 	1372 	1381 	1341 	1333
1309 	1644 	1654 	1605 	1600
1535 	1917 	1936 	1873 	1875
1762 	2191 	2210 	2141 	2141
1992 	2463 	2486 	2411 	2411
2257 	2748 	2780 	2694 	2697
2627 	3034 	3076 	2970 	2983
3226 	3416 	3397 	3266 	3302
4010 	3983 	3773 	3625 	3703
4914 	4503 	4292 	4127 	4287
5806 	4928 	4719 	4529 	4821
6674 	5249 	5164 	4840 	5314
7563 	5603 	5669 	5289 	6002
8477 	6054 	6268 	5858 	6638
9843 	7085 	7278 	6868 	7679
11338 	8215 	8433 	8044 	8795



These results are from testing on a high-speed wireless LAN (very small latency).

Results in seconds:
Num blocks sync'd 	Uncompressed 	Zlib-1 	Zlib-6 	LZOx-1 	LZOx-999
10000 	255 	232 	233 	231 	257
20000 	464 	414 	420 	407 	453
30000 	677 	594 	611 	585 	650
40000 	887 	782 	795 	760 	849
50000 	1099 	961 	977 	933 	1048
60000 	1310 	1145 	1167 	1110 	1259
70000 	1512 	1330 	1362 	1291 	1470
80000 	1714 	1519 	1552 	1469 	1679
90000 	1917 	1707 	1747 	1650 	1882
100000 	2122 	1905 	1950 	1843 	2111
110000 	2333 	2107 	2151 	2038 	2329
120000 	2560 	2333 	2376 	2256 	2580
130000 	2835 	2656 	2679 	2558 	2921
140000 	3274 	3259 	3161 	3051 	3466
150000 	3662 	3793 	3547 	3440 	3919
160000 	4040 	4172 	3937 	3767 	4416
170000 	4425 	4625 	4379 	4215 	4958
180000 	4860 	5149 	4895 	4781 	5560
190000 	5855 	6160 	5898 	5805 	6557
200000 	7004 	7234 	7051 	6983 	7770



The following shows the compression ratio achieved for various sizes of
data.  Zlib is the clear winner for compressibility, with LZOx-999
coming close but at a cost.

range 	Zlib-1 cmp% 	Zlib-6 cmp% 	LZOx-1 cmp% 	LZOx-999 cmp%
0-250b 	12.44 	12.86 	10.79 	14.34
250-500b  	19.33 	12.97 	10.34 	11.11
600-700 	16.72 	n/a 	12.91 	17.25
700-800 	6.37 	7.65 	4.83 	8.07
900-1KB 	6.54 	6.95 	5.64 	7.9
1KB-10KB 	25.08 	25.65 	21.21 	22.65
10KB-100KB 	19.77 	21.57 	14.37 	19.02
100KB-200KB 	21.49 	23.56 	15.37 	21.55
200KB-300KB 	23.66 	24.18 	16.91 	22.76
300KB-400KB 	23.4 	23.7 	16.5 	21.38
400KB-500KB 	24.6 	24.85 	17.56 	22.43
500KB-600KB 	25.51 	26.55 	18.51 	23.4
600KB-700KB 	27.25 	28.41 	19.91 	25.46
700KB-800KB 	27.58 	29.18 	20.26 	27.17
800KB-900KB 	27 	29.11 	20 	27.4
900KB-1MB 	28.19 	29.38 	21.15 	26.43
1MB -2MB 	27.41 	29.46 	21.33 	27.73


The following shows the time in seconds to compress data of various
sizes.  LZO1x is the fastest, and as file sizes increase, LZO1x time
hardly increases at all.  It's interesting to note that as compression
ratios increase, LZOx-999 performs much worse than Zlib.  So LZO is
faster on the low end and slower (5 to 6 times slower) on the high end.

range 	Zlib-1 	Zlib-6 	LZOx-1 	LZOx-999
0-250b    	0.001 	0 	0 	0
250-500b   	0 	0 	0 	0.001
500-1KB     	0 	0 	0 	0.001
1KB-10KB    	0.001 	0.001 	0 	0.002
10KB-100KB   	0.004 	0.006 	0.001 	0.017
100KB-200KB  	0.012 	0.017 	0.002 	0.054
200KB-300KB  	0.018 	0.024 	0.003 	0.087
300KB-400KB  	0.022 	0.03 	0.003 	0.121
400KB-500KB  	0.027 	0.037 	0.004 	0.151
500KB-600KB  	0.031 	0.044 	0.004 	0.184
600KB-700KB  	0.035 	0.051 	0.006 	0.211
700KB-800KB  	0.039 	0.057 	0.006 	0.243
800KB-900KB  	0.045 	0.064 	0.006 	0.27
900KB-1MB   	0.049 	0.072 	0.006 	0.307
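
The relative speeds can be read off the last row of the table with a
short sketch.  The times are copied from the 900KB-1MB row above; the
names COMPRESS_SECS and slowdown are made up for this illustration, and
which Zlib level the "5 to 6 times slower" figure refers to is not stated
in the text.

```python
# Seconds to compress data in the 900KB-1MB range, from the last table row.
COMPRESS_SECS = {
    "Zlib-1": 0.049,
    "Zlib-6": 0.072,
    "LZOx-1": 0.006,
    "LZOx-999": 0.307,
}


def slowdown(slower, faster, times=COMPRESS_SECS):
    """How many times longer `slower` takes than `faster`."""
    return times[slower] / times[faster]


if __name__ == "__main__":
    print(f"LZOx-999 vs Zlib-1: {slowdown('LZOx-999', 'Zlib-1'):.1f}x")
    print(f"LZOx-999 vs Zlib-6: {slowdown('LZOx-999', 'Zlib-6'):.1f}x")
    print(f"Zlib-1 vs LZOx-1:   {slowdown('Zlib-1', 'LZOx-1'):.1f}x")
```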


On 10/11/2015 8:46 AM, Jeff Garzik via bitcoin-dev wrote:
> Comments:
>
> 1) cblock seems a reasonable way to extend the protocol.  Further
> wrapping should probably be done at the stream level.
>
> 2) zlib has crappy security track record.
>
> 3) A fallback path to non-compressed is required, should compression
> fail or crash.
>
> 4) Most blocks and transactions have runs of zeroes and/or highly
> common bit-patterns, which contributes to useful compression even at
> smaller sizes.  Peter Ts's most recent numbers bear this out.  zlib
> has a dictionary (32K?) which works well with repeated patterns such
> as those you see with concatenated runs of transactions.
>
> 5) LZO should provide much better compression, at a cost of CPU
> performance and using a less-reviewed, less-field-tested library.
>
>
>
>
>
> On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org
> <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>
>
>
>     On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
>     <peter.tschipper@gmail•com <mailto:peter.tschipper@gmail•com>> wrote:
>
>         There are better ways of sending new blocks, that's certainly
>         true but for sending historical blocks and sending transactions
>         I don't think so.  This PR is really designed to save
>         bandwidth and not intended to be a huge performance
>         improvement in terms of time spent sending.
>
>
>     If the main point is for historical data, then sticking to just
>     blocks is the best plan.
>
>     Since small blocks don't compress well, you could define a
>     "cblocks" message that handles multiple blocks (just concatenate
>     the block messages as payload before compression). 
>
>     The sending peer could combine blocks so that each cblock is
>     compressing at least 10kB of block data (or whatever is optimal). 
>     It is probably worth specifying a maximum size for network buffer
>     reasons (either 1MB or 1 block maximum).
>
>     Similarly, transactions could be combined together and compressed
>     "ctxs".  The inv messages could be modified so that you can
>     request groups of 10-20 transactions.  That would depend on how
>     much of an improvement compressed transactions would represent.
>
>     More generally, you could define a message which is a compressed
>     message holder.  That is probably too complex to be worth the
>     effort though.
>
>      
>
>>
>>         On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via
>>         bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org
>>         <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>
>>             On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev
>>             <bitcoin-dev@lists•linuxfoundation.org
>>             <mailto:bitcoin-dev@lists•linuxfoundation.org>> wrote:
>>              
>>
>>                 I think 25% bandwidth savings is certainly
>>                 considerable, especially for people running full
>>                 nodes in countries like Australia where internet
>>                 bandwidth is lower and there are data caps.
>>
>>
>>             This reinforces the idea that such trade-off decisions
>>             should be local and negotiated between peers, not a
>>             required feature of the network P2P.
>>              
>>
>>             -- 
>>             Johnathan Corgan
>>             Corgan Labs - SDR Training and Development Services
>>             http://corganlabs.com
>>


[-- Attachment #2: Type: text/html, Size: 47442 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [bitcoin-dev] further test results for : "Datastream Compression of Blocks and Tx's"
  2015-11-28 14:48               ` [bitcoin-dev] further test results for : "Datastream Compression of Blocks and Tx's" Peter Tschipper
@ 2015-11-29  0:30                 ` Jonathan Toomim
  2015-11-29  5:15                   ` Peter Tschipper
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Toomim @ 2015-11-29  0:30 UTC (permalink / raw)
  To: Peter Tschipper; +Cc: bitcoin-dev


[-- Attachment #1.1: Type: text/plain, Size: 758 bytes --]

It appears you're using the term "compression ratio" to mean "size reduction". A compression ratio is the ratio (compressed / uncompressed). A 1 kB file compressed with a 10% compression ratio would be 0.1 kB. It seems you're using (1 - compressed/uncompressed), meaning that the compressed file would be 0.9 kB.
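
The two quantities being distinguished here can be sketched as follows,
using the definitions given in this message (compressed / uncompressed
for the ratio, and 1 - compressed/uncompressed for the size reduction).
The function names are chosen for this illustration only.

```python
def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Ratio as defined above: compressed size divided by uncompressed size."""
    return compressed_bytes / uncompressed_bytes


def size_reduction(uncompressed_bytes, compressed_bytes):
    """Fraction of the original size removed: 1 - compressed/uncompressed."""
    return (uncompressed_bytes - compressed_bytes) / uncompressed_bytes


if __name__ == "__main__":
    # A 1000-byte file that compresses to 900 bytes.
    print(compression_ratio(1000, 900))  # 0.9
    print(size_reduction(1000, 900))     # 0.1
```

Under these definitions the "cmp%" columns in the tables are size
reductions, so a value of 12.44 means the compressed data is about 87.56%
of its original size.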

On Nov 28, 2015, at 6:48 AM, Peter Tschipper via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> wrote:

> The following shows the compression ratio achieved for various sizes of data.  Zlib is the clear
> winner for compressibility, with LZOx-999 coming close but at a cost.
> 
> range	Zlib-1 cmp%	Zlib-6 cmp%	LZOx-1 cmp%	LZOx-999 cmp%
> 0-250b	12.44	12.86	10.79	14.34
> 250-500b 	19.33	12.97	10.34	11.11


[-- Attachment #1.2: Type: text/html, Size: 3157 bytes --]

[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 496 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [bitcoin-dev] further test results for : "Datastream Compression of Blocks and Tx's"
  2015-11-29  0:30                 ` Jonathan Toomim
@ 2015-11-29  5:15                   ` Peter Tschipper
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Tschipper @ 2015-11-29  5:15 UTC (permalink / raw)
  To: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 959 bytes --]

Yes, you're right, it's just the percentage compressed (size reduction).



[-- Attachment #2: Type: text/html, Size: 4850 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-11-29  5:15 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-09 19:18 [bitcoin-dev] request BIP number for: "Support for Datastream Compression" Peter Tschipper
2015-11-09 20:41 ` Johnathan Corgan
2015-11-09 21:04 ` Bob McElrath
2015-11-10  1:58   ` gladoscc
2015-11-10  5:40     ` Johnathan Corgan
2015-11-10  9:44       ` Tier Nolan
     [not found]         ` <5642172C.701@gmail.com>
2015-11-10 16:17           ` Peter Tschipper
2015-11-10 16:21             ` Jonathan Toomim
2015-11-10 16:30           ` Tier Nolan
2015-11-10 16:46             ` Jeff Garzik
2015-11-10 17:09               ` Peter Tschipper
2015-11-11 18:35               ` Peter Tschipper
2015-11-11 18:49                 ` Marco Pontello
2015-11-11 19:05                   ` Jonathan Toomim
2015-11-13 21:58                     ` [bitcoin-dev] Block Compression (Datastream Compression) test results using the PR#6973 compression prototype Peter Tschipper
2015-11-18 14:00                       ` [bitcoin-dev] More findings: " Peter Tschipper
2015-11-11 19:11                   ` [bitcoin-dev] request BIP number for: "Support for Datastream Compression" Peter Tschipper
2015-11-28 14:48               ` [bitcoin-dev] further test results for : "Datastream Compression of Blocks and Tx's" Peter Tschipper
2015-11-29  0:30                 ` Jonathan Toomim
2015-11-29  5:15                   ` Peter Tschipper
     [not found]             ` <56421F1E.4050302@gmail.com>
2015-11-10 16:46               ` [bitcoin-dev] request BIP number for: "Support for Datastream Compression" Peter Tschipper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox