public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
From: Peter R <peter_r@gmx•com>
To: Pieter Wuille <pieter.wuille@gmail•com>,
	Bitcoin Protocol Discussion
	<bitcoin-dev@lists•linuxfoundation.org>
Subject: Re: [bitcoin-dev] Compact Block Relay BIP
Date: Mon, 9 May 2016 11:34:47 -0700	[thread overview]
Message-ID: <D0F57BE3-EFD2-4557-8D05-704D9C4E5EA1@gmx.com> (raw)
In-Reply-To: <5730C37E.2000004@gmail.com>

Hi Pieter,

> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).
> 
> For any reasonable numbers I can come up with (in a very wide range),
> the number of bits needed is very well approximated by:
> 
>  log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
> acceptable_per_block_failure_rate)
> 
> For example, with 20000 mempool transactions, 2500 transactions in a
> block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
> reconstruct, needed_bits = log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
> 34.54, or 5 byte txids would suffice.
> 
> Note that 1 in 10000 failures may sound like a lot, but this is for each
> individual connection, and since every transmission uses separately
> salted identifiers, occasional failures should not affect global
> propagation. Given that transmission failures due to timeouts, network
> connectivity, ... already occur much more frequently than once every few
> gigabytes (what 10000 blocks corresponds to), that's probably already
> more than enough.
> 
> In short: I believe 5 or 6 byte txids should be enough, but perhaps it
> makes sense to allow the sender to choose (so he can weigh trying
> multiple nonces against increasing the short txid length).

[9 May 16 @ 11am PDT]  

We worked on this with respect to “Xthin" for Bitcoin Unlimited, and came to a similar conclusion.  

But we (I think it was theZerg) also noticed another trick: if the node receiving the thin blocks has a small number of collisions with transactions in its mempool (e.g., 1 or 2), then it can test each possible block against the Merkle root in the block header to determine the correct one.  Using this technique, it should be possible to further reduce the number of bytes used for the txids.  That being said, even thin blocks built from 64-bit short IDs represent a tremendous savings compared to standard block propagation.  So we (Bitcoin Unlimited) decided not to pursue this optimization any further at that time.

***

It’s also interesting to ask what the information-theoretic minimum amount of information necessary for a node to re-construct a block is. The way I’m thinking about this currently[1] is that the node needs all of the transactions in the block that were not initially part of its mempool, plus enough information to select and ordered subset from that mempool that represents the block.  If m is the number of transactions in mempool and n is the number of transactions in the block, then the number of possible subsets (C') is given by the binomial coefficient:

  C' =  m! / [n! (m - n)!]

Since there are n! possible orderings for each subset, the total number of possible blocks (C) of size n from a mempool of size m is

  C = n! C’ = m! / (m-n)!

Assuming that all possible blocks are equally likely, the Shannon entropy (the information that must be communicated) is the base-2 logarithm of the number of possible blocks.  After making some approximations, this works out very close to

   minimum information ~= n * log2(m),

which for your case of 20,000 transactions in mempool (m = 20,000) and a 2500-transaction block (n = 2500), yields

   minimum information = 2500 * log2(20,000) ~ 2500 * 15 bits.

In other words, a lower bound on the information required is about 2 bytes per transactions for every transaction in the block that the node is already aware of, as well as all the missing transactions in full. 

Of course, this assumes an unlimited number of round trips, and it is probably complicated by other factors that I haven’t considered (queue the “spherical cow” jokes :), but I thought it was interesting that a technique like Xthin or compact blocks is already pretty close to this limit.  

Cheers,
Peter 

[1] There are still some things that I can’t wrap my mind around that I’d love to discuss with another math geek :)




  reply	other threads:[~2016-05-09 18:40 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-02 22:13 Matt Corallo
2016-05-03  5:02 ` Gregory Maxwell
2016-05-06  3:09   ` Matt Corallo
2016-05-08  0:40 ` Johnathan Corgan
2016-05-08  3:24   ` Matt Corallo
2016-05-09  9:35     ` Tom Zander
2016-05-09 10:43       ` Gregory Maxwell
2016-05-09 11:32         ` Tom
     [not found]           ` <CAAS2fgR01=SfpAdHhFd_DFa9VNiL=e1g4FiguVRywVVSqFe9rA@mail.gmail.com>
2016-05-09 12:12             ` [bitcoin-dev] Fwd: " Gregory Maxwell
2016-05-09 23:37               ` [bitcoin-dev] " Peter R
2016-05-10  1:42                 ` Peter R
2016-05-10  2:12                 ` Gregory Maxwell
2016-05-09 13:40           ` Peter Todd
2016-05-09 13:57             ` Tom
2016-05-09 14:04               ` Bryan Bishop
2016-05-09 17:06 ` Pieter Wuille
2016-05-09 18:34   ` Peter R [this message]
2016-05-10  5:28   ` Rusty Russell
2016-05-10 10:07     ` Gregory Maxwell
2016-05-10 21:23       ` Rusty Russell
2016-05-11  1:12         ` Matt Corallo
2016-05-18  1:49   ` Matt Corallo
2016-05-08 10:25 Nicolas Dorier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D0F57BE3-EFD2-4557-8D05-704D9C4E5EA1@gmx.com \
    --to=peter_r@gmx$(echo .)com \
    --cc=bitcoin-dev@lists$(echo .)linuxfoundation.org \
    --cc=pieter.wuille@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox