public inbox for bitcoindev@googlegroups.com
* Re: [bitcoin-dev] Compact Block Relay BIP
@ 2016-05-08 10:25 Nicolas Dorier
  0 siblings, 0 replies; 22+ messages in thread
From: Nicolas Dorier @ 2016-05-08 10:25 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Dev


Interesting, can you provide some historical context around it so I
can understand it better?
Actually, I know that your relay's protocol (going by what I see in the
abstract) was about optimizing propagation time and not bandwidth.

And I agree that bandwidth is what needs to be optimized for nodes.
So far there were two other proposals, which I know only by name and in
theory, xthin blocks and IBLT, which would also have decreased bandwidth.

Can you quickly describe how it compares to them?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 17:06 ` Pieter Wuille
  2016-05-09 18:34   ` Peter R
  2016-05-10  5:28   ` Rusty Russell
@ 2016-05-18  1:49   ` Matt Corallo
  2 siblings, 0 replies; 22+ messages in thread
From: Matt Corallo @ 2016-05-18  1:49 UTC (permalink / raw)
  To: Pieter Wuille, Bitcoin Protocol Discussion

Implemented a few of your suggestions.

Also opened a formal pull request for the BIP at
https://github.com/bitcoin/bips/pull/389 and the code at
https://github.com/bitcoin/bitcoin/pull/8068.

On 05/09/16 17:06, Pieter Wuille via bitcoin-dev wrote:
> On 05/03/2016 12:13 AM, lf-lists at mattcorallo.com (Matt Corallo) wrote:
>> Hi all,
>>
>> The following is a BIP-formatted design spec for compact block relay
>> designed to limit on wire bytes during block relay. You can find the
>> latest version of this document at
>> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.
> 
> Hi Matt,
> 
> thank you for working on this!
> 
>> ===New data structures===
>> Several new data structures are added to the P2P network to relay
>> compact blocks: PrefilledTransaction, HeaderAndShortIDs,
>> BlockTransactionsRequest, and BlockTransactions. Additionally, we
>> introduce a new variable-length integer encoding for use in these data
>> structures.
>>
>> For the purposes of this section, CompactSize refers to the
>> variable-length integer encoding used across the existing P2P protocol
>> to encode array lengths, among other things, in 1, 3, 5 or 9 bytes.
> 
> This is a nit, but I think it's a bit strange to have two separate
> variable-length integers in the same specification. I understand that
> one is already the default for variable-length integers currently, and
> there are efficiency reasons to use the other one in some places,
> but perhaps we should aim to get everything using the latter?

Fixed, the whole thing now uses New Varints.

>> ====New VarInt====
>> Variable-length integers: bytes are a MSB base-128 encoding of the number.
>> The high bit in each byte signifies whether another digit follows. To make
>> sure the encoding is one-to-one, one is subtracted from all but the last
>> digit.
> 
> Maybe it's worth mentioning that it is based on ASN.1 BER's compressed
> integer format (see
> https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
> section 8.1.3.5), though with a small modification to make every integer
> have a single unique encoding.
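
The encoding described above can be sketched as follows. This is only an illustration of the quoted rules (MSB-first base-128 digits, high bit as a continuation flag, one subtracted from every non-final digit); the function names `encode_varint` and `decode_varint` are made up for this sketch and are not taken from the BIP or from any implementation.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an MSB-first base-128 varint."""
    if n < 0:
        raise ValueError("varint cannot encode negative numbers")
    digits = []
    while True:
        # Digits are produced least-significant first and reversed at the end.
        # Only the final (least-significant) digit has its high bit clear.
        digits.append((n & 0x7F) | (0x80 if digits else 0x00))
        if n <= 0x7F:
            break
        # Subtract one from every non-final digit so that each integer
        # has exactly one encoding.
        n = (n >> 7) - 1
    return bytes(reversed(digits))


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the varint at the start of data."""
    n = 0
    for i, byte in enumerate(data):
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1  # undo the subtraction applied by the encoder
        else:
            return n, i + 1
    raise ValueError("truncated varint")
```

Without the subtraction, 0x80 0x00 would decode to 0 and duplicate the single-byte encoding 0x00; with it, 0x80 0x00 is the unique encoding of 128.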
> 
>> ====HeaderAndShortIDs====
>> A HeaderAndShortIDs structure is used to relay a block header, the short
>> transactions IDs used for matching already-available transactions, and a
>> select few transactions which we expect a peer may be missing.
>>
>> |shortids||List of uint64_ts||8*shortids_length bytes||Little
>> Endian||The short transaction IDs calculated from the transactions which
>> were not provided explicitly in prefilledtxn
> 
> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).
> 
> For any reasonable numbers I can come up with (in a very wide range),
> the number of bits needed is very well approximated by:
> 
>   log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
> acceptable_per_block_failure_rate)
> 
> For example, with 20000 mempool transactions, 2500 transactions in a
> block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
> reconstruct, needed_bits = log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
> 34.54, or 5 byte txids would suffice.
> 
> Note that 1 in 10000 failures may sound like a lot, but this is for each
> individual connection, and since every transmission uses separately
> salted identifiers, occasional failures should not affect global
> propagation. Given that transmission failures due to timeouts, network
> connectivity, ... already occur much more frequently than once every few
> gigabytes (what 10000 blocks corresponds to), that's probably already
> more than enough.
> 
> In short: I believe 5 or 6 byte txids should be enough, but perhaps it
> makes sense to allow the sender to choose (so he can weigh trying
> multiple nonces against increasing the short txid length).

I switched to 6-byte short txids.
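
For reference, Pieter's approximation quoted above is easy to evaluate numerically. The sketch below just plugs his example figures into the formula; the function name `short_id_bits` and its parameter names are chosen here for illustration.

```python
import math


def short_id_bits(mempool_txn, block_txn, hit_rate, failure_rate):
    """Approximate short-ID length in bits:
    log2(mempool_txn * block_txn_not_in_mempool / failure_rate)."""
    missing = block_txn * (1.0 - hit_rate)
    return math.log2(mempool_txn * missing / failure_rate)


# Pieter's example: 20000 mempool txn, 2500 block txn, 95% hit rate,
# 1 in 10000 blocks allowed to fail reconstruction.
bits = short_id_bits(20000, 2500, 0.95, 1e-4)
print(f"{bits:.2f} bits, i.e. {math.ceil(bits / 8)} bytes")  # about 34.54 bits, 5 bytes
```

Under this approximation, each extra byte of short ID lowers the per-block failure rate by a factor of 256 for the same mempool and block sizes.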

>> ====Short transaction IDs====
>> Short transaction IDs are used to represent a transaction without
>> sending a full 256-bit hash. They are calculated by:
>> # single-SHA256 hashing the block header with the nonce appended (in
>> little-endian)
>> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
>> each corresponding 8-byte chunk of the hash from the previous step
>> # Adding each of the XORed 8-byte chunks together (in little-endian)
>> iteratively to find the short transaction ID
> 
> An alternative would be using SipHash-1-3 (a form of SipHash with
> reduced iteration counts; the default is SipHash-2-4). SipHash was
> designed as a Message Authentication Code, where the security
> requirements are much stronger than in our case (in particular, we don't
> care about observers being able to find the key, as the key is just
> public knowledge here). One of the designers of SipHash has commented
> that SipHash-1-3 for collision resistance in hash tables may be enough:
> https://github.com/rust-lang/rust/issues/29754#issuecomment-156073946
> 
> Using SipHash-1-3 on modern hardware would take ~32 CPU cycles per txid.

Switched to SipHash-2-4.
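
For readers following the history, here is a rough sketch of the three-step construct from the quoted draft, i.e. the one being replaced, not the SipHash-based one. Several details are assumptions made for this sketch: the nonce is taken to be 8 bytes, the chunks are read as little-endian 64-bit integers, and the additions wrap modulo 2^64. The name `draft_short_txid` is hypothetical.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1


def draft_short_txid(header: bytes, nonce: int, txid: bytes) -> int:
    """Short ID per the quoted draft steps (superseded construct).

    txid is the 32-byte double-SHA256 transaction hash.
    """
    # Step 1: single SHA256 of the header with the nonce appended
    # (assumption: the nonce is an 8-byte little-endian integer).
    salt = hashlib.sha256(header + struct.pack("<Q", nonce)).digest()
    # Step 2: XOR each 8-byte chunk of the txid with the matching salt chunk.
    salt_chunks = struct.unpack("<4Q", salt)
    txid_chunks = struct.unpack("<4Q", txid)
    # Step 3: add the XORed chunks together
    # (assumption: the sum wraps around at 64 bits).
    short_id = 0
    for s, t in zip(salt_chunks, txid_chunks):
        short_id = (short_id + (s ^ t)) & MASK64
    return short_id
```

Because the salt depends on the nonce, the same transaction maps to unrelated short IDs under different nonces, which is the property the later per-link collision discussion relies on.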

>> ===Implementation Notes===
> 
> There are a few more heuristics that MAY be used to improve performance:
> 
> * Receivers should treat short txids in blocks that match multiple
> mempool transactions as non-matches, and request the transactions. This
> significantly reduces the failure to reconstruct.

Done.
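
A minimal sketch of that receiver-side rule, assuming the mempool is available as a collection of txids and that `short_id_fn` (a placeholder for whatever short-ID function is in use) maps a txid to its short ID. The name `match_short_ids` is invented for this example.

```python
from collections import defaultdict


def match_short_ids(block_short_ids, mempool_txids, short_id_fn):
    """Match each short ID in a compact block against the mempool.

    Returns (slots, missing): slots[i] is the matched txid or None, and
    missing lists the block positions that must be requested from the peer.
    """
    by_short_id = defaultdict(list)
    for txid in mempool_txids:
        by_short_id[short_id_fn(txid)].append(txid)

    slots, missing = [], []
    for index, sid in enumerate(block_short_ids):
        candidates = by_short_id.get(sid, [])
        if len(candidates) == 1:
            slots.append(candidates[0])
        else:
            # Zero matches: we do not have the transaction.
            # Several matches: ambiguous, so treat it as a non-match too.
            slots.append(None)
            missing.append(index)
    return slots, missing
```

Treating an ambiguous match like a missing transaction costs one extra request but avoids guessing wrong and failing the Merkle root check later.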

> * When constructing a compact block to send, the sender can verify it
> against its own mempool to check for collisions, and if so, choose to
> either try another nonce, or increase the short txid length.

Additionally we should compare to the orphan pool (which apparently
helps a lot).
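
The sender-side check could look roughly like the sketch below. It is only an illustration: `short_id` uses keyed BLAKE2b from Python's hashlib as a stand-in for the real short-ID function, and `pick_nonce`, the 8-byte nonce, and the retry limit are all assumptions of this sketch. `known_txids` stands for the sender's mempool plus its orphan pool.

```python
import hashlib
import os


def short_id(nonce: bytes, txid: bytes, length: int = 6) -> bytes:
    """Stand-in keyed hash truncated to `length` bytes (not SipHash)."""
    return hashlib.blake2b(txid, digest_size=length, key=nonce).digest()


def pick_nonce(block_txids, known_txids, length=6, max_tries=16):
    """Try random nonces until no two known transactions share a short ID.

    Returns the nonce, or None if every attempt produced a collision, in
    which case the caller could fall back to longer IDs.
    """
    pool = set(block_txids) | set(known_txids)
    for _ in range(max_tries):
        nonce = os.urandom(8)
        ids = {short_id(nonce, txid, length) for txid in pool}
        if len(ids) == len(pool):  # all short IDs distinct
            return nonce
    return None
```

This only guards against collisions the sender can see; the receiver's mempool may still contain transactions the sender has never heard of.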


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-10 21:23       ` Rusty Russell
@ 2016-05-11  1:12         ` Matt Corallo
  0 siblings, 0 replies; 22+ messages in thread
From: Matt Corallo @ 2016-05-11  1:12 UTC (permalink / raw)
  To: Rusty Russell, Bitcoin Protocol Discussion,
	Rusty Russell via bitcoin-dev, Gregory Maxwell

Replies inline.

On May 10, 2016 5:23:55 PM EDT, Rusty Russell via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> wrote:
>Gregory Maxwell <greg@xiph•org> writes:
>> On Tue, May 10, 2016 at 5:28 AM, Rusty Russell via bitcoin-dev
>> <bitcoin-dev@lists•linuxfoundation.org> wrote:
>>> I used variable-length bit encodings, and used the shortest encoding
>>> which is unique to you (including mempool).  It's a little more
>work,
>>> but for an average node transmitting a block with 1300 txs and
>another
>>> ~3000 in the mempool, you expect about 12 bits per transaction. 
>IOW,
>>> about 1/5 of your current size.  Critically, we might be able to fit
>in
>>> two or three TCP packets.
>>
>> Hm. 12 bits sounds very small even given those figures. What failure
>> rate were you targeting?
>
>That's a good question; I was assuming a best-case in which we have
>mempool set reconciliation (handwave) thus know they are close.  But
>there's also an ulterior motive: any later more sophisticated approach
>will want variable-length IDs, and I'd like Matt to do the work :)

Yea, there's already an ongoing discussion of that, and the UDP stuff will definitely want something different than the current proposals.

>In particular, you can significantly narrow the possibilities for a
>block by sending the min-fee-per-kb and a list of "txs in my mempool
>which didn't get in" and "txs which did despite not making the
>fee-per-kb".  Those turn out to be tiny, and often make set
>reconciliation trivial.  That's best done with variable-length IDs.
>
>> (*Not interesting because it mostly reduces exposure to loss and the
>> gods of TCP, but since those are the long poles in the latency tent,
>> it's best to escape them entirely, see Matt's udp_wip branch.)
>
>I'm not convinced on UDP; it always looks impressive, but then ends up
>reimplementing TCP in practice.  We should be well within a TCP window
>for these, so it's hard to see where we'd win.

Not at all. The goal with the UDP stuff I've been working on is not to provide reliable transport. Like the relay network, it is assumed some percent of blocks will fail to transit properly, and you will use some other transport to figure out how to get the block. Indeed, a big part of my desire for diversity in network protocols is to enable them to make tradeoffs in reliability/privacy/etc.

>>> I would also avoid the nonce to save recalculating for each node,
>and
>>> instead define an id as:
>>
>> Doing this would greatly increase the cost of a collision though, as
>> it would happen in many places in the network at once, rather than
>> just happening on a single link, thus hardly impacting overall
>> propagation.
>
>"Greatly increase"?  I don't see that.
>
>Let's assume an attacker grinds out 10,000 txs with 128 bits of the
>same
>TXID, and gets them all in a block.  They then win the lottery and get
>a
>collision.  Now we have to transmit ~48 bytes more than expected.

I assume what Greg was referring to is the idea that if there is a conflict, a given block will require an extra round trip when being broadcast between roughly each pair of peers, compounding the effect across each hop.

>> Using the same nonce means you also would not get a recovery gain
>from
>> jointly decoding using compact blocks sent from multiple peers (which
>> you'll have anyways in high bandwidth mode).
>
>Not quite true, since if their mempools differ they'll use different
>encoding lengths, but yes, you'll get less of this.

... Assuming different encoding lengths aren't just truncated, but ok :).

>> With a nonce a sender does have the option of reusing what they got--
>> but the actual encoding cost is negligible, for a 2500 transaction
>> block it's 27 microseconds (once per block, shared across all peers)
>> using Pieter's suggestion of siphash 1-3 instead of the cheaper
>> construct in the current draft.
>>
>> Of course, if you're going to check your whole mempool to reroll the
>> nonce, that's another matter-- but that seems wasteful compared to
>just
>> using a table driven size with a known negligible failure rate.
>
>I'm not worried about the sender: The recipient needs to encode all the
>mempool.
>
>>> As Peter R points out, we could later enhance receiver to brute
>force
>>> collisions (you could speed that by sending a XOR of all the txids,
>but
>>> really if there are more than a few collisions, give up).
>>
>> The band between "no collisions" and "infeasible many" is fairly
>> narrow.  You can add a small amount more space to the ids and
>> immediately be in the no collision zone.
>
>Indeed, I would be adding extra bits in the sender and not implementing
>brute force in the receiver.  But I welcome someone else to do so.
>
>Cheers,
>Rusty.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-10 10:07     ` Gregory Maxwell
@ 2016-05-10 21:23       ` Rusty Russell
  2016-05-11  1:12         ` Matt Corallo
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2016-05-10 21:23 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Protocol Discussion

Gregory Maxwell <greg@xiph•org> writes:
> On Tue, May 10, 2016 at 5:28 AM, Rusty Russell via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org> wrote:
>> I used variable-length bit encodings, and used the shortest encoding
>> which is unique to you (including mempool).  It's a little more work,
>> but for an average node transmitting a block with 1300 txs and another
>> ~3000 in the mempool, you expect about 12 bits per transaction.  IOW,
>> about 1/5 of your current size.  Critically, we might be able to fit in
>> two or three TCP packets.
>
> Hm. 12 bits sounds very small even given those figures. What failure
> rate were you targeting?

That's a good question; I was assuming a best-case in which we have
mempool set reconciliation (handwave) thus know they are close.  But
there's also an ulterior motive: any later more sophisticated approach
will want variable-length IDs, and I'd like Matt to do the work :)

In particular, you can significantly narrow the possibilities for a
block by sending the min-fee-per-kb and a list of "txs in my mempool
which didn't get in" and "txs which did despite not making the
fee-per-kb".  Those turn out to be tiny, and often make set
reconciliation trivial.  That's best done with variable-length IDs.
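
One way to read that suggestion is sketched below. Everything here is an assumption made for illustration: the sender is given a fee-rate threshold, both sides are modelled with a dict mapping txid to fee per kB, and the names `build_hints` and `guess_block_set` are invented.

```python
def build_hints(block_txids, mempool_feerates, min_feerate):
    """Sender side: the two exception lists relative to a fee-rate cut-off."""
    in_block = set(block_txids)
    above = {t for t, rate in mempool_feerates.items() if rate >= min_feerate}
    left_out = sorted(above - in_block)     # paid enough, yet not in the block
    squeezed_in = sorted(in_block - above)  # in the block despite a low fee rate
    return left_out, squeezed_in


def guess_block_set(mempool_feerates, min_feerate, left_out, squeezed_in):
    """Receiver side: rebuild the likely set of block transactions."""
    above = {t for t, rate in mempool_feerates.items() if rate >= min_feerate}
    return (above - set(left_out)) | set(squeezed_in)
```

When both mempools agree, the receiver recovers the block's transaction set exactly; any remaining difference is what set reconciliation or short IDs would then have to cover.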

> (*Not interesting because it mostly reduces exposure to loss and the
> gods of TCP, but since those are the long poles in the latency tent,
> it's best to escape them entirely, see Matt's udp_wip branch.)

I'm not convinced on UDP; it always looks impressive, but then ends up
reimplementing TCP in practice.  We should be well within a TCP window
for these, so it's hard to see where we'd win.

>> I would also avoid the nonce to save recalculating for each node, and
>> instead define an id as:
>
> Doing this would greatly increase the cost of a collision though, as
> it would happen in many places in the network at once, rather than
> just happening on a single link, thus hardly impacting overall
> propagation.

"Greatly increase"?  I don't see that.

Let's assume an attacker grinds out 10,000 txs with 128 bits of the same
TXID, and gets them all in a block.  They then win the lottery and get a
collision.  Now we have to transmit ~48 bytes more than expected.

> Using the same nonce means you also would not get a recovery gain from
> jointly decoding using compact blocks sent from multiple peers (which
> you'll have anyways in high bandwidth mode).

Not quite true, since if their mempools differ they'll use different
encoding lengths, but yes, you'll get less of this.

> With a nonce a sender does have the option of reusing what they got--
> but the actual encoding cost is negligible, for a 2500 transaction
> block it's 27 microseconds (once per block, shared across all peers)
> using Pieter's suggestion of siphash 1-3 instead of the cheaper
> construct in the current draft.
>
> Of course, if you're going to check your whole mempool to reroll the
> nonce, that's another matter-- but that seems wasteful compared to just
> using a table driven size with a known negligible failure rate.

I'm not worried about the sender: The recipient needs to encode all the
mempool.

>> As Peter R points out, we could later enhance receiver to brute force
>> collisions (you could speed that by sending a XOR of all the txids, but
>> really if there are more than a few collisions, give up).
>
> The band between "no collisions" and "infeasible many" is fairly
> narrow.  You can add a small amount more space to the ids and
> immediately be in the no collision zone.

Indeed, I would be adding extra bits in the sender and not implementing
brute force in the receiver.  But I welcome someone else to do so.

Cheers,
Rusty.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-10  5:28   ` Rusty Russell
@ 2016-05-10 10:07     ` Gregory Maxwell
  2016-05-10 21:23       ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Maxwell @ 2016-05-10 10:07 UTC (permalink / raw)
  To: Rusty Russell, Bitcoin Protocol Discussion

On Tue, May 10, 2016 at 5:28 AM, Rusty Russell via bitcoin-dev
<bitcoin-dev@lists•linuxfoundation.org> wrote:
> I used variable-length bit encodings, and used the shortest encoding
> which is unique to you (including mempool).  It's a little more work,
> but for an average node transmitting a block with 1300 txs and another
> ~3000 in the mempool, you expect about 12 bits per transaction.  IOW,
> about 1/5 of your current size.  Critically, we might be able to fit in
> two or three TCP packets.

Hm. 12 bits sounds very small even given those figures. What failure
rate were you targeting?

I've mostly been thinking in terms of 3000 txn, and 20k mempools, and
blocks which are 90% consistent with the remote mempool, targeting
1/100000 failure rates (which is roughly where it should be to put it
well below link failure levels).

If going down the path of more complexity, set reconciliation is
enormously more efficient (e.g. 90% reduction), which no amount of
packing/twiddling can achieve.

But the savings of going from 20kb to 3kb is not interesting enough to
justify it*.  My expectation is that later we'll deploy set
reconciliation to fix relay efficiency, where the savings is _much_
larger,  and then with the infrastructure in place we could define
another compactblock mode that used it.

(*Not interesting because it mostly reduces exposure to loss and the
gods of TCP, but since those are the long poles in the latency tent,
it's best to escape them entirely, see Matt's udp_wip branch.)

> I would also avoid the nonce to save recalculating for each node, and
> instead define an id as:

Doing this would greatly increase the cost of a collision though, as
it would happen in many places in the network at once, rather than
just happening on a single link, thus hardly impacting overall
propagation.

(The downside of the nonce is that you get an exponential increase in
the rate that a collision happens "somewhere", but links fail
"somewhere" all the time-- propagation overall doesn't care about
that.)

Using the same nonce means you also would not get a recovery gain from
jointly decoding using compact blocks sent from multiple peers (which
you'll have anyways in high bandwidth mode).

With a nonce a sender does have the option of reusing what they got--
but the actual encoding cost is negligible, for a 2500 transaction
block it's 27 microseconds (once per block, shared across all peers)
using Pieter's suggestion of siphash 1-3 instead of the cheaper
construct in the current draft.

Of course, if you're going to check your whole mempool to reroll the
nonce, that's another matter-- but that seems wasteful compared to just
using a table driven size with a known negligible failure rate.

64-bits as a maximum length is high enough that the collision rate
would be negligible even under fairly unrealistic assumptions-- so
long as it's salted. :)

> As Peter R points out, we could later enhance receiver to brute force
> collisions (you could speed that by sending a XOR of all the txids, but
> really if there are more than a few collisions, give up).

The band between "no collisions" and "infeasible many" is fairly
narrow.  You can add a small amount more space to the ids and
immediately be in the no collision zone.

Some earlier work we had would send a small amount of erasure coding
data for the next couple of bytes of the IDs.  E.g. the receiver fills in
all the IDs it knows, marks totally unknown IDs as erased, and then lets
the error correction fix the rest. This lets you algebraically resolve
collisions _far_ beyond what could feasibly be bruteforced. Pieter
went and implemented it... but the added cost of encoding and software
complexity seems not worth it.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 17:06 ` Pieter Wuille
  2016-05-09 18:34   ` Peter R
@ 2016-05-10  5:28   ` Rusty Russell
  2016-05-10 10:07     ` Gregory Maxwell
  2016-05-18  1:49   ` Matt Corallo
  2 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2016-05-10  5:28 UTC (permalink / raw)
  To: Pieter Wuille, Bitcoin Protocol Discussion, bitcoin-dev

Pieter Wuille via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> writes:
> On 05/03/2016 12:13 AM, lf-lists at mattcorallo.com (Matt Corallo) wrote:
>> Hi all,
>> 
>> The following is a BIP-formatted design spec for compact block relay
>> designed to limit on wire bytes during block relay. You can find the
>> latest version of this document at
>> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.
>
> Hi Matt,
>
> thank you for working on this!

Indeed!  Sorry for the delayed feedback.

>> |shortids||List of uint64_ts||8*shortids_length bytes||Little
>> Endian||The short transaction IDs calculated from the transactions which
>> were not provided explicitly in prefilledtxn
>
> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).

I did this for IBLT testing.

I used variable-length bit encodings, and used the shortest encoding
which is unique to you (including mempool).  It's a little more work,
but for an average node transmitting a block with 1300 txs and another
~3000 in the mempool, you expect about 12 bits per transaction.  IOW,
about 1/5 of your current size.  Critically, we might be able to fit in
two or three TCP packets.

The wire encoding of all those bit arrays was:
  [varint-min-numbits] - Shortest bit array length
  [varint-array-size]  - Number of bit arrays.
          [varint-num].... - Number of entries in array N (x varint-array-size)
  [packed-bit-arrays...]

  Last byte was padded with zeros.
  See: https://github.com/rustyrussell/bitcoin-iblt/blob/master/wire_encode.cpp#L12
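
The "shortest encoding which is unique" idea can be illustrated with the sketch below. It is not the code from the linked repository: the names `shortest_unique_bits`, `common_prefix_bits` and `fake_txid` are made up, and txids are treated as 256-bit big-endian integers. It relies on the fact that, once the txids are sorted, the longest prefix an ID shares with any other ID is shared with one of its two neighbours.

```python
import hashlib


def common_prefix_bits(a: int, b: int, width: int = 256) -> int:
    """Number of leading bits shared by two distinct width-bit integers."""
    return width - (a ^ b).bit_length()


def shortest_unique_bits(block_txids, mempool_txids):
    """Map each block txid to the fewest leading bits that distinguish it
    from every other txid in the block or the mempool."""
    everything = sorted(set(block_txids) | set(mempool_txids))
    values = [int.from_bytes(t, "big") for t in everything]
    block = set(block_txids)
    needed = {}
    for i, txid in enumerate(everything):
        if txid not in block:
            continue
        longest = 0
        if i > 0:
            longest = max(longest, common_prefix_bits(values[i], values[i - 1]))
        if i + 1 < len(values):
            longest = max(longest, common_prefix_bits(values[i], values[i + 1]))
        needed[txid] = longest + 1  # one bit past the longest shared prefix
    return needed


def fake_txid(i: int) -> bytes:
    """Hypothetical stand-in for a real transaction hash."""
    return hashlib.sha256(b"tx%d" % i).digest()
```

The per-transaction lengths produced this way are what the `[varint-num]` counts and packed bit arrays in the wire format above would then carry.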

I would also avoid the nonce to save recalculating for each node, and
instead define an id as:

        [<64-bit-short-id>][txid]

Since you only ever send as many bits as needed to distinguish, this only
makes a difference if there actually are collisions.

As Peter R points out, we could later enhance receiver to brute force
collisions (you could speed that by sending a XOR of all the txids, but
really if there are more than a few collisions, give up).

And a prototype could just always send 64-bit ids to start.

Cheers,
Rusty.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 23:37               ` [bitcoin-dev] " Peter R
  2016-05-10  1:42                 ` Peter R
@ 2016-05-10  2:12                 ` Gregory Maxwell
  1 sibling, 0 replies; 22+ messages in thread
From: Gregory Maxwell @ 2016-05-10  2:12 UTC (permalink / raw)
  To: Peter R; +Cc: Bitcoin Development Discussion

On Mon, May 9, 2016 at 11:37 PM, Peter R <peter_r@gmx•com> wrote:
> It is a standard result that there are
>     m! / [n! (m-n)!]
> ways of picking n numbers from a set of m numbers, so there are
>
>     (2^32)! / [2! (2^32 - 2)!] ~ 2^63
> possible pairs in a set of 2^32 transactions.  So wouldn’t you have to perform approximately 2^63 comparisons in order to identify which pair of transactions are the two that collide?
>
> Perhaps I made an error or there is a faster way to scan your set to find the collision.  Happy to be corrected…

$ echo -n Perhaps. 00000000f2736d91 |sha256sum
359dfa6d4c2eb2ac81535392d68af4b5e1cb6d9c6321e8f111d3244329b6a4d8
$ echo -n Perhaps. 0000000011ac0388 |sha256sum
359dfa6d4c2eb2ac44d54d0ceeb2212500cb34617b9360695432f6c0fde9b006

Try search term "collision", or there may be an undergrad data
structures and algorithms course online-- you want something covering
"cycle finding".

(Though even ignoring efficient cycle finding, your factorial argument
doesn't hold... you can simply sort the data... Search term
"quicksort" for a relevant algorithm).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 23:37               ` [bitcoin-dev] " Peter R
@ 2016-05-10  1:42                 ` Peter R
  2016-05-10  2:12                 ` Gregory Maxwell
  1 sibling, 0 replies; 22+ messages in thread
From: Peter R @ 2016-05-10  1:42 UTC (permalink / raw)
  To: Bitcoin Protocol Discussion

[9 May 16 @ 6:40 PDT]

For those interested in the hash collision attack discussion, it turns out there is a faster way to scan your set to find the collision:  you’d keep a sorted list of the hashes for each TX you generate and then use binary search to check that list for a collision for each new TX you randomly generate. Performing these operations can probably be reduced to N lg N complexity, which is doable for N ~2^32.   In other words, I now agree that the attack is feasible.  
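
A scaled-down sketch of the sort-based search follows. To keep it runnable in a moment it looks for a collision in the first 3 bytes of a SHA-256 digest rather than the first 8 bytes of a txid, and it hashes simple counter strings instead of grinding real transactions; the function name `find_prefix_collision` is invented for this example.

```python
import hashlib


def find_prefix_collision(count: int, prefix_bytes: int):
    """Hash `count` distinct messages and return two whose SHA-256 digests
    share the first `prefix_bytes` bytes, or None if no such pair exists."""
    entries = []
    for i in range(count):
        message = b"Perhaps. %016x" % i
        prefix = hashlib.sha256(message).digest()[:prefix_bytes]
        entries.append((prefix, message))
    # After sorting, equal prefixes sit next to each other, so one linear
    # pass finds a collision: O(N log N) work instead of ~N^2 / 2 comparisons.
    entries.sort()
    for (p1, m1), (p2, m2) in zip(entries, entries[1:]):
        if p1 == p2:
            return m1, m2
    return None


pair = find_prefix_collision(1 << 15, 3)
print(pair)
```

The same approach scales to 64-bit prefixes with about 2^32 candidates, at the cost of storing them; cycle-finding methods avoid that storage.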

Cheers,
Peter

hat tip to egs

> On May 9, 2016, at 4:37 PM, Peter R via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> wrote:
> 
> Greg Maxwell wrote:
> 
>> What are you talking about? You seem profoundly confused here...
>> 
>> I obtain some txouts. I write a transaction spending them in malleable
>> form (e.g. sighash single and an op_return output).. then grind the
>> extra output to produce different hashes.  After doing this 2^32 times
>> I am likely to find two which share the same initial 8 bytes of txid.
> 
> [9 May 16 @ 4:30 PDT]
> 
> I’m trying to understand the collision attack that you're explaining to Tom Zander.  
> 
> Mathematica is telling me that if I generated 2^32 random transactions, the chance that the initial 64 bits of at least one pair of transactions match is about 40%.  So I am following you up to this point.  Indeed, there is a good chance that a pair of transactions from a set of 2^32 will have a collision in the first 64 bits.  
> 
> But how do you actually find that pair from within your large set?  The only way I can think of is to check if the first 64-bits is equal for every possible pair until I find it.  How many possible pairs are there?  
> 
> It is a standard result that there are 
> 
>    m! / [n! (m-n)!] 
> 
> ways of picking n numbers from a set of m numbers, so there are
> 
>    (2^32)! / [2! (2^32 - 2)!] ~ 2^63
> 
> possible pairs in a set of 2^32 transactions.  So wouldn’t you have to perform approximately 2^63 comparisons in order to identify which pair of transactions are the two that collide?
> 
> Perhaps I made an error or there is a faster way to scan your set to find the collision.  Happy to be corrected…
> 
> Best regards,
> Peter
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 12:12             ` [bitcoin-dev] Fwd: " Gregory Maxwell
@ 2016-05-09 23:37               ` Peter R
  2016-05-10  1:42                 ` Peter R
  2016-05-10  2:12                 ` Gregory Maxwell
  0 siblings, 2 replies; 22+ messages in thread
From: Peter R @ 2016-05-09 23:37 UTC (permalink / raw)
  To: Gregory Maxwell, Bitcoin Development Discussion

Greg Maxwell wrote:

> What are you talking about? You seem profoundly confused here...
> 
> I obtain some txouts. I write a transaction spending them in malleable
> form (e.g. sighash single and an op_return output).. then grind the
> extra output to produce different hashes.  After doing this 2^32 times
> I am likely to find two which share the same initial 8 bytes of txid.

[9 May 16 @ 4:30 PDT]

I’m trying to understand the collision attack that you're explaining to Tom Zander.  

Mathematica is telling me that if I generated 2^32 random transactions, the chance that the initial 64 bits of at least one pair of transactions match is about 40%.  So I am following you up to this point.  Indeed, there is a good chance that a pair of transactions from a set of 2^32 will have a collision in the first 64 bits.  

But how do you actually find that pair from within your large set?  The only way I can think of is to check if the first 64-bits is equal for every possible pair until I find it.  How many possible pairs are there?  

It is a standard result that there are 

    m! / [n! (m-n)!] 

ways of picking n numbers from a set of m numbers, so there are

    (2^32)! / [2! (2^32 - 2)!] ~ 2^63

possible pairs in a set of 2^32 transactions.  So wouldn’t you have to perform approximately 2^63 comparisons in order to identify which pair of transactions are the two that collide?

Perhaps I made an error or there is a faster way to scan your set to find the collision.  Happy to be corrected…
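
Both figures in this message can be reproduced with a few lines. The sketch uses the standard birthday approximation p ~ 1 - exp(-pairs / 2^bits), which is an assumption of this example rather than something stated in the thread; `collision_probability` is a made-up name.

```python
import math


def collision_probability(samples: int, bits: int) -> float:
    """Birthday approximation for at least one collision among `samples`
    uniformly random `bits`-bit values."""
    pairs = math.comb(samples, 2)
    return -math.expm1(-pairs / 2**bits)


pairs = math.comb(2**32, 2)            # number of distinct pairs
p = collision_probability(2**32, 64)   # chance that some pair collides
print(f"log2(pairs) = {math.log2(pairs):.3f}, p = {p:.3f}")
```

The pair count is indeed close to 2^63, but it counts the pairs that exist, not the comparisons an efficient search has to perform.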

Best regards,
Peter



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 17:06 ` Pieter Wuille
@ 2016-05-09 18:34   ` Peter R
  2016-05-10  5:28   ` Rusty Russell
  2016-05-18  1:49   ` Matt Corallo
  2 siblings, 0 replies; 22+ messages in thread
From: Peter R @ 2016-05-09 18:34 UTC (permalink / raw)
  To: Pieter Wuille, Bitcoin Protocol Discussion

Hi Pieter,

> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).
> 
> For any reasonable numbers I can come up with (in a very wide range),
> the number of bits needed is very well approximated by:
> 
>  log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
> acceptable_per_block_failure_rate)
> 
> For example, with 20000 mempool transactions, 2500 transactions in a
> block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
> reconstruct, needed_bits = log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
> 34.54, or 5 byte txids would suffice.
> 
> Note that 1 in 10000 failures may sound like a lot, but this is for each
> individual connection, and since every transmission uses separately
> salted identifiers, occasional failures should not affect global
> propagation. Given that transmission failures due to timeouts, network
> connectivity, ... already occur much more frequently than once every few
> gigabytes (what 10000 blocks corresponds to), that's probably already
> more than enough.
> 
> In short: I believe 5 or 6 byte txids should be enough, but perhaps it
> makes sense to allow the sender to choose (so he can weigh trying
> multiple nonces against increasing the short txid length).

[9 May 16 @ 11am PDT]  

We worked on this with respect to “Xthin" for Bitcoin Unlimited, and came to a similar conclusion.  

But we (I think it was theZerg) also noticed another trick: if the node receiving the thin blocks has a small number of collisions with transactions in its mempool (e.g., 1 or 2), then it can test each possible block against the Merkle root in the block header to determine the correct one.  Using this technique, it should be possible to further reduce the number of bytes used for the txids.  That being said, even thin blocks built from 64-bit short IDs represent a tremendous savings compared to standard block propagation.  So we (Bitcoin Unlimited) decided not to pursue this optimization any further at that time.
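
A sketch of that Merkle-root trick, under stated assumptions: txids are 32-byte hashes in the byte order used for hashing, the tree is the usual Bitcoin one (double SHA-256 of concatenated pairs, last entry duplicated on odd levels), and each block position comes with a list of candidate txids. The function names are invented here, and the search is exponential in the number of ambiguous positions, so it only makes sense for one or two collisions.

```python
import hashlib
from itertools import product


def dsha256(data: bytes) -> bytes:
    """Double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids) -> bytes:
    """Bitcoin-style Merkle root over a non-empty sequence of txids."""
    level = list(txids)
    if not level:
        raise ValueError("a block has at least one transaction")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def resolve_collisions(slots, expected_root):
    """slots[i] lists the candidate txids for block position i.

    Returns the one combination matching the header's Merkle root, or None.
    """
    for choice in product(*slots):
        if merkle_root(choice) == expected_root:
            return list(choice)
    return None
```

With k ambiguous positions of two candidates each, this tries at most 2^k Merkle trees, which is why the text limits it to a small number of collisions.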

***

It’s also interesting to ask what the information-theoretic minimum amount of information necessary for a node to re-construct a block is. The way I’m thinking about this currently[1] is that the node needs all of the transactions in the block that were not initially part of its mempool, plus enough information to select an ordered subset from that mempool that represents the block.  If m is the number of transactions in the mempool and n is the number of transactions in the block, then the number of possible subsets (C') is given by the binomial coefficient:

  C' =  m! / [n! (m - n)!]

Since there are n! possible orderings for each subset, the total number of possible blocks (C) of size n from a mempool of size m is

  C = n! C' = m! / (m - n)!

Assuming that all possible blocks are equally likely, the Shannon entropy (the information that must be communicated) is the base-2 logarithm of the number of possible blocks.  After making some approximations, this works out very close to

   minimum information ~= n * log2(m),

which for your case of 20,000 transactions in mempool (m = 20,000) and a 2500-transaction block (n = 2500), yields

   minimum information = 2500 * log2(20,000) ~ 2500 * 15 bits.

In other words, a lower bound on the information required is about 2 bytes per transaction for every transaction in the block that the node is already aware of, as well as all the missing transactions in full.
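
As a rough check of the approximation above, the following sketch compares the exact value of log2(m! / (m - n)!) with n * log2(m). It assumes, as the text does, that all ordered subsets are equally likely; the function names are illustrative only.

```python
import math


def log2_factorial(k: int) -> float:
    """log2(k!) computed via the log-gamma function."""
    return math.lgamma(k + 1) / math.log(2)


def exact_bits(m: int, n: int) -> float:
    """log2(m! / (m - n)!): bits to select an ordered n-subset of m items."""
    return log2_factorial(m) - log2_factorial(m - n)


def approx_bits(m: int, n: int) -> float:
    """The n * log2(m) approximation used in the text."""
    return n * math.log2(m)


m, n = 20_000, 2_500
print("exact bits per tx: ", exact_bits(m, n) / n)
print("approx bits per tx:", approx_bits(m, n) / n)
```

For these numbers both figures come out a little over 14 bits per transaction, which is consistent with the "about 2 bytes" estimate.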

Of course, this assumes an unlimited number of round trips, and it is probably complicated by other factors that I haven’t considered (cue the “spherical cow” jokes :), but I thought it was interesting that a technique like Xthin or compact blocks is already pretty close to this limit.

Cheers,
Peter 

[1] There are still some things that I can’t wrap my mind around that I’d love to discuss with another math geek :)




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-02 22:13 Matt Corallo
  2016-05-03  5:02 ` Gregory Maxwell
  2016-05-08  0:40 ` Johnathan Corgan
@ 2016-05-09 17:06 ` Pieter Wuille
  2016-05-09 18:34   ` Peter R
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Pieter Wuille @ 2016-05-09 17:06 UTC (permalink / raw)
  To: bitcoin-dev

On 05/03/2016 12:13 AM, lf-lists at mattcorallo.com (Matt Corallo) wrote:
> Hi all,
> 
> The following is a BIP-formatted design spec for compact block relay
> designed to limit on wire bytes during block relay. You can find the
> latest version of this document at
> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.

Hi Matt,

thank you for working on this!

> ===New data structures===
> Several new data structures are added to the P2P network to relay
> compact blocks: PrefilledTransaction, HeaderAndShortIDs,
> BlockTransactionsRequest, and BlockTransactions. Additionally, we
> introduce a new variable-length integer encoding for use in these data
> structures.
> 
> For the purposes of this section, CompactSize refers to the
> variable-length integer encoding used across the existing P2P protocol
> to encode array lengths, among other things, in 1, 3, 5 or 9 bytes.

This is a nit, but I think it's a bit strange to have two separate
variable-length integer encodings in the same specification. I understand
that one is already the default for variable-length integers, and that
there are efficiency reasons to use the other one in some places, but
perhaps we should aim to get everything using the latter?
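
For reference, a minimal sketch of the existing CompactSize encoding mentioned in the quoted text (1, 3, 5 or 9 bytes). The function name is illustrative; the marker bytes 0xFD, 0xFE and 0xFF followed by a little-endian integer are the long-standing P2P convention.

```python
import struct


def encode_compact_size(n: int) -> bytes:
    """Encode n as a CompactSize: 1, 3, 5 or 9 bytes depending on magnitude."""
    if n < 0 or n > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("CompactSize only covers unsigned 64-bit values")
    if n < 0xFD:
        return bytes([n])                      # value stored directly
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)  # marker + uint16 LE
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)  # marker + uint32 LE
    return b"\xff" + struct.pack("<Q", n)      # marker + uint64 LE
```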

> ====New VarInt====
> Variable-length integers: bytes are a MSB base-128 encoding of the number.
> The high bit in each byte signifies whether another digit follows. To make
> sure the encoding is one-to-one, one is subtracted from all but the last
> digit.

Maybe it's worth mentioning that it is based on ASN.1 BER's compressed
integer format (see
https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
section 8.1.3.5), though with a small modification to make every integer
have a single unique encoding.
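
The quoted VarInt description can be sketched as follows. This is an illustration written from the wording above (MSB-first base-128 digits, high bit as continuation flag, one subtracted from all but the last digit); it is assumed, not verified here, to match the encoding the draft refers to. Function names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """MSB-first base-128 encoding with a unique encoding per integer."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = [n & 0x7F]              # last digit: continuation bit clear
    while n > 0x7F:
        n = (n >> 7) - 1             # subtract one from every earlier digit
        digits.append((n & 0x7F) | 0x80)
    return bytes(reversed(digits))


def decode_varint(data: bytes) -> int:
    """Decode the first VarInt found at the start of data."""
    n = 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1                   # undo the subtraction, keep reading
        else:
            return n
    raise ValueError("truncated VarInt")
```

The subtraction is what removes redundant encodings: without it, 0x80 0x00 would be a second way to write zero, whereas here it encodes 128.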

> ====HeaderAndShortIDs====
> A HeaderAndShortIDs structure is used to relay a block header, the short
> transactions IDs used for matching already-available transactions, and a
> select few transactions which we expect a peer may be missing.
> 
> |shortids||List of uint64_ts||8*shortids_length bytes||Little
> Endian||The short transaction IDs calculated from the transactions which
> were not provided explicitly in prefilledtxn

I tried to derive what length of short ids is actually necessary (some
write-up is on
https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
incomplete).

For any reasonable numbers I can come up with (in a very wide range),
the number of bits needed is very well approximated by:

  log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
acceptable_per_block_failure_rate)

For example, with 20000 mempool transactions, 2500 transactions in a
block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
reconstruct, needed_bits = log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
34.54, or 5 byte txids would suffice.
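
The worked example above can be reproduced with a few lines of Python. The function and parameter names are illustrative; the formula is the approximation given in the text.

```python
import math


def needed_bits(mempool_txn: int, block_txn: int,
                hit_rate: float, failure_rate: float) -> float:
    """Approximate short-ID length in bits for a target per-block failure rate."""
    missing = block_txn * (1.0 - hit_rate)   # block txns not in receiver mempool
    return math.log2(mempool_txn * missing / failure_rate)


bits = needed_bits(20000, 2500, 0.95, 1e-4)
short_id_bytes = math.ceil(bits / 8)
print(round(bits, 2), short_id_bytes)
```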

Note that 1 in 10000 failures may sound like a lot, but this is for each
individual connection, and since every transmission uses separately
salted identifiers, occasional failures should not affect global
propagation. Given that transmission failures due to timeouts, network
connectivity, ... already occur much more frequently than once every few
gigabytes (what 10000 blocks corresponds to), that's probably already
more than enough.

In short: I believe 5 or 6 byte txids should be enough, but perhaps it
makes sense to allow the sender to choose (so he can weigh trying
multiple nonces against increasing the short txid length).

> ====Short transaction IDs====
> Short transaction IDs are used to represent a transaction without
> sending a full 256-bit hash. They are calculated by:
> # single-SHA256 hashing the block header with the nonce appended (in
> little-endian)
> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
> each corresponding 8-byte chunk of the hash from the previous step
> # Adding each of the XORed 8-byte chunks together (in little-endian)
> iteratively to find the short transaction ID

An alternative would be using SipHash-1-3 (a form of SipHash with
reduced iteration counts; the default is SipHash-2-4). SipHash was
designed as a Message Authentication Code, where the security
requirements are much stronger than in our case (in particular, we don't
care about observers being able to find the key, as the key is just
public knowledge here). One of the designers of SipHash has commented
that SipHash-1-3 for collision resistance in hash tables may be enough:
https://github.com/rust-lang/rust/issues/29754#issuecomment-156073946

Using SipHash-1-3 on modern hardware would take ~32 CPU cycles per txid.
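
For comparison, the XOR-and-add scheme from the quoted draft can be sketched as below. Two details are assumptions made for this illustration and are not stated in the quoted text: the nonce is serialized as 8 little-endian bytes, and the additions wrap modulo 2^64. The function name is hypothetical.

```python
import hashlib
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def short_txid(header: bytes, nonce: int, txid: bytes) -> int:
    """Short ID per the quoted draft steps (nonce width and wraparound assumed)."""
    # Step 1: single SHA-256 of the header with the nonce appended.
    key = hashlib.sha256(header + struct.pack("<Q", nonce)).digest()
    acc = 0
    for i in range(0, 32, 8):
        k = int.from_bytes(key[i:i + 8], "little")
        t = int.from_bytes(txid[i:i + 8], "little")
        # Steps 2 and 3: XOR matching 8-byte chunks, then sum them.
        acc = (acc + (k ^ t)) & MASK64
    return acc
```

Because the key depends on the nonce, the same transaction maps to different short IDs under different nonces, which is the salting property discussed elsewhere in the thread.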

> ===Implementation Notes===

There are a few more heuristics that MAY be used to improve performance:

* Receivers should treat short txids in blocks that match multiple
mempool transactions as non-matches, and request the transactions. This
significantly reduces the rate of failed reconstructions.

* When constructing a compact block to send, the sender can verify it
against its own mempool to check for collisions, and if so, choose to
either try another nonce, or increase the short txid length.
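
Both heuristics can be sketched in a few lines. This is an illustration only: `match_short_ids`, `pick_nonce` and the `short_id_fn` / `make_short_id_fn` callables are hypothetical names, and the short-ID function itself is supplied by the caller.

```python
from collections import defaultdict


def match_short_ids(short_ids, mempool_txids, short_id_fn):
    """Receiver side: resolve each short ID, treating ambiguity as a miss.

    Returns (resolved, to_request): resolved[i] is the matched txid or None,
    and to_request lists the block positions that must be fetched.
    """
    index = defaultdict(list)
    for txid in mempool_txids:
        index[short_id_fn(txid)].append(txid)
    resolved, to_request = [], []
    for pos, sid in enumerate(short_ids):
        candidates = index.get(sid, [])
        if len(candidates) == 1:
            resolved.append(candidates[0])
        else:                       # zero or several matches: request it
            resolved.append(None)
            to_request.append(pos)
    return resolved, to_request


def pick_nonce(block_txids, mempool_txids, make_short_id_fn, max_tries=8):
    """Sender side: first nonce with no collisions in the sender's own view."""
    known = set(block_txids) | set(mempool_txids)
    for nonce in range(max_tries):
        fn = make_short_id_fn(nonce)
        ids = [fn(t) for t in known]
        if len(ids) == len(set(ids)):
            return nonce
    return None                     # caller may fall back to longer short IDs
```

A sender that gets `None` back could fall back to a longer short txid, as suggested in the second bullet.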

Cheers,

-- 
Pieter


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 13:57             ` Tom
@ 2016-05-09 14:04               ` Bryan Bishop
  0 siblings, 0 replies; 22+ messages in thread
From: Bryan Bishop @ 2016-05-09 14:04 UTC (permalink / raw)
  To: Tom, Bitcoin Protocol Discussion

[-- Attachment #1: Type: text/plain, Size: 696 bytes --]

On Mon, May 9, 2016 at 8:57 AM, Tom via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> The moderators failed to catch his aggressive tone while moderating my post
> (see archives) for being too aggressive.
>

IIRC you were previously informed by moderators (on the same reddit thread
to which you refer) that it would seem you had canceled your email from the
moderation queue, contrary to your retelling above. This is now reaching
far into off-topic and further posts on this subject should be sent to
bitcoin-discuss@lists•linuxfoundation.org or
bitcoin-dev-owners@lists•linuxfoundation.org instead of the bitcoin-dev
mailing list.

- Bryan
http://heybryan.org/
1 512 203 0507

[-- Attachment #2: Type: text/html, Size: 1284 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 13:40           ` Peter Todd
@ 2016-05-09 13:57             ` Tom
  2016-05-09 14:04               ` Bryan Bishop
  0 siblings, 1 reply; 22+ messages in thread
From: Tom @ 2016-05-09 13:57 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Development Discussion

On Monday 09 May 2016 13:40:55 Peter Todd wrote:
> >> [It's a little disconcerting that you appear to be maintaining a fork
> >> and are unaware of this.]
> >
> >ehm...
> 
> Can you please explain why you moved the above part of gmaxwell's reply to
> here,

A personal attack had no place in the technical discussion, I moved it out.



Initially I asked him to please avoid personal attacks, but I thought better 
of it and edited my reply to just "ehm...".


The moderators failed to catch his aggressive tone while moderating my post 
(see archives) for being too aggressive.

I'm sure this message will also not be allowed through. I would not even blame
the moderators, since this message and Peter's were both off-topic.

I thank you for today's talks; they make me certain of the thing I said this
weekend on Reddit, that this list is not a suitable place for all the different
stakeholders to talk on a level playing field.

If any of you agree, please urge the approach that we replace the entire 
moderation team with a new one. This will be the least painful solution for 
everyone in the ecosystem.

Thanks again.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 11:32         ` Tom
@ 2016-05-09 13:40           ` Peter Todd
  2016-05-09 13:57             ` Tom
       [not found]           ` <CAAS2fgR01=SfpAdHhFd_DFa9VNiL=e1g4FiguVRywVVSqFe9rA@mail.gmail.com>
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Todd @ 2016-05-09 13:40 UTC (permalink / raw)
  To: Tom, Bitcoin Development Discussion, Tom via bitcoin-dev,
	Gregory Maxwell
  Cc: Bitcoin Dev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512



On 9 May 2016 07:32:59 GMT-04:00, Tom via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org> wrote:
>On Monday 09 May 2016 10:43:02 Gregory Maxwell wrote:
>> Service bits are not generally a good mechanism for negating optional
>> peer-local parameters.
>
>Service bits are exactly the right solution to indicate additional p2p
>feature-support.
>
>
>> [It's a little disconcerting that you appear to be maintaining a fork
>> and are unaware of this.]
>
>ehm...

Can you please explain why you moved the above part of gmaxwell's reply to here, when previously it was right after:

>> > Wait, you didn't steal the variable length encoding from an
>existing
>> > standard and you programmed a new one?
>>
>> This is one of the two variable length encodings used for years in
>> Bitcoin Core. This is just the first time it's shown up in a BIP.

here?

Editing gmaxwells reply like that changes the tone of the message significantly.
-----BEGIN PGP SIGNATURE-----

iQE9BAEBCgAnIBxQZXRlciBUb2RkIDxwZXRlQHBldGVydG9kZC5vcmc+BQJXMJNd
AAoJEGOZARBE6K+yz4MH/0fQNM8SQdT7a1zljOSJW17ZLs6cEwVXZc/fOtvrNnOa
CkzXqylPrdT+BWBhPOwDlrzRa/2w5JAJDHRFoR8ZEidasxNDuSfhT3PwulBxmBqs
qoXhg0ujzRv9736vKENzMI4y2HbfHmqOrlLSZrlk8zqBGmlp1fMqVjFriQN66dnV
6cYFVyMVz0x/e4mXw8FigSQxkDAJ6gnfSInecQuZLT7H4g2xomIs6kQbqULHAylS
sFaK4uXy7Vr/sgBbitEQPDHGwywRoA+7EhExb2XpvL6hdyQbL1G1i6SPxGkwKg7R
MAuBPku/FraGo+qfcaA8R7eYKmyP4qZfZly317Aoo6Q=
=NtSN
-----END PGP SIGNATURE-----



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09 10:43       ` Gregory Maxwell
@ 2016-05-09 11:32         ` Tom
  2016-05-09 13:40           ` Peter Todd
       [not found]           ` <CAAS2fgR01=SfpAdHhFd_DFa9VNiL=e1g4FiguVRywVVSqFe9rA@mail.gmail.com>
  0 siblings, 2 replies; 22+ messages in thread
From: Tom @ 2016-05-09 11:32 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

On Monday 09 May 2016 10:43:02 Gregory Maxwell wrote:
> On Mon, May 9, 2016 at 9:35 AM, Tom Zander via bitcoin-dev
> 
> <bitcoin-dev@lists•linuxfoundation.org> wrote:
> > You misunderstand the networking effects.
> > The fact that your node is required to choose which one to set the
> > announce
> > bit on implies that it needs to predict which node will have the best data
> > in the future.
> 
> Not required. It may. 

It is required, in the context of wanting to actually use compact block
relay.


> Testing on actual nodes in the actual network (not a "lab") shows

Apologies, I thought that the term was more widely known.  "Laboratory situations"
is used where I am from as the opposite of real-world messy and unpredictable
situations.

So, your measurements may be true, but they are not useful for deciding how well
it behaves under less optimal situations, a.k.a. "the real world".

> This also _increases_ robustness. Right now a single peer failing at
> the wrong time will delay blocks with a long time out.

If your peers that were supposed to send you a compact block fail, then you'll
end up in exactly that same situation again, only with various timeouts in
between before you get your block, making it an order of magnitude slower.

In networking this is solved by reacting instead of predicting. The network is 
not stable. Your protocol design assumes it to be.


> > Another problem with your solution is that nodes send a much larger amount
> > of unsolicited data to peers in the form of the thin-block compared to
> > the normal inv or header-first data.
> 
> "High bandwidth" mode 

Another place where I could have explained better.
This is not about the difference between the two modes of your design.
This is about the design as a whole, as compared to the current one.


> > Am I to understand that you choose the solution based on the fact that
> > service bits are too expensive to extend? (if not, please respond to my
> > previous question actually answering the suggestion)
> > 
> > That sounds like a rather bad way of doing design. Maybe you can add a
> > second service bits field of message instead and then do the compact
> > blocks correctly.
> Service bits are not generally a good mechanism for negating optional
> peer-local parameters.

Service bits are exactly the right solution to indicate additional p2p 
feature-support.


> [It's a little disconcerting that you appear to be maintaining a fork
> and are unaware of this.]

ehm...


> > Wait, you didn't steal the variable length encoding from an existing
> > standard and you programmed a new one?
> 
> This is one of the two variable length encodings used for years in
> Bitcoin Core. This is just the first time it's shown up in a BIP.
>
> > Look at UTF-8 on wikipedia, you may have "invented" the same encoding that
> > IBM published in 1992.
> 
> The similarity with UTF-8 is that both are variable length and some
> control information is in the high bits. The similarity ends there.

That's all fine and well, it doesn't at any point take away from my point that 
any specification should NOT invent something new that has for decades had a 
great specification already.

If you make a spec to be used by all nodes, on the wire, don't base it on your 
proprietary implementation. Please.


> > Just the first (highest) 8 bytes of a sha256 hash.
> > 
> > The amount of collisions will not be less if you start xoring the rest.
> > The whole reason for doing this extra work is also irrelevant as a spam
> > protection.
> 
> Then you expose it to a trivial collision attack:  To find two 64 bit
> hashes that collide I need perform only roughly 2^32 computation. Then
> I can send them to the network.

No, you still need to have done a POW.

Next to that, your scheme is 2^32 computations *and* some XORs. The XORs are 
percentage wise a rounding error on the total time. So your argument also 
destroys your own addition.

> This issue is eliminated by salting the hash. 

The issue is better eliminated by not allowing nodes to send uninvited large 
messages.

I don't think we're getting anywhere.

I'm not sold on your design and I explained why. I tried explaining in this 
email some misconceptions that may have appeared after my initial emails. I 
hope things are more clear.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-09  9:35     ` Tom Zander
@ 2016-05-09 10:43       ` Gregory Maxwell
  2016-05-09 11:32         ` Tom
  0 siblings, 1 reply; 22+ messages in thread
From: Gregory Maxwell @ 2016-05-09 10:43 UTC (permalink / raw)
  To: Tom Zander; +Cc: Bitcoin Dev

On Mon, May 9, 2016 at 9:35 AM, Tom Zander via bitcoin-dev
<bitcoin-dev@lists•linuxfoundation.org> wrote:
> You misunderstand the networking effects.
> The fact that your node is required to choose which one to set the announce
> bit on implies that it needs to predict which node will have the best data in
> the future.

Not required. It may. If it chooses fortunately, latency is reduced--
to 0.5 RTT in many cases. If not-- nothing harmful happens.

Testing on actual nodes in the actual network (not a "lab") shows that
blocks are normally requested from one of the last three peers they
were requested from 70% of the time, with no special affordances or
skipping samples when peers disconnected.

(77% for last 4, 88% for last 8)

This also _increases_ robustness. Right now a single peer failing at
the wrong time will delay blocks with a long time out. In high
bandwidth mode the redundancy means that node will be much more likely
to make progress without timeout delays-- so long as at least one of the
selected opportunistic mode peers was successful.

Because the decision is non-normative to the protocol, nodes can
decide based on better criteria if better criteria are discovered in
the future.

> Another problem with your solution is that nodes send a much larger amount of
> unsolicited data to peers in the form of the thin-block compared to the normal
> inv or header-first data.

"High bandwidth" mode uses somewhat more bandwidth than low
bandwidth... but still >>10 times less than an ordinary getdata relay
which is used ubiquitously today.

If a node is trying to minimize bandwidth usage, it can choose to not
request the "high bandwidth" mode.

The latency bound cannot be achieved without unsolicited data. The
best we can do while achieving 0.5 RTT is to try to arrange things so
that the information received is maximally useful and as small as
reasonably possible.

If receivers implemented joint decoding (combining multiple
comprblocks in the event of failed decoding) 4 byte IDs would be
completely reasonable, and were what I originally suggested (along
with forward error correction data, in that case).

> Am I to understand that you choose the solution based on the fact that service
> bits are too expensive to extend? (if not, please respond to my previous
> question actually answering the suggestion)
>
> That sounds like a rather bad way of doing design. Maybe you can add a second
> service bits field of message instead and then do the compact blocks correctly.

Service bits are not generally a good mechanism for negating optional
peer-local parameters.

The settings for compactblocks can change at runtime, having to
reconnect to change them would be obnoxious.

> Wait, you didn't steal the variable length encoding from an existing standard
> and you programmed a new one?

This is one of the two variable length encodings used for years in
Bitcoin Core. This is just the first time it's shown up in a BIP.

[It's a little disconcerting that you appear to be maintaining a fork
and are unaware of this.]

> Look at UTF-8 on wikipedia, you may have "invented" the same encoding that IBM
> published in 1992.

The similarity with UTF-8 is that both are variable length and some
control information is in the high bits. The similarity ends there.

UTF-8 is more complex and less efficient for this application (coding
small numbers), as it has to handle things like resynchronization
which are critical in text but irrelevant in our framed, checksummed,
reliably transported binary protocol.

> Just the first (highest) 8 bytes of a sha256 hash.
>
> The amount of collisions will not be less if you start xoring the rest.
> The whole reason for doing this extra work is also irrelevant as a spam
> protection.

Then you expose it to a trivial collision attack:  To find two 64 bit
hashes that collide I need to perform only roughly 2^32 computations.
Then I can send them to the network.  You cannot reason about these
systems just by assuming that bad things happen only according to pure
chance.

This issue is eliminated by salting the hash.  Moreover, with
per-source randomization of the hash, when a rare chance collision
happens it only impacts a single node at a time, so the propagation
doesn't stall network wide on an unlucky block; it just goes slower on
a tiny number of links a tiny percent of the time (instead of breaking
everywhere an even tinier amount of the time)-- in the non-attacker,
chance event case.
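
The birthday-bound argument above can be illustrated at toy scale. The sketch below uses a 24-bit truncated SHA-256 so that it runs in well under a second; the 64-bit case discussed in the thread needs on the order of 2^32 work instead. The salted variant (`salt + data`) is a stand-in chosen for this illustration and is not the BIP's construction.

```python
import hashlib

BITS = 24  # toy size; the thread discusses 64-bit short IDs


def truncated_id(data: bytes, salt: bytes = b"") -> int:
    """Top BITS bits of SHA-256(salt + data)."""
    digest = hashlib.sha256(salt + data).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - BITS)


def find_collision():
    """Birthday search: returns two distinct messages with equal unsalted IDs
    and the number of hashes computed (roughly 2**(BITS / 2) in expectation)."""
    seen = {}
    counter = 0
    while True:
        msg = counter.to_bytes(8, "big")
        sid = truncated_id(msg)
        if sid in seen:
            return seen[sid], msg, counter + 1
        seen[sid] = msg
        counter += 1


a, b, tries = find_collision()
print("collision after", tries, "hashes")
print("still colliding under a salt:",
      truncated_id(a, b"per-link salt") == truncated_id(b, b"per-link salt"))
```

A pair precomputed against the unsalted function collides under a salt the attacker could not predict only with probability of about 2^-BITS, which is why salting removes the cheap precomputation.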


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-08  3:24   ` Matt Corallo
@ 2016-05-09  9:35     ` Tom Zander
  2016-05-09 10:43       ` Gregory Maxwell
  0 siblings, 1 reply; 22+ messages in thread
From: Tom Zander @ 2016-05-09  9:35 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Dev

On Sunday, May 08, 2016 03:24:22 AM Matt Corallo wrote:
> >> ===Intended Protocol Flow===
> > 
> > I'm not a fan of the solution that a CNode should keep state and talk to
> > its remote nodes differently while announcing new blocks.
> > Its too complicated and ultimately counter-productive.
> > 
> > The problem is that an individual node needs to predict network behaviour
> > in advance. With the downside that if it guesses wrong that both nodes
> > end up paying for the wrong guess.
> > This is not a good way to design a p2p layer.
> 
> Nodes don't need to predict much in advance, and the cost for predicting
> wrong is 0 if your peers receive blocks with a few hundred ms between
> them (as we should expect) and you haven't set the announce bit on more
> than a few peers (as the spec requires for this reason).

You misunderstand the networking effects.
The fact that your node is required to choose which one to set the announce 
bit on implies that it needs to predict which node will have the best data in 
the future.
It needs to predict which nodes will not start being incommunicado and it 
requires them to predict all the things that are not possible to predict in a 
network.
In networking it is even more true than in stocks; results of the past are no 
guarantee for the future.

This means you are creating a fragile system. Your system will only work in 
laboratory situations.  It will fail spectacularly when the network or the 
internet is under stress or some parts fall away.


Another problem with your solution is that nodes send a much larger amount of 
unsolicited data to peers in the form of the thin-block compared to the normal 
inv or header-first data.

Saying this is mitigated by only subscribing on this data from a small 
subsection of nodes means you position yourself in a situation that I 
displayed above. A tradeoff of fragile and fast.  With no possible way to make 
a node automatically decide on a good equilibrium.


> It seems I forgot to add a suggested peer-preforwarding-selection
> algorithm in the text, but the intended use-case is to set the bit on
> peers which recently provided you blocks faster than other peers, up to
> only one or three peers. This is both simple and should be incredibly
> effective.

Network autorepair systems have been researched for decades; no real solution
has as of yet appeared.
PhDs are written on the subject and you want to make this a design for Bitcoin
based on "[it] should be incredibly effective". I think you are underestimating
the subject matter you are dealing with.


> > I would suggest that a new block is announced to all nodes equally and
> > then
> > individual nodes can respond with a request of either a 'compact' or a
> > normal block.
> > This is much more in line with the current design as well.
> > 
> > Detection if remote nodes support compact blocks, for the purpose of
> > requesting a compact-block, can be done either via a network-bit or just a
> > protocol version. Or something else entirely, if you have better
> > suggestions.
> 
> In line with recent trends, neither service bits nor protocol versions
> are particularly well-suited for this purpose.

Am I to understand that you choose the solution based on the fact that service 
bits are too expensive to extend? (if not, please respond to my previous 
question actually answering the suggestion)

That sounds like a rather bad way of doing design. Maybe you can add a second 
service bits field of message instead and then do the compact blocks correctly.


> >> Variable-length integers: bytes are a MSB base-128 encoding of the
> >> number.
> >> The high bit in each byte signifies whether another digit follows.
> >> [snip bitwise spec]
> > 
> > I suggest just referring to UTF-8 which describes this just fine.
> > it is good practice to refer to existing specs when possible and not copy
> > the details.
> 
> Hmm? There is no UTF anywhere in this protocol. Indeed this section
> needs to be rewritten, as indicated. I'd recommend you read the code
> until I update the section with better text if you're confused.

Wait, you didn't steal the variable length encoding from an existing standard 
and you programmed a new one?
I strongly suggest you don't reinvent these kinds of protocol-level encodings
but instead steal from something like UTF-8, which has been around for decades.

Please base your standard on other standards where possible.

Look at UTF-8 on wikipedia, you may have "invented" the same encoding that IBM 
published in 1992.


> >> ====Short transaction IDs====
> >> Short transaction IDs are used to represent a transaction without
> >> sending a full 256-bit hash. They are calculated by:
> >> # single-SHA256 hashing the block header with the nonce appended (in
> >> little-endian)
> >> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
> >> each corresponding 8-byte chunk of the hash from the previous step
> >> # Adding each of the XORed 8-byte chunks together (in little-endian)
> >> iteratively to find the short transaction ID
> > 
> > I don't think this is needed. Just use the first 8 bytes.
> > The reason to do xor-ing doesn't hold up and extra complexity is unneeded.
> > Especially since you mention some lines down;
> > 
> >> The short transaction ID calculation is designed to take absolutely
> >> minimal processing time during block compaction to avoid introducing
> >> serious DoS vulnerabilities
> 
> I'm confused as to what, specifically, you're proposing this be changed
> to.

Just the first (highest) 8 bytes of a sha256 hash.

The amount of collisions will not be less if you start xoring the rest.
The whole reason for doing this extra work is also irrelevant as a spam 
protection. 

-- 
Tom Zander


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-08  0:40 ` Johnathan Corgan
@ 2016-05-08  3:24   ` Matt Corallo
  2016-05-09  9:35     ` Tom Zander
  0 siblings, 1 reply; 22+ messages in thread
From: Matt Corallo @ 2016-05-08  3:24 UTC (permalink / raw)
  To: Tom; +Cc: Bitcoin Dev

(This response was originally off-list as moderators were still
deciding; here it is for those interested.)

Hi Tom,

Thanks for reading the draft text and commenting! Replies inline.

Matt

On 05/08/16 00:40, Johnathan Corgan wrote:
> ---------- Forwarded message ----------
> From: Tom <tomz@freedommail•ch <mailto:tomz@freedommail•ch>>
> To: bitcoin-dev@lists•linuxfoundation.org
> <mailto:bitcoin-dev@lists•linuxfoundation.org>, Matt Corallo <lf-lists@mattcorallo•com <mailto:lf-lists@mattcorallo•com>>
> Cc: 
> Date: Fri, 06 May 2016 13:31:15 +0100
> Subject: Re: [bitcoin-dev] Compact Block Relay BIP
> On Monday 02 May 2016 22:13:22 Matt Corallo via bitcoin-dev wrote:
> 
> Thanks for putting in the time to make a spec!
> 
> It looks good already, but I do think some more improvements can be made.
> 
> 
>> ===Intended Protocol Flow===
> I'm not a fan of the solution that a CNode should keep state and talk to
> its remote nodes differently while announcing new blocks.
> Its too complicated and ultimately counter-productive.
> 
> The problem is that an individual node needs to predict network behaviour in
> advance. With the downside that if it guesses wrong that both nodes end up
> paying for the wrong guess.
> This is not a good way to design a p2p layer.

Nodes don't need to predict much in advance, and the cost for predicting
wrong is 0 if your peers receive blocks with a few hundred ms between
them (as we should expect) and you haven't set the announce bit on more
than a few peers (as the spec requires for this reason). As for
complexity of keeping state, think of it as a version flag in much the
same way sendheaders operates.

It seems I forgot to add a suggested peer-preforwarding-selection
algorithm in the text, but the intended use-case is to set the bit on
peers which recently provided you blocks faster than other peers, up to
only one or three peers. This is both simple and should be incredibly
effective.

[This has now been clarified in the BIP text]
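
A minimal sketch of the selection rule described above: keep the announce bit set on the few peers that most recently delivered a block first. The class and method names are hypothetical, and the limit of three follows the "one or three peers" wording.

```python
from collections import OrderedDict


class AnnouncePeerSelector:
    """Tracks the most recent distinct peers that delivered a block first."""

    def __init__(self, max_peers: int = 3):
        self.max_peers = max_peers
        self._recent = OrderedDict()   # oldest first, newest last

    def block_received_first_from(self, peer_id):
        """Record that peer_id was the first to provide the latest block."""
        self._recent.pop(peer_id, None)
        self._recent[peer_id] = True
        while len(self._recent) > self.max_peers:
            self._recent.popitem(last=False)   # drop the oldest entry

    def peer_disconnected(self, peer_id):
        self._recent.pop(peer_id, None)

    def announce_peers(self):
        """Peers that should have the announce bit set, newest first."""
        return list(reversed(self._recent))
```

Since the choice is local to the node, a different policy could replace this class without any change on the wire.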

> I would suggest that a new block is announced to all nodes equally and then
> individual nodes can respond with a request of either a 'compact' or a
> normal block.
> This is much more in line with the current design as well.
> 
> Detection if remote nodes support compact blocks, for the purpose of
> requesting a compact-block, can be done either via a network-bit or just a
> protocol version. Or something else entirely, if you have better
> suggestions.

In line with recent trends, neither service bits nor protocol versions
are particularly well-suited for this purpose. Protocol versions are
impossible to handle sanely across different nodes on the network, as
they cannot indicate optional features. Service bits, while somewhat
more appropriate for this purpose, are a very limited resource which is
generally better suited to indicating significant new features which
nodes might need for correct operation, and thus might wish to actively
seek out when making connections. I'm not sure anyone is suggesting that
here, and absent that, recent agreement has preferred message-based
feature indication over extending the version message.

>> Variable-length integers: bytes are a MSB base-128 encoding of the
>> number.
>> The high bit in each byte signifies whether another digit follows.
>> [snip bitwise spec]
> 
> I suggest just referring to UTF-8 which describes this just fine.
> it is good practice to refer to existing specs when possible and not copy
> the details.

Hmm? There is no UTF anywhere in this protocol. Indeed this section
needs to be rewritten, as indicated. I'd recommend you read the code
until I update the section with better text if you're confused.

>> ====Short transaction IDs====
>> Short transaction IDs are used to represent a transaction without
>> sending a full 256-bit hash. They are calculated by:
>> # single-SHA256 hashing the block header with the nonce appended (in
>> little-endian)
>> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
>> each corresponding 8-byte chunk of the hash from the previous step
>> # Adding each of the XORed 8-byte chunks together (in little-endian)
>> iteratively to find the short transaction ID
> 
> I don't think this is needed. Just use the first 8 bytes.
> The reason to do xor-ing doesn't hold up and extra complexity is unneeded.
> Especially since you mention some lines down;
> 
>> The short transaction ID calculation is designed to take absolutely
>> minimal processing time during block compaction to avoid introducing
>> serious DoS vulnerabilities

I'm confused as to what, specifically, you're proposing this be changed
to. I'm pretty sure the proposed protocol is about as simple as you can
get while retaining some reasonable collision resistance. I might,
however, decide to switch to siphash with a very low round count, given
that it's probably faster than the cache-fill-time taken by just
iterating over the mempool. Needs a bit further investigation.
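
To make the discussion concrete, the three steps from the draft can be
sketched in Python as below. This assumes the final sum wraps modulo
2^64 (the draft does not say so explicitly, but the result is carried as
a uint64_t) and that txid is the 32-byte double-SHA256 transaction hash
in its raw byte order. The function names are made up for illustration.

```python
import hashlib
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def short_id_key(header: bytes, nonce: int) -> bytes:
    """Step 1: single-SHA256 of the 80-byte header with the nonce
    appended as a little-endian uint64. Computed once per block."""
    return hashlib.sha256(header + struct.pack("<Q", nonce)).digest()


def short_txid(key: bytes, txid: bytes) -> int:
    """Steps 2 and 3: XOR each 8-byte chunk of the txid with the
    matching chunk of the key, then add the four results together.

    Wrapping the sum to 64 bits is an assumption of this sketch.
    """
    total = 0
    for offset in range(0, 32, 8):
        k = int.from_bytes(key[offset:offset + 8], "little")
        t = int.from_bytes(txid[offset:offset + 8], "little")
        total = (total + (k ^ t)) & MASK64
    return total
```

The per-transaction work is four XORs and three additions, so the only
hash computed per block is the one in short_id_key. Taking just the
first 8 bytes of the txid would skip the key entirely, which is what
would let someone precompute colliding transactions in advance.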

> ==Acknowledgements==
> 
> I think you need to acknowledge some more people, or just remove this
> paragraph.
> 
> Cheers

Greg was the only large contributor to the document (and was a very
large contributor, as mentioned - the work is based hugely on a protocol
recommendation he wrote up several years ago), so I don't see why this
should mean he doesn't get credit.

[For those interested, I'm referring here to
https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding. This
BIP/the implementation is a precursor to an implementation that looks
similar to what Greg proposes there which can be found on my udp-wip
branch, which is based on and uses the data structures involved here.]


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-02 22:13 Matt Corallo
  2016-05-03  5:02 ` Gregory Maxwell
@ 2016-05-08  0:40 ` Johnathan Corgan
  2016-05-08  3:24   ` Matt Corallo
  2016-05-09 17:06 ` Pieter Wuille
  2 siblings, 1 reply; 22+ messages in thread
From: Johnathan Corgan @ 2016-05-08  0:40 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Dev, Luke Dashjr

[-- Attachment #1: Type: text/plain, Size: 25286 bytes --]

There was some confusion over the following email, which was posted to the list
but appears to have been cancelled before a decision could be reached.

Please note the email seems inflammatory in the "acknowledgement" section and
really should have been rewritten to contain specific details of the objection
and corrections expected.

To be clear, posts to the mailing list are either approved, or rejected for not
meeting the posting standards. This allows the author to make a quick correction
and resubmit. All rejections are cc'd to
https://lists.ozlabs.org/pipermail/bitcoin-dev-moderation/
for transparency. Sometimes moderators get delayed - this week has been a busy
one with lots of distractions for everyone :)

I'm copying the entire message below:

---------- Forwarded message ----------
From: Tom <tomz@freedommail•ch>
To: bitcoin-dev@lists•linuxfoundation.org, Matt Corallo
<lf-lists@mattcorallo•com>
Cc:
Date: Fri, 06 May 2016 13:31:15 +0100
Subject: Re: [bitcoin-dev] Compact Block Relay BIP
On Monday 02 May 2016 22:13:22 Matt Corallo via bitcoin-dev wrote:

Thanks for putting in the time to make a spec!

It looks good already, but I do think some more improvements can be made.


> ===Intended Protocol Flow===
I'm not a fan of the solution that a CNode should keep state and talk to
its remote nodes differently while announcing new blocks.
It's too complicated and ultimately counter-productive.

The problem is that an individual node needs to predict network behaviour in
advance, with the downside that if it guesses wrong, both nodes end up
paying for the wrong guess.
This is not a good way to design a p2p layer.



I would suggest that a new block is announced to all nodes equally and then
individual nodes can respond with a request of either a 'compact' or a
normal block.
This is much more in line with the current design as well.

Detection if remote nodes support compact blocks, for the purpose of
requesting a compact-block, can be done either via a network-bit or just a
protocol version. Or something else entirely, if you have better
suggestions.



> Variable-length integers: bytes are a MSB base-128 encoding of the
> number.
> The high bit in each byte signifies whether another digit follows.
> [snip bitwise spec]

I suggest just referring to UTF-8, which describes this just fine.
It is good practice to refer to existing specs when possible and not copy
the details.

> ====Short transaction IDs====
> Short transaction IDs are used to represent a transaction without
> sending a full 256-bit hash. They are calculated by:
> # single-SHA256 hashing the block header with the nonce appended (in
> little-endian)
> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
> each corresponding 8-byte chunk of the hash from the previous step
> # Adding each of the XORed 8-byte chunks together (in little-endian)
> iteratively to find the short transaction ID

I don't think this is needed. Just use the first 8 bytes.
The reason to do xor-ing doesn't hold up and extra complexity is unneeded.
Especially since you mention some lines down;

> The short transaction ID calculation is designed to take absolutely
> minimal processing time during block compaction to avoid introducing
> serious DoS vulnerabilities


==Acknowledgements==

I think you need to acknowledge some more people, or just remove this
paragraph.

Cheers


---------- Forwarded message ----------
From: bitcoin-dev-request@lists•linuxfoundation.org
To:
Cc:
Date: Fri, 06 May 2016 12:31:23 +0000
Subject: confirm 37d25406a07ab77823fba5f9b450438c410ccd75
If you reply to this message, keeping the Subject: header intact,
Mailman will discard the held message.  Do this if the message is
spam.  If you reply to this message and include an Approved: header
with the list password in it, the message will be approved for posting
to the list.  The Approved: header can also appear in the first line
of the body of the reply.


On Mon, May 2, 2016 at 3:13 PM, Matt Corallo via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> Hi all,
>
> The following is a BIP-formatted design spec for compact block relay
> designed to limit on wire bytes during block relay. You can find the
> latest version of this document at
> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.
>
> There are several TODO items left on the document as indicated.
> Additionally, the implementation linked at the bottom of the document
> has a few remaining TODO items as well:
>
>  * Only request compact-block-announcement from one or two peers at a
> time, as the spec requires.
>  * Request new blocks using MSG_CMPCT_BLOCK where appropriate.
>  * Fill prefilledtxn with more than just the coinbase, as noted by the
> spec, up to 10K in transactions.
>
> Luke (CC'd): Can you assign a BIP number?
>
> Thanks,
> Matt
>
> <pre>
>   BIP: TODO
>   Title: Compact block relay
>   Author: Matt Corallo <bip@bluematt•me>
>   Status: Draft
>   Type: Standards Track
>   Created: 2016-04-27
> </pre>
>
> ==Abstract==
>
> Compact blocks on the wire as a way to save bandwidth for nodes on the
> P2P network.
>
> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
> "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
> document are to be interpreted as described in RFC 2119.
>
> ==Motivation==
>
> Historically, the Bitcoin P2P protocol has not been very bandwidth
> efficient for block relay. Every transaction in a block is included when
> relayed, even though a large number of the transactions in a given block
> are already available to nodes before the block is relayed. This causes
> moderate inbound bandwidth spikes for nodes when receiving blocks, but
> can cause very significant outbound bandwidth spikes for some nodes
> which receive a block before their peers. When such spikes occur, buffer
> bloat can make consumer-grade internet connections temporarily unusable,
> and can delay the relay of blocks to remote peers who may choose to wait
> instead of redundantly requesting the same block from other, less
> congested, peers.
>
> Thus, decreasing the bandwidth used during block relay is very useful
> for many individuals running nodes.
>
> While the goal of this work is explicitly not to reduce block transfer
> latency, it does, as a side effect, reduce block transfer latencies in
> some rather significant ways. Additionally, this work forms a foundation
> for future work explicitly targeting low-latency block transfer.
>
> ==Specification==
>
> ===Intended Protocol Flow===
> TODO: Diagrams
>
> The protocol is intended to be used in two ways, depending on the peers
> and bandwidth available, as discussed [[#Implementation_Notes|later]].
> The "high-bandwidth" mode, which nodes may only enable for a few of
> their peers, is enabled by setting the first boolean to 1 in a
> "sendcmpct" message. In this mode, peers send new block announcements
> with the short transaction IDs already, possibly even before fully
> validating the block. In some cases no further round-trip is needed, and
> the receiver can reconstruct the block and process it as usual
> immediately. When some transactions were not available from local
> sources (ie mempool), a getblocktxn/blocktxn roundtrip is necessary,
> bringing the best-case latency to the same 1.5*RTT minimum time that
> nodes take today, though with significantly less bandwidth usage.
>
> The "low-bandwidth" mode is enabled by setting the first boolean to 0 in
> a "sendcmpct" message. In this mode, peers send new block announcements
> with the usual inv/headers announcements (as per BIP130, and after fully
> validating the block). The receiving peer may then request the block
> using a MSG_CMPCT_BLOCK getdata request, which will receive a response
> of the header and short transaction IDs. In some cases no further
> round-trip is needed, and the receiver can reconstruct the block and
> process it as usual, taking the same 1.5*RTT minimum time that nodes
> take today, though with significantly less bandwidth usage. When some
> transactions were not available from local sources (ie mempool), a
> getblocktxn/blocktxn roundtrip is necessary, bringing the best-case
> latency to 2.5*RTT, again with significantly less bandwidth usage than
> today. Because TCP often exhibits worse transfer latency for larger data
> sizes (as a multiple of RTT), total latency is expected to be reduced
> even when the full 2.5*RTT transfer mechanism is used.
>
> ===New data structures===
> Several new data structures are added to the P2P network to relay
> compact blocks: PrefilledTransaction, HeaderAndShortIDs,
> BlockTransactionsRequest, and BlockTransactions. Additionally, we
> introduce a new variable-length integer encoding for use in these data
> structures.
>
> For the purposes of this section, CompactSize refers to the
> variable-length integer encoding used across the existing P2P protocol
> to encode array lengths, among other things, in 1, 3, 5 or 9 bytes.
>
> ====New VarInt====
> TODO: I just copied this out of the src...Something that is
> wiki-formatted and more descriptive should be used here instead.
>
> Variable-length integers: bytes are a MSB base-128 encoding of the number.
> The high bit in each byte signifies whether another digit follows. To make
> sure the encoding is one-to-one, one is subtracted from all but the last
> digit.
> Thus, the byte sequence a[] with length len, where all but the last byte
> has bit 128 set, encodes the number:
>
> (a[len-1] & 0x7F) + sum(i=1..len-1, 128^i*((a[len-i-1] & 0x7F)+1))
>
> Properties:
> * Very small (0-127: 1 byte, 128-16511: 2 bytes, 16512-2113663: 3 bytes)
> * Every integer has exactly one encoding
> * Encoding does not depend on size of original integer type
> * No redundancy: every (infinite) byte sequence corresponds to a list
>   of encoded integers.
>
> 0:         [0x00]  256:        [0x81 0x00]
> 1:         [0x01]  16383:      [0xFE 0x7F]
> 127:       [0x7F]  16384:      [0xFF 0x00]
> 128:  [0x80 0x00]  16511:      [0xFF 0x7F]
> 255:  [0x80 0x7F]  65535: [0x82 0xFE 0x7F]
> 2^32:           [0x8E 0xFE 0xFE 0xFF 0x00]
>
> Several uses of New VarInts below are "differentially encoded". For
> these, instead of using raw indexes, the number encoded is the
> difference between the current index and the previous index, minus one.
> For example, a first index of 0 implies a real index of 0, a second
> index of 0 thereafter refers to a real index of 1, etc.
>
> ====PrefilledTransaction====
> A PrefilledTransaction structure is used in HeaderAndShortIDs to provide
> a list of a few transactions explicitly.
>
> {|
> |Field Name||Type||Size||Encoding||Purpose
> |-
> |index||New VarInt||1-3 bytes||[[#New_VarInt|New VarInt]],
> differentially encoded since the last PrefilledTransaction in a
> list||The index into the block at which this transaction is
> |-
> |tx||Transaction||variable||As encoded in "tx" messages||The transaction
> which is in the block at index index.
> |}
>
> ====HeaderAndShortIDs====
> A HeaderAndShortIDs structure is used to relay a block header, the short
> transaction IDs used for matching already-available transactions, and a
> select few transactions which we expect a peer may be missing.
>
> {|
> |Field Name||Type||Size||Encoding||Purpose
> |-
> |header||Block header||80 bytes||First 80 bytes of the block as defined
> by the encoding used by "block" messages||The header of the block being
> provided
> |-
> |nonce||uint64_t||8 bytes||Little Endian||A nonce for use in short
> transaction ID calculations
> |-
> |shortids_length||CompactSize||1, 3, 5, or 9 bytes||As used elsewhere to
> encode array lengths||The number of short transaction IDs in shortids
> |-
> |shortids||List of uint64_ts||8*shortids_length bytes||Little
> Endian||The short transaction IDs calculated from the transactions which
> were not provided explicitly in prefilledtxn
> |-
> |prefilledtxn_length||CompactSize||1, 3, 5, or 9 bytes||As used
> elsewhere to encode array lengths||The number of prefilled transactions
> in prefilledtxn
> |-
> |prefilledtxn||List of PrefilledTransactions||variable
> size*prefilledtxn_length||As defined by PrefilledTransaction definition,
> above||Used to provide the coinbase transaction and a select few which
> we expect a peer may be missing
> |}
>
> ====BlockTransactionsRequest====
> A BlockTransactionsRequest structure is used to list transaction indexes
> in a block being requested.
>
> {|
> |Field Name||Type||Size||Encoding||Purpose
> |-
> |blockhash||Binary blob||32 bytes||The output from a double-SHA256 of
> the block header, as used elsewhere||The blockhash of the block which
> the transactions being requested are in
> |-
> |indexes_length||New VarInt||1-3 bytes||As defined in [[#New_VarInt|New
> VarInt]]||The number of transactions being requested
> |-
> |indexes||List of New VarInts||1-3 bytes*indexes_length||As defined in
> [[#New_VarInt|New VarInt]], differentially encoded||The indexes of the
> transactions being requested in the block
> |}
>
> ====BlockTransactions====
> A BlockTransactions structure is used to provide some of the
> transactions in a block, as requested.
>
> {|
> |Field Name||Type||Size||Encoding||Purpose
> |-
> |blockhash||Binary blob||32 bytes||The output from a double-SHA256 of
> the block header, as used elsewhere||The blockhash of the block which
> the transactions being provided are in
> |-
> |transactions_length||New VarInt||1-3 bytes||As defined in
> [[#New_VarInt|New VarInt]]||The number of transactions provided
> |-
> |transactions||List of Transactions||variable||As encoded in "tx"
> messages||The transactions provided
> |}
>
> ====Short transaction IDs====
> Short transaction IDs are used to represent a transaction without
> sending a full 256-bit hash. They are calculated by:
> # single-SHA256 hashing the block header with the nonce appended (in
> little-endian)
> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
> each corresponding 8-byte chunk of the hash from the previous step
> # Adding each of the XORed 8-byte chunks together (in little-endian)
> iteratively to find the short transaction ID
>
> ===New messages===
> A new inv type (MSG_CMPCT_BLOCK == 4) and several new protocol messages
> are added: sendcmpct, cmpctblock, getblocktxn, and blocktxn.
>
> ====sendcmpct====
> # The sendcmpct message is defined as a message containing a 1-byte
> integer followed by an 8-byte integer where pchCommand == "sendcmpct".
> # The first integer SHALL be interpreted as a boolean (and MUST have a
> value of either 1 or 0)
> # The second integer SHALL be interpreted as a little-endian version
> number. Nodes sending a sendcmpct message MUST currently set this value
> to 1.
> # Upon receipt of a "sendcmpct" message with the first and second
> integers set to 1, the node SHOULD announce new blocks by sending a
> cmpctblock message.
> # Upon receipt of a "sendcmpct" message with the first integer set to 0,
> the node SHOULD NOT announce new blocks by sending a cmpctblock message,
> but SHOULD announce new blocks by sending invs or headers, as defined by
> BIP130.
> # Upon receipt of a "sendcmpct" message with the second integer set to
> something other than 1, nodes SHOULD treat the peer as if they had not
> received the message (as it indicates the peer will provide an
> unexpected encoding in cmpctblock, and/or other, messages)
> # Nodes SHOULD check for a protocol version of >= 70014 before sending
> sendcmpct messages.
> # Nodes MUST NOT send a request for a MSG_CMPCT_BLOCK object to a peer
> before having received a sendcmpct message from that peer.
>
> ====MSG_CMPCT_BLOCK====
> # getdata messages may now contain requests for MSG_CMPCT_BLOCK objects.
> # Upon receipt of a getdata containing a request for a MSG_CMPCT_BLOCK
> object with the hash of a block which was recently announced and after
> having sent the requesting peer a sendcmpct message, nodes MUST respond
> with a cmpctblock message containing appropriate data representing the
> block being requested.
> # MSG_CMPCT_BLOCK inv objects MUST NOT appear anywhere except for in
> getdata messages.
>
> ====cmpctblock====
> # The cmpctblock message is defined as a message containing a
> serialized HeaderAndShortIDs message and pchCommand == "cmpctblock".
> # Upon receipt of a cmpctblock message after sending a sendcmpct
> message, nodes SHOULD calculate the short transaction ID for each
> unconfirmed transaction they have available (ie in their mempool) and
> compare each to each short transaction ID in the cmpctblock message.
> # After finding already-available transactions, nodes which do not have
> all transactions available to reconstruct the full block SHOULD request
> the missing transactions using a getblocktxn message.
> # A node MUST NOT send a cmpctblock message unless they are able to
> respond to a getblocktxn message which requests every transaction in the
> block.
> # A node MUST NOT send a cmpctblock message without having validated
> that the header properly commits to each transaction in the block, and
> properly builds on top of the existing chain with a valid proof-of-work.
> A node MAY send a cmpctblock before validating that each transaction in
> the block validly spends existing UTXO set entries.
>
> ====getblocktxn====
> # The getblocktxn message is defined as a message containing a
> serialized BlockTransactionsRequest message and pchCommand ==
> "getblocktxn".
> # Upon receipt of a properly-formatted getblocktxn message, nodes which
> recently provided the sender of such a message a cmpctblock for the
> block hash identified in this message MUST respond with an appropriate
> blocktxn message. Such a blocktxn message MUST contain exactly and only
> each transaction which is present in the appropriate block at the index
> specified in the getblocktxn indexes list, in the order requested.
>
> ====blocktxn====
> # The blocktxn message is defined as a message containing a
> serialized BlockTransactions message and pchCommand == "blocktxn".
> # Upon receipt of a properly-formatted requested blocktxn message, nodes
> SHOULD attempt to reconstruct the full block by:
> ## Taking the prefilledtxn transactions from the original cmpctblock and
> placing them in the marked positions.
> ## For each short transaction ID from the original cmpctblock, in order,
> find the corresponding transaction either from the blocktxn message or
> from other sources and place it in the first available position in the
> block.
> # Once the block has been reconstructed, it shall be processed as
> normal, keeping in mind that short transaction IDs are expected to
> occasionally collide, and that nodes MUST NOT be penalized for such
> collisions, wherever they appear.
>
> ===Implementation Notes===
> # For nodes which have sufficient inbound bandwidth, sending a sendcmpct
> message with the first integer set to 1 to up to three peers is
> RECOMMENDED. If possible, it is RECOMMENDED that those peers be selected
> based on their past performance in providing blocks quickly. This will
> allow them to receive some blocks in only 0.5*RTT between them and the
> sending peer. It will also reduce their block transfer latency in other
> cases due to the smaller amount of data transmitted. Nodes MUST NOT send
> such sendcmpct messages to all peers, as it encourages wasting outbound
> bandwidth across the network.
>
> # All nodes SHOULD send a sendcmpct message to all appropriate peers.
> This will reduce their outbound bandwidth usage by allowing their peers
> to request compact blocks instead of full blocks.
>
> # Nodes with limited inbound bandwidth SHOULD request blocks using
> MSG_CMPCT_BLOCK/getblocktxn requests, when possible. While this
> increases worst-case message round-trips, it is expected to reduce
> overall transfer latency as TCP is more likely to exhibit poor
> throughput on low-bandwidth nodes.
>
> # Nodes sending cmpctblock messages SHOULD make an attempt to not place
> too many transactions into prefilledtxn (ie should limit prefilledtxn to
> only around 10KB of transactions). When in doubt, nodes SHOULD only
> include the coinbase transaction in prefilledtxn.
>
> # Nodes MAY pick one nonce per block they wish to send, and only build a
> cmpctblock message once for all peers which they wish to send a given
> block to. Nodes SHOULD NOT use the same nonce across multiple different
> blocks.
>
> # Nodes MAY impose additional requirements on when they announce new
> blocks by sending cmpctblock messages. For example, nodes with limited
> outbound bandwidth MAY choose to announce new blocks using inv/header
> messages (as per BIP130) to conserve outbound bandwidth.
>
> # Note that the MSG_CMPCT_BLOCK section does not require that nodes
> respond to MSG_CMPCT_BLOCK getdata requests for blocks which they did
> not recently announce. This allows nodes to calculate cmpctblock
> messages at announce-time instead of at request-time. Thus, nodes MUST
> NOT request blocks using MSG_CMPCT_BLOCK getdatas unless it is in
> response to an inv/headers block announcement (as per BIP130), and MUST
> NOT request blocks using MSG_CMPCT_BLOCK getdatas in response to headers
> messages which were, themselves, responses to getheaders requests.
>
> # While the current version sends transactions with the same encodings
> as is used in tx messages and elsewhere in the protocol, the version
> field in sendcmpct is intended to allow this to change in the future.
> For this reason, it is recommended that the code used to decode
> PrefilledTransaction and BlockTransactions messages be prepared to take
> a different transaction encoding, if and when the version field in
> sendcmpct changes in a future BIP.
>
> ==Justification==
>
> ====Protocol design====
> There have been many proposals to save wire bytes when relaying blocks.
> Many of them have a two-fold goal of reducing block relay time and thus
> rely on the use of significant processing power in order to avoid
> introducing additional worst-case RTTs. Because this work is not focused
> primarily on reducing block relay time, its design is much simpler (ie
> does not rely on set reconciliation protocols). Still, in testing at the
> time of writing, nodes are able to relay blocks without the extra
> getblocktxn/blocktxn RTT around 90% of the time. With a smart
> compact-block-announcement policy, it is thus expected that this work
> might allow blocks to be relayed between nodes in 0.5*RTT instead of
> 1.5*RTT at least 75% of the time.
>
> ====Use of New VarInts====
> Bitcoin has long had a variable-length integer implementation (referred
> to as CompactSize in this document), making a second a strange protocol
> quirk. However, in this protocol most of our variable-length integers
> are between 0 and 2000. For both encodings, small numbers (<100) are
> encoded as 1-byte. For numbers over 250, the CompactSize encoding begins
> to use 3 bytes instead of 1, whereas the New VarInt encoding uses 2.
> Because the primary motivation for this work is to save bytes during
> block relay, the extra byte of saving per transaction-difference is
> considered worth the extra design complexity.
>
> ====Short transaction ID calculation====
> The short transaction ID calculation is designed to take absolutely
> minimal processing time during block compaction to avoid introducing
> serious DoS vulnerabilities such as those introduced by the
> bloom-filtering in BIP 37. As such, it is possible for a node to
> construct one compact-block representation of a block for relay to
> multiple peers. Additionally, only one cryptographic hash (2 SHA rounds)
> is used when calculating the short transaction IDs for an entire block.
>
> The XOR-and-add method is used for calculating short transaction IDs
> primarily because it is fast and is reasonably able to limit the ability
> of an attacker who does not know the block hash or nonce to cause
> collisions in short transaction IDs. If an attacker were able to cause
> such collisions, filling mempools (and, thus, blocks) with them would
> cause poor network propagation of new (or non-attacker, in the case of a
> miner) blocks.
>
> The 8-byte nonce in short transaction ID calculation is used to
> introduce additional entropy on a per-node level. While the use of 8
> bytes is sufficient for an attacker to maliciously cause short
> transaction ID collisions in their own block relay, this would have less
> of an effect than if such an attacker were relaying headers/invs and not
> responding to requests for the full block.
>
> ==Backward compatibility==
>
> Older clients remain fully compatible and interoperable after this change.
>
> ==Implementation==
>
> https://github.com/TheBlueMatt/bitcoin/tree/udp
>
> ==Acknowledgements==
>
> Thanks to Gregory Maxwell for the initial suggestion as well as a lot of
> back-and-forth design and significant testing.
>
> ==Copyright==
>
> This document is placed in the public domain.
>
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists•linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>



-- 
Johnathan Corgan
Corgan Labs - SDR Training and Development Services
http://corganlabs.com

[-- Attachment #2: Type: text/html, Size: 28659 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-03  5:02 ` Gregory Maxwell
@ 2016-05-06  3:09   ` Matt Corallo
  0 siblings, 0 replies; 22+ messages in thread
From: Matt Corallo @ 2016-05-06  3:09 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

Thanks Greg for the testing!

Note that to those who are reviewing the doc, a few minor tweaks to
wording and clarification have been made to the git version, so please
review there.

On 05/03/16 05:02, Gregory Maxwell wrote:
> On Mon, May 2, 2016 at 10:13 PM, Matt Corallo via bitcoin-dev
> <bitcoin-dev@lists•linuxfoundation.org> wrote:
>> Hi all,
>>
>> The following is a BIP-formatted design spec for compact block relay
>> designed to limit on wire bytes during block relay. You can find the
>> latest version of this document at
>> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.
> 
> Thanks Matt!
> 
> I've been testing this for a couple weeks (in various forms).  I've
> been getting over 96% reduction in block-bytes sent. I don't have a
> good metric for it, but bandwidth spikes are greatly reduced. The
> largest blocktxn message I've seen on a node that has been up for at
> least a day is 475736 bytes. For 94% of the blocks, less than 100kb must
> be sent in total.
> 
> In the opportunistic mode my measurements are showing 73% of blocks
> transferred with 0.5 RTT even without prediction, 87% if up to 4
> additional transactions are predicted, and 91% for 30 transactions (my
> rough estimate for the 10k maximum prediction suggested in the BIP).
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [bitcoin-dev] Compact Block Relay BIP
  2016-05-02 22:13 Matt Corallo
@ 2016-05-03  5:02 ` Gregory Maxwell
  2016-05-06  3:09   ` Matt Corallo
  2016-05-08  0:40 ` Johnathan Corgan
  2016-05-09 17:06 ` Pieter Wuille
  2 siblings, 1 reply; 22+ messages in thread
From: Gregory Maxwell @ 2016-05-03  5:02 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Dev, Luke Dashjr

On Mon, May 2, 2016 at 10:13 PM, Matt Corallo via bitcoin-dev
<bitcoin-dev@lists•linuxfoundation.org> wrote:
> Hi all,
>
> The following is a BIP-formatted design spec for compact block relay
> designed to limit on wire bytes during block relay. You can find the
> latest version of this document at
> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.

Thanks Matt!

I've been testing this for a couple weeks (in various forms).  I've
been getting over 96% reduction in block-bytes sent. I don't have a
good metric for it, but bandwidth spikes are greatly reduced. The
largest blocktxn message I've seen on a node that has been up for at
least a day is 475736 bytes. For 94% of the blocks, less than 100kb must
be sent in total.

In the opportunistic mode my measurements are showing 73% of blocks
transferred with 0.5 RTT even without prediction, 87% if up to 4
additional transactions are predicted, and 91% for 30 transactions (my
rough estimate for the 10k maximum prediction suggested in the BIP).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [bitcoin-dev] Compact Block Relay BIP
@ 2016-05-02 22:13 Matt Corallo
  2016-05-03  5:02 ` Gregory Maxwell
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Matt Corallo @ 2016-05-02 22:13 UTC (permalink / raw)
  To: Bitcoin Dev; +Cc: Luke Dashjr

Hi all,

The following is a BIP-formatted design spec for compact block relay
designed to limit on wire bytes during block relay. You can find the
latest version of this document at
https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.

There are several TODO items left on the document as indicated.
Additionally, the implementation linked at the bottom of the document
has a few remaining TODO items as well:

 * Only request compact-block-announcement from one or two peers at a
time, as the spec requires.
 * Request new blocks using MSG_CMPCT_BLOCK where appropriate.
 * Fill prefilledtxn with more than just the coinbase, as noted by the
spec, up to 10K in transactions.

Luke (CC'd): Can you assign a BIP number?

Thanks,
Matt

<pre>
  BIP: TODO
  Title: Compact block relay
  Author: Matt Corallo <bip@bluematt•me>
  Status: Draft
  Type: Standards Track
  Created: 2016-04-27
</pre>

==Abstract==

Compact blocks on the wire as a way to save bandwidth for nodes on the
P2P network.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

==Motivation==

Historically, the Bitcoin P2P protocol has not been very bandwidth
efficient for block relay. Every transaction in a block is included when
relayed, even though a large number of the transactions in a given block
are already available to nodes before the block is relayed. This causes
moderate inbound bandwidth spikes for nodes when receiving blocks, but
can cause very significant outbound bandwidth spikes for some nodes
which receive a block before their peers. When such spikes occur, buffer
bloat can make consumer-grade internet connections temporarily unusable,
and can delay the relay of blocks to remote peers who may choose to wait
instead of redundantly requesting the same block from other, less
congested, peers.

Thus, decreasing the bandwidth used during block relay is very useful
for many individuals running nodes.

While the goal of this work is explicitly not to reduce block transfer
latency, it does, as a side effect, reduce block transfer latencies in
some rather significant ways. Additionally, this work forms a foundation
for future work explicitly targeting low-latency block transfer.

==Specification==

===Intended Protocol Flow===
TODO: Diagrams

The protocol is intended to be used in two ways, depending on the peers
and bandwidth available, as discussed [[#Implementation_Details|later]].
The "high-bandwidth" mode, which nodes may only enable for a few of
their peers, is enabled by setting the first boolean to 1 in a
"sendcmpct" message. In this mode, peers send new block announcements
with the short transaction IDs already, possibly even before fully
validating the block. In some cases no further round-trip is needed, and
the receiver can reconstruct the block and process it as usual
immediately. When some transactions were not available from local
sources (ie mempool), a getblocktxn/blocktxn roundtrip is necessary,
bringing the best-case latency to the same 1.5*RTT minimum time that
nodes take today, though with significantly less bandwidth usage.

The "low-bandwidth" mode is enabled by setting the first boolean to 0 in
a "sendcmpct" message. In this mode, peers send new block announcements
with the usual inv/headers announcements (as per BIP130, and after fully
validating the block). The receiving peer may then request the block
using a MSG_CMPCT_BLOCK getdata request, which will receive a response
of the header and short transaction IDs. In some cases no further
round-trip is needed, and the receiver can reconstruct the block and
process it as usual, taking the same 1.5*RTT minimum time that nodes
take today, though with significantly less bandwidth usage. When some
transactions were not available from local sources (ie mempool), a
getblocktxn/blocktxn roundtrip is necessary, bringing the best-case
latency to 2.5*RTT, again with significantly less bandwidth usage than
today. Because TCP often exhibits worse transfer latency for larger data
sizes (as a multiple of RTT), total latency is expected to be reduced
even when the full 2.5*RTT transfer mechanism is used.

===New data structures===
Several new data structures are added to the P2P network to relay
compact blocks: PrefilledTransaction, HeaderAndShortIDs,
BlockTransactionsRequest, and BlockTransactions. Additionally, we
introduce a new variable-length integer encoding for use in these data
structures.

For the purposes of this section, CompactSize refers to the
variable-length integer encoding used across the existing P2P protocol
to encode array lengths, among other things, in 1, 3, 5 or 9 bytes.
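
As a rough illustration of the existing CompactSize encoding referred to
above, the following Python sketch may help. The function name is made
up for this example; the thresholds are those of the usual P2P
serialization (a one-byte value below 253, otherwise a marker byte
followed by a 2-, 4- or 8-byte little-endian integer).

```python
import struct

def encode_compact_size(n: int) -> bytes:
    """Encode n as a CompactSize in 1, 3, 5 or 9 bytes (illustrative helper)."""
    if n < 0 or n > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value does not fit in 64 bits")
    if n < 253:
        return struct.pack("<B", n)            # value stored directly
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)  # marker 0xFD + uint16
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)  # marker 0xFE + uint32
    return b"\xff" + struct.pack("<Q", n)      # marker 0xFF + uint64
```

For example, encode_compact_size(252) is the single byte 0xFC, while
encode_compact_size(253) already takes three bytes (0xFD 0xFD 0x00).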

====New VarInt====
TODO: I just copied this out of the src. Something that is
wiki-formatted and more descriptive should be used here instead.

Variable-length integers: bytes are a MSB base-128 encoding of the number.
The high bit in each byte signifies whether another digit follows. To make
sure the encoding is one-to-one, one is subtracted from all but the last
digit.
Thus, the byte sequence a[] with length len, where all but the last byte
has bit 128 set, encodes the number:

(a[len-1] & 0x7F) + sum(i=1..len-1, 128^i*((a[len-i-1] & 0x7F)+1))

Properties:
* Very small (0-127: 1 byte, 128-16511: 2 bytes, 16512-2113663: 3 bytes)
* Every integer has exactly one encoding
* Encoding does not depend on size of original integer type
* No redundancy: every (infinite) byte sequence corresponds to a list
  of encoded integers.

0:         [0x00]  256:        [0x81 0x00]
1:         [0x01]  16383:      [0xFE 0x7F]
127:       [0x7F]  16384:      [0xFF 0x00]
128:  [0x80 0x00]  16511:      [0xFF 0x7F]
255:  [0x80 0x7F]  65535: [0x82 0xFE 0x7F]
2^32:           [0x8E 0xFE 0xFE 0xFF 0x00]
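
The encoding described above can be sketched in Python as follows. This
is only an illustration written from the description in this section,
not the reference implementation; the function names are hypothetical.

```python
from typing import Tuple

def encode_new_varint(n: int) -> bytes:
    """Encode a non-negative integer as a New VarInt (MSB base-128)."""
    if n < 0:
        raise ValueError("New VarInts encode non-negative integers only")
    out = bytearray()
    while True:
        # Digits are produced least-significant first; every digit except
        # the first one produced (the last on the wire) gets the high bit.
        out.append((n & 0x7F) | (0x80 if out else 0x00))
        if n <= 0x7F:
            break
        # Subtracting one keeps the encoding one-to-one.
        n = (n >> 7) - 1
    out.reverse()
    return bytes(out)

def decode_new_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one New VarInt from data[pos:]; return (value, next position)."""
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1          # undo the subtraction done by the encoder
        else:
            return n, pos
```

For example, encode_new_varint(128) yields the bytes 0x80 0x00, and
decode_new_varint(bytes([0xFF, 0x7F])) yields the value 16511.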

Several uses of New VarInts below are "differentially encoded". For
these, instead of using raw indexes, the number encoded is the
difference between the current index and the previous index, minus one.
For example, a first index of 0 implies a real index of 0, a second
index of 0 thereafter refers to a real index of 1, etc.
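
A minimal Python sketch of the differential encoding follows. It assumes
that the "previous index" before the first entry is treated as -1, which
is what makes a first encoded value of 0 map to real index 0 as in the
example above; the function names are hypothetical.

```python
def to_differential(indexes):
    """Turn strictly increasing real indexes into differential values."""
    encoded = []
    previous = -1  # assumption: nothing precedes the first index
    for index in indexes:
        if index <= previous:
            raise ValueError("indexes must be strictly increasing")
        encoded.append(index - previous - 1)
        previous = index
    return encoded

def from_differential(values):
    """Recover the real indexes from differential values."""
    indexes = []
    previous = -1
    for value in values:
        previous = previous + value + 1
        indexes.append(previous)
    return indexes
```

With this convention the real indexes 0, 1, 5 are sent as 0, 0, 3.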

====PrefilledTransaction====
A PrefilledTransaction structure is used in HeaderAndShortIDs to provide
a list of a few transactions explicitly.

{|
|Field Name||Type||Size||Encoding||Purpose
|-
|index||New VarInt||1-3 bytes||[[#New_VarInt|New VarInt]],
differentially encoded since the last PrefilledTransaction in a
list||The index into the block at which this transaction is located
|-
|tx||Transaction||variable||As encoded in "tx" messages||The transaction
which is in the block at index index.
|}

====HeaderAndShortIDs====
A HeaderAndShortIDs structure is used to relay a block header, the short
transaction IDs used for matching already-available transactions, and a
select few transactions which we expect a peer may be missing.

{|
|Field Name||Type||Size||Encoding||Purpose
|-
|header||Block header||80 bytes||First 80 bytes of the block as defined
by the encoding used by "block" messages||The header of the block being
provided
|-
|nonce||uint64_t||8 bytes||Little Endian||A nonce for use in short
transaction ID calculations
|-
|shortids_length||CompactSize||1, 3, 5, or 9 bytes||As used elsewhere to
encode array lengths||The number of short transaction IDs in shortids
|-
|shortids||List of uint64_ts||8*shortids_length bytes||Little
Endian||The short transaction IDs calculated from the transactions which
were not provided explicitly in prefilledtxn
|-
|prefilledtxn_length||CompactSize||1, 3, 5, or 9 bytes||As used
elsewhere to encode array lengths||The number of prefilled transactions
in prefilledtxn
|-
|prefilledtxn||List of PrefilledTransactions||variable
size*prefilledtxn_length||As defined by PrefilledTransaction definition,
above||Used to provide the coinbase transaction and a select few which
we expect a peer may be missing
|}

====BlockTransactionsRequest====
A BlockTransactionsRequest structure is used to list transaction indexes
in a block being requested.

{|
|Field Name||Type||Size||Encoding||Purpose
|-
|blockhash||Binary blob||32 bytes||The output from a double-SHA256 of
the block header, as used elsewhere||The blockhash of the block which
the transactions being requested are in
|-
|indexes_length||New VarInt||1-3 bytes||As defined in [[#New_VarInt|New
VarInt]]||The number of transactions being requested
|-
|indexes||List of New VarInts||1-3 bytes*indexes_length||As defined in
[[#New_VarInt|New VarInt]], differentially encoded||The indexes of the
transactions being requested in the block
|}

====BlockTransactions====
A BlockTransactions structure is used to provide some of the
transactions in a block, as requested.

{|
|Field Name||Type||Size||Encoding||Purpose
|-
|blockhash||Binary blob||32 bytes||The output from a double-SHA256 of
the block header, as used elsewhere||The blockhash of the block which
the transactions being provided are in
|-
|transactions_length||New VarInt||1-3 bytes||As defined in
[[#New_VarInt|New VarInt]]||The number of transactions provided
|-
|transactions||List of Transactions||variable||As encoded in "tx"
messages||The transactions provided
|}

====Short transaction IDs====
Short transaction IDs are used to represent a transaction without
sending a full 256-bit hash. They are calculated by:
# single-SHA256 hashing the block header with the nonce appended (in
little-endian)
# XORing each 8-byte chunk of the double-SHA256 transaction hash with
each corresponding 8-byte chunk of the hash from the previous step
# Adding each of the XORed 8-byte chunks together (in little-endian)
iteratively to find the short transaction ID
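
One possible reading of the three steps above, as a Python sketch. The
text does not spell out overflow behaviour, so this sketch assumes that
the 8-byte chunks are read as little-endian unsigned integers and that
the additions wrap modulo 2^64; both function names are hypothetical.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1  # assumption: sums wrap to 64 bits

def combine_hashes(key: bytes, txhash: bytes) -> int:
    """XOR matching 8-byte chunks of two 32-byte hashes and add them up."""
    if len(key) != 32 or len(txhash) != 32:
        raise ValueError("both inputs must be 32 bytes long")
    total = 0
    for k, t in zip(struct.unpack("<4Q", key), struct.unpack("<4Q", txhash)):
        total = (total + (k ^ t)) & MASK64
    return total

def short_txid(header: bytes, nonce: int, raw_tx: bytes) -> int:
    """Short transaction ID for a serialized transaction (sketch only)."""
    if len(header) != 80:
        raise ValueError("a block header is 80 bytes")
    # Step 1: single SHA256 of the header with the little-endian nonce appended.
    key = hashlib.sha256(header + struct.pack("<Q", nonce)).digest()
    # The transaction hash is the usual double-SHA256 of the transaction.
    txhash = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    # Steps 2 and 3: XOR the chunks pairwise, then add the results together.
    return combine_hashes(key, txhash)
```

Note that the step 1 hash depends only on the header and nonce, so it
can be computed once per block and reused for every transaction.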

===New messages===
A new inv type (MSG_CMPCT_BLOCK == 4) and several new protocol messages
are added: sendcmpct, cmpctblock, getblocktxn, and blocktxn.

====sendcmpct====
# The sendcmpct message is defined as a message containing a 1-byte
integer followed by an 8-byte integer where pchCommand == "sendcmpct".
# The first integer SHALL be interpreted as a boolean (and MUST have a
value of either 1 or 0)
# The second integer SHALL be interpreted as a little-endian version
number. Nodes sending a sendcmpct message MUST currently set this value
to 1.
# Upon receipt of a "sendcmpct" message with the first and second
integers set to 1, the node SHOULD announce new blocks by sending a
cmpctblock message.
# Upon receipt of a "sendcmpct" message with the first integer set to 0,
the node SHOULD NOT announce new blocks by sending a cmpctblock message,
but SHOULD announce new blocks by sending invs or headers, as defined by
BIP130.
# Upon receipt of a "sendcmpct" message with the second integer set to
something other than 1, nodes SHOULD treat the peer as if they had not
received the message (as it indicates the peer will provide an
unexpected encoding in cmpctblock, and/or other, messages)
# Nodes SHOULD check for a protocol version of >= 70014 before sending
sendcmpct messages.
# Nodes MUST NOT send a request for a MSG_CMPCT_BLOCK object to a peer
before having received a sendcmpct message from that peer.
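
To make the sendcmpct layout concrete, here is a small Python sketch of
the 9-byte payload only (the surrounding P2P message header is not
shown). The function names and the choice to return None for an
unsupported version are assumptions made for this example.

```python
import struct

SUPPORTED_VERSION = 1

def build_sendcmpct_payload(announce: bool, version: int = SUPPORTED_VERSION) -> bytes:
    """1-byte boolean followed by an 8-byte little-endian version."""
    return struct.pack("<BQ", 1 if announce else 0, version)

def parse_sendcmpct_payload(payload: bytes):
    """Return True/False for the announce flag, or None if the version
    is not understood (the peer is then treated as if it sent nothing)."""
    if len(payload) != 9:
        raise ValueError("sendcmpct payload must be 9 bytes")
    flag, version = struct.unpack("<BQ", payload)
    if flag not in (0, 1):
        raise ValueError("first integer must be 0 or 1")
    if version != SUPPORTED_VERSION:
        return None
    return bool(flag)
```

A node wanting compact block announcements from a peer would send
build_sendcmpct_payload(True); one that only wants to be able to request
compact blocks would send build_sendcmpct_payload(False).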

====MSG_CMPCT_BLOCK====
# getdata messages may now contain requests for MSG_CMPCT_BLOCK objects.
# Upon receipt of a getdata containing a request for a MSG_CMPCT_BLOCK
object with the hash of a block which was recently announced and after
having sent the requesting peer a sendcmpct message, nodes MUST respond
with a cmpctblock message containing appropriate data representing the
block being requested.
# MSG_CMPCT_BLOCK inv objects MUST NOT appear anywhere except for in
getdata messages.

====cmpctblock====
# The cmpctblock message is defined as a message containing a
serialized HeaderAndShortIDs message and pchCommand == "cmpctblock".
# Upon receipt of a cmpctblock message after sending a sendcmpct
message, nodes SHOULD calculate the short transaction ID for each
unconfirmed transaction they have available (ie in their mempool) and
compare each to each short transaction ID in the cmpctblock message.
# After finding already-available transactions, nodes which do not have
all transactions available to reconstruct the full block SHOULD request
the missing transactions using a getblocktxn message.
# A node MUST NOT send a cmpctblock message unless they are able to
respond to a getblocktxn message which requests every transaction in the
block.
# A node MUST NOT send a cmpctblock message without having validated
that the header properly commits to each transaction in the block, and
properly builds on top of the existing chain with a valid proof-of-work.
A node MAY send a cmpctblock before validating that each transaction in
the block validly spends existing UTXO set entries.

====getblocktxn====
# The getblocktxn message is defined as a message containing a
serialized BlockTransactionsRequest message and pchCommand == "getblocktxn".
# Upon receipt of a properly-formatted getblocktxn message, nodes which
recently provided the sender of such a message a cmpctblock for the
block hash identified in this message MUST respond with an appropriate
blocktxn message. Such a blocktxn message MUST contain exactly and only
each transaction which is present in the appropriate block at the index
specified in the getblocktxn indexes list, in the order requested.

====blocktxn====
# The blocktxn message is defined as a message containing a
serialized BlockTransactions message and pchCommand == "blocktxn".
# Upon receipt of a properly-formatted requested blocktxn message, nodes
SHOULD attempt to reconstruct the full block by:
## Taking the prefilledtxn transactions from the original cmpctblock and
placing them in the marked positions.
## For each short transaction ID from the original cmpctblock, in order,
find the corresponding transaction either from the blocktxn message or
from other sources and place it in the first available position in the
block.
# Once the block has been reconstructed, it shall be processed as
normal, keeping in mind that short transaction IDs are expected to
occasionally collide, and that nodes MUST NOT be penalized for such
collisions, wherever they appear.
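
The reconstruction steps above can be sketched in Python roughly as
follows. Transactions are treated as opaque values, the prefilled
indexes are assumed to be already converted to absolute positions, and
short ID collisions are not handled; all names are hypothetical.

```python
def reconstruct_block(prefilled, shortids, available):
    """Rebuild the ordered transaction list of a block.

    prefilled: list of (absolute_index, transaction) pairs
    shortids:  short transaction IDs, in block order
    available: dict mapping short ID -> transaction (mempool, blocktxn, ...)
    Returns (slots, missing), where missing lists unfilled positions.
    """
    total = len(prefilled) + len(shortids)
    slots = [None] * total
    # Step 1: place prefilled transactions at their marked positions.
    for index, tx in prefilled:
        if not 0 <= index < total or slots[index] is not None:
            raise ValueError("bad prefilled index")
        slots[index] = tx
    # Step 2: each short ID, in order, goes to the first free position.
    missing = []
    remaining = iter(shortids)
    for position in range(total):
        if slots[position] is not None:
            continue
        tx = available.get(next(remaining))
        if tx is None:
            missing.append(position)  # would be requested via getblocktxn
        else:
            slots[position] = tx
    return slots, missing
```

If missing is non-empty after the first pass, those positions are the
indexes a node would list in its getblocktxn request.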

===Implementation Notes===
# For nodes which have sufficient inbound bandwidth, sending a sendcmpct
message with the first integer set to 1 to up to three peers is
RECOMMENDED. If possible, it is RECOMMENDED that those peers be selected
based on their past performance in providing blocks quickly. This will
allow them to receive some blocks in only 0.5*RTT between them and the
sending peer. It will also reduce their block transfer latency in other
cases due to the smaller amount of data transmitted. Nodes MUST NOT send
such sendcmpct messages to all peers, as it encourages wasting outbound
bandwidth across the network.

# All nodes SHOULD send a sendcmpct message to all appropriate peers.
This will reduce their outbound bandwidth usage by allowing their peers
to request compact blocks instead of full blocks.

# Nodes with limited inbound bandwidth SHOULD request blocks using
MSG_CMPCT_BLOCK/getblocktxn requests, when possible. While this
increases worst-case message round-trips, it is expected to reduce
overall transfer latency as TCP is more likely to exhibit poor
throughput on low-bandwidth nodes.

# Nodes sending cmpctblock messages SHOULD make an attempt to not place
too many transactions into prefilledtxn (ie should limit prefilledtxn to
only around 10KB of transactions). When in doubt, nodes SHOULD only
include the coinbase transaction in prefilledtxn.

# Nodes MAY pick one nonce per block they wish to send, and only build a
cmpctblock message once for all peers which they wish to send a given
block to. Nodes SHOULD NOT use the same nonce across multiple different
blocks.

# Nodes MAY impose additional requirements on when they announce new
blocks by sending cmpctblock messages. For example, nodes with limited
outbound bandwidth MAY choose to announce new blocks using inv/header
messages (as per BIP130) to conserve outbound bandwidth.

# Note that the MSG_CMPCT_BLOCK section does not require that nodes
respond to MSG_CMPCT_BLOCK getdata requests for blocks which they did
not recently announce. This allows nodes to calculate cmpctblock
messages at announce-time instead of at request-time. Thus, nodes MUST
NOT request blocks using MSG_CMPCT_BLOCK getdatas unless it is in
response to an inv/headers block announcement (as per BIP130), and MUST
NOT request blocks using MSG_CMPCT_BLOCK getdatas in response to headers
messages which were, themselves, responses to getheaders requests.

# While the current version sends transactions with the same encodings
as is used in tx messages and elsewhere in the protocol, the version
field in sendcmpct is intended to allow this to change in the future.
For this reason, it is recommended that the code used to decode
PrefilledTransaction and BlockTransactions messages be prepared to take
a different transaction encoding, if and when the version field in
sendcmpct changes in a future BIP.

==Justification==

====Protocol design====
There have been many proposals to save wire bytes when relaying blocks.
Many of them have a two-fold goal of reducing block relay time and thus
rely on the use of significant processing power in order to avoid
introducing additional worst-case RTTs. Because this work is not focused
primarily on reducing block relay time, its design is much simpler (ie
does not rely on set reconciliation protocols). Still, in testing at the
time of writing, nodes are able to relay blocks without the extra
getblocktxn/blocktxn RTT around 90% of the time. With a smart
compact-block-announcement policy, it is thus expected that this work
might allow blocks to be relayed between nodes in 0.5*RTT instead of
1.5*RTT at least 75% of the time.

====Use of New VarInts====
Bitcoin has long had a variable-length integer implementation (referred
to as CompactSize in this document), making a second one a strange
protocol quirk. However, in this protocol most of our variable-length
integers are between 0 and 2000. For both encodings, small numbers
(<100) are encoded as 1 byte. For numbers of 253 and above, the
CompactSize encoding begins to use 3 bytes instead of 1, whereas the New
VarInt encoding uses 2 (for values up to 16511).
Because the primary motivation for this work is to save bytes during
block relay, the extra byte of saving per transaction-difference is
considered worth the extra design complexity.
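
The size trade-off can be checked with a short Python sketch that
computes the encoded length under each scheme. The helper names are
hypothetical, and the lengths follow the encodings as described in this
document.

```python
def compact_size_len(n: int) -> int:
    """Bytes needed to encode n as a CompactSize."""
    if n < 253:
        return 1
    if n <= 0xFFFF:
        return 3
    if n <= 0xFFFFFFFF:
        return 5
    return 9

def new_varint_len(n: int) -> int:
    """Bytes needed to encode n as a New VarInt."""
    length = 1
    while n > 0x7F:
        n = (n >> 7) - 1
        length += 1
    return length

if __name__ == "__main__":
    for value in (0, 100, 127, 128, 252, 253, 2000, 16511, 16512):
        print(value, compact_size_len(value), new_varint_len(value))
```

Under these definitions values from 253 to 16511 take 2 bytes as a New
VarInt against 3 as a CompactSize, while values from 128 to 252 take 2
bytes against 1.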

====Short transaction ID calculation====
The short transaction ID calculation is designed to take absolutely
minimal processing time during block compaction to avoid introducing
serious DoS vulnerabilities such as those introduced by the
bloom-filtering in BIP 37. As such, it is possible for a node to
construct one compact-block representation of a block for relay to
multiple peers. Additionally, only one cryptographic hash (2 SHA rounds)
is used when calculating the short transaction IDs for an entire block.

The XOR-and-add method is used for calculating short transaction IDs
primarily because it is fast and is reasonably able to limit the ability
of an attacker who does not know the block hash or nonce to cause
collisions in short transaction IDs. If an attacker were able to cause
such collisions, filling mempools (and, thus, blocks) with them would
cause poor network propagation of new (or non-attacker, in the case of a
miner) blocks.

The 8-byte nonce in short transaction ID calculation is used to
introduce additional entropy on a per-node level. While the use of 8
bytes is sufficient for an attacker to maliciously cause short
transaction ID collisions in their own block relay, this would have less
of an effect than if such an attacker were relaying headers/invs and not
responding to requests for the full block.

==Backward compatibility==

Older clients remain fully compatible and interoperable after this change.

==Implementation==

https://github.com/TheBlueMatt/bitcoin/tree/udp

==Acknowledgements==

Thanks to Gregory Maxwell for the initial suggestion as well as a lot of
back-and-forth design and significant testing.

==Copyright==

This document is placed in the public domain.



