* [bitcoin-dev] INV overhead and batched INVs to reduce full node traffic
From: Jonathan Toomim @ 2016-02-26  5:35 UTC
  To: Bitcoin Dev



The INV scheme used by Bitcoin is not very efficient at all. Once you take into account Bitcoin, TCP (including ACKs), IP, and Ethernet overheads, each INV takes 193 bytes, according to Wireshark. That's 127 bytes for the INV message and 66 bytes for the ACK. All of this is for 32 bytes of payload, for an "efficiency" of 16.5% (i.e. 83.5% overhead). For a 400-byte transaction with 20 peers, this can result in 3860 bytes sent in INVs for only 400 bytes of actual data.

An improvement that I've been thinking about implementing (after Blocktorrent) is an option for batched INVs. Including the hashes for two txes per IP packet instead of one would increase the INV size to 229 bytes for 64 bytes of payload -- that is, you add 36 bytes to the packet for every 32 bytes of actual payload. This is a marginal efficiency of 88.8% for each hash after the first. This is *much* better.
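
(For concreteness, here is the arithmetic behind the figures above as a small Python sketch. The framing assumed is Ethernet + IPv4 + TCP with the 12-byte timestamp option, which is what reproduces the 127- and 66-byte packet sizes seen in Wireshark.)

    # Per-announcement cost of a single-entry INV, assuming Ethernet + IPv4
    # + TCP with the 12-byte timestamp option (matches the Wireshark figures).
    ETH, IP, TCP = 14, 20, 32     # link / network / transport headers
    BTC_HDR = 24                  # magic + command + length + checksum
    INV_ENTRY = 36                # 4-byte type + 32-byte tx hash
    HASH = 32                     # the payload we actually care about

    inv_pkt = ETH + IP + TCP + BTC_HDR + 1 + INV_ENTRY  # 127 (1 = count varint)
    ack_pkt = ETH + IP + TCP                            # 66
    per_inv = inv_pkt + ack_pkt                         # 193 bytes per hash announced
    print(per_inv, HASH / per_inv)    # 193, ~0.165 -> the ~16.5% "efficiency"
    print(20 * per_inv)               # 3860 bytes of INV traffic for 20 peers

    # A second hash in the same packet only adds one more 36-byte entry:
    two_entry = per_inv + INV_ENTRY                     # 229 bytes for 64 bytes of payload
    print(two_entry, HASH / INV_ENTRY)                  # 229, ~0.889 marginal efficiency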

Waiting a short period of time to accumulate several hashes and send them as a batched INV could easily reduce the traffic of running bitcoin nodes by a factor of 2, and possibly even more than that. However, if too many people used it, such a technique would slightly slow down the propagation of transactions across the bitcoin network, which might make some people unhappy. The ill effects could likely be mitigated by choosing a different batch size for each peer based on that peer's preferences. Each node could choose one or two peers to which it sends INVs in batches of one or two, four more peers to which it sends batches of two to four, and the rest in batches of four to eight, for example.
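
(A hypothetical sketch of such a per-peer policy; the peer ordering, tier sizes, and batch sizes are just the illustrative numbers above, not an implementation.)

    # Hypothetical per-peer batching policy. peer_rank is an assumed ordering
    # of peers (0 = the peers we relay to most promptly).
    def target_batch_size(peer_rank):
        if peer_rank < 2:    # one or two fast peers
            return 2         # send INVs in batches of one or two
        if peer_rank < 6:    # four more peers
            return 4         # batches of two to four
        return 8             # everyone else: batches of four to eight

    print([target_batch_size(r) for r in range(10)])
    # -> [2, 2, 4, 4, 4, 4, 8, 8, 8, 8]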

(This is a continuation of a conversation started on https://bitcointalk.org/index.php?topic=1377345.)

Jonathan


* Re: [bitcoin-dev] INV overhead and batched INVs to reduce full node traffic
From: Gregory Maxwell @ 2016-02-26  5:56 UTC
  To: Jonathan Toomim; +Cc: Bitcoin Dev

On Fri, Feb 26, 2016 at 5:35 AM, Jonathan Toomim via bitcoin-dev
<bitcoin-dev@lists.linuxfoundation.org> wrote:
> An improvement that I've been thinking about implementing (after
> Blocktorrent) is an option for batched INVs. Including the hashes for two
> txes per IP packet instead of one would increase the INV size to 229 bytes
> for 64 bytes of payload -- that is, you add 36 bytes to the packet for every
> 32 bytes of actual payload. This is a marginal efficiency of 88.8% for each
> hash after the first. This is *much* better.
>
> Waiting a short period of time to accumulate several hashes together and
> send them as a batched INV could easily reduce the traffic of running
> bitcoin nodes by a factor of 2,

Copying my response to you from BitcoinTalk
(https://bitcointalk.org/index.php?topic=1377345.msg14013294#msg14013294):

Uh. Bitcoin has done this since the very early days. The batching
was temporarily somewhat hobbled between 0.10 and 0.12 (especially
when you had any abusive, frequently pinging peers attached), but it
is now fully functional again and manages to batch many transactions
per INV pretty effectively. Turn on net message debugging and you'll
see that many INVs are much larger than the minimum. The average
batching interval (ignoring the trickle cut-through) is about 5
seconds, and an INV usually carries about 10 transactions. My
measurements were with these fixes in effect; I expect the blocksonly
savings would have been higher otherwise.

2016-02-26 05:47:08 sending: inv (1261 bytes) peer=33900
2016-02-26 05:47:08 sending: inv (109 bytes) peer=32460
2016-02-26 05:47:08 sending: inv (37 bytes) peer=34501
2016-02-26 05:47:08 sending: inv (217 bytes) peer=33897
2016-02-26 05:47:08 sending: inv (145 bytes) peer=41863
2016-02-26 05:47:08 sending: inv (37 bytes) peer=35725
2016-02-26 05:47:08 sending: inv (73 bytes) peer=20567
2016-02-26 05:47:08 sending: inv (289 bytes) peer=44703
2016-02-26 05:47:08 sending: inv (73 bytes) peer=13408
2016-02-26 05:47:09 sending: inv (649 bytes) peer=41279
2016-02-26 05:47:09 sending: inv (145 bytes) peer=42612
2016-02-26 05:47:09 sending: inv (325 bytes) peer=34525
2016-02-26 05:47:09 sending: inv (181 bytes) peer=41174
2016-02-26 05:47:09 sending: inv (469 bytes) peer=41460
2016-02-26 05:47:10 sending: inv (973 bytes) peer=133
2016-02-26 05:47:10 sending: inv (361 bytes) peer=20541
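
(To put numbers on the log above: assuming each logged size is the serialized inv payload -- a 1-byte count followed by 36 bytes per entry -- the batch sizes work out as follows.)

    # Decode entry counts from the logged inv payload sizes, assuming
    # payload = 1-byte count (valid below 253 entries) + 36 bytes per entry.
    sizes = [1261, 109, 37, 217, 145, 37, 73, 289,
             73, 649, 145, 325, 181, 469, 973, 361]
    counts = [(s - 1) // 36 for s in sizes]
    print(counts)                     # [35, 3, 1, 6, 4, 1, 2, 8, 2, 18, 4, 9, 5, 13, 27, 10]
    print(sum(counts) / len(counts))  # 9.25 transactions per INV on average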

Twiddling here doesn't change the asymptotic efficiency, though,
which is what my post is about.

[I'm also somewhat surprised that you were unaware of this; one of
the patches "classic" was talking about reverting was the one that
restored the batching... due to a transaction deanonymization
service (or troll) claiming it interfered with their operations.]



* Re: [bitcoin-dev] INV overhead and batched INVs to reduce full node traffic
From: Jonathan Toomim @ 2016-02-26  7:50 UTC
  To: Gregory Maxwell; +Cc: Bitcoin Dev



> On Feb 25, 2016, at 9:56 PM, Gregory Maxwell <greg@xiph.org> wrote:
> The batching was
> temporarily somewhat hobbled between 0.10 and 0.12 (especially when
> you had any abusive frequently pinging peers attached), but is now
> fully functional again and it now manages to batch many transactions
> per INV pretty effectively.

Thanks for the response. I've been mostly using and working on 0.11-series versions, which very rarely send out INV batches; in my examination, about 85% of the packets contained only a single hash. Nice to know this is one of the other improvements in 0.12.




* Re: [bitcoin-dev] INV overhead and batched INVs to reduce full node traffic
From: Jonathan Toomim @ 2016-02-27  9:08 UTC
  To: Bitcoin Dev


Well, here's another idea: we could shorten the tx hashes to about 4 to 6 bytes instead of 32.

Let's say we have a 1 GB mempool with 2M transactions in it. A 4-byte shorthash would have a 0.046% chance of colliding with another transaction in our mempool, assuming a random distribution of hash values.

Of course, an attacker might construct transactions specifically to produce collisions. To protect against that, we set up a different salt value for each connection, and in the INV message we use a 4 to 6 byte salted hash instead of the full thing. If a peer does have a collision under one salt value, there are still 7 other peers with different salt values. With a 4-byte hash, the probability that all 8 of a node's peers collide on the same transaction is about 2.2e-27. If we have 500,000 full nodes and 1M transactions per 10 minutes, the chance is about 1.1e-15 that even one node misses even one transaction.
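
(A quick check of those probabilities, assuming 8 independently salted peers, a 2M-transaction mempool, and uniformly distributed 4-byte hashes.)

    # Collision probabilities for 4-byte salted shorthashes.
    mempool_txs = 2_000_000
    p_one_peer = 1 - (1 - 2**-32) ** mempool_txs  # ~4.66e-4, i.e. ~0.046%
    p_all_8 = p_one_peer ** 8                     # ~2.2e-27
    nodes, txs_per_10min = 500_000, 1_000_000
    print(p_one_peer, p_all_8, p_all_8 * nodes * txs_per_10min)  # last: ~1.1e-15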

This strategy would come with about 12 bytes of additional memory overhead per peer per tx, or maybe a little more. In exchange for those 12 bytes per peer*tx, we would save up to 28 bytes per peer*tx of network bandwidth. In typical conditions (e.g. a 100-ish MB mempool, 16 peers, 2 MB blocks, 500 B serialized tx size), that could result in 1.792 MB of net traffic saved per block (7.7 GB/month) at the expense of 12 MB of RAM. Overall, this technique might reduce INV traffic by 5-8x in the asymptotic case, or maybe 2-3x in a realistic case.
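
(The bandwidth arithmetic for those "typical conditions", as a sketch.)

    # Net INV bandwidth saved per block with 4-byte shorthashes.
    saved_per_announcement = 32 - 4      # full hash vs. 4-byte shorthash
    peers = 16
    tx_per_block = 2_000_000 // 500      # 2 MB block / 500 B per tx = 4000 txs
    saved_per_block = saved_per_announcement * peers * tx_per_block
    print(saved_per_block)                   # 1792000 bytes ~= 1.792 MB
    print(saved_per_block * 144 * 30 / 1e9)  # ~7.7 GB/month (144 blocks/day)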

I know short hashes like this have been proposed many times before for block propagation (e.g. by Gavin in his O(1) scaling gist, or in XTB). Has anyone else thought of using them like this in INV messages? Can anyone think of any major problems with the idea?
