public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
From: Mike Hearn <mike@plan99•net>
To: Peter Todd <pete@petertodd•org>
Cc: Bitcoin Dev <bitcoin-development@lists•sourceforge.net>
Subject: Re: [Bitcoin-development] Bloom bait
Date: Tue, 10 Jun 2014 18:38:23 +0800	[thread overview]
Message-ID: <CANEZrP33n+UBE+0Zb_mh=+qjJA+C+Nny9quC5B0HpuLC1XygMA@mail.gmail.com> (raw)
In-Reply-To: <20140608213534.GA4191@savin>

[-- Attachment #1: Type: text/plain, Size: 3107 bytes --]

>
> As I explained in the email you're replying to and didn't quote, bloom
> filters has O(n) cost per query, so sending different bloom filters to
> different peers for privacy reasons costs the network significant disk
> IO resources. If I were to actually implement it it'd look like a DoS
> attack on the network.
>

DoS attack? Nice try.

Performance is subtle, disk iops especially so. I suspect you'd find - if
you implemented it - that for the kinds of loads Bitcoin is processing both
today and tomorrow prefix filtering either doesn't save any disk seeks or
actively makes it worse.

Consider a client that is syncing the last 24 hours of chain. bitcoind
pre-allocates space for blocks in large chunks, so most blocks are laid out
sequentially on disk. Almost all the cost of a disk read is rotational
latency. Once the head is in place data can be read in very fast and modern
kernels will attempt to adaptively read ahead in order to exploit this,
especially if a program seems to be working through a disk file
sequentially. The work of Bloom filtering parts of the chain for this
client boils down to a handful of disk seeks at best and the rest of the
work is all CPU/memory bound as the block is parsed into objects and tested
against the filter. A smarter filtering implementation than ours could do
SAX-style parsing of the block and avoid the overhead of turning it all
into objects.

Now consider a prefix filtering implementation. You need to calculate a
sorted list of all the data elements and tx hashes in the block, that maps
to the location in the block where the tx data can be found. These
per-block indexes take up extra disk space and, realistically, would likely
be implemented using LevelDB as that's a tool which is designed for
creating and using these kinds of tables, so then you're both loading the
block data itself (blocks are sized about right currently to always fit in
the default kernel readahead window) AND also seeking through the indexes,
and building them too. A smart implementation might try and pack the index
next to each block so it's possible to load both at once with a single
seek, but that would probably be more work, as it'd force building of the
index to be synchronous with saving the block to disk thus slowing down
block relay. In contrast a LevelDB based index would do the bulk of the
index-building work on a separate core.

At *some* block size and client load the additional data storage and
increased pressure on the page cache would probably make it worthwhile. But
I find it unlikely to be true at current traffic levels, or double or
triple today's levels. So I'd rather we spend our very limited collective
time on finding ways to increase usage rather than worrying about resources
which are not presently scarce.

(as an aside, some of the above analysis would be invalidated if most nodes
end up running on SSDs, but I doubt most are. It'd be neat to export
storage tech via some kind of stats message - LevelDB is designed for HDDs
not SSDs so at some point a new storage subsystem might make sense if the
network switched over).

[-- Attachment #2: Type: text/html, Size: 3555 bytes --]

  reply	other threads:[~2014-06-10 10:38 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-06  8:19 [Bitcoin-development] NODE_BLOOM service bit Peter Todd
2014-06-06  8:48 ` Adam Back
2014-06-06  9:03   ` Gregory Maxwell
2014-06-06  9:11     ` Peter Todd
2014-06-06  9:04   ` Peter Todd
2014-06-06 10:45     ` Adam Back
2014-06-06 16:46       ` [Bitcoin-development] Bloom bait Peter Todd
2014-06-06 16:58         ` Gregory Maxwell
2014-06-06 17:05           ` Peter Todd
2014-06-06 17:10             ` Gregory Maxwell
2014-06-06 17:45               ` Peter Todd
2014-06-07 11:22                 ` Mike Hearn
2014-06-07 19:44                   ` Alan Reiner
2014-06-08 21:45                     ` Peter Todd
2014-06-10 10:41                       ` Mike Hearn
2014-06-08 21:35                   ` Peter Todd
2014-06-10 10:38                     ` Mike Hearn [this message]
2014-06-10 13:02                       ` Jeff Garzik
2014-06-10 17:08                       ` Peter Todd
2014-06-11  8:57                         ` Mike Hearn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANEZrP33n+UBE+0Zb_mh=+qjJA+C+Nny9quC5B0HpuLC1XygMA@mail.gmail.com' \
    --to=mike@plan99$(echo .)net \
    --cc=bitcoin-development@lists$(echo .)sourceforge.net \
    --cc=pete@petertodd$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox