From: Peter Todd <pete@petertodd•org>
To: Mike Hearn <mike@plan99•net>, Jeff Garzik <jgarzik@bitpay•com>
Cc: Bitcoin Dev <bitcoin-development@lists•sourceforge.net>
Subject: Re: [Bitcoin-development] Bloom bait
Date: Tue, 10 Jun 2014 13:08:46 -0400	[thread overview]
Message-ID: <20140610170846.GB21293@savin> (raw)
In-Reply-To: <CAJHLa0OJffXU_LkmVhCJ80mphc5zEPuDZzuSKvpLkrNUWU7Y1w@mail.gmail.com> <CANEZrP33n+UBE+0Zb_mh=+qjJA+C+Nny9quC5B0HpuLC1XygMA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4389 bytes --]

On Tue, Jun 10, 2014 at 06:38:23PM +0800, Mike Hearn wrote:
> >
> > As I explained in the email you're replying to and didn't quote, bloom
> > filters have O(n) cost per query, so sending different bloom filters to
> > different peers for privacy reasons costs the network significant disk
> > IO resources. If I were to actually implement it, it'd look like a DoS
> > attack on the network.
> >
> 
> DoS attack? Nice try.

Suppose I wrote a single-address lookup tool for Android that used bloom
filters to find the history of a specific address. Being on mobile, I
don't want to use too much bandwidth, so I'd use as specific a bloom
filter as possible, and I might connect to multiple peers to speed up
the lookup.

Is this any different from my bloom filter IO attack code? Nope. Hence,
splitting up bloom filter requests among peers for better privacy will
look just like a DoS attack and will greatly increase the load on the
network.


> Now consider a prefix filtering implementation. You need to calculate a
> sorted list of all the data elements and tx hashes in the block, that maps
> to the location in the block where the tx data can be found. These
> per-block indexes take up extra disk space and, realistically, would likely
> be implemented using LevelDB as that's a tool which is designed for
> creating and using these kinds of tables, so then you're both loading the
> block data itself (blocks are sized about right currently to always fit in
> the default kernel readahead window) AND also seeking through the indexes,
> and building them too. A smart implementation might try and pack the index
> next to each block so it's possible to load both at once with a single
> seek, but that would probably be more work, as it'd force building of the
> index to be synchronous with saving the block to disk thus slowing down
> block relay. In contrast a LevelDB based index would do the bulk of the
> index-building work on a separate core.

Those are exactly the kinds of optimizations obelisk is implementing to
make its prefix lookup database fast. Those optimizations are also
situation dependent: for instance, "packing the index next to each
block" is irrelevant if you put archival blockchain data on a slow HD
and the indexes on a fast SSD, something some obelisk servers do.
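
For illustration, here's a minimal sketch of the kind of prefix index
being described, using Python's sqlite3 as a stand-in for LevelDB's
sorted key space; the element, height, and offset values are made up
and this is not obelisk's actual schema:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE prefix_index (prefix BLOB, height INTEGER, offset INTEGER)")
    db.execute("CREATE INDEX by_prefix ON prefix_index (prefix)")

    def index_element(element, height, offset, prefix_len=4):
        """Record where a data element (txid, scriptPubKey push, ...) lives."""
        db.execute("INSERT INTO prefix_index VALUES (?, ?, ?)",
                   (element[:prefix_len], height, offset))

    def lookup(prefix):
        """Return candidate (height, offset) pairs; the caller reads the
        block at each location and discards prefix collisions."""
        return db.execute("SELECT height, offset FROM prefix_index WHERE prefix = ?",
                          (prefix,)).fetchall()

    index_element(bytes.fromhex("a1b2c3d4e5"), height=305000, offset=18231)
    print(lookup(bytes.fromhex("a1b2c3d4")))

Splitting the index out like this is what makes the SSD/HD arrangement
above work: the index lives on fast storage while the archival block
files stay on slow disk.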

More to the point, you're showing quite clearly that there isn't just
one optimal way to do it. Applying a bloom filter, a prefix filter, or
some as-yet-unknown filter to blockchain data is a service, and that
service has different tradeoffs compared to just serving up archival
block history. There is zero reason not to make that service something
you advertise with NODE_BLOOM - after all, you already have the code in
bitcoinj to do the exact same thing by checking the advertised protocol
version.
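
The client-side check is only a few lines. A hypothetical sketch,
assuming NODE_BLOOM gets service bit 2 and applies from protocol
version 70011 (the values BIP 111 eventually used), with the fallback
being the protocol-version check bitcoinj already performs:

    NODE_BLOOM = 1 << 2              # assumed service bit for filtering
    FILTER_SERVICE_VERSION = 70011   # assumed cut-over protocol version

    def peer_accepts_filters(services, protocol_version):
        # New-style check: the peer explicitly advertises the service.
        if services & NODE_BLOOM:
            return True
        # Old-style check: software predating the service bit implied
        # bloom support purely by its protocol version.
        return protocol_version < FILTER_SERVICE_VERSION

    print(peer_accepts_filters(services=NODE_BLOOM, protocol_version=70012))  # True
    print(peer_accepts_filters(services=0, protocol_version=70012))           # False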


On Tue, Jun 10, 2014 at 09:02:00AM -0400, Jeff Garzik wrote:
> Most of this description of disk activity is true, but it omits one
> key point:  Total cached data (working set).  It is a binary, first
> order question:  are you hitting pagecache, or the disk?  When nodes
> act as archival data sources, the pagecache pressure is immense.  When
> nodes just primarily serve recent blocks, that data is being served
> out of pagecache. As I directly observed running public nodes, the
> disks were running constantly, impacting all clients, even clients
> downloading only recent blocks.
> 
> Luckily, headers are served out of RAM, so that part of the sync is always fast.
> 
> NODE_BLOOM -- and block download in general -- will tend to be slower
> than it could be, due to the working set almost always being larger
> than available pagecache.  Fix that problem, NODE_BLOOM will always
> operate out of pagecache, and disk activity will not be an issue.
> 
> Once you start hitting the disk, you've already lost.

Yup. I discussed this with Matt Corallo at the financial crypto
conference a few months back and he made the same point. Unfortunately
we'll need a protocol upgrade to let nodes advertise ranges of blocks
before we can even begin to fix that issue, and even then it shows
quite clearly that it's not optimal to force everyone to share
blockchain data in the same way.
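
As a back-of-the-envelope illustration of the working-set point (all
numbers below are rough assumptions about mid-2014 conditions, not
measurements from anyone's nodes):

    # Assumed figures, for illustration only.
    archival_chain_gb      = 20.0   # full block history
    available_pagecache_gb = 4.0    # RAM left over for the kernel page cache
    recent_blocks          = 1000   # roughly what recently-synced peers request
    avg_block_mb           = 0.35   # rough average block size at the time

    recent_working_set_gb = recent_blocks * avg_block_mb / 1024

    print("archival working set: %.1f GB, pagecache: %.1f GB -> constant disk reads"
          % (archival_chain_gb, available_pagecache_gb))
    print("recent-blocks working set: %.2f GB -> served from RAM"
          % recent_working_set_gb)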

-- 
'peter'[:-1]@petertodd.org
000000000000000023c7fc084ed84b891cc2fa90e4a34708d6b2370d3ec1c85d

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 685 bytes --]


Thread overview: 20+ messages
2014-06-06  8:19 [Bitcoin-development] NODE_BLOOM service bit Peter Todd
2014-06-06  8:48 ` Adam Back
2014-06-06  9:03   ` Gregory Maxwell
2014-06-06  9:11     ` Peter Todd
2014-06-06  9:04   ` Peter Todd
2014-06-06 10:45     ` Adam Back
2014-06-06 16:46       ` [Bitcoin-development] Bloom bait Peter Todd
2014-06-06 16:58         ` Gregory Maxwell
2014-06-06 17:05           ` Peter Todd
2014-06-06 17:10             ` Gregory Maxwell
2014-06-06 17:45               ` Peter Todd
2014-06-07 11:22                 ` Mike Hearn
2014-06-07 19:44                   ` Alan Reiner
2014-06-08 21:45                     ` Peter Todd
2014-06-10 10:41                       ` Mike Hearn
2014-06-08 21:35                   ` Peter Todd
2014-06-10 10:38                     ` Mike Hearn
2014-06-10 13:02                       ` Jeff Garzik
2014-06-10 17:08                       ` Peter Todd [this message]
2014-06-11  8:57                         ` Mike Hearn
