public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
From: Andreas Petersson <andreas@petersson•at>
To: bitcoin-development@lists•sourceforge.net
Subject: Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
Date: Mon, 23 Jul 2012 09:54:48 +0200	[thread overview]
Message-ID: <500D0348.4010604@petersson.at> (raw)
In-Reply-To: <CANEZrP0ws6bGk5qmDCmRPMbwyNX+3W5BRNzZPn_Av-nqFPAqOw@mail.gmail.com>

Some concerns regarding Bloom Filters. I talked with Stefan Thomas on 
the Hackathon Berlin about this.
I tried to follow the discussion closely but i have not taken a look at 
the code yet (is there already an implementation?) so please correct me 
if i got something wrong.

The way the Bloom filters are planned now this requires a complicated 
setup. basically the client will ask the server to replay the whole 
blockchain, but filtered.
This is not optimal for the following reasons:
This will require the server to do a full scan of his data and only 
filter out non-matching entries.

Really lightweight clients (like Bitcoincard), clients with shared 
private keys (electrum-style), or brainwallets - will ask the following 
question quite often to "supernodes": Given my public keys/addresses, 
what is the list of unspent outputs. i think it would make sense to 
include such a command, instead or in addition to the filterload/filterinit.

And perhaps more severe: as far as i understand classic bloom filters, 
the server has no method of indexing his data for the expected requests. 
There is simply no data structure (or maybe it has to be invented) which 
allows the use of an index for queries by bloom filters of *varying 
length* and a generic hashing method.

im not sure what a really efficient data structure for these kinds of 
query is. but i think it should be possible to find a good compromise 
between query flexibility, server load, client privacy.

one possible scheme, looks like this:

the client takes his list of addesses he is interested in. he hashes all 
of them to a fixed-length bit array (bloom filter) of length 64KiB (for 
example), and combines them with | to add more 1's with each address.
the server maintains a binary tree data structure of unspent outputs 
arranged by the Bloom filter bits.
to build the tree, the server would need to calculate the 64KiB bits for 
each address and arrange them in a binary tree. that way he can easily 
traverse the tree for a given bloom query.
if a client whishes to query more broadly he can calculate the bloom 
filter to 64KiB and after that fill up the last 50% of the Bits with 1. 
or 95%. the trailing 1 bits even don't need to be transmitted to the 
server when a client is querying. of course, if the client is more 
privacy-concerned he could also fill up random bits with 1, which would 
not change much actually.

the value of 64KiB is just out of thin air.
according to my experimentation using BloomFilter from Guava - 
currently, also 8KiB would be sufficient to hava a 3% false positive 
rate for the 40000 active addresses we have right now.

someone more familiar with hashing should please give his opinion if 
cutting a bloom filter in half has any bad consequences.

Andreas



  reply	other threads:[~2012-07-23  7:54 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-13 20:46 Jeff Garzik
2012-06-14 11:52 ` Mike Hearn
2012-06-15 11:52   ` Mike Hearn
2012-06-15 13:19   ` Matt Corallo
2012-06-15 13:23     ` Mike Hearn
2012-06-15 14:39       ` Matt Corallo
2012-06-16  8:27         ` Mike Hearn
2012-06-19 19:09           ` Matt Corallo
2012-07-21 11:45             ` Mike Hearn
2012-07-23  7:54               ` Andreas Petersson [this message]
2012-07-23 16:40                 ` Matt Corallo
2012-07-24  8:16                 ` Mike Hearn
2012-06-15 13:26   ` Jeff Garzik
2012-06-15 13:43     ` Mike Hearn
2012-06-15 14:56       ` Matt Corallo
2012-06-15 15:32       ` Jeff Garzik
2012-06-15 16:20         ` Matt Corallo
2012-06-15 18:42       ` Amir Taaki
2012-06-16  8:25         ` Mike Hearn
2012-06-15 15:43   ` Simon Barber
2012-06-15 16:40     ` Jeff Garzik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=500D0348.4010604@petersson.at \
    --to=andreas@petersson$(echo .)at \
    --cc=bitcoin-development@lists$(echo .)sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox