[Bitcoin-development] New P2P commands for diagnostics, SPV clients

public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed

* [Bitcoin-development] New P2P commands for diagnostics, SPV clients
@ 2012-06-13 20:46 Jeff Garzik
  2012-06-14 11:52 ` Mike Hearn
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Garzik @ 2012-06-13 20:46 UTC (permalink / raw)
  To: Bitcoin Development

An IRC discussion today covered additional needs of lightweight
clients.  Here is a draft of proposed new P2P commands, and associated
behavior changes.  This is not meant to be a formal, detailed
specification but rather rough picture of the preferred direction.

     -----

filterinit(false positive rate, number of elements): initialize
per-connection bloom filter to the given parameters.  if the
parameters create a too-large table, the operation fails.  returns a
'filterparams' message, with bloom filter construction details.

filterload(data): input a serialized bloom filter table metadata and data.

filterclear(): remove any filtering associated with current connection.

filteradd(hash data): add a single hash to the bloom filter.  WARNING:
although easier to use, has privacy implications. filterload shrouds
the hash list; filteradd does not.  it is also less efficient to send
a stream of filteradd's to the remote node.

mempool():  list TX's in remote node's memory pool.

     -----

'filterload' and 'filteradd' enable special behavior changes for
'mempool' and existing P2P commands, whereby only transactions
matching the bloom filter will be announced to the connection, and
only matching transactions will be sent inside serialized blocks.

A lightweight ("SPV") client would issue 'filterload', sync up with
blocks, then use 'mempool' to sync up to current TX's.  The
'filterload' command also ensures that the client is only sent 'inv'
messages etc. for the TX's it is probably interested in.

The 'mempool' command is thought to be useful as a diagnostic, even if
a bloom filter is not applied to its output.

A bloom filter match would need to notice activity on existing coins
(via CTxIn->prevout) and activity on a bitcoin address (via CTxOut).

-- 
Jeff Garzik
exMULTI, Inc.
jgarzik@exmulti•com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-13 20:46 [Bitcoin-development] New P2P commands for diagnostics, SPV clients Jeff Garzik
@ 2012-06-14 11:52 ` Mike Hearn
  2012-06-15 11:52   ` Mike Hearn
                     ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Mike Hearn @ 2012-06-14 11:52 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Development

> filterinit(false positive rate, number of elements): initialize
> filterload(data): input a serialized bloom filter table metadata and data.

Why not combine these two?

> 'filterload' and 'filteradd' enable special behavior changes for
> 'mempool' and existing P2P commands, whereby only transactions
> matching the bloom filter will be announced to the connection, and
> only matching transactions will be sent inside serialized blocks.

Need to specify the format of how these arrive. It means that when a
new block is found instead of inv<->getdata<->block we'd see something
like  inv<->getdata<->merkleblock where a "merkleblock" structure is a
header + list of transactions + list of merkle branches linking them
to the root. I think CMerkleTx already knows how to serialize this,
but it redundantly includes the block hash which would not be
necessary for a merkleblock message.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-14 11:52 ` Mike Hearn
@ 2012-06-15 11:52   ` Mike Hearn
  2012-06-15 13:19   ` Matt Corallo
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Mike Hearn @ 2012-06-15 11:52 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Development

> Need to specify the format of how these arrive. It means that when a
> new block is found instead of inv<->getdata<->block we'd see something
> like  inv<->getdata<->merkleblock where a "merkleblock" structure is a
> header + list of transactions + list of merkle branches linking them
> to the root.

Thinking about it some more and re-reading the Scalability wiki page,
I remembered that a nice bandwidth optimization to the protocol is to
distribute blocks as header+list of tx hashes. If a node has already
seen that tx before (eg, it's in the mempool) there is no need to send
it again.

With the new command to download the contents of the mempool on
startup, this means that blocks could potentially propagate across the
network faster as download time is taken out of the equation, and
indeed, with the signature cache the hard work of verifying is already
done. So this could also help reduce orphan blocks and spurious chain
splits.

Are you planning on implementing any of this Jeff? I think we have the
opportunity to kill a few birds with one or two stones.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-14 11:52 ` Mike Hearn
  2012-06-15 11:52   ` Mike Hearn
@ 2012-06-15 13:19   ` Matt Corallo
  2012-06-15 13:23     ` Mike Hearn
  2012-06-15 13:26   ` Jeff Garzik
  2012-06-15 15:43   ` Simon Barber
  3 siblings, 1 reply; 21+ messages in thread
From: Matt Corallo @ 2012-06-15 13:19 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

On Thu, 2012-06-14 at 13:52 +0200, Mike Hearn wrote:
> > filterinit(false positive rate, number of elements): initialize
> > filterload(data): input a serialized bloom filter table metadata and data.
> 
> Why not combine these two?
I believe its because it allows the node which will have to use the
bloom filter to scan transactions to chose how much effort it wants to
put into each transaction on behalf of the SPV client.  Though its
generally a small amount of CPU time/memory, if we end up with a drastic
split between SPV nodes and only a few large network nodes, those nodes
may wish to limit the CPU/memory usage each node is allowed to use,
which may be important if you are serving 1000 SPV peers.  It offers a
sort of negotiation between SPV client and full node instead of letting
the client specify it outright.
> 
> > 'filterload' and 'filteradd' enable special behavior changes for
> > 'mempool' and existing P2P commands, whereby only transactions
> > matching the bloom filter will be announced to the connection, and
> > only matching transactions will be sent inside serialized blocks.
> 
> Need to specify the format of how these arrive. It means that when a
> new block is found instead of inv<->getdata<->block we'd see something
> like  inv<->getdata<->merkleblock where a "merkleblock" structure is a
> header + list of transactions + list of merkle branches linking them
> to the root. I think CMerkleTx already knows how to serialize this,
> but it redundantly includes the block hash which would not be
> necessary for a merkleblock message.
A series of CMerkleTx's might also end up redundantly encoding branches
of the merkle tree, so, yes as a part of the BIP/implementation, I would
say we probably want a CFilteredBlock or similar




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 13:19   ` Matt Corallo
@ 2012-06-15 13:23     ` Mike Hearn
  2012-06-15 14:39       ` Matt Corallo
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Hearn @ 2012-06-15 13:23 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Development

> > Why not combine these two?
>
> I believe its because it allows the node which will have to use the
> bloom filter to scan transactions to chose how much effort it wants to
> put into each transaction on behalf of the SPV client.

If that's the case then the negotiation protocol needs to be specified
too. It seems heavy though. If a node is getting overloaded it could
just disconnect intensive peers or refuse new connections.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-14 11:52 ` Mike Hearn
  2012-06-15 11:52   ` Mike Hearn
  2012-06-15 13:19   ` Matt Corallo
@ 2012-06-15 13:26   ` Jeff Garzik
  2012-06-15 13:43     ` Mike Hearn
  2012-06-15 15:43   ` Simon Barber
  3 siblings, 1 reply; 21+ messages in thread
From: Jeff Garzik @ 2012-06-15 13:26 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

On Thu, Jun 14, 2012 at 7:52 AM, Mike Hearn <mike@plan99•net> wrote:
>> filterinit(false positive rate, number of elements): initialize
>> filterload(data): input a serialized bloom filter table metadata and data.
>
> Why not combine these two?

This is a fair point that sipa raised.

Consensus concluded that 'filterload' includes all necessary metadata
required to initialize a bloom filter.  That implies 'filterinit'
would only be needed for 'filteradd'.  If we don't think 'filteradd'
has a compelling use case, filterinit + filteradd can be dropped.

>> 'filterload' and 'filteradd' enable special behavior changes for
>> 'mempool' and existing P2P commands, whereby only transactions
>> matching the bloom filter will be announced to the connection, and
>> only matching transactions will be sent inside serialized blocks.
>
> Need to specify the format of how these arrive. It means that when a
> new block is found instead of inv<->getdata<->block we'd see something
> like  inv<->getdata<->merkleblock where a "merkleblock" structure is a
> header + list of transactions + list of merkle branches linking them
> to the root. I think CMerkleTx already knows how to serialize this,
> but it redundantly includes the block hash which would not be
> necessary for a merkleblock message.

Yes, the format is something that must be hashed out (no pun
intended).  Need input from potential users about what information
they might need.

-- 
Jeff Garzik
exMULTI, Inc.
jgarzik@exmulti•com



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 13:26   ` Jeff Garzik
@ 2012-06-15 13:43     ` Mike Hearn
  2012-06-15 14:56       ` Matt Corallo
                         ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Mike Hearn @ 2012-06-15 13:43 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Development

> Yes, the format is something that must be hashed out (no pun
> intended).  Need input from potential users about what information
> they might need.

Matts point that a branch-per-transaction may duplicate data is well
made, that said, I suspect a format that tries to fix this would be
much more complicated.

How about see this project as a three part change?

First step - add the mempool command and make nodes sync up their
mempools on startup.

Second step - if protocol version >= X, the "block" message consists
of a header + num transactions + vector<hash>  instead of the full
transactions themselves.

On receiving such a block, we go look to see which transactions we're
missing from the mempool and request them with getdata. Each time we
receive a tx message we check to see if it was one we were missing
from a block. Once all transactions in the block message are in
memory, we go ahead and assemble the block, then verify as per normal.
This should speed up block propagation. Miners have an incentive to
upgrade because it should reduce wasted work.

Third step - new message, getmerkletx takes a vector<hash> and returns
a merkletx message: "merkle branch missing the root + transaction data
itself" for each requested transaction. The filtering commands are
added, so the block message now only lists transaction hashes that
match the filter which can then be requested with getmerkletx.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 13:23     ` Mike Hearn
@ 2012-06-15 14:39       ` Matt Corallo
  2012-06-16  8:27         ` Mike Hearn
  0 siblings, 1 reply; 21+ messages in thread
From: Matt Corallo @ 2012-06-15 14:39 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

On Fri, 2012-06-15 at 15:23 +0200, Mike Hearn wrote:
> > > Why not combine these two?
> >
> > I believe its because it allows the node which will have to use the
> > bloom filter to scan transactions to chose how much effort it wants to
> > put into each transaction on behalf of the SPV client.
> 
> If that's the case then the negotiation protocol needs to be specified
> too. It seems heavy though. If a node is getting overloaded it could
> just disconnect intensive peers or refuse new connections.
IMHO it already is.  A node requests a filter using filterinit by
specifying the false positive rate it wants and a guessed number of
items.  The node which will have to hold that filter then responds with
the closest filter to what the SPV node requested that it is willing to
provide.  If the SPV node responds with a filterload command, it has
accepted the offer, otherwise it will simply disconnect and find a
better full node.  
I'd much rather have an overloaded node respond with 50% fp rate filters
as an option if there aren't many full nodes available than simply
disconnect SPV clients.
At least thats my thinking, but you may be right that it is too heavy
for too little gain.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 13:43     ` Mike Hearn
@ 2012-06-15 14:56       ` Matt Corallo
  2012-06-15 15:32       ` Jeff Garzik
  2012-06-15 18:42       ` Amir Taaki
  2 siblings, 0 replies; 21+ messages in thread
From: Matt Corallo @ 2012-06-15 14:56 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

On Fri, 2012-06-15 at 15:43 +0200, Mike Hearn wrote:
> > Yes, the format is something that must be hashed out (no pun
> > intended).  Need input from potential users about what information
> > they might need.
> 
> Matts point that a branch-per-transaction may duplicate data is well
> made, that said, I suspect a format that tries to fix this would be
> much more complicated.
> 
> How about see this project as a three part change?
> 
> First step - add the mempool command and make nodes sync up their
> mempools on startup.
ACK
> 
> Second step - if protocol version >= X, the "block" message consists
> of a header + num transactions + vector<hash>  instead of the full
> transactions themselves.
If vector<hash> is sorted in the order of the merkle tree, you dont need
to forward the merkle tree to non-filtered nodes, further saving some
small amount of bandwidth.  For filtered nodes, you would still need to
forward merkle branches anyway.
> 
> On receiving such a block, we go look to see which transactions we're
> missing from the mempool and request them with getdata. Each time we
> receive a tx message we check to see if it was one we were missing
> from a block. Once all transactions in the block message are in
> memory, we go ahead and assemble the block, then verify as per normal.
> This should speed up block propagation. Miners have an incentive to
> upgrade because it should reduce wasted work.
ACK
> 
> Third step - new message, getmerkletx takes a vector<hash> and returns
> a merkletx message: "merkle branch missing the root + transaction data
> itself" for each requested transaction. The filtering commands are
> added, so the block message now only lists transaction hashes that
> match the filter which can then be requested with getmerkletx.
I really dont think it would be /that/ difficult to make it getmerkletxs
vector<hashes>. And then respond with a partial merkle tree to those
transactions.

Matt




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 13:43     ` Mike Hearn
  2012-06-15 14:56       ` Matt Corallo
@ 2012-06-15 15:32       ` Jeff Garzik
  2012-06-15 16:20         ` Matt Corallo
  2012-06-15 18:42       ` Amir Taaki
  2 siblings, 1 reply; 21+ messages in thread
From: Jeff Garzik @ 2012-06-15 15:32 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

On Fri, Jun 15, 2012 at 9:43 AM, Mike Hearn <mike@plan99•net> wrote:
> How about see this project as a three part change?
>
> First step - add the mempool command and make nodes sync up their
> mempools on startup.

Here's the "mempool" implementation:
https://github.com/bitcoin/bitcoin/pull/1470

SPV nodes would definitely want to sync up their mempool upon startup.
 As for full nodes... I like the organic growth and random nature of
the mempools.  On the fence, WRT full node mempool sync at startup.

-- 
Jeff Garzik
exMULTI, Inc.
jgarzik@exmulti•com



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-14 11:52 ` Mike Hearn
                     ` (2 preceding siblings ...)
  2012-06-15 13:26   ` Jeff Garzik
@ 2012-06-15 15:43   ` Simon Barber
  2012-06-15 16:40     ` Jeff Garzik
  3 siblings, 1 reply; 21+ messages in thread
From: Simon Barber @ 2012-06-15 15:43 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

separate filterinit / filterload - so you can do a new filterload later 
on if your list changes, without the privacy implications of filteradd.

Simon


On Thu 14 Jun 2012 04:52:29 AM PDT, Mike Hearn wrote:
>> filterinit(false positive rate, number of elements): initialize
>> filterload(data): input a serialized bloom filter table metadata and data.
>
> Why not combine these two?
>
>> 'filterload' and 'filteradd' enable special behavior changes for
>> 'mempool' and existing P2P commands, whereby only transactions
>> matching the bloom filter will be announced to the connection, and
>> only matching transactions will be sent inside serialized blocks.
>
> Need to specify the format of how these arrive. It means that when a
> new block is found instead of inv<->getdata<->block we'd see something
> like  inv<->getdata<->merkleblock where a "merkleblock" structure is a
> header + list of transactions + list of merkle branches linking them
> to the root. I think CMerkleTx already knows how to serialize this,
> but it redundantly includes the block hash which would not be
> necessary for a merkleblock message.
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists•sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 15:32       ` Jeff Garzik
@ 2012-06-15 16:20         ` Matt Corallo
  0 siblings, 0 replies; 21+ messages in thread
From: Matt Corallo @ 2012-06-15 16:20 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Development

On Fri, 2012-06-15 at 11:32 -0400, Jeff Garzik wrote:
>  As for full nodes... I like the organic growth and random nature of
> the mempools.  On the fence, WRT full node mempool sync at startup.
> 
I dont particularly care either way, but I have a feeling miners will
really want that so that they can get fee-paying transactions right
away.

Matt




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 15:43   ` Simon Barber
@ 2012-06-15 16:40     ` Jeff Garzik
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff Garzik @ 2012-06-15 16:40 UTC (permalink / raw)
  To: Simon Barber; +Cc: Bitcoin Development

On Fri, Jun 15, 2012 at 11:43 AM, Simon Barber <simon@superduper•net> wrote:
> separate filterinit / filterload - so you can do a new filterload later on
> if your list changes, without the privacy implications of filteradd.

filterload loads a whole new bloom filter from scratch, in one atomic
operation.  Params set, table sized, data input into table.  A
separate filterinit does not make sense for filterload.

-- 
Jeff Garzik
exMULTI, Inc.
jgarzik@exmulti•com



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 13:43     ` Mike Hearn
  2012-06-15 14:56       ` Matt Corallo
  2012-06-15 15:32       ` Jeff Garzik
@ 2012-06-15 18:42       ` Amir Taaki
  2012-06-16  8:25         ` Mike Hearn
  2 siblings, 1 reply; 21+ messages in thread
From: Amir Taaki @ 2012-06-15 18:42 UTC (permalink / raw)
  To: bitcoin-development

Why though? The bottleneck is not network traffic but disk space usage/blockchain validation time.

----- Original Message -----
From: Mike Hearn <mike@plan99•net>
To: Jeff Garzik <jgarzik@exmulti•com>
Cc: Bitcoin Development <bitcoin-development@lists•sourceforge.net>
Sent: Friday, June 15, 2012 3:43 PM
Subject: Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients

> Yes, the format is something that must be hashed out (no pun
> intended).  Need input from potential users about what information
> they might need.

Matts point that a branch-per-transaction may duplicate data is well
made, that said, I suspect a format that tries to fix this would be
much more complicated.

How about see this project as a three part change?

First step - add the mempool command and make nodes sync up their
mempools on startup.

Second step - if protocol version >= X, the "block" message consists
of a header + num transactions + vector<hash>  instead of the full
transactions themselves.

On receiving such a block, we go look to see which transactions we're
missing from the mempool and request them with getdata. Each time we
receive a tx message we check to see if it was one we were missing
from a block. Once all transactions in the block message are in
memory, we go ahead and assemble the block, then verify as per normal.
This should speed up block propagation. Miners have an incentive to
upgrade because it should reduce wasted work.

Third step - new message, getmerkletx takes a vector<hash> and returns
a merkletx message: "merkle branch missing the root + transaction data
itself" for each requested transaction. The filtering commands are
added, so the block message now only lists transaction hashes that
match the filter which can then be requested with getmerkletx.

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bitcoin-development mailing list
Bitcoin-development@lists•sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 18:42       ` Amir Taaki
@ 2012-06-16  8:25         ` Mike Hearn
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Hearn @ 2012-06-16  8:25 UTC (permalink / raw)
  To: Amir Taaki; +Cc: bitcoin-development

The bottleneck for the android Bitcoin Wallet app is rapidly becoming
bandwidth and parse time.

On Fri, Jun 15, 2012 at 8:42 PM, Amir Taaki <zgenjix@yahoo•com> wrote:
> Why though? The bottleneck is not network traffic but disk space usage/blockchain validation time.
>
>
>
> ----- Original Message -----
> From: Mike Hearn <mike@plan99•net>
> To: Jeff Garzik <jgarzik@exmulti•com>
> Cc: Bitcoin Development <bitcoin-development@lists•sourceforge.net>
> Sent: Friday, June 15, 2012 3:43 PM
> Subject: Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
>
>> Yes, the format is something that must be hashed out (no pun
>> intended).  Need input from potential users about what information
>> they might need.
>
> Matts point that a branch-per-transaction may duplicate data is well
> made, that said, I suspect a format that tries to fix this would be
> much more complicated.
>
> How about see this project as a three part change?
>
> First step - add the mempool command and make nodes sync up their
> mempools on startup.
>
> Second step - if protocol version >= X, the "block" message consists
> of a header + num transactions + vector<hash>  instead of the full
> transactions themselves.
>
> On receiving such a block, we go look to see which transactions we're
> missing from the mempool and request them with getdata. Each time we
> receive a tx message we check to see if it was one we were missing
> from a block. Once all transactions in the block message are in
> memory, we go ahead and assemble the block, then verify as per normal.
> This should speed up block propagation. Miners have an incentive to
> upgrade because it should reduce wasted work.
>
> Third step - new message, getmerkletx takes a vector<hash> and returns
> a merkletx message: "merkle branch missing the root + transaction data
> itself" for each requested transaction. The filtering commands are
> added, so the block message now only lists transaction hashes that
> match the filter which can then be requested with getmerkletx.
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists•sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists•sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-15 14:39       ` Matt Corallo
@ 2012-06-16  8:27         ` Mike Hearn
  2012-06-19 19:09           ` Matt Corallo
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Hearn @ 2012-06-16  8:27 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Development

> I'd much rather have an overloaded node respond with 50% fp rate filters
> as an option if there aren't many full nodes available than simply
> disconnect SPV clients.

I don't think the bloom filter settings have any impact on server-side
load ... a node still has to check every transaction against the
filter regardless of how that filter is configured, which means the
same amount of disk io and processing.

How can you reduce load on a peer by negotiating different filter settings?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-16  8:27         ` Mike Hearn
@ 2012-06-19 19:09           ` Matt Corallo
  2012-07-21 11:45             ` Mike Hearn
  0 siblings, 1 reply; 21+ messages in thread
From: Matt Corallo @ 2012-06-19 19:09 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

On Sat, 2012-06-16 at 10:27 +0200, Mike Hearn wrote:
> > I'd much rather have an overloaded node respond with 50% fp rate filters
> > as an option if there aren't many full nodes available than simply
> > disconnect SPV clients.
> 
> I don't think the bloom filter settings have any impact on server-side
> load ... a node still has to check every transaction against the
> filter regardless of how that filter is configured, which means the
> same amount of disk io and processing.
> 
> How can you reduce load on a peer by negotiating different filter settings?
Agreed, I was largely giving a reason why one may want to negotiate the
filter settings in response to your question as to why it was done.  As
long as there are sane limits (you cant make a 1GB filter by specifying
0% fp and some crazy number of entires), filter negotiation largely isnt
worth it (also prevents any floats from appearing in the p2p protocol,
though in either case it shouldn't be able to cause issues).

Matt




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-06-19 19:09           ` Matt Corallo
@ 2012-07-21 11:45             ` Mike Hearn
  2012-07-23  7:54               ` Andreas Petersson
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Hearn @ 2012-07-21 11:45 UTC (permalink / raw)
  To: Matt Corallo; +Cc: Bitcoin Development

One thing that occurred to me recently is that it'd be useful if
filters could contain exact matches as well as Bloom filters.

Specifically I'm thinking of things like my bond network proposal
where some outputs may be marked as special using script fragments
like "BOND" <data or hash of data> 2DROP.

This would allow systems that are only interested in data and
transactions relevant to bonds to exact-filter the chain on that
marker, and then when a transaction is discovered, add the hash of
that transaction to a parallel Bloom filter, ensuring you can see any
transactions that connect to it.

The spec as provided by Jeff doesn't specify how filters are matched
against transactions. I propose the following algorithm:

For each TX:
- Check if the hash of the tx itself matches the filter
- For each input:
  - For each script data element check if it is found in the filter
  - Check if the COutPoint.hash value is in the filter (let's you
select txns that connect to arbitrary txns of interest)
- For each output
  - For each script data element check if it is found in the filter

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-07-21 11:45             ` Mike Hearn
@ 2012-07-23  7:54               ` Andreas Petersson
  2012-07-23 16:40                 ` Matt Corallo
  2012-07-24  8:16                 ` Mike Hearn
  0 siblings, 2 replies; 21+ messages in thread
From: Andreas Petersson @ 2012-07-23  7:54 UTC (permalink / raw)
  To: bitcoin-development

Some concerns regarding Bloom Filters. I talked with Stefan Thomas on 
the Hackathon Berlin about this.
I tried to follow the discussion closely but i have not taken a look at 
the code yet (is there already an implementation?) so please correct me 
if i got something wrong.

The way the Bloom filters are planned now this requires a complicated 
setup. basically the client will ask the server to replay the whole 
blockchain, but filtered.
This is not optimal for the following reasons:
This will require the server to do a full scan of his data and only 
filter out non-matching entries.

Really lightweight clients (like Bitcoincard), clients with shared 
private keys (electrum-style), or brainwallets - will ask the following 
question quite often to "supernodes": Given my public keys/addresses, 
what is the list of unspent outputs. i think it would make sense to 
include such a command, instead or in addition to the filterload/filterinit.

And perhaps more severe: as far as i understand classic bloom filters, 
the server has no method of indexing his data for the expected requests. 
There is simply no data structure (or maybe it has to be invented) which 
allows the use of an index for queries by bloom filters of *varying 
length* and a generic hashing method.

im not sure what a really efficient data structure for these kinds of 
query is. but i think it should be possible to find a good compromise 
between query flexibility, server load, client privacy.

one possible scheme, looks like this:

the client takes his list of addesses he is interested in. he hashes all 
of them to a fixed-length bit array (bloom filter) of length 64KiB (for 
example), and combines them with | to add more 1's with each address.
the server maintains a binary tree data structure of unspent outputs 
arranged by the Bloom filter bits.
to build the tree, the server would need to calculate the 64KiB bits for 
each address and arrange them in a binary tree. that way he can easily 
traverse the tree for a given bloom query.
if a client whishes to query more broadly he can calculate the bloom 
filter to 64KiB and after that fill up the last 50% of the Bits with 1. 
or 95%. the trailing 1 bits even don't need to be transmitted to the 
server when a client is querying. of course, if the client is more 
privacy-concerned he could also fill up random bits with 1, which would 
not change much actually.

the value of 64KiB is just out of thin air.
according to my experimentation using BloomFilter from Guava - 
currently, also 8KiB would be sufficient to hava a 3% false positive 
rate for the 40000 active addresses we have right now.

someone more familiar with hashing should please give his opinion if 
cutting a bloom filter in half has any bad consequences.

Andreas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-07-23  7:54               ` Andreas Petersson
@ 2012-07-23 16:40                 ` Matt Corallo
  2012-07-24  8:16                 ` Mike Hearn
  1 sibling, 0 replies; 21+ messages in thread
From: Matt Corallo @ 2012-07-23 16:40 UTC (permalink / raw)
  To: bitcoin-development

On Mon, 2012-07-23 at 09:54 +0200, Andreas Petersson wrote:
> Some concerns regarding Bloom Filters. I talked with Stefan Thomas on 
> the Hackathon Berlin about this.
> I tried to follow the discussion closely but i have not taken a look at 
> the code yet (is there already an implementation?) so please correct me 
> if i got something wrong.
AFAIK there was no implementation.  I pushed one for bitcoinj+bitcoind
today that compiles, but I haven't tested it much further (though its
really quite a simple implementation):
https://github.com/TheBlueMatt/bitcoin/commits/bloom
http://code.google.com/r/bluemattme-bitcoinj/source/list?name=bloomfilter

> The way the Bloom filters are planned now this requires a complicated 
> setup. basically the client will ask the server to replay the whole 
> blockchain, but filtered.
> This is not optimal for the following reasons:
> This will require the server to do a full scan of his data and only 
> filter out non-matching entries.
My implementation has yet to implement block filtering, for now its only
tx inv filtering.  However, its really not that complicated and doing a
scan of any individual transaction is very fast.  So during the download
phase, it really isn't much of any extra load on block chain providers
(aside from having to load inputs in the current implementation, but
that could be optimized some).

> Really lightweight clients (like Bitcoincard), clients with shared 
> private keys (electrum-style), or brainwallets - will ask the following 
> question quite often to "supernodes": Given my public keys/addresses, 
> what is the list of unspent outputs. i think it would make sense to 
> include such a command, instead or in addition to the filterload/filterinit.

> And perhaps more severe: as far as i understand classic bloom filters, 
> the server has no method of indexing his data for the expected requests. 
> There is simply no data structure (or maybe it has to be invented) which 
> allows the use of an index for queries by bloom filters of *varying 
> length* and a generic hashing method.

> im not sure what a really efficient data structure for these kinds of 
> query is. but i think it should be possible to find a good compromise 
> between query flexibility, server load, client privacy.

> one possible scheme, looks like this:

> the client takes his list of addesses he is interested in. he hashes all 
> of them to a fixed-length bit array (bloom filter) of length 64KiB (for 
> example), and combines them with | to add more 1's with each address.
> the server maintains a binary tree data structure of unspent outputs 
> arranged by the Bloom filter bits.
> to build the tree, the server would need to calculate the 64KiB bits for 
> each address and arrange them in a binary tree. that way he can easily 
> traverse the tree for a given bloom query.
> if a client whishes to query more broadly he can calculate the bloom 
> filter to 64KiB and after that fill up the last 50% of the Bits with 1. 
> or 95%. the trailing 1 bits even don't need to be transmitted to the 
> server when a client is querying. of course, if the client is more 
> privacy-concerned he could also fill up random bits with 1, which would 
> not change much actually.
> 
> the value of 64KiB is just out of thin air.
> according to my experimentation using BloomFilter from Guava - 
> currently, also 8KiB would be sufficient to hava a 3% false positive 
> rate for the 40000 active addresses we have right now.
> 
> someone more familiar with hashing should please give his opinion if 
> cutting a bloom filter in half has any bad consequences.
> 
> Andreas
Though I like the idea of having a "give me all unspent outputs for my
pubkeys" command, I see quite a future for clients somewhere between "I
store nothing but pubkeys and don't know about the chain" and "I store a
full chain" and the bloom filters as described are pretty useful for
many clients in that in between.  Also, for clients that are "Really
lightweight clients" (given that they don't know about the chain) should
probably just stick with an electrum-style server-client system with
specialized servers (IMHO) instead of relying on P2P nodes to provide
them with a list of unspent outputs/etc.

In response to Mike's what-the-filter-should-match:
The way it is now, it just checks each input+output to see if the
hash160 of the dest addr, hash160 of the pubkey or hash160 of the p2sh
sh matches the filter as-is.
From the list provided, I think adding a check to allow adding a
specific outpoint to a filter would be nice.
However, in terms of data elements in txes, Im not so sure.
Its by no means a bad idea, and it wouldnt be too much overhead, but
making filters match a very broad set of criteria seems like a bit much
to me, but maybe others disagree?

Matt




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bitcoin-development] New P2P commands for diagnostics, SPV clients
  2012-07-23  7:54               ` Andreas Petersson
  2012-07-23 16:40                 ` Matt Corallo
@ 2012-07-24  8:16                 ` Mike Hearn
  1 sibling, 0 replies; 21+ messages in thread
From: Mike Hearn @ 2012-07-24  8:16 UTC (permalink / raw)
  To: Andreas Petersson; +Cc: bitcoin-development

> Really lightweight clients (like Bitcoincard), clients with shared
> private keys (electrum-style), or brainwallets - will ask the following
> question quite often to "supernodes": Given my public keys/addresses,
> what is the list of unspent outputs. i think it would make sense to
> include such a command, instead or in addition to the filterload/filterinit.

Ultra-lightweight clients like Electrum or smart cards have a
fundamentally different security model to SPV clients, which mean they
cannot connect directly to the P2P network no matter what commands or
db indexes are added.

This seems to be a common point of confusion. Andreas brought up
something similar in a chat yesterday.

To connect to the P2P network, you MUST understand how to walk the
block chain and handle re-orgs. This is not optional. The reason is
that you are connected to random arbitrary nodes who can and maybe
will lie to you. The block chain is a self-proving data structure, a
node cannot lie about it or make you believe garbage unless they can
outrun the rest of the miners combined.

If all you're doing is asking a remote node to tell you about what
coins are available, that node can simply say "guess what, you're a
millionaire!" and you have no way to discover it's wrong. This can be
dangerous in the case where you think you've received a payment but
actually did not, eg, because your internet connection got tampered
with in some way. SPV clients have the same issue for zero-confirmed
transactions, but once you see confirmations at high speeds you can be
pretty sure the network accepted the transaction. For clients that
don't understand the block chain confirmations don't have any meaning.

That's why Electrum requires a trusted server and connects to it via SSL.

> And perhaps more severe: as far as i understand classic bloom filters,
> the server has no method of indexing his data for the expected requests.

It doesn't matter. CPU wise Bloom filtering of blocks is very cheap
and can be trivially parallelised in the unlikely event it's
necessary. The expensive part of serving a Bloom filtered chain to an
SPV client is simply moving the disk head into the right position and
waiting for the platter to rotate. Blocks are stored sequentially and
modern hard disks transfer data once positioned at gigabit speeds so
requesting 1 or 2000 blocks is not significantly different.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-07-24  8:16 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-13 20:46 [Bitcoin-development] New P2P commands for diagnostics, SPV clients Jeff Garzik
2012-06-14 11:52 ` Mike Hearn
2012-06-15 11:52   ` Mike Hearn
2012-06-15 13:19   ` Matt Corallo
2012-06-15 13:23     ` Mike Hearn
2012-06-15 14:39       ` Matt Corallo
2012-06-16  8:27         ` Mike Hearn
2012-06-19 19:09           ` Matt Corallo
2012-07-21 11:45             ` Mike Hearn
2012-07-23  7:54               ` Andreas Petersson
2012-07-23 16:40                 ` Matt Corallo
2012-07-24  8:16                 ` Mike Hearn
2012-06-15 13:26   ` Jeff Garzik
2012-06-15 13:43     ` Mike Hearn
2012-06-15 14:56       ` Matt Corallo
2012-06-15 15:32       ` Jeff Garzik
2012-06-15 16:20         ` Matt Corallo
2012-06-15 18:42       ` Amir Taaki
2012-06-16  8:25         ` Mike Hearn
2012-06-15 15:43   ` Simon Barber
2012-06-15 16:40     ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox