public inbox for bitcoindev@googlegroups.com
* [Bitcoin-development] Squashing redundant tx data in blocks on the wire
@ 2014-07-17 21:35 Kaz Wesley
  2014-07-17 22:46 ` Gavin Andresen
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Kaz Wesley @ 2014-07-17 21:35 UTC (permalink / raw)
  To: bitcoin-development

[-- Attachment #1: Type: text/plain, Size: 4159 bytes --]

OVERVIEW

To improve block propagation, add a new block message that doesn't include
transactions the peer is known to have. The message must never require an
additional round trip due to any transactions the peer doesn't have, but
should be compatible with peers sometimes forgetting transactions they have
known.

APPROACH

For peers advertising support for squashed blocks: a node tracks what txes it
knows each peer has seen (inv received, tx sent, tx appeared in competing
block known to peer). Nodes push block contents as txes-not-already-known +
txids-known.

A node should be able to forget invs it has seen without invalidating what
peers know about its known txes. To allow for this, a node assembles a bloom
filter of a set of txes it is going to forget, and sends it to peers. The
node can erase the txes as soon as no blocks requested before the filter was
pushed are in flight (relying on the assumption that messages can be expected
to be processed in order).

When a node receives a forgotten-filter, it ORs it into its forgotten-filter
for that peer. Any transactions matching the forgotten-filter are always
included in full with a block. If the filter is getting full, the node can
just clear it along with peer.setTxKnown.
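
As a rough sketch of the sender's side (all names are illustrative, not taken
from bitcoind; the forgotten-filter test is passed in as an opaque predicate):

    #include <array>
    #include <cstdint>
    #include <functional>
    #include <set>
    #include <vector>

    using TxId = std::array<uint8_t, 32>;
    struct Tx { TxId id; std::vector<uint8_t> raw; };

    // One entry of a squashed block: a bare txid if the peer is believed to
    // already have the tx, otherwise the full serialized transaction.
    struct SquashedEntry { TxId id; bool full; std::vector<uint8_t> raw; };

    std::vector<SquashedEntry> SquashBlock(
        const std::vector<Tx>& vtx,                       // block txes, in block order
        const std::set<TxId>& peerKnown,                  // txes the peer is known to have
        const std::function<bool(const TxId&)>& matchesForgotten)
    {
        std::vector<SquashedEntry> out;
        for (const Tx& tx : vtx) {
            bool known = peerKnown.count(tx.id) && !matchesForgotten(tx.id);
            if (known) out.push_back({tx.id, false, {}});    // txid only (32 bytes)
            else       out.push_back({tx.id, true, tx.raw}); // send in full, no round trip
        }
        return out;   // receiver rebuilds vtx in order, so the merkle root still checks
    }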

COSTS

Bloom filtering:
Since the bloom filter is likely to grow slowly and can be dropped when it is
becoming full, a cheap set of hash functions and element size can be used to
keep overhead more restricted than the bloom filtering done for SPV. It's
important for testing txes against the filter to be fast so that it doesn't
delay pushing the block more than the squashing helps.
Nodes currently forget txes rarely, so the bloom filters would only need to
be used at all under conditions that are not currently common -- but I think
they're important to include to allow for different node behavior in this
regard in the future.
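
As a rough sketch of what such a cheap filter might look like (purely
illustrative; since every key is already a uniformly distributed 32-byte
txid, the k index functions can be plain 4-byte slices of the hash, avoiding
the per-test Murmur hashing the SPV filters do):

    #include <array>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    using TxId = std::array<uint8_t, 32>;

    class ForgetFilter {
    public:
        // k must be <= 8, since each index function consumes a distinct
        // 4-byte slice of the 32-byte txid.
        ForgetFilter(size_t nBits, unsigned k) : bits_(nBits, false), k_(k) {}

        void Insert(const TxId& id) {
            for (unsigned i = 0; i < k_; ++i) bits_[Index(id, i)] = true;
        }
        bool Contains(const TxId& id) const {
            for (unsigned i = 0; i < k_; ++i)
                if (!bits_[Index(id, i)]) return false;
            return true;   // false positives only cost sending that tx in full
        }

    private:
        size_t Index(const TxId& id, unsigned i) const {
            uint32_t v;
            std::memcpy(&v, id.data() + 4 * i, 4);   // i-th 4-byte slice of the txid
            return v % bits_.size();
        }
        std::vector<bool> bits_;
        unsigned k_;
    };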

Tracking txes known to peers:
A multimap of txid->peerId would obviate the current setCurrentlyKnown, and
would not take much more space, since each additional peer adds about 1
peerId per txid (setCurrentlyKnown keeps a uint256 per peer per txid,
although it tracks somewhat fewer txids per node).
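
A sketch of the multimap bookkeeping (types and function names are
illustrative, not the existing code):

    #include <array>
    #include <cstdint>
    #include <cstring>
    #include <unordered_map>

    using TxId   = std::array<uint8_t, 32>;
    using NodeId = int64_t;

    struct TxIdHash {
        size_t operator()(const TxId& id) const {
            size_t h;
            std::memcpy(&h, id.data(), sizeof(h));   // txids are already uniform
            return h;
        }
    };

    // txid -> every peer believed to have seen it (inv received from it, tx
    // sent to it, or tx appeared in a competing block it knows about).
    using KnownMap = std::unordered_multimap<TxId, NodeId, TxIdHash>;

    bool PeerKnowsTx(const KnownMap& m, const TxId& tx, NodeId peer) {
        auto range = m.equal_range(tx);
        for (auto it = range.first; it != range.second; ++it)
            if (it->second == peer) return true;
        return false;
    }

    void MarkKnown(KnownMap& m, const TxId& tx, NodeId peer) {
        if (!PeerKnowsTx(m, tx, peer)) m.emplace(tx, peer);
    }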

Potential vulnerabilities:
- Since the bloom filters will have lower maximum overhead than the current
  SPV filters and can be dropped at will, this shouldn't enable any resource
  exhaustion attacks that aren't already possible.
- A squashed block with bogus or missing data would be easily detected not
  to produce the correct merkle root for its BlockHeader.

BENEFITS

Assuming a fairly typical 500-tx block with transaction sizes averaging 300
bytes (both on the low side), for a 150 kB block:

% pruned | block size reduction | relative size reduction
-------- | -------------------- | -----------------------
100      | 134 kB               | 89%
50       | 67 kB                | 45%
25       | 33.5 kB              | 22%

I've been doing some logging, and when my node pushes a block to a peer it
seems to typically know that a peer has seen most of the txes in the block.
Even in the case of a small block with only 25% known-known transactions,
total network bandwidth saved is greater than the bloom filters transmitted,
unless a node is forgetting transactions so rapidly that it pushes new
maximum-size forget-filters every block.

So this is a net gain even in total bandwidth usage, but most importantly
it's an improvement in block propagation rate and in how block propagation
rate scales with additional transactions.

IMPLEMENTATION QUESTIONS

How should block squashing capability be advertised -- new service bit?

Bloom filters:
- How fast could a suitable bloom filter be made to test against?
- How much memory would each filter need to take, at maximum?
- Can the inputs all being 32 byte hashes be used to optimize filter hash
  calculations?

ROADMAP

If there's support for this proposal, I can begin working on the specific
implementation details, such as the bloom filters, message format, and
capability advertisement, and draft a BIP once I have a concrete proposal for
what those would look like and a corresponding precise cost/benefit analysis.

--kaz

[-- Attachment #2: Type: text/html, Size: 5348 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-17 21:35 [Bitcoin-development] Squashing redundant tx data in blocks on the wire Kaz Wesley
@ 2014-07-17 22:46 ` Gavin Andresen
  2014-07-17 23:26   ` Kaz Wesley
  2014-07-18 13:53   ` Jeff Garzik
  2014-07-17 23:34 ` Gregory Maxwell
       [not found] ` <CABsx9T2PSa3MpfMMDCb8ACVF5vDOZOFLEK9zfP9PakgHA4U16w@mail.gmail.com>
  2 siblings, 2 replies; 24+ messages in thread
From: Gavin Andresen @ 2014-07-17 22:46 UTC (permalink / raw)
  To: Kaz Wesley; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 942 bytes --]

A couple of half-baked thoughts:

On Thu, Jul 17, 2014 at 5:35 PM, Kaz Wesley <keziahw@gmail•com> wrote:

> If there's support for this proposal, I can begin working on the specific
> implementation details, such as the bloom filters, message format, and
> capability advertisement, and draft a BIP once I have a concrete proposal
> for
> what those would look like and a corresponding precise cost/benefit
> analysis.
>

I'd encourage you to code up a prototype first (or at the same time), in
whatever programming language / networking library you're most familiar
with.

Maybe not even using the existing p2p protocol; there could be a
mining-only very-fast-block-propagation network separate from the existing
p2p network.

Combining your optimizations with "broadcast as many near-miss blocks as
bandwidth will allow" on a mining backbone network should allow insanely
fast propagation of most newly solved blocks.

-- 
--
Gavin Andresen

[-- Attachment #2: Type: text/html, Size: 1632 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-17 22:46 ` Gavin Andresen
@ 2014-07-17 23:26   ` Kaz Wesley
  2014-07-18 13:53   ` Jeff Garzik
  1 sibling, 0 replies; 24+ messages in thread
From: Kaz Wesley @ 2014-07-17 23:26 UTC (permalink / raw)
  To: Gavin Andresen; +Cc: Bitcoin Dev

I'm moving this design document to a gist so that I can integrate
changes as they come up:
https://gist.github.com/kazcw/43c97d3924326beca87d
One thing that I think is an important improvement over my initial idea:
the bloom filters don't need to be kept around and built up. They can be
one-shot, clearing any matching entries from the set of known-knowns upon
arrival -- provided a node is careful to ensure that the txes it wants to
forget are known-known-known (which isn't as bad as it sounds) to the peer
it's telling it's forgetting them, by the time the forget-filter arrives.

On Thu, Jul 17, 2014 at 3:46 PM, Gavin Andresen <gavinandresen@gmail•com> wrote:
>
> A couple of half-baked thoughts:
>
> On Thu, Jul 17, 2014 at 5:35 PM, Kaz Wesley <keziahw@gmail•com> wrote:
>>
>> If there's support for this proposal, I can begin working on the specific
>> implementation details, such as the bloom filters, message format, and
>> capability advertisement, and draft a BIP once I have a concrete proposal for
>> what those would look like and a corresponding precise cost/benefit analysis.
>
>
> I'd encourage you to code up a prototype first (or at the same time), in whatever programming language / networking library you're most familiar with.
>
> Maybe not even using the existing p2p protocol; there could be a mining-only very-fast-block-propagation network separate from the existing p2p network.
>
> Combining your optimizations with "broadcast as many near-miss blocks as bandwidth will allow" on a mining backbone network should allow insanely fast propagation of most newly solved blocks.
>
> --
> --
> Gavin Andresen

Thanks Gavin. I am planning on working out the design details as I work on
a prototype. I have the beginnings of a previous shot at implementing this
in bitcoind to start from, but my new design has some important improvements
to add to that.

-kaz



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-17 21:35 [Bitcoin-development] Squashing redundant tx data in blocks on the wire Kaz Wesley
  2014-07-17 22:46 ` Gavin Andresen
@ 2014-07-17 23:34 ` Gregory Maxwell
       [not found] ` <CABsx9T2PSa3MpfMMDCb8ACVF5vDOZOFLEK9zfP9PakgHA4U16w@mail.gmail.com>
  2 siblings, 0 replies; 24+ messages in thread
From: Gregory Maxwell @ 2014-07-17 23:34 UTC (permalink / raw)
  To: Kaz Wesley; +Cc: Bitcoin Development

On Thu, Jul 17, 2014 at 2:35 PM, Kaz Wesley <keziahw@gmail•com> wrote:
> A node should be able to forget invs it has seen without invalidating what
> peers
> know about its known txes. To allow for this, a node assembles a bloom
> filter of

Another option would be to just guarantee to keep at least the last N
sent in each direction to bound memory usage. N could be negotiated.

Going more complex than that may not have wins enough to justify it...
would be good to measure it.
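
Something as simple as a per-direction window would do; a rough sketch
(names illustrative):

    #include <array>
    #include <cstdint>
    #include <deque>
    #include <set>

    using TxId = std::array<uint8_t, 32>;

    // Remember only the last N txids announced in one direction, so the
    // channel state stays bounded regardless of peer behavior. N would be
    // negotiated at connection setup.
    class RecentTxWindow {
    public:
        explicit RecentTxWindow(size_t n) : limit_(n) {}

        void Remember(const TxId& id) {
            if (members_.insert(id).second) {
                order_.push_back(id);
                if (order_.size() > limit_) {        // evict the oldest entry
                    members_.erase(order_.front());
                    order_.pop_front();
                }
            }
        }
        bool Knows(const TxId& id) const { return members_.count(id) != 0; }

    private:
        size_t limit_;
        std::deque<TxId> order_;    // announcement order
        std::set<TxId> members_;    // fast membership test
    };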


(If you're not aware of it, check out—
https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding for a
more complex idea)



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-17 22:46 ` Gavin Andresen
  2014-07-17 23:26   ` Kaz Wesley
@ 2014-07-18 13:53   ` Jeff Garzik
  2014-07-18 14:53     ` Gavin Andresen
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Garzik @ 2014-07-18 13:53 UTC (permalink / raw)
  To: Gavin Andresen; +Cc: Bitcoin Dev

On Thu, Jul 17, 2014 at 6:46 PM, Gavin Andresen <gavinandresen@gmail•com> wrote:
> I'd encourage you to code up a prototype first (or at the same time), in
> whatever programming language / networking library you're most familiar
> with.

+1

> Maybe not even using the existing p2p protocol; there could be a mining-only
> very-fast-block-propagation network separate from the existing p2p network.
>
> Combining your optimizations with "broadcast as many near-miss blocks as
> bandwidth will allow" on a mining backbone network should allow insanely
> fast propagation of most newly solved blocks.


Yes, I would encourage thinking along these lines.  That was the
motivation of the UDP P2P protocol extension I wrote:
https://bitcointalk.org/index.php?topic=156769.0

The intention was to experiment with sending block header + tx list +
coinbase, via UDP best effort broadcast.

Incentives:

If your neighbors receiving this message already have the TXs in the
TX list, then the block is complete, and may be relayed further.

If your neighbors do not have all TXs in the block, they must fetch
them at additional time/latency cost.

Thus, you have an incentive to relay blocks containing TXs already
distributed out into network mempools and cached in the signature
cache.

We want to capture that incentive in whatever protocol is eventually
used.  Miners have a TX fee incentive to include many transactions;
in theory, they want to include as many TXs as possible.  Solving this
problem will help us scale quite a bit.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 13:53   ` Jeff Garzik
@ 2014-07-18 14:53     ` Gavin Andresen
  2014-07-18 15:06       ` Jeff Garzik
  2014-07-19  8:06       ` Wladimir
  0 siblings, 2 replies; 24+ messages in thread
From: Gavin Andresen @ 2014-07-18 14:53 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]

Two more half-baked thoughts:

We should be able to assume that the majority of transaction data (except
for coinbase) has already been propagated. As Jeff said, incentivizing
nodes to propagate transactions is a very good thing (the signature cache
already gives a small incentive to miners to propagate and not 'hoard'
transactions).

So the only information that theoretically needs to be propagated is which
transactions a miner is including in their block, and in what order they
are included.

But if there was some agreed-upon canonical ordering, then it should
theoretically be possible to take shortcuts in the "what order".

You'd start with setof(transactions I think everybody knows about)
Select some subset, based on miner's policy
Sort that subset with the canonical ordering algorithm
Very efficiently broadcast, taking all sorts of shortcuts assuming most of
your peers already know the set you started with and expect the same
canonical ordering (see gmaxwell's thoughts on block encoding).
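
A minimal sketch of that shortcut, assuming both sides hold the same
candidate set and agree on plain lexicographic txid order as the canonical
sort (both are assumptions for illustration, not a concrete proposal):

    #include <algorithm>
    #include <array>
    #include <cstdint>
    #include <vector>

    using TxId = std::array<uint8_t, 32>;

    // Encode a block body as indices into the (sorted) shared candidate set
    // instead of 32-byte hashes. Txes not in the shared set would have to be
    // sent in full as a fallback (not shown).
    std::vector<uint32_t> EncodeBlock(std::vector<TxId> candidates,      // assumed shared
                                      const std::vector<TxId>& blockTxs) // miner's selection
    {
        std::sort(candidates.begin(), candidates.end());                 // canonical ordering
        std::vector<uint32_t> indices;
        for (const TxId& tx : blockTxs) {
            auto it = std::lower_bound(candidates.begin(), candidates.end(), tx);
            if (it != candidates.end() && *it == tx)
                indices.push_back(uint32_t(it - candidates.begin()));
        }
        return indices;   // the receiver sorts its own copy and looks each tx back up
    }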

Second half-baked thought:
I wonder if broadcasting your transaction selection policy ("11KB of free
transactions, sorted by priority, then 111K of fee-paying transactions,
sorted by fee") might make it possible to save even more bandwidth by
letting your peers create a very good approximation of your block with just
that information....

-- 
--
Gavin Andresen

[-- Attachment #2: Type: text/html, Size: 1716 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 14:53     ` Gavin Andresen
@ 2014-07-18 15:06       ` Jeff Garzik
  2014-07-18 17:39         ` Kaz Wesley
  2014-07-19  8:06       ` Wladimir
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff Garzik @ 2014-07-18 15:06 UTC (permalink / raw)
  To: Gavin Andresen; +Cc: Bitcoin Dev

On Fri, Jul 18, 2014 at 10:53 AM, Gavin Andresen
<gavinandresen@gmail•com> wrote:
> But if there was some agreed-upon canonical ordering, then it should
> theoretically be possible to take shortcuts in the "what order".
>
> You'd start with setof(transactions I think everybody knows about)
> Select some subset, based on miner's policy
> Sort that subset with the canonical ordering algorithm
> Very efficiently broadcast, taking all sorts of shortcuts assuming most of
> your peers already know the set you started with and expect the same
> canonical ordering (see gmaxwell's thoughts on block encoding).

Related implementation detail:  Having pursued this train of thought,
I noted that you don't want to include too-young transactions that you
received in the past few seconds, because those are likely still
propagating around the network.

> Second half-baked thought:
> I wonder if broadcasting your transaction selection policy ("11KB of free
> transactions, sorted by priority, then 111K of fee-paying transactions,
> sorted by fee") might make it possible to save even more bandwidth by
> letting your peers create a very good approximation of your block with just
> that information....

Absolutely.  One path I would like to see pursued is multiple
p2pool-esque chains, each with their own policy, perhaps with their
own administrative team; i.e., you could have a fully decentralized
p2pool-like chain, or multiple such chains, each with a stated
policy/reward pattern.  Or, GHash/BTCGuild/Eligius could run a
semi-centrally managed chain ultimately guaranteed not only by
protocol but by administrators' digital signatures.

In each case, advertising technical attributes about your pool [chain]
policy would give nodes a better ability to predict what is in an
upcoming block.

And the flip side of that: such predictions are never perfect.  We need
to make sure the fallback case, while undoubtedly more costly than the
fast path, is not overly painful.


-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.      https://bitpay.com/



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 15:06       ` Jeff Garzik
@ 2014-07-18 17:39         ` Kaz Wesley
  2014-07-18 17:48           ` Jeff Garzik
  0 siblings, 1 reply; 24+ messages in thread
From: Kaz Wesley @ 2014-07-18 17:39 UTC (permalink / raw)
  To: Bitcoin Dev

Peers exchanging mempool priority policies is great; that accomplishes
the flexibility in what txes to remember that I was going for with the
forget-filters, but much more neatly, with less overhead and some side
benefits.

Here's what I'm picturing now:
- exchange priority policies in peer introductions
- assign unique sequential IDs in the order the transactions were
inved (per peer)
- receiving a getdata for a tx updates last-known-peer-received inv to
all invs up to the one referenced
- include ID-last-received, last-known-peer-received in sparse block
- reference txes in sparse block by index in receiver's prioritization with
peer's sent invs up to ID-last-received and sender's prior invs up to
last-known-peer-received (a rough sketch follows the message list below)

Possible new messages:
- sparseblock
- invack message a node can send at times when it's received a bunch
of invs it already has, so it hasn't acked with a getdata in a while
- gettx: getdata, but using new sequential ID to save 28 bytes per tx
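
Roughly, the channel memory and the sparseblock construction could look like
this (simplified sketch with hypothetical names; here a known tx is
referenced by its raw sequential inv ID rather than by index into the
receiver's prioritization):

    #include <array>
    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    using TxId = std::array<uint8_t, 32>;

    // Per-connection channel memory: every tx inved to this peer is assigned
    // the next sequential ID, so later messages can say "the n-th tx I
    // announced to you" in a couple of bytes instead of repeating 32.
    struct TxChannel {
        uint64_t nextId = 0;
        std::map<TxId, uint64_t> idByTx;

        uint64_t OnInvSent(const TxId& id) {
            auto it = idByTx.find(id);
            if (it != idByTx.end()) return it->second;
            idByTx[id] = nextId;
            return nextId++;
        }
    };

    // One sparseblock entry: a compact back-reference or the full raw tx.
    struct SparseEntry {
        bool byReference;
        uint64_t seqId;                 // valid when byReference is true
        std::vector<uint8_t> rawTx;     // valid otherwise
    };

    std::vector<SparseEntry> BuildSparseBlock(
        const TxChannel& ch,
        const std::vector<std::pair<TxId, std::vector<uint8_t>>>& vtx)
    {
        std::vector<SparseEntry> out;
        for (const auto& entry : vtx) {
            auto it = ch.idByTx.find(entry.first);
            if (it != ch.idByTx.end()) out.push_back({true, it->second, {}});
            else                       out.push_back({false, 0, entry.second});
        }
        return out;
    }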

It seems important for ordering policies to be able to be specified in
as much detail as possible. Parameters that should be available:
- total inputs
- total outputs
- bytes
- coin days destroyed
- net UTXO size change
- sigops
- is data carrier
- is output raw multisig
- age in mempool
- what else?
This parameter set should be extensible to allow for unforeseen future factors.

Ordering policies should allow arbitrary algebraic combinations of their
parameters, as well as thresholds. Boolean combinations of sub-policies
would also be desirable. This could be implemented with a tx-script-like
stack-based language, in which each supported tx property is pushed onto the
stack by a particular opcode, and +, -, *, /, min, max, and boolean
operators combine them to yield the sort key.
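
A toy evaluator along those lines, with a handful of made-up property
opcodes (everything here is a placeholder, not a proposed opcode set):

    #include <algorithm>
    #include <stdexcept>
    #include <vector>

    // Hypothetical per-tx properties a policy script could read.
    struct TxStats { double fee, bytes, sigops, coinDaysDestroyed; };

    // Property opcodes push a value; operator opcodes combine the top two
    // stack entries. The final top-of-stack is the sort key (or, read as a
    // boolean, a relay decision).
    enum class Op { PushFee, PushBytes, PushSigops, PushCDD,
                    Add, Sub, Mul, Div, Min, Max };

    double EvalPolicy(const std::vector<Op>& script, const TxStats& tx)
    {
        std::vector<double> st;
        auto bin = [&st](double (*f)(double, double)) {
            if (st.size() < 2) throw std::runtime_error("stack underflow");
            double b = st.back(); st.pop_back();
            double a = st.back(); st.pop_back();
            st.push_back(f(a, b));
        };
        for (Op op : script) {
            switch (op) {
            case Op::PushFee:    st.push_back(tx.fee); break;
            case Op::PushBytes:  st.push_back(tx.bytes); break;
            case Op::PushSigops: st.push_back(tx.sigops); break;
            case Op::PushCDD:    st.push_back(tx.coinDaysDestroyed); break;
            case Op::Add: bin([](double a, double b) { return a + b; }); break;
            case Op::Sub: bin([](double a, double b) { return a - b; }); break;
            case Op::Mul: bin([](double a, double b) { return a * b; }); break;
            case Op::Div: bin([](double a, double b) { return a / b; }); break;
            case Op::Min: bin([](double a, double b) { return std::min(a, b); }); break;
            case Op::Max: bin([](double a, double b) { return std::max(a, b); }); break;
            }
        }
        if (st.empty()) throw std::runtime_error("empty policy");
        return st.back();   // e.g. {PushFee, PushBytes, Div} sorts by fee-per-byte
    }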

Difficult parameters:
* Coin-days-destroyed: changes over time; peers need agreement on when (if
ever) it's recalculated. Probably it can just not be recalculated, but peers
still need agreement on "time seen" to get CDD.
* Age in mempool: seems intractable in terms of time, but could be
done easily in terms of "how many txes old is this sequential ID"

One potential pitfall: this allows for an environment of completely
heterogeneous mempool policies. I think that's a good thing, but we
need to avoid a situation where only least-common-denominator
transactions make it farther than a hop or two, and we don't want
nodes to have a strong preference for connecting to like-minded peers
since clustering reduces overall connectivity. It may be worthwhile to
add a parallel mechanism for relay policies, to differentiate between
what a node would keep in its mempool vs. what it wouldn't even relay
and doesn't want to see at all. Relay policies could be specified just
like prioritization policies, but with the final stack value evaluated
in a boolean context.

An interesting additional use of policy-scripts would be a
standardized way for miners to include a policy script in a coinbase,
allowing miners a mechanism to advertise things like their relative
price of sigops vs bytes. Nodes may then choose to take this
information into account in order to optimize their mempool policies
for likelihood of consistency with future blocks. Since policy scripts
provide only relative information on prices of different transaction
properties rather than an absolute fee, this should not allow miners
to "vote fees up", although care would need to be taken they wouldn't
be able to drive up prices by claiming common transaction types are at
the high end of the fee scale.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 17:39         ` Kaz Wesley
@ 2014-07-18 17:48           ` Jeff Garzik
  2014-07-18 17:53             ` Kaz Wesley
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Garzik @ 2014-07-18 17:48 UTC (permalink / raw)
  To: Kaz Wesley; +Cc: Bitcoin Dev

On a flood-fill network, you don't want to create a storm of "I
already have this" replies.

On Fri, Jul 18, 2014 at 1:39 PM, Kaz Wesley <keziahw@gmail•com> wrote:
> Peers exchanging mempool priority policies is great; that accomplishes
> the flexibility in what txes to remember that I was going for with the
> forget-filters, but much more neatly, with less overhead and some side
> benefits.
>
> Here's what I'm picturing now:
> - exchange priority policies in peer introductions
> - assign unique sequential IDs in the order the transactions were
> inved (per peer)
> - receiving a getdata for a tx updates last-known-peer-received inv to
> all invs up to the one referenced
> - include ID-last-received, last-known-peer-received in sparse block
> - reference txes in sparse block by index in receiver's
> prioritization with peer's sent invs up to ID-last-received and
> sender's prior invs up to last-known-peer-received
>
> Possible new messages:
> - sparseblock
> - invack message a node can send at times when it's received a bunch
> of invs it already has, so it hasn't acked with a getdata in a while
> - gettx: getdata, but using new sequential ID to save 28 bytes per tx
>
> It seems important for ordering policies to be able to be specified in
> as much detail as possible. Parameters that should be available:
> - total inputs
> - total outputs
> - bytes
> - coin days destroyed
> - net UTXO size change
> - sigops
> - is data carrier
> - is output raw multisig
> - age in mempool
> - what else?
> This parameter set should be extensible to allow for unforeseen future factors.
>
> Ordering policies should allow arbitrary algebraic combinations of
> their parameters, as well as thresholds. Boolean combinations of
> sub-policies would also be desirable. This could be implemented with a
> tx-script-like stack-based language, in which each supported tx
> property is pushed onto the stack by a particular opcode, and
> +-*//min/max/boolean operators combine them to yield the sort key.
>
> Difficult parameters:
> * Coin-days-destroyed: changes, peers need agreement on when (if?)
> it's recalculated. Probably can just not recalculate, but peers still
> need agreement on "time seen" to get CDD.
> * Age in mempool: seems intractable in terms of time, but could be
> done easily in terms of "how many txes old is this sequential ID"
>
> One potential pitfall: this allows for an environment of completely
> heterogeneous mempool policies. I think that's a good thing, but we
> need to avoid a situation where only least-common-denominator
> transactions make it farther than a hop or two, and we don't want
> nodes to have a strong preference for connecting to like-minded peers
> since clustering reduces overall connectivity. It may be worthwhile to
> add a parallel mechanism for relay policies, to differentiate between
> what a node would keep in its mempool vs. what it wouldn't even relay
> and doesn't want to see at all. Relay policies could be specified just
> like prioritization policies, but with the final stack value evaluated
> in a boolean context.
>
> An interesting additional use of policy-scripts would be a
> standardized way for miners to include a policy script in a coinbase,
> allowing miners a mechanism to advertise things like their relative
> price of sigops vs bytes. Nodes may then choose to take this
> information into account in order to optimize their mempool policies
> for likelihood of consistency with future blocks. Since policy scripts
> provide only relative information on prices of different transaction
> properties rather than an absolute fee, this should not allow miners
> to "vote fees up", although care would need to be taken they wouldn't
> be able to drive up prices by claiming common transaction types are at
> the high end of the fee scale.
>



-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.      https://bitpay.com/



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 17:48           ` Jeff Garzik
@ 2014-07-18 17:53             ` Kaz Wesley
  2014-07-18 19:51               ` Kaz Wesley
  0 siblings, 1 reply; 24+ messages in thread
From: Kaz Wesley @ 2014-07-18 17:53 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

That's true, but I think it can be balanced with the usefulness of
knowing what messages a node has received. An invack would be sent if
N invs have been received without any resulting getdata; since we're
keeping track of peer inv order, one message can cover an arbitrarily
large series of invs.

On Fri, Jul 18, 2014 at 10:48 AM, Jeff Garzik <jgarzik@bitpay•com> wrote:
> On a flood-fill network, you don't want to create a storm of "I
> already have this" replies.
>
> On Fri, Jul 18, 2014 at 1:39 PM, Kaz Wesley <keziahw@gmail•com> wrote:
>> [...]
>
>
>
> --
> Jeff Garzik
> Bitcoin core developer and open source evangelist
> BitPay, Inc.      https://bitpay.com/



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 17:53             ` Kaz Wesley
@ 2014-07-18 19:51               ` Kaz Wesley
  2014-07-18 19:55                 ` Jeff Garzik
  0 siblings, 1 reply; 24+ messages in thread
From: Kaz Wesley @ 2014-07-18 19:51 UTC (permalink / raw)
  To: Bitcoin Dev

I've updated the gist, and added an additional proposal that I think
meshes well:
https://gist.github.com/kazcw/43c97d3924326beca87d#ultra-fast-block-validation

sparseblocks + UFBV would tighten the new-block process to this (when
txes have been received in advance):
- receive block (~2kB for 1000 tx)
- check whether block contains txes known to belong to conflict-sets,
and if so whether more than one tx from a single conflict-set has been
included (a few operations on very small sets)
- relay block (~2kB)

The benefits of these changes only occur when the transactions have
been seen in advance, but incentivizing ahead-of-block transaction
propagation is a plus, as Jeff mentioned; working on a block without
first ensuring peers have its transactions would be very expensive
from a miner's point of view.
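
The conflict-set check in the middle step could be as small as this (sketch
only; building the conflict sets during ahead-of-time validation is the real
work and isn't shown):

    #include <array>
    #include <cstdint>
    #include <map>
    #include <set>
    #include <vector>

    using TxId = std::array<uint8_t, 32>;
    using ConflictSetId = uint32_t;

    // Txes known to spend a common outpoint share a ConflictSetId, assigned
    // when they were validated ahead of time. At block time, the block is
    // acceptable (with respect to already-known txes) iff no two members of
    // the same conflict set were both included.
    bool BlockIsConflictFree(const std::map<TxId, ConflictSetId>& conflictSetOf,
                             const std::vector<TxId>& blockTxs)
    {
        std::set<ConflictSetId> seen;
        for (const TxId& tx : blockTxs) {
            auto it = conflictSetOf.find(tx);
            if (it == conflictSetOf.end()) continue;             // not in any conflict set
            if (!seen.insert(it->second).second) return false;   // second member included
        }
        return true;   // a few small-set operations, independent of tx sizes
    }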



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 19:51               ` Kaz Wesley
@ 2014-07-18 19:55                 ` Jeff Garzik
  2014-07-19  0:54                   ` Emin Gün Sirer
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff Garzik @ 2014-07-18 19:55 UTC (permalink / raw)
  To: Kaz Wesley; +Cc: Bitcoin Dev

Related:  We must handle some legitimate miner-privately-mined cases,
such as miner payout TXs (outside coinbase) or side chain conditional
TXs[1].

[1] https://bitcointalk.org/index.php?topic=676703.msg7682680#msg7682680

On Fri, Jul 18, 2014 at 3:51 PM, Kaz Wesley <keziahw@gmail•com> wrote:
> I've updated the gist, and added an additional proposal that I think
> meshes well:
> https://gist.github.com/kazcw/43c97d3924326beca87d#ultra-fast-block-validation
>
> sparseblocks + UFBV would tighten the new-block process to this (when
> txes have been received in advance):
> - receive block (~2kB for 1000 tx)
> - check whether block contains txes known to belong to conflict-sets,
> and if so whether more than one tx from a single conflict-set has been
> included (a few operations on very small sets)
> - relay block (~2kB)
>
> The benefits of these changes only occur when the transactions have
> been seen in advance, but incentivizing ahead-of-block transaction
> propagation is a plus, as Jeff mentioned; working on a block without
> first ensuring peers have its transactions would be very expensive
> from a miner's point of view.
>



-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.      https://bitpay.com/



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 19:55                 ` Jeff Garzik
@ 2014-07-19  0:54                   ` Emin Gün Sirer
  2014-07-19  1:25                     ` Gregory Maxwell
  0 siblings, 1 reply; 24+ messages in thread
From: Emin Gün Sirer @ 2014-07-19  0:54 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 5227 bytes --]

I thought I'd chime in and point out some research results that might help.
Even if they don't, there is a cool underlying technique that some of you
might find interesting.

The problem being tackled here is very similar to "set reconciliation,"
where peer A thinks that the set of transactions that should be in the block
is S_A, and peer B has actually included set S_B, and S_A and S_B are
expected to not differ much. Ideally, one would like the communication
complexity between A and B to be O(delta), not O(S_B) as it is right now.
And ideally, one would like B to send a single message to A, and for A to
figure out the difference between the two sets, without any lengthy back and
forth communication. In essence, I would like to give you some magical
packet that is pretty small and communicates just the delta between what you
and I know.

This paper from Cornell describes a scheme for achieving this:
   Yaron Minsky, Ari Trachtenberg, Richard Zippel: Set reconciliation with
nearly optimal communication complexity. IEEE Transactions on Information
Theory 49(9): 2213-2218 (2003)
   http://ipsit.bu.edu/documents/ieee-it3-web.pdf

Those of you looking for a TL;DR should read the intro and then skip to page
8 for the example. The underlying trick is very cool, comes from the
peer-to-peer/gossip literature, and it is underused. It'd be really cool if
it could be applied to this problem to reduce the size of the packets.

This approach has three benefits over the Bloom filter approach (if I
understand the Bloom filter idea correctly):

(1) Bloom filters require packets that are still O(S_A),

(2) Bloom filters are probabilistic, so require extra complications
when there is a hash collision. In the worst case, A might get confused
about which transaction B actually included, which would lead to a
fork. (I am not sure if I followed the Bloom filter idea fully -- this may
not happen with the proposal, but it's a possibility with a naive Bloom
filter implementation)

(3) Bloom filters are interactive, so when A detects that B has included
some transactions that A does not know about, it has to send a message
to figure out what those transactions are.

Set reconciliation is O(delta), non-probabilistic, and non-interactive. The
naive version requires that one have some idea of the size of the delta,
but I think the paper has some discussion of how to handle the delta
estimate.
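
To give a flavor of the underlying trick without the full machinery: each
side evaluates the characteristic polynomial of its set at a few agreed
sample points, and the shared elements cancel in the ratio, so the ratio
carries information only about the difference. A toy illustration (small
integers stand in for transaction ids; the rational-function interpolation
that actually recovers the difference is omitted):

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    static const uint64_t P = 4294967291ULL;   // prime below 2^32, so products fit in 64 bits

    uint64_t MulMod(uint64_t a, uint64_t b) { return (a % P) * (b % P) % P; }

    uint64_t PowMod(uint64_t b, uint64_t e) {
        uint64_t r = 1;
        for (b %= P; e; e >>= 1, b = MulMod(b, b))
            if (e & 1) r = MulMod(r, b);
        return r;
    }
    uint64_t InvMod(uint64_t a) { return PowMod(a, P - 2); }   // Fermat's little theorem

    // Characteristic polynomial of a set, evaluated at z: prod (z - x) mod P.
    uint64_t ChiEval(const std::vector<uint64_t>& s, uint64_t z) {
        uint64_t r = 1;
        for (uint64_t x : s) r = MulMod(r, (z + P - x % P) % P);
        return r;
    }

    int main() {
        // A and B share the elements 1..4; the delta is {10} vs {11, 12}.
        std::vector<uint64_t> A = {1, 2, 3, 4, 10};
        std::vector<uint64_t> B = {1, 2, 3, 4, 11, 12};
        std::vector<uint64_t> onlyA = {10}, onlyB = {11, 12};
        for (uint64_t z : {1000u, 2000u, 3000u}) {
            uint64_t lhs = MulMod(ChiEval(A, z), InvMod(ChiEval(B, z)));
            uint64_t rhs = MulMod(ChiEval(onlyA, z), InvMod(ChiEval(onlyB, z)));
            // Shared elements cancel: the ratio depends only on the difference,
            // which is what lets ~|delta| evaluations describe the whole delta.
            std::printf("z=%llu  lhs=%llu  rhs=%llu\n",
                        (unsigned long long)z, (unsigned long long)lhs,
                        (unsigned long long)rhs);
        }
    }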

I have not gone through the full exercise of actually applying this trick to
the Bitcoin p2p protocol yet, but wanted to draw your attention to it.
If someone is interested in applying this stuff to Bitcoin, I'd be happy
to communicate further off list.

- egs



On Fri, Jul 18, 2014 at 12:55 PM, Jeff Garzik <jgarzik@bitpay•com> wrote:

> Related:  We must handle some legitimate miner-privately-mined cases,
> such as miner payout TXs (outside coinbase) or side chain conditional
> TXs[1].
>
> [1] https://bitcointalk.org/index.php?topic=676703.msg7682680#msg7682680
>
> On Fri, Jul 18, 2014 at 3:51 PM, Kaz Wesley <keziahw@gmail•com> wrote:
> > I've updated the gist, and added an additional proposal that I think
> > meshes well:
> >
> https://gist.github.com/kazcw/43c97d3924326beca87d#ultra-fast-block-validation
> >
> > sparseblocks + UFBV would tighten the new-block process to this (when
> > txes have been received in advance):
> > - receive block (~2kB for 1000 tx)
> > - check whether block contains txes known to belong to conflict-sets,
> > and if so whether more than one tx from a single conflict-set has been
> > included (a few operations on very small sets)
> > - relay block (~2kB)
> >
> > The benefits of these changes only occur when the transactions have
> > been seen in advance, but incentivizing ahead-of-block transaction
> > propogation is a plus, as Jeff mentioned; working on a block without
> > first ensuring peers have its transactions would be very expensive
> > from a miner's point of view.
> >
> >
>
>
>
> --
> Jeff Garzik
> Bitcoin core developer and open source evangelist
> BitPay, Inc.      https://bitpay.com/
>
>
>

[-- Attachment #2: Type: text/html, Size: 7602 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-19  0:54                   ` Emin Gün Sirer
@ 2014-07-19  1:25                     ` Gregory Maxwell
  2014-07-19  3:06                       ` Emin Gün Sirer
  0 siblings, 1 reply; 24+ messages in thread
From: Gregory Maxwell @ 2014-07-19  1:25 UTC (permalink / raw)
  To: Emin Gün Sirer; +Cc: Bitcoin Dev

On Fri, Jul 18, 2014 at 5:54 PM, Emin Gün Sirer <el33th4x0r@gmail•com> wrote:
> The problem being tackled here is very similar to "set reconciliation,"
> where
> peer A thinks that the set of transactions that should be in the block is
> S_A,

Most things I've seen working in this space are attempting to minimize
the data transferred. At least for the miner-interested case the round
complexity is much more important because a single RTT is enough to
basically send the whole block on a lot of very relevant paths.

I know much better is possible (see up-thread where I linked to an old
proposal to use forward error correction to transfer with low -- though not
optimal -- data overhead and negligible probability of needing a round trip,
with a tradeoff of more overhead for lower round-trip probability).



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-19  1:25                     ` Gregory Maxwell
@ 2014-07-19  3:06                       ` Emin Gün Sirer
  2014-07-19  6:48                         ` Gregory Maxwell
  0 siblings, 1 reply; 24+ messages in thread
From: Emin Gün Sirer @ 2014-07-19  3:06 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1205 bytes --]

> Most things I've seen working in this space are attempting to minimize
> the data transfered. At least for the miner-interested case the round
> complexity is much more important because a single RTT is enough to
> basically send the whole block on a lot of very relevant paths.

Agreed. Yaron's scheme is magical because it is non-interactive. I send you
a packet of O(expected-delta) and you immediately figure out the delta
without further back-and-forth communication, each round of which would
otherwise require an RTT.

> I know much better is possible (see up-thread where I linked to an old
> proposal to use forward error correction to transfer with low data
> transfer (but not optimal) and negligible probability of needing a
> round-trip, with a tradeoff for more overhead for lower roundtrip
> probability).

FEC schemes are both fairly complex (because the set is constantly
changing) and, if I understand your suggestion correctly, add additional
metadata overhead (albeit mostly during tx propagation). Set reconciliation
is near optimal.

In any case, I have no horse here (I think changing the client so it's
multithreaded is the best way to go), but Yaron's work is pretty cool and
may be applicable.

- egs

[-- Attachment #2: Type: text/html, Size: 1381 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-19  3:06                       ` Emin Gün Sirer
@ 2014-07-19  6:48                         ` Gregory Maxwell
  0 siblings, 0 replies; 24+ messages in thread
From: Gregory Maxwell @ 2014-07-19  6:48 UTC (permalink / raw)
  To: Emin Gün Sirer; +Cc: Bitcoin Dev

On Fri, Jul 18, 2014 at 8:06 PM, Emin Gün Sirer <el33th4x0r@gmail•com> wrote:
>
>> Most things I've seen working in this space are attempting to minimize
>> the data transfered. At least for the miner-interested case the round
>> complexity is much more important because a single RTT is enough to
>> basically send the whole block on a lot of very relevant paths.
>
> Agreed. Yaron's scheme is magical because it is non-interactive. I send you
> a packet of O(expected-delta) and you immediately figure out the delta
> without further back and forth communication, each requiring an RTT.

Oh, that does sound interesting -- it's the property I was trying to
approximate with the FEC. It achieves the one-shot property, but there is
overhead. One plus we have is that we can do some tricks to make some
computational-soundness arguments that we'd actually get average-case
performance on average (e.g. that someone can't author transactions in
such a way as to jam the process).

> In any case, I have no horse here (I think changing the client so it's
> multithreaded is the best way to go), but Yaron's work is pretty cool and
> may be applicable.


Thank you, I've certainly queued the paper for reading.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-18 14:53     ` Gavin Andresen
  2014-07-18 15:06       ` Jeff Garzik
@ 2014-07-19  8:06       ` Wladimir
  1 sibling, 0 replies; 24+ messages in thread
From: Wladimir @ 2014-07-19  8:06 UTC (permalink / raw)
  To: Gavin Andresen; +Cc: Bitcoin Dev

On Fri, Jul 18, 2014 at 4:53 PM, Gavin Andresen <gavinandresen@gmail•com> wrote:
> Two more half-baked thoughts:
>
> We should be able to assume that the majority of transaction data (except
> for coinbase) has already been propagated. As Jeff said, incentivizing nodes
> to propagate transactions is a very good thing (the signature cache already
> gives a small incentive to miners to propagate and not 'hoard'
> transactions).

Maybe a stupid idea - but couldn't we make that assumption a surety by
starting the 'set synchronization process' as soon as the miner starts
crunching on a certain block, instead of when it broadcasts it? So the
peers are prepared, and the actual block broadcast is just the header
+ coinbase tx.

Wladimir



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
       [not found]   ` <CAPkFh0vKFnKRE-sd-Z9t1zB73VLPsiaQ3o=OYgBqqtUE4_rTaw@mail.gmail.com>
@ 2014-07-31 20:47     ` Kaz Wesley
  2014-07-31 21:29       ` Gregory Maxwell
  0 siblings, 1 reply; 24+ messages in thread
From: Kaz Wesley @ 2014-07-31 20:47 UTC (permalink / raw)
  To: Emin Gün Sirer; +Cc: Bitcoin Dev

I don't see how set reconciliation alone would be practical for
condensed block exchange -- if the keys are txids it'd require a round
trip to request the missing tx; if we could somehow get the "What's
the Difference" approach to effectively operate on full transactions
instead, the log(keysize) factor overhead would make any transactions
not mutually-known very expensive to exchange (at keysize=32b, data
would need to be 80% mutually-known just to break even). There's also
the complication and/or overhead of establishing an "expected block"
to reconcile with the actual block.

The approach of remembering what invs have been transmitted both
directions along each connection is less elegant; it requires
remembering a lot of communication history, introducing a major point
of statefulness to the protocol, and custom-compacting blocks for each
peer. But it is also very effective at squeezing bytes, cheap in cpu
cycles, and the implementation is fairly simple. The wealth of mutual
knowledge already available in the current protocol allows
accomplishing the goal of exchanging blocks efficiently by solving a
much easier problem than its context-less cousin. I have my doubts
that it is possible for even an optimal contextless solution to do as
well as channel memory in terms of bytes exchanged or computational
complexity -- you can't beat making use of the available information.

I have an implementation of inv-history-tracking that uses a 2b/tx
alternative to getdata for tx, and I've had that running between two
nodes for ~2 weeks now. I've been working on a better implementation
of that plus the sparseblock messages, and I'll have the sparseblock
prototype (suitable for something like Gregory's remember-last-N
approach) up and running in a couple of days or so. The prototype
handles assigning compact identifiers to transactions and using those
in block messages; there's a lot of bit-packing sort of tweaks that
can be done that I'm not including in the initial prototype. The
prototype will be able to log history-hit rates, so if we run a few
sparseblocks nodes connected to each other for a while we should get a
good idea of how much efficiency gain this provides, and how it can be
improved. This approach even without the intensive bit packing has a
total vtx transmission size of 2*nTxKnown + 1*nTxUnknown +
nBytesTxUnknown, where only a small window of very recent transactions
and any transactions that have fallen out of the history limit would
be mutually known but not known to be known.
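
For a feel of the numbers: if, say, 90% of a 500-tx block of ~300-byte
transactions is mutually known (figures assumed for illustration), that
formula works out to roughly 2*450 + 1*50 + 50*300 = 15,950 bytes on the
wire instead of ~150 kB.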

It would be possible to nearly eliminate even that overhead for both
known and unknown transactions with compact descriptions of block tx
inclusion and ordering policies as Gavin brought up, for which
something like scripts defining priority formulas would be a possible
implementation (https://gist.github.com/kazcw/43c97d3924326beca87d#ordering-policy
-- n.b. most of the rest of the gist is currently outdated). But since
priority scripts are themselves more complicated than the rest of the
sparseblock implementation, and basic sparseblocks achieve the vast
majority of bandwidth savings, I think it's worth implementing
sparseblocks without priority scripts now and then using priority
scripts for sparseblocks2 along with all the other things they can do
later.

Set reconciliation does look like a great way to synchronize mempools.
I've been thinking, contextless low-cost mempool exchange would enable
a node to have one or more "roaming" peer slots -- connect to a node,
fill in each other's mempools, move on to another peer. It seems like
this would go a long way to mitigate potential pathological network
topologies -- it would make it very difficult to sybil attack a node
(barring an attacker in a position to spoof IP addresses), and if a
serious bug or DoS attack caused the network to start to partition
itself due to DoS bans, it only takes occasional roamers crossing the
partition to keep both sides generally in sync.
Efficient mempool synchronization would also increase the efficacy of
channel-memory sparseblocks: it picks up transactions too old to have
been exchanged via invs, and could also allow nodes to know exactly
what transactions their peers have discarded.



On Thu, Jul 31, 2014 at 8:31 AM, Gavin Andresen <gavinandresen@gmail•com> wrote:
> I've been reading up on set reconciliation algorithms, and thinking about
> constant-bandwidth propagation of new block announcements.
>
> Emin:  the approach in this paper:
> What's the Difference? Efficient Set Reconciliation without Prior Context
>  http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p218.pdf
>
> ... looks like it would be much better suited for Bitcoin's use case,
> because
>
> a) it looks much easier to implement (no polynomial math)
> b) CPU/latency versus bandwidth tradeoff looks better (somewhat higher
> bandwidth than Yaron's method, but much lower CPU/latency cost)
>
> Kaz: how much time do you have to work on this?  Perhaps we can get a
> road-map for a prototype and split up work. I actually think the first step
> might be to gather a couple days worth of tx / block message broadcasts from
> a few network nodes and then create a test/benchmark harness that will tell
> us how much overlap there is in what nodes know and how much faster our
> newer algorithms are.
>
> --
> --
> Gavin Andresen

On Thu, Jul 31, 2014 at 12:10 PM, Emin Gün Sirer <el33th4x0r@gmail•com> wrote:
> Hi Gavin,
>
> Great find. I read (and sadly forgot) this paper back in 2011 and indeed,
> I'd also pick this approach over Yaron's. My reasons:
>
> a. this paper provides a more concrete implementation roadmap that has been
> explored thoroughly. While I understand Yaron's technique, there are still
> various design decisions (e.g. estimating the set difference involves a
> doubling process; representing TX ids seems to take up a lot of space) that
> are left unspec'ed for an implementation. (In general, Yaron is a
> theoretician at heart and Varghese is a very practical guy, there will be
> far fewer unforeseen issues with a Varghese tried and tested algorithm).
>
> b. on the surface, this paper seems to use a bit more bandwidth in that it
> requires O(size of set diff * log of keyspace) vs. Yaron's claim of O(size
> of set diff). But I believe that in practice Yaron's method would also have
> an O(log of keyspace) multiplier in place because the roots of his
> polynomial (the TX ids) have to be represented as numbers, and that requires
> log-of-keyspace bits. I suspect that Yaron is cleverly folding that factor
> into the constant factor of O notation, and Varghese et al are being polite
> by not pointing it out too overtly. So I suspect that the two schemes are
> actually identical in terms of space complexity.
>
> c. this technique is far more efficient in decoding than Yaron's, which
> requires Gaussian elimination, which in turn is O(d^3).
>
> If Bitcoin adopts this technique, it'll be adopting one of the best known
> techniques from the research community.
>
> BTW, don't hesitate to ping me with researchy issues in the future; I'll
> gladly point the effort in the right direction if I can.
>
> - egs
>
>
>
> On Thu, Jul 31, 2014 at 6:31 PM, Gavin Andresen <gavinandresen@gmail•com>
> wrote:
>>
>> [...]
>
>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-31 20:47     ` Kaz Wesley
@ 2014-07-31 21:29       ` Gregory Maxwell
  2014-07-31 21:41         ` Kaz Wesley
  0 siblings, 1 reply; 24+ messages in thread
From: Gregory Maxwell @ 2014-07-31 21:29 UTC (permalink / raw)
  To: Kaz Wesley; +Cc: Bitcoin Dev

On Thu, Jul 31, 2014 at 1:47 PM, Kaz Wesley <keziahw@gmail•com> wrote:
> trip to request the missing tx; if we could somehow get the "What's
> the Difference" approach to effectively operate on full transactions
> instead

I explain how to do this on the network block coding page.

Given that you know the sizes and orders of the transactions (e.g.
from a reconciliation step first), the sender sends non-syndromic
forward error correcting code data somewhat larger than their estimate
of how much data the user is missing.  Then you drop the data you know
into place and then recover the missing blocks using the FEC.

There is no overhead in this approach except for FEC blocks that are
incompletely missing (and so must be completely discarded), and the need to
have transmitted the transaction list and sizes first. (Note: that's just
more bandwidth, not an additional round trip.)
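
For flavor, the degenerate single-erasure case looks like this (toy sketch:
one XOR parity chunk over equal-size chunks; a real scheme would use a
proper erasure code such as Reed-Solomon or a fountain code so that several
missing chunks can be recovered):

    #include <cstddef>
    #include <cstdint>
    #include <optional>
    #include <vector>

    using Chunk = std::vector<uint8_t>;   // all chunks assumed the same size

    // Parity over the whole chunk list, sent alongside the block.
    Chunk XorParity(const std::vector<Chunk>& chunks) {
        Chunk p(chunks.front().size(), 0);
        for (const Chunk& c : chunks)
            for (size_t i = 0; i < p.size(); ++i) p[i] ^= c[i];
        return p;
    }

    // received[i] is std::nullopt for the single chunk the peer is missing
    // (e.g. the one tx it had never seen); XORing the parity with every
    // known chunk reproduces the missing one, with no extra round trip.
    Chunk RecoverMissing(const std::vector<std::optional<Chunk>>& received,
                         const Chunk& parity) {
        Chunk out = parity;
        for (const auto& c : received)
            if (c)
                for (size_t i = 0; i < out.size(); ++i) out[i] ^= (*c)[i];
        return out;
    }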



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-31 21:29       ` Gregory Maxwell
@ 2014-07-31 21:41         ` Kaz Wesley
  2014-07-31 21:51           ` Gregory Maxwell
  0 siblings, 1 reply; 24+ messages in thread
From: Kaz Wesley @ 2014-07-31 21:41 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

> the need to have transmitted the transaction list [..] first

32 bits per transaction is at least double the communication overhead
of the simple approach, and only offers a bound on the probability of
needing a round trip.

On Thu, Jul 31, 2014 at 2:29 PM, Gregory Maxwell <gmaxwell@gmail•com> wrote:
> On Thu, Jul 31, 2014 at 1:47 PM, Kaz Wesley <keziahw@gmail•com> wrote:
>> trip to request the missing tx; if we could somehow get the "What's
>> the Difference" approach to effectively operate on full transactions
>> instead
>
> I explain how to do this on the network block coding page.
>
> Given that you know the sizes and orders of the transactions (e.g.
> from a reconciliation step first), the sender sends non-syndromic
> forward error correcting code data somewhat larger than their estimate
> of how much data the user is missing.  Then you drop the data you know
> into place and then recover the missing blocks using the fec.
>
> There is no overhead in this approach except for FEC blocks that are
> incompletely missing (and so must be completely discarded), and the
> need to have transmitted the transaction list and sizes first.
> (note, that's just more bandwidth, not an additional round trip).



* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-31 21:41         ` Kaz Wesley
@ 2014-07-31 21:51           ` Gregory Maxwell
  2014-07-31 22:27             ` Kaz Wesley
  0 siblings, 1 reply; 24+ messages in thread
From: Gregory Maxwell @ 2014-07-31 21:51 UTC (permalink / raw)
  To: Kaz Wesley; +Cc: Bitcoin Dev

On Thu, Jul 31, 2014 at 2:41 PM, Kaz Wesley <keziahw@gmail•com> wrote:
>> the need to have transmitted the transaction list [..] first
>
> 32 bits per transaction is at least double the communication overhead
> of the simple approach, and only offers a bound on the probability of
> needing a round trip.

"(e.g. from a reconciliation step first)" the list can be communicated
in the space roughly equal to the size of the difference in sets plus
coding the permutation from the permissible orderings.   If you did
have some "simple approach" that guaranteed that some transactions
would be present, then you could code those with indexes... the FEC
still lets you fill in the missing transactions without knowing in
advance all that will be missing.   (Also, the suggestion on the
network block coding page of using part of a cryptographic permutation
as the key means that for unknown transactions the transmission of the
new unknown keys is always goodput— doesn't add overhead)
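
For scale, coding an arbitrary ordering of n transactions takes about
log2(n!) bits in total, an upper bound since restricting to permissible
orderings only shrinks it (the 500 below is an arbitrary example, not a
figure from the thread):

    import math

    def permutation_bits(n):
        # log2(n!) bits suffice to code any ordering of n items
        return math.lgamma(n + 1) / math.log(2)

    print(permutation_bits(500))        # about 3767 bits total
    print(permutation_bits(500) / 500)  # about 7.5 bits per transaction, vs 32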

It's "only a bound" but you can pick whatever bound you want,
including— if you send data equal to the missing amount, then it'll always
be successful, but no bandwidth savings.   Though if the transport
is unordered (e.g. UDP or non-blocking SCTP) even sending 100% of the
missing amount can save time by eliminating a round trip that might
otherwise be needed for a retransmission.
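
Illustrative numbers for that tradeoff (assuming an ideal erasure code, so
recovery works exactly when enough FEC data arrived):

    def fec_to_send(estimated_missing_bytes, margin=0.25):
        return int(estimated_missing_bytes * (1 + margin))

    def one_round_ok(actual_missing_bytes, fec_sent):
        return actual_missing_bytes <= fec_sent

    print(one_round_ok(10_000, fec_to_send(9_000)))   # True: 11250 bytes sent
    print(one_round_ok(15_000, fec_to_send(9_000)))   # False: needs a round trip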



* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-31 21:51           ` Gregory Maxwell
@ 2014-07-31 22:27             ` Kaz Wesley
  2014-07-31 23:18               ` Gregory Maxwell
  0 siblings, 1 reply; 24+ messages in thread
From: Kaz Wesley @ 2014-07-31 22:27 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

> the FEC still lets you fill in the missing transactions without knowing in advance all that will be missing.

I don't see why we need to solve that problem, since the protocol
already involves exchanging the information necessary to determine
(with some false positives) what a peer is missing, and needs to
continue doing so regardless of how blocks are transmitted.

Set reconciliation does have the benefit of eliminating a subset of
those false positives and offering a finer-grained mechanism for
defining what a node can choose to forget from its mempool than
remember-last-N, but if we implement it for block transmission I don't
see why we wouldn't also use it to synchronize mempool txes, and if
mempools are synchronized we don't actually need to do it as part of
block-transmission to get those benefits.

As far as I can tell, channel memory sparseblocks achieve most of the
possible bandwidth savings, and when FEC-based mempool synchronization
is implemented its benefits can be applied to the sparseblocks by
resetting the channel memory to the mutual mempool state each time
mempool differences are exchanged. Am I missing a benefit to doing FEC
at block forwarding time that can't be realized by FEC-based mempool
synchronization, implemented separately from channel-memory based
index-coding?
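
A minimal sketch of the channel-memory bookkeeping described above (class
and method names are hypothetical):

    class PeerChannel:
        def __init__(self):
            self.known = set()        # txids this peer is believed to have

        def note_seen(self, txid):
            # inv received, tx sent, or tx in a block the peer has
            self.known.add(txid)

        def encode_block(self, block_txs):
            # block_txs: list of (txid, raw tx); known txs become short
            # references, the rest are sent in full
            refs, full = [], []
            for txid, tx in block_txs:
                (refs if txid in self.known else full).append((txid, tx))
            return refs, full

        def reset_to_mutual(self, mutual_txids):
            # after a mempool reconciliation round (FEC-based or otherwise),
            # restart the channel memory from the mutually-known set
            self.known = set(mutual_txids)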


On Thu, Jul 31, 2014 at 2:51 PM, Gregory Maxwell <gmaxwell@gmail•com> wrote:
> On Thu, Jul 31, 2014 at 2:41 PM, Kaz Wesley <keziahw@gmail•com> wrote:
>>> the need to have transmitted the transaction list [..] first
>>
>> 32 bits per transaction is at least double the communication overhead
>> of the simple approach, and only offers a bound on the probability of
>> needing a round trip.
>
> "(e.g. from a reconciliation step first)" the list can be communicated
> in the space roughly equal to the size of the difference in sets plus
> coding the permutation from the permissible orderings.   If you did
> have some "simple approach" that guaranteed that some transactions
> would be present, then you could code those with indexes... the FEC
> still lets you fill in the missing transactions without knowing in
> advance all that will be missing.   (Also, the suggestion on the
> network block coding page of using part of a cryptographic permutation
> as the key means that for unknown transactions the transmission of the
> new unknown keys is always goodput— doesn't add overhead)
>
> It's "only a bound" but you can pick whatever bound you want,
> including— if you send data equal to the missing amount, then it'll always
> be successful, but no bandwidth savings.   Though if the transport
> is unordered (e.g. UDP or non-blocking SCTP) even sending 100% of the
> missing amount can save time by eliminating a round trip that might
> otherwise be needed for a retransmission.



* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-31 22:27             ` Kaz Wesley
@ 2014-07-31 23:18               ` Gregory Maxwell
  2014-08-01  1:00                 ` Kaz Wesley
  0 siblings, 1 reply; 24+ messages in thread
From: Gregory Maxwell @ 2014-07-31 23:18 UTC (permalink / raw)
  To: Kaz Wesley; +Cc: Bitcoin Dev

On Thu, Jul 31, 2014 at 3:27 PM, Kaz Wesley <keziahw@gmail•com> wrote:
>> the FEC still lets you fill in the missing transactions without knowing in advance all that will be missing.
>
> I don't see why we need to solve that problem, since the protocol
> already involves exchanging the information necessary to determine
> (with some false positives) what a peer is missing, and needs to
> continue doing so regardless of how blocks are transmitted.

False positives, and if you have more than one peer— false negatives.
(or a rule for what you must keep which is conservative in order to
avoid creating huge storage requirements— but then also has false
negatives).


> As far as I can tell, channel memory sparseblocks achieve most of the
> possible bandwidth savings, and when FEC-based mempool synchronization
> is implemented its benefits can be applied to the sparseblocks by
> resetting the channel memory to the mutual mempool state each time
> mempool differences are exchanged. Am I missing a benefit to doing FEC
> at block forwarding time that can't be realized by FEC-based mempool
> synchronization, implemented separately from channel-memory based
> index-coding?

Yes, minimizing latency in the face of multiple peers.

Otherwise no. And certainly no reason not to implement something simple first.



* Re: [Bitcoin-development] Squashing redundant tx data in blocks on the wire
  2014-07-31 23:18               ` Gregory Maxwell
@ 2014-08-01  1:00                 ` Kaz Wesley
  0 siblings, 0 replies; 24+ messages in thread
From: Kaz Wesley @ 2014-08-01  1:00 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

On Thu, Jul 31, 2014 at 4:18 PM, Gregory Maxwell <gmaxwell@gmail•com> wrote:
> On Thu, Jul 31, 2014 at 3:27 PM, Kaz Wesley <keziahw@gmail•com> wrote:
>>> the FEC still lets you fill in the missing transactions without knowing in advance all that will be missing.
>>
>> I don't see why we need to solve that problem, since the protocol
>> already involves exchanging the information necessary to determine
>> (with some false positives) what a peer is missing, and needs to
>> continue doing so regardless of how blocks are transmitted.
>
> False positives, and if you have more than one peer— false negatives.
> (or a rule for what you must keep which is conservative in order to
> avoid creating huge storage requirements— but then also has false
> negatives).

I think a rule for what to keep (along with the false-positive rate
associated with the rule's conservativeness) is preferable to false
negatives, since round-trip cost is potentially very high. The
prototype I'm working on will be able to provide data on what the
false-positive-missing-tx rate would look like with something like
remember-last-N.
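
A sketch of what a remember-last-N rule and the measured statistic could
look like (hypothetical names; not the prototype's actual design):

    from collections import OrderedDict

    class LastN:
        def __init__(self, n):
            self.n, self.seen = n, OrderedDict()

        def add(self, txid):
            self.seen.pop(txid, None)
            self.seen[txid] = True
            if len(self.seen) > self.n:
                self.seen.popitem(last=False)   # evict the oldest txid

        def __contains__(self, txid):
            return txid in self.seen

    def false_positive_rate(block_txids, assumed_known, receiver_memory):
        # fraction of txs the sender assumed the peer still had but which
        # the peer had in fact already forgotten
        assumed = [t for t in block_txids if t in assumed_known]
        misses = sum(1 for t in assumed if t not in receiver_memory)
        return misses / len(assumed) if assumed else 0.0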

There are various ways that rule could be upgraded to nearly eliminate
the false-positive-missing rate, including learning what txes a peer
has dropped via periodic mempool syncing, or specifying the rule
explicitly with priority scripts, both of which have other benefits of
their own. All of these changes synergize, but as long as the
simplistic remembering rule yields worthwhile improvement over the
current approach they can all be done independently as incremental
improvements.

I also really like the idea of referring to transactions by ids
that can themselves provide part of the tx data; I think that could
also be added separately, on top of these other changes. (Gregory,
your various wiki pages are basically my to-do list of things I'd like
to work on.)

The idea of mempool synchronization brings up the issue of transaction
expiration, since lack of mempool syncing is currently the mechanism
for tx expiry. I'm starting a discussion about how to address that in
a separate thread; see "deterministic transaction expiration".

>> As far as I can tell, channel memory sparseblocks achieve most of the
>> possible bandwidth savings, and when FEC-based mempool synchronization
>> is implemented its benefits can be applied to the sparseblocks by
>> resetting the channel memory to the mutual mempool state each time
>> mempool differences are exchanged. Am I missing a benefit to doing FEC
>> at block forwarding time that can't be realized by FEC-based mempool
>> synchronization, implemented separately from channel-memory based
>> index-coding?
>
> Yes, minimizing latency in the face of multiple peers.
>
> Otherwise no. And certainly no reason not to implement something simple first.



end of thread, other threads:[~2014-08-01  1:00 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-17 21:35 [Bitcoin-development] Squashing redundant tx data in blocks on the wire Kaz Wesley
2014-07-17 22:46 ` Gavin Andresen
2014-07-17 23:26   ` Kaz Wesley
2014-07-18 13:53   ` Jeff Garzik
2014-07-18 14:53     ` Gavin Andresen
2014-07-18 15:06       ` Jeff Garzik
2014-07-18 17:39         ` Kaz Wesley
2014-07-18 17:48           ` Jeff Garzik
2014-07-18 17:53             ` Kaz Wesley
2014-07-18 19:51               ` Kaz Wesley
2014-07-18 19:55                 ` Jeff Garzik
2014-07-19  0:54                   ` Emin Gün Sirer
2014-07-19  1:25                     ` Gregory Maxwell
2014-07-19  3:06                       ` Emin Gün Sirer
2014-07-19  6:48                         ` Gregory Maxwell
2014-07-19  8:06       ` Wladimir
2014-07-17 23:34 ` Gregory Maxwell
     [not found] ` <CABsx9T2PSa3MpfMMDCb8ACVF5vDOZOFLEK9zfP9PakgHA4U16w@mail.gmail.com>
     [not found]   ` <CAPkFh0vKFnKRE-sd-Z9t1zB73VLPsiaQ3o=OYgBqqtUE4_rTaw@mail.gmail.com>
2014-07-31 20:47     ` Kaz Wesley
2014-07-31 21:29       ` Gregory Maxwell
2014-07-31 21:41         ` Kaz Wesley
2014-07-31 21:51           ` Gregory Maxwell
2014-07-31 22:27             ` Kaz Wesley
2014-07-31 23:18               ` Gregory Maxwell
2014-08-01  1:00                 ` Kaz Wesley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox