On Tue, Jul 16, 2013 at 04:16:23PM +0200, Wendell wrote:
> Hello everyone,
> 
> In the previous thread, I expressed interest in seeing an SPV bitcoind, further stating that I would fund such work. Mike Hearn followed up with some of Satoshi's old code for this, which is now quite broken. The offer and interest on my side still stand, as more diversity in SPV options seems like the right way to go.
> 
> Time-permitting, I would really appreciate feedback from knowledgeable parties about the possible approaches to an SPV bitcoind. We at Hive ideally want to see something that could one day be merged into master, rather than a fork.

Keep in mind that SPV mode is newer than many realize: bloom filters are
a 0.8 feature, itself released only last February. As John Dillon posted
earlier this week in "Protecting Bitcoin against network-wide DoS
attack" the Bitcoin codebase will have to implement much better anti-DoS
attack defences soon, and in a decentralized system there aren't any
options other than requiring peers to either do work (useful or not) or
sacrifice something of value. SPV peers can't do useful work, leaving
only sacrifice - to what extent and how much is unknown. In addition SPV
nodes have serious privacy issues because their peers know that any
transaction sent to them by the SPV node is guaranteed to be from the
node rather than relayed; bloom filters are only really helpful with
payment protocols that don't exist yet and don't apply to merchants.
Then you have MITM problems, vulnerability to fake blocks etc.

It'll be a while before we know how serious these issues are in practice,
and we're likely to find new issues we didn't think of too. In any case
Bitcoin is far better off if we make it easy to run a full node,
donating whatever resources you can. Fortunately there's a whole
continuum between SPV and full nodes.

The way you do this is by maintaining partial UTXO sets. The trick is
that if you have verified every block in some range i to j, every time
you see a txout created by a transaction, and not subsequently spent,
you can be sure that at height j the txout existed. If height j is the
current block, you can be sure the txout exists provided that the chain
itself is valid. Any transaction that only spends txouts in this partial
set is a transaction you can fully verify and safely relay; for other
transactions you just don't know and have to wait until you see them in
a block.
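
A minimal sketch of that rule in Python. The names (OutPoint,
can_fully_verify) are made up for illustration and aren't from any
existing codebase, and it only covers the "do the inputs exist" part;
script and signature checks would still be needed before relaying.

```python
from typing import List, NamedTuple, Set


class OutPoint(NamedTuple):
    """Hypothetical reference to a txout: creating txid plus output index."""
    txid: str
    index: int


def can_fully_verify(spends: List[OutPoint], partial_utxos: Set[OutPoint]) -> bool:
    """True only if every input spends a txout in our partial UTXO set.

    A transaction with no inputs is treated as not verifiable here
    (an assumption of this sketch).
    """
    return bool(spends) and all(op in partial_utxos for op in spends)


# txouts we saw created, and not spent, in the block range we validated
partial_utxos = {OutPoint("aa", 0), OutPoint("bb", 1)}

tx_known = [OutPoint("aa", 0)]                      # all inputs in the set
tx_unknown = [OutPoint("aa", 0), OutPoint("cc", 0)]  # one input we can't see

relay_known = can_fully_verify(tx_known, partial_utxos)
relay_unknown = can_fully_verify(tx_unknown, partial_utxos)
```

The second transaction isn't necessarily invalid; we just can't tell,
so it waits until it shows up in a block.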

So what's useful about that? Basically it means your node starts with
the same security level, and usefulness to the network, as an SPV node.
But over time you keep downloading blocks as they are created, and with
whatever bandwidth you have left (out of some user-configurable
allocation) you download additional blocks going further and further
back in time. Gradually your UTXO set becomes more complete, and over
time you can verify a higher and higher % of all valid transactions.
Eventually your node becomes a full node, but in the meantime it was
still useful for the user, and still contributed to the network by
relaying blocks and an increasingly large subset of all transactions.
(Optionally you can store a subset of the chain history too, for other
nodes to bootstrap from.) You've also got better security because you
*are* validating blocks, incompletely at first and then increasingly
completely until you're finally validating fully. Privacy is improved, for
both you and others, by mixing your transactions with others and adding
to the overall anonymity set.

In the future we'll have miners commit a hash of the UTXO set, and that
gives us even more options to, for instance, have relayed transactions
include proof that their inputs were valid, allowing all nodes to relay
them safely.


As for specifics, you need to maintain a UTXO set, and in addition a set
of spent txouts (the STXO set) for which you haven't seen the
transaction that created the txout. As you download newer blocks you
update the UTXO set; as you download older blocks you update the UTXO
set and STXO set.
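
Roughly, in Python (a sketch, not a worked-out design). The class and
method names are hypothetical, and a transaction is reduced to (txid,
spent outpoints, number of outputs). One assumption beyond the text
above: when a newer block spends a txout we have never seen created,
the sketch records it in the STXO set as well, so that the output isn't
mistaken for unspent when the older block is downloaded later.

```python
from typing import List, Set, Tuple

OutPoint = Tuple[str, int]            # (txid, output index)
Tx = Tuple[str, List[OutPoint], int]  # (txid, spent outpoints, number of outputs)


class PartialChainState:
    """Hypothetical UTXO/STXO bookkeeping for a partially validated chain."""

    def __init__(self) -> None:
        self.utxo: Set[OutPoint] = set()
        self.stxo: Set[OutPoint] = set()  # spent, creating tx not yet seen

    def connect_newer(self, block: List[Tx]) -> None:
        """Apply a block at the tip, transactions in block order."""
        for txid, spends, n_out in block:
            for op in spends:
                if op in self.utxo:
                    self.utxo.remove(op)
                else:
                    self.stxo.add(op)  # created before our oldest block
            for i in range(n_out):
                self.utxo.add((txid, i))

    def connect_older(self, block: List[Tx]) -> None:
        """Apply the block just before the oldest one validated so far."""
        # Everything this block spends was created earlier still, or
        # earlier in this same block (resolved by the loop below).
        for _, spends, _ in block:
            self.stxo.update(spends)
        for txid, _, n_out in block:
            for i in range(n_out):
                op = (txid, i)
                if op in self.stxo:
                    self.stxo.remove(op)  # created here, spent later
                else:
                    self.utxo.add(op)     # created here, never spent


older = [("a", [], 2)]          # coinbase-like tx with two outputs
newer = [("b", [("a", 0)], 1)]  # spends the first output of "a"

state = PartialChainState()
state.connect_newer(newer)      # start at the tip, like an SPV node
state.connect_older(older)      # then work backwards
```

After both calls the UTXO set holds ("a", 1) and ("b", 0) and the STXO
set is empty, the same result a full node gets by processing the two
blocks in order.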

Nodes now advertise this new variable to their peers:

nOldestBlock - The oldest block that we've validated. (and all
subsequent blocks)

We'll also want the ability to advertise what sub-ranges of the
blockchain data we have on hand:

listArchivedBlockRanges - list of (begin, end) pairs

Nodes should drop all but the largest n pairs, say 5 or something. The
index -1 is reserved to indicate the last block to make it easy to
advertise that you have every block starting at some height to the most
recent. (reserving -n with n as the last block might be a better choice
to show intent, but still allow for specific proofs when we get node
identities)
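
A sketch of the trimming rule in Python. The function names are
hypothetical, and treating both ends of a pair as inclusive is my
assumption; the wire format isn't specified here.

```python
from typing import List, Tuple

LAST_BLOCK = -1  # reserved index meaning "up to the most recent block"

BlockRange = Tuple[int, int]  # (begin, end), both inclusive in this sketch


def range_length(r: BlockRange, tip_height: int) -> int:
    """Number of blocks in a range, resolving the reserved -1 to the tip."""
    begin, end = r
    if end == LAST_BLOCK:
        end = tip_height
    return end - begin + 1


def trim_ranges(ranges: List[BlockRange], tip_height: int, n: int = 5) -> List[BlockRange]:
    """Keep only the n largest ranges, returned ordered by starting height."""
    largest = sorted(ranges, key=lambda r: range_length(r, tip_height), reverse=True)[:n]
    return sorted(largest)


archived = [(0, 99), (500, 509), (1000, 1999), (3000, 3004),
            (4000, 4049), (5000, 5001), (6000, LAST_BLOCK)]
advertised = trim_ranges(archived, tip_height=6999, n=5)
```

With a tip at height 6999 the two smallest ranges, (3000, 3004) and
(5000, 5001), are the ones that get dropped.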

We probably want to define a NODE_PARTIAL service bit or something; I'll
have to re-read Pieter Wuille's proposal and think about it. Nodes
should NOT advertise NODE_NETWORK unless they have the full chain and
have verified it.

Nodes with partial peers should only relay transactions to those peers
if the transactions spend inputs the peers know about - remember how
even an SPV node has that information if it's not spending unconfirmed
inputs it didn't create. Nodes will have to update their peers
periodically as nOldestBlock changes. That said it may also be
worthwhile to simply relay all transactions in some cases too - a
reasonable way to approach this might be to set a bloom filter for tx's
that you *definitely* want, and if you are interested in everything,
just set the filter to all 1's. If someone comes up with a reasonable
micropayment or proof-of-work system even relaying txs that you haven't
validated is fine - the proof-of-work and prioritization will prevent
DoS attacks just fine.
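
As a sketch of the relay decision, in Python: assume the relaying node
knows the height at which each input's txout was created (None for
inputs it can't place, e.g. unconfirmed ones), and that a peer either
advertised nOldestBlock or asked for everything. The function name and
the peer_wants_all flag are made up for illustration.

```python
from typing import List, Optional


def should_relay(input_heights: List[Optional[int]],
                 peer_oldest_block: int,
                 peer_wants_all: bool = False) -> bool:
    """Relay only if the peer can know about every input, or asked for everything."""
    if peer_wants_all:  # e.g. the peer set a bloom filter of all 1's
        return True
    # The peer has validated every block from peer_oldest_block onwards,
    # so it knows about txouts created at or after that height.
    return all(h is not None and h >= peer_oldest_block for h in input_heights)


relay_recent = should_relay([150, 200], peer_oldest_block=100)
relay_old = should_relay([50, 200], peer_oldest_block=100)
```

Here relay_recent is True and relay_old is False, since one input of
the second transaction was created before the peer's oldest block.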

Remember that if you're running a partial node, it can get new blocks
from any partial node, and it can retrieve historic blockchain data from
any partial node that has archived the sequence of blocks you need next.
On a large scale this is similar to how in BitTorrent you can serve data
to your peers the moment you get it - a significant scalability
improvement for the network as a whole. Even if a large % of the network
was partial nodes running for just a few hours a day the whole system
would work fine due to how partial nodes can serve each other the data
they need.
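
For example, picking which peers can serve the next historic block you
need could look like this (a Python sketch; the peer names and the
peers_with_block helper are hypothetical, ranges are taken as inclusive,
and -1 stands for the most recent block as described above):

```python
from typing import Dict, List, Tuple

LAST_BLOCK = -1  # reserved index meaning "up to the most recent block"


def peers_with_block(height: int,
                     peers: Dict[str, List[Tuple[int, int]]],
                     tip_height: int) -> List[str]:
    """Names of peers whose advertised archive ranges cover the given height."""
    found = []
    for name, ranges in peers.items():
        for begin, end in ranges:
            if end == LAST_BLOCK:
                end = tip_height
            if begin <= height <= end:
                found.append(name)
                break  # one matching range is enough for this peer
    return sorted(found)


peers = {
    "a": [(0, 999)],
    "b": [(5000, LAST_BLOCK)],
    "c": [(900, 1200), (4000, LAST_BLOCK)],
}
candidates = peers_with_block(950, peers, tip_height=6000)
```

In this example peers "a" and "c" both cover height 950, so either can
serve it.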

On startup you can act as an SPV node temporarily, asking for filtered
blocks matching your wallet, and then go back and get the full blocks,
or just download the full blocks right away. Which is better is a
tradeoff that depends on how long the node has been off.

Anyway, it's a bit more code compared to pure-SPV, but it results in a
much more scalable Bitcoin, and if you can spare the modest bandwidth
requirements to keep up with the blockchain it'll result in much better
robustness against DoS attacks for you and Bitcoin in general.

-- 
'peter'[:-1]@petertodd.org