Re: [Bitcoin-development] On-going data spam

From: Mike Hearn <mike@plan99•net>
To: Jeff Garzik <jgarzik@exmulti•com>
Cc: Bitcoin Development <bitcoin-development@lists•sourceforge.net>
Subject: Re: [Bitcoin-development] On-going data spam
Date: Tue, 9 Apr 2013 12:42:12 +0200
Message-ID: <CANEZrP1EKaHbpdC6X=9mvyJHC_cvW7u5p9nqM7EwkEypAg4Xmg@mail.gmail.com>
In-Reply-To: <CA+8xBpc5iV=prakWKkNFa0O+tgyhoHxJ9Xwz6ubhPRUBf_95KA@mail.gmail.com>

OK, as the start of that conversation is now on the list, I might as well
post the other thoughts we had. Or at least that I had :)

It's tempting to see this kind of abuse through the lens of fees, because
we only have a few hammers and so everything looks like a kind of nail. The
problem is the moment you try to define "abuse" economically you end up
excluding legitimate and beneficial uses as well. Maybe Peter's patch for
uneconomical outputs is different because of how it works. But mostly it's
true. In this case, fees would never work - Peter said the guy who uploaded
Wikileaks paid something like $500 to do it. I guess by now it's more like
$600-$700. It's hard for regular end users to compete with that kind of
wild-eyed dedication to "the cause".

The root problem here is people believe the block chain is a data structure
that will live forever and be served by everyone for free, in perpetuity,
and is thus the perfect place for "uncensorable" stuff. That's a reasonable
assumption given how Bitcoin works today. But there's no reason it will be
true in the long run (I know this can be an unpopular viewpoint).

Firstly, legal issues - I think it's very unlikely any sane court would
care about illegal stuff in the block chain given you need special tools to
extract it (mens rea). Besides, I guess most end users will end up on SPV
clients as they mature. So these users already don't have a copy of the
entire block chain. I don't worry too much about this.

Secondly, the need to host blocks forever. In future, many (most?) full
nodes will be pruning, and won't actually store old blocks at all. They'll
just have the utxo database, some undo blocks and some number of old blocks
for serving, probably whatever fits in the amount of disk space the user is
willing to allocate. But very old blocks will have been deleted.
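
As a rough illustration of "whatever fits in the amount of disk space the
user is willing to allocate", here is a minimal sketch of a block store that
evicts its oldest blocks once a byte budget is exceeded. The class name
PrunedBlockStore and its methods are hypothetical, not taken from any Bitcoin
implementation, and the UTXO database and undo blocks a real pruning node
would also keep are left out.

```python
from collections import deque


class PrunedBlockStore:
    """Hypothetical store that keeps only the newest blocks within a byte budget."""

    def __init__(self, budget_bytes):
        self.budget_bytes = budget_bytes
        self.blocks = deque()  # (height, raw_block_bytes), oldest first
        self.used = 0

    def add_block(self, height, raw):
        self.blocks.append((height, raw))
        self.used += len(raw)
        # Delete the oldest blocks until we are back under budget,
        # but always keep the block that was just added.
        while self.used > self.budget_bytes and len(self.blocks) > 1:
            _, old = self.blocks.popleft()
            self.used -= len(old)

    def heights(self):
        """Heights of the blocks this node can still serve."""
        return [h for h, _ in self.blocks]
```

With a 250-byte budget and five 100-byte blocks added in order, only the two
most recent blocks remain available for serving.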

This leads to the question of what incentives people have to not prune. The
obvious incentive is money - charge for access to older parts of the chain.
The fewer people that host it, the more you can charge. In the worst case
scenario where, you know, only 10 different organizations store a copy of
the chain, it might mean that bootstrapping a new node in a trust-less
manner is expensive. But I really doubt it'd ever get so few. Serving large
static datasets just isn't that expensive. Also, you don't actually need to
replay from the genesis block to bring up a new node, you can copy the UTXO
database from somewhere else. By comparing the databases of lots of
different nodes together, the chances of you being in a Matrix-like Sybil
world can be reduced to "beyond reasonable doubt". Maybe nodes would charge
for copies of their database too, but ideally there are lots of nodes and
so the charge for that should be so close to zero as makes no odds - you
can trivially undercut someone by buying access to the dataset and then
reselling it for a bit less, so the price should converge on the actual
cost of providing the service. Which will be very cheap.
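
The database comparison could look something like the sketch below. It is
only an illustration under assumptions: the snapshot layout, the text
serialization and the names snapshot_digest and agreed_digest are all made
up here, and no such protocol message exists today. The idea is to hash each
node's UTXO set in a canonical order and accept the copy only if enough
nodes report the same digest.

```python
import hashlib
from collections import Counter


def snapshot_digest(utxos):
    """Hash a UTXO set in a canonical (sorted) order.

    utxos maps (txid_hex, output_index) -> (value, script_hex).
    This serialization is an assumption made for the example.
    """
    h = hashlib.sha256()
    for (txid, vout), (value, script) in sorted(utxos.items()):
        h.update(f"{txid}:{vout}:{value}:{script}\n".encode())
    return h.hexdigest()


def agreed_digest(peer_digests, min_peers=8, min_fraction=1.0):
    """Return the digest enough peers agree on, or None if there is no agreement."""
    if len(peer_digests) < min_peers:
        return None
    digest, count = Counter(peer_digests).most_common(1)[0]
    if count / len(peer_digests) >= min_fraction:
        return digest
    return None
```

The thresholds are arbitrary placeholders. Agreement only means something if
the peers are actually independent of each other, which this sketch does not
try to establish.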

There was one last thought I had, which is that if there's a shorter-term
need to discourage this kind of thing we can use a network/bandwidth-related
hack by changing the protocol. Nodes can serve up blocks encrypted under a
random key. You only get the key when you finish the download. A blacklist
can apply to Bloom filtering such that transactions which are known to be
"abusive" require you to fully download the block rather than select the
transactions with a filter. This means that people can still access the
data in the chain, but the older it gets the slower and more bandwidth
intensive it becomes. Stuffing Wikileaks into the chain sounds good when a
20 line Python script can extract it "instantly". If someone who wants the
files has to download gigabytes of padding around it first, suddenly
hosting it on a Tor hidden service becomes more attractive.
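
To make the idea concrete, here is a toy sketch of both halves: serving a
block encrypted under a random key that is only handed over once the
download is complete, and refusing a filtered reply when the filter matches
a blacklisted transaction. Everything in it is an assumption made for
illustration. The function names are invented, the peer's Bloom filter is
modelled as a plain predicate, and the SHAKE-256 keystream XOR is a stand-in
for a real cipher that should not be used as one.

```python
import hashlib
import secrets


def _xor_keystream(key, data):
    # Toy stream cipher: XOR the data with SHAKE-256 output derived from the key.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def serve_block(raw_block, chunk_size=4):
    """Encrypt a block under a fresh random key and split it into chunks.

    The serving node would send every chunk first and the key last, so a
    partial download yields nothing readable.
    """
    key = secrets.token_bytes(32)
    ciphertext = _xor_keystream(key, raw_block)
    chunks = [ciphertext[i:i + chunk_size]
              for i in range(0, len(ciphertext), chunk_size)]
    return chunks, key


def download_block(chunks, key):
    """Reassemble and decrypt; only possible once all chunks and the key arrived."""
    return _xor_keystream(key, b"".join(chunks))


def filtered_reply(block_txs, wants, blacklist):
    """Answer a filtered request, or return None to force a full block download.

    block_txs maps txid -> raw transaction bytes; wants is a predicate
    standing in for the peer's Bloom filter.
    """
    matched = [txid for txid in block_txs if wants(txid)]
    if any(txid in blacklist for txid in matched):
        return None
    return {txid: block_txs[txid] for txid in matched}
```

A peer that gets None from filtered_reply would have to fall back to
serve_block and download_block for the whole block, which is where the extra
bandwidth cost comes from.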
