public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
* [Bitcoin-development] On-going data spam
@ 2013-04-09  1:22 Jeff Garzik
  2013-04-09  9:28 ` Peter Todd
  2013-04-09 10:42 ` Mike Hearn
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff Garzik @ 2013-04-09  1:22 UTC (permalink / raw)
  To: Bitcoin Development

http://www.reddit.com/r/Bitcoin/comments/1bw9xg/data_in_the_blockchain_wikileaks/

<TD> petertodd: yeah somebody put a file upload tool into the chain
and then tried to upload the entire amibios source code to it. stupid.
<TD> someone thinks it's a lot more important than it really is
<petertodd> TD: and 2.5MB of wikileaks data, and a whole bunch of GPG
encrypted stuff, and the hidden wiki cp/jb sections (no idea if it's
all the same person)
<petertodd> jgarzik:
https://blockchain.info/address/3Dw3UB6VZ3a3ay5diDQVwUFXzKScJJLeVU
iirc this is gpg symmetric key encrypted
<petertodd> jgarzik: (I wrote a tool to download the tool to download data)
<petertodd> MC1984_: just checked, surprisingly no-one has put
*anything* into the litecoin chain at all, strings returns nothing

-- 
Jeff Garzik
exMULTI, Inc.
jgarzik@exmulti•com



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09  1:22 [Bitcoin-development] On-going data spam Jeff Garzik
@ 2013-04-09  9:28 ` Peter Todd
  2013-04-09 10:42 ` Mike Hearn
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Todd @ 2013-04-09  9:28 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 1610 bytes --]

On Mon, Apr 08, 2013 at 09:22:10PM -0400, Jeff Garzik wrote:
> http://www.reddit.com/r/Bitcoin/comments/1bw9xg/data_in_the_blockchain_wikileaks/
> 
> <TD> petertodd: yeah somebody put a file upload tool into the chain
> and then tried to upload the entire amibios source code to it. stupid.
> <TD> someone thinks it's a lot more important than it really is
> <petertodd> TD: and 2.5MB of wikileaks data, and a whole bunch of GPG
> encrypted stuff, and the hidden wiki cp/jb sections (no idea if it's
> all the same person)
> <petertodd> jgarzik:
> https://blockchain.info/address/3Dw3UB6VZ3a3ay5diDQVwUFXzKScJJLeVU
> iirc this is gpg symmetric key encrypted
> <petertodd> jgarzik: (I wrote a tool to download the tool to download data)
> <petertodd> MC1984_: just checked, surprisingly no-one has put
> *anything* into the litecoin chain at all, strings returns nothing

It must be "shit on the blockchain" week:

http://vog.github.io/bitcoinproof/

Timestamping the stupid way, but the user experience is really nice:

> Encoding your crypto hash into those two fields is a tricky task, so
> people are tempted to make it more complicated than it has to be(1), or
> outright cumbersome. Luckily(2), there is a simple solution that needs
> only one transaction to one address.
> 1) https://github.com/fireduck64/BitcoinTimestamp
> 2) https://github.com/goblin/chronobit

Like it or not, people will do what's easiest regardless of how much it
harms everyone. I'd send this guy an email about opentimestamps yadda
yada, but really, why bother.

-- 
'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09  1:22 [Bitcoin-development] On-going data spam Jeff Garzik
  2013-04-09  9:28 ` Peter Todd
@ 2013-04-09 10:42 ` Mike Hearn
  2013-04-09 11:09   ` Peter Todd
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Mike Hearn @ 2013-04-09 10:42 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 3882 bytes --]

OK, as the start of that conversation is now on the list, I might as well
post the other thoughts we had. Or at least that I had :)

It's tempting to see this kind of abuse through the lens of fees, because
we only have a few hammers and so everything looks like a kind of nail. The
problem is the moment you try to define "abuse" economically you end up
excluding legitimate and beneficial uses as well. Maybe Peters patch for
uneconomical outputs is different because of how it works. But mostly it's
true. In this case, fees would never work - Peter said the guy who uploaded
Wikileaks paid something like $500 to do it. I guess by now it's more like
$600-$700. It's hard for regular end users to compete with that kind of
wild-eyed dedication to "the cause".

The root problem here is people believe the block chain is a data structure
that will live forever and be served by everyone for free, in perpetuity,
and is thus the perfect place for "uncensorable" stuff. That's a reasonable
assumption given how Bitcoin works today. But there's no reason it will be
true in the long run (I know this can be an unpopular viewpoint).

Firstly, legal issues - I think it's very unlikely any sane court would
care about illegal stuff in the block chain given you need special tools to
extract it (mens rea). Besides, I guess most end users will end up on SPV
clients as they mature. So these users already don't have a copy of the
entire block chain. I don't worry too much about this.

Secondly, the need to host blocks forever. In future, many (most?) full
nodes will be pruning, and won't actually store old blocks at all. They'll
just have the utxo database, some undo blocks and some number of old blocks
for serving, probably whatever fits in the amount of disk space the user is
willing to allocate. But very old blocks will have been deleted.

This leads to the question of what incentives people have to not prune. The
obvious incentive is money - charge for access to older parts of the chain.
The fewer people that host it, the more you can charge. In the worst case
scenario where, you know, only 10 different organizations store a copy of
the chain, it might mean that bootstrapping a new node in a trust-less
manner is expensive. But I really doubt it'd ever get so few. Serving large
static datasets just isn't that expensive. Also, you don't actually need to
replay from the genesis block to bring up a new code, you can copy the UTXO
database from somewhere else. By comparing the databases of lots of
different nodes together, the chances of you being in a matrix-like sybil
world can be reduced to "beyond reasonable doubt". Maybe nodes would charge
for copies of their database too, but ideally there are lots of nodes and
so the charge for that should be so close to zero as makes no odds - you
can trivially undercut someone by buying access to the dataset and then
reselling it for a bit less, so the price should converge on the actual
cost of providing the service. Which will be very cheap.

There was one last thought I had, which is that if there's a shorter team
need to discourage this kind of thing we can use a network/bandwith related
hack by changing the protocol. Nodes can serve up blocks encrypted under a
random key. You only get the key when you finish the download. A blacklist
can apply to Bloom filtering such that transactions which are known to be
"abusive" require you to fully download the block rather than select the
transactions with a filter. This means that people can still access the
data in the chain, but the older it gets the slower and more bandwidth
intensive it becomes. Stuffing Wikileaks into the chain sounds good when a
20 line Python script can extract it "instantly". If someone who wants the
files has to download gigabytes of padding around it first, suddenly
hosting it on a Tor hidden service becomes more attractive.

[-- Attachment #2: Type: text/html, Size: 4290 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 10:42 ` Mike Hearn
@ 2013-04-09 11:09   ` Peter Todd
  2013-04-09 11:17     ` Jay F
  2013-04-09 14:14     ` Mike Hearn
  2013-04-09 14:39   ` Caleb James DeLisle
  2013-04-09 14:50   ` Jeff Garzik
  2 siblings, 2 replies; 15+ messages in thread
From: Peter Todd @ 2013-04-09 11:09 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 653 bytes --]

On Tue, Apr 09, 2013 at 12:42:12PM +0200, Mike Hearn wrote:
> hack by changing the protocol. Nodes can serve up blocks encrypted under a
> random key. You only get the key when you finish the download. A blacklist

NAK

Makes bringing up a new node dependent on other nodes having consistent
uptimes, particularly if you are on a low-bandwidth connection.

> can apply to Bloom filtering such that transactions which are known to be
> "abusive" require you to fully download the block rather than select the
> transactions with a filter. This means that people can still access the

NAK

No blacklists

-- 
'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 11:09   ` Peter Todd
@ 2013-04-09 11:17     ` Jay F
  2013-04-09 11:34       ` Robert Backhaus
  2013-04-09 14:14     ` Mike Hearn
  1 sibling, 1 reply; 15+ messages in thread
From: Jay F @ 2013-04-09 11:17 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Development

On 4/9/2013 4:09 AM, Peter Todd wrote:
> On Tue, Apr 09, 2013 at 12:42:12PM +0200, Mike Hearn wrote:
>> hack by changing the protocol. Nodes can serve up blocks encrypted under a
>> random key. You only get the key when you finish the download. A blacklist
> NAK
>
> Makes bringing up a new node dependent on other nodes having consistent
> uptimes, particularly if you are on a low-bandwidth connection.
>
>> can apply to Bloom filtering such that transactions which are known to be
>> "abusive" require you to fully download the block rather than select the
>> transactions with a filter. This means that people can still access the
> NAK
>
> No blacklists
>
It depends on how clever the spammers get encoding stuff. If law 
enforcement forensic tools can pull a jpeg header + child porn out of 
the blockchain, then there's a problem that needs mitigation.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 11:17     ` Jay F
@ 2013-04-09 11:34       ` Robert Backhaus
  0 siblings, 0 replies; 15+ messages in thread
From: Robert Backhaus @ 2013-04-09 11:34 UTC (permalink / raw)
  To: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 1928 bytes --]

The obvious problem is that if you can frame it as a valid address, you can
put what you want there. If you can make it pass the validation, miners
have no way of knowing it's not a valid address.

Of course, there is nothing new about this. I ran strings on the blockchain
and found all sorts of ascii rubbish right from the beginning.


On 9 April 2013 21:17, Jay F <jayf@outlook•com> wrote:

> On 4/9/2013 4:09 AM, Peter Todd wrote:
> > On Tue, Apr 09, 2013 at 12:42:12PM +0200, Mike Hearn wrote:
> >> hack by changing the protocol. Nodes can serve up blocks encrypted
> under a
> >> random key. You only get the key when you finish the download. A
> blacklist
> > NAK
> >
> > Makes bringing up a new node dependent on other nodes having consistent
> > uptimes, particularly if you are on a low-bandwidth connection.
> >
> >> can apply to Bloom filtering such that transactions which are known to
> be
> >> "abusive" require you to fully download the block rather than select the
> >> transactions with a filter. This means that people can still access the
> > NAK
> >
> > No blacklists
> >
> It depends on how clever the spammers get encoding stuff. If law
> enforcement forensic tools can pull a jpeg header + child porn out of
> the blockchain, then there's a problem that needs mitigation.
>
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists•sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>

[-- Attachment #2: Type: text/html, Size: 2728 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 11:09   ` Peter Todd
  2013-04-09 11:17     ` Jay F
@ 2013-04-09 14:14     ` Mike Hearn
  1 sibling, 0 replies; 15+ messages in thread
From: Mike Hearn @ 2013-04-09 14:14 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 455 bytes --]

> Makes bringing up a new node dependent on other nodes having consistent
> uptimes, particularly if you are on a low-bandwidth connection.
>

This is already the case and always has been.


> NAK
>
> No blacklists


If you're volunteering to store and serve the chain no matter what it
contains, indefinitely, then you're free to have a no blacklists policy and
serve up data transactions for no cost. Otherwise, other people will do
whatever they want.

[-- Attachment #2: Type: text/html, Size: 929 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 10:42 ` Mike Hearn
  2013-04-09 11:09   ` Peter Todd
@ 2013-04-09 14:39   ` Caleb James DeLisle
  2013-04-09 18:56     ` steve
  2013-04-09 19:25     ` Gregory Maxwell
  2013-04-09 14:50   ` Jeff Garzik
  2 siblings, 2 replies; 15+ messages in thread
From: Caleb James DeLisle @ 2013-04-09 14:39 UTC (permalink / raw)
  To: bitcoin-development

An approach which I see as workable in the long term is to keep the block
header and an array of bitfields representing each transaction's spent
and unspent outputs. When someone wants to spend money you ask them for the
transaction and ideally you ask them for the transaction and the merkle branch
from that transaction to the header. If they want to spend the money they have
to carry around the data.

Agreed on the legality aspect but another case which is worth considering is
what anti-virus software might do when certain streams of bytes are sent across
the tcp socket or persisted to disk. Perhaps worth contacting an AV company and
asking what is the smallest data they have a signature on.

Thanks,
Caleb


On 04/09/2013 06:42 AM, Mike Hearn wrote:
> OK, as the start of that conversation is now on the list, I might as well post the other thoughts we had. Or at least that I had :)
> 
> It's tempting to see this kind of abuse through the lens of fees, because we only have a few hammers and so everything looks like a kind of nail. The problem is the moment you try to define "abuse" economically you end up excluding legitimate and beneficial uses as well. Maybe Peters patch for uneconomical outputs is different because of how it works. But mostly it's true. In this case, fees would never work - Peter said the guy who uploaded Wikileaks paid something like $500 to do it. I guess
> by now it's more like $600-$700. It's hard for regular end users to compete with that kind of wild-eyed dedication to "the cause".
> 
> The root problem here is people believe the block chain is a data structure that will live forever and be served by everyone for free, in perpetuity, and is thus the perfect place for "uncensorable" stuff. That's a reasonable assumption given how Bitcoin works today. But there's no reason it will be true in the long run (I know this can be an unpopular viewpoint).
> 
> Firstly, legal issues - I think it's very unlikely any sane court would care about illegal stuff in the block chain given you need special tools to extract it (mens rea). Besides, I guess most end users will end up on SPV clients as they mature. So these users already don't have a copy of the entire block chain. I don't worry too much about this.
> 
> Secondly, the need to host blocks forever. In future, many (most?) full nodes will be pruning, and won't actually store old blocks at all. They'll just have the utxo database, some undo blocks and some number of old blocks for serving, probably whatever fits in the amount of disk space the user is willing to allocate. But very old blocks will have been deleted. 
> 
> This leads to the question of what incentives people have to not prune. The obvious incentive is money - charge for access to older parts of the chain. The fewer people that host it, the more you can charge. In the worst case scenario where, you know, only 10 different organizations store a copy of the chain, it might mean that bootstrapping a new node in a trust-less manner is expensive. But I really doubt it'd ever get so few. Serving large static datasets just isn't that expensive. Also, you
> don't actually need to replay from the genesis block to bring up a new code, you can copy the UTXO database from somewhere else. By comparing the databases of lots of different nodes together, the chances of you being in a matrix-like sybil world can be reduced to "beyond reasonable doubt". Maybe nodes would charge for copies of their database too, but ideally there are lots of nodes and so the charge for that should be so close to zero as makes no odds - you can trivially undercut someone by
> buying access to the dataset and then reselling it for a bit less, so the price should converge on the actual cost of providing the service. Which will be very cheap.
> 
> There was one last thought I had, which is that if there's a shorter team need to discourage this kind of thing we can use a network/bandwith related hack by changing the protocol. Nodes can serve up blocks encrypted under a random key. You only get the key when you finish the download. A blacklist can apply to Bloom filtering such that transactions which are known to be "abusive" require you to fully download the block rather than select the transactions with a filter. This means that people
> can still access the data in the chain, but the older it gets the slower and more bandwidth intensive it becomes. Stuffing Wikileaks into the chain sounds good when a 20 line Python script can extract it "instantly". If someone who wants the files has to download gigabytes of padding around it first, suddenly hosting it on a Tor hidden service becomes more attractive.
> 
> 
> 
> 
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> 
> 
> 
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists•sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
> 




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 10:42 ` Mike Hearn
  2013-04-09 11:09   ` Peter Todd
  2013-04-09 14:39   ` Caleb James DeLisle
@ 2013-04-09 14:50   ` Jeff Garzik
  2013-04-09 14:53     ` Mike Hearn
  2 siblings, 1 reply; 15+ messages in thread
From: Jeff Garzik @ 2013-04-09 14:50 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

Well, I'm not fundamentally opposed to a blacklist, but it would have
to be done in a VERY open manner.  I do think the community would
agree that storing big data transactions is not the primary purpose of
bitcoin.

However, there should be some metrics and heuristics that take care of
this problem.  Notably the dev consensus (sans you, Mike :)) seems to
be that uneconomical outputs should be made non-standard.

Here is one approach:
    Block uneconomic UTXO creation
    https://github.com/bitcoin/bitcoin/pull/2351

I would like to see at least a stopgap solution to data spam in 0.8.2,
as it is a clear and present problem.

-- 
Jeff Garzik
exMULTI, Inc.
jgarzik@exmulti•com



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 14:50   ` Jeff Garzik
@ 2013-04-09 14:53     ` Mike Hearn
  2013-04-09 15:01       ` Jeff Garzik
  2013-04-09 17:58       ` Peter Todd
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Hearn @ 2013-04-09 14:53 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

> However, there should be some metrics and heuristics that take care of
> this problem.  Notably the dev consensus (sans you, Mike :)) seems to
> be that uneconomical outputs should be made non-standard.


I think that patch is ok as it doesn't really have any fixed concept of
what is uneconomical. But I haven't thought about it much. As Gavin says,
there's an obvious backwards compatibility problem there. It should
probably wait until the payment protocol work is done, so the major user of
micropayments-as-messages  can migrate off them.

[-- Attachment #2: Type: text/html, Size: 800 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 14:53     ` Mike Hearn
@ 2013-04-09 15:01       ` Jeff Garzik
  2013-04-09 17:58       ` Peter Todd
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff Garzik @ 2013-04-09 15:01 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

On Tue, Apr 9, 2013 at 10:53 AM, Mike Hearn <mike@plan99•net> wrote:
>> However, there should be some metrics and heuristics that take care of
>> this problem.  Notably the dev consensus (sans you, Mike :)) seems to
>> be that uneconomical outputs should be made non-standard.

> I think that patch is ok as it doesn't really have any fixed concept of what
> is uneconomical. But I haven't thought about it much. As Gavin says, there's
> an obvious backwards compatibility problem there. It should probably wait
> until the payment protocol work is done, so the major user of
> micropayments-as-messages  can migrate off them.

"wait" is only an option if there is an alternate solution already
coded and ready for 0.8.2.
-- 
Jeff Garzik
exMULTI, Inc.
jgarzik@exmulti•com



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 14:53     ` Mike Hearn
  2013-04-09 15:01       ` Jeff Garzik
@ 2013-04-09 17:58       ` Peter Todd
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Todd @ 2013-04-09 17:58 UTC (permalink / raw)
  To: Mike Hearn; +Cc: Bitcoin Development

[-- Attachment #1: Type: text/plain, Size: 873 bytes --]

On Tue, Apr 09, 2013 at 04:53:47PM +0200, Mike Hearn wrote:
> there's an obvious backwards compatibility problem there. It should
> probably wait until the payment protocol work is done, so the major user of
> micropayments-as-messages  can migrate off them.

As I pointed out in my initial post on the issue, SatoshiDice is pretty
much unaffected by the patch. They just have to deduct enough from
incoming bets to make the "you lost" output economical and they're good
to go. IIRC they already deduct fees on low-value bets anyway.

On the other hand, the patch makes a clear statement that Bitcoin is not
for microtransactions. If succesful it will gradually force a number of
users, ad-based faucet sites and the like, to off-chain transactions or
off Bitcoin entirely. The payment protocol has nothing to do with that.

-- 
'peter'[:-1]@petertodd.org

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 14:39   ` Caleb James DeLisle
@ 2013-04-09 18:56     ` steve
  2013-04-09 19:25     ` Gregory Maxwell
  1 sibling, 0 replies; 15+ messages in thread
From: steve @ 2013-04-09 18:56 UTC (permalink / raw)
  To: Caleb James DeLisle; +Cc: bitcoin-development

On 09/04/2013 15:39, Caleb James DeLisle wrote:
> Agreed on the legality aspect but another case which is worth considering is
> what anti-virus software might do when certain streams of bytes are sent across
> the tcp socket or persisted to disk.

Do you mean firewalls or something like snort or other deep packet
inspection for the tcp sockets statement? I dont see much of an issue
with either.

set up your own private testnet and have a play with this

http://www.eicar.org/83-0-Anti-Malware-Testfile.html

The eicar test virus.

> Perhaps worth contacting an AV company and
> asking what is the smallest data they have a signature on.

I have tried a few ways of getting the eicar string into the blockchain
(on a private testnet) and getting it flagged by AV, however it is a bit
tricky (the getting it flagged bit). and tbh you would exclude the
bitcoin directory and runtime from antivirus scans so i stopped bothering.

I am making vague assumptions about using windows with antivirus. (and
linux for deep packet inspection, but the idea is the same whatever.)

I found no greater attack surface area (in the blockchain) than
cookies... thinking about it a bit more, there is a bit more potential
as a bounce pad/egg drop location but not much - no heap spraying as
such, or d/c tors, or heap header structs, etc. Im sure someone is sure
to come up with something very clever tho. just not me.

cheers,

steve

> 
> Thanks,
> Caleb
> 
> 
> On 04/09/2013 06:42 AM, Mike Hearn wrote:
>> OK, as the start of that conversation is now on the list, I might as well post the other thoughts we had. Or at least that I had :)
>>
>> It's tempting to see this kind of abuse through the lens of fees, because we only have a few hammers and so everything looks like a kind of nail. The problem is the moment you try to define "abuse" economically you end up excluding legitimate and beneficial uses as well. Maybe Peters patch for uneconomical outputs is different because of how it works. But mostly it's true. In this case, fees would never work - Peter said the guy who uploaded Wikileaks paid something like $500 to do it. I guess
>> by now it's more like $600-$700. It's hard for regular end users to compete with that kind of wild-eyed dedication to "the cause".
>>
>> The root problem here is people believe the block chain is a data structure that will live forever and be served by everyone for free, in perpetuity, and is thus the perfect place for "uncensorable" stuff. That's a reasonable assumption given how Bitcoin works today. But there's no reason it will be true in the long run (I know this can be an unpopular viewpoint).
>>
>> Firstly, legal issues - I think it's very unlikely any sane court would care about illegal stuff in the block chain given you need special tools to extract it (mens rea). Besides, I guess most end users will end up on SPV clients as they mature. So these users already don't have a copy of the entire block chain. I don't worry too much about this.
>>
>> Secondly, the need to host blocks forever. In future, many (most?) full nodes will be pruning, and won't actually store old blocks at all. They'll just have the utxo database, some undo blocks and some number of old blocks for serving, probably whatever fits in the amount of disk space the user is willing to allocate. But very old blocks will have been deleted. 
>>
>> This leads to the question of what incentives people have to not prune. The obvious incentive is money - charge for access to older parts of the chain. The fewer people that host it, the more you can charge. In the worst case scenario where, you know, only 10 different organizations store a copy of the chain, it might mean that bootstrapping a new node in a trust-less manner is expensive. But I really doubt it'd ever get so few. Serving large static datasets just isn't that expensive. Also, you
>> don't actually need to replay from the genesis block to bring up a new code, you can copy the UTXO database from somewhere else. By comparing the databases of lots of different nodes together, the chances of you being in a matrix-like sybil world can be reduced to "beyond reasonable doubt". Maybe nodes would charge for copies of their database too, but ideally there are lots of nodes and so the charge for that should be so close to zero as makes no odds - you can trivially undercut someone by
>> buying access to the dataset and then reselling it for a bit less, so the price should converge on the actual cost of providing the service. Which will be very cheap.
>>
>> There was one last thought I had, which is that if there's a shorter team need to discourage this kind of thing we can use a network/bandwith related hack by changing the protocol. Nodes can serve up blocks encrypted under a random key. You only get the key when you finish the download. A blacklist can apply to Bloom filtering such that transactions which are known to be "abusive" require you to fully download the block rather than select the transactions with a filter. This means that people
>> can still access the data in the chain, but the older it gets the slower and more bandwidth intensive it becomes. Stuffing Wikileaks into the chain sounds good when a 20 line Python script can extract it "instantly". If someone who wants the files has to download gigabytes of padding around it first, suddenly hosting it on a Tor hidden service becomes more attractive.
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Precog is a next-generation analytics platform capable of advanced
>> analytics on semi-structured data. The platform includes APIs for building
>> apps and a phenomenal toolset for data science. Developers can use
>> our toolset for easy data analysis & visualization. Get a free account!
>> http://www2.precog.com/precogplatform/slashdotnewsletter
>>
>>
>>
>> _______________________________________________
>> Bitcoin-development mailing list
>> Bitcoin-development@lists•sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>>
> 
> 
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists•sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
> 




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 14:39   ` Caleb James DeLisle
  2013-04-09 18:56     ` steve
@ 2013-04-09 19:25     ` Gregory Maxwell
  2013-04-09 19:43       ` Mike Hearn
  1 sibling, 1 reply; 15+ messages in thread
From: Gregory Maxwell @ 2013-04-09 19:25 UTC (permalink / raw)
  To: Caleb James DeLisle; +Cc: bitcoin-development

On Tue, Apr 9, 2013 at 7:39 AM, Caleb James DeLisle
<calebdelisle@lavabit•com> wrote:
> what anti-virus software might do when certain streams of bytes are sent across
> the tcp socket or persisted to disk. Perhaps worth contacting an AV company and
> asking what is the smallest data they have a signature on.

I stuffed the testnet chain full of the EICAR test string and it
hasn't triggered for anyone— it seems that (most?) AV tools do not
scan big binary files of unknown type.. apparently.

If we encounter a case where they do we can implement storage
scrambling: E.g. every node picks a random word and all their stored
data is xored with it.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Bitcoin-development] On-going data spam
  2013-04-09 19:25     ` Gregory Maxwell
@ 2013-04-09 19:43       ` Mike Hearn
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Hearn @ 2013-04-09 19:43 UTC (permalink / raw)
  To: Gregory Maxwell; +Cc: Bitcoin Dev

[-- Attachment #1: Type: text/plain, Size: 1817 bytes --]

AV software changes all the time, I definitely recall cases where AV got
interested in, eg, web browser caches and ended up corrupting things. But
that might be because it knew the files were written by a web browser.
Lightly frying the contents has the disadvantage of no mmap and no
sendfile() in future. Perhaps an idea to stash in our back pockets if it
turns out to be needed later.


On Tue, Apr 9, 2013 at 9:25 PM, Gregory Maxwell <gmaxwell@gmail•com> wrote:

> On Tue, Apr 9, 2013 at 7:39 AM, Caleb James DeLisle
> <calebdelisle@lavabit•com> wrote:
> > what anti-virus software might do when certain streams of bytes are sent
> across
> > the tcp socket or persisted to disk. Perhaps worth contacting an AV
> company and
> > asking what is the smallest data they have a signature on.
>
> I stuffed the testnet chain full of the EICAR test string and it
> hasn't triggered for anyone— it seems that (most?) AV tools do not
> scan big binary files of unknown type.. apparently.
>
> If we encounter a case where they do we can implement storage
> scrambling: E.g. every node picks a random word and all their stored
> data is xored with it.
>
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Bitcoin-development mailing list
> Bitcoin-development@lists•sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bitcoin-development
>

[-- Attachment #2: Type: text/html, Size: 2530 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-04-09 19:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-09  1:22 [Bitcoin-development] On-going data spam Jeff Garzik
2013-04-09  9:28 ` Peter Todd
2013-04-09 10:42 ` Mike Hearn
2013-04-09 11:09   ` Peter Todd
2013-04-09 11:17     ` Jay F
2013-04-09 11:34       ` Robert Backhaus
2013-04-09 14:14     ` Mike Hearn
2013-04-09 14:39   ` Caleb James DeLisle
2013-04-09 18:56     ` steve
2013-04-09 19:25     ` Gregory Maxwell
2013-04-09 19:43       ` Mike Hearn
2013-04-09 14:50   ` Jeff Garzik
2013-04-09 14:53     ` Mike Hearn
2013-04-09 15:01       ` Jeff Garzik
2013-04-09 17:58       ` Peter Todd

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox