Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node

public inbox for bitcoindev@googlegroups.com

* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
@ 2012-06-19 18:29 Andrew Miller
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Miller @ 2012-06-19 18:29 UTC
  To: bitcoin-development

Alan Reiner wrote:
> A PATRICIA tree/trie would be ideal, in my mind, as it also has a
> completely deterministic structure, and is an order-of-magnitude more
> space-efficient.  Insert, delete and query times are still O(1).
> However, it is not a trivial implementation.  I have occasionally looked
> for implementations, but not found any that were satisfactory.

PATRICIA tries (aka radix trees) have worst-case O(k) operations, where k
is the number of bits in the key. Notice that since we would be storing
k-bit hashes, the number of elements must be less than 2^k, or else we
would be guaranteed a hash collision (and by the birthday paradox one
becomes likely already around 2^(k/2) elements)! So O(log N) <= O(k).
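To make the O(k) bound concrete, here is a minimal Python sketch (not from the thread) that measures the branching depth of a path-compressed binary trie built over random distinct k-bit keys. The function name `crit_bit_depth` and the parameters k=20, N=1000 are illustrative assumptions. Each branching node on a path consumes a distinct key bit, so the depth can never exceed k; and because the tree is binary, it can never be below log2 N.

```python
import math
import random


def crit_bit_depth(keys, k, i=0):
    """Count branching nodes on the longest root-to-leaf path of a
    path-compressed binary trie (PATRICIA-style) over distinct k-bit keys.
    Keys are split on bit i, counting from the most significant bit."""
    if len(keys) <= 1 or i == k:
        return 0
    zeros = [x for x in keys if ((x >> (k - 1 - i)) & 1) == 0]
    ones = [x for x in keys if ((x >> (k - 1 - i)) & 1) == 1]
    if not zeros or not ones:
        # Every key agrees on this bit: path compression skips it at no depth cost.
        return crit_bit_depth(keys, k, i + 1)
    return 1 + max(crit_bit_depth(zeros, k, i + 1), crit_bit_depth(ones, k, i + 1))


k, n = 20, 1000
keys = random.Random(1).sample(range(2 ** k), n)  # n distinct k-bit keys
depth = crit_bit_depth(keys, k)
print(f"N={n}  log2(N)={math.log2(n):.1f}  depth={depth}  k={k}")
```

For random hashes the depth stays close to log2 N. The gap between that and k is the room an adversary could exploit by grinding keys, which is the concern Mark raises elsewhere in the thread.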

You're right, though, that such a trie would have the property that
any two trees containing the same data (leaves) will be identical. I
can't think of any reason why this is useful, although I am hoping we
can figure out what is triggering your intuition to desire this! I am
indeed assuming that the tree will be incrementally constructed
according to the canonical (blockchain) ordering of transactions, and
that the balancing rules are agreed on as part of the protocol.

-- 
Andrew Miller


* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
  2012-06-21 21:42       ` Mike Koss
@ 2012-06-21 22:02         ` Gregory Maxwell
  0 siblings, 0 replies; 9+ messages in thread
From: Gregory Maxwell @ 2012-06-21 22:02 UTC
  To: Mike Koss; +Cc: bitcoin-development

On Thu, Jun 21, 2012 at 5:42 PM, Mike Koss <mike@coinlab•com> wrote:
> Are we just talking about pruning the spent transactions from an old block?

No.

We're talking about commitments to the state of _unspent_ transactions
which would allow ~memoryless nodes to engage in full validation,
with the help of some untrusted non-memoryless peers, without having
to trust anything.  We're also talking about being able to securely
initialize new pruned nodes (not memoryless, but with reduced memory)
without exposing them to the old history of the chain. In both cases
this is possible without substantially degrading the full node
security model (rule violations prior to where they begin are only
undetectable with a conspiracy of the entire network).

But it requires a new data structure for managing these trees of
unspent transactions in a secure, scalable, and DOS resistant manner.
Fortunately there are lots of possibilities here.

> Does it really make sense to adopt a more complex data-structure than the merkle tree for inclusion in the bitcoin protocol?

Yes. Though this is obviously not an ultra short term thing.


* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
  2012-06-19 18:30     ` Alan Reiner
@ 2012-06-21 21:42       ` Mike Koss
  2012-06-21 22:02         ` Gregory Maxwell
  0 siblings, 1 reply; 9+ messages in thread
From: Mike Koss @ 2012-06-21 21:42 UTC
  To: Alan Reiner; +Cc: bitcoin-development


Are we just talking about pruning the spent transactions from an old block?
We already have a data structure that allows us to replace any unneeded
transaction by just its hash, and possibly a whole sub-tree if we get
lucky and the unneeded transactions all fall under a common node of
the merkle tree.

If a lite client only cares to retain a single transaction in a block (the
most common case), it will only need ceil(log2(T)) merkle hashes, where T is
the number of transactions in the block, plus the transaction it cares about.
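Mike's log2(T) figure corresponds to the standard merkle branch. The Python sketch below builds Bitcoin's transaction merkle tree (double SHA-256 over concatenated pairs, with the last hash duplicated on odd-length levels), extracts the branch for one transaction, and verifies it against the root. The helper names are hypothetical, and hashes are handled in internal byte order rather than the reversed hex order block explorers display.

```python
import hashlib


def dsha256(b: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()


def merkle_root_and_branch(leaves, index):
    """Return (root, branch), where branch holds the sibling hash at each level
    on the path from leaves[index] up to the root."""
    level, branch = list(leaves), []
    while len(level) > 1:
        if len(level) % 2:                 # odd-length level: duplicate the last hash
            level.append(level[-1])
        branch.append(level[index ^ 1])    # sibling of the current node
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], branch


def verify_branch(leaf, index, branch, root):
    """Recompute the root from one leaf and its branch; the index bits give left/right order."""
    h = leaf
    for sibling in branch:
        h = dsha256(h + sibling) if index % 2 == 0 else dsha256(sibling + h)
        index //= 2
    return h == root


txids = [dsha256(bytes([i])) for i in range(11)]   # 11 stand-in transaction hashes
root, branch = merkle_root_and_branch(txids, 6)
print(len(branch), verify_branch(txids[6], 6, branch, root))
```

With T = 11 the branch has ceil(log2 11) = 4 hashes, so the client keeps 4 × 32 bytes plus the transaction itself.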

Does it really make sense to adopt a more complex data-structure than the
merkle tree for inclusion in the bitcoin protocol?  And we're not talking
about blocks with millions of transactions in them - I don't understand the
relevance of order statistics for random access to a transaction given its
block.



-- 
Mike Koss
CTO, CoinLab
(425) 246-7701 (m)

A Bitcoin Primer <http://coinlab.com/a-bitcoin-primer.pdf> - What you need
to know about Bitcoins.



* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
  2012-06-19 18:18   ` Mark Friedenbach
@ 2012-06-19 18:30     ` Alan Reiner
  2012-06-21 21:42       ` Mike Koss
  0 siblings, 1 reply; 9+ messages in thread
From: Alan Reiner @ 2012-06-19 18:30 UTC
  To: Mark Friedenbach; +Cc: bitcoin-development


On 06/19/2012 02:18 PM, Mark Friedenbach wrote:
> On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner <etotheipi@gmail•com> wrote:
>
>     If we were to use a raw trie structure, then we'd have all the above
>     issues solved:  a trie has the same configuration no matter how elements
>     are inserted or deleted, and accesses to elements in the tree are
>     constant time -- O(1).  There is no such thing as an unbalanced trie.
>     But overall space-efficiency is an issue.
>
>     A PATRICIA tree/trie would be ideal, in my mind, as it also has a
>     completely deterministic structure, and is an order-of-magnitude more
>     space-efficient.  Insert, delete and query times are still O(1).
>     However, it is not a trivial implementation.  I have occasionally looked
>     for implementations, but not found any that were satisfactory.
>
>
> No, a trie of any sort is dependent upon distribution of input data 
> for balancing. As Peter Todd points out, a malicious actor could 
> construct transaction or address hashes in such a way as to grow some 
> segment of the trie in an unbalanced fashion. It's not much of an 
> attack, but in principle exploitable under particular timing-sensitive 
> circumstances.
>
> Self-balancing search trees (AVL, RB, 2-3-4, whatever) don't suffer 
> from this problem.
>
> Mark

I was using "unbalanced" to refer to "query time" (and also
insert/delete time).  If your trie nodes branch based on the next byte
of your key hash, then with 32-byte (SHA-256) keys the max depth of your
trie is 32.  Period.  No one can do anything to ever make you do more
than 32 hops to find/insert/delete your data.  And if you're using a raw
trie, you'll always use /exactly/ 32 hops regardless of the distribution
of the underlying data.  Hence, the trie structure is deterministic
(history-independent) and cannot become unbalanced in terms of access time.

My first concern was that a malicious actor could linearize parts of the 
tree and cause access requests to take much longer than log(N) time.  
With the trie, that's not only impossible, you're actually accessing in 
O(1) time.

However, you are right that disk space can be affected by a malicious
actor.  The more branching he can induce, the more branch nodes are
created to support branches with only one leaf.
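Both the fixed hop count and the space cost can be seen in a plain 256-ary trie. This Python sketch assumes 32-byte SHA-256 keys and uses dicts as nodes; `RawTrie` is a hypothetical name, not an existing library. Every lookup takes exactly 32 hops, but for N random keys almost every interior node below the first couple of levels sits on a single-child chain, so the trie needs about 30 interior nodes per key.

```python
import hashlib

KEY_BYTES = 32  # SHA-256 digest length


class RawTrie:
    """Uncompressed 256-ary trie: one level per key byte, so every path has exactly KEY_BYTES hops."""

    def __init__(self):
        self.root = {}
        self.node_count = 1  # interior dict nodes, including the root

    def insert(self, key: bytes, value):
        assert len(key) == KEY_BYTES
        node = self.root
        for b in key[:-1]:
            if b not in node:
                node[b] = {}
                self.node_count += 1
            node = node[b]
        node[key[-1]] = value  # the final hop lands on the stored value

    def lookup(self, key: bytes):
        """Return (value, hops); hops is always KEY_BYTES for a stored key."""
        node, hops = self.root, 0
        for b in key:
            node = node[b]
            hops += 1
        return node, hops


keys = [hashlib.sha256(str(i).encode()).digest() for i in range(1000)]
trie = RawTrie()
for i, key in enumerate(keys):
    trie.insert(key, i)
print({trie.lookup(key)[1] for key in keys}, trie.node_count)
```

A PATRICIA trie removes those single-child chains by compressing them into one edge, which is where its space advantage comes from; the cost is that depth then depends on the key distribution.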



* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
  2012-06-19 17:33 ` Alan Reiner
  2012-06-19 17:59   ` Gregory Maxwell
@ 2012-06-19 18:18   ` Mark Friedenbach
  2012-06-19 18:30     ` Alan Reiner
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Friedenbach @ 2012-06-19 18:18 UTC
  To: Alan Reiner; +Cc: bitcoin-development


On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner <etotheipi@gmail•com> wrote:

> I hope that someone else here would chime in on the issue raised in the
> thread, about using a tree-structure that has multiple valid
> configurations for the same set of unspent-TxOuts.  If you use any
> binary tree, you must replay the entire history of insertions and
> deletions in the correct order to get the tree structure and correct
> root.  Along those lines, using something like a red-black tree, while
> theoretically well-known, could be subject to implementation errors.
> One implementation of a red-black tree may do the rebalancing
> differently, and still work for its intended purpose in the majority of
> applications where it doesn't matter.  One app developer updates their
> RB-tree code, changing its optimizations/rebalancing, and
> now a significant portion of the network can't agree on the correct
> root.  Not only would that be disruptive, it would be a disaster to
> track down.
>

Then use a 2-3-4 tree (aka self-balancing B-tree of order 4), which is a
generalization of RB-trees that doesn't allow for implementation choices in
balancing (assuming ordered insertion and deletion).

As gmaxwell points out, this is a trivially fixable 'problem'. Choose a
standard, mandate it, and write test cases.

If we were to use a raw trie structure, then we'd have all the above
> issues solved:  a trie has the same configuration no matter how elements
> are inserted or deleted, and accesses to elements in the tree are
> constant time -- O(1).  There is no such thing as an unbalanced trie.
> But overall space-efficiency is an issue.
>
> A PATRICIA tree/trie would be ideal, in my mind, as it also has a
> completely deterministic structure, and is an order-of-magnitude more
> space-efficient.  Insert, delete and query times are still O(1).
> However, it is not a trivial implementation.  I have occasionally looked
> for implementations, but not found any that were satisfactory.
>

No, a trie of any sort is dependent upon distribution of input data for
balancing. As Peter Todd points out, a malicious actor could construct
transaction or address hashes in such a way as to grow some segment of the
trie in an unbalanced fashion. It's not much of an attack, but in principle
exploitable under particular timing-sensitive circumstances.
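The attack Mark describes can be sketched in Python. A hypothetical adversary grinds hash preimages (here SHA-256 of a counter, an assumption standing in for transaction or address hashes) until, for each j up to a target depth D, it has a key whose first j bits are zero and whose bit j is one. Those D+1 keys force a path of D branching nodes in a path-compressed trie, at a cost of roughly 2^(D+2) hashes, whereas honest random keys give a depth near log2 N. All function names are illustrative.

```python
import hashlib

K = 256  # key length in bits (SHA-256 digests)


def h256(nonce: int) -> int:
    """SHA-256 of an 8-byte counter, read as a 256-bit integer (stand-in for a tx/address hash)."""
    return int.from_bytes(hashlib.sha256(nonce.to_bytes(8, "big")).digest(), "big")


def branch_depth(keys, i=0):
    """Branching depth of a path-compressed binary trie over distinct K-bit keys."""
    if len(keys) <= 1 or i == K:
        return 0
    zeros = [x for x in keys if ((x >> (K - 1 - i)) & 1) == 0]
    ones = [x for x in keys if ((x >> (K - 1 - i)) & 1) == 1]
    if not zeros or not ones:  # a non-branching bit is compressed away
        return branch_depth(keys, i + 1)
    return 1 + max(branch_depth(zeros, i + 1), branch_depth(ones, i + 1))


def grind_comb(depth):
    """Hypothetical attacker: for j = 0..depth find a key whose top j+1 bits are 0...01.
    Returns the keys and the total number of hashes tried."""
    keys, nonce = [], 0
    for j in range(depth + 1):
        while True:
            key = h256(nonce)
            nonce += 1
            if key >> (K - 1 - j) == 1:
                keys.append(key)
                break
    return keys, nonce


D = 12
attack_keys, work = grind_comb(D)
honest_keys = [h256(2**40 + n) for n in range(len(attack_keys))]
print("attacker depth:", branch_depth(attack_keys), "after", work, "hashes")
print("honest depth:  ", branch_depth(honest_keys))
```

Each extra level of forced depth doubles the grinding cost, which fits Mark's "not much of an attack"; a red-black or 2-3-4 tree's height, by contrast, is bounded by a function of N alone.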

Self-balancing search trees (AVL, RB, 2-3-4, whatever) don't suffer from
this problem.

Mark



* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
  2012-06-19 17:59   ` Gregory Maxwell
@ 2012-06-19 18:12     ` Alan Reiner
  0 siblings, 0 replies; 9+ messages in thread
From: Alan Reiner @ 2012-06-19 18:12 UTC
  To: Gregory Maxwell; +Cc: bitcoin-development


On 06/19/2012 01:59 PM, Gregory Maxwell wrote:
> On Tue, Jun 19, 2012 at 1:33 PM, Alan Reiner <etotheipi@gmail•com> wrote:
>>   One app developer updates their
>> RB-tree code, changing its optimizations/rebalancing, and
>> now a significant portion of the network can't agree on the correct
>> root.  Not only would that be disruptive, it would be a disaster to
>> track down.
> This is why good comprehensive tests and a well specified algorithm
> are important. The tree update algorithm would be normative in that
> scheme. Worrying that implementers might get it wrong would be like
> worrying that they'd get SHA256 wrong.

The point is not that they get it *wrong*, it's that they implement it
*differently*.  Given a set of 100 TxOuts, there's a seemingly infinite
number of ways to construct a binary tree.  Insert them in a different
order, and you get a different tree. *They're all correct and legal* in
terms of satisfying expectations of insert, delete and query runtime --
but they will produce different root hashes.  And the differences in
underlying structure are completely transparent to the calling code.

I'm extremely uncomfortable with the idea that you can have all the nodes
in the tree, but have to replay X years of blockchain history just to
get the same tree configuration as someone else.  However, a trie
configuration is history-independent -- given an unspent-TxOut list, 
there's only one way to construct that tree.  That's an important 
property to me.
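The history-independence property Alan wants can be sketched in Python with a binary Merkle trie keyed on the bits of 32-byte hashes: a leaf sits at the shallowest depth at which its key is alone, and deletions collapse branches that no longer separate two keys. This is an illustrative structure, not a format anyone in the thread specified; the tuple node encoding, the 0x00/0x01 leaf/branch hash prefixes, and the function names are assumptions. Because the shape depends only on the set of keys, any sequence of insertions and deletions that ends at the same set yields the same root.

```python
import hashlib
import random

EMPTY = bytes(32)  # hash of an empty subtree


def bit(key: bytes, i: int) -> int:
    """Bit i of a key, counting from the most significant bit of byte 0."""
    return (key[i // 8] >> (7 - i % 8)) & 1


def insert(t, key, d=0):
    """t is None, ('leaf', key) or ('node', left, right); returns the new subtree."""
    if t is None:
        return ("leaf", key)
    if t[0] == "leaf":
        if t[1] == key:
            return t
        # Two keys now share this prefix: replace the leaf with a branch holding both.
        return insert(insert(("node", None, None), t[1], d), key, d)
    _, left, right = t
    if bit(key, d):
        right = insert(right, key, d + 1)
    else:
        left = insert(left, key, d + 1)
    return ("node", left, right)


def delete(t, key, d=0):
    if t is None:
        return None
    if t[0] == "leaf":
        return None if t[1] == key else t
    _, left, right = t
    if bit(key, d):
        right = delete(right, key, d + 1)
    else:
        left = delete(left, key, d + 1)
    # Collapse branches that no longer separate two or more keys.
    if left is None and (right is None or right[0] == "leaf"):
        return right
    if right is None and left[0] == "leaf":
        return left
    return ("node", left, right)


def root_hash(t) -> bytes:
    if t is None:
        return EMPTY
    if t[0] == "leaf":
        return hashlib.sha256(b"\x00" + t[1]).digest()
    return hashlib.sha256(b"\x01" + root_hash(t[1]) + root_hash(t[2])).digest()


keys = [hashlib.sha256(str(i).encode()).digest() for i in range(200)]

t1 = None                       # history 1: insert everything, then spend the first 50
for k in keys:
    t1 = insert(t1, k)
for k in keys[:50]:
    t1 = delete(t1, k)

t2 = None                       # history 2: only ever see the surviving 150, shuffled
survivors = keys[50:]
random.Random(7).shuffle(survivors)
for k in survivors:
    t2 = insert(t2, k)

print(root_hash(t1) == root_hash(t2))
```

The trade-off raised elsewhere in the thread still applies: depth here depends on how the keys are distributed, so the structure is canonical but not self-balancing.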

I can't tell if you're joking about Judy structures: I've never heard of 
them.  But I'll look into it anyway...



* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
  2012-06-19 17:33 ` Alan Reiner
@ 2012-06-19 17:59   ` Gregory Maxwell
  2012-06-19 18:12     ` Alan Reiner
  2012-06-19 18:18   ` Mark Friedenbach
  1 sibling, 1 reply; 9+ messages in thread
From: Gregory Maxwell @ 2012-06-19 17:59 UTC
  To: Alan Reiner; +Cc: bitcoin-development

On Tue, Jun 19, 2012 at 1:33 PM, Alan Reiner <etotheipi@gmail•com> wrote:
> One app developer updates their
> RB-tree code, changing its optimizations/rebalancing, and
> now a significant portion of the network can't agree on the correct
> root.  Not only would that be disruptive, it would be a disaster to
> track down.

This is why good comprehensive tests and a well specified algorithm
are important. The tree update algorithm would be normative in that
scheme. Worrying that implementers might get it wrong would be like
worrying that they'd get SHA256 wrong.

> A PATRICIA tree/trie would be ideal, in my mind, as it also has a
> completely deterministic structure, and is an order-of-magnitude more

Provable libJudy trees. Oh boy.




* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
  2012-06-19 16:46 Andrew Miller
@ 2012-06-19 17:33 ` Alan Reiner
  2012-06-19 17:59   ` Gregory Maxwell
  2012-06-19 18:18   ` Mark Friedenbach
  0 siblings, 2 replies; 9+ messages in thread
From: Alan Reiner @ 2012-06-19 17:33 UTC
  To: bitcoin-development

I hope that someone else here would chime in on the issue raised in the 
thread, about using a tree-structure that has multiple valid 
configurations for the same set of unspent-TxOuts.  If you use any 
binary tree, you must replay the entire history of insertions and 
deletions in the correct order to get the tree structure and correct 
root.  Along those lines, using something like a red-black tree, while 
theoretically well-known, could be subject to implementation errors.  
One implementation of a red-black tree may do the rebalancing
differently, and still work for its intended purpose in the majority of
applications where it doesn't matter.  One app developer updates their
RB-tree code, changing its optimizations/rebalancing, and
now a significant portion of the network can't agree on the correct 
root.  Not only would that be disruptive, it would be a disaster to 
track down.
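The order-dependence Alan worries about is easiest to see in a plain, unbalanced Merkle-ized binary search tree. This is a simplification: his concern is about balanced trees whose rebalancing rules differ, but the effect on the root hash is the same. In this Python sketch, with hypothetical helper names, the same seven keys inserted in two orders give identical in-order contents but different root hashes.

```python
import hashlib

EMPTY = bytes(32)  # hash used for a missing child


def insert(node, key):
    """Insert into an unbalanced BST; nodes are [key, left, right] lists."""
    if node is None:
        return [key, None, None]
    if key < node[0]:
        node[1] = insert(node[1], key)
    elif key > node[0]:
        node[2] = insert(node[2], key)
    return node


def merkle_root(node):
    """Hash each node over its key and both child hashes."""
    if node is None:
        return EMPTY
    key, left, right = node
    return hashlib.sha256(key.to_bytes(4, "big") + merkle_root(left) + merkle_root(right)).digest()


def in_order(node):
    return [] if node is None else in_order(node[1]) + [node[0]] + in_order(node[2])


def build(order):
    root = None
    for key in order:
        root = insert(root, key)
    return root


a = build([4, 2, 6, 1, 3, 5, 7])   # happens to come out balanced
b = build([1, 2, 3, 4, 5, 6, 7])   # degenerates into a linked list
print(in_order(a) == in_order(b), merkle_root(a).hex()[:16], merkle_root(b).hex()[:16])
```

Some different orders do land on the same shape (for example 4, 6, 2, 7, 5, 3, 1 reproduces `a`), which is exactly why the replies in the thread argue for a single normative update algorithm: every node must then replay the same operations in the same order to reach the same root.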

If we were to use a raw trie structure, then we'd have all the above 
issues solved:  a trie has the same configuration no matter how elements 
are inserted or deleted, and accesses to elements in the tree are 
constant time -- O(1).  There is no such thing as an unbalanced trie.  
But overall space-efficiency is an issue.

A PATRICIA tree/trie would be ideal, in my mind, as it also has a 
completely deterministic structure, and is an order-of-magnitude more 
space-efficient.  Insert, delete and query times are still O(1).    
However, it is not a trivial implementation.  I have occasionally looked 
for implementations, but not found any that were satisfactory.

So, I don't have a good all-around solution, within my own stated 
constraints. But perhaps I'm being too demanding of this solution.

-Alan



* Re: [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node
@ 2012-06-19 16:46 Andrew Miller
  2012-06-19 17:33 ` Alan Reiner
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Miller @ 2012-06-19 16:46 UTC
  To: Bitcoin-development

> Peter Todd wrote:
> My solution was to simply state that vertices that happened to cause the
> tree to be unbalanced would be discarded, and set the depth of imbalance
> such that this would be extremely unlikely to happen by accident. I'd
> rather see someone come up with something better though.

Here is a simpler solution. (Most of this message repeats the content
of my reply on the forum.)

Suppose we were talking about a binary search tree, rather than a
Merkle tree. It's important to balance a binary search tree, so that
the worst-case maximum length from the root to a leaf is bounded by
O(log N). AVL trees were the original algorithm to do this, Red-Black
trees are also popular, and there are many similar methods. All
involve storing some form of 'balancing metadata' at each node. In a
red-black tree, this is a single bit (red or black). Every operation on
these trees, including search, insertion, deletion, and rebalancing,
requires a worst-case effort of O(log N).
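As an illustration of that balancing, here is a Python sketch of insertion into a left-leaning red-black tree, Sedgewick's simplified red-black variant (this is not the code in Andrew's redblack.py). The single colour bit per node is the only balancing metadata. Keys are fed in sorted order, the worst case for an unbalanced BST, which would reach height N, yet the height stays within the red-black bound of 2·log2(N+1).

```python
import math

RED, BLACK = True, False


class Node:
    __slots__ = ("key", "left", "right", "color")

    def __init__(self, key):
        self.key, self.left, self.right, self.color = key, None, None, RED


def is_red(n):
    return n is not None and n.color == RED


def rotate_left(h):
    x = h.right
    h.right, x.left = x.left, h
    x.color, h.color = h.color, RED
    return x


def rotate_right(h):
    x = h.left
    h.left, x.right = x.right, h
    x.color, h.color = h.color, RED
    return x


def flip_colors(h):
    h.color = RED
    h.left.color = h.right.color = BLACK


def _insert(h, key):
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = _insert(h.left, key)
    elif key > h.key:
        h.right = _insert(h.right, key)
    # Restore the left-leaning invariants on the way back up.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


def put(root, key):
    root = _insert(root, key)
    root.color = BLACK
    return root


def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))


def in_order(n):
    return [] if n is None else in_order(n.left) + [n.key] + in_order(n.right)


root = None
for key in range(1000):          # sorted input: worst case for an unbalanced BST
    root = put(root, key)
print(height(root), 2 * math.log2(1001))
```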

Any (acyclic) recursive data structure can be Merkle-ized, simply by
adding a hash of the child node alongside each link/pointer. This way,
you can verify the data for each node very naturally, as you traverse
the structure.
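A minimal Python sketch of that Merkle-ization step, using hypothetical `Node`/`Link` types: each link stores the child's hash next to the pointer, a node's own hash commits to its payload and to its children's stored hashes, and following a link re-checks the child before it is used. The 4-byte length prefix on the payload is an illustrative choice to keep the encoding unambiguous.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    payload: bytes
    children: List["Link"] = field(default_factory=list)


@dataclass
class Link:
    child: Node
    digest: bytes  # hash of the child, stored next to the pointer


def node_hash(n: Node) -> bytes:
    """Commit to a node's own data plus the digests held in its child links."""
    h = hashlib.sha256(len(n.payload).to_bytes(4, "big") + n.payload)
    for link in n.children:
        h.update(link.digest)
    return h.digest()


def link_to(child: Node) -> Link:
    return Link(child, node_hash(child))


def follow(link: Link) -> Node:
    """Dereference a link, checking the child against the stored digest first."""
    if node_hash(link.child) != link.digest:
        raise ValueError("child does not match the digest stored in its link")
    return link.child


leaf = Node(b"txout")
root = Node(b"root", [link_to(leaf)])
print(follow(root.children[0]).payload)
```

Since each node's hash covers its children's digests, the root hash transitively commits to the whole structure, whatever its shape.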

In fact, as long as a lite-client knows the O(1)-size root hash, the rest
of the storage burden can be delegated to an untrusted helper server.
Suppose a lite-client wants to insert and rebalance its tree. This
requires accessing at most O(log N) nodes. The client can request only
the data relevant to these nodes, and it knows the hash for each chunk
of data in advance of accessing it. After computing the updated root
hash, the client can even discard the data it processed.
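The lookup half of this scheme can be sketched in Python. An untrusted helper holds a Merkle-ized binary search tree and returns the nodes on the search path; a client that stores only the 32-byte root hash checks each node against the hash it expects before descending. This is a simplification under stated assumptions: the tree is static and built balanced from sorted keys, node hashes are SHA-256 over an 8-byte key and two child hashes, and all names are hypothetical. The insert-and-rebalance case described above would additionally have the client recompute hashes along the path.

```python
import hashlib

EMPTY = bytes(32)  # hash standing in for an absent child


def H(key: int, hl: bytes, hr: bytes) -> bytes:
    return hashlib.sha256(key.to_bytes(8, "big") + hl + hr).digest()


# ---- untrusted helper server: holds the whole Merkle-ized BST ----
def build(keys):
    """Build a balanced BST from sorted keys; node = (key, left, right, hash)."""
    if not keys:
        return None
    mid = len(keys) // 2
    l, r = build(keys[:mid]), build(keys[mid + 1:])
    hl = l[3] if l else EMPTY
    hr = r[3] if r else EMPTY
    return (keys[mid], l, r, H(keys[mid], hl, hr))


def prove(node, key):
    """Return the nodes on the search path for key as (key, left_hash, right_hash) triples."""
    path = []
    while node is not None:
        k, l, r, _ = node
        path.append((k, l[3] if l else EMPTY, r[3] if r else EMPTY))
        if key == k:
            break
        node = l if key < k else r
    return path


# ---- lite client: stores only the 32-byte root hash ----
def verify(root_hash, key, path):
    """Return True/False for membership, or raise ValueError if the proof is invalid."""
    expected = root_hash
    for k, hl, hr in path:
        if H(k, hl, hr) != expected:
            raise ValueError("node does not match committed hash")
        if key == k:
            return True
        expected = hl if key < k else hr
    if expected != EMPTY:  # a non-membership proof must end at an empty link
        raise ValueError("proof ended early")
    return False


tree = build(list(range(0, 100, 2)))   # server side: the 50 even numbers
root_hash = tree[3]                    # client side: only this is stored
print(verify(root_hash, 42, prove(tree, 42)), verify(root_hash, 43, prove(tree, 43)))
```

Each proof is at most one node per level, so the client downloads and checks O(log N) hashes per lookup and can discard them afterwards.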

This technique has been well discussed in the academic literature,
e.g. [1,2], although since I am not aware of any existing
implementation, I made my own, intended as an explanatory aid:
https://github.com/amiller/redblackmerkle/blob/master/redblack.py

[1] Certificate Revocation and Update
    Naor and Nissim. 1998
    http://static.usenix.org/publications/library/proceedings/sec98/full_papers/nissim/nissim.pdf

[2] A General Model for Authenticated Data Structures
    Martel, Nuckolls, Devanbu, Gertz, Kwong, Stubblebine. 2004
    http://truthsayer.cs.ucdavis.edu/algorithmica.pdf

--
Andrew Miller



Thread overview: 9+ messages
2012-06-19 18:29 [Bitcoin-development] Ultimate Blockchain Compression w/ trust-free lite node Andrew Miller
  -- strict thread matches above, loose matches on Subject: below --
2012-06-19 16:46 Andrew Miller
2012-06-19 17:33 ` Alan Reiner
2012-06-19 17:59   ` Gregory Maxwell
2012-06-19 18:12     ` Alan Reiner
2012-06-19 18:18   ` Mark Friedenbach
2012-06-19 18:30     ` Alan Reiner
2012-06-21 21:42       ` Mike Koss
2012-06-21 22:02         ` Gregory Maxwell
