On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner <etotheipi@gmail.com> wrote:

> I hope that someone else here would chime in on the issue raised in the
> thread, about using a tree-structure that has multiple valid
> configurations for the same set of unspent-TxOuts.  If you use any
> binary tree, you must replay the entire history of insertions and
> deletions in the correct order to get the tree structure and correct
> root.  Along those lines, using something like a red-black tree, while
> theoretically well-known, could be subject to implementation errors.
> One implementation of a red-black tree may do the rebalancing
> differently, and still work for it's intended purpose in the majority of
> applications where it doesn't matter.  One app developer updates their
> RB tree code which updated the RB-tree optimizations/rebalancing, and
> now a significant portion of the network can't agree on the correct
> root.  Not only would that be disruptive, it would be a disaster to
> track down.
>

Then use a 2-3-4 tree (aka self-balancing B-tree of order 4), which is a
generalization of RB-trees that doesn't allow for implementation choices in
balancing (assuming ordered insertion and deletion).

As gmaxwell points out, this is an trivially fixable 'problem'. Choose a
standard, mandate it, and write test cases.

If we were to use a raw trie structure, then we'd have all the above
> issues solved:  a trie has the same configuration no matter how elements
> are inserted or deleted, and accesses to elements in the tree are
> constant time -- O(1).  There is no such thing as an unbalanced trie.
> But overall space-efficiency is an issue.
>
> A PATRICIA tree/trie would be ideal, in my mind, as it also has a
> completely deterministic structure, and is an order-of-magnitude more
> space-efficient.  Insert, delete and query times are still O(1).
> However, it is not a trivial implementation.  I have occasionally looked
> for implementations, but not found any that were satisfactory.
>

No, a trie of any sort is dependent upon distribution of input data for
balancing. As Peter Todd points out, a malicious actor could construct
transaction or address hashes in such a way as to grow some segment of the
trie in an unbalanced fashion. It's not much of an attack, but in principle
exploitable under particular timing-sensitive circumstances.

Self-balancing search trees (KVL, RB, 2-3-4, whatever) don't suffer from
this problem.

Mark