On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner <etotheipi@gmail.com> wrote:
I hope that someone else here would chime in on the issue raised in the
thread, about using a tree-structure that has multiple valid
configurations for the same set of unspent-TxOuts.  If you use any
binary tree, you must replay the entire history of insertions and
deletions in the correct order to get the tree structure and correct
root.  Along those lines, using something like a red-black tree, while
theoretically well-known, could be subject to implementation errors.
One implementation of a red-black tree may do the rebalancing
differently, and still work for it's intended purpose in the majority of
applications where it doesn't matter.  One app developer updates their
RB tree code which updated the RB-tree optimizations/rebalancing, and
now a significant portion of the network can't agree on the correct
root.  Not only would that be disruptive, it would be a disaster to
track down.

Then use a 2-3-4 tree (aka self-balancing B-tree of order 4), which is a generalization of RB-trees that doesn't allow for implementation choices in balancing (assuming ordered insertion and deletion).

As gmaxwell points out, this is an trivially fixable 'problem'. Choose a standard, mandate it, and write test cases.

If we were to use a raw trie structure, then we'd have all the above
issues solved:  a trie has the same configuration no matter how elements
are inserted or deleted, and accesses to elements in the tree are
constant time -- O(1).  There is no such thing as an unbalanced trie.
But overall space-efficiency is an issue.

A PATRICIA tree/trie would be ideal, in my mind, as it also has a
completely deterministic structure, and is an order-of-magnitude more
space-efficient.  Insert, delete and query times are still O(1).
However, it is not a trivial implementation.  I have occasionally looked
for implementations, but not found any that were satisfactory.

No, a trie of any sort is dependent upon distribution of input data for balancing. As Peter Todd points out, a malicious actor could construct transaction or address hashes in such a way as to grow some segment of the trie in an unbalanced fashion. It's not much of an attack, but in principle exploitable under particular timing-sensitive circumstances.

Self-balancing search trees (KVL, RB, 2-3-4, whatever) don't suffer from this problem.

Mark