Are we just talking about pruning the spent transactions from an old block?  We already have a data structure that allows us to replace any un-needed transaction by just it's hash - and possibly a whole sub-tree if we get lucky in that the un-needed transaction all fall within a common node of the merkle tree.

If a lite client only cares to retain a single transaction in a block (the most common case) - it will only need O(log2(T)) merkle hashes plus the transaction it cares about.

Does it really make sense to adopt a more complex data-structure than the merkle tree for inclusing in the bticoin protocol?  And we're not talking about blocks with millions of transactions in them - I don't understand the relevance of Order statistics for random access to a transaction given its block.

On Tue, Jun 19, 2012 at 11:30 AM, Alan Reiner <etotheipi@gmail.com> wrote:
On 06/19/2012 02:18 PM, Mark Friedenbach wrote:
On Tue, Jun 19, 2012 at 10:33 AM, Alan Reiner <etotheipi@gmail.com> wrote:

If we were to use a raw trie structure, then we'd have all the above
issues solved:  a trie has the same configuration no matter how elements
are inserted or deleted, and accesses to elements in the tree are
constant time -- O(1).  There is no such thing as an unbalanced trie.
But overall space-efficiency is an issue.

A PATRICIA tree/trie would be ideal, in my mind, as it also has a
completely deterministic structure, and is an order-of-magnitude more
space-efficient.  Insert, delete and query times are still O(1).
However, it is not a trivial implementation.  I have occasionally looked
for implementations, but not found any that were satisfactory.

No, a trie of any sort is dependent upon distribution of input data for balancing. As Peter Todd points out, a malicious actor could construct transaction or address hashes in such a way as to grow some segment of the trie in an unbalanced fashion. It's not much of an attack, but in principle exploitable under particular timing-sensitive circumstances.

Self-balancing search trees (KVL, RB, 2-3-4, whatever) don't suffer from this problem.

Mark

I was using "unbalanced" to refer to "query time" (and also insert/delete time).  If your trie nodes branch based on the next byte of your key hash, then the max depth of your trie is 32.  Period.  No one can do anything to ever make you do more than 32 hops to find/insert/delete your data.   And if you're using a raw trie, you'll always use exactly 32 hops regardless of the distribution of the underlying data.  Hence, the trie structure is deterministic (history-independent) and cannot become unbalanced in terms of access time.

My first concern was that a malicious actor could linearize parts of the tree and cause access requests to take much longer than log(N) time.  With the trie, that's not only impossible, you're actually accessing in O(1) time.

However, you are right that disk space can be affected by a malicious actor.  The more branching he can induce, the more branch nodes that are created to support branches with only one leaf. 



------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development




--
Mike Koss
CTO, CoinLab
(425) 246-7701 (m)

A Bitcoin Primer - What you need to know about Bitcoins.