On Mon, May 22, 2017 at 12:05 AM, Russell O'Connor via bitcoin-dev <bitcoin-dev@lists.linuxfoundation.org> wrote:
The SHA256 compression function takes two inputs:

1. A 256-bit value for the previous chunk in a chain, or an initial vector in the case of the first chunk.
2. A 512-bit chunk of data.

    sha256Compress : Word256 × Word512 -> Word256

In total, the SHA256 compression function compresses 768-bits into 256-bits.  The Merkle roots of two branches occupy 512 bits, and this leaves another 256-bits of space available for tags.

Ya know, when you're building a Merkle Trie that's exactly the primitive you need.

In my own construction the assumption is that the values are already hashed down to 256 bits when they're passed in, and the tags (which are currently done by sacrificing bits instead of using tags, that needs to be fixed) include three states for either side: empty, unary, or middle. Three of those possibilities are unreachable (empty/empty, empty/unary, unary/empty) so there are 6 possible tags needed. This approach essentially skips doing the unary hashes, a further performance improvement. There doesn't appear to be any downside in leveraging this trick as long as tags are cheap.