>>>> This does not produce unmalleable block hashes. Duplicate tx hash malleation remains in either case, to the same effect. Without a resolution to both issues this is an empty promise.

>>> Duplicate txids have been invalid since 2012 (CVE-2012-2459).

>> I think again here you may have misunderstood me. I was not making a point pertaining to BIP30.

> No, in fact you did. CVE-2012-2459 is unrelated to BIP30, it's the duplicate txids malleability found by forrestv in 2012. It's the one you are talking about thereafter and the one relevant for the purpose of this discussion.

Yes, my mistake. I didn't look up the CVE because malleability has no effect on consensus rules (validity). Without BIP30/34/90 a duplicated tx/txid (in a given chain) would still be valid (and, under the caveats previously mentioned, still is). So I assumed you were referring to it/them. Malleability pertains strictly to validation implementation shortcuts (checkpoints, milestones, invalidity caching), not to what is actually valid.

>> The proposal does not enable that objective, it is already the case. No malleated block is a valid block.

> You are right. The advantage i initially mentioned about how making 64-bytes transactions invalid could help caching block failures at an earlier stage is incorrect.

Hopefully the discussion leads to simpler and more performant implementations. As I mentioned previously, the usefulness (i.e. the performance-improving outcome) of block hash invalidity caching is very limited.

Libbitcoin implements an append-only store, and we write a checkpointed, milestoned, or current/strong header chain before obtaining blocks. So in the case where an invalid block corresponds to a stored header we must store the header's invalidity. This is obviously guarded by PoW and therefore extremely rare, but it must be accounted for. Otherwise we do not, under any circumstances, store invalidity. Not storing it is far more effective than storing it, even under heavy/constant "attack".

Given the PoW guard, the worst case scenario is where the witness commitment is invalid (it is checked after the tx commitment, because it relies on the coinbase tx commitment). Next worst is where the tx commitment is invalid. Neither presents any cost to the attacker, and neither relies on Merkle tree malleability. The latter requires hashing every tx and performing the Merkle root calculation; the former requires doing this twice. For a block with 4096 txs, that's [2 * (4096 + 4095) = 16382] hash computations.
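As a sanity check on that arithmetic, here is a minimal sketch (illustrative Python, not libbitcoin code; the function names are hypothetical):

```python
def merkle_node_count(n: int) -> int:
    """Internal-node hashes for a Merkle tree over n leaves
    (Bitcoin-style: an odd-width level duplicates its last node)."""
    nodes = 0
    width = n
    while width > 1:
        width = (width + 1) // 2  # odd widths round up
        nodes += width
    return nodes

def worst_case_hashes(n: int) -> int:
    # tx commitment: n tx hashes plus the internal Merkle nodes;
    # the witness commitment repeats that work over the wtxids.
    return 2 * (n + merkle_node_count(n))

print(worst_case_hashes(4096))  # 2 * (4096 + 4095) = 16382
```

For 4096 leaves the internal nodes sum to 2048 + 1024 + ... + 1 = 4095, matching the figure above.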

While that's nothing to sneeze at, in our implementation this constitutes 1-2% of total sync time on my 7-year-old machine (no SHA-NI and no AVX-512). But what if we were to cache every invalid hash? Let's say we're under constant attack (despite dropping any peer that provides an invalid/unrequested block/message). The smart attacker doesn't use malleation, since he knows this is mitigated and is cheaper to guard against in both cases. He just sends block messages with requested headers and a maximal set of valid txs (maybe from the actual block), and modifies one byte of any witness (or of any script, for non-witness blocks). Every time he sends a unique block, of which he can produce an effectively unlimited quantity. With or without caching, this requires computation of all 16382 hashes for each bogus block that includes a requested header (unrequested blocks are dismissed at the cost of just one hash).

In this case there is never a cache hit. Each bogus block is unique, yet "valid enough" to force full double Merkle root computations. Storing the cached invalid hash then absorbs additional time, plus 32 bytes of space and indexation, and achieves nothing. It's as if the hope is that the attacker is dumb and just keeps sending the same invalid block. But what's actually happening is (1) deoptimization, (2) unnecessary complexity, and (3) exposure to a disk-full attack vector, which must then also be mitigated.
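There's a further wrinkle in the witness-mutation case: because txids don't commit to witness data, a witness-mutated block carries the same header (and thus the same block hash) as the honest block, so invalidity keyed on block hash can't even distinguish the two. A toy sketch of that point (illustrative Python only; transactions are modeled as (base, witness) pairs, with txid over the base bytes, which is the relevant property, not the actual serialization):

```python
import hashlib

def dsha(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(hashes):
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the odd last node
        level = [dsha(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Toy txs: (base serialization, witness). The txid commits only to
# the base, so the header's Merkle root is blind to witness bytes.
txs = [(bytes([i]) * 8, b"witness" + bytes([i])) for i in range(4)]
honest_root = merkle_root([dsha(base) for base, _wit in txs])

# Attacker flips one witness byte: same txids, same root, same block hash.
bogus = [(base, b"X" + wit[1:]) for base, wit in txs]
bogus_root = merkle_root([dsha(base) for base, _wit in bogus])
assert honest_root == bogus_root
```

Caching that shared hash as invalid would wrongly mark the honest block invalid, which is why such failures cannot be hash-cached at all.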

The other scenarios, where parse fails, cannot rely on invalidity caching, since they don't produce valid commitments, and they are dismissed cheaply. That leaves only malleability. This comes in two forms: the 64-byte form ("type64") and what we call "type32" (hashes are 32 bytes, and in this form they are duplicated). Type64 malleation is the cheapest form to dismiss, very early in parse (as discussed). Type32 malleation is far more expensive to dismiss, but no more so than the worst case scenario above. In the Core implementation this detection adds a constant (and unnecessarily high) cost to the Merkle root computation. This makes it *more* expensive to detect than the worst case non-witness scenario above (and its discovery cannot be cached). It is possible to reduce this cost significantly by relying on some simple math over the tx count. So even this scenario is not inherently worst case.
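One way that tx-count math might look (a sketch under my own assumptions, not libbitcoin's or Core's actual code): type32 malleation appends a duplicate of a trailing subtree, so the duplicate can only surface as an equal *final* pair on an *even-width* level. The tx count alone determines which levels are even-width, so a single trailing comparison on those levels suffices, rather than comparing every sibling pair at every level:

```python
import hashlib

def dsha(b: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root_checked(hashes):
    """Return (root, mutated). Trailing duplication (CVE-2012-2459,
    "type32") can only appear as an equal final pair on a level of
    even width, so only that one comparison is made per such level."""
    mutated = False
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 0:
            mutated = mutated or level[-1] == level[-2]
        else:
            level.append(level[-1])  # odd width: duplicate last node
        level = [dsha(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0], mutated

leaves = [dsha(bytes([i])) for i in range(3)]
root, flag = merkle_root_checked(leaves)                  # honest 3-tx block
mroot, mflag = merkle_root_checked(leaves + leaves[-1:])  # type32 malleated
assert root == mroot and not flag and mflag
```

This catches the malleated 4-leaf encoding of the honest 3-leaf tree at the cost of at most one hash comparison per level (and none at all on odd-width levels), illustrating why type32 detection need not carry Core's per-pair constant cost.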

So unless one is caching invalidity under PoW and due to an append-only store, I can see no reason to ever do it. Getting rid of it would improve both performance and security while reducing complexity. Optimally dismissing both types of malleation as described would improve performance, but is neutral regarding security.

e

--
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bitcoindev/c8f285b3-bcc4-43f3-b9d8-06fe23ee8303n%40googlegroups.com.