Hi Eric,

> While at some level the block message buffer would generally be 
referenced by one or more C pointers, the difference between a valid 
coinbase input (i.e. with a "null point") and any other input, is not 
nullptr vs. !nullptr. A "null point" is a 36 byte value, 32 0x00 bytes 
followed by 4 0xff bytes. In his infinite wisdom Satoshi decided it was 
better (or easier) to serialize a first block tx (coinbase) with an input 
containing an unusable script and pointing to an invalid [tx:index] tuple 
(input point) as opposed to just not having any input. That invalid input 
point is called a "null point", and of course cannot be pointed to by a 
"null pointer". The coinbase must be identified by comparing those 36 bytes 
to the well-known null point value (and if this does not match the Merkle 
hash cannot have been type64 malleated).

Thanks for the clarification. I had in mind Bitcoin Core's `CheckBlock` 
path, where the pointer to the first block transaction is dereferenced to 
verify that the transaction is a coinbase (i.e. that its prevout is a "null 
point"). Zooming out and back to my remark, I still think that adding a new 
64-byte size check on all block transactions to detect block hash 
invalidity could have a low memory overhead (implementation dependent), 
compared with applying that 64-byte check to the coinbase transaction 
alone, which in my understanding is what you're proposing.

> We call this type64 malleability (or malleation where it is not only 
possible but occurs).

Yes, this is the problem that has been described as a lack of "domain 
separation".
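
To illustrate what the missing domain separation means in practice, here is 
a minimal Python sketch. It is my own illustration, not code from Bitcoin 
Core or libbitcoin, and the names `dsha256`, `left`, `right` and `fake_tx` 
are hypothetical. It only shows that 64 bytes hashed as a leaf give the 
same value as the inner node over the two 32-byte halves; it does not build 
a parseable transaction.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Two arbitrary 32-byte values standing in for a pair of child hashes.
left = dsha256(b"child-a")
right = dsha256(b"child-b")

# Inner Merkle node: hash of the concatenation of the two children.
inner_node = dsha256(left + right)

# The same 64 bytes, read as the serialization of ONE transaction (assumption:
# we ignore whether these bytes would actually parse as a transaction).
fake_tx = left + right
txid_of_fake_tx = dsha256(fake_tx)

# Leaf and inner node are computed the same way, so they collide.
print(len(fake_tx), txid_of_fake_tx == inner_node)
```

Nothing in the hash input tells a leaf from an inner node, which is why the 
size of 64 bytes is the only thing the tree level can be inferred from.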

> The second one is the bip141 wtxid commitment in one of the coinbase 
transaction `scriptpubkey` output, which is itself covered by a txid in the 
merkle tree.

> While symmetry seems to imply that the witness commitment would be 
malleable, just as the txs commitment, this is not the case. If the tx 
commitment is correct it is computationally infeasible for the witness 
commitment to be malleated, as the witness commitment incorporates each 
full tx (with witness, sentinel, and marker). As such the block identifier, 
which relies only on the header and tx commitment, is a sufficient 
identifier. Yet it remains necessary to validate the witness commitment to 
ensure that the correct witness data has been provided in the block message.
> 
> The second type of malleability, in addition to type64, is what we call 
type32. This is the consequence of duplicated trailing sets of txs (and 
therefore tx hashes) in a block message. This is applicable to some but not 
all blocks, as a function of the number of txs contained.

To make your statement on the sources of malleability more precise: the 
witness stack can be malleated, altering the wtxid, while the transaction 
remains valid. I think you can still be fed a block header with a Merkle 
root commitment deserializing to a valid coinbase transaction with an 
invalid witness commitment. This is the case of a "block message with 
valid header but malleated committed valid tx data". Validating the witness 
commitment, to ensure that the correct witness data has been provided in 
the block message, is indeed necessary.
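
For the type32 case quoted above, a minimal Python sketch of the 
duplicated-trailing-hash effect, assuming the usual Bitcoin Merkle 
construction where the last hash of an odd level is paired with itself. The 
helper names (`dsha256`, `merkle_root`) and the placeholder txids are 
hypothetical, for illustration only.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle root over 32-byte hashes, pairing the last hash of an odd
    level with itself."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Three placeholder txids (not real transactions).
txids = [dsha256(bytes([i])) for i in range(3)]

# The same list with the trailing txid duplicated: a different tx list
# that commits to the same root.
malleated = txids + [txids[-1]]

print(merkle_root(txids) == merkle_root(malleated))
```

The two lists differ in length yet share a root, so the root alone does not 
identify which list of transactions was in the block message.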

>> Background: A fully-validated block has established identity in its 
block hash. However an invalid block message may include the same block 
header, producing the same hash, but with any kind of nonsense following 
the header. The purpose of the transaction and witness commitments is of 
course to establish this identity, so these two checks are therefore 
necessary even under checkpoint/milestone. And then of course the two 
Merkle tree issues complicate the tx commitment (the integrity of the 
witness commitment is assured by that of the tx commitment).
>>
>> So what does it mean to speak of a block hash derived from:
>> (1) a block message with an unparseable header?
>> (2) a block message with parseable but invalid header?
>> (3) a block message with valid header but unparseable tx data?
>> (4) a block message with valid header but parseable invalid uncommitted 
tx data?
>> (5) a block message with valid header but parseable invalid malleated 
committed tx data?
>> (6) a block message with valid header but parseable invalid unmalleated 
committed tx data?
>> (7) a block message with valid header but uncommitted valid tx data?
>> (8) a block message with valid header but malleated committed valid tx 
data?
>> (9) a block message with valid header but unmalleated committed valid tx 
data?
>>
>> Note that only the #9 p2p block message contains an actual Bitcoin 
block, the others are bogus messages. In all cases the message can be 
sha256 hashed to establish the identity of the *message*. And if one's 
objective is to reject repeating bogus messages, this might be a useful 
strategy. It's already part of the p2p protocol, is orders of magnitude 
cheaper to produce than a Merkle root, and has no identity issues.

> I think I mostly agree with the identity issue as laid out so far, there 
is one caveat to add if you're considering identity caching as the problem 
solved. A validation node might have to consider differently block messages 
processed if they connect on the longest most PoW valid chain for which all 
blocks have been validated. Or alternatively if they have to be added on a 
candidate longest most PoW valid chain.

> Certainly an important consideration. We store both types. Once there is 
a stronger candidate header chain we store the headers and proceed to 
obtaining the blocks (if we don't already have them). The blocks are stored 
in the same table; the confirmed vs. candidate indexes simply point to them 
as applicable. It is feasible (and has happened twice) for two blocks to 
share the very same coinbase tx, even with either/all bip30/34/90 active 
(and setting aside future issues here for the sake of simplicity). This 
remains only because two competing branches can have blocks at the same 
height, and bip34 requires only height in the coinbase input script. This 
therefore implies the same transaction but distinct blocks. It is however 
infeasible for one block to exist in multiple distinct chains. In order for 
this to happen two blocks at the same height must have the same coinbase 
(ok), and also the same parent (ok). But this then means that they either 
(1) have distinct identity due to another header property deviation, or (2) 
are the same block with the same parent and are therefore in just one 
chain. So I don't see an actual caveat. I'm not certain if this is the 
ambiguity that you were referring to. If not please feel free to clarify.

If you assume no network partition and the consensus rule rejecting blocks 
more than 2h in the future, I cannot see how one block with no header 
property deviation can exist in multiple distinct chains. The ambiguity I 
was referring to comes from a different angle. If the design goal of 
introducing a 64-byte size check is, quoting, "about being able to cache 
the hash of a (non-malleated) invalid block as permanently invalid to avoid 
re-downloading and re-validating it", then in my thinking we should 
consider the whole block header caching strategy, and make sure we don't 
get situations where an attacker can attach a chain of low-PoW block 
headers with malleated committed valid tx data yielding block invalidity at 
the end, provoking as a side effect a network-wide data download blowup. So 
I think any implementation of block validation, of which identity is a 
sub-problem, should strictly run only after adequate proof-of-work 
checks.
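
To make the ordering I have in mind concrete, here is a minimal Python 
sketch, assuming a node that only requests block data once a header chain 
carries enough work. The function names (`bits_to_target`, `header_work`, 
`should_fetch_blocks`) and the `min_chain_work` threshold are hypothetical, 
and the compact-target decoding is simplified (no sign-bit or overflow 
handling). It is not how Bitcoin Core or libbitcoin implement it.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bits_to_target(bits: int) -> int:
    """Expand the compact 'bits' header field into a target (simplified)."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_work(header: bytes) -> int:
    """Work of an 80-byte header, or 0 if it does not meet its own target."""
    if len(header) != 80:
        return 0
    target = bits_to_target(int.from_bytes(header[72:76], "little"))
    if target == 0:
        return 0
    if int.from_bytes(dsha256(header), "little") > target:
        return 0
    return (1 << 256) // (target + 1)


def should_fetch_blocks(headers: list[bytes], min_chain_work: int) -> bool:
    """Request block data only if every header passes PoW and the summed
    work reaches the (hypothetical) threshold."""
    total = 0
    for header in headers:
        work = header_work(header)
        if work == 0:
            return False
        total += work
    return total >= min_chain_work


# Mainnet genesis header, used here only as a known-good example.
GENESIS_HEADER = bytes.fromhex(
    "01000000" + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49" + "ffff001d" + "1dac2b7c"
)
print(should_fetch_blocks([GENESIS_HEADER], 1))
```

The sketch leaves out difficulty adjustment and header linkage; the point 
is only that the cheap PoW test comes before any block download or any 
identity or invalidity caching decision.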

> We don't do this and I don't see how it would be relevant. If a peer 
provides any invalid message or otherwise violates the protocol it is 
simply dropped.
> 
> The "problematic" that I'm referring to is the reliance on the block hash 
as a message identifier, because it does not identify the message and 
cannot be useful in an effectively unlimited number of zero-cost cases.

Historically, the purpose was to isolate transaction relay from block 
relay, to optimistically harden against network partition, as the 
transaction-relay topology is easy to infer with a lot of heuristics.

I think it is correct that the block hash cannot be relied on as a message 
identifier, as it cannot be useful in an unlimited number of zero-cost 
cases. As I was pointing out, Bitcoin Core partially mitigates that by 
discouraging connections to block-relay peers serving invalid block 
messages (`MaybePunishNodeForBlock`).
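
A minimal Python sketch of the distinction, assuming a block message is the 
80-byte header followed by arbitrary payload bytes. The names `block_hash` 
and `message_hash` are hypothetical, and the header here is a placeholder, 
not a valid one.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(message: bytes) -> bytes:
    """Hash of the 80-byte header only; blind to whatever follows it."""
    return dsha256(message[:80])


def message_hash(message: bytes) -> bytes:
    """Hash of the whole payload; identifies the message, not the block."""
    return hashlib.sha256(message).digest()


header = bytes(80)  # placeholder header bytes
msg_a = header + b"\x01" + b"some tx bytes"
msg_b = header + b"\x01" + b"any other nonsense"

# Same "block hash", different messages.
print(block_hash(msg_a) == block_hash(msg_b))
print(message_hash(msg_a) == message_hash(msg_b))
```

Any number of distinct bogus payloads can be appended to one header at no 
cost, so keying a cache on `block_hash` alone conflates all of them.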

> #4 and #5 refer to "uncommitted" and "malleated committed". It may not be 
clear, but "uncommitted" means that the tx commitment is not valid (Merkle 
root doesn't match the header's value) and "malleated committed" means that 
the (matching) commitment cannot be relied upon because the txs represent 
malleation, invalidating the identifier. So neither of these are usable 
identifiers.
> 
> It seems you may be referring to "unconfirmed" txs as opposed to 
"uncommitted" txs. This doesn't pertain to tx storage or identifiers. 
Neither #7 nor #8 are usable for the same reasons.
> 
> I'm making no reference to tx malleability. This concerns only Merkle 
tree (block hash) malleability, the two types described in detail in the 
paper I referenced earlier, here again:
> 
> 
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf

I believe the bottleneck we're circling around is computationally defining 
what the "usable" identifiers for block messages are.
The most straightforward answer to this question is the full block in one 
single peer message, at least from my perspective.
In reality, since headers-first synchronization (`getheaders`), block 
validation has been split into steps for performance reasons, among 
others.

> Again, this has no relation to tx hashes/identifiers. Libbitcoin has a tx 
pool, we just don't store them in RAM (memory).

> I don't follow this. An invalid 64 byte tx consensus rule would 
definitely not make it harder to exploit block message invalidity. In fact 
it would just slow down validation by adding a redundant rule. Furthermore, 
as I have detailed in a previous message, caching invalidity does 
absolutely nothing to increase protection. In fact it makes the situation 
materially worse.

Just to recall, in my understanding the proposal we're discussing is about 
outlawing 64-byte transactions at the consensus level to minimize 
denial-of-service vectors during block validation. I think we're talking 
past each other because the mempool already introduces a layer of caching 
in Bitcoin Core, whose results are reused at block validation, such as 
signature verification results. I'm not sure we can fully set aside 
performance considerations, though I agree that implementation architecture 
subsystems like the mempool should only be a sideline consideration.

> No, this is not the case. As I detailed in my previous message, there is 
no possible scenario where invalidation caching does anything but make the 
situation materially worse.

I think it can be correct that invalidation caching makes the situation 
materially worse, or is denial-of-service neutral, as I believe a full node 
is only trading space for time in block message validation. I still believe 
such an analysis, as laid out in your previous message, would benefit from 
more detail.

> On the other hand, just dealing with parse failure on the spot by 
introducing a leading pattern in the stream just inflates the size of p2p 
messages, and the transaction-relay bandwidth cost.

> I think you misunderstood me. I am suggesting no change to serialization. 
I can see how it might be unclear, but I said, "nothing precludes 
incorporating a requirement for a necessary leading pattern in the stream." 
I meant that the parser can simply incorporate the *requirement* that the 
byte stream starts with a null input point. That identifies the malleation 
or invalidity without a single hash operation and while only reading a 
handful of bytes. No change to any messages.

Indeed, this is clearer with the re-explanation above of what you meant by 
the "null point". In my understanding, you're suggesting the following 
algorithm:
- receive transaction p2p messages
- deserialize transaction p2p messages
- if the transaction is a coinbase candidate, verify the null input point
- if the null input point pattern is invalid, reject the transaction

If I'm understanding correctly, the last rule has the effect of 
constraining the transaction space that can be used to brute-force and 
mount a Merkle root forgery with a 64-byte coinbase transaction.

As described in section 3.1.1 of the paper: 
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/attachments/20190225/a27d8837/attachment-0001.pdf
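
To check my understanding of the leading-pattern requirement, here is a 
minimal Python sketch, assuming the check runs on the raw block message 
byte stream (80-byte header, tx count, then the first transaction). The 
names `read_compact_size` and `starts_with_null_point` are hypothetical. 
This is my reading of your description, not libbitcoin code.

```python
# 32 zero bytes (tx hash) followed by 0xffffffff (index).
NULL_POINT = bytes(32) + b"\xff" * 4


def read_compact_size(data: bytes, offset: int) -> tuple[int, int]:
    """Decode a Bitcoin compact-size integer; return (value, next offset)."""
    first = data[offset]
    if first < 0xFD:
        return first, offset + 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    value = int.from_bytes(data[offset + 1:offset + 1 + width], "little")
    return value, offset + 1 + width


def starts_with_null_point(block_message: bytes) -> bool:
    """True if the first input of the first tx is the null point.

    No hashing is involved; only a handful of bytes after the header
    are read.
    """
    try:
        offset = 80  # skip the block header
        tx_count, offset = read_compact_size(block_message, offset)
        if tx_count == 0:
            return False
        offset += 4  # skip the tx version
        if block_message[offset:offset + 2] == b"\x00\x01":
            offset += 2  # skip the segwit marker and flag
        input_count, offset = read_compact_size(block_message, offset)
        if input_count != 1:
            return False
        return block_message[offset:offset + 36] == NULL_POINT
    except IndexError:  # truncated message
        return False


# Placeholder messages: zeroed header, one tx, version 1.
prefix = bytes(80) + b"\x01" + (1).to_bytes(4, "little")
coinbase_like = prefix + b"\x01" + NULL_POINT + b"\x00"
not_coinbase = bytes(80) + b"\x01" + b"\x11" * 64
print(starts_with_null_point(coinbase_like),
      starts_with_null_point(not_coinbase))
```

If that matches what you meant, the rejection happens during parsing, 
before any Merkle root is computed.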

> I'm referring to DoS mitigation (the only relevant security consideration 
here). I'm pointing out that invalidity caching is pointless in all cases, 
and in this case is the most pointless as type64 malleation is the cheapest 
of all invalidity to detect. I would prefer that all bogus blocks sent to 
my node are of this type. The worst types of invalidity detection have no 
mitigation and from a security standpoint are counterproductive to cache. 
I'm describing what overall is actually not a tradeoff. It's all negative 
and no positive.

I think we're both discussing the same DoS mitigation issue, for sure. 
Again, I think the statement that "invalidity caching" is pointless in all 
cases cannot be fully grounded without specifying (a) the internal cache 
layout(s) of the full node processing block messages and (b) the sha256 
mining resources available during N difficulty periods, and whether any 
miner engages in a selfish-mining-like strategy.

About (a), I maintain my point: I think it's a classic time-space 
trade-off to weigh as a function of the internal cache layouts. About (b), 
I think we'll be back to the headers synchronization strategy as 
implemented by a full node, to discuss whether there are exploitable 
asymmetries for selfish-mining-like strategies.

If you can give a pseudo-code example of the "null point" validation as 
implemented in libbitcoin (?), I think it would make the conversation more 
concrete on the caching aspect.

> Rust has its own set of problems. No need to get into a language Jihad 
here. My point was to clarify that the particular question was not about a 
C (or C++) null pointer value, either on the surface or underneath an 
abstraction.

Thanks for the additional comments on libbitcoin's usage of dependencies. 
Yes, I don't think there is a need to get into a language jihad here. It's 
just that all languages have their own memory model (stack, dynamic 
allocation, smart pointers, etc.), and when you're talking about 
performance it's useful to keep them in mind, imho.

Best,
Antoine
ots hash: 058d7b3adb154a3e64d5f8ccf1944903bcd0c49dbb525f7212adf4f7ac7f8c55
On Tuesday, July 9, 2024 at 02:16:20 UTC+1, Eric Voskuil wrote:

> > This is why we don't use C - unsafe, unclear, unnecessary.
>
> Actually, I think libbitcoin is using its own maintained fork of 
> secp256k1, which is written in C.
>
>
> We do not maintain secp256k1 code. For years that library carried the same 
> version, despite regular breaking changes to its API. This compelled us to 
> place these different versions on distinct git branches. When it finally 
> became versioned we started phasing this unfortunate practice out.
>
> Out of the 10 repositories and at least half million lines of code, apart 
> from an embedded copy of qrencode that we don’t independently maintain, I 
> believe there is only one .c file in use in the entire project - the 
> database mmap.c implementation for msvc builds. This includes hash 
> functions, with vectorization optimizations, etc.
>  
>
> For sure, I wouldn't recommend using C across a whole codebase as it's not 
> memory-safe (euphemism) though it's still un-match if you wish to 
> understand low-level memory management in hot paths.
>
>
> This is a commonly held misperception.
>
> It can be easier to use C++ or Rust, though it doesn't mean it will be as 
> (a) perf optimal and (b) hardened against side-channels.
>
>
> Rust has its own set of problems. No need to get into a language Jihad 
> here. My point was to clarify that the particular question was not about a 
> C (or C++) null pointer value, either on the surface or underneath an 
> abstraction.
>
> e 
>
