Re: [bitcoindev] Re: Great Consensus Cleanup Revival

From: Antoine Riard <antoine.riard@gmail•com>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
Date: Mon, 1 Jul 2024 19:36:08 -0700 (PDT)
Message-ID: <301c64c7-0f0f-476a-90c4-913659477276n@googlegroups.com>
In-Reply-To: <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>


Hi Eric,

> Ok, thanks for clarifying. I'm still not making the connection to 
"checking a non-null [C] pointer" but that's prob on me.

A C pointer is a language construct that stores, at memory address A, the 
value of another memory address B. That value can be 0 (or NULL, a standard 
macro defined in stddef.h).

Here is an example snippet of linked-list code that checks that the pointer 
(`*start_list`) is non-null before the comparison operation used to find the 
target list element.

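```
#include <stddef.h>
/* Minimal list node, assumed here so that the snippet compiles. */
typedef struct pointer_s { void *data; struct pointer_s *next; } pointer_t;

/* Returns the first node whose data matches data_ref, or NULL if none does. */
pointer_t       *ft_list_find(pointer_t **start_list, void *data_ref,
                        int (*cmp)(void *, void *))
{
        /* Walk a local cursor so the caller's head pointer is left untouched. */
        pointer_t       *node = (start_list != NULL) ? *start_list : NULL;

        while (node != NULL)    /* non-null check before each dereference */
        {
                if (cmp(node->data, data_ref) == 0)
                        return (node);
                node = node->next;
        }
        return (NULL);
}
```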

While libbitcoin and bitcoin core are both written in C++, you still have 
underlying pointer dereferencing at play to access the coinbase transaction, 
with all the implications that carries in terms of memory management.

> Yes, a rough correlation but not necessarily equivalence. Note that 
block.check has context free and contextual overrides.
> 
> The 'bypass' parameter indicates a block under checkpoint or milestone 
("assume valid"). In this case we must check Merkle root, witness 
commitment, and both types of malleation - as the purpose is to establish 
identity. Absent 'bypass' the typical checks are performed, and therefore a 
malleation check is not required here. The "type64" malleation is subsumed 
by the is_first_non_coinbase check and the "type32" malleation is subsumed 
by the is_internal_double_spend check.

Yes, I understand it's not a 1-to-1 compatibility, just a rough logical 
equivalence.

I think it's interesting to point out the two types of malleation that 
bitcoin consensus validation logic has to account for w.r.t. block validity 
checks.

As you said, the first one concerns the merkle root committed in the header's 
`hashMerkleRoot`, and is due to the lack of domain separation between leaf 
and inner merkle tree nodes.
The second one concerns the bip141 wtxid commitment placed in the 
`scriptpubkey` of one of the coinbase transaction's outputs, which is itself 
covered by a txid in the merkle tree.
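
As a rough illustration of the first type, the sketch below shows why the 
missing domain separation matters: an inner node hashes the 64-byte 
concatenation of its two children, and a 64-byte transaction is hashed in 
exactly the same way, so the same bytes give the same digest. `toy_hash` is a 
deliberately non-cryptographic stand-in for double-SHA256, and every name 
here is hypothetical, not taken from bitcoin core or libbitcoin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 32

/* Toy stand-in for double-SHA256 (assumption): deterministic, NOT secure. */
static void toy_hash(const uint8_t *data, size_t len, uint8_t out[HASH_LEN])
{
    assert(data != NULL && out != NULL);
    for (int lane = 0; lane < 4; lane++) {
        uint64_t h = 0xcbf29ce484222325ULL ^ (uint64_t)lane;
        for (size_t i = 0; i < len; i++) {
            h ^= data[i];
            h *= 0x100000001b3ULL;
        }
        memcpy(out + 8 * lane, &h, 8);
    }
}

/* Inner node: hash of the two 32-byte children concatenated (64 bytes). */
static void node_hash(const uint8_t left[HASH_LEN],
                      const uint8_t right[HASH_LEN], uint8_t out[HASH_LEN])
{
    uint8_t buf[2 * HASH_LEN];

    memcpy(buf, left, HASH_LEN);
    memcpy(buf + HASH_LEN, right, HASH_LEN);
    toy_hash(buf, sizeof buf, out);
}

/* Returns 1 if the two-leaf tree {tx_a, tx_b} and a one-leaf tree whose
 * single 64-byte "transaction" is H(tx_a) || H(tx_b) share the same root. */
int roots_collide(const uint8_t *tx_a, size_t len_a,
                  const uint8_t *tx_b, size_t len_b)
{
    uint8_t ha[HASH_LEN], hb[HASH_LEN];
    uint8_t root_two[HASH_LEN], root_one[HASH_LEN];
    uint8_t fake_tx[2 * HASH_LEN];

    toy_hash(tx_a, len_a, ha);
    toy_hash(tx_b, len_b, hb);
    node_hash(ha, hb, root_two);

    memcpy(fake_tx, ha, HASH_LEN);
    memcpy(fake_tx + HASH_LEN, hb, HASH_LEN);
    toy_hash(fake_tx, sizeof fake_tx, root_one); /* leaf hashed like a node */

    return memcmp(root_two, root_one, HASH_LEN) == 0;
}
```

The collision holds by construction for any hash function used this way, 
which is why the proposal targets the 64-byte transaction size rather than 
the hash itself.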

> Caching identity in the case of invalidity is more interesting question 
than it might seem.
> 
> Background: A fully-validated block has established identity in its block 
hash. However an invalid block message may include the same block header, 
producing the same hash, but with any kind of nonsense following the 
header. The purpose of the transaction and witness commitments is of course 
to establish this identity, so these two checks are therefore necessary 
even under checkpoint/milestone. And then of course the two Merkle tree 
issues complicate the tx commitment (the integrity of the witness 
commitment is assured by that of the tx commitment).
> 
> So what does it mean to speak of a block hash derived from:
> 
> (1) a block message with an unparseable header?
> (2) a block message with parseable but invalid header?
> (3) a block message with valid header but unparseable tx data?
> (4) a block message with valid header but parseable invalid uncommitted 
tx data?
> (5) a block message with valid header but parseable invalid malleated 
committed tx data?
> (6) a block message with valid header but parseable invalid unmalleated 
committed tx data?
> (7) a block message with valid header but uncommitted valid tx data?
> (8) a block message with valid header but malleated committed valid tx 
data?
> (9) a block message with valid header but unmalleated committed valid tx 
data?
> 
> Note that only the #9 p2p block message contains an actual Bitcoin block, 
the others are bogus messages. In all cases the message can be sha256 
hashed to establish the identity of the *message*. And if one's objective 
is to reject repeating bogus messages, this might be a useful strategy. 
It's already part of the p2p protocol, is orders of magnitude cheaper to 
produce than a Merkle root, and has no identity issues.

I think I mostly agree with the identity issue as laid out so far. There is 
one caveat to add if you're considering identity caching as the problem 
being solved.
A validating node might have to treat block messages differently depending 
on whether they connect to the most-PoW valid chain for which all blocks 
have already been validated, or whether they have to be added to a 
candidate most-PoW valid chain.
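
To keep the nine cases quoted above straight, here is a minimal sketch that 
maps a set of boolean properties of a block message to Eric's category 
number. The struct and field names are hypothetical, chosen only for this 
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical properties of a received p2p block message. */
typedef struct {
    int header_parseable;
    int header_valid;
    int txs_parseable;
    int txs_committed;  /* tx data matches the header's merkle root */
    int txs_malleated;  /* the match was obtained through malleation */
    int txs_valid;
} block_msg_t;

/* Returns the category number (1 to 9) from the quoted list. */
int classify_block_msg(const block_msg_t *m)
{
    assert(m != NULL);
    if (!m->header_parseable)
        return 1;
    if (!m->header_valid)
        return 2;
    if (!m->txs_parseable)
        return 3;
    if (!m->txs_committed)
        return m->txs_valid ? 7 : 4;
    if (m->txs_malleated)
        return m->txs_valid ? 8 : 5;
    return m->txs_valid ? 9 : 6; /* only 9 is an actual Bitcoin block */
}
```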

> The concept of Bitcoin block hash as unique identifier for invalid p2p 
block messages is problematic. Apart from the malleation question, what is 
the Bitcoin block hash for a message with unparseable data (#1 and #3)? 
Such messages are trivial to produce and have no block hash.

For reasons, bitcoin core has the concept of outbound `BLOCK_RELAY` 
connections (in `src/node/connection_types.h`), to which a preferential 
peering policy is applied for the download of block messages.

> What is the useful identifier for a block with malleated commitments (#5 
and #8) or invalid commitments (#4 and #7) - valid txs or otherwise?

The block header, as it commits to the transaction identifier tree, can be 
useful as an identifier for both #4 and #5. On the bitcoin core side, about 
#7, the uncommitted valid tx data can already be present in the validation 
cache from mempool acceptance. About #8, the malleated committed valid 
transactions would also be committed in the merkle root in the header.

> This seems reasonable at first glance, but given the list of scenarios 
above, which does it apply to? Presumably the invalid header (#2) doesn't 
get this far because of headers-first.
> That leaves just invalid blocks with useful block hash identifiers (#6). 
In all other cases the message is simply discarded. In this case the 
attempt is to move category #5 into category #6 by prohibiting 64 byte txs.

Yes, it's moving from category #5 to category #6. Note that transaction 
malleability can be an issue distinct from the lack of domain separation.

> The requirement to "avoid re-downloading and re-validating it" is about 
performance, presumably minimizing initial block download/catch-up time. 
There is a computational cost to producing 64 byte malleations and none 
for any of the other bogus block message categories above, including the 
other form of malleation. Furthermore, 64 byte malleation has almost zero 
cost to preclude. No hashing and not even true header or tx parsing are 
required. Only a handful of bytes must be read from the raw message 
before it can be discarded presently.

> That's actually far cheaper than any of the other scenarios that again, 
have no cost to produce. The other type of malleation requires parsing all 
of the txs in the block and hashing and comparing some or all of them. In 
other words, if there is an attack scenario, that must be addressed before 
this can be meaningful. In fact all of the other bogus message scenarios 
(with tx data) will remain more expensive to discard than this one.

In practice, on the bitcoin core side, the bogus block message categories 
from #4 to #6 are already mitigated by validation caching for transactions 
that have been received earlier. While libbitcoin has no mempool (at least 
in earlier versions), transaction buffering can be done with bip152's 
`HeaderAndShortIDs` structure (carried in the `cmpctblock` message).

About #7 and #8, introducing a domain separation where 64-byte 
transactions are rejected makes it harder to exploit the #7 and #8 
categories of bogus block messages.
It is correct that bitcoin core might accept valid transaction data 
before the merkle tree commitment has been verified.
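
For reference, the "handful of bytes" check discussed in the quote above can 
be sketched as follows: skip the 80-byte header, read the transaction count, 
and test whether the first input of the first transaction spends the null 
point (32 zero bytes followed by index 0xffffffff), as a coinbase must. This 
is only a sketch of the idea under the standard block serialization; the 
function names are hypothetical and this is not libbitcoin's or bitcoin 
core's actual code path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a CompactSize integer at *pos, advancing *pos; 0 if truncated. */
static int read_compact_size(const uint8_t *buf, size_t len, size_t *pos,
                             uint64_t *out)
{
    if (*pos >= len)
        return 0;
    uint8_t first = buf[(*pos)++];
    size_t extra = first < 0xfd ? 0 : first == 0xfd ? 2 : first == 0xfe ? 4 : 8;
    if (len - *pos < extra)
        return 0;
    *out = extra == 0 ? first : 0;
    for (size_t i = 0; i < extra; i++)
        *out |= (uint64_t)buf[*pos + i] << (8 * i);
    *pos += extra;
    return 1;
}

/* Returns 1 if the first transaction's single input spends the null point. */
int first_input_is_null_point(const uint8_t *block, size_t len)
{
    size_t pos = 80; /* the block header is 80 bytes */
    uint64_t count;

    assert(block != NULL);
    if (len < pos || !read_compact_size(block, len, &pos, &count) || count == 0)
        return 0;
    if (len - pos < 4)
        return 0;
    pos += 4; /* transaction version */
    /* Segwit serialization inserts a 0x00 marker and a 0x01 flag here. */
    if (len - pos >= 2 && block[pos] == 0x00 && block[pos + 1] == 0x01)
        pos += 2;
    if (!read_compact_size(block, len, &pos, &count) || count != 1)
        return 0; /* a coinbase has exactly one input */
    if (len - pos < 36)
        return 0;
    for (size_t i = 0; i < 32; i++)
        if (block[pos + i] != 0x00)
            return 0;
    for (size_t i = 32; i < 36; i++)
        if (block[pos + i] != 0xff)
            return 0;
    return 1;
}
```

No hashing is involved, and a message failing this check can be dropped 
without looking at the rest of the transaction data.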

> The problem arises from trying to optimize dismissal by storing an 
identifier. Just *producing* the identifier is orders of magnitude more 
costly than simply dismissing this bogus message. I can't imagine why any 
implementation would want to compute and store and retrieve and recompute 
and compare hashes when the alterative is just dismissing the bogus 
messages with no hashing at all.

> Bogus messages will arrive, they do not even have to be requested. The 
simplest are dealt with by parse failure. What defines a parse is entirely 
subjective. Generally it's "structural" but nothing precludes incorporating 
a requirement for a necessary leading pattern in the stream, sort of like 
how the witness pattern is identified. If we were going to prioritize early 
dismissal this is where we would put it.

I don't think it is that simple. While producing an identifier comes 
with a computational cost (e.g. a fixed 64-byte structured coinbase 
transaction), if the full node has a hierarchy of validation caches, as 
bitcoin core already does, the cost of bogus block messages can be slashed. 
On the other hand, dealing with parse failure on the spot by 
introducing a leading pattern in the stream inflates the size of p2p 
messages and the transaction-relay bandwidth cost.
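
As a rough picture of the lookup being debated, here is a minimal sketch of 
a fixed-size set of block hashes marked permanently invalid, using linear 
probing. It is purely illustrative: the names and sizes are hypothetical, 
and it does not reflect bitcoin core's or libbitcoin's data structures. As 
discussed in this thread, such a cache is only sound for invalid blocks 
whose hash is a usable identifier (category #6).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 32
#define SLOTS 256 /* hypothetical capacity, kept small for the example */

typedef struct {
    uint8_t hashes[SLOTS][HASH_LEN];
    uint8_t used[SLOTS];
} invalid_cache_t;

/* Folds the 32 hash bytes into a slot index. */
static size_t slot_of(const uint8_t hash[HASH_LEN])
{
    size_t h = 0;

    for (size_t i = 0; i < HASH_LEN; i++)
        h = h * 31 + hash[i];
    return h % SLOTS;
}

void cache_init(invalid_cache_t *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Returns 1 if the hash is stored (or already was), 0 if the table is full. */
int cache_mark_invalid(invalid_cache_t *c, const uint8_t hash[HASH_LEN])
{
    size_t start = slot_of(hash);

    for (size_t n = 0; n < SLOTS; n++) {
        size_t i = (start + n) % SLOTS;
        if (!c->used[i]) {
            memcpy(c->hashes[i], hash, HASH_LEN);
            c->used[i] = 1;
            return 1;
        }
        if (memcmp(c->hashes[i], hash, HASH_LEN) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if the hash was previously marked invalid, 0 otherwise. */
int cache_is_invalid(const invalid_cache_t *c, const uint8_t hash[HASH_LEN])
{
    size_t start = slot_of(hash);

    for (size_t n = 0; n < SLOTS; n++) {
        size_t i = (start + n) % SLOTS;
        if (!c->used[i])
            return 0;
        if (memcmp(c->hashes[i], hash, HASH_LEN) == 0)
            return 1;
    }
    return 0;
}
```

Every lookup costs at least one pass over the 32 hash bytes, on top of the 
cost of producing the block hash in the first place, which is the per-block 
overhead Eric's objection is about.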

> However, there is a tradeoff in terms of early dismissal. Looking up 
invalid hashes is a costly tradeoff, which becomes multiplied by every 
block validated. For example, expending 1 millisecond in hash/lookup to 
save 1 second of validation time in the failure case seems like a 
reasonable tradeoff, until you multiply across the whole chain. 1 ms 
becomes 14 minutes across the chain, just to save a second for each mallied 
block encountered. That means you need to have encountered 840 such mallied 
blocks just to break even. Early dismissing the block for non-null coinbase 
point (without hashing anything) would be on the order of 1000x faster than 
that (breakeven at 1 encounter). So why the block hash cache requirement? 
It cannot be applied to many scenarios, and cannot be optimal in this one.

I think what you're describing is a classic time-space tradeoff, which 
is well known in the computer science literature. In my opinion, one should 
rather reason about the security paradigm we wish for the bitcoin 
block-relay network and for lasting decentralization, i.e. one where it's 
easy to verify block message proofs that could have been generated on 
specialized hardware at an asymmetric cost. Obviously, having to encounter 
840 such malleated blocks just to break even doesn't make the math work out 
in favor of the hash lookup, unless you can reduce the attack scenario in 
terms of adversary capabilities.
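
The break-even arithmetic in the quote can be reproduced with the small 
sketch below. The chain length of 840,000 blocks is an assumption: it is the 
figure implied by the quoted numbers (14 minutes, i.e. 840 seconds, at 1 ms 
per block), not a value stated in the thread. The function names are 
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total lookup overhead, in milliseconds, paid across the whole chain. */
uint64_t lookup_overhead_ms(uint64_t blocks, uint64_t lookup_ms)
{
    return blocks * lookup_ms;
}

/* Number of malleated blocks that must be encountered before the time
 * saved (saved_ms each) covers the overhead; rounded up. */
uint64_t breakeven_encounters(uint64_t blocks, uint64_t lookup_ms,
                              uint64_t saved_ms)
{
    assert(saved_ms > 0);
    uint64_t overhead = lookup_overhead_ms(blocks, lookup_ms);
    return (overhead + saved_ms - 1) / saved_ms;
}
```

With 840,000 blocks, a 1 ms lookup and 1,000 ms saved per encounter, the 
overhead is 840,000 ms (14 minutes) and the break-even point is 840 
encounters, matching the quoted figures.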

Best,
Antoine 
On Saturday, June 29, 2024 at 21:42:23 UTC+1, Eric Voskuil wrote:

> Caching identity in the case of invalidity is more interesting question 
> than it might seem.
>
> Background: A fully-validated block has established identity in its block 
> hash. However an invalid block message may include the same block header, 
> producing the same hash, but with any kind of nonsense following the 
> header. The purpose of the transaction and witness commitments is of course 
> to establish this identity, so these two checks are therefore necessary 
> even under checkpoint/milestone. And then of course the two Merkle tree 
> issues complicate the tx commitment (the integrity of the witness 
> commitment is assured by that of the tx commitment).
>
> So what does it mean to speak of a block hash derived from:
>
> (1) a block message with an unparseable header?
> (2) a block message with parseable but invalid header?
> (3) a block message with valid header but unparseable tx data?
> (4) a block message with valid header but parseable invalid uncommitted tx 
> data?
> (5) a block message with valid header but parseable invalid malleated 
> committed tx data?
> (6) a block message with valid header but parseable invalid unmalleated 
> committed tx data?
> (7) a block message with valid header but uncommitted valid tx data?
> (8) a block message with valid header but malleated committed valid tx 
> data?
> (9) a block message with valid header but unmalleated committed valid tx 
> data?
>
> Note that only the #9 p2p block message contains an actual Bitcoin block, 
> the others are bogus messages. In all cases the message can be sha256 
> hashed to establish the identity of the *message*. And if one's objective 
> is to reject repeating bogus messages, this might be a useful strategy. 
> It's already part of the p2p protocol, is orders of magnitude cheaper to 
> produce than a Merkle root, and has no identity issues.
>
> The concept of Bitcoin block hash as unique identifier for invalid p2p 
> block messages is problematic. Apart from the malleation question, what is 
> the Bitcoin block hash for a message with unparseable data (#1 and #3)? 
> Such messages are trivial to produce and have no block hash. What is the 
> useful identifier for a block with malleated commitments (#5 and #8) or 
> invalid commitments (#4 and #7) - valid txs or otherwise?
>
> The stated objective for a consensus rule to invalidate all 64 byte txs is:
>
> > being able to cache the hash of a (non-malleated) invalid block as 
> permanently invalid to avoid re-downloading and re-validating it.
>
> This seems reasonable at first glance, but given the list of scenarios 
> above, which does it apply to? Presumably the invalid header (#2) doesn't 
> get this far because of headers-first. That leaves just invalid blocks with 
> useful block hash identifiers (#6). In all other cases the message is 
> simply discarded. In this case the attempt is to move category #5 into 
> category #6 by prohibiting 64 byte txs.
>
> The requirement to "avoid re-downloading and re-validating it" is about 
> performance, presumably minimizing initial block download/catch-up time. 
> There is a computational cost to producing 64 byte malleations and none for 
> any of the other bogus block message categories above, including the other 
> form of malleation. Furthermore, 64 byte malleation has almost zero cost to 
> preclude. No hashing and not even true header or tx parsing are required. 
> Only a handful of bytes must be read from the raw message before it can be 
> discarded presently.
>
> That's actually far cheaper than any of the other scenarios that again, 
> have no cost to produce. The other type of malleation requires parsing all 
> of the txs in the block and hashing and comparing some or all of them. In 
> other words, if there is an attack scenario, that must be addressed before 
> this can be meaningful. In fact all of the other bogus message scenarios 
> (with tx data) will remain more expensive to discard than this one.
>
> The problem arises from trying to optimize dismissal by storing an 
> identifier. Just *producing* the identifier is orders of magnitude more 
> costly than simply dismissing this bogus message. I can't imagine why any 
> implementation would want to compute and store and retrieve and recompute 
> and compare hashes when the alterative is just dismissing the bogus 
> messages with no hashing at all.
>
> Bogus messages will arrive, they do not even have to be requested. The 
> simplest are dealt with by parse failure. What defines a parse is entirely 
> subjective. Generally it's "structural" but nothing precludes incorporating 
> a requirement for a necessary leading pattern in the stream, sort of like 
> how the witness pattern is identified. If we were going to prioritize early 
> dismissal this is where we would put it.
>
> However, there is a tradeoff in terms of early dismissal. Looking up 
> invalid hashes is a costly tradeoff, which becomes multiplied by every 
> block validated. For example, expending 1 millisecond in hash/lookup to 
> save 1 second of validation time in the failure case seems like a 
> reasonable tradeoff, until you multiply across the whole chain. 1 ms 
> becomes 14 minutes across the chain, just to save a second for each mallied 
> block encountered. That means you need to have encountered 840 such mallied 
> blocks just to break even. Early dismissing the block for non-null coinbase 
> point (without hashing anything) would be on the order of 1000x faster than 
> that (breakeven at 1 encounter). So why the block hash cache requirement? 
> It cannot be applied to many scenarios, and cannot be optimal in this one.
>
> Eric
>
