From: Eric Voskuil <eric@voskuil•org>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Subject: Re: [bitcoindev] Re: Great Consensus Cleanup Revival
Date: Sat, 29 Jun 2024 13:40:39 -0700 (PDT)
Message-ID: <3dceca4d-03a8-44f3-be64-396702247fadn@googlegroups.com>
In-Reply-To: <607a2233-ac12-4a80-ae4a-08341b3549b3n@googlegroups.com>
Caching identity in the case of invalidity is a more interesting question
than it might seem.
Background: A fully-validated block has established identity in its block
hash. However, an invalid block message may include the same block header,
producing the same hash, but with any kind of nonsense following the
header. The purpose of the transaction and witness commitments is of course
to establish this identity, so these two checks are therefore necessary
even under checkpoint/milestone. And then of course the two Merkle tree
issues complicate the tx commitment (the integrity of the witness
commitment is assured by that of the tx commitment).
So what does it mean to speak of a block hash derived from:
(1) a block message with an unparseable header?
(2) a block message with parseable but invalid header?
(3) a block message with valid header but unparseable tx data?
(4) a block message with valid header but parseable invalid uncommitted tx
data?
(5) a block message with valid header but parseable invalid malleated
committed tx data?
(6) a block message with valid header but parseable invalid unmalleated
committed tx data?
(7) a block message with valid header but uncommitted valid tx data?
(8) a block message with valid header but malleated committed valid tx data?
(9) a block message with valid header but unmalleated committed valid tx
data?
Note that only the #9 p2p block message contains an actual Bitcoin block;
the others are bogus messages. In all cases the message can be sha256
hashed to establish the identity of the *message*. And if one's objective
is to reject repeating bogus messages, this might be a useful strategy.
It's already part of the p2p protocol, is orders of magnitude cheaper to
produce than a Merkle root, and has no identity issues.
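As a minimal sketch of that strategy, assuming the goal is only to recognize
a repeated bogus message: the raw payload is hashed once and the digest is
remembered. The names message_digest, p2p_checksum and BogusMessageFilter are
hypothetical, chosen for illustration. The checksum function shows the
related value the p2p message header already carries (the first four bytes
of a double SHA-256 of the payload).

```python
import hashlib


def message_digest(payload: bytes) -> bytes:
    """Single SHA-256 of the raw payload: identifies the message, not a block."""
    return hashlib.sha256(payload).digest()


def p2p_checksum(payload: bytes) -> bytes:
    """First 4 bytes of double SHA-256, as carried in the p2p message header."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


class BogusMessageFilter:
    """Hypothetical in-memory set of digests of messages already rejected."""

    def __init__(self) -> None:
        self._seen: set[bytes] = set()

    def mark_bogus(self, payload: bytes) -> None:
        self._seen.add(message_digest(payload))

    def is_known_bogus(self, payload: bytes) -> bool:
        return message_digest(payload) in self._seen
```

This works identically for all nine categories because it never depends on
the payload being parseable.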
The concept of Bitcoin block hash as unique identifier for invalid p2p
block messages is problematic. Apart from the malleation question, what is
the Bitcoin block hash for a message with unparseable data (#1 and #3)?
Such messages are trivial to produce and have no block hash. What is the
useful identifier for a block with malleated commitments (#5 and #8) or
invalid commitments (#4 and #7) - valid txs or otherwise?
The stated objective for a consensus rule to invalidate all 64 byte txs is:
> being able to cache the hash of a (non-malleated) invalid block as
> permanently invalid to avoid re-downloading and re-validating it.
This seems reasonable at first glance, but given the list of scenarios
above, which does it apply to? Presumably the invalid header (#2) doesn't
get this far because of headers-first. That leaves just invalid blocks with
useful block hash identifiers (#6). In all other cases the message is
simply discarded. In this case the attempt is to move category #5 into
category #6 by prohibiting 64 byte txs.
The requirement to "avoid re-downloading and re-validating it" is about
performance, presumably minimizing initial block download/catch-up time.
There is a computational cost to producing 64 byte malleations and none for
any of the other bogus block message categories above, including the other
form of malleation. Furthermore, 64 byte malleation has almost zero cost to
preclude. No hashing and not even true header or tx parsing are required.
Even today, only a handful of bytes must be read from the raw message before
it can be discarded.
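A rough sketch of such a check, assuming the standard block serialization
(80-byte header, variable-length tx count, then the first transaction) and
assuming the test is that the first input of the first transaction carries a
null point (all-zero hash, index 0xffffffff). The function names are
hypothetical. Nothing is hashed and no transaction is fully parsed.

```python
import struct

HEADER_SIZE = 80
NULL_HASH = bytes(32)
NULL_INDEX = 0xFFFFFFFF


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a Bitcoin variable-length integer; return (value, next position)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    if first == 0xFD:
        return struct.unpack_from("<H", buf, pos + 1)[0], pos + 3
    if first == 0xFE:
        return struct.unpack_from("<I", buf, pos + 1)[0], pos + 5
    return struct.unpack_from("<Q", buf, pos + 1)[0], pos + 9


def first_point_is_null(raw_block: bytes) -> bool:
    """True if the first input of the first tx has a null previous output."""
    try:
        tx_count, pos = read_varint(raw_block, HEADER_SIZE)
        if tx_count == 0:
            return False
        pos += 4  # skip the tx version
        if raw_block[pos:pos + 2] == b"\x00\x01":
            pos += 2  # skip the witness marker and flag
        input_count, pos = read_varint(raw_block, pos)
        if input_count == 0:
            return False
        prev_hash = raw_block[pos:pos + 32]
        (index,) = struct.unpack_from("<I", raw_block, pos + 32)
    except (IndexError, struct.error):
        return False  # truncated message: treat as bogus
    return prev_hash == NULL_HASH and index == NULL_INDEX
```

A message for which first_point_is_null returns False cannot contain a valid
block and can be dropped before any hashing takes place.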
That's actually far cheaper than any of the other scenarios, which, again,
have no cost to produce. The other type of malleation requires parsing all
of the txs in the block and hashing and comparing some or all of them. In
other words, if there is an attack scenario, that must be addressed before
this can be meaningful. In fact all of the other bogus message scenarios
(with tx data) will remain more expensive to discard than this one.
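For comparison, a sketch of detecting the other form of malleation
(duplicated hashes at some level of the Merkle tree), assuming the tx hashes
have already been obtained by parsing and hashing every transaction. The
function names are hypothetical and this is only one way to write the check.
The point is that it needs all of the hashes, whereas the 64 byte case needs
only a few bytes.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_checked(tx_hashes: list[bytes]) -> tuple[bytes, bool]:
    """Return (root, mutated); mutated is True if any sibling pair is equal."""
    level = list(tx_hashes)
    if not level:
        raise ValueError("a block has at least one transaction")
    mutated = False
    while len(level) > 1:
        # An equal sibling pair means the list could have been padded
        # with duplicates without changing the resulting root.
        for i in range(0, len(level) - 1, 2):
            if level[i] == level[i + 1]:
                mutated = True
        if len(level) % 2:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0], mutated
```

Hash lists [a, b, c] and [a, b, c, c] produce the same root, but only the
second is flagged as mutated.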
The problem arises from trying to optimize dismissal by storing an
identifier. Just *producing* the identifier is orders of magnitude more
costly than simply dismissing this bogus message. I can't imagine why any
implementation would want to compute and store and retrieve and recompute
and compare hashes when the alternative is just dismissing the bogus
messages with no hashing at all.
Bogus messages will arrive; they do not even have to be requested. The
simplest are dealt with by parse failure. What defines a parse is entirely
subjective. Generally it's "structural" but nothing precludes incorporating
a requirement for a necessary leading pattern in the stream, sort of like
how the witness pattern is identified. If we were going to prioritize early
dismissal this is where we would put it.
However, there is a tradeoff in terms of early dismissal. Looking up
invalid hashes is a costly tradeoff, which becomes multiplied by every
block validated. For example, expending 1 millisecond in hash/lookup to
save 1 second of validation time in the failure case seems like a
reasonable tradeoff, until you multiply across the whole chain. 1 ms
becomes 14 minutes across the chain, just to save a second for each malleated
block encountered. That means you need to have encountered 840 such malleated
blocks just to break even. Dismissing the block early for a non-null coinbase
point (without hashing anything) would be on the order of 1000x faster than
that (breakeven at 1 encounter). So why the block hash cache requirement?
It cannot be applied to many scenarios, and cannot be optimal in this one.
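The arithmetic above can be reproduced with a short sketch. The chain length
of 840,000 blocks is an assumption, chosen because it is what the 14 minute
figure implies at 1 ms per block, and the 1 microsecond cost for early
dismissal is an assumption standing in for "on the order of 1000x faster".
The function name is hypothetical.

```python
CHAIN_BLOCKS = 840_000        # assumption: implied by 1 ms/block = 14 minutes
SAVED_PER_HIT_US = 1_000_000  # 1 second of validation avoided per encounter


def breakeven_encounters(per_block_cost_us: int) -> int:
    """Encounters needed before the per-block overhead is repaid (rounded up)."""
    total_overhead_us = per_block_cost_us * CHAIN_BLOCKS
    return -(-total_overhead_us // SAVED_PER_HIT_US)  # ceiling division


hash_lookup_overhead_min = 1_000 * CHAIN_BLOCKS / 60_000_000  # minutes
hash_lookup_breakeven = breakeven_encounters(1_000)  # 1 ms hash/lookup
early_dismiss_breakeven = breakeven_encounters(1)    # assumed 1 us check
```

Under these assumptions the hash lookup costs 14 minutes over the chain and
repays itself after 840 encounters, while the early check repays itself on
the first encounter.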
Eric