public inbox for bitcoindev@googlegroups.com
* [bitcoin-dev] Breaking change in calculation of hash_serialized_2
@ 2023-10-20 17:19 Fabian
  2023-10-20 17:34 ` Peter Todd
  0 siblings, 1 reply; 3+ messages in thread
From: Fabian @ 2023-10-20 17:19 UTC (permalink / raw)
  To: bitcoin-dev

Hello list,

On Wednesday I found a potential malleability issue in the UTXO set dump files
generated for and used by assumeutxo [1]. On Thursday morning theStack found
the cause of the issue [2]: a bug in the serialization of UTXOs for the
calculation of hash_serialized_2. This is the value Bitcoin Core uses to
check whether the UTXO set loaded from a dump file matches what is expected.
The expected value of hash_serialized_2 for a particular block is hardcoded
into the chainparams of each chain.
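
To make the check described above concrete, here is a minimal Python sketch. It assumes a simplified coin record and a hypothetical `EXPECTED` table standing in for the value hardcoded in chainparams. It also shows, with a hypothetical serializer that silently loses the height and coinbase information, why such a bug makes the dump file malleable: a tampered snapshot still passes the check. This illustrates the class of bug only; it is not Bitcoin Core's code.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Coin:
    # Simplified UTXO record; field names are hypothetical
    txid: bytes     # 32-byte transaction id
    vout: int       # output index within the transaction
    height: int     # height of the block that created the coin
    coinbase: bool  # created by a coinbase transaction?
    amount: int     # value in satoshis
    script: bytes   # scriptPubKey


def buggy_serialize(coin: Coin) -> bytes:
    # Hypothetical bug: height and coinbase flag collapse into a single
    # 0/1 byte, so any coin with (height * 2 + coinbase) != 0 looks the same.
    flag = 1 if (coin.height * 2 + coin.coinbase) else 0
    return (coin.txid + coin.vout.to_bytes(4, "little") + bytes([flag])
            + coin.amount.to_bytes(8, "little")
            + len(coin.script).to_bytes(4, "little") + coin.script)


def utxo_set_hash(coins, serialize) -> str:
    h = hashlib.sha256()
    # Sort by outpoint so the result does not depend on iteration order
    for coin in sorted(coins, key=lambda c: (c.txid, c.vout)):
        h.update(serialize(coin))
    return h.hexdigest()


snapshot = [Coin(b"\x11" * 32, 0, 800_000, False, 50_000, b"\x51")]
# Stand-in for the value hardcoded in chainparams for this base height
EXPECTED = {800_000: utxo_set_hash(snapshot, buggy_serialize)}


def snapshot_matches(coins, base_height: int) -> bool:
    return utxo_set_hash(coins, buggy_serialize) == EXPECTED[base_height]


# Rewrite the coin's height and coinbase flag: the check still passes
tampered = [replace(snapshot[0], height=1, coinbase=True)]
print(snapshot_matches(snapshot, 800_000), snapshot_matches(tampered, 800_000))
# → True True
```

A correct serializer must encode every field of the coin losslessly; otherwise the hash commits to less than the file contains.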

Implications:
We have been working on a fix [3] for the serialization and aim to include it
in v26.0 (scheduled for release in November). Since the serialization must
change, all historical UTXO set hash results will change after you upgrade
your node to v26.0. To highlight this, we will also increment the version:
the hash_serialized_2 field returned by gettxoutsetinfo will be renamed to
hash_serialized_3.
Fuzz testing by dergoegge also turned up additional, potentially problematic
issues, which is why we decided to replace the serialization completely
rather than implement a minimal fix. The new serialization format is the
same as the one used by MuHash.
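
A per-coin serialization in which every field is fixed-width or length-prefixed leaves no room for this kind of ambiguity. The Python sketch below assumes a layout of outpoint (txid, vout as uint32), height and coinbase flag packed into one uint32, amount as int64, and a CompactSize-prefixed scriptPubKey, finished with double SHA-256. That layout and the hash construction are assumptions for illustration; PR [3] is the authoritative description of the new format, and the coin ordering here (sorted by outpoint) is chosen only for determinism.

```python
import hashlib
import struct


def compact_size(n: int) -> bytes:
    # Bitcoin's CompactSize length prefix
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def serialize_coin(txid: bytes, vout: int, height: int, coinbase: bool,
                   amount: int, script: bytes) -> bytes:
    # Outpoint: 32-byte txid, then the output index as a uint32
    data = txid + struct.pack("<I", vout)
    # Height and coinbase flag packed losslessly into one uint32
    data += struct.pack("<I", (height << 1) | int(coinbase))
    # Output: amount as int64, then the length-prefixed scriptPubKey
    data += struct.pack("<q", amount) + compact_size(len(script)) + script
    return data


def hash_utxo_set(coins) -> str:
    # coins: tuples of (txid, vout, height, coinbase, amount, script)
    h = hashlib.sha256()
    for coin in sorted(coins, key=lambda c: (c[0], c[1])):
        h.update(serialize_coin(*coin))
    # Double SHA-256 (an assumption for this sketch)
    return hashlib.sha256(h.digest()).hexdigest()


original = [(b"\x11" * 32, 0, 800_000, False, 50_000, b"\x51")]
tampered = [(b"\x11" * 32, 0, 1, True, 50_000, b"\x51")]
print(hash_utxo_set(original) != hash_utxo_set(tampered))  # → True
```

Because height and coinbase are packed without loss, rewriting them in a dump file now changes the hash.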

How this may concern you:
1. If you are using hash_serialized_2 for any security-critical purpose, you
should check whether the bugs in the serialization code could cause issues for
you. You may want to switch to hash_serialized_3 as soon as possible (or
consider using MuHash).
2. If you rely on hash_serialized_2 for anything critical in your project
and need time to adapt to the change described above, please let us know.
While we usually try to avoid breaking API changes without a deprecation
warning, we currently think it is not necessary to keep the buggy
hash_serialized_2 around, since we don't know of any substantial use cases
and using it may even pose security risks. Furthermore, keeping the old code
around adds review and maintenance burden and may delay the release of
v26.0. But we are happy to reconsider if keeping hash_serialized_2 around
holds serious value for downstream projects.
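
A downstream project that reads gettxoutsetinfo output might guard against the rename with something like the Python sketch below. It assumes the RPC result has already been parsed from JSON into a dict, and that a node queried with the MuHash hash type reports its value under a `muhash` key. The helper name, the preference order, and the policy of rejecting the legacy field are hypothetical choices, not part of Bitcoin Core.

```python
import json

# Accepted keys in order of preference (a policy assumed for this sketch)
PREFERRED_KEYS = ("hash_serialized_3", "muhash")
LEGACY_KEY = "hash_serialized_2"


def utxo_commitment(rpc_result: dict) -> tuple:
    """Return (key, value) of the UTXO set hash in a gettxoutsetinfo result."""
    for key in PREFERRED_KEYS:
        if key in rpc_result:
            return key, rpc_result[key]
    if LEGACY_KEY in rpc_result:
        # Result from a pre-v26.0 node: this hash is affected by the bug
        raise ValueError("node reports hash_serialized_2; upgrade to v26.0+")
    raise KeyError("no UTXO set hash found in gettxoutsetinfo result")


# Placeholder results; real output contains more fields
new_style = json.loads('{"height": 800000, "hash_serialized_3": "placeholder"}')
old_style = json.loads('{"height": 800000, "hash_serialized_2": "placeholder"}')
print(utxo_commitment(new_style))  # → ('hash_serialized_3', 'placeholder')
```

Failing loudly on the legacy key, rather than silently comparing it against a stored hash_serialized_3 value, avoids mixing hashes computed under the old and new serializations.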

Feel free to reach out to me directly or comment in the PR [3] or here on the
list.

Cheers,
Fabian

[1] https://github.com/bitcoin/bitcoin/issues/28675
[2] https://github.com/bitcoin/bitcoin/issues/28675#issuecomment-1770389468
[3] https://github.com/bitcoin/bitcoin/pull/28685

* Re: [bitcoin-dev] Breaking change in calculation of hash_serialized_2
From: Peter Todd @ 2023-10-20 17:34 UTC (permalink / raw)
  To: Fabian, Bitcoin Protocol Discussion

On Fri, Oct 20, 2023 at 05:19:19PM +0000, Fabian via bitcoin-dev wrote:
> Hello list,
> 
> on Wednesday I found a potential malleability issue in the UTXO set dump files
> generated for and used by assumeutxo [1]. On Thursday morning theStack had
> found the cause of the issue [2]: A bug in the serialization of UTXOs for the
> calculation of hash_serialized_2. This is the value used by Bitcoin Core to
> check if the UTXO set loaded from a dump file matches what is expected. The
> value of hash_serialized_2 expected for a particular block is hardcoded into
> the chainparams of each chain.

<snip>

> [1] https://github.com/bitcoin/bitcoin/issues/28675
> [2] https://github.com/bitcoin/bitcoin/issues/28675#issuecomment-1770389468
> [3] https://github.com/bitcoin/bitcoin/pull/28685

James made the following comment on the above issue:

> Wow, good find @fjahr et al. I wonder if there's any value in committing to a
> sha256sum of the snapshot file itself in the source code as a
> belt-and-suspenders remediation for issues like this.

Why *isn't* the sha256 hash of the snapshot file itself the canonical hash?
That would obviously eliminate any malleability issues. gettxoutsetinfo already
has to walk the entire UTXO set to calculate the hash. Making it simply
generate the actual contents of the dump file and calculate the hash of it is
the obvious way to implement this.
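
The suggestion can be sketched as a single pass that writes the dump file and feeds the same bytes into SHA-256, so the committed value is by construction the hash of the file, and a receiver only needs a plain streaming file hash to verify it. This is a minimal Python sketch; the record layout, file name, and function names are hypothetical, and it is not how Bitcoin Core writes snapshots.

```python
import hashlib
import os
import struct
import tempfile


def write_snapshot(coins, path: str) -> str:
    """Write a dump file and return the SHA-256 of exactly the bytes written."""
    h = hashlib.sha256()
    with open(path, "wb") as f:
        for txid, vout, code, amount, script in sorted(coins):
            # Hypothetical record layout, not Bitcoin Core's snapshot format
            record = (txid + struct.pack("<IIq", vout, code, amount)
                      + struct.pack("<I", len(script)) + script)
            f.write(record)
            h.update(record)  # one pass: the file and the hash see the same bytes
    return h.hexdigest()


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a received snapshot file in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


# code = (height << 1) | coinbase, as in the illustrative record layout
coins = [(b"\x22" * 32, 1, 1_600_001, 12_345, b"\x00\x14" + b"\xab" * 20)]
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "utxo-snapshot.dat")
    committed = write_snapshot(coins, path)
    recomputed = file_sha256(path)
print(committed == recomputed)  # → True
```

Any change to the file, including in a field the serializer might otherwise have ignored, changes the committed hash.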

-- 
https://petertodd.org 'peter'[:-1]@petertodd.org

* Re: [bitcoin-dev] Breaking change in calculation of hash_serialized_2
From: Fabian @ 2023-10-20 22:01 UTC (permalink / raw)
  To: Peter Todd; +Cc: Bitcoin Protocol Discussion

Hi Peter,

To my knowledge, this was never considered as an option before (James, correct me if I am wrong). At least I couldn't find any reference to it in the original proposal [1], and I cannot remember it being discussed since I started following the project more closely (ca. 2020).

Here are the reasons that I can think of why that might be the case:

- If the serialization and hashing of the UTXO set work as intended, that hash should be just as good as a hash of the flat file. hash_serialized_2 was certainly assumed to be robust, since it has been around for a very long time, so a file hash may simply have been viewed as additional overhead.
- We may want to optimize the serialization of data to file further, adding compression, etc. to have smaller files that result in the same UTXO set without having to change the chainparams committing to that UTXO set or potentially having multiple file hashes for the same block.
- We may instead want to introduce other file hashing strategies that are better optimized for P2P sharing of UTXO snapshots. P2P sharing of the UTXO set has always been part of the idea of assumeutxo, but so far it hasn't been explored very deeply. For more on this, see the IRC conversation between sipa, aj et al. that started in yesterday's meeting [2][3].
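
The second point can be made concrete: a file hash commits to one particular encoding, so every new encoding of the same UTXO set would need its own committed value, while a hash over the decoded data stays the same. The Python sketch below uses zlib as a stand-in for whatever compression might be adopted, and placeholder bytes as the "serialized UTXO set"; neither function is Bitcoin Core's.

```python
import hashlib
import zlib


def content_hash(raw: bytes) -> str:
    # Commitment to the decoded UTXO data (stand-in for a content-level hash)
    return hashlib.sha256(raw).hexdigest()


def file_hash(blob: bytes) -> str:
    # Commitment to the exact bytes on disk or on the wire
    return hashlib.sha256(blob).hexdigest()


# Placeholder bytes standing in for a serialized UTXO set
raw_dump = b"".join(bytes([i % 7]) * 64 for i in range(100))

# Three encodings of the same data
encodings = {
    "raw": raw_dump,
    "zlib-1": zlib.compress(raw_dump, 1),
    "zlib-9": zlib.compress(raw_dump, 9),
}
decoders = {"raw": lambda b: b, "zlib-1": zlib.decompress, "zlib-9": zlib.decompress}

file_hashes = {name: file_hash(blob) for name, blob in encodings.items()}
content_hashes = {name: content_hash(decoders[name](blob))
                  for name, blob in encodings.items()}
print(len(set(file_hashes.values())), len(set(content_hashes.values())))  # → 3 1
```

The trade-off is the one discussed in this thread: a content-level hash tolerates new encodings, but only if its serialization is itself unambiguous.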

Cheers,
Fabian

[1] https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal
[2] https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2023-10-19#976439;
[3] https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2023-10-20#976636;
