From: Greg Maxwell <gmaxwell@gmail•com>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Subject: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
Date: Thu, 1 May 2025 23:47:20 -0700 (PDT)
Message-ID: <cc2dfa79-89f0-4170-9725-894ea189a0e2n@googlegroups.com>
In-Reply-To: <CAPv7TjaM0tfbcBTRa0_713Bk6Y9jr+ShOC1KZi2V3V2zooTXyg@mail.gmail.com>
This sounds like an excellent idea that preserves the assumevalid-like
security properties but with more performance and more optimization
opportunities.
I particularly like -- if I understand it correctly -- that the hints
themselves are not security relevant: if they're wrong you'll just fail
rather than end up with something incorrect. I also like the lack of
basically anything being normative; it's easier to feel comfortable with
something when you won't be stuck with it forever... the whole scheme could
just be reworked every version with no particular harm because its effects
are all ephemeral. At worst you might have problems if you started IBD
with one version and tried to complete it with another.
I haven't seen any code, but if hash() is an MD-style hash function such as
sha256 you should place the salt first so that it can be absorbed once and
the midstate reused. For sha256 you could potentially get double the hashing
performance, and I assume/hope that hash is actually fairly high in the
profile cost... maybe more like a 1/3 improvement considering the size of
the entry. Care should be taken to minimize compression function
runs... but at least the utxo representation there is entirely
implementation defined.
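A minimal sketch of the salt-first idea, assuming a salt of at most 64 bytes
that is zero-padded to one full SHA-256 block. The helper name
make_salted_hasher and the padding choice are illustrative assumptions, not
taken from the SwiftSync write-up. Python's hashlib exposes copy() rather than
a raw midstate, so whether the salt block has already been compressed at copy
time is an implementation detail; the pattern is what matters here.

```python
import hashlib

BLOCK = 64  # SHA-256 compression block size in bytes


def make_salted_hasher(salt: bytes):
    """Return h(entry) computing SHA-256(padded_salt || entry).

    Hypothetical helper: the salt is absorbed once into a base context,
    and each entry is hashed from a copy of that context.
    """
    if len(salt) > BLOCK:
        raise ValueError("salt must fit in one SHA-256 block")
    # Pad to a full block so the salt never shares a block with the entry.
    prefix = salt.ljust(BLOCK, b"\x00")
    base = hashlib.sha256(prefix)

    def h(entry: bytes) -> bytes:
        ctx = base.copy()   # reuse the state that already absorbed the salt
        ctx.update(entry)   # only the entry (plus padding) is hashed per call
        return ctx.digest()

    return h


if __name__ == "__main__":
    h = make_salted_hasher(b"\x01" * 32)
    print(h(b"example outpoint bytes").hex())
```

With this layout a 36-byte outpoint plus SHA-256's minimum 9 bytes of padding
fits in a single block after the salt, so each entry should need one
compression call once the salt has been absorbed.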
You may be able to optimize the size of the hints further with the
observation that false positives are harmless as long as they're rare: you
can save some extra txouts, and if you later find them being spent, just go
ahead and remove them. So for example ribbon filters give you a space- and
read-efficient hash table that is constructed by solving a linear system to
make sure all the inputs match. One could solve for successively larger
filters to target a specific false positive rate. (A (2,4) cuckoo filter or
similar would also work, but that requires two memory accesses and you don't
actually need runtime modification.)
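One possible reading of the false-positive idea, as a toy sketch. Everything
here is an assumption for illustration: the function name sync, the
maybe_unspent predicate standing in for a ribbon or cuckoo filter lookup, the
use of bare outpoints as the hashed data, and addition mod 2^256 for the
aggregate. The key property assumed is that the filter answers the same way
for an outpoint on the output side and on the input side.

```python
import hashlib

MOD = 1 << 256  # assumed aggregate modulus for this toy


def h(salt: bytes, outpoint: bytes) -> int:
    return int.from_bytes(hashlib.sha256(salt + outpoint).digest(), "big")


def sync(outputs, inputs, maybe_unspent, salt):
    """Toy pass over all created outputs and all spent outpoints.

    maybe_unspent(outpoint) is a hypothetical approximate-membership test:
    never false for a truly unspent output, occasionally true for a spent one.
    """
    saved, agg, extra_spends = set(), 0, []
    for op in outputs:
        if maybe_unspent(op):
            saved.add(op)                    # kept, possibly a false positive
        else:
            agg = (agg + h(salt, op)) % MOD  # hinted spent: add to aggregate
    for op in inputs:
        if maybe_unspent(op):
            extra_spends.append(op)          # false positive being spent
        else:
            agg = (agg - h(salt, op)) % MOD  # cross off from the aggregate
    removed = set(extra_spends)
    # Each extra spend must be unique and must refer to an output we saved.
    ok = agg == 0 and len(removed) == len(extra_spends) and removed <= saved
    return ok, saved - removed


if __name__ == "__main__":
    positives = {b"b", b"c", b"d"}           # b"b" is a false positive
    ok, utxos = sync([b"a", b"b", b"c", b"d"], [b"a", b"b"],
                     lambda op: op in positives, b"\x00" * 32)
    print(ok, sorted(utxos))
```

The false positive costs one extra saved entry and one later removal, but the
aggregate still reaches zero because neither side touched it for that outpoint.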
On Wednesday, April 9, 2025 at 10:11:29 AM UTC Ruben Somsen wrote:
> Hi everyone,
>
> SwiftSync is a new validation method that allows for near-stateless, fully
> parallelizable validation of the Bitcoin blockchain via hints about which
> outputs remain unspent (<100MB total). All other inputs/outputs are
> efficiently crossed off inside a single hash aggregate that only reaches
> zero if validation was successful and the hints were correct.
>
> The main observation is that it can be much easier to validate that a
> given UTXO set is correct than to compute it yourself. It allows us to no
> longer require a stateful moment-to-moment UTXO set during IBD and allows
> everything to be order independent. I'll briefly summarize the protocol,
> before sharing the link to the full write-up.
>
> Each output gets a boolean hint (e.g. committed into Bitcoin Core) about
> whether or not it will still be in the UTXO set after validation completes.
> If it does, we write it to disk (append-only - it won't be used until
> SwiftSync finishes). If it does not, we hash the UTXO data and add it to an
> aggregate. For each input, we once again hash the UTXO data and remove it
> from the aggregate.
>
> At the end, for every added output there should have been exactly one
> removed input, bringing the end total of the aggregate to zero. If this is
> indeed the case, we will have validated that the hints, and the resulting
> UTXO set, were correct.
>
> E.g. For spent outputs A, B and inputs C, D we calculate
> hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -
> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C)).
>
> There is one missing step. The UTXO data is only available when processing
> the output, but not when processing the input. We resolve this by either
> downloading the outputs that were spent for each block (equivalent to the
> undo data, maybe 10-15% of a block), or we lean on assumevalid, making it
> sufficient to only hash the outpoints (which are available in both the
> output and input) rather than the full UTXO data.
>
> Ignoring bandwidth, the expectation is that the speedup will be most
> significant on either RAM limited devices and/or devices with many CPU
> cores. Initial PoC benchmarks (thanks to theStack) show a 5.28x speed-up,
> while currently still being largely sequential.
>
> Many more details are in the full write-up:
> https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd
>
> It will answer the following questions (and more):
>
> - How the hash aggregate can be made secure against the generalized
> birthday problem
> - How exactly assumevalid is utilized and what the implications are
> - How we can still check for inflation when we skip the amounts with
> assumevalid
> - How we can validate transaction order while doing everything in parallel
> - How we can perform the BIP30 duplicate output check without the UTXO set
> - How this all relates to assumeutxo
>
> To my knowledge, every validation check involving the UTXO set is covered,
> but I'd be curious to hear if anything was overlooked or if you spot any
> other issues.
>
> Thanks for reading, and thanks to everyone who provided invaluable
> feedback while the idea was coming together.
>
> -- Ruben Somsen
>
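For reference, the aggregate check in the quoted example (spent outputs A, B
against inputs C, D) can be sketched as follows. This assumes SHA-256 as
hash() and addition modulo 2^256; both are assumptions made for illustration,
and the function names are hypothetical. The UTXO||salt order follows the
quoted formula.

```python
import hashlib

MOD = 1 << 256  # assumed: sums wrap modulo 2^256


def salted_hash(utxo: bytes, salt: bytes) -> int:
    # hash(UTXO || salt), interpreted as a 256-bit integer
    return int.from_bytes(hashlib.sha256(utxo + salt).digest(), "big")


def aggregate(salt: bytes, spent_outputs, inputs) -> int:
    """Add every hinted-spent output, subtract every input.

    The order of processing does not matter; the result is zero
    when the two collections contain the same items.
    """
    total = 0
    for utxo in spent_outputs:
        total = (total + salted_hash(utxo, salt)) % MOD
    for utxo in inputs:
        total = (total - salted_hash(utxo, salt)) % MOD
    return total


if __name__ == "__main__":
    salt = b"\x00" * 32  # a real salt would be random and kept secret
    print(aggregate(salt, [b"A", b"B"], [b"B", b"A"]) == 0)
    print(aggregate(salt, [b"A", b"B"], [b"A", b"C"]) == 0)
```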