* [bitcoindev] SwiftSync - smarter synchronization with hints
From: Ruben Somsen @ 2025-04-09 10:10 UTC
To: bitcoindev

Hi everyone,

SwiftSync is a new validation method that allows for near-stateless, fully parallelizable validation of the Bitcoin blockchain via hints about which outputs remain unspent (<100MB total). All other inputs/outputs are efficiently crossed off inside a single hash aggregate that only reaches zero if validation was successful and the hints were correct.

The main observation is that it can be much easier to validate that a given UTXO set is correct than to compute it yourself. It allows us to no longer require a stateful moment-to-moment UTXO set during IBD and allows everything to be order independent. I'll briefly summarize the protocol, before sharing the link to the full write-up.

Each output gets a boolean hint (e.g. committed into Bitcoin Core) about whether or not it will still be in the UTXO set after validation completes. If it will, we write it to disk (append-only - it won't be used until SwiftSync finishes). If it will not, we hash the UTXO data and add it to an aggregate. For each input, we once again hash the UTXO data and remove it from the aggregate.

At the end, for every added output there should have been exactly one removed input, bringing the end total of the aggregate to zero. If this is indeed the case, we will have validated that the hints, and the resulting UTXO set, were correct.

E.g. for spent outputs A, B and inputs C, D we calculate hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C)).

There is one missing step. The UTXO data is only available when processing the output, but not when processing the input. We resolve this by either downloading the outputs that were spent for each block (equivalent to the undo data, maybe 10-15% of a block), or we lean on assumevalid, making it sufficient to only hash the outpoints (which are available in both the output and input) rather than the full UTXO data.

Ignoring bandwidth, the expectation is that the speedup will be most significant on RAM-limited devices and/or devices with many CPU cores. Initial PoC benchmarks (thanks to theStack) show a 5.28x speed-up, while currently still being largely sequential.

Many more details are in the full write-up:
https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd

It will answer the following questions (and more):

- How the hash aggregate can be made secure against the generalized birthday problem
- How exactly assumevalid is utilized and what the implications are
- How we can still check for inflation when we skip the amounts with assumevalid
- How we can validate transaction order while doing everything in parallel
- How we can perform the BIP30 duplicate output check without the UTXO set
- How this all relates to assumeutxo

To my knowledge, every validation check involving the UTXO set is covered, but I'd be curious to hear if anything was overlooked or if you spot any other issues.

Thanks for reading, and thanks to everyone who provided invaluable feedback while the idea was coming together.

-- Ruben Somsen

--
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAPv7TjaM0tfbcBTRa0_713Bk6Y9jr%2BShOC1KZi2V3V2zooTXyg%40mail.gmail.com.
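The add/remove bookkeeping summarized in the message above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code from the PoC or the write-up: the names `salted_hash`, `Aggregate` and `validate` are hypothetical, SHA-256 is assumed as hash(), the aggregate is assumed to be an integer reduced modulo 2^256, and UTXO data is represented as opaque byte strings.

```python
import hashlib

MOD = 2 ** 256  # assumption: 256-bit aggregate with wrap-around arithmetic


def salted_hash(utxo_data: bytes, salt: bytes) -> int:
    """hash(UTXO || salt) as written in the message, read as an integer."""
    return int.from_bytes(hashlib.sha256(utxo_data + salt).digest(), "big")


class Aggregate:
    """Running total that returns to zero when every add has a matching remove."""

    def __init__(self, salt: bytes):
        self.salt = salt
        self.total = 0

    def add_output(self, utxo_data: bytes) -> None:
        self.total = (self.total + salted_hash(utxo_data, self.salt)) % MOD

    def remove_input(self, utxo_data: bytes) -> None:
        self.total = (self.total - salted_hash(utxo_data, self.salt)) % MOD

    def is_zero(self) -> bool:
        return self.total == 0


def validate(outputs, hints, inputs, salt):
    """Apply the hints: unspent outputs are kept, the rest go into the aggregate."""
    agg = Aggregate(salt)
    unspent = []
    for utxo, stays_unspent in zip(outputs, hints):
        if stays_unspent:
            unspent.append(utxo)   # stands in for the append-only write to disk
        else:
            agg.add_output(utxo)   # hinted as spent: must be removed by some input
    for utxo in inputs:
        agg.remove_input(utxo)
    return agg.is_zero(), unspent
```

With outputs A, B, C, hints marking only C as unspent, and inputs spending B and A, `validate` reports success and returns C as the resulting UTXO set. If a hint wrongly marks B as unspent, the aggregate is left non-zero and validation fails rather than producing a wrong result.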
* [bitcoindev] Re: SwiftSync - smarter synchronization with hints
From: Greg Maxwell @ 2025-05-02 6:47 UTC
To: Bitcoin Development Mailing List

This sounds like an excellent idea that preserves the assume valid-like security properties but with more performance and more optimization opportunities.

I like particularly -- if I understand it correctly -- that the hints themselves are not security relevant: if they're wrong you'll just fail rather than end up with something incorrect. Also I like the lack of basically anything being normative; it's easier to feel comfortable with something when you won't be stuck with it forever... the whole scheme could just be reworked every version with no particular harm because its effects are all ephemeral. At worst you might have problems if you started IBD with one version and tried to complete it with another.

I haven't seen any code, but if hash() is an MD-style (Merkle-Damgård) hash function such as sha256 you should place the salt first so that it can be absorbed and the midstate then reused. For sha256 you could get potentially double the hashing performance, and I assume/hope that hash is actually fairly high in the profile cost... maybe more like a 1/3 improvement considering the size of the entry; care should be taken to try to minimize compression function runs. ... but at least the utxo representation there is entirely implementation defined.

You may be able to optimize the size of the hints further with the observation that false positives are harmless as long as they're rare, as in you can save some extra txouts and if you later find them being spent, then just go ahead and remove them. So for example ribbon filters give you a hash table that is efficient in both memory space and reads, constructed by solving a linear system to make sure all the inputs match. One could solve for successively larger filters to target a specific false positive rate. (I mean just a 2,4 cuckoo filter or similar would also work, but that's two memory accesses required and you don't actually need runtime modification.)

To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com.
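The salt-first suggestion in the message above can be illustrated with Python's `hashlib`, whose hash objects support `copy()`. The sketch below is a rough illustration only: the constant `SALT`, the zero-padding of the salt to one full 64-byte SHA-256 block, and the function name `salt_first_hash` are assumptions made for the example and are not taken from the thread or the write-up.

```python
import hashlib

# Placeholder salt for illustration; in practice it would be random and kept local.
SALT = bytes(range(32))

# Assumption: pad the salt to exactly one 64-byte SHA-256 block, so the work of
# absorbing the salt can be done once and the resulting state reused afterwards.
SALT_BLOCK = SALT.ljust(64, b"\x00")
_salted_state = hashlib.sha256(SALT_BLOCK)


def salt_first_hash(utxo_data: bytes) -> bytes:
    """Compute sha256(salt_block || utxo_data) starting from the saved state."""
    h = _salted_state.copy()  # copy() clones the internal state without rehashing the salt
    h.update(utxo_data)
    return h.digest()
```

The result is identical to hashing `SALT_BLOCK + utxo_data` from scratch; only the repeated work on the salt prefix is avoided. How much this saves depends on how many compression function runs the remaining entry needs, as the message notes.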
* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
From: Ruben Somsen @ 2025-05-02 10:59 UTC
To: Greg Maxwell; +Cc: Bitcoin Development Mailing List

Hi Greg,

I appreciate your thoughts.

> excellent idea that preserves the assume valid-like security properties

Thanks. Yeah, the assumevalid version is particularly efficient. Without assumevalid there's a bandwidth tradeoff, but that also clearly seems worth it.

> Also I like the lack of basically anything being normative, it's easier to feel comfortable with something when you won't be stuck with it forever

Yes, that is a very pleasant property. There is essentially no difference between the end state that you reach with or without SwiftSync.

> the hints themselves are not security relevant, if they're wrong you'll just fail

I agree.

> sha256 you should place the salt first so that it can be absorbed then the midstate reused

Nice. That is a very good point that I had not yet considered.

> false positives are harmless as long as they're rare, as in you can save some extra txouts and if you later find them being spent, then just go ahead and remove them

I don't see a practical way to do this without compromising the benefits of SwiftSync because of the "later find them being spent" part. For one, it would require doing a lookup for each input to see if it's in your UTXO set, something we currently don't need to do at all. Secondly, it would require blocks to be processed in order, limiting parallelization. The space savings would also be negligible, considering the hint data is already <100MB (compressed).

I think most of the remaining optimization potential (other than the engineering work to enable parallel validation) is in the hash aggregate, like the midstate reuse. Is there a faster alternative to sha256? Can we get away with 16 byte hashes? I could be mistaken, but in theory it seems we could even get away with 4 byte hashes if we allowed for a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving consensus to chance feels philosophically improper, but it's a pretty safe bet considering it also involves PoW to trigger and even then would only affect one or two random unlucky people on Earth.

Thanks for chiming in.

Cheers,
Ruben

To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAPv7TjaDGr4HCdQ0rR6_ma5zh2umU9r3_529szdswn_GjjnuCw%40mail.gmail.com.
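The point about not needing in-order processing can be made concrete with a small sketch. It assumes the assumevalid variant in which only outpoints are hashed, with the salt placed first, and it models each block as a pair of lists: outpoints created in the block that are hinted as spent, and outpoints spent by the block's inputs. The names `h`, `block_delta` and `aggregate`, the fixed `SALT` and the toy block layout are hypothetical, chosen only for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

MOD = 2 ** 256          # assumption: 256-bit aggregate with wrap-around arithmetic
SALT = b"\x07" * 32     # placeholder salt for illustration only


def h(outpoint: bytes) -> int:
    """Salt-first hash of an outpoint, read as an integer."""
    return int.from_bytes(hashlib.sha256(SALT + outpoint).digest(), "big")


def block_delta(block) -> int:
    """Partial aggregate of one block; it needs no state from any other block."""
    created_spent, spent = block
    delta = sum(h(o) for o in created_spent) - sum(h(o) for o in spent)
    return delta % MOD


def aggregate(blocks, workers: int = 4) -> int:
    """Combine per-block deltas; addition is commutative, so order is irrelevant."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(block_delta, blocks)) % MOD


# Toy chain: block 0 creates a and b, block 1 spends a and creates c,
# block 2 spends b and c. All three outputs are hinted as spent.
blocks = [([b"a", b"b"], []), ([b"c"], [b"a"]), ([], [b"b", b"c"])]
```

Here `aggregate(blocks)` and `aggregate(list(reversed(blocks)))` both return zero, because each block contributes an independent delta. The thread pool is only there to show the independence of the work items; it is not a claim about how a real implementation would schedule validation. A lookup-and-remove step for false positives, by contrast, would need to know which outputs already exist, which is what reintroduces ordering.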
* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
From: Saint Wenhao @ 2025-05-02 13:38 UTC
To: Ruben Somsen; +Cc: Greg Maxwell, Bitcoin Development Mailing List

> > sha256 you should place the salt first so that it can be absorbed then the midstate reused
> Nice. That is a very good point that I had not yet considered.

Basically, if you put the salt first, and the size of the salt is aligned with the size of SHA-256's internal chunks, then you arrive at tagged hashes (where the salt is just the tag's name).

> Is there a faster alternative to sha256?

Even if there is, introducing another hash function that is not used anywhere else adds another moving part, which means more complexity.

> Can we get away with 16 byte hashes?

If that is all you want, you can always truncate the SHA-256 output and take just the first/last N bytes.

> I could be mistaken, but in theory it seems we could even get away with 4 byte hashes if we allowed for a 1 in 4 billion chance of accidentally accepting an invalid chain.

Grinding 2^32 values seems to be too easy (it is just a network difficulty of one, in the case of double SHA-256). Even if someone does not compromise the system by grinding a matching hash, it can still slow things down. Also, if you take the sum of hashes, which should finally be zero, then by grinding UTXOs someone could make it zero even if there is some mismatch. For that reason alone, you probably want something bigger than 32 bits, and at least big enough to be collision and preimage resistant.

To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CACgYNOLnV2JO%3Dn4UZhByLGMzEJ2vUXoaB%2BPCnLG2nzXvu8uAsg%40mail.gmail.com.
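The two remarks above, block-aligned salt as a tagged hash and truncating the SHA-256 output, can be sketched as follows. The `tagged_hash` construction follows the BIP340 pattern of prefixing the message with two copies of SHA-256(tag), which fills exactly one 64-byte block. Treating the SwiftSync salt as the tag, and the helper name `truncated_salted_hash` with its 16-byte default, are assumptions for illustration and not something the thread specifies.

```python
import hashlib


def tagged_hash(tag: bytes, msg: bytes) -> bytes:
    """BIP340-style tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_digest = hashlib.sha256(tag).digest()
    # The 64-byte prefix fills one SHA-256 block, so its state can be precomputed.
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()


def truncated_salted_hash(salt: bytes, data: bytes, n_bytes: int = 16) -> bytes:
    """Hypothetical helper: use the salt as the tag and keep the first n_bytes."""
    if not 1 <= n_bytes <= 32:
        raise ValueError("n_bytes must be between 1 and 32")
    return tagged_hash(salt, data)[:n_bytes]
```

Truncation only shortens the value that is stored and summed; the cost of computing SHA-256 is unchanged, and the choice of `n_bytes` is exactly the security question raised in the message above.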
* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
From: Greg Maxwell @ 2025-05-02 16:07 UTC
To: Bitcoin Development Mailing List

On Friday, May 2, 2025 at 11:01:10 AM UTC Ruben Somsen wrote:

> I don't see a practical way to do this without compromising the benefits of SwiftSync because of the "later find them being spent" part. For one, it would require doing a lookup for each input to see if it's in your UTXO set, something we currently don't need to do at all. Secondly, it would require blocks to be processed in order, limiting parallelization. The space savings would also be negligible, considering the hint data is already <100MB (compressed).

Doh. Right. Got too excited.

> I think most of the remaining optimization potential (other than the engineering work to enable parallel validation) is in the hash aggregate, like the midstate reuse. Is there a faster alternative to sha256? Can we get away with 16 byte hashes? I could be mistaken, but in theory it seems we could even get away with 4 byte hashes if we allowed for a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving consensus to chance feels philosophically improper, but it's a pretty safe bet considering it also involves PoW to trigger and even then would only affect one or two random unlucky people on Earth.

I think the problem isn't so much the size of the hash output, but rather the property you need is that each salt gives you a different hash function such that the attacker cannot predictably create collisions. The most expedient way to get there is a cryptographic hash function, which limits you to lower-performance choices. Your reduction function could just be xor, and if it is I doubt the output size itself matters much for performance... and my guess is that e.g. xor with 32 byte hashes will have better performance than addition with 16 bytes.

It doesn't need to be so in the initial implementation, but ultimately it may make sense for the host to benchmark available hashes and pick the fastest. SHA-256 will easily be a winner on hardware with SHA-NI or similar. But there are other cryptographic hashes in the codebase that might be faster on systems without sha256 hardware support.

There are arguments I can see to prove the security of simpler hashes that only work if you assume that the attacker could only insert specific finite numbers of bad changes... but they really have completely full control of the hash function input except for the salt, and that I think makes it hard to say much positive about the security of any optimization except truncating a secure hash (and I don't think truncating will win you much security).

In terms of security, keep in mind that a prospective attacker also needs to perform POW to get their attack chain up to the user's accepted chain tip, which means that they have to do the work between the AV point and the tip, assuming the user isn't isolated. This fact could be used to justify a rather weak hash function, but I think it's not really worth a lot of analysis ahead of the initial functionality.

I'm guessing that once this is implemented, even if it's with a quite expensive hash function, the IBD performance will be heavily bottlenecked by network and parallelism related issues, and the hash will be far from the lowest hanging fruit for a while, considering that this has eliminated the big sequential part and a number of annoying-to-optimize components entirely.

To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/69194329-4ce6-4272-acc5-fd913a7986f3n%40googlegroups.com.
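The remark that the reduction function "could just be xor" can be sketched as below. This is an illustration under assumptions, not a recommendation: the class name `XorAggregate`, the use of full 32-byte salt-first SHA-256 digests and the single `toggle` method are hypothetical. With xor, adding and removing are the same operation, so an entry toggled twice cancels out; the sketch does not analyze what that means for security.

```python
import hashlib


def salted_digest(salt: bytes, data: bytes) -> bytes:
    """Salt-first SHA-256 digest (32 bytes)."""
    return hashlib.sha256(salt + data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise xor of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class XorAggregate:
    """Aggregate whose reduction is xor; it is all zero when every entry is paired."""

    def __init__(self, salt: bytes):
        self.salt = salt
        self.state = bytes(32)

    def toggle(self, data: bytes) -> None:
        # Used both when an output is created and when an input spends it.
        self.state = xor_bytes(self.state, salted_digest(self.salt, data))

    def is_zero(self) -> bool:
        return self.state == bytes(32)
```

Toggling A and B for two created outputs and then B and A for the inputs that spend them returns the state to all zeros, in any order, which mirrors the additive example from the first message.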
end of thread, other threads:[~2025-05-02 19:09 UTC | newest]

Thread overview: 5+ messages
2025-04-09 10:10 [bitcoindev] SwiftSync - smarter synchronization with hints Ruben Somsen
2025-05-02  6:47 ` [bitcoindev] " Greg Maxwell
2025-05-02 10:59   ` Ruben Somsen
2025-05-02 13:38     ` Saint Wenhao
2025-05-02 16:07     ` Greg Maxwell