[bitcoindev] SwiftSync - smarter synchronization with hints

public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed

* [bitcoindev] SwiftSync - smarter synchronization with hints
@ 2025-04-09 10:10 Ruben Somsen
  2025-05-02  6:47 ` [bitcoindev] " Greg Maxwell
  0 siblings, 1 reply; 16+ messages in thread
From: Ruben Somsen @ 2025-04-09 10:10 UTC (permalink / raw)
  To: bitcoindev

[-- Attachment #1: Type: text/plain, Size: 3481 bytes --]

Hi everyone,

SwiftSync is a new validation method that allows for near-stateless, fully
parallelizable validation of the Bitcoin blockchain via hints about which
outputs remain unspent (<100MB total). All other inputs/outputs are
efficiently crossed off inside a single hash aggregate that only reaches
zero if validation was successful and the hints were correct.

The main observation is that it can be much easier to validate that a given
UTXO set is correct than to compute it yourself. It allows us to no longer
require a stateful moment-to-moment UTXO set during IBD and allows
everything to be order independent. I'll briefly summarize the protocol,
before sharing the link to the full write-up.

Each output gets a boolean hint (e.g. committed into Bitcoin Core) about
whether or not it will still be in the UTXO set after validation completes.
If it does, we write it to disk (append-only - it won't be used until
SwiftSync finishes). If it does not, we hash the UTXO data and add it to an
aggregate. For each input, we once again hash the UTXO data and remove it
from the aggregate.

At the end, for every added output there should have been exactly one
removed input, bringing the end total of the aggregate to zero. If this is
indeed the case, we will have validated that the hints, and the resulting
UTXO set, were correct.

E.g. For spent outputs A, B and inputs C, D we calculate hash(UTXO_A||salt)
+ hash(UTXO_B||salt) - hash(UTXO_C||salt) - hash(UTXO_D||salt) == 0
(proving (A==C && B==D) || (A==D && B==C)).

There is one missing step. The UTXO data is only available when processing
the output, but not when processing the input. We resolve this by either
downloading the outputs that were spent for each block (equivalent to the
undo data, maybe 10-15% of a block), or we lean on assumevalid, making it
sufficient to only hash the outpoints (which are available in both the
output and input) rather than the full UTXO data.

Ignoring bandwidth, the expectation is that the speedup will be most
significant on either RAM limited devices and/or devices with many CPU
cores. Initial PoC benchmarks (thanks to theStack) show a 5.28x speed-up,
while currently still being largely sequential.

Many more details are in the full write-up:
https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd

It will answer the following questions (and more):

- How the hash aggregate can be made secure against the generalized
birthday problem
- How exactly assumevalid is utilized and what the implications are
- How we can still check for inflation when we skip the amounts with
assumevalid
- How we can validate transaction order while doing everything in parallel
- How we can perform the BIP30 duplicate output check without the UTXO set
- How this all relates to assumeutxo

To my knowledge, every validation check involving the UTXO set is covered,
but I'd be curious to hear if anything was overlooked or if you spot any
other issues.

Thanks for reading, and thanks to everyone who provided invaluable feedback
while the idea was coming together.

-- Ruben Somsen

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAPv7TjaM0tfbcBTRa0_713Bk6Y9jr%2BShOC1KZi2V3V2zooTXyg%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 4042 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-04-09 10:10 [bitcoindev] SwiftSync - smarter synchronization with hints Ruben Somsen
@ 2025-05-02  6:47 ` Greg Maxwell
  2025-05-02 10:59   ` Ruben Somsen
  0 siblings, 1 reply; 16+ messages in thread
From: Greg Maxwell @ 2025-05-02  6:47 UTC (permalink / raw)
  To: Bitcoin Development Mailing List

[-- Attachment #1.1: Type: text/plain, Size: 5733 bytes --]

This sounds like an excellent idea that preserves the assume valid-like 
security properties but with more performance and more optimization 
oppturnities.

I like particularly -- if I understand it correctly-- that the hints 
themselves are not security relevant, if they're wrong you'll just fail 
rather than end up with something incorrect.  Also I like the lack of 
basically anything being normative, it's easier to feel comfortable with 
something when you won't be stuck with it forever... the whole scheme could 
just be reworked every version with no particular harm because its effects 
are all ephemeral.  At worst you might have problems if you started IBD 
with one version and tried to complete it with another.

I haven't seen any code,  but if hash() is a MD style hash function such as 
sha256 you should place the salt first so that it can be absorbed then the 
midstate reused. For sha256 you could get potentially double the hashing 
performance and I assume/hope that hash is actually fairly high in the 
profile cost.. maybe more like a 1/3 improvement considering the size of 
the entry, care should be taken to try to minimize compression function 
runs. ... but at least the utxo representation there is entirely 
implementation defined.

You may be able to optimize the size of the hints further with the 
observation that false positives are harmless as long as they're rare,  as 
in you can save some extra txouts and if you later find them being spent, 
then just go ahead and remove them.  So for example ribbon filters give you 
a memory space and read efficient hash table that is constructed by solving 
a linear system to make sure all the inputs match.  One could solve for 
successively larger filters to target a specific false positive rate.  (I 
mean just a 2,4 cuckoo filter or similar would also work, but that's two 
memory accesses required and  you don't actually need runtime modification).

On Wednesday, April 9, 2025 at 10:11:29 AM UTC Ruben Somsen wrote:

> Hi everyone,
>
> SwiftSync is a new validation method that allows for near-stateless, fully 
> parallelizable validation of the Bitcoin blockchain via hints about which 
> outputs remain unspent (<100MB total). All other inputs/outputs are 
> efficiently crossed off inside a single hash aggregate that only reaches 
> zero if validation was successful and the hints were correct.
>
> The main observation is that it can be much easier to validate that a 
> given UTXO set is correct than to compute it yourself. It allows us to no 
> longer require a stateful moment-to-moment UTXO set during IBD and allows 
> everything to be order independent. I'll briefly summarize the protocol, 
> before sharing the link to the full write-up.
>
> Each output gets a boolean hint (e.g. committed into Bitcoin Core) about 
> whether or not it will still be in the UTXO set after validation completes. 
> If it does, we write it to disk (append-only - it won't be used until 
> SwiftSync finishes). If it does not, we hash the UTXO data and add it to an 
> aggregate. For each input, we once again hash the UTXO data and remove it 
> from the aggregate.
>
> At the end, for every added output there should have been exactly one 
> removed input, bringing the end total of the aggregate to zero. If this is 
> indeed the case, we will have validated that the hints, and the resulting 
> UTXO set, were correct.
>
> E.g. For spent outputs A, B and inputs C, D we calculate 
> hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - 
> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C)).
>
> There is one missing step. The UTXO data is only available when processing 
> the output, but not when processing the input. We resolve this by either 
> downloading the outputs that were spent for each block (equivalent to the 
> undo data, maybe 10-15% of a block), or we lean on assumevalid, making it 
> sufficient to only hash the outpoints (which are available in both the 
> output and input) rather than the full UTXO data.
>
> Ignoring bandwidth, the expectation is that the speedup will be most 
> significant on either RAM limited devices and/or devices with many CPU 
> cores. Initial PoC benchmarks (thanks to theStack) show a 5.28x speed-up, 
> while currently still being largely sequential.
>
> Many more details are in the full write-up:
> https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd
>
> It will answer the following questions (and more):
>
> - How the hash aggregate can be made secure against the generalized 
> birthday problem
> - How exactly assumevalid is utilized and what the implications are
> - How we can still check for inflation when we skip the amounts with 
> assumevalid
> - How we can validate transaction order while doing everything in parallel
> - How we can perform the BIP30 duplicate output check without the UTXO set
> - How this all relates to assumeutxo
>
> To my knowledge, every validation check involving the UTXO set is covered, 
> but I'd be curious to hear if anything was overlooked or if you spot any 
> other issues.
>
> Thanks for reading, and thanks to everyone who provided invaluable 
> feedback while the idea was coming together.
>
> -- Ruben Somsen
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 6604 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-02  6:47 ` [bitcoindev] " Greg Maxwell
@ 2025-05-02 10:59   ` Ruben Somsen
  2025-05-02 13:38     ` Saint Wenhao
  2025-05-02 16:07     ` Greg Maxwell
  0 siblings, 2 replies; 16+ messages in thread
From: Ruben Somsen @ 2025-05-02 10:59 UTC (permalink / raw)
  To: Greg Maxwell; +Cc: Bitcoin Development Mailing List

[-- Attachment #1: Type: text/plain, Size: 8613 bytes --]

Hi Greg,

I appreciate your thoughts.


>excellent idea that preserves the assume valid-like security properties

Thanks. Yeah, the assumevalid version is particularly efficient. Without
assumevalid there's a bandwidth tradeoff, but that also clearly seems worth
it.


>Also I like the lack of basically anything being normative, it's easier to
feel comfortable with something when you won't be stuck with it forever

Yes, that is a very pleasant property. There is essentially no difference
between the end state that you reach with or without SwiftSync.


>the hints themselves are not security relevant, if they're wrong you'll
just fail

I agree.


>sha256 you should place the salt first so that it can be absorbed then the
midstate reused

Nice. That is a very good point that I had not yet considered.


>false positives are harmless as long as they're rare,  as in you can save
some extra txouts and if you later find them being spent, then just go
ahead and remove them

I don't see a practical way to do this without compromising the benefits of
SwiftSync because of the "later find them being spent" part. For one, it
would require doing a lookup for each input to see if it's in your UTXO
set, something we currently don't need to do at all. Secondly, it would
require blocks to be processed in order, limiting parallelization. The
space savings would also be negligible, considering the hint data is
already <100MB (compressed).


I think most of the remaining optimization potential (other than the
engineering work to enable parallel validation) is in the hash
aggregate, like the midstate reuse. Is there a faster alternative to
sha256? Can we get away with 16 byte hashes? I could be mistaken, but in
theory it seems we could even get away with 4 byte hashes if we allowed for
a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving
consensus to chance feels philosophically improper, but it's a pretty safe
bet considering it also involves PoW to trigger and even then would only
affect one or two random unlucky people on Earth.


Thanks for chiming in.

Cheers,
Ruben

On Fri, May 2, 2025 at 8:48 AM Greg Maxwell <gmaxwell@gmail•com> wrote:

> This sounds like an excellent idea that preserves the assume valid-like
> security properties but with more performance and more optimization
> oppturnities.
>
> I like particularly -- if I understand it correctly-- that the hints
> themselves are not security relevant, if they're wrong you'll just fail
> rather than end up with something incorrect.  Also I like the lack of
> basically anything being normative, it's easier to feel comfortable with
> something when you won't be stuck with it forever... the whole scheme could
> just be reworked every version with no particular harm because its effects
> are all ephemeral.  At worst you might have problems if you started IBD
> with one version and tried to complete it with another.
>
> I haven't seen any code,  but if hash() is a MD style hash function such
> as sha256 you should place the salt first so that it can be absorbed then
> the midstate reused. For sha256 you could get potentially double the
> hashing performance and I assume/hope that hash is actually fairly high in
> the profile cost.. maybe more like a 1/3 improvement considering the size
> of the entry, care should be taken to try to minimize compression function
> runs. ... but at least the utxo representation there is entirely
> implementation defined.
>
> You may be able to optimize the size of the hints further with the
> observation that false positives are harmless as long as they're rare,  as
> in you can save some extra txouts and if you later find them being spent,
> then just go ahead and remove them.  So for example ribbon filters give you
> a memory space and read efficient hash table that is constructed by solving
> a linear system to make sure all the inputs match.  One could solve for
> successively larger filters to target a specific false positive rate.  (I
> mean just a 2,4 cuckoo filter or similar would also work, but that's two
> memory accesses required and  you don't actually need runtime modification).
>
>
>
>
> On Wednesday, April 9, 2025 at 10:11:29 AM UTC Ruben Somsen wrote:
>
>> Hi everyone,
>>
>> SwiftSync is a new validation method that allows for near-stateless,
>> fully parallelizable validation of the Bitcoin blockchain via hints about
>> which outputs remain unspent (<100MB total). All other inputs/outputs are
>> efficiently crossed off inside a single hash aggregate that only reaches
>> zero if validation was successful and the hints were correct.
>>
>> The main observation is that it can be much easier to validate that a
>> given UTXO set is correct than to compute it yourself. It allows us to no
>> longer require a stateful moment-to-moment UTXO set during IBD and allows
>> everything to be order independent. I'll briefly summarize the protocol,
>> before sharing the link to the full write-up.
>>
>> Each output gets a boolean hint (e.g. committed into Bitcoin Core) about
>> whether or not it will still be in the UTXO set after validation completes.
>> If it does, we write it to disk (append-only - it won't be used until
>> SwiftSync finishes). If it does not, we hash the UTXO data and add it to an
>> aggregate. For each input, we once again hash the UTXO data and remove it
>> from the aggregate.
>>
>> At the end, for every added output there should have been exactly one
>> removed input, bringing the end total of the aggregate to zero. If this is
>> indeed the case, we will have validated that the hints, and the resulting
>> UTXO set, were correct.
>>
>> E.g. For spent outputs A, B and inputs C, D we calculate
>> hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -
>> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C)).
>>
>> There is one missing step. The UTXO data is only available when
>> processing the output, but not when processing the input. We resolve this
>> by either downloading the outputs that were spent for each block
>> (equivalent to the undo data, maybe 10-15% of a block), or we lean on
>> assumevalid, making it sufficient to only hash the outpoints (which are
>> available in both the output and input) rather than the full UTXO data.
>>
>> Ignoring bandwidth, the expectation is that the speedup will be most
>> significant on either RAM limited devices and/or devices with many CPU
>> cores. Initial PoC benchmarks (thanks to theStack) show a 5.28x speed-up,
>> while currently still being largely sequential.
>>
>> Many more details are in the full write-up:
>> https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd
>>
>> It will answer the following questions (and more):
>>
>> - How the hash aggregate can be made secure against the generalized
>> birthday problem
>> - How exactly assumevalid is utilized and what the implications are
>> - How we can still check for inflation when we skip the amounts with
>> assumevalid
>> - How we can validate transaction order while doing everything in parallel
>> - How we can perform the BIP30 duplicate output check without the UTXO set
>> - How this all relates to assumeutxo
>>
>> To my knowledge, every validation check involving the UTXO set is
>> covered, but I'd be curious to hear if anything was overlooked or if you
>> spot any other issues.
>>
>> Thanks for reading, and thanks to everyone who provided invaluable
>> feedback while the idea was coming together.
>>
>> -- Ruben Somsen
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Bitcoin Development Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bitcoindev+unsubscribe@googlegroups•com.
> To view this discussion visit
> https://groups.google.com/d/msgid/bitcoindev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com
> <https://groups.google.com/d/msgid/bitcoindev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAPv7TjaDGr4HCdQ0rR6_ma5zh2umU9r3_529szdswn_GjjnuCw%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 10104 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-02 10:59   ` Ruben Somsen
@ 2025-05-02 13:38     ` Saint Wenhao
  2025-05-02 16:07     ` Greg Maxwell
  1 sibling, 0 replies; 16+ messages in thread
From: Saint Wenhao @ 2025-05-02 13:38 UTC (permalink / raw)
  To: Ruben Somsen; +Cc: Greg Maxwell, Bitcoin Development Mailing List

[-- Attachment #1: Type: text/plain, Size: 10945 bytes --]

> >sha256 you should place the salt first so that it can be absorbed then
the midstate reused
> Nice. That is a very good point that I had not yet considered.

Basically, if you put salt first, and if the size of the salt will be
aligned with the size of SHA-256 internal chunks, then you will reach
tagged hashes (where salt is just a tag's name).

> Is there a faster alternative to sha256?

Even if it is, then if you introduce another hash function, which is not
used anywhere else, then it will be another moving part, which means more
complexity.

> Can we get away with 16 byte hashes?

If you want to get just that, then you can always strip SHA-256 output, and
take just the first/last N bytes.

> I could be mistaken, but in theory it seems we could even get away with 4
byte hashes if we allowed for a 1 in 4 billion chance of accidentally
accepting an invalid chain.

Grinding 2^32 values seems to be too easy (it is just network difficulty
equal to one, in case of double SHA-256). Because then, even if someone
will not compromise the system, by grinding a matching hash, then still: it
can slow things down.

And also, if you take the sum of hashes, which should be finally zero, then
by grinding UTXOs, someone could make it zero, even if there is some
mismatch. And just for that reason, you probably want something bigger than
32-bit, and something at least big enough to be collision and preimage
resistant.

pt., 2 maj 2025 o 13:01 Ruben Somsen <rsomsen@gmail•com> napisał(a):

> Hi Greg,
>
> I appreciate your thoughts.
>
>
> >excellent idea that preserves the assume valid-like security properties
>
> Thanks. Yeah, the assumevalid version is particularly efficient. Without
> assumevalid there's a bandwidth tradeoff, but that also clearly seems worth
> it.
>
>
> >Also I like the lack of basically anything being normative, it's easier
> to feel comfortable with something when you won't be stuck with it forever
>
> Yes, that is a very pleasant property. There is essentially no difference
> between the end state that you reach with or without SwiftSync.
>
>
> >the hints themselves are not security relevant, if they're wrong you'll
> just fail
>
> I agree.
>
>
> >sha256 you should place the salt first so that it can be absorbed then
> the midstate reused
>
> Nice. That is a very good point that I had not yet considered.
>
>
> >false positives are harmless as long as they're rare,  as in you can save
> some extra txouts and if you later find them being spent, then just go
> ahead and remove them
>
> I don't see a practical way to do this without compromising the benefits
> of SwiftSync because of the "later find them being spent" part. For one, it
> would require doing a lookup for each input to see if it's in your UTXO
> set, something we currently don't need to do at all. Secondly, it would
> require blocks to be processed in order, limiting parallelization. The
> space savings would also be negligible, considering the hint data is
> already <100MB (compressed).
>
>
> I think most of the remaining optimization potential (other than the
> engineering work to enable parallel validation) is in the hash
> aggregate, like the midstate reuse. Is there a faster alternative to
> sha256? Can we get away with 16 byte hashes? I could be mistaken, but in
> theory it seems we could even get away with 4 byte hashes if we allowed for
> a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving
> consensus to chance feels philosophically improper, but it's a pretty safe
> bet considering it also involves PoW to trigger and even then would only
> affect one or two random unlucky people on Earth.
>
>
> Thanks for chiming in.
>
> Cheers,
> Ruben
>
> On Fri, May 2, 2025 at 8:48 AM Greg Maxwell <gmaxwell@gmail•com> wrote:
>
>> This sounds like an excellent idea that preserves the assume valid-like
>> security properties but with more performance and more optimization
>> oppturnities.
>>
>> I like particularly -- if I understand it correctly-- that the hints
>> themselves are not security relevant, if they're wrong you'll just fail
>> rather than end up with something incorrect.  Also I like the lack of
>> basically anything being normative, it's easier to feel comfortable with
>> something when you won't be stuck with it forever... the whole scheme could
>> just be reworked every version with no particular harm because its effects
>> are all ephemeral.  At worst you might have problems if you started IBD
>> with one version and tried to complete it with another.
>>
>> I haven't seen any code,  but if hash() is a MD style hash function such
>> as sha256 you should place the salt first so that it can be absorbed then
>> the midstate reused. For sha256 you could get potentially double the
>> hashing performance and I assume/hope that hash is actually fairly high in
>> the profile cost.. maybe more like a 1/3 improvement considering the size
>> of the entry, care should be taken to try to minimize compression function
>> runs. ... but at least the utxo representation there is entirely
>> implementation defined.
>>
>> You may be able to optimize the size of the hints further with the
>> observation that false positives are harmless as long as they're rare,  as
>> in you can save some extra txouts and if you later find them being spent,
>> then just go ahead and remove them.  So for example ribbon filters give you
>> a memory space and read efficient hash table that is constructed by solving
>> a linear system to make sure all the inputs match.  One could solve for
>> successively larger filters to target a specific false positive rate.  (I
>> mean just a 2,4 cuckoo filter or similar would also work, but that's two
>> memory accesses required and  you don't actually need runtime modification).
>>
>>
>>
>>
>> On Wednesday, April 9, 2025 at 10:11:29 AM UTC Ruben Somsen wrote:
>>
>>> Hi everyone,
>>>
>>> SwiftSync is a new validation method that allows for near-stateless,
>>> fully parallelizable validation of the Bitcoin blockchain via hints about
>>> which outputs remain unspent (<100MB total). All other inputs/outputs are
>>> efficiently crossed off inside a single hash aggregate that only reaches
>>> zero if validation was successful and the hints were correct.
>>>
>>> The main observation is that it can be much easier to validate that a
>>> given UTXO set is correct than to compute it yourself. It allows us to no
>>> longer require a stateful moment-to-moment UTXO set during IBD and allows
>>> everything to be order independent. I'll briefly summarize the protocol,
>>> before sharing the link to the full write-up.
>>>
>>> Each output gets a boolean hint (e.g. committed into Bitcoin Core) about
>>> whether or not it will still be in the UTXO set after validation completes.
>>> If it does, we write it to disk (append-only - it won't be used until
>>> SwiftSync finishes). If it does not, we hash the UTXO data and add it to an
>>> aggregate. For each input, we once again hash the UTXO data and remove it
>>> from the aggregate.
>>>
>>> At the end, for every added output there should have been exactly one
>>> removed input, bringing the end total of the aggregate to zero. If this is
>>> indeed the case, we will have validated that the hints, and the resulting
>>> UTXO set, were correct.
>>>
>>> E.g. For spent outputs A, B and inputs C, D we calculate
>>> hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -
>>> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C)).
>>>
>>> There is one missing step. The UTXO data is only available when
>>> processing the output, but not when processing the input. We resolve this
>>> by either downloading the outputs that were spent for each block
>>> (equivalent to the undo data, maybe 10-15% of a block), or we lean on
>>> assumevalid, making it sufficient to only hash the outpoints (which are
>>> available in both the output and input) rather than the full UTXO data.
>>>
>>> Ignoring bandwidth, the expectation is that the speedup will be most
>>> significant on either RAM limited devices and/or devices with many CPU
>>> cores. Initial PoC benchmarks (thanks to theStack) show a 5.28x speed-up,
>>> while currently still being largely sequential.
>>>
>>> Many more details are in the full write-up:
>>> https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd
>>>
>>> It will answer the following questions (and more):
>>>
>>> - How the hash aggregate can be made secure against the generalized
>>> birthday problem
>>> - How exactly assumevalid is utilized and what the implications are
>>> - How we can still check for inflation when we skip the amounts with
>>> assumevalid
>>> - How we can validate transaction order while doing everything in
>>> parallel
>>> - How we can perform the BIP30 duplicate output check without the UTXO
>>> set
>>> - How this all relates to assumeutxo
>>>
>>> To my knowledge, every validation check involving the UTXO set is
>>> covered, but I'd be curious to hear if anything was overlooked or if you
>>> spot any other issues.
>>>
>>> Thanks for reading, and thanks to everyone who provided invaluable
>>> feedback while the idea was coming together.
>>>
>>> -- Ruben Somsen
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Bitcoin Development Mailing List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to bitcoindev+unsubscribe@googlegroups•com.
>> To view this discussion visit
>> https://groups.google.com/d/msgid/bitcoindev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com
>> <https://groups.google.com/d/msgid/bitcoindev/cc2dfa79-89f0-4170-9725-894ea189a0e2n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Bitcoin Development Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bitcoindev+unsubscribe@googlegroups•com.
> To view this discussion visit
> https://groups.google.com/d/msgid/bitcoindev/CAPv7TjaDGr4HCdQ0rR6_ma5zh2umU9r3_529szdswn_GjjnuCw%40mail.gmail.com
> <https://groups.google.com/d/msgid/bitcoindev/CAPv7TjaDGr4HCdQ0rR6_ma5zh2umU9r3_529szdswn_GjjnuCw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CACgYNOLnV2JO%3Dn4UZhByLGMzEJ2vUXoaB%2BPCnLG2nzXvu8uAsg%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 12670 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-02 10:59   ` Ruben Somsen
  2025-05-02 13:38     ` Saint Wenhao
@ 2025-05-02 16:07     ` Greg Maxwell
  2025-05-02 19:15       ` Saint Wenhao
  2025-05-02 20:23       ` Sanket Kanjalkar
  1 sibling, 2 replies; 16+ messages in thread
From: Greg Maxwell @ 2025-05-02 16:07 UTC (permalink / raw)
  To: Bitcoin Development Mailing List

[-- Attachment #1.1: Type: text/plain, Size: 3807 bytes --]

On Friday, May 2, 2025 at 11:01:10 AM UTC Ruben Somsen wrote:

I don't see a practical way to do this without compromising the benefits of 
SwiftSync because of the "later find them being spent" part. For one, it 
would require doing a lookup for each input to see if it's in your UTXO 
set, something we currently don't need to do at all. Secondly, it would 
require blocks to be processed in order, limiting parallelization. The 
space savings would also be negligible, considering the hint data is 
already <100MB (compressed).

Doh. Right. Got too excited.

I think most of the remaining optimization potential (other than the 
engineering work to enable parallel validation) is in the hash 
aggregate, like the midstate reuse. Is there a faster alternative to 
sha256? Can we get away with 16 byte hashes? I could be mistaken, but in 
theory it seems we could even get away with 4 byte hashes if we allowed for 
a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving 
consensus to chance feels philosophically improper, but it's a pretty safe 
bet considering it also involves PoW to trigger and even then would only 
affect one or two random unlucky people on Earth.

I think the problem isn't so much the size of the hash output, but rather 
the property you need is that each salt gives you a different hash function 
such that the attacker cannot predictably create collisions. The most 
expedient way to get there is a cryptographic hash function, which limits 
you lower performance choices.  Your reduction function could just be xor 
and if it is I doubt the output size itself matters much for performance... 
and my guess is that e.g. xor with 32 byte hashes will have better 
performance than addition with 16 bytes.

It doesn't need to be so in the initial implementation but ultimately it 
may make sense for the host to benchmark available hashes and pick the 
fastest.  SHA-256 will easily be a winner on hardware with SHA-NI or 
similar.  But there are other cryptographic hashes in the codebase that 
might be faster on systems without sha256 hardware support.  

There are argument I can see to prove the security of simpler hashes that 
only work if you assume that the attacker could only insert specific finite 
numbers of bad changes... but they really have completely full control of 
the hash function input except for the salt and that I think makes it hard 
to say much positive about the security of any optimization except 
truncating a secure hash (and I don't think truncating will win you much 
security).

 In terms of security keep in mind that a prospective attacker needs to 
also perform POW to get their attack chain up to the users accepted chain 
tip, which means that they have to do the work between the AV point and the 
tip assuming the user isn't isolated.  This fact could be used to justify a 
rather weak hash function, but I think it's not really worth a lot of 
analysis ahead of the initial functionality.  I'm guessing that once this 
is implemented, even if its with a quite expensive hash function that the 
IBD performance will be heavily bottlenecked in network and parallelism 
related issues and be far from the lowest hanging fruit for a while,  
considering that this has eliminated the big sequential part and a number 
of annoying to optimize components entirely.

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/69194329-4ce6-4272-acc5-fd913a7986f3n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 4550 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-02 16:07     ` Greg Maxwell
@ 2025-05-02 19:15       ` Saint Wenhao
  2025-05-02 20:23       ` Sanket Kanjalkar
  1 sibling, 0 replies; 16+ messages in thread
From: Saint Wenhao @ 2025-05-02 19:15 UTC (permalink / raw)
  To: Greg Maxwell; +Cc: Bitcoin Development Mailing List

[-- Attachment #1: Type: text/plain, Size: 5591 bytes --]

> Is there a faster alternative to sha256?

Hmm, maybe you can avoid hashing at all? If you take txid:vout as it is,
then it will take 36 bytes per output, and you can just treat it in the
same way, as you would some pseudorandom result of some 288-bit hash
function. And then, it is all about mixing the salt with that result.

Of course, then someone could potentially replace "(txid:1) + (txid:4)"
with "(txid:2) + (txid:3)", so maybe it should be replaced with something
else.

And also, there are some types of outputs, where you will know in advance,
how things will be hashed. Which means, that potentially something like
"hashPrevouts" could be used, and then it could make things a little bit
faster, if someone will create a transaction, which will spend some coin,
because some data will be already hashed, and ready to be fetched. But when
I read BIP-143, then it seems, that it could be hard to make such
optimizations (but nonetheless, it could be a good hint, to design future
protocols in a way, which would make such optimizations possible).

pt., 2 maj 2025 o 21:09 Greg Maxwell <gmaxwell@gmail•com> napisał(a):

> On Friday, May 2, 2025 at 11:01:10 AM UTC Ruben Somsen wrote:
>
> I don't see a practical way to do this without compromising the benefits
> of SwiftSync because of the "later find them being spent" part. For one, it
> would require doing a lookup for each input to see if it's in your UTXO
> set, something we currently don't need to do at all. Secondly, it would
> require blocks to be processed in order, limiting parallelization. The
> space savings would also be negligible, considering the hint data is
> already <100MB (compressed).
>
>
> Doh. Right. Got too excited.
>
>
> I think most of the remaining optimization potential (other than the
> engineering work to enable parallel validation) is in the hash
> aggregate, like the midstate reuse. Is there a faster alternative to
> sha256? Can we get away with 16 byte hashes? I could be mistaken, but in
> theory it seems we could even get away with 4 byte hashes if we allowed for
> a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving
> consensus to chance feels philosophically improper, but it's a pretty safe
> bet considering it also involves PoW to trigger and even then would only
> affect one or two random unlucky people on Earth.
>
>
> I think the problem isn't so much the size of the hash output, but rather
> the property you need is that each salt gives you a different hash function
> such that the attacker cannot predictably create collisions. The most
> expedient way to get there is a cryptographic hash function, which limits
> you lower performance choices.  Your reduction function could just be xor
> and if it is I doubt the output size itself matters much for performance...
> and my guess is that e.g. xor with 32 byte hashes will have better
> performance than addition with 16 bytes.
>
> It doesn't need to be so in the initial implementation but ultimately it
> may make sense for the host to benchmark available hashes and pick the
> fastest.  SHA-256 will easily be a winner on hardware with SHA-NI or
> similar.  But there are other cryptographic hashes in the codebase that
> might be faster on systems without sha256 hardware support.
>
> There are argument I can see to prove the security of simpler hashes that
> only work if you assume that the attacker could only insert specific finite
> numbers of bad changes... but they really have completely full control of
> the hash function input except for the salt and that I think makes it hard
> to say much positive about the security of any optimization except
> truncating a secure hash (and I don't think truncating will win you much
> security).
>
>  In terms of security keep in mind that a prospective attacker needs to
> also perform POW to get their attack chain up to the users accepted chain
> tip, which means that they have to do the work between the AV point and the
> tip assuming the user isn't isolated.  This fact could be used to justify a
> rather weak hash function, but I think it's not really worth a lot of
> analysis ahead of the initial functionality.  I'm guessing that once this
> is implemented, even if its with a quite expensive hash function that the
> IBD performance will be heavily bottlenecked in network and parallelism
> related issues and be far from the lowest hanging fruit for a while,
> considering that this has eliminated the big sequential part and a number
> of annoying to optimize components entirely.
>
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Bitcoin Development Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bitcoindev+unsubscribe@googlegroups•com.
> To view this discussion visit
> https://groups.google.com/d/msgid/bitcoindev/69194329-4ce6-4272-acc5-fd913a7986f3n%40googlegroups.com
> <https://groups.google.com/d/msgid/bitcoindev/69194329-4ce6-4272-acc5-fd913a7986f3n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CACgYNOLqLbemTBbnC-63vXUfCebDPLevxJ4U9j3XM6mQ%2BFuWRA%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 6768 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-02 16:07     ` Greg Maxwell
  2025-05-02 19:15       ` Saint Wenhao
@ 2025-05-02 20:23       ` Sanket Kanjalkar
  2025-05-03 12:02         ` Greg Maxwell
  1 sibling, 1 reply; 16+ messages in thread
From: Sanket Kanjalkar @ 2025-05-02 20:23 UTC (permalink / raw)
  To: Bitcoin Development Mailing List

[-- Attachment #1: Type: text/plain, Size: 5108 bytes --]

> hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -
hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))

What if instead of hash we encrypt with AES and modular add/subs? I cannot
prove it; but I also don't see a clear way this is broken.

1. Sample random symmetric key `k`
2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) -
AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?

As mentioned, I don't know if this works or not. But the idea is that AES
is faster than sha256 on most machines.

On Fri, May 2, 2025 at 12:09 PM Greg Maxwell <gmaxwell@gmail•com> wrote:

> On Friday, May 2, 2025 at 11:01:10 AM UTC Ruben Somsen wrote:
>
> I don't see a practical way to do this without compromising the benefits
> of SwiftSync because of the "later find them being spent" part. For one, it
> would require doing a lookup for each input to see if it's in your UTXO
> set, something we currently don't need to do at all. Secondly, it would
> require blocks to be processed in order, limiting parallelization. The
> space savings would also be negligible, considering the hint data is
> already <100MB (compressed).
>
>
> Doh. Right. Got too excited.
>
>
> I think most of the remaining optimization potential (other than the
> engineering work to enable parallel validation) is in the hash
> aggregate, like the midstate reuse. Is there a faster alternative to
> sha256? Can we get away with 16 byte hashes? I could be mistaken, but in
> theory it seems we could even get away with 4 byte hashes if we allowed for
> a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving
> consensus to chance feels philosophically improper, but it's a pretty safe
> bet considering it also involves PoW to trigger and even then would only
> affect one or two random unlucky people on Earth.
>
>
> I think the problem isn't so much the size of the hash output, but rather
> the property you need is that each salt gives you a different hash function
> such that the attacker cannot predictably create collisions. The most
> expedient way to get there is a cryptographic hash function, which limits
> you lower performance choices.  Your reduction function could just be xor
> and if it is I doubt the output size itself matters much for performance...
> and my guess is that e.g. xor with 32 byte hashes will have better
> performance than addition with 16 bytes.
>
> It doesn't need to be so in the initial implementation but ultimately it
> may make sense for the host to benchmark available hashes and pick the
> fastest.  SHA-256 will easily be a winner on hardware with SHA-NI or
> similar.  But there are other cryptographic hashes in the codebase that
> might be faster on systems without sha256 hardware support.
>
> There are argument I can see to prove the security of simpler hashes that
> only work if you assume that the attacker could only insert specific finite
> numbers of bad changes... but they really have completely full control of
> the hash function input except for the salt and that I think makes it hard
> to say much positive about the security of any optimization except
> truncating a secure hash (and I don't think truncating will win you much
> security).
>
>  In terms of security keep in mind that a prospective attacker needs to
> also perform POW to get their attack chain up to the users accepted chain
> tip, which means that they have to do the work between the AV point and the
> tip assuming the user isn't isolated.  This fact could be used to justify a
> rather weak hash function, but I think it's not really worth a lot of
> analysis ahead of the initial functionality.  I'm guessing that once this
> is implemented, even if its with a quite expensive hash function that the
> IBD performance will be heavily bottlenecked in network and parallelism
> related issues and be far from the lowest hanging fruit for a while,
> considering that this has eliminated the big sequential part and a number
> of annoying to optimize components entirely.
>
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Bitcoin Development Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bitcoindev+unsubscribe@googlegroups•com.
> To view this discussion visit
> https://groups.google.com/d/msgid/bitcoindev/69194329-4ce6-4272-acc5-fd913a7986f3n%40googlegroups.com
> <https://groups.google.com/d/msgid/bitcoindev/69194329-4ce6-4272-acc5-fd913a7986f3n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 
Sanket Kanjalkar

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAExE9c8XfEH__onX3DhUQh0OnvpoOLwRRp8%2BZ6PozyKGtqpspw%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 6435 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-02 20:23       ` Sanket Kanjalkar
@ 2025-05-03 12:02         ` Greg Maxwell
  2025-05-03 13:42           ` Ruben Somsen
                             ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Greg Maxwell @ 2025-05-03 12:02 UTC (permalink / raw)
  To: Bitcoin Development Mailing List

[-- Attachment #1.1: Type: text/plain, Size: 1303 bytes --]

On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:

> hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - 
hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))

What if instead of hash we encrypt with AES and modular add/subs? I cannot 
prove it; but I also don't see a clear way this is broken. 

1. Sample random symmetric key `k`
2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) - 
AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?

AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode 
would be unsuitable! (I mean sure modular add/sub and xor are different 
operations but they are quite close).  I think that in many modes the 
collision resistance would have to at least be restricted by the birthday 
bound with the small block size. I think CMC might be needed to avoid that 
sort of issue.

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 1852 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-03 12:02         ` Greg Maxwell
@ 2025-05-03 13:42           ` Ruben Somsen
  2025-05-03 13:57             ` Greg Maxwell
  2025-05-03 13:53           ` Weikeng Chen
  2025-05-04  1:06           ` Nagaev Boris
  2 siblings, 1 reply; 16+ messages in thread
From: Ruben Somsen @ 2025-05-03 13:42 UTC (permalink / raw)
  To: Bitcoin Development Mailing List
  Cc: Greg Maxwell, saintwenhao, Sanket Kanjalkar

[-- Attachment #1: Type: text/plain, Size: 4100 bytes --]

  Hey all,

@Saint Wenhao

>if you take the sum of hashes, which should be finally zero, then by
grinding UTXOs, someone could make it zero

That's what the secret salt prevents. You can't grind for a certain number
if you don't know what the number is you are trying to collide with.

>maybe you can avoid hashing at all [...] And then, it is all about mixing
the salt

Without a concrete solution I'm afraid that's wishful thinking. That last
part of the sentence is why we currently need the hash, as well as for
adding more data in the non-assumevalid version.

@Sanket Kanjalkar

>What if instead of hash we encrypt with AES

I can't really evaluate whether this might work, but I can see the line of
reasoning. Conceptually, I think what we need is something that transforms
the data into fixed length blocks for which an attacker can't know the
relationship between each block (i.e. via a secret). The transformation
needs to be the same on the input and output side.

@Greg Maxwell

>Your reduction function could just be xor

I had initially ruled XOR out. Reason for this is that XOR would lead one
to conclude that sets [A, B, C, C], [A, B], [A, B, D, D], etc. are all
equivalent because any two values cancel each other out, regardless of
whether the sets are on the input or output side. Modular add/sub doesn't
have this issue. If the speedup actually turns out to be significant then
there may be some clever way to salvage it like by counting the total
number of inputs and outputs and relying on the knowledge that every txid
must be unique, but that's a lot harder to reason about.

>even if its with a quite expensive hash function that the IBD performance
will be heavily bottlenecked in network and parallelism related issues and
be far from the lowest hanging fruit for a while,  considering that this
has eliminated the big sequential part and a number of annoying to optimize
components entirely

Very true, and as you said, we can easily drop-in replace the hash function
at any future point we like, without adverse consequences.

Cheers,
Ruben

On Sat, May 3, 2025 at 2:07 PM Greg Maxwell <gmaxwell@gmail•com> wrote:

> On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:
>
> > hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -
> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))
>
> What if instead of hash we encrypt with AES and modular add/subs? I cannot
> prove it; but I also don't see a clear way this is broken.
>
> 1. Sample random symmetric key `k`
> 2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) -
> AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?
>
>
> AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode
> would be unsuitable! (I mean sure modular add/sub and xor are different
> operations but they are quite close).  I think that in many modes the
> collision resistance would have to at least be restricted by the birthday
> bound with the small block size. I think CMC might be needed to avoid that
> sort of issue.
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Bitcoin Development Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bitcoindev+unsubscribe@googlegroups•com.
> To view this discussion visit
> https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com
> <https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAPv7TjYCaAsJFo3t6A6HmoojnbMNjSRkXHeOW%3DjrbGBpPYzQVg%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 5644 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-03 13:42           ` Ruben Somsen
@ 2025-05-03 13:57             ` Greg Maxwell
  2025-05-03 14:36               ` Greg Maxwell
  2025-05-03 14:55               ` Ruben Somsen
  0 siblings, 2 replies; 16+ messages in thread
From: Greg Maxwell @ 2025-05-03 13:57 UTC (permalink / raw)
  To: Ruben Somsen
  Cc: Bitcoin Development Mailing List, saintwenhao, Sanket Kanjalkar

[-- Attachment #1: Type: text/plain, Size: 4958 bytes --]

Hm. Fair point, but I think you cannot escape the assumption that A is
unique (well A and its spend) thanks to TXID uniqueness without weakening
the security argument, since A*n eventually collides.  It's essentially the
same argument you make for characteristic 2, it just takes more
repetitions.  Without knowing the salt you'd need 2^256 repetitions to be
certain, but e.g. see the prior suggestions on truncating the hash to 32
bits or whatever, a practical number of A repeats would then self cancel.

To be clear I'm not arguing that it should be xor here, but pointing out
there is structure here you might not have expected.





On Sat, May 3, 2025 at 1:42 PM Ruben Somsen <rsomsen@gmail•com> wrote:

>   Hey all,
>
>
> @Saint Wenhao
>
> >if you take the sum of hashes, which should be finally zero, then by
> grinding UTXOs, someone could make it zero
>
> That's what the secret salt prevents. You can't grind for a certain number
> if you don't know what the number is you are trying to collide with.
>
> >maybe you can avoid hashing at all [...] And then, it is all about mixing
> the salt
>
> Without a concrete solution I'm afraid that's wishful thinking. That last
> part of the sentence is why we currently need the hash, as well as for
> adding more data in the non-assumevalid version.
>
>
> @Sanket Kanjalkar
>
> >What if instead of hash we encrypt with AES
>
> I can't really evaluate whether this might work, but I can see the line of
> reasoning. Conceptually, I think what we need is something that transforms
> the data into fixed length blocks for which an attacker can't know the
> relationship between each block (i.e. via a secret). The transformation
> needs to be the same on the input and output side.
>
>
> @Greg Maxwell
>
> >Your reduction function could just be xor
>
> I had initially ruled XOR out. Reason for this is that XOR would lead one
> to conclude that sets [A, B, C, C], [A, B], [A, B, D, D], etc. are all
> equivalent because any two values cancel each other out, regardless of
> whether the sets are on the input or output side. Modular add/sub doesn't
> have this issue. If the speedup actually turns out to be significant then
> there may be some clever way to salvage it like by counting the total
> number of inputs and outputs and relying on the knowledge that every txid
> must be unique, but that's a lot harder to reason about.
>
> >even if its with a quite expensive hash function that the IBD performance
> will be heavily bottlenecked in network and parallelism related issues and
> be far from the lowest hanging fruit for a while,  considering that this
> has eliminated the big sequential part and a number of annoying to optimize
> components entirely
>
> Very true, and as you said, we can easily drop-in replace the hash
> function at any future point we like, without adverse consequences.
>
>
> Cheers,
> Ruben
>
> On Sat, May 3, 2025 at 2:07 PM Greg Maxwell <gmaxwell@gmail•com> wrote:
>
>> On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:
>>
>> > hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -
>> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))
>>
>> What if instead of hash we encrypt with AES and modular add/subs? I
>> cannot prove it; but I also don't see a clear way this is broken.
>>
>> 1. Sample random symmetric key `k`
>> 2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) -
>> AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?
>>
>>
>> AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode
>> would be unsuitable! (I mean sure modular add/sub and xor are different
>> operations but they are quite close).  I think that in many modes the
>> collision resistance would have to at least be restricted by the birthday
>> bound with the small block size. I think CMC might be needed to avoid that
>> sort of issue.
>>
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Bitcoin Development Mailing List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to bitcoindev+unsubscribe@googlegroups•com.
>> To view this discussion visit
>> https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com
>> <https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAAS2fgQ1CvyxoRckZT2_6dNgKxDaKWvuK46FYMeaHc8Np_gDPg%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 6813 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-03 13:57             ` Greg Maxwell
@ 2025-05-03 14:36               ` Greg Maxwell
  2025-05-03 14:55               ` Ruben Somsen
  1 sibling, 0 replies; 16+ messages in thread
From: Greg Maxwell @ 2025-05-03 14:36 UTC (permalink / raw)
  To: Bitcoin Development Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 5443 bytes --]

If you go look at the original muhash thread on the list there is some 
pretty extensive discussion around accumulators, including this point.  But 
I don't think it applies here because there is no reason for the salt to be 
anything but unique and private to the user.

On Saturday, May 3, 2025 at 2:26:18 PM UTC Greg Maxwell wrote:

> Hm. Fair point, but I think you cannot escape the assumption that A is 
> unique (well A and its spend) thanks to TXID uniqueness without weakening 
> the security argument, since A*n eventually collides.  It's essentially the 
> same argument you make for characteristic 2, it just takes more 
> repetitions.  Without knowing the salt you'd need 2^256 repetitions to be 
> certain, but e.g. see the prior suggestions on truncating the hash to 32 
> bits or whatever, a practical number of A repeats would then self cancel.
>
> To be clear I'm not arguing that it should be xor here, but pointing out 
> there is structure here you might not have expected.
>
>
>
>
>
> On Sat, May 3, 2025 at 1:42 PM Ruben Somsen <rso...@gmail•com> wrote:
>
>>   Hey all,
>>
>>
>> @Saint Wenhao
>>
>> >if you take the sum of hashes, which should be finally zero, then by 
>> grinding UTXOs, someone could make it zero
>>
>> That's what the secret salt prevents. You can't grind for a certain 
>> number if you don't know what the number is you are trying to collide with.
>>
>> >maybe you can avoid hashing at all [...] And then, it is all about 
>> mixing the salt
>>
>> Without a concrete solution I'm afraid that's wishful thinking. That last 
>> part of the sentence is why we currently need the hash, as well as for 
>> adding more data in the non-assumevalid version.
>>
>>
>> @Sanket Kanjalkar
>>
>> >What if instead of hash we encrypt with AES
>>
>> I can't really evaluate whether this might work, but I can see the line 
>> of reasoning. Conceptually, I think what we need is something that 
>> transforms the data into fixed length blocks for which an attacker can't 
>> know the relationship between each block (i.e. via a secret). The 
>> transformation needs to be the same on the input and output side.
>>
>>
>> @Greg Maxwell
>>
>> >Your reduction function could just be xor
>>
>> I had initially ruled XOR out. Reason for this is that XOR would lead one 
>> to conclude that sets [A, B, C, C], [A, B], [A, B, D, D], etc. are all 
>> equivalent because any two values cancel each other out, regardless of 
>> whether the sets are on the input or output side. Modular add/sub doesn't 
>> have this issue. If the speedup actually turns out to be significant then 
>> there may be some clever way to salvage it like by counting the total 
>> number of inputs and outputs and relying on the knowledge that every txid 
>> must be unique, but that's a lot harder to reason about.
>>
>> >even if its with a quite expensive hash function that the IBD 
>> performance will be heavily bottlenecked in network and parallelism related 
>> issues and be far from the lowest hanging fruit for a while,  considering 
>> that this has eliminated the big sequential part and a number of annoying 
>> to optimize components entirely
>>
>> Very true, and as you said, we can easily drop-in replace the hash 
>> function at any future point we like, without adverse consequences.
>>
>>
>> Cheers,
>> Ruben
>>
>> On Sat, May 3, 2025 at 2:07 PM Greg Maxwell <gmax...@gmail•com> wrote:
>>
>>> On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:
>>>
>>> > hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - 
>>> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))
>>>
>>> What if instead of hash we encrypt with AES and modular add/subs? I 
>>> cannot prove it; but I also don't see a clear way this is broken. 
>>>
>>> 1. Sample random symmetric key `k`
>>> 2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) - 
>>> AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?
>>>
>>>
>>> AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode 
>>> would be unsuitable! (I mean sure modular add/sub and xor are different 
>>> operations but they are quite close).  I think that in many modes the 
>>> collision resistance would have to at least be restricted by the birthday 
>>> bound with the small block size. I think CMC might be needed to avoid that 
>>> sort of issue.
>>>
>>>  
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Bitcoin Development Mailing List" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to bitcoindev+...@googlegroups•com.
>>> To view this discussion visit 
>>> https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com 
>>> <https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/cf54cf8a-72ae-4687-a2e5-5511d68542can%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 7508 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-03 13:57             ` Greg Maxwell
  2025-05-03 14:36               ` Greg Maxwell
@ 2025-05-03 14:55               ` Ruben Somsen
  2025-05-03 15:54                 ` Greg Maxwell
  1 sibling, 1 reply; 16+ messages in thread
From: Ruben Somsen @ 2025-05-03 14:55 UTC (permalink / raw)
  To: Bitcoin Development Mailing List; +Cc: Greg Maxwell

[-- Attachment #1: Type: text/plain, Size: 5569 bytes --]

>I think you cannot escape the assumption that A is unique (well A and its
spend) thanks to TXID uniqueness

A counter-example would be a chain with two transactions with the exact
same input A and output B. SwiftSync with XOR would basically treat these
two transactions as non-existent (the two A's and B's cancel each other
out), while a regular full node (or non-XOR SwiftSync) would reject the
chain.

On Sat, May 3, 2025 at 3:57 PM Greg Maxwell <gmaxwell@gmail•com> wrote:

> Hm. Fair point, but I think you cannot escape the assumption that A is
> unique (well A and its spend) thanks to TXID uniqueness without weakening
> the security argument, since A*n eventually collides.  It's essentially the
> same argument you make for characteristic 2, it just takes more
> repetitions.  Without knowing the salt you'd need 2^256 repetitions to be
> certain, but e.g. see the prior suggestions on truncating the hash to 32
> bits or whatever, a practical number of A repeats would then self cancel.
>
> To be clear I'm not arguing that it should be xor here, but pointing out
> there is structure here you might not have expected.
>
>
>
>
>
> On Sat, May 3, 2025 at 1:42 PM Ruben Somsen <rsomsen@gmail•com> wrote:
>
>>   Hey all,
>>
>>
>> @Saint Wenhao
>>
>> >if you take the sum of hashes, which should be finally zero, then by
>> grinding UTXOs, someone could make it zero
>>
>> That's what the secret salt prevents. You can't grind for a certain
>> number if you don't know what the number is you are trying to collide with.
>>
>> >maybe you can avoid hashing at all [...] And then, it is all about
>> mixing the salt
>>
>> Without a concrete solution I'm afraid that's wishful thinking. That last
>> part of the sentence is why we currently need the hash, as well as for
>> adding more data in the non-assumevalid version.
>>
>>
>> @Sanket Kanjalkar
>>
>> >What if instead of hash we encrypt with AES
>>
>> I can't really evaluate whether this might work, but I can see the line
>> of reasoning. Conceptually, I think what we need is something that
>> transforms the data into fixed length blocks for which an attacker can't
>> know the relationship between each block (i.e. via a secret). The
>> transformation needs to be the same on the input and output side.
>>
>>
>> @Greg Maxwell
>>
>> >Your reduction function could just be xor
>>
>> I had initially ruled XOR out. Reason for this is that XOR would lead one
>> to conclude that sets [A, B, C, C], [A, B], [A, B, D, D], etc. are all
>> equivalent because any two values cancel each other out, regardless of
>> whether the sets are on the input or output side. Modular add/sub doesn't
>> have this issue. If the speedup actually turns out to be significant then
>> there may be some clever way to salvage it like by counting the total
>> number of inputs and outputs and relying on the knowledge that every txid
>> must be unique, but that's a lot harder to reason about.
>>
>> >even if its with a quite expensive hash function that the IBD
>> performance will be heavily bottlenecked in network and parallelism related
>> issues and be far from the lowest hanging fruit for a while,  considering
>> that this has eliminated the big sequential part and a number of annoying
>> to optimize components entirely
>>
>> Very true, and as you said, we can easily drop-in replace the hash
>> function at any future point we like, without adverse consequences.
>>
>>
>> Cheers,
>> Ruben
>>
>> On Sat, May 3, 2025 at 2:07 PM Greg Maxwell <gmaxwell@gmail•com> wrote:
>>
>>> On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:
>>>
>>> > hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -
>>> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))
>>>
>>> What if instead of hash we encrypt with AES and modular add/subs? I
>>> cannot prove it; but I also don't see a clear way this is broken.
>>>
>>> 1. Sample random symmetric key `k`
>>> 2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) -
>>> AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?
>>>
>>>
>>> AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode
>>> would be unsuitable! (I mean sure modular add/sub and xor are different
>>> operations but they are quite close).  I think that in many modes the
>>> collision resistance would have to at least be restricted by the birthday
>>> bound with the small block size. I think CMC might be needed to avoid that
>>> sort of issue.
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Bitcoin Development Mailing List" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to bitcoindev+unsubscribe@googlegroups•com.
>>> To view this discussion visit
>>> https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAPv7TjZ_Z_zvBVnH2fH9EeSbOaVuvrVvHsuQD1bOmVq72uc2JA%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 7687 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-03 14:55               ` Ruben Somsen
@ 2025-05-03 15:54                 ` Greg Maxwell
  2025-05-03 16:24                   ` Ruben Somsen
  0 siblings, 1 reply; 16+ messages in thread
From: Greg Maxwell @ 2025-05-03 15:54 UTC (permalink / raw)
  To: Bitcoin Development Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 6273 bytes --]

Understood. My point was that this exists with the additive type too, If 
you repeat it N times it will cancel. If the salt isn't known you don't 
know the fewest repeats to achieve that. But even if you don't know the 
salt the size of the ring will do it.  So for example, this is a break of 
the passing suggestion of truncating to 32-bits (even 64-bits perhaps). You 
include the transaction with unknown accumulator impact x 2^32 times.   x * 
2^32 % 2^32 = 0.





On Saturday, May 3, 2025 at 2:56:13 PM UTC Ruben Somsen wrote:

> >I think you cannot escape the assumption that A is unique (well A and its 
> spend) thanks to TXID uniqueness
>
> A counter-example would be a chain with two transactions with the exact 
> same input A and output B. SwiftSync with XOR would basically treat these 
> two transactions as non-existent (the two A's and B's cancel each other 
> out), while a regular full node (or non-XOR SwiftSync) would reject the 
> chain.
>
> On Sat, May 3, 2025 at 3:57 PM Greg Maxwell <gmax...@gmail•com> wrote:
>
>> Hm. Fair point, but I think you cannot escape the assumption that A is 
>> unique (well A and its spend) thanks to TXID uniqueness without weakening 
>> the security argument, since A*n eventually collides.  It's essentially the 
>> same argument you make for characteristic 2, it just takes more 
>> repetitions.  Without knowing the salt you'd need 2^256 repetitions to be 
>> certain, but e.g. see the prior suggestions on truncating the hash to 32 
>> bits or whatever, a practical number of A repeats would then self cancel.
>>
>> To be clear I'm not arguing that it should be xor here, but pointing out 
>> there is structure here you might not have expected.
>>
>>
>>
>>
>>
>> On Sat, May 3, 2025 at 1:42 PM Ruben Somsen <rso...@gmail•com> wrote:
>>
>>>   Hey all,
>>>
>>>
>>> @Saint Wenhao
>>>
>>> >if you take the sum of hashes, which should be finally zero, then by 
>>> grinding UTXOs, someone could make it zero
>>>
>>> That's what the secret salt prevents. You can't grind for a certain 
>>> number if you don't know what the number is you are trying to collide with.
>>>
>>> >maybe you can avoid hashing at all [...] And then, it is all about 
>>> mixing the salt
>>>
>>> Without a concrete solution I'm afraid that's wishful thinking. That 
>>> last part of the sentence is why we currently need the hash, as well as for 
>>> adding more data in the non-assumevalid version.
>>>
>>>
>>> @Sanket Kanjalkar
>>>
>>> >What if instead of hash we encrypt with AES
>>>
>>> I can't really evaluate whether this might work, but I can see the line 
>>> of reasoning. Conceptually, I think what we need is something that 
>>> transforms the data into fixed length blocks for which an attacker can't 
>>> know the relationship between each block (i.e. via a secret). The 
>>> transformation needs to be the same on the input and output side.
>>>
>>>
>>> @Greg Maxwell
>>>
>>> >Your reduction function could just be xor
>>>
>>> I had initially ruled XOR out. Reason for this is that XOR would lead 
>>> one to conclude that sets [A, B, C, C], [A, B], [A, B, D, D], etc. are all 
>>> equivalent because any two values cancel each other out, regardless of 
>>> whether the sets are on the input or output side. Modular add/sub doesn't 
>>> have this issue. If the speedup actually turns out to be significant then 
>>> there may be some clever way to salvage it like by counting the total 
>>> number of inputs and outputs and relying on the knowledge that every txid 
>>> must be unique, but that's a lot harder to reason about.
>>>
>>> >even if its with a quite expensive hash function that the IBD 
>>> performance will be heavily bottlenecked in network and parallelism related 
>>> issues and be far from the lowest hanging fruit for a while,  considering 
>>> that this has eliminated the big sequential part and a number of annoying 
>>> to optimize components entirely
>>>
>>> Very true, and as you said, we can easily drop-in replace the hash 
>>> function at any future point we like, without adverse consequences.
>>>
>>>
>>> Cheers,
>>> Ruben
>>>
>>> On Sat, May 3, 2025 at 2:07 PM Greg Maxwell <gmax...@gmail•com> wrote:
>>>
>>>> On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:
>>>>
>>>> > hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - 
>>>> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))
>>>>
>>>> What if instead of hash we encrypt with AES and modular add/subs? I 
>>>> cannot prove it; but I also don't see a clear way this is broken. 
>>>>
>>>> 1. Sample random symmetric key `k`
>>>> 2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) - 
>>>> AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?
>>>>
>>>>
>>>> AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode 
>>>> would be unsuitable! (I mean sure modular add/sub and xor are different 
>>>> operations but they are quite close).  I think that in many modes the 
>>>> collision resistance would have to at least be restricted by the birthday 
>>>> bound with the small block size. I think CMC might be needed to avoid that 
>>>> sort of issue.
>>>>
>>>>  
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Bitcoin Development Mailing List" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to bitcoindev+...@googlegroups•com.
>>>> To view this discussion visit 
>>>> https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com 
>>>> <https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/a1e589d5-1bd9-4e4b-9824-db22fda77646n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 8595 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-03 15:54                 ` Greg Maxwell
@ 2025-05-03 16:24                   ` Ruben Somsen
  0 siblings, 0 replies; 16+ messages in thread
From: Ruben Somsen @ 2025-05-03 16:24 UTC (permalink / raw)
  To: Greg Maxwell; +Cc: Bitcoin Development Mailing List

[-- Attachment #1: Type: text/plain, Size: 7120 bytes --]

Ah I see, interesting observation. That's something I hadn't considered as
a potential problem if we truncated the hashes.

On Sat, May 3, 2025 at 6:11 PM Greg Maxwell <gmaxwell@gmail•com> wrote:

> Understood. My point was that this exists with the additive type too, If
> you repeat it N times it will cancel. If the salt isn't known you don't
> know the fewest repeats to achieve that. But even if you don't know the
> salt the size of the ring will do it.  So for example, this is a break of
> the passing suggestion of truncating to 32-bits (even 64-bits perhaps). You
> include the transaction with unknown accumulator impact x 2^32 times.   x *
> 2^32 % 2^32 = 0.
>
>
>
>
>
> On Saturday, May 3, 2025 at 2:56:13 PM UTC Ruben Somsen wrote:
>
>> >I think you cannot escape the assumption that A is unique (well A and
>> its spend) thanks to TXID uniqueness
>>
>> A counter-example would be a chain with two transactions with the exact
>> same input A and output B. SwiftSync with XOR would basically treat these
>> two transactions as non-existent (the two A's and B's cancel each other
>> out), while a regular full node (or non-XOR SwiftSync) would reject the
>> chain.
>>
>> On Sat, May 3, 2025 at 3:57 PM Greg Maxwell <gmax...@gmail•com> wrote:
>>
>>> Hm. Fair point, but I think you cannot escape the assumption that A is
>>> unique (well A and its spend) thanks to TXID uniqueness without weakening
>>> the security argument, since A*n eventually collides.  It's essentially the
>>> same argument you make for characteristic 2, it just takes more
>>> repetitions.  Without knowing the salt you'd need 2^256 repetitions to be
>>> certain, but e.g. see the prior suggestions on truncating the hash to 32
>>> bits or whatever, a practical number of A repeats would then self cancel.
>>>
>>> To be clear I'm not arguing that it should be xor here, but pointing out
>>> there is structure here you might not have expected.
>>>
>>>
>>>
>>>
>>>
>>> On Sat, May 3, 2025 at 1:42 PM Ruben Somsen <rso...@gmail•com> wrote:
>>>
>>>>   Hey all,
>>>>
>>>>
>>>> @Saint Wenhao
>>>>
>>>> >if you take the sum of hashes, which should be finally zero, then by
>>>> grinding UTXOs, someone could make it zero
>>>>
>>>> That's what the secret salt prevents. You can't grind for a certain
>>>> number if you don't know what the number is you are trying to collide with.
>>>>
>>>> >maybe you can avoid hashing at all [...] And then, it is all about
>>>> mixing the salt
>>>>
>>>> Without a concrete solution I'm afraid that's wishful thinking. That
>>>> last part of the sentence is why we currently need the hash, as well as for
>>>> adding more data in the non-assumevalid version.
>>>>
>>>>
>>>> @Sanket Kanjalkar
>>>>
>>>> >What if instead of hash we encrypt with AES
>>>>
>>>> I can't really evaluate whether this might work, but I can see the line
>>>> of reasoning. Conceptually, I think what we need is something that
>>>> transforms the data into fixed length blocks for which an attacker can't
>>>> know the relationship between each block (i.e. via a secret). The
>>>> transformation needs to be the same on the input and output side.
>>>>
>>>>
>>>> @Greg Maxwell
>>>>
>>>> >Your reduction function could just be xor
>>>>
>>>> I had initially ruled XOR out. Reason for this is that XOR would lead
>>>> one to conclude that sets [A, B, C, C], [A, B], [A, B, D, D], etc. are all
>>>> equivalent because any two values cancel each other out, regardless of
>>>> whether the sets are on the input or output side. Modular add/sub doesn't
>>>> have this issue. If the speedup actually turns out to be significant then
>>>> there may be some clever way to salvage it like by counting the total
>>>> number of inputs and outputs and relying on the knowledge that every txid
>>>> must be unique, but that's a lot harder to reason about.
>>>>
>>>> >even if its with a quite expensive hash function that the IBD
>>>> performance will be heavily bottlenecked in network and parallelism related
>>>> issues and be far from the lowest hanging fruit for a while,  considering
>>>> that this has eliminated the big sequential part and a number of annoying
>>>> to optimize components entirely
>>>>
>>>> Very true, and as you said, we can easily drop-in replace the hash
>>>> function at any future point we like, without adverse consequences.
>>>>
>>>>
>>>> Cheers,
>>>> Ruben
>>>>
>>>> On Sat, May 3, 2025 at 2:07 PM Greg Maxwell <gmax...@gmail•com> wrote:
>>>>
>>>>> On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:
>>>>>
>>>>> > hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) -
>>>>> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))
>>>>>
>>>>> What if instead of hash we encrypt with AES and modular add/subs? I
>>>>> cannot prove it; but I also don't see a clear way this is broken.
>>>>>
>>>>> 1. Sample random symmetric key `k`
>>>>> 2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) -
>>>>> AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?
>>>>>
>>>>>
>>>>> AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode
>>>>> would be unsuitable! (I mean sure modular add/sub and xor are different
>>>>> operations but they are quite close).  I think that in many modes the
>>>>> collision resistance would have to at least be restricted by the birthday
>>>>> bound with the small block size. I think CMC might be needed to avoid that
>>>>> sort of issue.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Bitcoin Development Mailing List" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to bitcoindev+...@googlegroups•com.
>>>>> To view this discussion visit
>>>>> https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/bitcoindev/fbf06c5b-57b6-4615-99bb-3a7ea31ebf22n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>> --
> You received this message because you are subscribed to the Google Groups
> "Bitcoin Development Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bitcoindev+unsubscribe@googlegroups•com.
> To view this discussion visit
> https://groups.google.com/d/msgid/bitcoindev/a1e589d5-1bd9-4e4b-9824-db22fda77646n%40googlegroups.com
> <https://groups.google.com/d/msgid/bitcoindev/a1e589d5-1bd9-4e4b-9824-db22fda77646n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAPv7TjZes6F0%3DgMsM6t19H89AWVJ30jYTJqb4fmVMVRm0ZeD6w%40mail.gmail.com.

[-- Attachment #2: Type: text/html, Size: 9419 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-03 12:02         ` Greg Maxwell
  2025-05-03 13:42           ` Ruben Somsen
@ 2025-05-03 13:53           ` Weikeng Chen
  2025-05-04  1:06           ` Nagaev Boris
  2 siblings, 0 replies; 16+ messages in thread
From: Weikeng Chen @ 2025-05-03 13:53 UTC (permalink / raw)
  To: Bitcoin Development Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 2942 bytes --]

I want to add a footnote that, there could be a security complication for 
either using SHA-256 or AES:

for this to be secure:
hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - 
hash(UTXO_D||salt) == 0

either:
- using a regular hash or AES_k, which should work in the same way, but 
salt/AES key needs to have sufficient security and only known by trusted 
parties (e.g., the user who is computing the sum). aka, the hash sum would 
be the user's own bookkeeper, and other people should not trust that result.

or:
- using a significantly longer hash function, although it should still be 
performant enough. A paper from Facebook "Securing Update Propagation with 
Homomorphic Hashing" has cited that:

> AdHash initially received the most attention by several works which aimed 
to implement the construction [SY98, CL99, GSC01], each using a 128-bit or 
256-bit modulus. However, Wagner [Wag02] later showed an attack on the 
generalized birthday problem which could be used to find collisions for 
AdHash on an n-bit modulus in time O(2^{2\sqrt{n})), and that the AdHash 
modulus needs to be greater than 1600 bits long to provide 80-bit security. 
Lyubashevsky [Lyu05] and Shallue [Sha08] showed how to solve the Random 
Modular Subset Sum problem (essentially equivalent to finding collisions in 
AdHash) in time O(2^{n^\epsilon}) for any \epsilon < 1, which indicates 
that AdHash requires several more orders of magnitude larger of a modulus 
just to provide 80-bit security.

On Saturday, May 3, 2025 at 8:07:54 PM UTC+8 Greg Maxwell wrote:

> On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:
>
> > hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - 
> hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))
>
> What if instead of hash we encrypt with AES and modular add/subs? I cannot 
> prove it; but I also don't see a clear way this is broken. 
>
> 1. Sample random symmetric key `k`
> 2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) - 
> AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?
>
>
> AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode 
> would be unsuitable! (I mean sure modular add/sub and xor are different 
> operations but they are quite close).  I think that in many modes the 
> collision resistance would have to at least be restricted by the birthday 
> bound with the small block size. I think CMC might be needed to avoid that 
> sort of issue.
>
>  
>

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/9f9f0b4d-98b0-4e41-a1c3-903ee05da462n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 3765 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [bitcoindev] Re: SwiftSync - smarter synchronization with hints
  2025-05-03 12:02         ` Greg Maxwell
  2025-05-03 13:42           ` Ruben Somsen
  2025-05-03 13:53           ` Weikeng Chen
@ 2025-05-04  1:06           ` Nagaev Boris
  2 siblings, 0 replies; 16+ messages in thread
From: Nagaev Boris @ 2025-05-04  1:06 UTC (permalink / raw)
  To: Greg Maxwell; +Cc: Bitcoin Development Mailing List

On Sat, May 3, 2025 at 9:07 AM Greg Maxwell <gmaxwell@gmail•com> wrote:
>
> On Saturday, May 3, 2025 at 11:55:28 AM UTC Sanket Kanjalkar wrote:
>
> > hash(UTXO_A||salt) + hash(UTXO_B||salt) - hash(UTXO_C||salt) - hash(UTXO_D||salt) == 0 (proving (A==C && B==D) || (A==D && B==C))
>
> What if instead of hash we encrypt with AES and modular add/subs? I cannot prove it; but I also don't see a clear way this is broken.
>
> 1. Sample random symmetric key `k`
> 2. Instead of above; AES_k(UTXO_A) + AES_k(UTXO_B) - AES_k(UTXO_C) - AES(UTXO_D) == 0 =>  (proving (A==C && B==D) || (A==D && B==C))?
>
>
> AES in CTR mode is, I'm not sure about other modes? Obviously CTR mode would be unsuitable! (I mean sure modular add/sub and xor are different operations but they are quite close).  I think that in many modes the collision resistance would have to at least be restricted by the birthday bound with the small block size. I think CMC might be needed to avoid that sort of issue.

Can Haraka V2 [1] hash function be used? It is based on AES and
supports 256 or 512 bit inputs. UTXO (txid + index) has a fixed size
and fits into 320 bits. We can use the 512 bit version and just leave
the remaining bytes zero.

[1] https://github.com/kste/haraka

-- 
Best regards,
Boris Nagaev

-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+unsubscribe@googlegroups•com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/CAFC_Vt6BgUFt5%2BzbSKzHyN4Sk1nFw5hbXVcv_jdUzA5RJvwFBg%40mail.gmail.com.


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-05-04  9:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-09 10:10 [bitcoindev] SwiftSync - smarter synchronization with hints Ruben Somsen
2025-05-02  6:47 ` [bitcoindev] " Greg Maxwell
2025-05-02 10:59   ` Ruben Somsen
2025-05-02 13:38     ` Saint Wenhao
2025-05-02 16:07     ` Greg Maxwell
2025-05-02 19:15       ` Saint Wenhao
2025-05-02 20:23       ` Sanket Kanjalkar
2025-05-03 12:02         ` Greg Maxwell
2025-05-03 13:42           ` Ruben Somsen
2025-05-03 13:57             ` Greg Maxwell
2025-05-03 14:36               ` Greg Maxwell
2025-05-03 14:55               ` Ruben Somsen
2025-05-03 15:54                 ` Greg Maxwell
2025-05-03 16:24                   ` Ruben Somsen
2025-05-03 13:53           ` Weikeng Chen
2025-05-04  1:06           ` Nagaev Boris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox