On Friday, May 2, 2025 at 11:01:10 AM UTC Ruben Somsen wrote:
I don't see a practical way to do this without compromising the benefits of SwiftSync because of the "later find them being spent" part. For one, it would require doing a lookup for each input to see if it's in your UTXO set, something we currently don't need to do at all. Secondly, it would require blocks to be processed in order, limiting parallelization. The space savings would also be negligible, considering the hint data is already <100MB (compressed).
Doh. Right. Got too excited.
I think most of the remaining optimization potential (other than the engineering work to enable parallel validation) is in the hash aggregate, like the midstate reuse. Is there a faster alternative to sha256? Can we get away with 16 byte hashes? I could be mistaken, but in theory it seems we could even get away with 4 byte hashes if we allowed for a 1 in 4 billion chance of accidentally accepting an invalid chain. Leaving consensus to chance feels philosophically improper, but it's a pretty safe bet considering it also involves PoW to trigger and even then would only affect one or two random unlucky people on Earth.
I think the problem isn't so much the size of the hash output, but rather the property you need is that each salt gives you a different hash function such that the attacker cannot predictably create collisions. The most expedient way to get there is a cryptographic hash function, which limits you lower performance choices. Your reduction function could just be xor and if it is I doubt the output size itself matters much for performance... and my guess is that e.g. xor with 32 byte hashes will have better performance than addition with 16 bytes.
It doesn't need to be so in the initial implementation but ultimately it may make sense for the host to benchmark available hashes and pick the fastest. SHA-256 will easily be a winner on hardware with SHA-NI or similar. But there are other cryptographic hashes in the codebase that might be faster on systems without sha256 hardware support.
There are argument I can see to prove the security of simpler hashes that only work if you assume that the attacker could only insert specific finite numbers of bad changes... but they really have completely full control of the hash function input except for the salt and that I think makes it hard to say much positive about the security of any optimization except truncating a secure hash (and I don't think truncating will win you much security).
In terms of security keep in mind that a prospective attacker needs to also perform POW to get their attack chain up to the users accepted chain tip, which means that they have to do the work between the AV point and the tip assuming the user isn't isolated. This fact could be used to justify a rather weak hash function, but I think it's not really worth a lot of analysis ahead of the initial functionality. I'm guessing that once this is implemented, even if its with a quite expensive hash function that the IBD performance will be heavily bottlenecked in network and parallelism related issues and be far from the lowest hanging fruit for a while, considering that this has eliminated the big sequential part and a number of annoying to optimize components entirely.