Everything you say is true.

However, branchless does reduce the attack surface considerably - if nothing else, it significantly ups the difficulty of an attack for a relatively low cost in program complexity, and that might still make it worth doing.

As for uniform memory access, if we avoided any kind of heap allocation, wouldn't we avoid such issues?

Anyhow, without having gone into the full details of this particular attack, it seems the main attack point is differences in how squaring and multiplication (in the case of field exponentiation) or doubling and point addition (in the case of ECDSA) are performed. I believe using a branchless implementation where each phase of the operation executes the exact same code and accesses the exact same stack frames would not be vulnerable to FLUSH+RELOAD.

"To be able to recover the sequence of point additions and doublings, the spy program should distinguish
between consecutive doubling operations and must be able to order them with respect to point additions.
Our spy program achieves this by setting the time slot to less than half the length of the group operations.
With the selected curve, group add operations take 7,928 cycles on average, while group double operation
take 7,601 cycles. Setting the time slot to 3,000 cycles ensures that there is an empty time slot within any
group operation, allowing our spy to correctly distinguish consecutive doubles"

The approach I've suggested makes doubling operations indistinguishable from point additions from the perspective of cache access.

On Mar 5, 2014, at 1:44 PM, Gregory Maxwell <gmaxwell@gmail.com> wrote:

> On Wed, Mar 5, 2014 at 1:31 PM, Eric Lombrozo <elombrozo@gmail.com> wrote:
>> If we don't mind sacrificing some performance when signing, there's a fairly
>> simple way to implement a constant-time constant-cache-access-pattern
>> secp256k1.
>> It is based on the idea of branchless implementations of the field and group
>> operations.
> 
> Do take care that branchless doesn't mean side-channel free: On
> non-trivial hardware you must have uniform memory accesses too.
> 
> (and that itself isn't enough for sidechannel freeness against an
> attacker that can do power analysis... then you star worrying about
> the internal structure your primitive adders and the hamming weight of
> your numbers, and needing to build hardware that uses differential
> logic, and yuck yuck yuck:  This is why you still shouldn't reuse
> addresses, and why a blinding approach may still be sensible, even if
> you believe your implementation is hardened against side-channels)