[bitcoin-dev] Libre/Open blockchain / cryptographic ASICs

public inbox for bitcoindev@googlegroups.com

* [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
@ 2021-01-25 18:00 Luke Kenneth Casson Leighton
  2021-01-26 10:47 ` Pavol Rusnak
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Luke Kenneth Casson Leighton @ 2021-01-25 18:00 UTC
  To: bitcoin-dev

folks, hi, please do cc me as i am subscribed "digest", apologies for the
inconvenience.

i've been speaking on and off with kanzure, asking his advice about a libre
/ transparently-developed ASIC / SoC, for some time, since meeting a very
interesting person at the Barcelona RISC-V Workshop in 2018.

this person pointed out that FIPS-approved algorithms, implemented in
FIPS-approved crypto-chips used in hardware wallets to protect billions to
trillions in cryptocurrency assets world-wide are basically asking for
trouble.  i heard 3rd-hand that the constants used in the original bitcoin
protocol were very deliberately changed from those approved by FIPS and the
NSA for exactly the reasons that drive people to question whether it is a
good idea to trust closed and secretive crypto-chips, no matter how
well-intentioned the company that manufactures them.  the person i met was
there to "sound out" interested parties willing to help with such a
venture, even to the extent of actually buying a Foundry, in order to
guarantee that the crypto-chip they would like to see made had not been
tampered with at any point during manufacturing.

at FOSDEM2019 i was also approached by a team that also wanted to do a
basic "embedded" processor, entirely libre-licensed, only in 350nm or
180nm, with just enough horsepower to do digital signing and so on.  since
then, fascinatingly, NLnet has obtained a new EU Horizon Grant and started
their "Assure" Programme:
https://nlnet.nl/assure/

(our application may be found here):
https://libre-soc.org/nlnet_2021_crypto_router/

in addition, betrusted (headed by Bunnie Huang) is also funded by NLnet and
is along similar lines:
https://betrusted.io/

NLnet is even funding LibreSOC with a 180nm test chip tape-out of the
LibreSOC Core, with help from Sorbonne University and
https://chips4makers.io
https://bugs.libre-soc.org/show_bug.cgi?id=199

and we also have funding to do Formal Correctness Proofs for the low-level
portions of the HDL (similar to c++ and python "assert", but for hardware)
https://bugs.libre-soc.org/show_bug.cgi?id=158

the point being that where even one year ago the idea of an open source
developer creating and paying for an actual ASIC was so ridiculous they
would be laughed at and viewed in a derisive fashion thereafter, reality is
that things are opening up to the point where even Foundry PDKs are now
open source:
https://github.com/google/skywater-pdk

technically it is possible to use Open Hardware to create commercial
(closed) products.  Richard Herveille, most well-known for his early
involvement in Opencores, was the Open Hardware developer responsible for
the HDL behind the first Antminer product by Bitmain, for example.  It used
his RV32 core and i believe he also developed the SHA256 HDL for them.
however that is different in that it was a closed product, not open for
independent public audit and review.

what i am therefore trying to say is that it is a genuinely achievable
goal, now, to create fully transparently-openly-developed ASICs that could
perform cryptographic tasks such as mining and hardware wallet key
protection *and have a full audit trail* even to the extent of having
mathematical Formal Correctness Proofs.

my question - therefore - with all that background in mind - is: is this
something that is of interest?

now, before getting all excited about the possibilities, it's critically
important to provide a reality-check on the costs involved:

* 350nm ASICs: https://chips4makers.io - EUR 1750 for 20 samples
* 180nm ASICs: EUR 600 per mm^2 MPW Shuttle (test ASICs) and EUR 50,000
for production masks
* ... exponential curve going through 130nm, 65nm, 45nm gets to around
$500k...
* 28nm ASICs: USD 100,000 for MPW and USD 1 million for production masks
* 22nm ASICs: double 28nm
* 14nm: double 22nm
* 7nm: quadruple 14nm
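
As a rough worked example of the multipliers in the list above, the sketch below (Python) starts from the 28nm figures and applies "double", "double", "quadruple" in turn. The dictionary and variable names are illustrative only, and the results are no more precise than the estimates they are derived from.

```python
# Starting figures for 28nm, taken from the list above (rough estimates).
costs_usd = {"28nm": {"mpw": 100_000, "masks": 1_000_000}}

# (node, previous node, multiplier) as stated in the list above.
steps = [("22nm", "28nm", 2), ("14nm", "22nm", 2), ("7nm", "14nm", 4)]

for node, previous, factor in steps:
    costs_usd[node] = {
        kind: factor * amount for kind, amount in costs_usd[previous].items()
    }

for node, entry in costs_usd.items():
    print(f"{node}: MPW USD {entry['mpw']:,}, masks USD {entry['masks']:,}")
```

Under these assumptions the 7nm production masks alone come to USD 16 million.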

you get where that is going.  where higher geometries are now easily within
reach even of a hobbyist ASIC developer, USD 20 million is a bare minimum
to design, develop and bring to manufacture a 7nm Custom ASIC.  full-custom
silicon, as carried out regularly by Intel, is USD 100 million.

this is not to say that it is completely outside the realm of possibility
to do something in these lower geometries: you either simply have to have a
damn good reason, or a hell of a lot of money, or a product that's so
compelling that customers really *really* want it, or you have OEMs lining
up to sign LOIs or put up cash-with-preorder.

[my personal favourite is a focus on power-efficiency: battery-operated
hand-held devices at or below 3.5 watts (thus not requiring thermal pipes
or fans - which tend to break). i have to admit i am a little alarmed at
the world-wide energy consumption of bitcoin: personally i would very much
prefer to be involved in eco-conscious blockchain and crypto-currency
products].

so - as an open question: what would people really like to see happen,
here, what do people feel would be of interest to the wider bitcoin
community, and, crucially, is there a realistic way to bridge (fund) the
gap and actually deliver to the bitcoin user community?

best,

l.

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-01-25 18:00 [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs Luke Kenneth Casson Leighton
@ 2021-01-26 10:47 ` Pavol Rusnak
  2021-02-03  2:06 ` ZmnSCPxj
  2021-02-03  3:17 ` ZmnSCPxj
  2 siblings, 0 replies; 13+ messages in thread
From: Pavol Rusnak @ 2021-01-26 10:47 UTC
  To: Luke Kenneth Casson Leighton, Bitcoin Protocol Discussion

On Tue, 26 Jan 2021 at 00:10, Luke Kenneth Casson Leighton via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> so - as an open question: what would people really like to see happen,
> here, what do people feel would be of interest to the wider bitcoin
> community, and, crucially, is there a realistic way to bridge (fund) the
> gap and actually deliver to the bitcoin user community?

Hi Luke!

Very excited to hear more about your effort! Recently, SatoshiLabs
(creators of Trezor) spun off a new entity, Tropic Square[1], which has
the same goal - to bring truly open security chips to the general
public. I'll send you another email, where we can arrange a call to
exchange ideas.

[1] https://tropicsquare.com

-- 
Best Regards / S pozdravom,

Pavol "stick" Rusnak
CTO, SatoshiLabs

* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-01-25 18:00 [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs Luke Kenneth Casson Leighton
  2021-01-26 10:47 ` Pavol Rusnak
@ 2021-02-03  2:06 ` ZmnSCPxj
  2021-02-03 13:24   ` Luke Kenneth Casson Leighton
  2021-02-03  3:17 ` ZmnSCPxj
  2 siblings, 1 reply; 13+ messages in thread
From: ZmnSCPxj @ 2021-02-03  2:06 UTC
  To: Luke Kenneth Casson Leighton, Bitcoin Protocol Discussion

Good morning Luke,

I happen to have experience designing digital ASICs, mostly pipelined data processing.
However my experience is limited to larger geometries and in SystemVerilog.

On the technical side, as I understand it (I have been out of that industry for 4 years now, so my knowledge may be obsolete) as you approach lower geometries, you also start approaching analog design.
In our case we were already manually laying out gates and flip-flops (or replacing flip-flops with level-triggered latches and being extra careful with clocks) to squeeze performance (and area) for some of the more boring parts (i.e. just deserialization of data from a high-frequency low bus width to a lower-frequency wide bus width).

Formal correctness proofs are nice, but we were impeded from using those because of the need to manually lay out devices, meaning the netlist did not correspond exactly to an RTL that formal correctness could understand.
Though to be fair most of the circuit was standard RTL->synthesized netlist and formal correctness proofs worked perfectly well for those.
Many of the formal correctness proofs were really about the formal equivalence of the netlist to the RTL; the correctness of the RTL was "proved" by simulation testing.
(to be fair, there were tools to force you to improve coverage by injecting faults to your RTL, e.g. it would virtually flip an `&&` to an `||` and if none of your tests signaled an error it would complain that your test coverage sucked.)
Things might have changed.

A good RTL would embed SystemVerilog Assertions or PSL Assertions as well.
Some formal verification tools can understand a subset of SystemVerilog Assertions / PSL assertions and validate that your RTL conformed to the assertions, which would probably help cut down on the need for RTL simulation.

Overall, my understanding is that smaller geometries are needed only if you want to target a really high performance / unit cost and performance / energy consumption ratios.
That is, you would target smaller geometries for mining.

If you need a secure tr\*sted computing module that does not need to be fast or cheap, just very accurate to the required specification, the larger geometries should be fine and you would be able to live almost entirely in RTL-land without diving into netlist and layout specifications.

A wrinkle here is that licenses for tools from tr\*sted vendors like Synopsys or Cadence are ***expensive***.
What is more, you should really buy two sets of licenses, e.g. do logic synthesis with Synopsys and then formal verification with Cadence, because you do not want to fully tr\*st just one vendor.
Synthesis in particular is a black box and each vendor keeps their particular implementations and tricks secret.

Pointing some funding at the open-source Icarus Verilog might also fit, as it lost its ability to do synthesis more than a decade ago due to inability to maintain.
Icarus Verilog only supports Verilog-2001 and only has very very partial support for SystemVerilog (though to be fair, there is little that SystemVerilog adds that can be used in RTL --- `always_comb` and `always_ff` come to mind, as well as assertions, and I think recent Icarus has started experimental support for those for `always` variants).
Note as well that I heard (at the time when I was in the industry) that some foundries will not even accept a netlist unless it was created by a synthesis tool from one of the major vendors (Synopsys, Cadence, Mentor Graphics, maybe more I have forgotten since).

Regards,
ZmnSCPxj

* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-01-25 18:00 [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs Luke Kenneth Casson Leighton
  2021-01-26 10:47 ` Pavol Rusnak
  2021-02-03  2:06 ` ZmnSCPxj
@ 2021-02-03  3:17 ` ZmnSCPxj
  2021-02-03 14:07   ` Luke Kenneth Casson Leighton
  2 siblings, 1 reply; 13+ messages in thread
From: ZmnSCPxj @ 2021-02-03  3:17 UTC
  To: Luke Kenneth Casson Leighton, Bitcoin Protocol Discussion

Good morning again Luke,

> [my personal favourite is a focus on power-efficiency: battery-operated hand-held devices at or below 3.5 watts (thus not requiring thermal pipes or fans - which tend to break). i have to admit i am a little alarmed at the world-wide energy consumption of bitcoin: personally i would very much prefer to be involved in eco-conscious blockchain and crypto-currency products].

If you mean miner power usage, then power efficiency will not reduce energy consumption.

Suppose you are a miner.
Suppose you have access to 1 watt of energy at a particular fixed cost of 1 BTC per watt, and you currently have hardware that gives 1 Exahash for 1 watt of energy usage.
Suppose this 1 Exahash earns 2 BTC (and that is why you mine, you earn 1 BTC).

Now suppose there is a new technology where hardware can give 1 Exahash for only 0.5 watt of energy usage.
Your choices are:

* Buy only one unit, get 1 Exahash for 0.5 watt, thus getting 2.0 BTC while only paying 0.5 BTC in electricity fees for a net of 1.5 BTC.
* Buy two units, get 2 Exahash for 1.0 watt, thus getting 4.0 BTC while only paying 1.0 BTC in electricity fees for a net of 3.0 BTC.

What do you think your better choice is?

That assumes that difficulty adjustments do not occur.
If difficulty adjustments are put into consideration, then if everyone *else* does the second choice, global mining hashrate doubles and the difficulty adjustment matches, and if you took the first choice, you would end up earning far less than 2.0 BTC after the difficulty adjustment.

Thus, any rational miner will just pack more miners in the same number of watts rather than reduce their watt consumption.
There may be physical limits involved (only so many miners you can put in an amount of space, or whatever other limits) but absent those, a rational miner will not reduce their energy expenditure with higher-efficiency units, they will buy more units.

Thus, increasing power efficiency for mining does not reduce the amount of actual energy that will be consumed by Bitcoin mining.
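
The arithmetic of the two choices, before and after a difficulty adjustment, can be sketched as follows (Python). This is only a toy model of the example above: `net_btc` and its `network_growth` parameter are hypothetical names introduced here, and the model assumes the reward per Exahash scales down exactly in proportion to global hashrate.

```python
def net_btc(units, network_growth=1.0):
    """Net earnings in BTC for running `units` new-generation miners.

    Figures follow the toy example above: each new unit gives 1 Exahash
    for 0.5 watt, 1 Exahash earns 2 BTC before any difficulty change,
    and power costs 1 BTC per watt.  `network_growth` (an assumption of
    this sketch) is the factor by which global hashrate has grown.
    """
    exahash = units * 1.0
    watts = units * 0.5
    revenue = exahash * 2.0 / network_growth
    power_cost = watts * 1.0
    return revenue - power_cost


for growth in (1.0, 2.0):
    for units in (1, 2):
        print(f"hashrate x{growth}: {units} unit(s) -> net {net_btc(units, growth)} BTC")
```

In this model, once global hashrate has doubled, the two-unit miner is back to a net of 1.0 BTC on the same 1 watt as before, while the one-unit miner nets only 0.5 BTC.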

If you are not referring to mining energy, then I think a computer running BitTorrent software 24/7 would consume about the same amount of energy as a fullnode running Bitcoin software 24/7, and I do not think the energy consumed thus is actually particularly high relative to a lot of other things.

Regards,
ZmnSCPxj

* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-02-03  2:06 ` ZmnSCPxj
@ 2021-02-03 13:24   ` Luke Kenneth Casson Leighton
  2021-02-11  8:20     ` ZmnSCPxj
  0 siblings, 1 reply; 13+ messages in thread
From: Luke Kenneth Casson Leighton @ 2021-02-03 13:24 UTC
  To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion

(hi folks do cc me, i am subscribed digest, thank you for doing that,
ZmnSCPxj)

On Wednesday, February 3, 2021, ZmnSCPxj <ZmnSCPxj@protonmail•com> wrote:
> Good morning Luke,
>
> I happen to have experience designing digital ASICs, mostly pipelined
> data processing.
> However my experience is limited to larger geometries and in
> SystemVerilog.

larger geometries for a hardware wallet ASIC is ok (as long as it is not
retail based and trying to run e.g. RSA, taking so long to complete that
the retail customer walks out)

> On the technical side, as I understand it (I have been out of that
> industry for 4 years now, so my knowledge may be obsolete)

not at all! still very valuable

> as you approach lower geometries, you also start approaching analog
> design.

yyeah i could intuitively tell/guess there might be something like this
which would throw a spanner in the works, it is why the grant request i put
in specifically excluded data-dependent constant time analysis and also
power analysis.

> In our case we were already manually laying out gates and flip-flops (or
> replacing flip-flops with level-triggered latches and being extra careful
> with clocks) to squeeze performance (and area) ...

ya-howw :)

> Many of the formal correctness proofs were really about the formal
> equivalence of the netlist to the RTL; the correctness of the RTL was
> "proved" by simulation testing.

thanks to SymbiYosys we are using formal proofs much more extensively, as
effectively a 100% coverage replacement for unit tests.

an example is popcount.  we did two versions.  one is a recursive tree
algorithm, almost impossible to read and understand what the hell it does.

the other is a total braindead 1-liner "x = x + input[i]", rubbish
performance though.

running a formal proof on these gave us 100% confidence that the complex
optimised version does the damn job.
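
a rough illustration of the idea, in Python rather than HDL: the two functions below are hypothetical stand-ins for the two popcount versions described above, and the exhaustive loop is only a brute-force substitute for a formal proof (it is feasible here because the width is small; a solver-based proof would cover all inputs without enumerating them).

```python
def popcount_naive(x, width):
    """Reference version: add up the bits one at a time."""
    total = 0
    for i in range(width):
        total = total + ((x >> i) & 1)
    return total


def popcount_tree(x, width):
    """Recursive tree version: split the word in two and sum the halves."""
    if width == 0:
        return 0
    if width == 1:
        return x & 1
    half = width // 2
    low = x & ((1 << half) - 1)
    high = (x >> half) & ((1 << (width - half)) - 1)
    return popcount_tree(low, half) + popcount_tree(high, width - half)


def equivalent(width):
    """Compare both versions on every input of the given width."""
    return all(
        popcount_naive(x, width) == popcount_tree(x, width)
        for x in range(1 << width)
    )


print(equivalent(10))
```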

yes we still do unit tests, these are more "demo code".

now, the caveat is that you have to have a model of the "dut" (device under
test) against which to compare, and if the dut is ridiculously complex then
the formal model variant, which has to do the same job, ends up equally as
complex (or effectively a duplicate of the dut) and the exercise is a bit
of a waste of time...

...*unless*... there happens to be other implementations out there.  then
the proof can be run against those and everybody wins through collaboration.

now, here's why i put in the NLnet Grant request to explore going back to
the mathematics of crypto-primitives.

many ISAs e.g. intel AVX2 have added GFMULT8 etc etc because that does
S-Boxes for Rijndael.  they have gone mad by analysing algorithms trying to
fit them to standard ISAs.

nobody does Rijndael S-Boxes any way other than 256-entry lookup tables
because no standard ISA has general-purpose Galois Field Multiply.

consequently implementations in assembler get completely divorced from the
original mathematics on which the cryptographic algorithm was based.
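
for reference, a minimal Python sketch of the mathematics behind a Rijndael S-Box entry: multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, the multiplicative inverse, and the affine step. the function names are illustrative only, and this is a slow, readable software model, not a proposal for how an opcode or a production implementation would be written.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply a and b in GF(2^8), reducing by the Rijndael polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce when the degree reaches 8
        b >>= 1
    return result


def gf256_inv(a):
    """Multiplicative inverse computed as a^254; 0 is mapped to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(a):
    """Rijndael S-Box entry: field inverse followed by the affine transform."""
    inv = gf256_inv(a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


print(hex(sbox(0x53)))
```

the usual 256-entry table is simply `[sbox(a) for a in range(256)]` computed once in advance.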

the approach i would like to take is, "hang on a minute: how far would you
get if you actually added *general-purpose* instructions that *directly*
provided the underlying mathematical principles, and then wrapped a
Vector-Matrix Engine around them?".

would this drastically simplify algorithms to the point where *READABLE* c
code compiles directly to opcodes that run screamingly fast, outperforming
hand-optimised SIMD code using standard ISAs?

then, given the Formal Correctness approach above, can we verify that the
mathematically-related opcodes do the job?

> (to be fair, there were tools to force you to improve coverage by
> injecting faults to your RTL, e.g. it would virtually flip an `&&` to an
> `||` and if none of your tests signaled an error it would complain that
> your test coverage sucked.)

nice!

> Things might have changed.

nah.  this is such a complex area, run by few incumbent players, that
innovation is rare.  not least, innovation is different and cannot be
trusted by the Foundries!

> A good RTL would embed SystemVerilog Assertions or PSL Assertions as well.
> Some formal verification tools can understand a subset of SystemVerilog
> Assertions / PSL assertions and validate that your RTL conformed to the
> assertions, which would probably help cut down on the need for RTL
> simulation.

interesting.

> Overall, my understanding is that smaller geometries are needed only if
> you want to target a really high performance / unit cost and performance /
> energy consumption ratios.
> That is, you would target smaller geometries for mining.

yes.

> If you need a secure tr\*sted computing module that does not need to be
> fast or cheap, just very accurate to the required specification, the larger
> geometries should be fine and you would be able to live almost entirely in
> RTL-land without diving into netlist and layout specifications.

hardware wallet ASICs.

i concur.

> A wrinkle here is that licenses for tools from tr\*sted vendors like
> Synopsys or Cadence are ***expensive***.

yes they are :)  we are currently working with Sorbonne University LIP6.fr
and Staf Verhaegen from Chips4Makers, trying a different approach:
coriolis2.

this will do fine up to 130nm (skywater).  beyond that, mmm, we need a few
more years.

> What is more, you should really buy two sets of licenses, e.g. do logic
> synthesis with Synopsys and then formal verification with Cadence, because
> you do not want to fully tr\*st just one vendor.

interesting, good advice.

> Synthesis in particular is a black box and each vendor keeps their
> particular implementations and tricks secret.

sigh.  i think that's partly because they have to insert diodes, and
buffers, and generally mess with the netlist.

i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!

plus, they make an awful lot of money, it is good business.

> Pointing some funding at the open-source Icarus Verilog might also fit,
> as it lost its ability to do synthesis more than a decade ago due to
> inability to maintain.

ah i didn't know it could do synthesis at all! i thought it was simulation
only.

> Note as well that I heard (at the time when I was in the industry) that
> some foundries will not even accept a netlist unless it was created by a
> synthesis tool from one of the major vendors (Synopsys, Cadence, Mentor
> Graphics, maybe more I have forgotten since).

yes i heard this too, they don't want their time wasted: after all they
only make money by selling wafers, and if they can't sell any they have to
run empty wafers to keep the equipment at operating temperature.

if you book a slot 18 months in advance and the RTL doesn't work during
testing 3 months before the deadline they may not be able to find someone
else in time.

anything to reduce the risk there is good, so i totally get why.

thank you for the insights and the discussion, really appreciated.

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-02-03  3:17 ` ZmnSCPxj
@ 2021-02-03 14:07   ` Luke Kenneth Casson Leighton
  0 siblings, 0 replies; 13+ messages in thread
From: Luke Kenneth Casson Leighton @ 2021-02-03 14:07 UTC
  To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion

On Wednesday, February 3, 2021, ZmnSCPxj <ZmnSCPxj@protonmail•com> wrote:
> Good morning again Luke,

:)

> If you mean miner power usage, then power efficiency will not reduce
> energy consumption.

> Thus, any rational miner will just pack more miners in the same number of
> watts rather than reduce their watt consumption.

yes, of course.  the same non-consumer-computing-intuitive logic applies to
purchasing decisions for beowulf clusters.

> Thus, increasing power efficiency for mining does not reduce the amount
> of actual energy that will be consumed by Bitcoin mining.

arse.

and if everybody does that, then no matter the performance/watt nobody
"wins".  in fact a case could be made that everybody "loses".

my biggest concern here is that the inherent "arms race" results in very
few players being able to create bitcoin mining ASICs *at all*.

i mentioned earlier that geometry costs are on an exponential scale.  3nm must
be somewhere around USD 16 million for production masks.

if there are only a few players that leaves the entirety of bitcoin open to
hardware backdoors.

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-02-03 13:24   ` Luke Kenneth Casson Leighton
@ 2021-02-11  8:20     ` ZmnSCPxj
  2021-02-13  6:10       ` ZmnSCPxj
  2021-02-13 17:19       ` Luke Kenneth Casson Leighton
  0 siblings, 2 replies; 13+ messages in thread
From: ZmnSCPxj @ 2021-02-11  8:20 UTC
  To: Luke Kenneth Casson Leighton; +Cc: Bitcoin Protocol Discussion

Good morning Luke,

> > (to be fair, there were tools to force you to improve coverage by injecting faults to your RTL, e.g. it would virtually flip an `&&` to an `||` and if none of your tests signaled an error it would complain that your test coverage sucked.)
>
> nice!

It should be possible for a tool to be developed to parse a Verilog RTL design, then generate a new version of it with one change.
Then you could add some automation to run a set of testcases around mutated variants of the design.
For example, it could create a "wrapper" module that connects to an unmutated differently-named version of the design, and various mutated versions, wire all their inputs together, then compare outputs.
If the testcase could trigger an output of a mutated version to be different from the reference version, then we would consider that mutation covered by that testcase.
Possibly that could be done with Verilog-2001 file writing code in the wrapper module to dump out which mutations were covered, then a summary program could just read in the generated file.
Or Verilog plugins could be used as well (Icarus supports this, that is how it implements all `$` functions).

A drawback is that just because an output is different does not mean the testcase actually ***checks*** that output.
If the testcase does not detect the diverging output, then it still does not properly cover that mutation.
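
A minimal sketch of this mutate-and-compare flow, written in Python over a Python expression rather than Verilog RTL (so it only models the idea, and none of it is an existing tool): `design`, `FlipNthBoolOp` and `mutation_coverage` are hypothetical names, and the only mutation applied is swapping one `and` with `or` or vice versa. As in the wrapper idea above, a mutation counts as covered when some testcase makes the mutant's output differ from the reference output; it does not check that the testcase would actually flag the difference.

```python
import ast

REFERENCE_SRC = """
def design(a, b, c):
    return (a and b) or c
"""


class FlipNthBoolOp(ast.NodeTransformer):
    """Swap `and`/`or` on exactly one boolean operator, chosen by index."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_BoolOp(self, node):
        self.generic_visit(node)          # inner operators are numbered first
        if self.seen == self.target:
            node.op = ast.Or() if isinstance(node.op, ast.And) else ast.And()
        self.seen += 1
        return node


def compile_design(tree):
    """Compile a module AST and return its `design` function."""
    namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<design>", "exec"), namespace)
    return namespace["design"]


def mutation_coverage(src, testcases):
    """Map each mutation index to True if some testcase output diverges."""
    reference = compile_design(ast.parse(src))
    count = sum(isinstance(n, ast.BoolOp) for n in ast.walk(ast.parse(src)))
    covered = {}
    for index in range(count):
        mutant = compile_design(FlipNthBoolOp(index).visit(ast.parse(src)))
        covered[index] = any(
            bool(reference(*t)) != bool(mutant(*t)) for t in testcases
        )
    return covered


weak_tests = [(True, True, False)]
better_tests = weak_tests + [(True, False, False)]
print(mutation_coverage(REFERENCE_SRC, weak_tests))
print(mutation_coverage(REFERENCE_SRC, better_tests))
```

With only the first testcase, the mutation of the inner `and` goes unnoticed; adding the second testcase exposes it.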

The point of this is to check coverage of the tests.
Not sure how well this works with formal validation.

> > Synthesis in particular is a black box and each vendor keeps their particular implementations and tricks secret.
>
> sigh.  i think that's partly because they have to insert diodes, and buffers, and generally mess with the netlist.
>
> i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!

Well, that surprises me as well.

On the other hand, smaller technologies consistently have lower raw output current driving capability due to the smaller size, and as trace width goes down and frequency goes up they stop acting like ideal 0-impedance traces and start acting more like transmission lines.
So I suppose at some point something like that would occur and I should not actually be surprised.
(Maybe I am more surprised that it reached that level at that technology size, I would have thought 33% at 7nm.)

In the modules where we were doing manual netlist+layout, we used inverting buffers instead (slightly smaller than non-inverting buffers, in most technologies a non-inverting buffer is just an inverter followed by an inverting buffer), it was an advantage of manual design since it looks like synthesis tools are not willing to invert the contents of intermediate flip-flops even if it could give theoretical speed+size advantage to use an inverting buffer rather than a non-inverting one (it looks like synthesis optimization starts at the output of flip-flops and ends at their input, so a manual designer could achieve slightly better performance if they were willing to invert an intermediate flip-flop).
Another was that inverting latches were smaller in the technology we were using than non-inverting latches, so it was perfectly natural for us to use an inverting latch and an inverting buffer on those parts where we needed higher fan-out (it was equivalent to a "custom" latch that had higher-than-normal driving capability).

Scan chain test generation was impossible though, as those require flip-flops, not latches.
Fortunately this was "just" deserialization of high-frequency low-width data with no transformation of the data (that was done after the deserialization, at lower clock speeds but higher data width, in pure RTL so flip-flops), so it was judged acceptable that it would not be covered by scan chain, since scan chain is primarily for testing combinational logic between flip-flops.
So we just had flip-flops at the input, and flip-flops at the output, and forced all latches to pass-through mode, during scan mode.
We just needed to have enough coverage to uncover stuck-at faults (which was still a pain, since additional test vectors slow down manufacturing so we had to reduce the test vectors to the minimum possible) in non-scan-mode testing.

Man, making ASICs was tough.

>
> plus, they make an awful lot of money, it is good business.
>
> > Pointing some funding at the open-source Icarus Verilog might also fit, as it lost its ability to do synthesis more than a decade ago due to inability to maintain.
>
> ah i didn't know it could do synthesis at all! i thought it was simulation only.

Icarus was the only open-source synthesis tool I could find back then, and it dropped synthesis capability fairly early due to maintenance burden (I never managed to get the old version with synthesis compiled and never managed actual synthesis on it, so my knowledge of it is theoretical).

There is an argument that open-source software is not truly open-source unless it can be compiled by open-source compilers or executed by open-source interpreters.
Similarly, I think open-source hardware RTL designs are not truly open-source if there are no open-source synthesis tools that can synthesize it to netlist and then lay it out.

Icarus can interpret most Verilog RTL designs, though.
However, at the time I left, I had already mandated that new code should use `always_comb` and `always_ff` (previously I had mandated that new code should use `always @*` for combinational logic) and was encouraging my subordinates to use loops and `generate`.
Icarus did not support `always_comb` and `always_ff` at the time (though worked perfectly fine with loops and `generate`).
In addition, at the time, we (actually just me in that company haha) were dabbling in object-oriented testing methodologies (which Icarus has no plans on ever implementing, which is understandable since it is a massive increase in complexity, it is much much harder than the scheduling shenanigans of `always_comb` and the "just treat it as `always`" of `always_ff`).

(Particularly, you need object-oriented testbenches since SystemVerilog includes a fuzz-testing framework that randomizes fields of objects according to engineer-provided constraints; you then use those object fields to derive the test vectors your test framework feeds into the DUT. This was a massive increase in code coverage for a largish up-front cost, but once you built the test framework you could just dump various constraints on your test specification objects. I actually caught a few bugs that we would not have otherwise found with our previous checklist-based testing methodology.)
(Unfortunately it turned out that it required a more expensive license, and I ended up hogging the only one we had (which, if I remember correctly, was the same license needed for formal verification of netlist<->RTL equivalence), which killed enthusiasm for this technique, sigh. This is another argument for getting open-source hardware design tools developed: there is not much sense in having open-source RTL for a crypto device if you have to pay through the nose for a license just to synthesize it, never mind the manufacturing cost.)
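
To give a rough feel for constraint-driven randomization without any licensed tool, here is a small Python sketch. It is not SystemVerilog's constraint solver: it uses plain rejection sampling, and the transaction fields (`addr`, `length`) and the 256-entry address space are invented for the example.

```python
import random


def random_transaction(rng):
    """Draw one test-specification object that satisfies the constraint."""
    while True:
        length = rng.randint(1, 64)
        addr = rng.randrange(0, 256)
        # Constraint: a burst must not run past the end of the 256-entry space.
        if addr + length <= 256:
            return {"addr": addr, "length": length}


# A fixed seed makes a failing run reproducible.
rng = random.Random(1234)
vectors = [random_transaction(rng) for _ in range(5)]
for v in vectors:
    print(v)
```

The test framework would then turn each such object into the pin-level vectors fed to the DUT.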

-----------------------

Another point to ponder is test modes.

In mass production you **need** test modes.
There will always be some number of manufacturing defects because even the cleanest of cleanrooms *will* have a tiny amount of contaminants (what can go wrong will go wrong).
Test modes are run in manufacturing to filter out chips with failing circuitry due to contamination.

Now, a typical way of implementing test modes is to have a special command sent over, say, the "normal" serial port interface of a chip, which then enters various test modes to allow, say, scan chain testing.
Of course, scan chain testing is done by pushing test vectors into all flip-flops, and then the test is validated by pulsing global clock once (often the test mode forces all flip-flops on the same clock), then pulling data from all flip-flops to verify that all the circuitry works as designed.

The "pulling data from all flip-flops" is of course just another way of saying that all mass-produced chips have a way of letting ***anyone*** exfiltrate data from their flip-flops via test modes.
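
A toy Python model of why this matters, assuming a single chain where every flip-flop is scannable. The `ScanChain` class is purely illustrative and leaves out the capture pulse; it only models the shift-out step.

```python
class ScanChain:
    """Flip-flops wired as one shift register while in scan mode."""

    def __init__(self, state):
        self.ffs = list(state)  # index 0 is nearest scan-in, last is scan-out

    def shift(self, scan_in):
        """One scan clock: shift by one and return the bit leaving scan-out."""
        scan_out = self.ffs[-1]
        self.ffs = [scan_in] + self.ffs[:-1]
        return scan_out

    def dump(self):
        """Shift the whole chain out (zeros in), as a tester or an attacker would."""
        return [self.shift(0) for _ in range(len(self.ffs))]


secret = [1, 0, 1, 1, 0, 0, 1, 0]  # pretend these flip-flops hold key material
chain = ScanChain(secret)
leaked = chain.dump()
# The bits come out last-flip-flop-first, so reversing recovers the contents.
assert leaked[::-1] == secret
```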

Thus, for a secure environment, you need to ensure that test modes cannot be entered after the device enters normal operation.
For example, you might have a dedicated pad which is normally pulled-down, but if at reset it is pulled up, the device enters test mode.
If at reset the pad is pulled down, the device is in normal mode and even if the pad is pulled up afterwards the device will not enter test mode.
This ensures that only reset data can be read from the device, without possibility of exfiltration of sensitive (key material or midstate) data.
The pad should also not be exposed as a package pinout except perhaps on DS and ES packages, and the pulldown resistor has to be on-chip.

As an additional precaution, we can also create a small secure memory (maybe 256 octet addressable would be more than enough).
It is possible to exempt flip-flops from scan chain generation (usually by explicitly instantiating flip-flops in a separate module and telling post-synthesis tools to exempt the module from scan chain synthesis).
This gives an extra layer of protection against test mode accessing sensitive data; even if we manage to screw up test mode and it is possible to force reset on the test mode circuit without resetting the rest of the design, sensitive data is still out of the scan chain.
Of course, since they are not on scan, it is possible they have undetectable manufacturing defects, so you would need to use some kind of ECC, or better triple-redundancy best-of-three, to protect against manufacturing defects on the non-scan flip-flops.
Fortunately non-scan flip-flops are often a good bit smaller than scan flip-flops, so the redundancy is not so onerous.
Since the ECC / best-of-three circuit itself would need to be tested, you would multiplex their inputs, in normal mode they get inputs from the non-scan-chain flip-flops, in test mode they get inputs from separate scan-chain flip-flops, so that the ECC / best-of-three circuit is testable at scan mode.
You would also need a separate test of the secure memory, this time running in normal mode with a special test program in the CPU, just in case.
Finally, you would explicitly lay them out "distributed" around the chip, since manufacturing defects tend to correlate in space (they are usually from dust, and dust particles can be large relative to cell size), you do not want all three of the best-of-three to have manufacturing defects.
For example, you could have a 256 x 8 non-scan-chain flip-flop module, instantiate three of those, and explicitly place them in corners of the digital area, then use a best-of-three circuit to resolve the "correct" value.
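
The best-of-three resolution is just a bitwise majority vote. A minimal Python sketch (the function name and the example byte are illustrative):

```python
def best_of_three(a, b, c):
    """Bitwise majority: each output bit is the value at least two copies agree on."""
    return (a & b) | (a & c) | (b & c)


stored = 0xC3
# One copy has a defective bit; the other two outvote it.
assert best_of_three(stored, stored ^ 0x10, stored) == stored
# Defects in different bit positions of two copies are still corrected.
assert best_of_three(stored, stored ^ 0x10, stored ^ 0x01) == stored
# The same bit failing in two copies is not recoverable, hence the physical spreading.
assert best_of_three(stored, stored ^ 0x10, stored ^ 0x10) != stored
```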

The test mode circuit itself could ensure that the device enters test mode if and only if the secure memory contains all 0 data after the test mode circuit is reset.
For example, the 256 x 8 non-scan-chain flip-flop module could have a large OR circuit that ORs all the flip-flops, then outputs a single bit, `in_use`, that is the OR of all the flip-flop contents.
Then the test mode circuit gets the `in_use` outputs of the three secure flip-flop modules, and if at reset any of them are `1` then it will refuse to enter test mode even if the test mode pad is pulled high.
This ensures that even if an attacker is somehow able to reset *only* the test mode circuit (this is basic engineering, always assume something will go wrong), if the secure memory has any non-0 data (we presume it resets to 0), the device will still not enter test mode.
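
As a behavioural sketch in Python (function names are illustrative, and each memory copy is modelled as a plain sequence of bytes):

```python
def in_use(memory):
    """OR-reduction over one secure memory copy: true if any bit is set."""
    return any(byte != 0 for byte in memory)


def may_enter_test_mode(testmode_pad, memories):
    """Test mode is allowed only if the pad is high and every copy is all-zero."""
    return testmode_pad and not any(in_use(m) for m in memories)


blank = [bytes(256), bytes(256), bytes(256)]
loaded = [bytes(256), bytes([0x01]) + bytes(255), bytes(256)]

assert may_enter_test_mode(True, blank)
assert not may_enter_test_mode(True, loaded)   # one non-zero copy is enough to refuse
assert not may_enter_test_mode(False, blank)
```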

Of course, if the secure memory itself is accessible from the CPU, then it remains possible that a CPU program is reading from the secure area, keeping raw data in CPU registers, from which a test-mode might be able to extract if the device is somehow forced into test mode even after normal mode.
You could redesign your implementations of field multiplication and SHA midstate computation so that they directly read from the secure memory and write to the secure memory without using any flip-flops along the way, and have only the cryptographic circuit have access to the secure memory.
That way there is reduced possibility that intermediate flip-flops (that are part of the scan chain) outside the secure memory hold sensitive key material or midstate data.
You would need to use a custom bus with separate read and write addresses, and non-pipelined unbuffered access, and since you want to distribute your secure memory physically distant, that translates to wide and long buses (it might be better to use 64 x 32 or 32 x 64 addressable memories, to increase what the cryptographic circuit has access to per clock cycle) screwing with your layout, and probably having to run the secure memory + crypto circuit at a ***much*** slower clock domain (but more secure is a good tradeoff for slowness).
Of course, that is a major design headache (the crypto circuit has to act mostly as a reduced-functionality processor), so you might just want to have the CPU directly access the secure memory and in early boot poke a `0x01` in some part of the memory, in the hope that the `in_use` flag in the previous paragraph is enough to suppress test modes from exfiltrating CPU registers.

Do note that with enough power-cycles and ESD noise you can put digital circuitry into really weird and unexpected states (seen it happen, though fairly hard to replicate, we had an ESD gun you could point at a chip to make it go into weird states), so being extra paranoid about test modes is important.
What can go wrong will go wrong!
In particular with "`TESTMODE_PAD` is only checked at reset" you would have to store `TESTMODE` in a non-scan flip-flop, and with enough targeted ESD that flip-flop can be jostled, setting `TESTMODE` even after normal operation.
You might instead want to use, say, a byte pattern instead of a single bit to represent `TESTMODE`, so the `TESTMODE` register has to have a specific value such as `0xA5`, so that targeted ESD has to be very lucky in order to force your device into test mode.
For example, since you need to check the `TESTMODE` pad at reset anyway, you could do something like this:

    input CLK, RESET_N, TESTMODE_PAD, IN_USE0, IN_USE1, IN_USE2;
    output reg TESTMODE;

    wire in_use = IN_USE0 || IN_USE1 || IN_USE2;

    reg [7:0] testmode_ff;
    wire [7:0] next_testmode_ff =
        (testmode_ff == 8'hA5 || testmode_ff == 8'h00) ?
          (TESTMODE_PAD && !in_use) ?                      8'hA5 :
          /*otherwise*/                                    8'h5A :
        /*otherwise*/                                      testmode_ff ;
    always_ff @(posedge CLK, negedge RESET_N) begin
        if (!RESET_N) testmode_ff <= 8'h00;
        else          testmode_ff <= next_testmode_ff; end

    wire next_TESTMODE = (testmode_ff == 8'hA5);
    always_ff @(posedge CLK, negedge RESET_N) begin
        if (!RESET_N) TESTMODE <= 1'b0;
        else          TESTMODE <= next_TESTMODE; end

Do note that the `TESTMODE` is a flip-flop, since you do ***not*** want glitches on the `TESTMODE` signal line, it would be horribly unsafe to output it from combinational circuitry directly, please do not do that.
Of course that flip-flop can instead be the target of ESD gunnery, but since you need many clock pulses to read the scan chain, it should with good probability also get set to `0` on the next clock pulse and leave test mode (and probably crash the device as well until full reset, but this "fails safe" since at least sensitive data cannot be extracted).
`TESTMODE` has no feedback, thus cannot be stuck in a state loop.
`testmode_ff` *can* be stuck in a state loop, but that is deliberate, as it would "fail safe" if it gets a value other than `0xA5`, it would not enter test mode (and if it enters `0xA5` it can easily leave test mode by either `TESTMODE_PAD` or `in_use`).
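
The behaviour of `next_testmode_ff` can be checked exhaustively with a small Python model. This is only a sketch of the combinational logic above, not a substitute for simulating the RTL; the constant names and the `settle` helper are illustrative.

```python
TEST, NORMAL, RESET = 0xA5, 0x5A, 0x00


def next_testmode_ff(ff, testmode_pad, in_use):
    """Mirror of the combinational next-state logic for testmode_ff."""
    if ff in (TEST, RESET):
        return TEST if (testmode_pad and not in_use) else NORMAL
    return ff  # any other value holds: stuck, but safely outside test mode


def settle(ff, testmode_pad, in_use, clocks=4):
    """Apply a few clock edges with constant inputs."""
    for _ in range(clocks):
        ff = next_testmode_ff(ff, testmode_pad, in_use)
    return ff


# The two patterns differ in all 8 bits, so one upset cannot turn one into the other.
assert bin(NORMAL ^ TEST).count("1") == 8
# Every single-bit upset of the normal-mode pattern stays out of test mode,
# even with the pad held high.
assert all(settle(NORMAL ^ (1 << bit), True, False) != TEST for bit in range(8))
# With the pad low, no starting value at all settles into test mode.
assert all(settle(v, False, False) != TEST for v in range(256))
```

Note that an upset to `0x00` does re-sample the pad, so the on-chip pulldown still matters.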

(Sure, an attacker can try targeted ESD at the `TESTMODE` flip-flop repeatedly, but this risks also flipping other scan flip-flops that contain the data that is being extracted, so this might be sufficient protection in practice.)

If you are really going to open-source the hardware design then the layout is also open and attackers can probably target specific chip area for ESD pulse to try a flip-flop upset, so you need to be extra careful.
Note as well that even closed-source "secure" elements can be reverse-engineered (I used to do this in the IC design job as a junior engineer, it was the sort of shitty brain-numbing work forced on new hires), so security-by-obscurity does have a limit as well, it should be possible to try to figure out the testmode circuitry on "secure" elements and try to get targeted ESD upsets at flip-flops on the testmode circuit.

Test mode design is something of an arcane art, especially if you are trying to build a security device, on the one hand you need to ensure you deliver devices without manufacturing defects, on the other hand you need to ensure that the test mode is not entered inadvertently by strange conditions.

In general, because test modes are such a pain to deal with securely, and are an absolute necessity for mass production, you should assume that any "secure" chip can be broken by physical access and shooting short-range ESD pulses at it to try to get it into some test mode, unless it is openly designed to prevent test mode from persisting after entering normal mode, as above.

(No idea how that ESD gun thing worked or what it was formally called; we just called it the ESD gun. It was an amusing toy: you point it at the DUT and pull the trigger, and suddenly it would switch modes. This of course was a bad thing, since you want to make sure that as much as possible such upsets do not cause the chip to enter an irrecoverable mode, but it was an amusing thing to do still. We even had small amounts of flash memory containing register settings that we would load into the settings registers periodically at the end of each display frame to protect against this kind of ESD gun thing, since the flip-flops backing the settings registers were vulnerable to it and we needed a way to preserve the settings of the customer for the IC; the expected effect would be to cause the display to flicker.)

Regards,
ZmnSCPxj


* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-02-11  8:20     ` ZmnSCPxj
@ 2021-02-13  6:10       ` ZmnSCPxj
  2021-02-13  9:29         ` Luke Kenneth Casson Leighton
  2021-02-13 17:19       ` Luke Kenneth Casson Leighton
  1 sibling, 1 reply; 13+ messages in thread
From: ZmnSCPxj @ 2021-02-13  6:10 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: Bitcoin Protocol Discussion

Good morning Luke,

Another thing we can do with scan mode would be something like the below masking:

    input CLK, RESET_N;
    input TESTMODE;
    input SCANOUT_INTERNAL;
    output SCANOUT_PAD;

    reg gating;
    wire n_gating = gating && TESTMODE;
    always_ff @(posedge CLK, negedge RESET_N) begin
      if (!RESET_N)   gating <= 1'b1; /*RESET-HIGH*/
      else            gating <= n_gating; end

    assign SCANOUT_PAD = SCANOUT_INTERNAL && gating;

The `gating` means that after reset, if we are not in test mode, `gating` becomes 0 permanently and prevents any scan data from being extracted.
Assuming scan is not used in normal operation (it should not) then inadvertent ESD noise on the `gating` flip-flop would not have an effect.

Output being combinational should be fine as the output is "just" an AND gate, as long as `gating` does not transition from 0->1 (impossible in normal operation, only at reset condition) then glitching is impossible, and when scan is running then `TESTMODE` should not be exited which means `gating` should remain high as well, thus output is still glitch-free.

Since the flip-flop resets to 1, and in some technologies I have seen a reset-to-0 FF is slightly smaller than a reset-to-1 FF, it might do good to invert the sense of `gating` instead, and use a NOR gate at the output (which might also be smaller than an AND gate, look it up in the technology you are targeting).
On the other hand the above is a tiny circuit already and it is unlikely you need more than one of it (well for large enough ICs you might want more than one scan chain but still, even the largest ICs we handled never had more than 8 scan chains, usually just 4 to 6) so overoptimizing this is not necessary.
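
A quick Python model of the `gating` flip-flop, as a sanity check of the "once low, stays low" property and of the inverted-sense NOR variant. The `blocked` name for the inverted flag is an assumption, and the NOR form assumes the inverted scan data is available.

```python
def next_gating(gating, testmode):
    """n_gating = gating && TESTMODE"""
    return gating and testmode


def scanout_pad(scanout_internal, gating):
    """SCANOUT_PAD = SCANOUT_INTERNAL && gating"""
    return scanout_internal and gating


def run(testmode_sequence):
    """Clock the gating flip-flop from its reset value (1) through a TESTMODE trace."""
    gating = True
    history = []
    for testmode in testmode_sequence:
        gating = next_gating(gating, testmode)
        history.append(gating)
    return history


# One cycle out of test mode closes the gate for good, until the next reset.
assert run([True, True, False, True, True]) == [True, True, False, False, False]


def scanout_pad_inverted(scanout_internal, blocked):
    """Inverted sense: the flag resets to 0 and the output gate is a NOR."""
    return not ((not scanout_internal) or blocked)


# Same truth table as the AND form, with blocked = !gating.
for s in (False, True):
    for g in (False, True):
        assert scanout_pad_inverted(s, not g) == scanout_pad(s, g)
```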

Regards,
ZmnSCPxj


* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-02-13  6:10       ` ZmnSCPxj
@ 2021-02-13  9:29         ` Luke Kenneth Casson Leighton
       [not found]           ` <CAPweEDymve0zRaqN9yEGHyOeuaSLEYWQ0K2h6usWbXiV=HkOzA@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Luke Kenneth Casson Leighton @ 2021-02-13  9:29 UTC (permalink / raw)
  To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion

On Sat, Feb 13, 2021 at 6:10 AM ZmnSCPxj <ZmnSCPxj@protonmail•com> wrote:
>
> Good morning Luke,

morning - can i ask you a favour because moderated (off-topic)
messages are being forwarded
https://lists.ozlabs.org/pipermail/bitcoin-dev-moderation/

could you send these instead to libre-soc-dev@lists•libre-soc.org?

many thanks,

l.

> Another thing we can do with scan mode would be something like the below masking:
>
>     input CLK, RESET_N;
>     input TESTMODE;
>     input SCANOUT_INTERNAL;
>     output SCANOUT_PAD;
>
>     reg gating;
>     wire n_gating = gating && TESTMODE;
>     always_ff @(posedge CLK, negedge RESET_N) begin
>       if (!RESET_N)   gating <= 1'b1; /*RESET-HIGH*/
>       else            gating <= n_gating; end
>
>     assign SCANOUT_PAD = SCANOUT_INTERNAL && gating;
>
> The `gating` means that after reset, if we are not in test mode, `gating` becomes 0 permanently and prevents any scan data from being extracted.
> Assuming scan is not used in normal operation (it should not) then inadvertent ESD noise on the `gating` flip-flop would not have an effect.
>
> Output being combinational should be fine as the output is "just" an AND gate, as long as `gating` does not transition from 0->1 (impossible in normal operation, only at reset condition) then glitching is impossible, and when scan is running then `TESTMODE` should not be exited which means `gating` should remain high as well, thus output is still glitch-free.
>
> Since the flip-flop resets to 1, and in some technologies I have seen a reset-to-0 FF is slightly smaller than a reset-to-1 FF, it might do good to invert the sense of `gating` instead, and use a NOR gate at the output (which might also be smaller than an AND gate, look it up in the technology you are targeting).
> On the other hand the above is a tiny circuit already and it is unlikely you need more than one of it (well for large enough ICs you might want more than one scan chain but still, even the largest ICs we handled never had more than 8 scan chains, usually just 4 to 6) so overoptimizing this is not necessary.
>
>
> Regards,
> ZmnSCPxj



* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
       [not found]           ` <CAPweEDymve0zRaqN9yEGHyOeuaSLEYWQ0K2h6usWbXiV=HkOzA@mail.gmail.com>
@ 2021-02-13 14:59             ` Bryan Bishop
  2021-02-13 16:44               ` Luke Kenneth Casson Leighton
  0 siblings, 1 reply; 13+ messages in thread
From: Bryan Bishop @ 2021-02-13 14:59 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton, ZmnSCPxj, Bitcoin Dev, libre-soc-dev


On Sat, Feb 13, 2021 at 4:18 AM Luke Kenneth Casson Leighton <lkcl@lkcl•net>
wrote:

> ... actually i don't see them in the bounces.  what's happening there?
>
> On Saturday, February 13, 2021, Luke Kenneth Casson Leighton <
> lkcl@lkcl•net> wrote:
> > On Sat, Feb 13, 2021 at 6:10 AM ZmnSCPxj <ZmnSCPxj@protonmail•com>
> wrote:
> >> Good morning Luke,
> >
> > morning - can i ask you a favour because moderated (off-topic)
> > messages are being forwarded
> > https://lists.ozlabs.org/pipermail/bitcoin-dev-moderation/
> >
> > could you send these instead to libre-soc-dev@lists•libre-soc.org?
>

I don't see what you're talking about? None of your February emails were
sent to ozlabs according to the archives there. Threads for the bitcoin-dev
mailing list are stored here:
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-February/thread.html

- Bryan
https://twitter.com/kanzure



* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-02-13 14:59             ` Bryan Bishop
@ 2021-02-13 16:44               ` Luke Kenneth Casson Leighton
  0 siblings, 0 replies; 13+ messages in thread
From: Luke Kenneth Casson Leighton @ 2021-02-13 16:44 UTC (permalink / raw)
  To: Bryan Bishop; +Cc: Bitcoin Dev, Libre-Soc General Development

On Sat, Feb 13, 2021 at 3:01 PM Bryan Bishop <kanzure@gmail•com> wrote:

> I don't see what you're talking about? None of your February emails
> were sent to ozlabs according to the archives there. Threads for the
> bitcoin-dev mailing list are stored here:
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-February/thread.html

... i am very confused, and also did not mean to send this to the list
at all!  with many apologies for taking up peoples' time here.

l.



* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-02-11  8:20     ` ZmnSCPxj
  2021-02-13  6:10       ` ZmnSCPxj
@ 2021-02-13 17:19       ` Luke Kenneth Casson Leighton
  2021-02-14  0:27         ` ZmnSCPxj
  1 sibling, 1 reply; 13+ messages in thread
From: Luke Kenneth Casson Leighton @ 2021-02-13 17:19 UTC (permalink / raw)
  To: ZmnSCPxj, Libre-Soc General Development; +Cc: Bitcoin Protocol Discussion

(cc'ing over to libre-soc-dev)
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-February/018392.html

On Thu, Feb 11, 2021 at 8:21 AM ZmnSCPxj <ZmnSCPxj@protonmail•com> wrote:

> > i was stunned to learn that in a 28nm ASIC, 50% of it is repeater-buffers!
>
> Well, that surprises me as well.
> [...]
> So I suppose at some point something like that would occur and I should not actually be surprised.
> (Maybe I am more surprised that it reached that level at that technology size, I would have thought 33% at 7nm.)

it's about line-drive strength: lower geometries are even *less* able
to line-drive long distances.

> Another point to ponder is test modes.
> In mass production you **need** test modes.

> (Sure, an attacker can try targeted ESD at the `TESTMODE` flip-flop repeatedly, but this risks also flipping other scan flip-flops that contain the data that is being extracted, so this might be sufficient protection in practice.)

if however the ASIC can be flipped into TESTMODE and yet it carries on
otherwise working, an algorithm can be re-run and the exposed data
will be clean.

> If you are really going to open-source the hardware design then the layout
> is also open and attackers can probably target specific chip area for ESD
> pulse to try a flip-flop upset, so you need to be extra careful.

this is extremely valuable advice.  in the followup [1] you describe a
gating method: this we have already deployed on a couple of places in
case the Libre Cell Library (also being developed at the same time by
Staf Verhaegen of Chips4Makers) causes errors: we do not want, for
example, an error in a Cell Library to cause a permanent HI which
locks us from being able to perform testing of other areas of the
ASIC.

the idea of being able to actually randomly flip bits inside an ASIC
from outside is both hilarious and entirely news to me, yet it sounds
to be exactly the kind of thing that would allow an attacker to
compromise a hardware wallet.  potentially destructively, mind, but
compromise all the same.

beyond even what the trezor team discovered [2] it makes it even more
important that wallet ASICs be Libre/Open.

l.

[1] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-February/018412.html
[2] https://blog.trezor.io/introducing-tropic-square-why-transparency-matters-a895dab12dd3


* Re: [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs
  2021-02-13 17:19       ` Luke Kenneth Casson Leighton
@ 2021-02-14  0:27         ` ZmnSCPxj
  0 siblings, 0 replies; 13+ messages in thread
From: ZmnSCPxj @ 2021-02-14  0:27 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton
  Cc: Bitcoin Protocol Discussion, Libre-Soc General Development

Good morning Luke,

> > Another point to ponder is test modes.
> > In mass production you need test modes.
>
> > (Sure, an attacker can try targeted ESD at the `TESTMODE` flip-flop repeatedly, but this risks also flipping other scan flip-flops that contain the data that is being extracted, so this might be sufficient protection in practice.)
>
> if however the ASIC can be flipped into TESTMODE and yet it carries on
> otherwise working, an algorithm can be re-run and the exposed data
> will be clean.

But in most testmodes I have seen (and designed) all clocks are driven externally from a different pin (usually the serial interface) when in testmode.
If the CPU clock is now controlled by the attacker, how do you run any kind of algorithm?

(This could be an artifact of how my old design company designed testmodes, YMMV.)

Really the concern here is that testmode is entered while the CPU has key material loaded into registers, or caches, then it is possible, if those registers/caches are in the scan chain, to exfiltrate data.
Does not matter if the chip is now in a mode that cannot execute algorithms, if it was doing any kind of computation involving privkeys (including say deriving its public key so that PC-side hardware can get the `xpub`) then key material may be in scan chain registers, clock is now controlled by the attacker, and possibly scan mode as well (which disables combinational circuitry thus none of your algorithms can run).

>
> > If you are really going to open-source the hardware design then the layout
> > is also open and attackers can probably target specific chip area for ESD
> > pulse to try a flip-flop upset, so you need to be extra careful.
>
> this is extremely valuable advice. in the followup [1] you describe a
> gating method: this we have already deployed on a couple of places in
> case the Libre Cell Library (also being developed at the same time by
> Staf Verhaegen of Chips4Makers) causes errors: we do not want, for
> example, an error in a Cell Library to cause a permanent HI which
> locks us from being able to perform testing of other areas of the
> ASIC.
>
> the idea of being able to actually randomly flip bits inside an ASIC
> from outside is both hilarious and entirely news to me, yet it sounds
> to be exactly the kind of thing that would allow an attacker to
> compromise a hardware wallet. potentially destructively, mind, but
> compromise all the same.

Certainly outside of the old company design philosophy I have seen many experts strongly protest against a design philosophy which assumes that any flip-flop could randomly switch.

Yet the design philosophy within the old company always had this assumption, supposedly (according to in-company lore) because previous engineers had actually found the hard way that random bitflips did occur, and for e.g. automobile chips the risk was too great to not have strong mitigations:

* State machines had to force unused states into known states.
  For example a state machine with 3 states needs 2 bits of state, but 2 bits of state is actually 4 states, so there is a 4th unused state.
  * Not all state machines needed this rule but during planning we had to identify state machines that needed this rule, and often we just targeted having 2^n states just to ensure that there were no unused states.
  * I even suggested the use of ECC encoding for important state machines and it was something being investigated at the time I left.
* State machines that otherwise did not need the above rule were strongly encouraged to clear state at display frame vsync.
  This ensured that any unexpected states they had would only last up to one display frame, which was considered acceptable.
* Flip-flops that held settings were periodically reloaded at each display frame vsync from a flash memory (which apparently was a lot more immune to bitflips).
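
The first two rules can be sketched in Python for a hypothetical 3-state machine held in 2 bits. The state names and transitions are invented for the example; only the handling of the unused encoding and of vsync reflects the rules above.

```python
IDLE, LOAD, SEND = 0, 1, 2  # 3 states in 2 bits; encoding 3 is unused


def next_state(state, start):
    if state == IDLE:
        return LOAD if start else IDLE
    if state == LOAD:
        return SEND
    if state == SEND:
        return IDLE
    # Unused encoding (3): force a known state instead of leaving it undefined.
    return IDLE


def step(state, start, vsync):
    """Clearing at vsync bounds any unexpected state to one display frame."""
    return IDLE if vsync else next_state(state, start)


# From any 2-bit value, including the unused one, the next state is a defined one.
for state in range(4):
    for start in (False, True):
        assert next_state(state, start) in (IDLE, LOAD, SEND)
# A bitflip into the unused encoding recovers on the next clock.
assert next_state(3, True) == IDLE
```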

It could be an artifact as well that the company had its own in-house foundry rather than delegate out to TSMC or whatnot --- maybe the technology we had was just suckier than state-of-the-art so bitflips were more common.

The reason why this stuck to mind is because at one time we had a DS test where shooting the ESD gun could sometimes cause the chip to fail (blank display) until reset, when the expectation was that at most it would flicker for one display frame.
And afterwards we had to go through the entire RTL looking for which state machine or settings register was the culprit.
I even wrote a little Verilog-PLI plugin that would inject deterministically random data into flip-flops in the model to try to catch it.
Eventually we found a bunch of possible root causes, and on the next DS iteration testing we had fun shooting the chip with the ESD gun over and over again and sighing in relief that the display was not failing for more than one frame.
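
The plugin itself is not shown here, but the idea of deterministic fault injection can be sketched in Python: a seeded generator picks which register bits to flip, so a failing run can be replayed exactly. The register values, function names, and the flash-reload helper are all assumptions for illustration.

```python
import random


def inject_bitflips(registers, rng, flips):
    """Return a copy of the 8-bit registers with `flips` pseudo-random bit upsets."""
    upset = list(registers)
    for _ in range(flips):
        index = rng.randrange(len(upset))
        bit = rng.randrange(8)
        upset[index] ^= 1 << bit
    return upset


def reload_at_vsync(flash_copy):
    """Settings registers are rewritten from flash at each frame boundary."""
    return list(flash_copy)


flash = [0x12, 0x34, 0x56, 0x78]

# Same seed, same upsets: the failure is reproducible.
first = inject_bitflips(flash, random.Random(42), 3)
second = inject_bitflips(flash, random.Random(42), 3)
assert first == second
# An odd number of flips always leaves at least one bit different.
assert first != flash
# The corruption lasts at most until the next vsync reload.
assert reload_at_vsync(flash) == flash
```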

The chip was a display driver for automotive, apparently at the time cars were starting to transition to using LCD for things like speedometer and accelerometer rather than physical dials.
And of course the display suddenly switching off while the car is running at high speed due to some extra-powerful pulse elsewhere was potentially dangerous and could distract the driver, so that is why we were paranoid about such sudden bitflips potentially leading to such massive cascade of upsets as to make the display fail permanently.

I think being excessively cautious for cryptographic chips should be standard as well.
And certainly test mode exfiltration of data is always an issue, JTAG is a very standard way of reading memory.

Regards,
ZmnSCPxj


Thread overview: 13+ messages
2021-01-25 18:00 [bitcoin-dev] Libre/Open blockchain / cryptographic ASICs Luke Kenneth Casson Leighton
2021-01-26 10:47 ` Pavol Rusnak
2021-02-03  2:06 ` ZmnSCPxj
2021-02-03 13:24   ` Luke Kenneth Casson Leighton
2021-02-11  8:20     ` ZmnSCPxj
2021-02-13  6:10       ` ZmnSCPxj
2021-02-13  9:29         ` Luke Kenneth Casson Leighton
     [not found]           ` <CAPweEDymve0zRaqN9yEGHyOeuaSLEYWQ0K2h6usWbXiV=HkOzA@mail.gmail.com>
2021-02-13 14:59             ` Bryan Bishop
2021-02-13 16:44               ` Luke Kenneth Casson Leighton
2021-02-13 17:19       ` Luke Kenneth Casson Leighton
2021-02-14  0:27         ` ZmnSCPxj
2021-02-03  3:17 ` ZmnSCPxj
2021-02-03 14:07   ` Luke Kenneth Casson Leighton
