public inbox for bitcoindev@googlegroups.com
* [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
@ 2022-03-22  5:37 ZmnSCPxj
  2022-03-22 15:08 ` Russell O'Connor
  2022-03-22 23:11 ` Anthony Towns
  0 siblings, 2 replies; 8+ messages in thread
From: ZmnSCPxj @ 2022-03-22  5:37 UTC (permalink / raw)
  To: bitcoin-dev

Good morning list,

It is entirely possible that I have gotten into the deep end and am now drowning in insanity, but here goes....

Subject: Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

Introduction
============

Recent (Early 2022) discussions on the bitcoin-dev mailing
list have largely focused on new constructs that enable new
functionality.

One general idea can be summarized this way:

* We should provide a very general language.
  * Then later, once we have learned how to use this language,
    we can softfork in new opcodes that compress sections of
    programs written in this general language.

There are two arguments against this style:

1.  One of the most powerful arguments of the "general" side of
    the "general v specific" debate is that softforks are
    painful because people are going to keep reiterating the
    activation parameters debate in a memoryless process, so
    we want to keep the number of softforks low.
    * So, we should just provide a very general language and
      never softfork in any other change ever again.
2.  One of the most powerful arguments of the "general" side of
    the "general v specific" debate is that softforks are
    painful because people are going to keep reiterating the
    activation parameters debate in a memoryless process, so
    we want to keep the number of softforks low.
    * So, we should just skip over the initial very general
      language and individually activate small, specific
      constructs, reducing the needed softforks by one.

By taking a page from microprocessor design, it seems to me
that we can use the same above general idea (a general base
language where we later "bless" some sequence of operations)
while avoiding some of the arguments against it.

Digression: Microcodes In CISC Microprocessors
----------------------------------------------

In the 1980s and 1990s, two competing microprocessor design
paradigms arose:

* Complex Instruction Set Computing (CISC)
  - Few registers, many addressing/indexing modes, variable
    instruction length, many obscure instructions.
* Reduced Instruction Set Computing (RISC)
  - Many registers, usually only immediate and indexed
    addressing modes, fixed instruction length, few
    instructions.

In CISC, the microprocessor provides very application-specific
instructions, often with a small number of registers with
specific uses.
The instruction set was complicated, and often required
multiple specific circuits for each application-specific
instruction.
Instructions had varying sizes and took varying numbers of cycles.

In RISC, the microprocessor provides fewer instructions, and
programmers (or compilers) are supposed to generate the code
for all application-specific needs.
The processor provided large register banks which could be
used very generically and interchangeably.
Instructions had the same size and every instruction took a
fixed number of cycles.

In CISC you usually had shorter code which could be written
by human programmers in assembly language or machine language.
In RISC, you generally had longer code, often difficult for
human programmers to write, and you *needed* a compiler to
generate it (unless you were very careful, or insane enough
that you could scroll over multiple pages of instructions
without becoming even more insane), or else you might forget
about stuff like branch delay slots.

For the most part, RISC lost, since most modern processors
today are x86 or x86-64, an instruction set with varying
instruction sizes, varying number of cycles per instruction,
and complex instructions with application-specific uses.

Or at least, it *looks like* RISC lost.
In the 90s, Intel was struggling since their big beefy CISC
designs were becoming too complicated.
Bugs got past testing and into mass-produced silicon.
RISC processors were beating the pants off 386s in terms of
raw number of computations per second.

RISC processors had the major advantage that they were
inherently simpler, due to having fewer specific circuits
and filling up their silicon with general-purpose registers
(which are large but very simple circuits) to compensate.
This meant that processor designers could fit more of the
design in their merely human meat brains, and were less
likely to make mistakes.
The fixed number of cycles per instruction made it trivial
to create a fixed-length pipeline for instruction processing,
and practical RISC processors could deliver one instruction
per clock cycle.
Worse (for Intel), the simplicity of RISC meant that smaller and less
experienced teams could produce viable competitors to the
Intel x86s.

So what Intel did was to use a RISC processor, and add a
special Instruction Decoder unit.
The Instruction Decoder would take the CISC instruction
stream accepted by classic Intel x86 processors, and emit
RISC instructions for the internal RISC processor.
CISC instructions might be variable-length and take a
variable number of cycles, but the emitted RISC
instructions were individually fixed-length and took a
fixed number of cycles.
A CISC instruction might be equivalent to a single RISC
instruction, or several.

With this technique, Intel could deliver performance
approaching their RISC-only competition, while retaining
back-compatibility with existing software written for their
classic CISC processors.

At its core, the Instruction Decoder was a table-driven
parser.
This lookup table could be stored in on-chip flash memory.
This had the advantage that the on-chip flash memory could be
updated in case of bugs in the implementation of CISC
instructions.
This on-chip flash memory was then termed "microcode".

Important advantages of this "microcode" technique were:

* Back-compatibility with existing instruction sets.
* Easier and more scalable underlying design due to ability
  to use RISC techniques while still supporting CISC instruction
  sets.
* Possible to fix bugs in implementations of complex CISC
  instructions by uploading new microcode.

(Obviously I have elided a bunch of stuff, but the above
rough sketch should be sufficient as introduction.)

Bitcoin Consensus Layer As Hardware
-----------------------------------

While Bitcoin fullnode implementations are software, because
of the need for consensus, this software is not actually very
"soft".
One can consider that, just as it would take a long time for
new hardware to be designed with a changed instruction set,
it is similarly taking a long time to change Bitcoin to
support changed feature sets.

Thus, we should really consider the Bitcoin consensus layer,
and its SCRIPT, as hardware that other Bitcoin software and
layers run on top of.

This thus opens up the thought of using techniques that were
useful in hardware design.
Such as microcode: a translation layer from "old" instruction
sets to "new" instruction sets, with the ability to modify this
mapping.

Microcode For Bitcoin SCRIPT
============================

I propose:

* Define a generic, low-level language (the "RISC language").
* Define a mapping from a specific, high-level language to
  the above language (the microcode).
* Allow users to sacrifice Bitcoins to define a new microcode.
* Have users indicate the microcode they wish to use to
  interpret their Tapscripts.

As a concrete example, let us consider the current Bitcoin
SCRIPT as the "CISC" language.

We can then support a "RISC" language that is composed of
general instructions, such as arithmetic, SECP256K1 scalar
and point math, bytevector concatenation, sha256 midstates,
bytevector bit manipulation, transaction introspection, and
so on.
This "RISC" language would also be stack-based.
As the "RISC" language would have more possible opcodes,
we may need to use 2-byte opcodes for the "RISC" language
instead of 1-byte opcodes.
Let us call this "RISC" language the micro-opcode language.

Then, the "microcode" simply maps the existing Bitcoin
SCRIPT `OP_` codes to one or more `UOP_` micro-opcodes.
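
For illustration only (these constructor names and the exact
coverage are assumptions of this sketch, not a concrete
proposal), the micro-opcode type might look like:

```Haskell
-- Illustrative micro-opcodes; names are assumptions of this
-- sketch, not a concrete proposal.
data UOpcode
  = UOP_ADD        -- arithmetic on multi-byte integers
  | UOP_CAT        -- bytevector concatenation
  | UOP_SHA256MID  -- one sha256 midstate step
  | UOP_ECMUL      -- SECP256K1 point-scalar multiplication
  | UOP_INSPECTIN  -- transaction introspection
  | UOP_FAIL       -- unconditionally fail the SCRIPT
  deriving (Eq, Ord, Show)
```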

An interesting fact is that stack-based languages have
automatic referential transparency; that is, if I define
some new word in a stack-based language and use that word,
I can replace verbatim the text of the new word in that
place without issue.
Compare this to a language like C, where macro authors
have to be very careful about inadvertent variable
capture, wrapping `do { ... } while(0)` to avoid problems
with `if` and multiple statements, multiple execution, and
so on.

Thus, a sequence of `OP_` opcodes can be mapped to a
sequence of equivalent `UOP_` micro-opcodes without
changing the interpretation of the source language, an
important property when considering such a "compiled"
language.

We start with a default microcode which is equivalent
to the current Bitcoin language.
When users want to define a new microcode to implement
new `OP_` codes or change existing `OP_` codes, they
can refer to a "base" microcode, and only have to
provide the new mappings.

A microcode is fundamentally just a mapping from an
`OP_` code to a variable-length sequence of `UOP_`
micro-opcodes.

```Haskell
import qualified Data.Map as Map

-- Opcode is the type of `OP_` codes (definition elided);
-- UOpcode is the micro-opcode type sketched above.
newtype Microcode = Microcode (Map.Map Opcode [UOpcode])
```
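
A new microcode that refers to a base then only needs to
carry the changed entries; as a sketch (the field and type
names, including `MicrocodeHash`, are assumptions of this
illustration):

```Haskell
-- Sketch: what a microcode introduction serializes --- a
-- reference to a base microcode plus only the new mappings.
data MicrocodeDef = MicrocodeDef
  { baseMicrocode :: MicrocodeHash            -- hash of the base microcode
  , newMappings   :: Map.Map Opcode [UOpcode] -- added or overridden entries
  }

-- Resolve a definition against its already-known base mapping.
-- `Map.union` is left-biased, so the new mappings win.
resolve :: Microcode -> MicrocodeDef -> Microcode
resolve (Microcode base) def = Microcode (newMappings def `Map.union` base)
```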

Semantically, the SCRIPT interpreter processes `UOP_`
micro-opcodes.

```Haskell
-- instance Monad Interpreter -- can `fail`.
interpreter :: Transaction -> TxInput -> [UOpcode] -> Interpreter ()
```
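
To make this concrete, the semantics of a single micro-opcode
might be sketched over a bare bytevector stack (error handling,
element-size limits, and cost accounting elided; this is an
illustration, not the proposed semantics):

```Haskell
import qualified Data.ByteString as BS

type Stack = [BS.ByteString]

-- One interpreter step; `Nothing` means SCRIPT failure.
stepUOpcode :: UOpcode -> Stack -> Maybe Stack
stepUOpcode UOP_CAT  (top:next:rest) = Just (next `BS.append` top : rest)
stepUOpcode UOP_FAIL _               = Nothing
stepUOpcode _        _               = Nothing  -- remaining micro-opcodes elided
```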

Example
-------

Suppose a user wants to re-enable `OP_CAT`, and nothing
else.

That user creates a microcode, referring to the current
default Bitcoin SCRIPT microcode as the "base".
The base microcode defines `OP_CAT` as equal to the
sequence `UOP_FAIL`, i.e. a single micro-opcode that always fails.
However, the new microcode will instead redefine the
`OP_CAT` as the micro-opcode sequence `UOP_CAT`.
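
In terms of the sketched types above (with
`defaultMicrocodeHash` an assumed given), the user's new
microcode is just one entry:

```Haskell
-- Sketch: the OP_CAT-enabling microcode carries exactly one
-- new mapping on top of the default base. Names illustrative.
opCatMicrocode :: MicrocodeDef
opCatMicrocode = MicrocodeDef
  { baseMicrocode = defaultMicrocodeHash  -- hash of the default microcode (assumed)
  , newMappings   = Map.singleton OP_CAT [UOP_CAT]
  }
```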

Microcodes then have a standard way of being represented
as a byte sequence.
The user serializes their new microcode as a byte
sequence.

Then, the user creates a new transaction where one of
the outputs contains, say, 1.0 Bitcoins (exact required
value TBD), and has the `scriptPubKey` of
`OP_TRUE OP_RETURN <serialized_microcode>`.
This output is a "microcode introduction output", which
is provably unspendable, thus burning the Bitcoins.

(It need not be a single user, multiple users can
coordinate by signing a single transaction that commits
their funds to the microcode introduction.)

Once the above transaction has been deeply confirmed,
the user can then take the hash of the microcode
serialization.
Then the user can use a SCRIPT with `OP_CAT` enabled,
by using a Tapscript with, say, version `0xce`, and
with the SCRIPT having the microcode hash as its first
bytes, followed by the `OP_` codes.

Fullnodes will then process recognized microcode
introduction outputs and store mappings from their
hashes to the microcodes in a new microcodes index.
Fullnodes can then process version-`0xce` Tapscripts
by checking if the microcodes index has the indicated
microcode hash.

Semantically, fullnodes take the SCRIPT, and for each
`OP_` code in it, expand it to a sequence of `UOP_`
micro-opcodes, then concatenate each such sequence.
Then, the SCRIPT interpreter operates over a sequence
of `UOP_` micro-opcodes.
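
A sketch of that whole pipeline (the index lookup, the 32-byte
hash prefix, and `decodeOpcodes` are assumptions of this
illustration):

```Haskell
-- Sketch: validate a version-0xce Tapscript. First 32 bytes
-- name the microcode; the remainder is the `OP_`-code SCRIPT.
validateTapscript0xce
  :: Map.Map MicrocodeHash Microcode -- microcodes index from introduction outputs
  -> Transaction
  -> TxInput
  -> BS.ByteString                   -- raw script: microcode hash ++ opcodes
  -> Interpreter ()
validateTapscript0xce index tx txin raw = do
  let (h, body) = BS.splitAt 32 raw
  case Map.lookup (MicrocodeHash h) index of
    Nothing            -> fail "unknown microcode"
    Just (Microcode m) -> do
      ops <- decodeOpcodes body  -- parse `OP_` codes (definition elided)
      case traverse (`Map.lookup` m) ops of
        Nothing   -> fail "OP_ code not defined by microcode"
        Just uops -> interpreter tx txin (concat uops)
```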

Optimizing Microcodes
---------------------

Suppose there is some new microcode that users have
published onchain.

We want to be able to execute the defined microcode
faster than expanding an `OP_`-code SCRIPT to a
`UOP_`-code SCRIPT and having an interpreter loop
over the `UOP_`-code SCRIPT.

We can use LLVM.

WARNING: LLVM might not be appropriate for
network-facing security-sensitive applications.
In particular, LLVM bugs, especially nondeterminism
bugs, can lead to consensus divergence and disastrous
chainsplits!
On the other hand, LLVM bugs are compiler bugs and
the same bugs can hit the static compiler `cc`, too,
since the same LLVM code runs in both JIT and static
compilation, so this risk already exists for Bitcoin.
(i.e. we already rely on LLVM not being buggy enough
to trigger Bitcoin consensus divergence, else we would
have written the Bitcoin Core SCRIPT interpreter in
assembly.)

Each `UOP_`-code has an equivalent tree of LLVM code.
For each `Opcode` in the microcode, we take its
sequence of `UOpcode`s and expand them to this tree,
concatenating the equivalent trees for each `UOpcode`
in the sequence.
Then we ask LLVM to JIT-compile this code to a new
function, running LLVM-provided optimizers.
Then we put a pointer to this compiled function into a
256-entry array of functions, where the array index is
the `OP_` code.

The SCRIPT interpreter then simply iterates over the
`OP_` code SCRIPT and calls each of the JIT-compiled
functions.
This reduces much of the overhead of the `UOP_` layer
and makes it approach the current performance of the
existing `OP_` interpreter.

For the default Bitcoin SCRIPT, the opcodes array
contains pointers to statically-compiled functions.
A microcode that is based on the default Bitcoin
SCRIPT copies this opcodes array, then overwrites
the entries.
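
In Haskell terms (standing in for C++ function pointers;
`CompiledOp` abstracts over whatever the JIT emits, `Stack`
reuses the sketch above, and all names are illustrative):

```Haskell
import qualified Data.Array as A
import Data.Word (Word8)

-- Stand-in for a JIT-compiled function; the real thing would
-- be a native-code function pointer.
type CompiledOp = Stack -> Maybe Stack

-- Sketch: 256-entry dispatch table indexed by the `OP_` code byte.
type DispatchTable = A.Array Word8 CompiledOp

-- A microcode based on the default copies the base table and
-- overwrites only the redefined entries.
deriveTable :: DispatchTable -> [(Word8, CompiledOp)] -> DispatchTable
deriveTable baseTable overrides = baseTable A.// overrides
```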

Future versions of Bitcoin Core can "bless"
particular microcodes by providing statically-compiled
functions for those microcodes.
This leads to even better performance (there is
no need to recompile ancient onchain microcodes each
time Bitcoin Core starts) without any consensus
divergence.
It is a pure optimization and does not imply a
tightening of rules, and is thus not a softfork.

(To reduce the chance of network faults being used
to poke into `W|X` memory (since `W|X` memory is
needed in order to actually JIT compile) we can
isolate the SCRIPT interpreter into its own process
separate from the network-facing code.
This does imply additional overhead in serializing
transactions we want to ask the SCRIPT interpreter
to validate.)

Comparison To Jets
------------------

This technique allows users to define "jets", i.e.
sequences of low-level general operations that users
have determined are common enough they should just
be implemented as faster code that is executed
directly by the underlying hardware processor rather
than via a software interpreter.
Basically, each redefined `OP_` code is a jet of a
sequence of `UOP_` micro-opcodes.

We implement this by dynamically JIT-compiling the
proposed jets, as described above.
SCRIPTs using jetted code remain smaller, as the
jet definition is done in a previous transaction and
does not require copy-pasta (Do Not Repeat Yourself!).
At the same time, jettification is not tied to
developers, thus removing the need to keep softforking
new features --- we need only define a sufficiently
general language and then we can implement pretty much
anything worth implementing (and a bunch of other things
that should not be implemented, but hey, users gonna
use...).

Bugs in existing microcodes can be fixed by basing a
new microcode on the existing microcode, and
redefining the buggy implementation.
Existing Tapscripts need to be re-spent to point to
the new bugfixed microcode, but if you used the
point-spend branch as an N-of-N of all participants
you have an upgrade mechanism for free.

In order to ensure that the JIT-compilation of new
microcodes is not triggered trivially, we require
that users petitioning for the jettification of some
operations (i.e. introducing a new microcode) must
sacrifice Bitcoins.

Burning Bitcoins is better than increasing the weight
of microcode introduction outputs; all fullnodes are
affected by the need to JIT-compile the new microcode,
so they benefit from the reduction in supply, thus
getting compensated for the work of JIT-compiling the
new microcode.
Other mechanisms for making microcode introduction
outputs expensive are also possible.

Nothing really requires that we use a stack-based
language for this; any sufficiently functional language
should allow referential transparency.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
  2022-03-22  5:37 [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks ZmnSCPxj
@ 2022-03-22 15:08 ` Russell O'Connor
  2022-03-22 16:22   ` ZmnSCPxj
  2022-03-22 23:11 ` Anthony Towns
  1 sibling, 1 reply; 8+ messages in thread
From: Russell O'Connor @ 2022-03-22 15:08 UTC (permalink / raw)
  To: ZmnSCPxj, Bitcoin Protocol Discussion


Setting aside my thoughts that something like Simplicity would make a
better platform than Bitcoin Script (due to expressions operating on a
more narrow interface than the entire stack (I'm looking at you
OP_DEPTH)), there is an issue with namespace management.

If I understand correctly, your implication was that once opcodes are
redefined by an OP_RETURN transaction, subsequent transactions using
that opcode refer to the new microcode.  But then we have a race
condition between people submitting transactions expecting the outputs to
refer to the old code and having their code redefined by the time they do
get confirmed (or worse, having them reorged).

I've partially addressed this issue in my Simplicity design where the
commitment of a Simplicity program in a scriptpubkey covers the hash of the
specification of the jets used, which commits unambiguously to the
semantics (rightly or wrongly).  But the issue resurfaces at redemption
time where I (currently) have a consensus critical map of codes to jets
that is used to decode the witness data into a Simplicity program.  If one
were to allow this map of codes to jets to be replaced (rather than just
extended) then it would cause redemption to fail, because the hash of the
new jets would no longer match the hash of the jets appearing in the
input's scriptpubkey commitment.  While this is still not good and I don't
recommend it, it is probably better than letting the semantics of your
programs be changed out from under you.

This comment is not meant as an endorsement of this idea, which is a little
bit out there, at least as far as Bitcoin is concerned. :)

My long term plans are to move this consensus critical map of codes out of
the consensus layer and into the p2p layer where peers can negotiate their
own encodings between each other.  But that plan is also a little bit out
there, and it still doesn't solve the issue of how to weight reused jets,
where weight is still consensus critical.

On Tue, Mar 22, 2022 at 1:37 AM ZmnSCPxj via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

> [snip]


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
  2022-03-22 15:08 ` Russell O'Connor
@ 2022-03-22 16:22   ` ZmnSCPxj
  2022-03-22 16:28     ` Russell O'Connor
  0 siblings, 1 reply; 8+ messages in thread
From: ZmnSCPxj @ 2022-03-22 16:22 UTC (permalink / raw)
  To: Russell O'Connor; +Cc: Bitcoin Protocol Discussion

Good morning Russell,

> Setting aside my thoughts that something like Simplicity would make a better platform than Bitcoin Script (due to expression operating on a more narrow interface than the entire stack (I'm looking at you OP_DEPTH)) there is an issue with namespace management.
>
> If I understand correctly, your implication was that once opcodes are redefined by an OP_RETURN transaction, subsequent transactions using that opcode refer to the new microcode.  But then we have a race condition between people submitting transactions expecting the outputs to refer to the old code and having their code redefined by the time they do get confirmed (or worse, having them reorged).

No, use of specific microcodes is opt-in: you have to use a specific `0xce` Tapscript version, ***and*** refer to the microcode you want to use via the hash of the microcode.

The only race condition is reorging out a newly-defined microcode.
This can be avoided by waiting for deep confirmation of a newly-defined microcode before actually using it.

But once the microcode introduction outpoint of a particular microcode has been deeply confirmed, then your Tapscript can refer to the microcode, and its meaning does not change.

Fullnodes may need to maintain multiple microcodes, which is why creating new microcodes is expensive; they not only require JIT compilation, they also require that fullnodes keep an index that cannot have items deleted.


The advantage of the microcode scheme is that the size of the SCRIPT can be used as a proxy for CPU load --- just as it is done for current Bitcoin SCRIPT.
As long as the number of `UOP_` micro-opcodes that an `OP_` code can expand to is bounded, and we avoid looping constructs, then the CPU load is also bounded and the size of the SCRIPT approximates the amount of processing needed, thus microcode does not require a softfork to modify weight calculations in the future.
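
As a minimal sketch of such a bound check at microcode-introduction time, building on the `Microcode` type sketched earlier (the constant and the check itself are illustrative assumptions):

```Haskell
-- Sketch: reject microcode introductions whose per-opcode
-- expansions are too long, so SCRIPT size stays a usable
-- proxy for CPU load. The limit is illustrative, not a proposal.
maxExpansion :: Int
maxExpansion = 256

acceptableMicrocode :: Microcode -> Bool
acceptableMicrocode (Microcode m) =
  all ((<= maxExpansion) . length) (Map.elems m)
```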

Regards,
ZmnSCPxj


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
  2022-03-22 16:22   ` ZmnSCPxj
@ 2022-03-22 16:28     ` Russell O'Connor
  2022-03-22 16:39       ` ZmnSCPxj
  0 siblings, 1 reply; 8+ messages in thread
From: Russell O'Connor @ 2022-03-22 16:28 UTC (permalink / raw)
  To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion


Thanks for the clarification.

You don't think referring to the microcode via its hash, effectively using
32-byte encoding of opcodes, is still rather long winded?

On Tue, Mar 22, 2022 at 12:23 PM ZmnSCPxj <ZmnSCPxj@protonmail•com> wrote:

> [snip]


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
  2022-03-22 16:28     ` Russell O'Connor
@ 2022-03-22 16:39       ` ZmnSCPxj
  2022-03-22 16:47         ` ZmnSCPxj
  0 siblings, 1 reply; 8+ messages in thread
From: ZmnSCPxj @ 2022-03-22 16:39 UTC (permalink / raw)
  To: Russell O'Connor; +Cc: Bitcoin Protocol Discussion

Good morning Russell,

> Thanks for the clarification.
>
> You don't think referring to the microcode via its hash, effectively using 32-byte encoding of opcodes, is still rather long winded?

A microcode is a *mapping* of `OP_` codes to a variable-length sequence of `UOP_` micro-opcodes.
So a microcode hash refers to an entire language of redefined `OP_` codes, not each individual opcode in the language.

If it costs 1 Bitcoin to create a new microcode, then there are only 21 million possible microcodes, and I think about 50 bits of hash is sufficient to specify those with low probability of collision.
We could use a 20-byte RIPEMD160-of-SHA256 (i.e. `HASH160`) instead for 160 bits, which should be more than sufficient with enough margin.
Though perhaps it is now easier to deliberately attack...
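
(As a rough sanity check of that arithmetic, a birthday-bound sketch:)

```Haskell
-- Rough birthday bound: probability of any collision among n
-- random k-bit hashes is approximately n^2 / 2^(k+1).
collisionProb :: Double -> Double -> Double
collisionProb n k = n * n / (2 ** (k + 1))
-- collisionProb 21e6 50  ~= 0.2      -- across all pairs
-- collisionProb 21e6 160 ~= 1.5e-34  -- negligible
```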

Also, if you have a common SCRIPT whose non-`OP_PUSH` opcodes are more than say 32 + 1 bytes (or 20 + 1 if using RIPEMD), and you can fit their equivalent `UOP_` codes into the max limit for a *single* opcode, you can save bytes by redefining some random `OP_` code into the sequence of all the `UOP_` codes.
You would have a hash reference to the microcode, and a single byte for the actual "SCRIPT" which is just a jet of the entire SCRIPT.
Users of multiple *different* such SCRIPTs can band together to define a single microcode, mapping their SCRIPTs to different `OP_` codes and sharing the cost of defining the new microcode that shortens all their SCRIPTs.

Regards,
ZmnSCPxj


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
  2022-03-22 16:39       ` ZmnSCPxj
@ 2022-03-22 16:47         ` ZmnSCPxj
  0 siblings, 0 replies; 8+ messages in thread
From: ZmnSCPxj @ 2022-03-22 16:47 UTC (permalink / raw)
  To: Russell O'Connor; +Cc: Bitcoin Protocol Discussion


Good morning again Russell,

> Good morning Russell,
>
> > Thanks for the clarification.
> > You don't think referring to the microcode via its hash, effectively using 32-byte encoding of opcodes, is still rather long winded?

For that matter, since an entire microcode represents a language (based on the current OG Bitcoin SCRIPT language), with a little more coordination, we could entirely replace Tapscript versions --- every Tapscript version is a slot for a microcode, and the current OG Bitcoin SCRIPT is just the one in slot `0xc0`.
Filled slots cannot be changed, but new microcodes can use some currently-empty Tapscript version slot, and have it properly defined in a microcode introduction outpoint.

Then indication of a microcode would take only one byte, that is already needed currently anyway.

That does limit us to only 255 new microcodes, thus the cost of one microcode would have to be a good bit higher.

Again, remember, microcodes represent an entire language that is an extension of OG Bitcoin SCRIPT, not individual operations in that language.

Regards,
ZmnSCPxj


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
  2022-03-22  5:37 [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks ZmnSCPxj
  2022-03-22 15:08 ` Russell O'Connor
@ 2022-03-22 23:11 ` Anthony Towns
  2022-03-23  0:20   ` ZmnSCPxj
  1 sibling, 1 reply; 8+ messages in thread
From: Anthony Towns @ 2022-03-22 23:11 UTC (permalink / raw)
  To: ZmnSCPxj, Bitcoin Protocol Discussion

On Tue, Mar 22, 2022 at 05:37:03AM +0000, ZmnSCPxj via bitcoin-dev wrote:
> Subject: Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks

(Have you considered applying a jit or some other compression algorithm
to your emails?)

> Microcode For Bitcoin SCRIPT
> ============================
> I propose:
> * Define a generic, low-level language (the "RISC language").

This is pretty much what Simplicity does, if you optimise the low-level
language to minimise the number of primitives and maximise the ability
to apply tooling to reason about it, which seem like good things for a
RISC language to optimise.

> * Define a mapping from a specific, high-level language to
>   the above language (the microcode).
> * Allow users to sacrifice Bitcoins to define a new microcode.

I think you're defining "the microcode" as the "mapping" here.

This is pretty similar to the suggestion Bram Cohen was making a couple
of months ago:

https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-December/019722.html
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-January/019773.html
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-January/019803.html

I believe this is done in chia via the block being able to
include-by-reference prior blocks' transaction generators:

] transactions_generator_ref_list: List[uint32]: A list of block heights of previous generators referenced by this block's generator.
  - https://docs.chia.net/docs/05block-validation/block_format

(That approach comes at the cost of not being able to do full validation
if you're running a pruning node. The alternative is to effectively
introduce a parallel "utxo" set -- where you're mapping the "sacrificed"
BTC as the nValue and instead of just mapping it to a scriptPubKey for
a later spend, you're permanently storing the definition of the new
CISC opcode)

> We can then support a "RISC" language that is composed of
> general instructions, such as arithmetic, SECP256K1 scalar
> and point math, bytevector concatenation, sha256 midstates,
> bytevector bit manipulation, transaction introspection, and
> so on.

A language that includes instructions for each operation we can think
of isn't very "RISC"... More importantly it gets straight back to the
"we've got a new zk system / ECC curve / ... that we want to include,
let's do a softfork" problem you were trying to avoid in the first place.

> Then, the user creates a new transaction where one of
> the outputs contains, say, 1.0 Bitcoins (exact required
> value TBD),

Likely, the "fair" price would be the cost of introducing however many
additional bytes to the utxo set that it would take to represent your
microcode, and the cost it would take to run jit(your microcode script)
if that were a validation function. Both seem pretty hard to manage.

"Ideally", I think you'd want to be able to say "this old microcode
no longer has any value, let's forget it, and instead replace it with
this new microcode that is much better" -- that way nodes don't have to
keep around old useless data, and you've reduced the cost of introducing
new functionality.

Additionally, I think it has something of a tragedy-of-the-commons
problem: whoever creates the microcode pays the cost, but then anyone
can use it and gain the benefit. That might even end up creating
centralisation pressure: if you design a highly decentralised L2 system,
it ends up expensive because people can't coordinate to pay for the
new microcode that would make it cheaper; but if you design a highly
centralised L2 system, you can just pay for the microcode yourself and
make it even cheaper.

This approach isn't very composable -- if there's a clever opcode
defined in one microcode spec, and another one in some other microcode,
the only way to use both of them in the same transaction is to burn 1
BTC to define a new microcode that includes both of them.

> We want to be able to execute the defined microcode
> faster than expanding an `OP_`-code SCRIPT to a
> `UOP_`-code SCRIPT and having an interpreter loop
> over the `UOP_`-code SCRIPT.
>
> We can use LLVM.

Not long ago we went to the effort of removing openssl as a consensus
critical dependency; and likewise previously removed bdb.  Introducing a
huge new dependency to the definition of consensus seems like an enormous
step backwards.

This would also mean we'd be stuck at the performance of whatever version
of llvm we initially adopted, as any performance improvements introduced
in later llvm versions would be a hard fork.

> On the other hand, LLVM bugs are compiler bugs and
> the same bugs can hit the static compiler `cc`, too,

"Well, you could hit Achilles in the heel, so really, what's the point
of trying to be invulnerable anywhere else?"

> Then we put a pointer to this compiled function to a
> 256-long array of functions, where the array index is
> the `OP_` code.

That's a 256-long array of functions for each microcode, which increases
the "microcode-utxo" database storage size substantially.

Presuming there are different jit targets (x86 vs arm?) it seems
difficult to come up with a consistent interpretation of the cost for
these opcodes.

I'm skeptical that a jit would be sufficient for increasing the
performance of an implementation just based on basic arithmetic opcodes
if we're talking about something like sha512 or bls12-381 or similar.

> Bugs in existing microcodes can be fixed by basing a
> new microcode from the existing microcode, and
> redefining the buggy implementation.
> Existing Tapscripts need to be re-spent to point to
> the new bugfixed microcode, but if you used the
> point-spend branch as an N-of-N of all participants
> you have an upgrade mechanism for free.

It's not free if you have to do an on-chain spend... 

The "1 BTC" cost to fix the bug, and the extra storage in every node's
"utxo" set because they now have to keep both the buggy and fixed versions
around permanently sure isn't free either. If you're re-jitting every
microcode on startup, that could get pretty painful too.

If you're proposing introducing byte vector manipulation and OP_CAT and
similar, which enables recursive covenants, then it might be good to
explain how this proposal addresses the concerns raised at the end of
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-March/020092.html

Cheers,
aj



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
  2022-03-22 23:11 ` Anthony Towns
@ 2022-03-23  0:20   ` ZmnSCPxj
  0 siblings, 0 replies; 8+ messages in thread
From: ZmnSCPxj @ 2022-03-23  0:20 UTC (permalink / raw)
  To: Anthony Towns; +Cc: Bitcoin Protocol Discussion

Good morning aj,

> On Tue, Mar 22, 2022 at 05:37:03AM +0000, ZmnSCPxj via bitcoin-dev wrote:
>
> > Subject: Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks
>
> (Have you considered applying a jit or some other compression algorithm
> to your emails?)
>
> > Microcode For Bitcoin SCRIPT
> >
> > =============================
> >
> > I propose:
> >
> > -   Define a generic, low-level language (the "RISC language").
>
> This is pretty much what Simplicity does, if you optimise the low-level
> language to minimise the number of primitives and maximise the ability
> to apply tooling to reason about it, which seem like good things for a
> RISC language to optimise.
>
> > -   Define a mapping from a specific, high-level language to
> >     the above language (the microcode).
> >
> > -   Allow users to sacrifice Bitcoins to define a new microcode.
>
> I think you're defining "the microcode" as the "mapping" here.

Yes.

>
> This is pretty similar to the suggestion Bram Cohen was making a couple
> of months ago:
>
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-December/019722.html
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-January/019773.html
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-January/019803.html
>
> I believe this is done in chia via the block being able to
> include-by-reference prior blocks' transaction generators:
>
> ] transactions_generator_ref_list: List[uint32]: A list of block heights of previous generators referenced by this block's generator.
>
> -   https://docs.chia.net/docs/05block-validation/block_format
>
>     (That approach comes at the cost of not being able to do full validation
>     if you're running a pruning node. The alternative is to effectively
>     introduce a parallel "utxo" set -- where you're mapping the "sacrificed"
>     BTC as the nValue and instead of just mapping it to a scriptPubKey for
>     a later spend, you're permanently storing the definition of the new
>     CISC opcode)
>
>

Yes, the latter is basically what microcode is.

> > We can then support a "RISC" language that is composed of
> > general instructions, such as arithmetic, SECP256K1 scalar
> > and point math, bytevector concatenation, sha256 midstates,
> > bytevector bit manipulation, transaction introspection, and
> > so on.
>
> A language that includes instructions for each operation we can think
> of isn't very "RISC"... More importantly it gets straight back to the
> "we've got a new zk system / ECC curve / ... that we want to include,
> let's do a softfork" problem you were trying to avoid in the first place.

`libsecp256k1` can run on purely RISC machines like ARM, so saying that a "RISC" set of opcodes cannot implement some arbitrary ECC curve, when the instruction set does not directly support that ECC curve, seems incorrect.

Any new zk system / ECC curve would have to be implementable in C++, so if you have micro-operations that would be needed for it, such as XORing two multi-byte vectors together, multiplying multi-byte precision numbers, etc., then any new zk system or ECC curve would be implementable in microcode.
For that matter, you could re-write `libsecp256k1` there.

> > Then, the user creates a new transaction where one of
> > the outputs contains, say, 1.0 Bitcoins (exact required
> > value TBD),
>
> Likely, the "fair" price would be the cost of introducing however many
> additional bytes to the utxo set that it would take to represent your
> microcode, and the cost it would take to run jit(your microcode script)
> if that were a validation function. Both seem pretty hard to manage.
>
> "Ideally", I think you'd want to be able to say "this old microcode
> no longer has any value, let's forget it, and instead replace it with
> this new microcode that is much better" -- that way nodes don't have to
> keep around old useless data, and you've reduced the cost of introducing
> new functionality.

Yes, but that invites "I accidentally the smart contract" behavior.

> Additionally, I think it has something of a tragedy-of-the-commons
> problem: whoever creates the microcode pays the cost, but then anyone
> can use it and gain the benefit. That might even end up creating
> centralisation pressure: if you design a highly decentralised L2 system,
> it ends up expensive because people can't coordinate to pay for the
> new microcode that would make it cheaper; but if you design a highly
> centralised L2 system, you can just pay for the microcode yourself and
> make it even cheaper.

The same "tragedy of the commons" applies to FOSS.
"whoever creates the FOSS pays the cost, but then anyone can use it and gain the benefit"
This seems like an argument against releasing a FOSS node software.

Remember, microcode is software too, and copying software does not have a tragedy of the commons --- the main point of a tragedy of the commons is that the commons is *degraded* by the use but nobody has incentive to maintain against the degradation.
But using software does not degrade the software, if I give you a copy of my software then I do not lose my software, which is why FOSS works.

In order to make a highly-decentralized L2, you need to cooperate with total strangers, possibly completely anonymously, in handling your money.
I imagine that the level of cooperation needed in, say, Lightning network, would be far above what is necessary to gather funds from multiple people who want a particular microcode to happen until enough funds have been gathered to make the microcode happen.

For example, create a fresh address for an amount you, personally, are willing to contribute in order to make the microcode happen.
(If you are willing to spend the time and energy arguing on bitcoin-dev, then you are willing to contribute, even if others get the benefit in addition to yourself, and that time and energy has a corresponding Bitcoin value)
Then spend it using a `SIGHASH_ANYONECANPAY | SIGHASH_SINGLE`, with the microcode introduction outpoint as the single output you are signing.
Gather enough such signatures from a community around a decentralized L2, and you can achieve the necessary total funds for the microcode to happen.


> This approach isn't very composable -- if there's a clever opcode
> defined in one microcode spec, and another one in some other microcode,
> the only way to use both of them in the same transaction is to burn 1
> BTC to define a new microcode that includes both of them.

Yes, that is indeed a problem.

> > We want to be able to execute the defined microcode
> > faster than expanding an `OP_`-code SCRIPT to a
> > `UOP_`-code SCRIPT and having an interpreter loop
> > over the `UOP_`-code SCRIPT.
> > We can use LLVM.
>
> We've not long ago gone to the effort of removing openssl as a consensus
> critical dependency; and likewise previously removed bdb. Introducing a
> huge new dependency to the definition of consensus seems like an enormous
> step backwards.
>
> This would also mean we'd be stuck at the performance of whatever version
> of llvm we initially adopted, as any performance improvements introduced
> in later llvm versions would be a hard fork.

Yes, LLVM is indeed the weak link in this idea.
We could use NaCl instead, which probably has fewer issues /s.

> > On the other hand, LLVM bugs are compiler bugs and
> > the same bugs can hit the static compiler `cc`, too,
>
> "Well, you could hit Achilles in the heel, so really, what's the point
> of trying to be invulnerable anywhere else?"

Yes, LLVM is indeed the weak point here.

We could just concatenate some C++ code together when a new microcode is introduced, and compile it statically, then store the resulting binary somewhere, and invoke it at the appropriate time to run validation.
At least LLVM would be isolated into its own process in that case.

> > Then we put a pointer to this compiled function to a
> > 256-long array of functions, where the array index is
> > the `OP_` code.
>
> That's a 256-long array of functions for each microcode, which increases
> the "microcode-utxo" database storage size substantially.
>
> Presuming there are different jit targets (x86 vs arm?) it seems
> difficulty to come up with a consistent interpretation of the cost for
> these opcodes.
>
> I'm skeptical that a jit would be sufficient for increasing the
> performance of an implementation just based on basic arithmetic opcodes
> if we're talking about something like sha512 or bls12-381 or similar.

Static compilation seems to work well enough --- and JIT vs static is a spectrum, not either/or.
The difference is really how much optimization you are willing to use.
If microcodes are costly enough that they happen rarely, then using optimizations that are usually reserved for static compilation seems a reasonable tradeoff.

> > Bugs in existing microcodes can be fixed by basing a
> > new microcode from the existing microcode, and
> > redefining the buggy implementation.
> > Existing Tapscripts need to be re-spent to point to
> > the new bugfixed microcode, but if you used the
> > point-spend branch as an N-of-N of all participants
> > you have an upgrade mechanism for free.
>
> It's not free if you have to do an on-chain spend...
>
> The "1 BTC" cost to fix the bug, and the extra storage in every node's
> "utxo" set because they now have to keep both the buggy and fixed versions
> around permanently sure isn't free either.

Heh, poor word choice.

What I meant is that we do not need a separate upgrade mechanism, the design work here is "free".
*Using* the upgrade mechanism is costly and hence not "free".

> If you're re-jitting every
> microcode on startup, that could get pretty painful too.

When LLVM is used in a static compiler, it writes the resulting code on disk; I imagine the same mechanism can be used.

> If you're proposing introducing byte vector manipulation and OP_CAT and
> similar, which enables recursive covenants, then it might be good to
> explain how this proposal addresses the concerns raised at the end of
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-March/020092.html

It does not, I am currently exploring and generating ideas, not particularly tying myself to one idea or another.

Regards,
ZmnSCPxj


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-03-23  0:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-22  5:37 [bitcoin-dev] Beyond Jets: Microcode: Consensus-Critical Jets Without Softforks ZmnSCPxj
2022-03-22 15:08 ` Russell O'Connor
2022-03-22 16:22   ` ZmnSCPxj
2022-03-22 16:28     ` Russell O'Connor
2022-03-22 16:39       ` ZmnSCPxj
2022-03-22 16:47         ` ZmnSCPxj
2022-03-22 23:11 ` Anthony Towns
2022-03-23  0:20   ` ZmnSCPxj
