On Sat, Mar 5, 2022 at 5:59 AM Anthony Towns <aj@erisian.com.au> wrote:
On Fri, Mar 04, 2022 at 11:21:41PM +0000, Jeremy Rubin via bitcoin-dev wrote:
> I've seen some discussion of what the Annex can be used for in Bitcoin.

https://www.erisian.com.au/meetbot/taproot-bip-review/2019/taproot-bip-review.2019-11-12-19.00.log.html

includes some discussion on that topic from the taproot review meetings.

The difference between information in the annex and information in
either a script (or the input data for the script that is the rest of
the witness) is (in theory) that the annex can be analysed immediately
and unconditionally, without necessarily even knowing anything about
the utxo being spent.

I agree that should happen, but there are cases where this would not work. E.g., imagine OP_LISP_EVAL + OP_ANNEX, where you then do delegation via the data in the annex.

Now the annex can be executed as a script.

 

The idea is that we would define some simple way of encoding (multiple)
entries into the annex -- perhaps a tag/length/value scheme like
lightning uses; maybe if we add a lisp scripting language to consensus,
we just reuse the list encoding from that? -- at which point we might
use one tag to specify that a transaction uses advanced computation, and
needs to be treated as having a heavier weight than its serialized size
implies; but we could use another tag for per-input absolute locktimes;
or another tag to commit to a past block height having a particular hash.
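To make the TLV idea concrete, here's a minimal sketch in Python. The 1-byte tag / 1-byte length encoding is purely hypothetical (nothing is specified anywhere; lightning's actual TLV format uses varints):

```python
def encode_annex_entries(entries):
    """Serialize {tag: value} pairs as tag/length/value records.

    Hypothetical encoding: 1-byte tag, 1-byte length, raw value
    bytes (so values are limited to 255 bytes in this sketch).
    """
    out = b""
    for tag, value in sorted(entries.items()):
        assert 0 <= tag <= 0xFF and len(value) <= 0xFF
        out += bytes([tag, len(value)]) + value
    return out

def decode_annex_entries(data):
    """Parse the TLV records back into a {tag: value} dict."""
    entries, i = {}, 0
    while i < len(data):
        tag, length = data[i], data[i + 1]
        entries[tag] = data[i + 2:i + 2 + length]
        i += 2 + length
    return entries
```

Under this encoding, a single "extra weight 420" entry under tag 1 serializes to the four bytes 0x010201a4.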

Yes, this seems tough to do without redefining checksig to allow partial annexes. Hence my thinking that we should make the current checksig behavior require the annex to be 0; future operations should be engineered with a specific structured annex in mind.

 

It seems like a good place for optimising SIGHASH_GROUP (allowing a group
of inputs to claim a group of outputs for signing, but not allowing inputs
from different groups to ever claim the same output; so that each output
is hashed at most once for this purpose) -- since each input's validity
depends on the other inputs' state, it's better to be able to get at
that state as easily as possible rather than having to actually execute
other scripts before you can tell if your script is going to be valid.

I think SIGHASH_GROUP could be some sort of mutable stack value, not the annex. You want to be able to compute what range you should sign, and then the signature should cover the actual range, not the argument itself.

Why sign the annex literally?

Why require that all signatures in one output sign the exact same digest? What if one wants to sign for value and another for value + change?

 

> The BIP is tight-lipped about its purpose

BIP341 only reserves an area to put the annex; it doesn't define how
it's used or why it should be used.


It does define how it's used: checksig must commit to it. Were there no opcodes dependent on it, I would agree, and that would be preferable.


 
> Essentially, I read this as saying: The annex is the ability to pad a
> transaction with an additional string of 0's

If you wanted to pad it directly, you can do that in script already
with a PUSH/DROP combo.

You cannot, because the push/drop would not be signed and would be malleable.

The annex is not malleable, so it can be used for this as authenticated padding.

 

The point of doing it in the annex is you could have a short byte
string, perhaps something like "0x010201a4" saying "tag 1, data length 2
bytes, value 420" and have the consensus interpretation of that be "this
transaction should be treated as if it's 420 weight units more expensive
than its serialized size", while only increasing its witness size by
6 bytes (annex length, annex flag, and the four bytes above). Adding 6
bytes for a 426 weight unit increase seems much better than adding 426
witness bytes.
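As a sketch of how a validator might apply such a tag (the tag number and the 1-byte tag / 1-byte length encoding are hypothetical, matching the "0x010201a4" example):

```python
def weight_adjustment(annex_payload):
    """Scan hypothetical TLV records (1-byte tag, 1-byte length)
    for tag 1, interpreted as "extra weight units to charge"."""
    i, extra = 0, 0
    while i < len(annex_payload):
        tag, length = annex_payload[i], annex_payload[i + 1]
        if tag == 1:
            extra += int.from_bytes(annex_payload[i + 2:i + 2 + length], "big")
        i += 2 + length
    return extra

def effective_weight(serialized_weight, annex_payload):
    """Serialized weight plus the annex-declared surcharge."""
    return serialized_weight + weight_adjustment(annex_payload)
```

So a transaction carrying the 4-byte payload 0x010201a4 would be treated as 420 weight units heavier than its serialized size.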


Yes, that's what I say in the next sentence,

> Or, we might somehow make the witness a small language (e.g., run length encoded zeros) such that we can very quickly compute an equivalent number of zeros to 'charge' without actually consuming the space but still consuming a linearizable resource... or something like that.

so I think we concur on that.

 
> Introducing OP_ANNEX: Suppose there were some sort of annex pushing opcode,
> OP_ANNEX which puts the annex on the stack

I think you'd want to have a way of accessing individual entries from
the annex, rather than the annex as a single unit.

Or OP_ANNEX + OP_SUBSTR + OP_POVARINTSTR? Then you could just do two pops for the length and the tag, and then get the data.
 

> Now suppose that I have a computation that I am running in a script as
> follows:
>
> OP_ANNEX
> OP_IF
>     `some operation that requires annex to be <1>`
> OP_ELSE
>     OP_SIZE
>     `some operation that requires annex to be len(annex) + 1 or does a
> checksig`
> OP_ENDIF
>
> Now every time you run this,

You only run a script from a transaction once at which point its
annex is known (a different annex gives a different wtxid and breaks
any signatures), and can't reference previous or future transactions'
annexes...


In a transaction validator, yes. But in a satisfier, no.

And it doesn't break the signatures if we add the ability to sign over only a part of the annex, or over multiple annexes, since the annex could then be partially mutable.


Not true about accessing previous transactions' annexes. All coins descend from coinbase transactions. If you can get the COutPoint you're spending, you can get the parent of that COutPoint, and iterate backwards, and so on and so forth. Eventually you reach the coinbase transaction, which commits to the tree of wtxids. So previous transactions' annexes are committed there.
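The backwards walk described above can be sketched as follows, with `get_tx` standing in for an assumed txid lookup (e.g. a txindex) and a minimal dict representation of transactions:

```python
def walk_to_coinbase(tx, get_tx):
    """Follow an input's prevout chain back to its coinbase.

    `tx` is a minimal dict: {"is_coinbase": bool,
    "inputs": [{"prevout_txid": str}, ...]}. `get_tx` is an
    assumed lookup from txid to such a dict; a real implementation
    would need a txindex or equivalent. For simplicity we always
    follow the first input.
    """
    while not tx["is_coinbase"]:
        tx = get_tx(tx["inputs"][0]["prevout_txid"])
    return tx
```

From that coinbase (and the block's wtxid commitment) the ancestor transactions' annexes are reachable, as argued above.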


For future transactions: as a miner with decent hashrate, you could promise what your coinbase transaction for a future block would be and what its outputs would be, and then you could pop that open as well... but you can't show valid PoW for that one, so I'm not sure it's different from ordinary authenticated data. Where it could have a use: with an OP_COUTPOINTVERIFY, you could say that a coin is only spendable if a miner mines the specific block you want at a certain height (e.g., with only your txn in it?), and then they can claim the outpoint in the future... so maybe there is something bizarre that can happen with that capability.

 
> Because the Annex is signed, and must be the same, this can also be
> inconvenient:

The annex is committed to by signatures in the same way nVersion,
nLockTime and nSequence are committed to by signatures; I think it helps
to think about it in a similar way.

nSequence, yes; nLockTime is per-tx.

BTW, I think we now consider nSequence/nLockTime to be misdesigned, given the desire to vary these per-input/per-tx...

so if the annex is like these, perhaps it's also misdesigned.

> Suppose that you have a Miniscript that is something like: and(or(PK(A),
> PK(A')), X, or(PK(B), PK(B'))).
>
> A or A' should sign with B or B'. X is some sort of fragment that might
> require a value that is unknown (and maybe recursively defined?) so
> therefore if we send the PSBT to A first, which commits to the annex, and
> then X reads the annex and say it must be something else, A must sign
> again. So you might say, run X first, and then sign with A and C or B.
> However, what if the script somehow detects the bitstring WHICH_A WHICH_B
> and has a different Annex per selection (e.g., interpret the bitstring as a
> int and annex must == that int). Now, given and(or(K1, K1'),... or(Kn,
> Kn')) we end up with needing to pre-sign 2**n annex values somehow... this
> seems problematic theoretically.

Note that you need to know what the annex will contain before you sign,
since the annex is committed to via the signature. If "X" will need
entries in the annex that aren't able to be calculated by the other
parties, then they need to be the first to contribute to the PSBT, not A.

I think the analogy to locktimes would be "I need the locktime to be at
least block 900k, should I just sign that now, or check that nobody else
is going to want it to be block 950k or something? Or should I just sign
with nLockTime at 900k, 910k, 920k, 930k, etc and let someone else pick
the right one?" The obvious solution is just to work out what the
nLockTime should be first, then run signing rounds. Likewise, work out
what the annex should be first, then run the signing rounds.


Yes, my point is that this is sometimes computationally hard to do.

CLTV also has the problem that if you have one script fragment with
CLTV by time, and another with CLTV by height, you can't come up with
an nLockTime that will ever satisfy both. If you somehow have script
fragments that require incompatible interpretations of the annex, you're
likewise going to be out of luck.


Yes, see above. We don't yet know how the annex will be structured or used; that is the point of this thread.

We need to drill down on how to avoid introducing these problems.

 
Having a way of specifying locktimes in the annex can solve that
particular problem with CLTV (different inputs can sign different
locktimes, and you could have different tags for by-time/by-height so
that even the same input can have different clauses requiring both),
but the general problem still exists.

(eg, you might have per-input by-height absolute locktimes as annex
entry 3, and per-input by-time absolute locktimes as annex entry 4,
so you might convert:

 "900e3 CLTV DROP" -> "900e3 3 PUSH_ANNEX_ENTRY GREATERTHANOREQUAL VERIFY"

 "500e6 CLTV DROP" -> "500e6 4 PUSH_ANNEX_ENTRY GREATERTHANOREQUAL VERIFY"

for height/time locktime checks respectively)
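A sketch of what those converted scripts would check, again assuming the hypothetical 1-byte tag / 1-byte length TLV encoding and the tag numbers from the example (3 = by-height, 4 = by-time):

```python
def push_annex_entry(annex_payload, want_tag):
    """What a hypothetical PUSH_ANNEX_ENTRY might do: walk the TLV
    records and return the entry for `want_tag` as an int, or None."""
    i = 0
    while i < len(annex_payload):
        tag, length = annex_payload[i], annex_payload[i + 1]
        if tag == want_tag:
            return int.from_bytes(annex_payload[i + 2:i + 2 + length], "big")
        i += 2 + length
    return None

def height_lock_satisfied(annex_payload, script_min_height, tip_height):
    """Model of '900e3 3 PUSH_ANNEX_ENTRY GREATERTHANOREQUAL VERIFY':
    the script requires the per-input annex locktime to be at least
    its threshold, and consensus would separately require the chain
    to have reached that locktime."""
    entry = push_annex_entry(annex_payload, 3)
    if entry is None or entry < script_min_height:
        return False  # script-level check fails
    return tip_height >= entry  # consensus-level maturity check
```

The by-time variant (tag 4) would be identical, compared against median-time-past instead of height, which is what lets one input carry both kinds of lock without the CLTV incompatibility.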

> Of course this wouldn't be miniscript then. Because miniscript is just for
> the well behaved subset of script, and this seems ill behaved. So maybe
> we're OK?

The CLTV issue hit miniscript:

https://medium.com/blockstream/dont-mix-your-timelocks-d9939b665094

Maybe the humour didn't land -- we can only define "well behaved" as best we know, and the solution was to redefine miniscript to be only the well-defined subset once the bug in the spec was found.

 

> It seems like one good option is if we just go on and banish the OP_ANNEX.
> Maybe that solves some of this? I sort of think so. It definitely seems
> like we're not supposed to access it via script, given the quote from above:

How the annex works isn't defined, so it doesn't make any sense to
access it from script. When how it works is defined, I expect it might
well make sense to access it from script -- in a similar way that the
CLTV and CSV opcodes allow accessing nLockTime and nSequence from script.

That's false: CLTV and CSV expressly do not allow accessing those values from script, only lower-bounding them (and transitively proving that they were not of the other flavour).

So you can't actually get the exact nLockTime / Sequence on the stack (exception: if you use the maximum allowable value, then there are no other values...)


Given that it's not defined at all, I'm skeptical about signing it at all presently.

If there's a future upgrade, it would be compatible, as we can add new sighash flags to cover that.


> One solution would be to... just soft-fork it out. Always must be 0. When
> we come up with a use case for something like an annex, we can find a way
> to add it back.

The point of reserving the annex the way it has been is exactly this --
it should not be used now, but when we agree on how it should be used,
we have an area that's immediately ready to be used.

(For the cases where you don't need script to enforce reasonable values,
reserving it now means those new consensus rules can be used immediately
with utxos that predate the new consensus rules -- so you could update
offchain contracts from per-tx to per-input locktimes immediately without
having to update the utxo on-chain first)

I highly doubt that we won't need new sighash flags once it is ready, to allow partial coverage of the annex, e.g. with structured entries like those described above.

We're already doing a soft fork for the new annex rules, so this isn't a big deal...

Legacy outputs can use these new sighash flags as well, in theory (maybe I'll do a post on why we shouldn't...)



Cheers,

Jeremy