On Mon, Jan 31, 2022 at 8:16 PM Anthony Towns <aj@erisian.com.au> wrote:
On Fri, Jan 28, 2022 at 08:56:25AM -0500, Russell O'Connor via bitcoin-dev wrote:
> > https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-July/019243.html
> For more complex interactions, I was imagining combining this TXHASH
> proposal with CAT and/or rolling SHA256 opcodes.  If TXHASH ended up
> supporting relative or absolute input/output indexes then users could
> assemble the hashes of the particular inputs and outputs they care about
> into a single signed message.
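Why "rolling SHA256" helps here, as a rough illustration and not part of either proposal: feeding each selected hash into one incremental SHA256 state produces the same digest as concatenating everything first (the CAT route) and hashing once. The script therefore never has to hold the whole concatenated message on the stack. The item encodings and the choice of indexes below are hypothetical placeholders.

```python
import hashlib

def rolling_commit(items, indexes):
    """Feed the hash of each selected item into one incremental SHA256 state."""
    h = hashlib.sha256()
    for i in indexes:
        h.update(hashlib.sha256(items[i]).digest())
    return h.digest()

def cat_then_commit(items, indexes):
    """CAT-style alternative: build the whole message first, then hash it once."""
    msg = b"".join(hashlib.sha256(items[i]).digest() for i in indexes)
    return hashlib.sha256(msg).digest()

# Hypothetical serialized outputs; only outputs 0 and 2 are "cared about".
outputs = [b"output-0", b"output-1", b"output-2"]
signed_message = rolling_commit(outputs, [0, 2])
```

The order of the indexes matters: committing to [0, 2] and to [2, 0] gives different messages. A redemption-time index list would therefore have to be unambiguous about ordering.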

That's certainly possible, but it sure seems overly complicated and
error prone...

Indeed, and we really want something that can be programmed at redemption time.
That probably involves something like how the historic MULTISIG worked, by having a list of input/output indexes passed in along with length arguments.

I don't think there will be problems with quadratic hashing here because as more inputs are listed, the witness in turn grows larger itself.  The number of stack elements that can be copied is limited by a constant (3DUP).  Certainly care is needed here, but also keep in mind that an OP_HASH256 does a double hash and costs one weight unit.

That said, your SIGHASH_GROUP proposal suggests that some sort of intra-input communication is really needed, and that is something I would need to think about.

While normally I'd be hesitant about this sort of feature creep, when we are talking about doing soft-forks, I really think it makes sense to think through these sorts of issues (as we are doing here).
 
> I don't think there is much in the way of lessons to be drawn from how we
> see Bitcoin Script used today with regards to programs built out of
> reusable components.

I guess I think one conclusion we should draw is some modesty in how
good we are at creating general reusable components. That is, bitcoin
script looks a lot like a relatively general expression language,
that should allow you to write interesting things; but in practice a
lot of it was buggy (OP_VER hardforks and resource exhaustion issues),
or not powerful enough to actually be interesting, or too complicated
to actually get enough use out of [0].

> TXHASH + CSFSV won't be enough by itself to allow for very interesting
> programs in Bitcoin Script yet, we still need CAT and friends for that,

"CAT" and "CHECKSIGFROMSTACK" are both things that have been available in
elements for a while; has anyone managed to build anything interesting
with them in practice, or are they only useful for thought experiments
and blog posts? To me, that suggests that while they're useful for
theoretical discussion, they don't turn out to be a good design in
practice.

Perhaps the lesson to be drawn is that languages should support multiplying two numbers together.

Having two-thirds of the language you need to write interesting programs doesn't mean that you get two-thirds of the interesting programs written.

But beyond that, there is a lot more to a smart contract than just the Script.  Dmitry Petukhov has a fleshed-out design for asset-based lending on Liquid at https://ruggedbytes.com/articles/ll/, despite the limitations of (pre-taproot) Elements Script.  But to make it a real thing you need infrastructure for working with partial transactions, key management, etc.

> but
> CSFSV is at least a step in that direction.  CSFSV can take arbitrary
> messages and these messages can be fixed strings, or they can be hashes of
> strings (that need to be revealed), or they can be hashes returned from
> TXHASH, or they can be locktime values, or they can be values that are
> added or subtracted from locktime values, or they can be values used for
> thresholds, or they can be other pubkeys for delegation purposes, or they
> can be other signatures ... for who knows what purpose.

I mean, if you can't even think of a couple of uses, that doesn't seem
very interesting to pursue in the near term? CTV has something like half
a dozen fairly near-term use cases, but obviously those can all be done
just with TXHASH without a need for CSFS, and likewise all the ANYPREVOUT
things can obviously be done via CHECKSIG without either TXHASH or CSFS...

To me, the point of having CSFS (as opposed to CHECKSIG) seems to be
verifying that an oracle asserted something; but for really simple boolean
decisions, doing that via a DLC seems better in general since that moves
more of the work off-chain; and for the case where the signature is being
used to authenticate input into the script rather than just gating a path,
that feels a bit like a weaker version of graftroot?

I didn't really mean this as a list of applications; it was a list of values that CSFSV composes with. Applications include delegation of pubkeys and oracles, and, in the presence of CAT and transaction reflection primitives, presumably many more things.
 
I guess I'd still be interested in the answer to:

> > If we had CTV, POP_SIGDATA, and SIGHASH_NO_TX_DATA_AT_ALL but no OP_CAT,
> > are there any practical use cases that wouldn't be covered that having
> > TXHASH/CAT/CHECKSIGFROMSTACK instead would allow? Or where those would
> > be significantly more convenient/efficient?
> >
> > (Assume "y x POP_SIGDATA POP_SIGDATA p CHECKSIGVERIFY q CHECKSIG"
> > commits to a vector [x,y] via p but does not commit to either via q so
> > that there's some "CAT"-like behaviour available)

I don't know if this is the answer you are looking for, but technically TXHASH + CAT + SHA256 awkwardly gives you limited transaction reflection.  In fact, you might not even need TXHASH, though it certainly helps.
 
TXHASH seems to me to be clearly the more flexible opcode compared to
CTV; but maybe all that flexibility is wasted, and all the real use
cases actually just want CHECKSIG or CTV? I'd feel much better having
some idea of what the advantage of being flexible there is...

The flexibility of TXHASH is intended to head off the need for future soft forks.  If we had specific applications in mind, we could simply set up the transaction hash flags to cover all the applications we know about.  But it is the applications that we don't know about that worry me.  If we don't put options in place with this soft-fork proposal, then they will need their own soft-fork down the line; and the next application after that, and so on.
 
If our attitude is to craft our soft-forks as narrowly as possible to limit them to what only allows for given tasks, then we are going to end up needing a lot more soft-forks, and that is not a good outcome.

But all that aside, probably the real question is whether we can simplify
CTV's transaction message algorithm if we assume APO is enabled
simultaneously. If it doesn't get simplified and needs its own hashing
algorithm anyway, that would probably be a good reason to keep them separate.

First, since ANYPREVOUT commits to the scriptPubKey, you'd need to use
ANYPREVOUTANYSCRIPT for CTV-like behaviour.

ANYPREVOUTANYSCRIPT is specced as committing to:
  nVersion
  nLockTime
  nSequence
  spend_type and annex present
  sha_annex (if present)
  sha_outputs (ALL) or sha_single_output (SINGLE)
  key_version
  codesep_pos

CTV commits to:
  nVersion
  nLockTime
  scriptSig hash "(maybe!)"
  input count
  sequences hash
  output count
  outputs hash
  input index

(CTV thus allows annex malleability, since it neither commits to the
annex nor forbids inclusion of an annex)
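To make the CTV list concrete, here is a sketch of the default template hash, modelled on the reference Python in BIP 119. The dataclasses, field names and helper functions are simplified stand-ins rather than any library's API. The transaction type has no witness or annex field at all, which is exactly why CTV neither commits to nor forbids an annex.

```python
import hashlib
import struct
from dataclasses import dataclass

def sha256(b):
    return hashlib.sha256(b).digest()

def compact_size(n):
    # Bitcoin's CompactSize length prefix
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

def ser_string(s):
    return compact_size(len(s)) + s

@dataclass
class TxIn:
    script_sig: bytes = b""
    n_sequence: int = 0xFFFFFFFF

@dataclass
class TxOut:
    n_value: int          # satoshis
    script_pubkey: bytes

@dataclass
class Tx:
    n_version: int
    n_locktime: int
    vin: list             # list of TxIn; no witness/annex field, CTV ignores it
    vout: list            # list of TxOut

def ctv_hash(tx, n_in):
    r = struct.pack("<i", tx.n_version)
    r += struct.pack("<I", tx.n_locktime)
    # the "(maybe!)": scriptSigs are only hashed if at least one is non-empty
    if any(i.script_sig for i in tx.vin):
        r += sha256(b"".join(ser_string(i.script_sig) for i in tx.vin))
    r += struct.pack("<I", len(tx.vin))                        # input count
    r += sha256(b"".join(struct.pack("<I", i.n_sequence) for i in tx.vin))
    r += struct.pack("<I", len(tx.vout))                       # output count
    r += sha256(b"".join(struct.pack("<q", o.n_value) + ser_string(o.script_pubkey)
                         for o in tx.vout))                    # outputs hash
    r += struct.pack("<I", n_in)                               # input index
    return sha256(r)
```

Usage: `ctv_hash(tx, 0)` for a one-input transaction gives the 32-byte value a CTV script would embed. Adding a second input changes both the input count and, for that input, the index, so the result changes.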

"output count" and "outputs hash" would both be covered by sha_outputs
with ANYPREVOUTANYSCRIPT|ALL.
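The reason sha_outputs also pins down the output count: it hashes the plain concatenation of serialized outputs (per BIP 341), and each serialized output is self-delimiting (an 8-byte amount followed by a length-prefixed script). The preimage therefore parses back into exactly one list. A minimal sketch follows. It assumes scripts shorter than 253 bytes so the length prefix is one byte, and it uses a placeholder P2TR script rather than a real key.

```python
import hashlib
import struct

def ser_output(value_sats, script_pubkey):
    # 8-byte little-endian amount, then a length-prefixed script.
    # Simplification: scripts shorter than 253 bytes (1-byte CompactSize).
    assert len(script_pubkey) < 0xFD
    return struct.pack("<q", value_sats) + bytes([len(script_pubkey)]) + script_pubkey

def parse_outputs(blob):
    """Split a concatenation of serialized outputs back into (value, script) pairs."""
    outs, pos = [], 0
    while pos < len(blob):
        (value,) = struct.unpack_from("<q", blob, pos)
        slen = blob[pos + 8]
        outs.append((value, blob[pos + 9 : pos + 9 + slen]))
        pos += 9 + slen
    return outs

spk = bytes.fromhex("5120") + bytes(32)  # placeholder P2TR scriptPubKey
outputs = [(21_000_000, spk), (10_000_000, spk), (10_000_000, spk)]
preimage = b"".join(ser_output(v, s) for v, s in outputs)
sha_outputs = hashlib.sha256(preimage).digest()
```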

I think "scriptSig hash" is only covered to avoid txid malleability; but
just adjusting your protocol to use APO signatures instead of relying on
the txid of future transactions also solves that problem.

I believe "sequences hash", "input count" and "input index" are all an
important part of ensuring that if you have two UTXOs distributing 0.42
BTC to the same set of addresses via CTV, you can't combine them in a
single transaction and end up losing one of the UTXOs to fees. I
don't believe there's a way to resolve that with BIP 118 alone; however,
that does seem to be a similar problem to the one that SIGHASH_GROUP
tries to solve.

It was my understanding that it is only "input count = 1" that prevents this issue.
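A toy model of that disagreement, assuming templates that hash only the fields in question (input count, outputs, and optionally input index). This is not the real CTV hash. With input count fixed at 1, two identical UTXOs simply cannot be spent together. If a template allowed two inputs but omitted the index, both could be spent into a single payout. Adding the index back blocks that, because the second input would need index 0 too.

```python
import hashlib

# Amounts in satoshis: each 0.42 BTC UTXO pays 0.41 BTC out, leaving 0.01 BTC fee.
PAYOUT = ((21_000_000, "A"), (10_000_000, "B"), (10_000_000, "C"))

def template(input_count, input_index, outputs, commit_index=True):
    """Toy CTV-like template over only the fields discussed here (hypothetical)."""
    fields = [str(input_count), repr(outputs)]
    if commit_index:
        fields.append(str(input_index))
    return hashlib.sha256("|".join(fields).encode()).hexdigest()

def spend_ok(utxo_templates, outputs, commit_index=True):
    """Would a tx spending these UTXOs (as inputs 0..n-1) satisfy every template?"""
    n = len(utxo_templates)
    return all(t == template(n, i, outputs, commit_index)
               for i, t in enumerate(utxo_templates))

# Commits to input count 1 (index 0), as a typical single-input CTV template would.
h1 = template(1, 0, PAYOUT)
# Commits to input count 2 but not to the input index.
h2 = template(2, 0, PAYOUT, commit_index=False)
# Commits to input count 2 and input index 0.
h3 = template(2, 0, PAYOUT)
```

In this model, `spend_ok([h2, h2], PAYOUT, commit_index=False)` succeeds, sending 0.43 BTC to fees. Both `[h1, h1]` and `[h3, h3]` fail.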

SIGHASH_GROUP [1] would be an alternative to ALL/SINGLE/NONE, with the exact
group of outputs being committed to determined via the annex.
ANYPREVOUTANYSCRIPT|GROUP would commit to:

  nVersion
  nLockTime
  nSequence
  spend_type and annex present
  sha_annex (if present)
  sha_group_outputs (GROUP)
  key_version
  codesep_pos

So in that case if you have your two inputs:

  0.42 [pays 0.21 to A, 0.10 to B, 0.10 to C]
  0.42 [pays 0.21 to A, 0.10 to B, 0.10 to C]

then, either:

  a) if they're both committed with GROUP and sig_group_count = 3, then
     the outputs must be [0.21 A, 0.10 B, 0.10 C, 0.21 A, 0.10 B, 0.10
     C], and you don't lose funds

  b) if they're both committed with GROUP and the first is
     sig_group_count=3 and the second is sig_group_count=0, then the
     outputs can be [0.21 A, 0.10 B, 0.10 C, *anything] -- but in that
     case the second input is already signalling that it's meant to be
     paired with another input to fund the same three outputs, so any
     funds loss is at least intentional
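Cases (a) and (b) can be checked with a small model of group assignment. This model is an assumption on my part and not the spec: an input whose sig_group_count is n > 0 claims the next n outputs after the previous group, a count of 0 reuses the previous input's group, and outputs after the last group are unconstrained.

```python
def assign_groups(group_counts):
    """Return the (start, end) output range each input commits to.

    Hypothetical model of sig_group_count: n > 0 claims the next n outputs
    after the previous group; 0 reuses the previous input's group."""
    groups, next_start = [], 0
    for n in group_counts:
        if n == 0:
            if not groups:
                raise ValueError("first input has no previous group to reuse")
            groups.append(groups[-1])
        else:
            groups.append((next_start, next_start + n))
            next_start += n
    return groups

def satisfies(outputs, group_counts, wanted):
    """True if every input's committed outputs equal what that input signed for."""
    return all(outputs[s:e] == w
               for (s, e), w in zip(assign_groups(group_counts), wanted))

# Amounts in units of 0.01 BTC.
PAY = [(21, "A"), (10, "B"), (10, "C")]
```

Case (a) is `satisfies(PAY + PAY, [3, 3], [PAY, PAY])`, and paying the three outputs only once fails. Case (b) is `satisfies(PAY + [(40, "X")], [3, 0], [PAY, PAY])`, where the trailing output can be anything.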

Note that this means txids are very unstable: if a tx is only protected
by SIGHASH_GROUP commitments then miners/relayers can add outputs, or
reorganise the groups without making the tx invalid. Beyond requiring
the signatures to be APO/APOAS-based to deal with that, we'd also need
to avoid txs getting rbf-pinned by some malicious third party who pulls
apart the groups and assembles a new tx that's hard to rbf but also
unlikely to confirm due to having a low feerate.

Note also that not reusing addresses solves this case -- it's only a
problem when you're paying the same amounts to the same addresses.

Being able to combine additional inputs and outputs at a later date
(which necessarily changes the txid) is an advantage though: it lets
you add additional funds and claim change, which allows you to adjust
to different fee rates.

I don't think the SIGHASH_GROUP approach would work very well without
access to the annex, i.e. if you're trying to do CTV encoded either in a
plain scriptPubKey or via segwit/p2sh.

I think that would give 16 different sighashes, choosing one of four
options for outputs,

 ALL/NONE/SINGLE/GROUP
   -- which outputs are committed to

and one of four options for inputs,

 -/ANYONECANPAY/ANYPREVOUT/ANYPREVOUTANYSCRIPT
   -- all inputs committed to, specific input committed to,
      scriptpubkey/tapscript committed to, or just the
      nseq/annex/codesep_pos

vs the ~155,000 sighashes in the TXHASH proposal.
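The 16 comes straight from the cross product of the two option lists. Here an empty input mode stands for the default "commit to all inputs", and the `INPUT|OUTPUT` naming just follows the "ANYPREVOUTANYSCRIPT|GROUP" style used above.

```python
from itertools import product

OUTPUT_MODES = ["ALL", "NONE", "SINGLE", "GROUP"]
INPUT_MODES = ["", "ANYONECANPAY", "ANYPREVOUT", "ANYPREVOUTANYSCRIPT"]  # "" = all inputs

def sighash_names():
    return [f"{i}|{o}" if i else o for i, o in product(INPUT_MODES, OUTPUT_MODES)]
```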

I don't think there's an efficient way of doing SIGHASH_GROUP via tx
introspection opcodes that doesn't also introduce a quadratic hashing
risk -- you need to prevent different inputs from re-hashing distinct but
overlapping sets of outputs, and if your opcodes only allow grabbing one
output at a time to add to the message being signed you have to do a lot
of coding if you want to let the signer choose how many outputs to commit
to; if you provide an opcode that grabs many outputs to hash, it seems
hard to do that generically in a way that avoids quadratic behaviour.
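A back-of-the-envelope illustration of that risk. It assumes a transaction with n inputs and n P2TR outputs (43 bytes each), where input i commits to outputs i through n-1, giving distinct but overlapping sets. Compare this with a single sha_outputs-style digest that is computed once and shared by every input, as BIP 341 does.

```python
P2TR_OUTPUT_BYTES = 43  # 8-byte amount + 1-byte length + 34-byte script

def bytes_hashed_overlapping(n):
    """Input i hashes outputs i..n-1: distinct but overlapping sets per input."""
    return sum((n - i) * P2TR_OUTPUT_BYTES for i in range(n))

def bytes_hashed_shared(n):
    """One digest over all outputs, computed once and reused by every input."""
    return n * P2TR_OUTPUT_BYTES
```

At n = 1000 that is 21,521,500 bytes hashed versus 43,000, and doubling n roughly quadruples the first figure. The witness needed to express "start at output i" stays constant-sized, so it doesn't pay for the extra work.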

So I think that suggests two alternative approaches, beyond the
VERIFY-vs-PUSH semantic:

 - have a dedicated sighash type for CTV (either an explicit one for it,
   per bip119, or support thousands of options like the proposal in this
   thread, one of which happens to be about the same as the bip119 idea)

 - use ANYPREVOUTANYSCRIPT|GROUP for CTV, which means also implementing
   annex parsing and better RBF behaviour to avoid those txs being
   excessively vulnerable to pinning; with the advantage being that
   txs using "GROUP" sigs can be combined either for batching purposes
   or for adapting to the fee market after the signature has been made,
   and the disadvantage that you can't rely on stable txids when looking
   for CTV spends and have to continue using APO/APOAS when chaining
   signatures on top of unconfirmed CTV outputs

Cheers,
aj

[0] Here's bitmatrix trying to multiply two numbers together:
     https://medium.com/bit-matrix/technical-how-does-bitmatrix-v1-multiply-two-integers-in-the-absence-of-op-mul-a58b7a3794a3

    Likewise, doing a point preimage reveal via clever scripting
    pre-taproot never saw an implementation, despite seeming
    theoretically plausible.
     https://lists.linuxfoundation.org/pipermail/lightning-dev/2015-November/000344.html

[1] https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-July/019243.html