From: Anthony Towns <aj@erisian•com.au>
To: Russell O'Connor <roconnor@blockstream•com>,
	Bitcoin Protocol Discussion
	<bitcoin-dev@lists•linuxfoundation.org>
Subject: Re: [bitcoin-dev] TXHASH + CHECKSIGFROMSTACKVERIFY in lieu of CTV and ANYPREVOUT
Date: Fri, 18 Feb 2022 00:27:27 +1000
Message-ID: <20220217142727.GA1429@erisian.com.au>
In-Reply-To: <CAMZUoKmp_B9vYX8akyWz6dXtrx6PWfDV6mDVG5Nk2MZdoAqnAg@mail.gmail.com>

On Mon, Feb 07, 2022 at 09:16:10PM -0500, Russell O'Connor via bitcoin-dev wrote:
> > > For more complex interactions, I was imagining combining this TXHASH
> > > proposal with CAT and/or rolling SHA256 opcodes.
> Indeed, and we really want something that can be programmed at redemption
> time.

I mean, ideally we'd want something that can be flexibly programmed at
redemption time, in a way that requires very few bytes to express the
common use cases, is very efficient to execute even if used maliciously,
is hard to misuse accidentally, and can be cleanly upgraded via soft fork
in the future if needed?

That feels like it's probably got a "fast, cheap, good" paradox buried
in there, but even if it doesn't, it doesn't seem like something you
can really achieve by tweaking around the edges?

> That probably involves something like how the historic MULTISIG worked by
> having a list of input / output indexes be passed in along with length
> arguments.
> 
> I don't think there will be problems with quadratic hashing here because as
> more inputs are listed, the witness in turn grows larger itself.

If you cache the hash of each input/output, it would mean each byte of
the witness would be hashing at most an extra 32 bytes of data pulled
from that cache, so I think you're right. Three bytes of "script" can
already cause you to rehash an additional ~500 bytes (DUP SHA256 DROP),
so that should be within the existing computation-vs-weight relationship.
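(To put rough numbers on that -- a back-of-the-envelope python sketch
only, assuming the 520 byte stack element limit and the caching scheme
described above:)

    # rough sketch of the computation-vs-weight comparison; numbers are
    # back-of-the-envelope only
    MAX_STACK_ELEMENT = 520                 # max stack element size in script

    # "DUP SHA256 DROP" is 3 script bytes and rehashes one stack element:
    existing_ratio = MAX_STACK_ELEMENT / 3  # ~173 bytes hashed per script byte

    # with cached 32-byte input/output hashes, each index byte in the witness
    # pulls at most one 32-byte entry from the cache:
    txhash_ratio = 32 / 1                   # 32 bytes hashed per witness byte

    assert txhash_ratio < existing_ratio    # within the existing relationship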

If you add the ability to hash a chosen output (as Rusty suggests, and
which would allow you to simulate SIGHASH_GROUP), you probably have to
increase your cache to cover each output's scriptPubKey simultaneously,
which might be annoying, but doesn't seem fatal.

> That said, your SIGHASH_GROUP proposal suggests that some sort of
> intra-input communication is really needed, and that is something I would
> need to think about.

I think the way to look at it is as a tradeoff: spending an extra witness
byte or three per output (your way, give or take) vs only being able to
combine transactions in limited ways (SIGHASH_GROUP), with the latter
being more amenable to optimisation than the more manual approach.

That's a fine tradeoff to make for something that's common -- you
save onchain data, make something easier to use, and can optimise the
implementation so that it handles the common case more efficiently.

(That's a bit of a "premature optimisation" thing though -- we can't
currently do SIGHASH_GROUP style things, so how can you sensibly justify
optimising it because it's common, when it's not only currently not
common, but also not possible? That seems to me a convincing reason to
make script more expressive)

> While normally I'd be hesitant about this sort of feature creep, when we
> are talking about doing soft-forks, I really think it makes sense to think
> through these sorts of issues (as we are doing here).

+1

I guess I especially appreciate your goodwill here, because this has
sure turned out to be a pretty long message as I think some of these
things through out loud :)

> > "CAT" and "CHECKSIGFROMSTACK" are both things that have been available in
> > elements for a while; has anyone managed to build anything interesting
> > with them in practice, or are they only useful for thought experiments
> > and blog posts? To me, that suggests that while they're useful for
> > theoretical discussion, they don't turn out to be a good design in
> > practice.
> Perhaps the lesson to be drawn is that languages should support multiplying
> two numbers together.

Well, then you get to the question of whether that's enough, or if
you need to be able to multiply bignums together, etc? 

I was looking at uniswap-like things on liquid, and wanted to do constant
product for multiple assets -- but you already get the problem that "x*y
< k" might overflow if the output values x and y are ~50 bits each, and
that gets worse with three assets and wanting to calculate "x*y*z < k",
etc. And really you'd rather calculate "a*log(x) + b*log(y) + c*log(z)
< k" instead, which then means implementing fixed point log in script...

> Having 2/3rd of the language you need to write interesting programs doesn't
> mean that you get 2/3rd of the interesting programs written.

I guess to abuse that analogy: I think you're saying something like
we've currently got 67% of an ideal programming language, and CTV
would give us 68%, but that would only take us from 10% to 11% of the
interesting programs. I agree txhash might bump that up to, say, 69%
(nice) but I'm not super convinced that even moves us from 11% to 12%
of interesting programs, let alone a qualitative leap to 50% or 70%
of interesting programs.

It's *possible* that the ideal combination of opcodes will turn out to
be CAT, TXHASH, CHECKSIGFROMSTACK, MUL64LE, etc, but it feels like it'd
be better to work out something that fits together well, rather than
adding things piecemeal and hoping we don't spend all that effort only
to end up in a local optimum that's a long way short of a global optimum?

[rearranged:]

> The flexibility of TXHASH is intended to head off the need for future soft
> forks.  If we had specific applications in mind, we could simply set up the
> transaction hash flags to cover all the applications we know about.  But it
> is the applications that we don't know about that worry me.  If we don't
> put options in place with this soft-fork proposal, then they will need
> their own soft-fork down the line; and the next application after that, and
> so on.
> 
> If our attitude is to craft our soft-forks as narrowly as possible to limit
> them to what only allows for given tasks, then we are going to end up
> needing a lot more soft-forks, and that is not a good outcome.

I guess I'm not super convinced that we're anywhere near the right level
of generality that this would help in avoiding future soft forks? That's
what I meant by it not covering SIGHASH_GROUP.

I guess the model I have in my head is that we should ideally have a
general/flexible/expressive but expensive way of doing whatever scripting
you like (so a "SIMPLICITY_EXEC" opcode, perhaps); then, as new ideas get
discovered and widely deployed, we should make them easy and cheap to use
(whether that's deploying a "jet" for the simplicity code, or a dedicated
opcode, or something else). But "cheap to use" means defining a new cost
function (or defining new execution conditions for something that was
already cheaper than the cheapest existing way of encoding those execution
conditions), which is itself a soft fork, since making it "cheaper" means
being able to fit more transactions using that feature into a block than
was previously possible.

But even then, based on [0], pure simplicity code to verify a signature
apparently takes 11 minutes, so that code probably should cost 66M vbytes
(based on a max time to verify a block of 10 seconds), which would
make it obviously unusable as a bitcoin tx with the normal 100k vbyte
limit... Presumably an initial simplicity deployment would come with a
bunch of jets baked in so that's less of an issue in practice... 
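(That 66M figure is just 11 minutes of validation divided by a 10 second
per-block budget, times ~1M vbytes per block -- a rough sketch:)

    # back-of-the-envelope costing only; the 11 minute figure is from [0],
    # the 10 second block verification budget is an assumption
    verify_time_s  = 11 * 60        # pure-simplicity bip340 verification
    block_budget_s = 10             # assumed max time to verify a block
    block_vbytes   = 1_000_000      # ~weight available per block, in vbytes

    cost = verify_time_s / block_budget_s * block_vbytes
    print(cost)                     # 66,000,000 vbytes vs the 100k tx limit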

But I think that means that even with simplicity you couldn't experiment
with alternative ECC curves or zero knowledge stuff without a soft fork
to make the specific setup fast and cheap, first.

[0] https://medium.com/blockstream/simplicity-jets-release-803db10fd589

(I think this approach would already be an improvement in how we do soft
forks, though: (1) for many things, you would already have on-chain
evidence that this is something that's worthwhile, because people are
paying high fees to do it via hand-coded simplicity, so there's no
question of whether it will be used; (2) you can prove the jet and the
simplicity code do the exact same thing (and have unit/fuzz tests to
verify it), so can be more confident that the implementation is correct;
(3) maybe it's easier to describe in a bip that way too, since you can
just reference the simplicity code it's replacing rather than having
C++ code?)

That still probably doesn't cover every experiment you might want to do;
eg if you wanted to have your tx commit to a prior block hash, you'd
presumably need a soft fork to expose that data; and if you wanted to
extend the information about the utxo being spent (eg a parity bit for
the internal public key to make recursive TLUV work better) you'd need a
soft fork for that too.


I guess a purist approach to generalising sighashes might look something
like:

   [s] [shimplicity] DUP EXEC p CHECKSIGFROMSTACK

where both s and shimplicity (== sighash + simplicity or shim + simplicity
:) are provided by the signer, with s being a signature, and shimplicity
being a simplicity script that builds a 32 byte message based on whatever
bits of the transaction it chooses as well as the shimplicity script
itself to prevent malleability.

But writing a shimplicity script all the time is annoying, so adding an
extra opcode to avoid that makes sense, reducing it to:

   [s] [sh] TXHASH p CHECKSIGFROMSTACK

which is then equivalent to the existing

   [s|sh] p CHECKSIG

Though in that case, wouldn't you just have "txhash(sh)" be your
shimplicity script (in which case txhash is a jet rather than an opcode),
and keep the program as "DUP EXEC p CHECKSIGFROMSTACK", which then gives
the signer maximum flexibility to either use a standard sighash, or
write special code to do something new and magic?

So I think I'm 100% convinced that a (simplified) TXHASH makes sense in
a world where we have simplicity-equivalent scripting (and where there's
*also* some more direct introspection functionality like Rusty's OP_TX
or elements' tapscript opcodes or whatever).

(I don't think there's much advantage of a TaggedHash opcode that
takes the tag as a parameter over just writing "SHA256 DUP CAT SWAP CAT
SHA256", and if you were going to have a "HASH_TapSighash" opcode it
probably should be limited to hashing the same things from the bip that
defines it anyway. So having two simplicity functions, one for bip340
(checksigfromstack) and one for bip342 (generating a signature message
for the current transaction) seems about ideal)
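(For what it's worth, a quick python sketch of why the "SHA256 DUP CAT
SWAP CAT SHA256" spelling works out the same as a bip340-style tagged
hash, starting from a stack of [msg, tag]:)

    from hashlib import sha256

    def tagged_hash(tag: bytes, msg: bytes) -> bytes:
        # bip340: SHA256(SHA256(tag) || SHA256(tag) || msg)
        t = sha256(tag).digest()
        return sha256(t + t + msg).digest()

    def script_spelling(msg: bytes, tag: bytes) -> bytes:
        stack = [msg, tag]
        stack.append(sha256(stack.pop()).digest())            # SHA256
        stack.append(stack[-1])                               # DUP
        b, a = stack.pop(), stack.pop(); stack.append(a + b)  # CAT
        stack[-1], stack[-2] = stack[-2], stack[-1]           # SWAP
        b, a = stack.pop(), stack.pop(); stack.append(a + b)  # CAT
        return sha256(stack.pop()).digest()                   # SHA256

    assert script_spelling(b"msg", b"TapSighash") == tagged_hash(b"TapSighash", b"msg")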

But, I guess that brings me back to more or less what Jeremy asked
earlier in this thread:

] Does it make "more sense" to invest the research and development effort
] that would go into proving TXHASH safe, for example, into Simplicity
] instead?

Should we be trying to gradually turn script into a more flexible
language, one opcode at a time -- going from 11% to 12% to 13.4% to
14.1% etc of coverage of interesting programs -- or should we invest
that time/effort into working on simplicity (or something like chialisp
or similar) instead? That is, something where we could actually evaluate
how all the improved pieces fit together rather than guessing how it might
work if we maybe in future add CAT or 64 bit maths or something else...

If we put all our "language design" efforts into simplicity/whatever,
we could leave script as more of a "macro" language than a programming
one; that is, focus on it being an easy, cheap, safe way of doing the
most common things. I think that would still be worthwhile, both before
and after simplicity/* is available?

I think my opinions are:

 * recursive covenants are not a problem; avoiding them isn't and
   shouldn't be a design goal; and trying to prevent other people using
   them is wasted effort

 * having a language redesign is worthwhile -- there are plenty of ways
   to improve script, and there are enough other blockchain languages out
   there by now that we ought to be able to avoid a "second system effect"
   disaster

 * CTV via legacy script saves ~17 vbytes compared to going via
   tapscript (since the CTV hash is already in the scriptPubKey and the
   internal pubkey isn't needed, so neither need to be revealed to spend)
   and avoids the taproot ECC equation check, at the cost of using up
   an OP_NOP opcode. That seems worthwhile to me. Comparatively, TXHASH
   saves ~8 vbytes compared to emulating it with CTV (because you don't
   have to supply an acceptable hash on demand). So having both may be
   worthwhile, but if we only have one, CTV seems the bigger saving? And
   we're not wasting an opcode if we do CTV now and add TXHASH later,
   since TXHASH isn't NOP-compatible and can't be included in legacy
   script anyway. (Rough arithmetic for those vbyte figures is sketched
   after this list.)

 * TXHASH's "PUSH" behaviour vs CTV's "examine the stack but don't
   change it, and VERIFY" behaviour is independent of the question of 
   if we want to supply flags to CTV/TXHASH so they're more flexible
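(The rough arithmetic behind the ~17 and ~8 vbyte figures above -- a
sketch only, using the witness discount and ignoring a few bytes of
per-element overhead; the 34/33 byte split is my assumption:)

    # rough sketch only: witness bytes count 1/4 towards vbytes
    WITNESS_DISCOUNT = 4

    # a tapscript CTV spend reveals the hash-bearing script (~34 bytes) plus
    # a control block with the internal pubkey (~33 bytes); legacy CTV has
    # the hash in the scriptPubKey already, so reveals neither:
    print((34 + 33) / WITNESS_DISCOUNT)   # ~17 vbytes saved by legacy CTV

    # emulating TXHASH with CTV means supplying the acceptable 32-byte hash
    # as an extra witness element:
    print(32 / WITNESS_DISCOUNT)          # ~8 vbytes saved by TXHASH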

And perhaps less strongly:

 * I don't like the ~18 TXHASH flags; for signing/CTV behaviour, they're
   both overkill (they have lots of seemingly useless combinations)
   and insufficient (don't cover SIGHASH_GROUP), and they add additional
   bytes of witness data, compared to CTV's zero-byte default or CHECKSIG's
   zero/one-byte sighash which only do things we know are useful (well,
   some of those combinations might not be useful either...).

 * If we're deliberately trying to add transaction introspection, then
   all the flags do make sense, but Rusty's unhashed "TX" approach seems
   better than TXHASH for that (assuming we want one opcode versus the
   many opcodes elements use). But if we want that, we probably should
   also add maths opcodes that can cope with output amounts, at least;
   and perhaps also have some way for signatures to commit to some witness
   data that's used as script input. Also, convenient introspection isn't
   really compatible with convenient signing without some way of
   conveniently converting data into a tagged hash.

 * I'm not really convinced CTV is ready to start trying to deploy
   on mainnet even in the next six months; I'd much rather see some real
   third-party experimentation *somewhere* public first, and Jeremy's CTV
   signet being completely empty seems like a bad sign to me. Maybe that
   means we should tentatively merge the feature and deploy it on the
   default global signet though?  Not really sure how best to get more
   real world testing; but "deploy first, test later" doesn't sit right.

I'm not at all sure about bundling CTV with ANYPREVOUT and SIGHASH_GROUP:

Pros:

 - with APO available, you don't have to worry as much if spending
   a CTV output doesn't result in a guaranteed txid, and thus don't need
   to commit to scriptSigs and the like

 - APOAS and CTV are pretty similar in what they hash

 - SIGHASH_GROUP lets you add extra change outputs to a CTV spend
   which you can't otherwise do

 - reusing APOAS's tx hash scheme for CTV would avoid some of the weird
   ugly bits in CTV (that the input index is committed to and that the
   scriptSig is only "maybe!" included)

 - defining SIGHASH_GROUP and CTV simultaneously might let you define
   the groups in a way that is compatible between tapscript (annex-based)
   and legacy CTV. On the other hand, this probably still works provided
   you deploy SIGHASH_GROUP /after/ CTV is specced in (by defining CTV
   behaviour for a different length arg)

Cons:

 - just APOAS|ALL doesn't quite commit to the same things as bip 119 CTV
   and that matters if you reuse CTV addresses

 - SIGHASH_GROUP assumes use of the annex, which would need to be
   specced out; SIGHASH_GROUP itself doesn't really have a spec yet either

 - txs signed by APOAS|GROUP are more malleable than txs with a bip119
   CTV hash which might be annoying to handle even non-adversarially

 - that malleability with current RBF rules might lead to pinning
   problems

I guess for me that adds up to:

 * For now, I think I prefer OP_CTV over either OP_TXHASH alone or both
   OP_CTV and OP_TXHASH

 * I'd like to see CTV get more real-world testing before considering
   deployment

 * If APO/SIGHASH_GROUP get specced, implemented *and* tested by the
   time CTV is tested enough to think about deploying it, bundle them

 * Unless CTV testing takes ages, it's pretty unlikely it'll be worth
   simplifying CTV to more closely match APO's tx hashing

 * CAT, CHECKSIGFROMSTACK, tx introspection, better maths *are* worth
   prioritising, but would be better as part of a more thorough language
   overhaul (since you can analyse how they interact with each other
   in combination, and you get a huge jump from ~10% to ~80% benefit,
   instead of tiny incremental ones)?

I guess that's all partly dependent on thinking that TXHASH isn't
great for tx introspection (especially without CAT), and that (without tx
introspection and decent math opcodes) DLCs already provide all the
interesting oracle behaviour you're really going to get...

> I don't know if this is the answer you are looking for, but technically
> TXHASH + CAT + SHA256 awkwardly gives you limited transaction reflection.
> In fact, you might not even need TXHASH, though it certainly helps.

Yeah, it wasn't really what I was looking for but it does demolish that
specific thought experiment anyway.

> > I believe "sequences hash", "input count" and "input index" are all an
> > important part of ensuring that if you have two UTXOs distributing 0.42
> > BTC to the same set of addresses via CTV, that you can't combine them in a
> > single transaction and end up losing one of the UTXOs to fees. I
> > don't believe there's a way to resolve that with bip 118 alone, however
> > that does seem to be a similar problem to the one that SIGHASH_GROUP
> > tries to solve.
> It was my understanding that it is only "input count = 1" that prevents
> this issue.

If you have input count = 1, that solves the issue, but you could also
have input count > 1, and simply commit to different input indexes to
allow/require you to combine two CTV utxos into a common set of new
outputs, or you could have input count > 1 but input index = 1 for both
utxos to prevent combining them with each other, but allow adding a fee
funding input (but not a change output; and at a cost of an unpredictable
txid).
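(As a toy python sketch of that constraint -- can_combine is just a
hypothetical helper to illustrate the logic, not anything from bip119:)

    def can_combine(ctv_commitments):
        """One (input_count, input_index) pair per CTV-style input."""
        counts = [c for c, _ in ctv_commitments]
        indexes = [i for _, i in ctv_commitments]
        return (len(set(counts)) == 1                     # all agree on input count
                and len(set(indexes)) == len(indexes)     # distinct positions
                and all(i < counts[0] for i in indexes))  # positions must fit

    # both committing to input index 1 (of 2) can't be combined with each
    # other, but each still leaves index 0 free for a fee-funding input:
    assert not can_combine([(2, 1), (2, 1)])
    assert can_combine([(2, 0), (2, 1)])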

(I only listed "sequences hash" there because it implicitly commits to
"input count")

Cheers,
aj


