public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
From: Jeremy <jlrubin@mit•edu>
To: Jeremy <jlrubin@mit•edu>
Cc: Bitcoin Protocol Discussion <bitcoin-dev@lists•linuxfoundation.org>
Subject: Re: [bitcoin-dev] BIP OP_CHECKTEMPLATEVERIFY
Date: Tue, 10 Dec 2019 16:37:59 -0800	[thread overview]
Message-ID: <CAD5xwhiQiCZJ18fqJKsW8Z5g2x4TxSyQeNf0+qEkr-UcLat-1A@mail.gmail.com> (raw)
In-Reply-To: <CAD5xwhi115pHK4J4=WDX=xbusxG_qP-oOWYNsD4z1Hh7JZ1yzQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 17789 bytes --]

Three changes I would like to make to the OP_CTV draft. I think this should
put the draft in a very good place w.r.t. outstanding feedback.

The changes can all be considered/merged independently, though, they are
written below assuming all of them are reasonable.


1) *Make the hash commit to the INPUT_INDEX of the executing scriptPubKey.*

*Motivation:* As previously specified, a CTV template constructed
specifying a transaction with two or more inputs has a "half-spend" issue
whereby if the template script is paid to more than once (a form of
key-reuse), they can be joined in a single transaction leading to half of
the intended outputs being created.
*Example:*
Suppose I have a UTXO with a CTV requiring two inputs. The first is set to
be the CTV template, and the input has enough money to pay for all the
outputs. The second input is added to allow the adding of a fee-only utxo.
Now suppose someone creates an similar UTXO with this same CTV (even within
the same transaction).


TxA {vin: [/*elided...*/], vout: [TxOut{1 BTC, <TxB template hash> CTV},
TxOut {1 BTC, <TxB template hash> CTV}]}

*Intended Behavior:*
    TxB0 {vin: [Outpoint{TxA.hash(), 0}, /*arbitrary fee utxo*/], vout :
[TxOut {1 BTC, /* arbitrary scriptPubKey */}]
    TxB1 {vin: [Outpoint{TxA.hash(), 1}, /*arbitrary fee utxo*/], vout :
[TxOut {1 BTC, /* arbitrary scriptPubKey */}]
*Possible Unintended Behaviors:*
*    Half-Spend:*
        TxB {vin: [Outpoint{TxA.hash(), 1}, Outpoint{TxA.hash(), 0}], vout
: [TxOut {1 BTC, /* arbitrary scriptPubKey */}]
    *Order-malleation:*
        TxB0 {vin: [/*arbitrary fee utxo*/, Outpoint{TxA.hash(), 0}], vout
: [TxOut {1 BTC, /* arbitrary scriptPubKey */}]
        TxB1 {vin: [Outpoint{TxA.hash(), 1}, /*arbitrary fee utxo*/], vout
: [TxOut {1 BTC, /* arbitrary scriptPubKey */}]

With the new rule, the CTV commits to the index in the vin array that it
will appear. This prevents both the half-spend issue and the
order-malleation issue.

Thus, the only execution possible is:

*Intended Behavior:*
    TxB0 {vin: [Outpoint{TxA.hash(), 0}, /*arbitrary fee utxo*/], vout :
[TxOut {1 BTC, /* arbitrary scriptPubKey */}]
    TxB1 {vin: [Outpoint{TxA.hash(), 1}, /*arbitrary fee utxo*/], vout :
[TxOut {1 BTC, /* arbitrary scriptPubKey */}]

*Impact of Change:*
This behavior change is minor -- in most cases we are expecting templates
with a single input, so committing the input index has no effect.

Only when we do specify multiple inputs, committing the INPUT_INDEX has the
side effect of making reused-keys not susceptible to the "half-spend" issue.

This change doesn't limit the technical capabilities of OP_CTV by much
because cases where the half-spend construct is desired can be specified by
selecting the correct inputs for the constituent transactions for the
transaction-program. In the future, Taproot can make it easier to express
contracts where the input can appear at any index by committing to a tree
of positions.

This change also has the benefit of reducing the miner-caused TXID
malleability in certain applications (e.g., in a wallet vault you can
reduce malleability from your deposit flow, preventing an exponential
blow-up). However in such constructions the TXIDs are still malleable if
someone decides to pay you Bitcoin that wasn't previously yours through a
withdrawal path (a recoverable error, and on the bright side, someone paid
you Bitcoin to do it).

This change also has a minor impact on the cacheability of OP_CTV. In the
reference implementation we currently precompute and store single hash for
the StandardTemplateHash of the entire transaction. Making the hash vary
per-input means that we would need to precompute one hash per-input, which
is impractical. Given that we expect the 0-index to be the exceedingly
common case, and it's not horribly expensive if we aren't cached (a
constant sized SHA-256), the spec will be updated to precompute and cache
only the hash for the 0th index. (The hash is also changed slightly to make
it more efficient for un-cached values, as noted in change 3).


*2) Remove Constexpr restriction*
*Changes:*
Currently it is checked that the template hash argument was not 'computed',
but came from a preceding push. Remove all this logic and accept any
argument.
*Motivation:*
I've had numerous conversations with Bitcoin developers (see above, see
#bitcoin-wizards on Nov 28th 2019, in person at local meetups, and in
private chats with ecosystem developers) about the constexpr restriction in
OP_CTV. There have been a lot of folks asking to remove template constexpr
restriction, for a few reasons:

a) Parsing Simplification / no need for special-casing in optimizers like
miniscript
b) The types of script it disables aren't dangerous
c) There are exciting things you can do were it not there and other
features were enabled (OP_CAT)
d) Without other features (like OP_CAT), there's not really too much you
can do

No one has expressed any strong justification to keep it.

The main motivation for the constexpr restriction was to keep the CTV
proposal very conservative in scope, increasing the likelihood that it is
an acceptable change. It was also designed to be able to be easily lifted
in a future soft-fork. There isn't a *specific* behavior the constexpr
restriction is attempting to prevent, it's just a belt-and-suspenders
measure to limit how creatively CTV could be used now or in the future.

Future OpCodes + OP_CTV may introduce a broader set of functionality than
possible if OP_CTV were to retain the constexpr rule. But I think given
that these future op-codes will likely be introduced intentionally to
introduce broader functionalities, we shouldn't limit the functionality of
OP_CTV today.

*Impact of Changes:*

The only mildly interesting thing that could be done with this change (with
no additional changes; that I could think of) would be to write a script
like:

<serialization of transaction fields according to hash spec> SHA256 OP_CTV

which would be a "self-describing" covenant (for no space savings). This
could be useful in some protocols where "the public" should be able to
execute some step with only chain-data.

N.B. This cannot enable a case where the CTV is in the scriptSig like:

scriptPubKey: <key> CHECKSIG
scriptSig: <serialization of transaction details> OP_SHA256 CTV <sig>

because the serialization of the transaction contains a commitment to
non-null scriptSigs, a self-reference/hash cycle.


*3) Modify the template digest to be easier to cache and work with in
script.*
*Changes:*
The current hash is:

    uint256 GetStandardTemplateHash(const CTransaction& tx) {
        auto h =  TaggedHash("StandardTemplateHash")
            << tx.nVersion << tx.nLockTime
            << GetOutputsSHA256(tx) << GetSequenceSHA256(tx)
            << uint64_t(tx.vin.size());
        for (const auto& in : tx.vin) h << in.scriptSig;
        return h.GetSHA256();
    }

I propose changing it to:

    uint256 GetStandardTemplateHash(const CTransaction& tx, uint32_t
input_index) {

        uint256 scriptSig_hash{};

        bool hash_scriptSigs = std::count(tx.vin.begin(),
tx.vin.begin(), CScript()) != tx.vin().size();

        if (hash_scriptSigs) {
            auto h =  CHashWriter()
            for (const auto& in : tx.vin) h << in.scriptSig;

            scriptSig_hash = h.GetSHA256();

        }
        auto h =  CHashWriter()
            << tx.nVersion
            << tx.nLockTime;

            if (hash_scriptSigs) h << scriptSig_hash;

            h << uint64_t(tx.vin.size())

            << GetSequenceSHA256(tx)

            << uint32_t(tx.vout.size())

            << GetOutputsSHA256(tx)

            << input_index;

        return h.GetSHA256();
    }

This changes a few things:
1) Getting rid of the TaggedHash use

2) Re-ordering to put input_index last

3) Re-ordering to put Outputs SHA256 second-to-last

4) Only computing scriptSig_hash if any scriptSig is non-null

5) Making scriptSig data hashed in it's own hash-buffer

6) explicitly committing to the vout.size()
7) Casting vout.size() but not vin.size() to uint32_t (vout is capped
by COutpoint indicies to 32 bits, vin is not)

*Motivation:*
The current digest algorithm is relatively arbitrarily ordered and set up.
Modifying it makes it easier to cache (given the input index change) and
makes it easier to construct templates in script (once OP_CAT, or
OP_SUBSTR, or OP_STREAMSHA256 are added to core).

*Impact of Changes:*

*1) Getting rid of the TaggedHash use*

Low-impact. TaggedHash didn't add any security to the template hashes,
but did make it harder to "confuse" a StandardTemplateHash for a hash
of another type.

However, the tagged hash makes it slightly more difficult/costly to
construct (with OP_CAT enabled) a template hash within script, so it
is removed.

*2) Re-ordering to put input_index last*

The input index should be put last because this makes it easy to cache
the intermediate hash state *just before* hashing the index, which
makes recomputing for different indexes cheaper.

It also allows (with OP_CAT or STREAMSHA256) to easily allow setting
the accepted indexes from script.

*3) Re-ordering to put Outputs SHA256 second-to-last*

In the future, with OP_CAT/SHA256STREAM or similar, changing the
outputs in the covenant is the most likely change. Placing it near the
end simplifies this operation.


*4) Only computing scriptSig_hash if any scriptSig is non-null*

There is no need to hash the scriptSig data at all if they are all
null. This is in most cases true, so we avoid extra hashing.

But the bigger win is for scripted construction of templates, which
can just completely ignore the scriptSig hashing if it is known to be
using all bare CTV/non-p2sh segwit inputs (which should be the common
case).


*5) Making scriptSig data hashed in it's own hash-buffer, when hash is
included.*

This implies that there are two possible sizes for the hashed data,
+/- 1 hash (for scripSig_hash). This eliminates concerns that directly
hashing elements into the template hash buffer might expose some
length extension issue when constructing a template in script.


*6) explicitly committing to the vout.size()*

This makes it easier, when OP_CAT or similar is added, to write
restrictions that guarantee a limit on the number of inputs that may
be created.


*7) Casting vout.size() but not vin.size() to uint32_t (vout is capped
by COutpoint indicies to 32 bits, vin is not)*

This is just kind of annoying, but technically you can have more inputs in
a transaction than outputs because more than 32-bits of outputs breaks the
COutpoint class invariants.


--
@JeremyRubin <https://twitter.com/JeremyRubin>
<https://twitter.com/JeremyRubin>


On Thu, Nov 28, 2019 at 11:59 AM Jeremy <jlrubin@mit•edu> wrote:

> Thanks for the feedback Russell, now and early. It deeply informed the
> version I'm proposing here.
>
> I weighed carefully when selecting this design that I thought it would be
> an acceptable tradeoff after our discussion, but I recognize this isn't
> exactly what you had argued for.
>
> First off, with respect to the 'global state' issue, I figured it was
> reasonable with this choice of constexpr rule given that a reasonable tail
> recursive parser might look something like:
>
> parse (code : rest) stack alt_stack just_pushed =
>     match code with
>         OP_PUSH => parse rest (x:stack) alt_stack True
>         OP_DUP => parse rest (x:stack) alt_stack False
>         // ...
>
> So we're only adding one parameter which is a bool, and we only need to
> ever set it to an exact value based on the current code path, no
> complicated rules. I'm sensitive to the complexity added when formally
> modeling script, but I think because it is only ever a literal, you could
> re-write it as co-recursive:
>
> parse_non_constexpr (code : rest) stack alt_stack =
>     match code with
>         OP_PUSH => parse_constexpr rest (x:stack) alt_stack
>         OP_DUP => parse_non_constexpr rest (x:stack) alt_stack
>         // ...
>
> parse_constexpr (code : rest) stack alt_stack  =
>     match code with
>         OP_CTV => ...
>         _ => parese_non_constexpr (code : rest) stack alt_stack
>
>
> If I recall, this should help a bit with the proof automatability as it's
> easier in the case by case breakdown to see the unconditional code paths.
>
>
> In terms of upgrade-ability, one of the other reasons I liked this design
> is that if we do enable OP_CTV for non-constexpr arguments, the issue
> basically goes away and the OP becomes "pure" without any state tracking.
> (I think the switching on argument size is much less a concern because we
> already use similar upgrade mechanisms elsewhere, and it doesn't add
> parsing context).
>
>
> It's also possible, as I think *should be done* for tooling to treat an
> unbalanced OP_CTV as a parsing error. This will always produce
> consensus-valid scripts! However by keeping the consensus rules more
> relaxed we keep our upgrade-ability paths open for OP_CTV, which as I
> understand from speaking with other users is quite desirable.
>
>
> Best (and happy thanksgiving to those celebrating),
>
> Jeremy
>
> --
> @JeremyRubin <https://twitter.com/JeremyRubin>
> <https://twitter.com/JeremyRubin>
>
>
> On Thu, Nov 28, 2019 at 6:33 AM Russell O'Connor <roconnor@blockstream•io>
> wrote:
>
>> Thanks for this work Jeremy.
>>
>> I know we've discussed this before, but I'll restate my concerns with
>> adding a new "global" state variable to the Script interpreter for tracking
>> whether the previous opcode was a push-data operation or not.  While it
>> isn't so hard to implement this in Bitcoin Core's Script interpreter,
>> adding a new global state variable adds that much more complexity to anyone
>> trying to formally model Script semantics.  Perhaps one can argue that
>> there is already (non-stack) state in Script, e.g. to deal with
>> CODESEPARATOR, so why not add more?  But I'd argue that we should avoid
>> making bad problems worse.
>>
>> If we instead make the CHECKTEMPLATEVERIFY operation fail if it isn't
>> preceded by (or alternatively followed by) an appropriate sized
>> (canonical?) PUSHDATA constant, even in an unexecuted IF branch, then we
>> can model the Script semantics by considering the
>> PUSHDATA-CHECKTEMPLATEVERIFY pair as a single operation.  This allows
>> implementations to consider improper use of CHECKTEMPLATEVERIFY as a
>> parsing error (just as today unbalanced IF-ENDIF pairs can be modeled as a
>> parsing error, even though that isn't how it is implemented in Bitcoin
>> Core).
>>
>> I admit we would lose your soft-fork upgrade path to reading values off
>> the stack; however, in my opinion, this is a reasonable tradeoff.  When we
>> are ready to add programmable covenants to Script, we'll do so by adding
>> CAT and operations to push transaction data right onto the stack, rather
>> than posting a preimage to this template hash.
>>
>> Pleased to announce refinements to the BIP draft for
>>> OP_CHECKTEMPLATEVERIFY (replaces previous OP_SECURETHEBAG BIP). Primarily:
>>>
>>> 1) Changed the name to something more fitting and acceptable to the
>>> community
>>> 2) Changed the opcode specification to use the argument off of the stack
>>> with a primitive constexpr/literal tracker rather than script lookahead
>>> 3) Permits future soft-fork updates to loosen or remove "constexpr"
>>> restrictions
>>> 4) More detailed comparison to alternatives in the BIP, and why
>>> OP_CHECKTEMPLATEVERIFY should be favored even if a future technique may
>>> make it semi-redundant.
>>>
>>> Please see:
>>> BIP: https://github.com/JeremyRubin/bips/blob/ctv/bip-ctv.mediawiki
>>> Reference Implementation:
>>> https://github.com/JeremyRubin/bitcoin/tree/checktemplateverify
>>>
>>> I believe this addresses all outstanding feedback on the design of this
>>> opcode, unless there are any new concerns with these changes.
>>>
>>> I'm also planning to host a review workshop in Q1 2020, most likely in
>>> San Francisco. Please fill out the form here
>>> https://forms.gle/pkevHNj2pXH9MGee9 if you're interested in
>>> participating (even if you can't physically attend).
>>>
>>> And as a "but wait, there's more":
>>>
>>> 1) RPC functions are under preliminary development, to aid in testing
>>> and evaluation of OP_CHECKTEMPLATEVERIFY. The new command
>>> `sendmanycompacted` shows one way to use OP_CHECKTEMPLATEVERIFY. See:
>>> https://github.com/JeremyRubin/bitcoin/tree/checktemplateverify-rpcs.
>>> `sendmanycompacted` is still under early design. Standard practices for
>>> using OP_CHECKTEMPLATEVERIFY & wallet behaviors may be codified into a
>>> separate BIP. This work generalizes even if an alternative strategy is used
>>> to achieve the scalability techniques of OP_CHECKTEMPLATEVERIFY.
>>> 2) Also under development are improvements to the mempool which will, in
>>> conjunction with improvements like package relay, help make it safe to lift
>>> some of the mempool's restrictions on longchains specifically for
>>> OP_CHECKTEMPLATEVERIFY output trees. See: https://github.com/bitcoin/bitcoin/pull/17268
>>> This work offers an improvement irrespective of OP_CHECKTEMPLATEVERIFY's
>>> fate.
>>>
>>>
>>> Neither of these are blockers for proceeding with the BIP, as they are
>>> ergonomics and usability improvements needed once/if the BIP is activated.
>>>
>>> See prior mailing list discussions here:
>>>
>>> *
>>> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2019-May/016934.html
>>> *
>>> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2019-June/016997.html
>>>
>>> Thanks to the many developers who have provided feedback on iterations
>>> of this design.
>>>
>>> Best,
>>>
>>> Jeremy
>>>
>>> --
>>> @JeremyRubin <https://twitter.com/JeremyRubin>
>>>
>>

[-- Attachment #2: Type: text/html, Size: 38780 bytes --]

  reply	other threads:[~2019-12-11  0:38 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-26  1:50 Jeremy
2019-11-27 21:32 ` Russell O'Connor
2019-11-28 19:59   ` Jeremy
2019-12-11  0:37     ` Jeremy [this message]
2019-12-13 23:06       ` Jeremy
2019-12-19 20:08         ` Jeremy
     [not found]         ` <20191214122546.5e72eb93@simplexum.com>
     [not found]           ` <CAD5xwhgwhOwuPjKz-0_y7HP=jTi=6wJo8uH6HqCvOndr6wo0+Q@mail.gmail.com>
2020-02-14 11:18             ` Dmitry Petukhov
2020-02-14 19:16               ` Jeremy
2020-09-03 14:42                 ` Dmitry Petukhov
2020-09-03 17:34                   ` Jeremy
2020-09-03 17:47                     ` Jeremy
2020-02-15  0:24               ` ZmnSCPxj
2020-06-07 16:51 ` Joachim Strömbergson
2020-06-07 22:45   ` Jeremy
2020-06-08  6:05     ` Dmitry Petukhov
2020-06-08  6:43       ` [bitcoin-dev] [was BIP OP_CHECKTEMPLATEVERIFY] Fee Bumping Operation Jeremy
2020-06-08  7:15         ` Dmitry Petukhov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAD5xwhiQiCZJ18fqJKsW8Z5g2x4TxSyQeNf0+qEkr-UcLat-1A@mail.gmail.com \
    --to=jlrubin@mit$(echo .)edu \
    --cc=bitcoin-dev@lists$(echo .)linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox