Hi AJ,

Long-time listener, first-time caller here.

All merits (or lack thereof depending on your view) of CTV aside, I find this topic around decision making both interesting and important. While I think I sympathize with the high level concern about making sure there are use cases, interest, and sufficient testing of a particular proposal before soft forking it into consensus code, it does feel like the attempt to attribute hard numbers in this way is somewhat arbitrary.

For example, I think it could be reasonable to paint the list of examples you provided where CTV has been used on signet in a positive light. 317 CTV spends “out in the wild” before there’s a known activation date is quite a lot (more than taproot had afaik). If we don’t think it is enough, then what number of unique spends and use cases should we expect to see of a new proposal before it’s been sufficiently tested?

Perhaps this is simply a case of the Justice Stewart view on pornography where “you know it when you see it”[1], but if that’s the case then it really doesn’t seem productive to use these numbers to evaluate the readiness and eagerness to use CTV, since they are effectively arbitrary and could be used to make the argument in either direction.

So perhaps it just *felt* like there was more need for, and more ready-to-use applications of, taproot, and perhaps this feeling is broadly shared and that’s fine, and it doesn’t *feel* that way for CTV. But by the measures you laid out, were there as many uses in the wild of taproot spends before it was brought to Speedy Trial? As far as I’m aware there seemed to be more broken [2], premature Taproot spends than actual practical uses of it on a test network before it was fully activated (I might be wrong about this though). Meanwhile the primary tooling necessary to really make it useful (PTLCs, Musig2, FROST) wasn’t even fully specced out yet, let alone actively being used on a test network, and the list of proposed applications on the bitcoin wiki [3] (last updated April 2021) is similarly sparse and not quite up to the standards you’ve set for CTV to be the next soft fork (this is to say nothing of the fact that many prefer to develop, build, and test on regtest rather than signet at this stage).

Maybe this is similarly an argument for Taproot being activated too early and that may be a fair argument (not one I share to be clear). If that’s the case, I think it’s reasonable to put all cards on the table and we should be explicit that Taproot activation was premature for [X] reasons, here’s the new standard we want to have new proposals hit (a signet, expected level of activity, etc.). Then whether it’s CTV or TLUV or ANYPREVOUT, that’s what we as a community, the developers maintaining the proposals, and the developers/companies that plan to build on top of the new proposal should strive to achieve.

In absence of the above, the risk of a constantly moving bar means the possibility of either insufficiently reviewed proposals getting activated because we’re complacent and just placing trust in a small cadre of experts (nothing malicious in this, just seems worth avoiding) or personal and subjective reasoning allowing for premature ossification and blocking of upgrades that could otherwise be considered safe, useful, and perhaps even necessary on a long enough time horizon.

There’s also the other risk which you point out:

One challenge with building a soft fork is that people don't want to
commit to spending time building something that relies on consensus
features and run the risk that they might never get deployed. But the
reverse of that is also a concern: you don't want to deploy consensus
changes and run the risk that they won't actually turn out to be useful.

Perhaps if we had clear metrics of what would make the work worth it, if we knew what we were working towards, we’d be more likely to get that proof of work. To use your meme, miners know precisely what they’re mining for and what a metric of success looks like, which makes the risk/costs of attempting the PoW worth it (or conversely sometimes results in the decision for miners to be decommissioned). In addition to the examples listed above, even Taproot, which most agree had relatively broad consensus, didn’t have much work happening on top of it until it was activated. Suredbits as far as I’m aware didn’t want to build DLCs on top of lightning until taproot AND PTLCs were in use (in contrast we already have a DLC implementation that uses CTV[4]). We also have new ideas that only started coming up after Taproot activation (TLUV and Taro for example), so there’s also the unknown of what we could have once it becomes clear that it’s worth devoting mental energy and financial resources towards research.

One last wrinkle with regards to using countable metrics to determine a feature’s “worth” is that not all features are the same. Many of the use cases that people are excited to use CTV for ([5], [6]) are very long term in nature and targeted for long term store of value in contrast to medium of exchange. Metrics for measuring value of a store of value upgrade are fundamentally different than those measuring value of MoE. It’s like people pointing to transaction volume on other cryptocurrency systems to show that their chain is more valuable. You can build a CTV vault in signet, but you’ll only really see a lot of people using it when it’s to store real value on a time scale measured in decades not minutes or days like you might find for lightning testing and experimentation. This doesn’t make one more or less desirable or valuable imo, just that the evaluation metrics should be treated totally differently.

Anyway, I guess that’s a (very) long way of asking: are these constructive ways to evaluate an upgrade, and if they are, can we maybe have an idea of what a success vs. a failure metric looks like? (In the interest of retrospection and iterative improvement, it would also be useful to know if in retrospect Taproot didn’t reach these metrics and maybe was activated prematurely as a result.) To put it another way and leave CTV out of it completely, what should an outside, unbiased observer that doesn’t spend much time on Twitter expect to be able to see to evaluate the readiness or acceptability of ANYPREVOUT, TLUV, or any other possible future soft forks? If nothing else, nailing this down would seem to help make the lives of key bitcoin core maintainers much easier by removing politics from decisions that should otherwise be as technical in nature as possible.

[1](http://cbldf.org/about-us/case-files/obscenity-case-files/obscenity-case-files-jacobellis-v-ohio-i-know-it-when-i-see-it/)

[2](https://suredbits.com/taproot-funds-burned-on-the-bitcoin-blockchain/)

[3](https://en.bitcoin.it/wiki/Taproot_Uses)

[4](https://github.com/sapio-lang/sapio/blob/master/sapio-contrib/src/contracts/derivatives/dlc.rs)

[5](https://github.com/kanzure/python-vaults/blob/master/vaults/bip119_ctv.py)

[6](https://github.com/jamesob/simple-ctv-vault)


On Thu, Feb 17, 2022 at 01:58:38PM -0800, Jeremy Rubin via bitcoin-dev wrote:
AJ Wrote (in another thread):
 I'd much rather see some real
 third-party experimentation *somewhere* public first, and Jeremy's CTV
 signet being completely empty seems like a bad sign to me.

There's now been some 2,200 txs on CTV signet, of which (if I haven't
missed anything) 317 have been CTV spends:

- none have been bare CTV (ie, CTV in scriptPubKey directly, not via
  p2sh/p2wsh/taproot)

- none have been via p2sh

- 3 have been via taproot:
  https://explorer.ctvsignet.com/tx/f73f4671c6ee2bdc8da597f843b2291ca539722a168e8f6b68143b8c157bee20
  https://explorer.ctvsignet.com/tx/7e4ade977db94117f2d7a71541d87724ccdad91fa710264206bb87ae1314c796
  https://explorer.ctvsignet.com/tx/e05d828bf716effc65b00ae8b826213706c216b930aff194f1fb2fca045f7f11

  The first two of these had alternative merkle paths, the last didn't.

- 314 have been via p2wsh
  https://explorer.ctvsignet.com/tx/62292138c2f55713c3c161bd7ab36c7212362b648cf3f054315853a081f5808e
  (don't think there's any meaningfully different examples?)

As far as I can see, all the scripts take the form:

 [PUSH 32 bytes] [OP_NOP4] [OP_DROP] [OP_1]

(I didn't think DROP/1 is necessary here? Doesn't leaving the 32 byte
hash on the stack evaluate as true? I guess that means everyone's using
sapio to construct the txs?)
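
For readers who want to see the script shape above as bytes, here is a minimal Python sketch. It is only an illustration: the function names are made up for this example, the 32-byte value is a placeholder and not a real BIP 119 template hash, and nothing here is taken from sapio. The cast_to_bool helper mirrors the script interpreter's usual truthiness rule (any non-zero byte is true, except a lone trailing 0x80), which is the rule the DROP/1 question turns on; it does not settle whether the shorter script would be accepted in every context.

```python
import hashlib

# Opcode byte values used by Bitcoin script.
OP_0 = 0x00
OP_1 = 0x51
OP_DROP = 0x75
OP_NOP4 = 0xB3  # the opcode BIP 119 proposes to redefine as CTV


def ctv_script(template_hash: bytes) -> bytes:
    """Serialize [PUSH 32 bytes] [OP_NOP4] [OP_DROP] [OP_1]."""
    if len(template_hash) != 32:
        raise ValueError("template hash must be exactly 32 bytes")
    # A direct push of 32 bytes is encoded as the length byte 0x20.
    return bytes([32]) + template_hash + bytes([OP_NOP4, OP_DROP, OP_1])


def p2wsh_script_pubkey(witness_script: bytes) -> bytes:
    """Wrap a witness script as OP_0 <sha256(script)>."""
    return bytes([OP_0, 32]) + hashlib.sha256(witness_script).digest()


def cast_to_bool(element: bytes) -> bool:
    """Script truthiness: false if all zero bytes or 'negative zero'."""
    for i, byte in enumerate(element):
        if byte != 0:
            # A lone 0x80 in the final position is negative zero (false).
            return not (i == len(element) - 1 and byte == 0x80)
    return False


# Placeholder only (assumption): NOT a real template hash.
placeholder_hash = hashlib.sha256(b"hypothetical template").digest()
script = ctv_script(placeholder_hash)
print(len(script), script.hex())
print(p2wsh_script_pubkey(script).hex())
print(cast_to_bool(placeholder_hash))
```

The serialized script is 36 bytes: one push-length byte, the 32-byte hash, and three single-byte opcodes.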

I don't think there's any demos of jamesob's simple-ctv-vault [0], which
I think uses a p2wsh of "IF n CSV DROP hotkey CHECKSIG ELSE lockcoldtx CTV
ENDIF", rather than taproot branches.

[0] https://github.com/jamesob/simple-ctv-vault

Likewise I don't think there's any examples of "this CTV immediately;
or if fees are too high, this other CTV that pays more fees after X
days", though potentially they could be hidden in the untaken taproot
merkle branches.

I don't think there's any examples of two CTV outputs being combined
and spent in a single transaction.

I don't see any txs with nSequence set meaningfully; though most (all?)
of the CTV spends seem to set nSequence to 0x00400000 which I think
doesn't have a different effect from 0xfffffffe?
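
One way to check the nSequence question is to decode both values under the BIP 68 bit layout and the BIP 125 signaling rule. The sketch below does that; the function name and the returned dictionary are made up for this illustration, and it assumes a version 2 (or later) transaction, since BIP 68 does not apply otherwise.

```python
# Constants from BIP 68 (relative lock-time encoding in nSequence).
SEQUENCE_FINAL = 0xFFFFFFFF
DISABLE_FLAG = 1 << 31      # set: no relative lock-time on this input
TYPE_FLAG = 1 << 22         # set: units of 512 seconds, else blocks
VALUE_MASK = 0x0000FFFF
GRANULARITY_SHIFT = 9       # 2**9 = 512 seconds per unit

# BIP 125: an input signals replaceability if nSequence < 0xfffffffe.
MAX_RBF_SEQUENCE = 0xFFFFFFFD


def decode_sequence(n_sequence: int) -> dict:
    """Decode one input's nSequence (assumes tx version >= 2)."""
    info = {
        "relative_locktime": None,
        "signals_rbf": n_sequence <= MAX_RBF_SEQUENCE,
        # A non-final sequence lets the tx's nLockTime be enforced.
        "allows_nlocktime": n_sequence != SEQUENCE_FINAL,
    }
    if not n_sequence & DISABLE_FLAG:
        value = n_sequence & VALUE_MASK
        if n_sequence & TYPE_FLAG:
            info["relative_locktime"] = ("seconds", value << GRANULARITY_SHIFT)
        else:
            info["relative_locktime"] = ("blocks", value)
    return info


for seq in (0x00400000, 0xFFFFFFFE):
    print(hex(seq), decode_sequence(seq))
```

Under these rules 0x00400000 decodes to a time-based relative lock of zero seconds, so it adds no waiting period, but unlike 0xfffffffe it falls below the BIP 125 threshold and so signals opt-in replaceability.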

That looks to me like there's still not much practical (vs theoretical)
exploration of CTV going on; but perhaps it's an indication that CTV
could be substantially simplified and still get all the benefits that
people are particularly eager for.

I am unsure that "learning in public" is required --

For a consensus system, part of the learning is "this doesn't seem that
interesting to me; is it actually valuable enough to others that the
change is worth the risk it imposes on me?" and that's not something
you can do purely in private.

One challenge with building a soft fork is that people don't want to
commit to spending time building something that relies on consensus
features and run the risk that they might never get deployed. But the
reverse of that is also a concern: you don't want to deploy consensus
changes and run the risk that they won't actually turn out to be useful.

Or, perhaps, to "meme-ify" it -- part of the "proof of work" for deploying
a consensus change is actually proving that it's going to be useful.
Like sha256 hashing, that does require real work, and it might turn out
to be wasteful.

Cheers,