public inbox for bitcoindev@googlegroups.com
 help / color / mirror / Atom feed
* [bitcoin-dev] PubRef - Script OP Code For Public Data References
@ 2019-07-19  6:05 Mike Brooks
  2019-07-19 15:17 ` William Casarin
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Mike Brooks @ 2019-07-19  6:05 UTC (permalink / raw)
  To: bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 4739 bytes --]

I noticed that this Merkle tree we are building is made more expensive by
including repetitive data.  So, this proposal draws inspiration from LZ78,
an algorithm which constructs a dictionary of repetitive strings which are
then referred to by their index. What if the blockchain already built a
dictionary for us, and now we just need to index it?
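For intuition, here is a minimal Python sketch of LZ78-style dictionary construction (illustrative only, not part of the proposal): repeated strings are replaced by an index into a dictionary built from the data already seen.

```python
def lz78_compress(data):
    """LZ78: emit (dict_index, next_symbol) pairs, growing a phrase dictionary."""
    dictionary = {"": 0}   # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for symbol in data:
        candidate = phrase + symbol
        if candidate in dictionary:
            phrase = candidate                       # keep extending the match
        else:
            output.append((dictionary[phrase], symbol))
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            phrase = ""
    if phrase:
        output.append((dictionary[phrase], ""))      # flush any trailing match
    return output

pairs = lz78_compress("ababab")
```

The analogy to PubRef: the blockchain's confirmed PUSHDATA values play the role of the dictionary, and a PUBREF plays the role of the back-reference index.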

---

Abstract

This BIP describes how a new OP code can be used to construct smaller, more
compact transactions.  With a public reference (PubRef), a newly created
transaction can reuse elements of a previously confirmed transaction by
representing this information with a smaller numeric offset or “pointer”.

Motivation

Giving scripts the ability to refer to data on the blockchain will reduce
transaction sizes because key material does not have to be repeated in
every Script. Users of the network are rewarded with smaller transaction
sizes, and miners are able to fit more transactions into new blocks.
Pointers are a common feature and it felt like this was missing from
Bitcoin Script.

Specification

This BIP defines a new Script opcode that can be introduced with BIP-0141
(Segregated Witness (Consensus layer)). This new opcode is as follows:

  Word        Opcode  Hex  Input  Output  Description
  ----------  ------  ---  -----  ------  ------------------------------------
  OP_PUBREF4  TBD     TBD  uint4  data    The next four bytes are an integer
                                          reference to a previously defined
                                          PUSHDATA
Let there be an ordered list of all invocations of PUSHDATA (opcodes 0x4c,
0x4d, 0x4e) across all scripts, starting from the genesis block:
[t0,t1,...,tn].   With this list, a newly created script can refer to a
specific PUSHDATA that was used in any previously confirmed block. The
values referenced by this list are immutable and uniform to all observers.
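A sketch of how this ordered list might be built, assuming scripts are already available as decoded (opcode, payload) pairs — a hypothetical representation; real script parsing is more involved:

```python
# Opcodes named in the proposal as PUSHDATA invocations.
PUSHDATA_OPCODES = {0x4c, 0x4d, 0x4e}

def build_pushdata_list(confirmed_scripts):
    """Return the ordered list [t0, t1, ..., tn] of pushed values.

    confirmed_scripts: iterable of scripts in confirmation order, each a
    list of (opcode, payload) pairs (hypothetical decoded form).
    """
    ordered = []
    for ops in confirmed_scripts:
        for opcode, payload in ops:
            if opcode in PUSHDATA_OPCODES:
                ordered.append(payload)   # global ordinal == list position
    return ordered
```

Because every node sees the same confirmed blocks in the same order, every node derives the same list.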

Let's assume there is some transaction containing a ScriptSig and
outputScript pair; here is what the input and output scripts would look
like:

ScriptSig

PUSHDATA(72)[30450221008f906b9fe728cb17c81deccd6704f664ed1ac920223bb2eca918f066269c703302203b1c496fd4c3fa5071262b98447fbca5e3ed7a52efe3da26aa58f738bd342d3101]
PUSHDATA
(65)[04bca69c59dc7a6d8ef4d3043bdcb626e9e29837b9beb143168938ae8165848bfc788d6ff4cdf1ef843e6a9ccda988b323d12a367dd758261dd27a63f18f56ce77]

outputScript

DUP HASH160 PUSHDATA(20)[dd6cce9f255a8cc17bda8ba0373df8e861cb866e]
EQUALVERIFY CHECKSIG

The ScriptSig of an input typically contains the full public key, which is
~65 bytes. Outputs typically contain a hash of the public key, which is
20 bytes.  Using PubRef, the public-key material shown above no longer
needs to be repeated; instead, the needed values can be resolved from a
previously confirmed transaction.   The signature of the input must still
be unique; however, the public key can be replaced with a call to PUBREF,
as shown below:

ScriptSig

PUSHDATA(72)[30450221008f906b9fe728cb17c81deccd6704f664ed1ac920223bb2eca918f066269c703302203b1c496fd4c3fa5071262b98447fbca5e3ed7a52efe3da26aa58f738bd342d3101]
PUBREF[43123]

outputScript

DUP HASH160 PUBREF[12123] EQUALVERIFY CHECKSIG

The above call to PUBREF removes the need to include the full public key,
or the hash of the public key, in a given transaction.  This is only
possible because these values were used previously in a confirmed block. If,
for example, a user is sending bitcoins to a newly formed address, then no
PUBREF can be created, and an outputScript using PUSHDATA would need to be
used at least once.  If a newly created wallet has never been used on the
Bitcoin network, then the full public key will need to be included in the
ScriptSig. Once these transactions have been confirmed, these values will
be indexed and any future script can refer to these public-key values with
a PUBREF operation.
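The substitution above can be sketched as follows, assuming a hypothetical pubref_index mapping previously pushed bytes to their global ordinal (names here are illustrative, not part of the proposal text):

```python
PUSHDATA_OPCODES = {0x4c, 0x4d, 0x4e}

def compress_script(ops, pubref_index):
    """Replace already-indexed PUSHDATA payloads with a PUBREF reference.

    ops: list of (opcode, payload) pairs for one script (hypothetical form).
    pubref_index: dict mapping pushed bytes -> global PUSHDATA ordinal.
    """
    out = []
    for opcode, payload in ops:
        if opcode in PUSHDATA_OPCODES and payload in pubref_index:
            # One opcode plus a uint4 index instead of the full payload.
            out.append(("PUBREF", pubref_index[payload]))
        else:
            out.append((opcode, payload))
    return out
```

The signature payload stays inline (it is unique per transaction); only previously seen material like the public key is replaced.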

PubRef is not susceptible to malleability attacks because the blockchain is
immutable. A PUSHDATA operation can store at most 520 bytes on the stack,
so a single PUBREF operation can likewise never obtain more than 520 bytes.

In order for a client to make use of PUBREF operations, it will need
access to a database that looks up public keys and resolves them to their
PUBREF index.  A value can be resolved to an index with a hash-table lookup
in O(1) constant time. Additionally, if all instances of PUSHDATA are
indexed as an ordered list, resolution of a PUBREF index to the intended
value is an O(1) array lookup.  Although the data needed to build and
resolve public references is already included with every full node,
additional computational effort is needed to build and maintain these
indices: a tradeoff that provides smaller transaction sizes and relieves
the need to store repetitive data on the blockchain.
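The two lookup structures described here might be sketched as follows (hypothetical class and method names, not an actual Bitcoin Core API):

```python
class PubRefIndex:
    """Sketch of the two indices the paragraph describes:
    - values:  ordered list, PUBREF index -> pushed bytes (O(1) array lookup)
    - offsets: hash table, pushed bytes -> first PUBREF index (O(1) lookup)
    """
    def __init__(self):
        self.values = []
        self.offsets = {}

    def record(self, payload):
        """Append the next confirmed PUSHDATA in global order."""
        if payload not in self.offsets:       # remember earliest occurrence
            self.offsets[payload] = len(self.values)
        self.values.append(payload)

    def resolve(self, index):
        """Validator side: PUBREF index -> original bytes."""
        return self.values[index]

    def find(self, payload):
        """Creator side: bytes -> PUBREF index, or None if never pushed."""
        return self.offsets.get(payload)
```

Only transaction creators need the hash table; a validator can get by with the ordered array alone.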


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-19  6:05 [bitcoin-dev] PubRef - Script OP Code For Public Data References Mike Brooks
@ 2019-07-19 15:17 ` William Casarin
  2019-07-19 19:17   ` Pieter Wuille
  2019-07-19 17:45 ` Yuval Kogman
  2019-07-19 18:07 ` ZmnSCPxj
  2 siblings, 1 reply; 13+ messages in thread
From: William Casarin @ 2019-07-19 15:17 UTC (permalink / raw)
  To: Mike Brooks, bitcoin-dev


Hello Mike,

Mike Brooks via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org>
writes:

> Motivation
>
> Giving scripts the ability to refer to data on the blockchain will reduce
> transaction sizes because key material does not have to be repeated in
> every Script. Users of the network are rewarded with smaller transaction
> sizes, and miners are able to fit more transactions into new blocks.
> Pointers are a common feature and it felt like this was missing from
> Bitcoin Script.

This would incentivize address re-use which would be bad for
fungibility. It appears you're trying to optimize a use case which is
already discouraged :(

Cheers,
Will

-- 
https://jb55.com



* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-19  6:05 [bitcoin-dev] PubRef - Script OP Code For Public Data References Mike Brooks
  2019-07-19 15:17 ` William Casarin
@ 2019-07-19 17:45 ` Yuval Kogman
       [not found]   ` <CALFqKjTHT7XaREK7ewwKKBjrZcty7ueNBMtSLEW7B-o9uwXgmw@mail.gmail.com>
  2019-07-19 18:07 ` ZmnSCPxj
  2 siblings, 1 reply; 13+ messages in thread
From: Yuval Kogman @ 2019-07-19 17:45 UTC (permalink / raw)
  To: Mike Brooks, Bitcoin Protocol Discussion


Hi,

On Fri, 19 Jul 2019 at 14:00, Mike Brooks via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

Giving scripts the ability to refer to data on the blockchain will reduce
> transaction sizes because key material does not have to be repeated in
> every Script.
>

Given that address reuse is discouraged, and as far as I know is
predominantly utilized for customer deposit addresses by exchanges, many of
which have not invested resources in batching withdrawals or consolidating
small UTXOs, I am skeptical that such a feature would actually be utilized
by users for whom a potential use exists, especially as mining fees are
usually pushed onto customers anyway.

Unless extensively utilized such that costs outweigh benefits, this change
would impose an externality on validating nodes:

With this list a newly created script can refer to a specific PUSHDATA that
> was used in any previously confirmed block.
>

This would make pruning impossible, and also relaxes the bounds on
validation costs since it would allow random reads on all historical data
as opposed to just the UTXO set.

Although it would do nothing for block capacity, perhaps this redundancy
might be better addressed as opt-in functionality in the p2p layer? It
might help with IBD, though at least in my experience peer latency (as
opposed to throughput) is the limiting factor, and as far as I can tell
this would increase it. Somewhat relatedly, transaction IDs are another
type of pointer which might benefit from being encoded as a (block height,
offset). However, here too it seems to me like the complexity is
substantial, potentially introducing new DoS vectors, while saving several
bytes per input at most.

Regards,
Yuval


* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-19  6:05 [bitcoin-dev] PubRef - Script OP Code For Public Data References Mike Brooks
  2019-07-19 15:17 ` William Casarin
  2019-07-19 17:45 ` Yuval Kogman
@ 2019-07-19 18:07 ` ZmnSCPxj
  2019-07-27 20:03   ` Mike Brooks
  2 siblings, 1 reply; 13+ messages in thread
From: ZmnSCPxj @ 2019-07-19 18:07 UTC (permalink / raw)
  To: Mike Brooks, Bitcoin Protocol Discussion

Good morning Mike,

> PubRef is not susceptible to malleability attacks because the blockchain is immutable.

This is not quite accurate.
While very old blocks are indeed immutable-in-practice, chain tips are not, and are often replaced.
At least specify that such data can only be referred to if buried under 100 blocks.

--

There are a number of other issues:

* It strongly encourages pubkey reuse, reducing privacy.
* There is a design-decision wherein a SCRIPT can only access data in the transaction that triggers its execution.
  In particular, it cannot access data in the block the transaction is in, or in past blocks.
  For example, `OP_CHECKLOCKTIMEVERIFY` does not check the blockheight of the block that the transaction is confirmed in, but instead checks only `nLockTime`, a field in the transaction.
  * This lets us run SCRIPT in isolation on a transaction, exactly one time, when the transaction is about to be put into our mempool.
    When a new block arrives, transactions in our mempool that are in the block do not need to have their SCRIPTs re-executed or re-validated.
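The caching behaviour this design decision enables can be sketched roughly as follows (a hypothetical structure, not Bitcoin Core's actual code): because a script reads only data inside the spending transaction, validity is a pure function of that transaction, so scripts run once at mempool acceptance and never again.

```python
class Mempool:
    def __init__(self, run_scripts):
        self.run_scripts = run_scripts   # executes all scripts in a tx
        self.validated = set()           # txids whose scripts already passed

    def accept(self, txid, tx):
        """Validate scripts exactly once, at mempool acceptance."""
        if txid in self.validated:
            return True
        if not self.run_scripts(tx):
            return False
        self.validated.add(txid)         # result is cacheable by txid
        return True

    def connect_block(self, txids):
        """Return only the txids whose scripts still need executing."""
        return [t for t in txids if t not in self.validated]
```

A PUBREF to a chain tip would break this purity: the result of script execution could change under a reorg, which is why burying references under many blocks matters.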

> In order for a client to make use of the PUBREF operations they’ll need access to a database that look up public-keys and resolve their PUBREF index.  A value can be resolved to an index with a hash-table lookup in O(1) constant time. Additionally, all instances of PUSHDATA can be indexed as an ordered list, resolution of a PUBREF index to the intended value would be an O(1) array lookup.  Although the data needed to build and resolve public references is already included with every full node, additional computational effort is needed to build and maintain these indices - a tradeoff which provides smaller transaction sizes and relieving the need to store repetitive data on the blockchain.

This is not only necessary at the creator of the transaction --- it is also necessary at every validator.

In particular, consider existing pruning nodes, which cannot refer to previous block data.

We would need to have another new database containing every `PUSHDATA` in existence.
And existing pruning nodes would need to restart from genesis, as this database would not exist yet.

Regards,
ZmnSCPxj



* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-19 15:17 ` William Casarin
@ 2019-07-19 19:17   ` Pieter Wuille
  0 siblings, 0 replies; 13+ messages in thread
From: Pieter Wuille @ 2019-07-19 19:17 UTC (permalink / raw)
  To: William Casarin, Bitcoin Dev


On Fri, Jul 19, 2019, 12:13 William Casarin via bitcoin-dev <
bitcoin-dev@lists•linuxfoundation.org> wrote:

>
> Hello Mike,
>
> Mike Brooks via bitcoin-dev <bitcoin-dev@lists•linuxfoundation.org>
> writes:
>
> > Motivation
> >
> > Giving scripts the ability to refer to data on the blockchain will reduce
> > transaction sizes because key material does not have to be repeated in
> > every Script. Users of the network are rewarded with smaller transaction
> > sizes, and miners are able to fit more transactions into new blocks.
> > Pointers are a common feature and it felt like this was missing from
> > Bitcoin Script.
>
> This would incentivize address re-use which would be bad for
> fungibility. It appears you're trying to optimize a use case which is
> already discouraged :(
>

Furthermore, right now block validation does not require access to the
whole historical chain (only to the set of unspent outputs), so a change
like this would massively increase storage requirements for validation.

Cheers,

-- 
Pieter


* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
       [not found]   ` <CALFqKjTHT7XaREK7ewwKKBjrZcty7ueNBMtSLEW7B-o9uwXgmw@mail.gmail.com>
@ 2019-07-19 22:48     ` Yuval Kogman
  2019-07-24 19:49       ` Mike Brooks
  0 siblings, 1 reply; 13+ messages in thread
From: Yuval Kogman @ 2019-07-19 22:48 UTC (permalink / raw)
  To: Mike Brooks, bitcoin-dev


Hi,

On Fri, 19 Jul 2019 at 17:49, Mike Brooks <m@ib•tc> wrote:

> Users will adopt whatever the client accepts - this feature would be
> transparent.
>

My skepticism was based in an assumption on my part that most such data is
produced by actors with a track record of neglecting transaction
efficiency. I'd be happy to be proven wrong, but considering say usage
rates of native witness outputs, or the fraction of transactions with more
than 2 outputs so far I see little evidence in support of widespread
adoption of cost saving measures. Assuming this is intended as a new script
version, all fully validating nodes need to commit to keeping all data
indefinitely before they can enforce the rules that make transactions
benefiting from this opcode safe to broadcast.

That said, if successful, the main concern is still that of address reuse -
currently there is no incentive in the protocol to do that, and with BIP32
wallets fewer reasons to do it as well, but this proposal introduces such
an incentive for users which would otherwise generate new addresses
(disregarding the ones that already reuse addresses anyway), and this is
problematic for privacy and fungibility.

Since address reuse has privacy concerns, I think it's important to draw a
distinction between clients accepting and producing such transactions, if
the latter were done transparently that would be very concerning IMO, and
even the former would not be transparent for users who run currently
pruning nodes.

I'm not sure how an O(1) time complexity leads to DoS, that seems like a
> very creative jump.
>

For what it's worth, that was in reference to hypothetical deduplication
only at the p2p layer, similar to compact blocks, but on further reflection
I'd like to retract that, as since both scenarios which I had in mind seem
easily mitigated.

  Based on this response, it makes me want to calculate the savings, what
> if it was a staggering reduction in the tree size?
>

Assuming past address reuse rates are predictive this only places an upper
bound on the potential size savings, so personally I would not find that
very compelling. Even if the realized size savings would be substantial,
I'm not convinced the benefits outweigh the downsides (increased validation
costs, monotonically growing unprunable data, and direct incentive for
address reuse), especially when compared to other methods/proposals that
can reduce on chain footprint generally improve privacy while reducing
validation costs for others (e.g. batching, lightning, MAST for
sufficiently large scripts, Schnorr signatures (musig, adaptor signatures),
{tap,graft}root, ).

Regards,
Yuval


* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-19 22:48     ` Yuval Kogman
@ 2019-07-24 19:49       ` Mike Brooks
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Brooks @ 2019-07-24 19:49 UTC (permalink / raw)
  To: Yuval Kogman; +Cc: bitcoin-dev


Fungibility is a good question for inductive reasoning.  After all, what is
a claim without some rigor?

There exists some set of wallets [w] whose satoshi balance is too small to
recover because it does not meet the minimum transaction fee required to
confirm a spend.  These wallets are un-spendable by their very nature, and
quantifying un-spendable wallets is one way to measure the fungibility of
Bitcoin.  The number of un-spendable wallets can be quantified as follows:

   Let 'p' equal the current price per bit for a transaction
   Let 'n' equal the number of bits in the transaction
   Let '[a]' be the set of all wallets with a balance
   Let '[w]' be the set of un-spendable wallets
   [w0] = { w in [a] : balance(w) < p*n }

Now, let's introduce a savings measure 'k', a constant flat savings per
transaction.

   [w1] = { w in [a] : balance(w) < p*(n - k0) }

As the savings 'k' increase, the set [w] shrinks in size:
   len([w0]) > len([w1]) ... > len([wk])

In this case, 'k0' could be the savings from moving a P2PKH transaction to
a 233-byte SegWit transaction, and 'k1' the further savings of a 148-byte
SegWit+PubRef transaction; the same model can be used for some future
transaction type that is even smaller.   As the savings 'k' approaches
infinity, the set of un-spendable wallets approaches the empty set.
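As a worked example of this argument (made-up fee rate and balances, purely illustrative):

```python
def unspendable(balances, price_per_byte, tx_bytes, savings_bytes):
    """Wallets whose balance cannot cover the fee for one spend."""
    fee = price_per_byte * (tx_bytes - savings_bytes)
    return [b for b in balances if b <= fee]

balances = [100, 500, 1500, 5000]        # satoshis, hypothetical wallets
w0 = unspendable(balances, 10, 233, 0)   # 233-byte SegWit tx, no savings
w1 = unspendable(balances, 10, 233, 85)  # 148-byte SegWit+PubRef tx
```

Under these numbers the fee drops from 2330 to 1480 satoshis, so the 1500-satoshi wallet moves from un-spendable to spendable: larger savings shrink the un-spendable set.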

If a user is privacy-aware and chooses single-use P2SH for all
transactions, then these users can still gain from PubRef because
block weight should be decreased for everyone. There are cases where a user
or merchant might want to reuse their P2SH hash, and PubRef can provide
savings there.  Additionally, the resulting P2SH script could be a
multi-sig transaction, which could still benefit from PubRef compression.
PubRef improves the fungibility of Bitcoin in two different ways: by
reducing the cost of a transaction and by increasing the number of
transactions that can be confirmed in a block. Which is pretty hot - QED.

On Fri, Jul 19, 2019 at 3:48 PM Yuval Kogman <nothingmuch@woobling•org>
wrote:

> Hi,
>
> On Fri, 19 Jul 2019 at 17:49, Mike Brooks <m@ib•tc> wrote:
>
>> Users will adopt whatever the client accepts - this feature would be
>> transparent.
>>
>
> My skepticism was based in an assumption on my part that most such data is
> produced by actors with a track record of neglecting transaction
> efficiency. I'd be happy to be proven wrong, but considering say usage
> rates of native witness outputs, or the fraction of transactions with more
> than 2 outputs so far I see little evidence in support of widespread
> adoption of cost saving measures. Assuming this is intended as a new script
> version, all fully validating nodes need to commit to keeping all data
> indefinitely before they can enforce the rules that make transactions
> benefiting from this opcode safe to broadcast.
>
> That said, if successful, the main concern is still that of address reuse
> - currently there is no incentive in the protocol to do that, and with
> BIP32 wallets fewer reasons to do it as well, but this proposal introduces
> such an incentive for users which would otherwise generate new addresses
> (disregarding the ones that already reuse addresses anyway), and this is
> problematic for privacy and fungibility.
>
> Since address reuse has privacy concerns, I think it's important to draw a
> distinction between clients accepting and producing such transactions, if
> the latter were done transparently that would be very concerning IMO, and
> even the former would not be transparent for users who run currently
> pruning nodes.
>
> I'm not sure how an O(1) time complexity leads to DoS, that seems like a
>> very creative jump.
>>
>
> For what it's worth, that was in reference to hypothetical deduplication
> only at the p2p layer, similar to compact blocks, but on further reflection
> I'd like to retract that, as since both scenarios which I had in mind seem
> easily mitigated.
>
>   Based on this response, it makes me want to calculate the savings, what
>> if it was a staggering reduction in the tree size?
>>
>
> Assuming past address reuse rates are predictive this only places an upper
> bound on the potential size savings, so personally I would not find that
> very compelling. Even if the realized size savings would be substantial,
> I'm not convinced the benefits outweigh the downsides (increased validation
> costs, monotonically growing unprunable data, and direct incentive for
> address reuse), especially when compared to other methods/proposals that
> can reduce on chain footprint generally improve privacy while reducing
> validation costs for others (e.g. batching, lightning, MAST for
> sufficiently large scripts, Schnorr signatures (musig, adaptor signatures),
> {tap,graft}root, ).
>
> Regards,
> Yuval
>


* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-19 18:07 ` ZmnSCPxj
@ 2019-07-27 20:03   ` Mike Brooks
  2019-07-29  1:46     ` ZmnSCPxj
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Brooks @ 2019-07-27 20:03 UTC (permalink / raw)
  To: ZmnSCPxj, pieter.wuille; +Cc: Bitcoin Protocol Discussion


Hey ZmnSCPxj,

As to your first point: I wasn't aware there was so much volatility at the
tip; also, 100 blocks is quite the difference!  I agree no one could
reference a transaction in a newly formed block, but I'm curious how this
number was chosen. Do you have any documentation or code that you can share
related to how re-orgs are handled? Do we have a kind of 'consensus
checkpoint' after which a re-org is no longer possible? This is a very
interesting topic.

 > * It strongly encourages pubkey reuse, reducing privacy.
Privacy-aware users are free to use single-use P2SH transactions, and they
are free to use the same SCRIPT opcodes we have now.  Adding an extra
opcode helps the utility of SCRIPT by compressing the smallest SegWit
transactions by a further ~36%, from 233 bytes to 148 bytes.  Cost savings
is a great utility, and it need not undermine anyone's privacy. The
resulting P2SH SCRIPT could end up using public-key material that could be
compressed with a PubRef - everyone wins.

 > * There is a design-decision wherein a SCRIPT can only access data in
the transaction that triggers its execution.
In order for a compression algorithm like LZ78 to be expressed in a
stack-based language like SCRIPT, there needs to be pointer arithmetic to
refer back to a dictionary or to a larger application state.  If Bitcoin's
history of pushed data were made available to the SCRIPT language as
application state, then LZ78-like compression could be accomplished using
PubRef. If a script can reuse a PUSHDATA, then transactions will be less
repetitious... and this isn't free: there is a cost in supporting this
opcode.

Giving the SCRIPT language access to more data opens the door for
interesting algorithms, not just LZ78, and it is worth discussing how this
application state could be brought to the language.  It strikes me
that your concerns (ZmnSCPxj), as well as the topic of pruning brought up
by others (including Pieter Wuille), could be addressed by the creation of
a side-chain of indexes.  A validator does not need the hash table, which
is only required for O(1) PUBREF creation; those nodes need not be burdened
with that added index.  A validator only needs an array of PUSHDATA
elements and can then resolve any reference in a given SCRIPT with an O(1)
lookup.

Just a thought.

Best Regards,
Mike


On Fri, Jul 19, 2019 at 11:08 AM ZmnSCPxj <ZmnSCPxj@protonmail•com> wrote:

> Good morning Mike,
>
> > PubRef is not susceptible to malleability attacks because the blockchain
> is immutable.
>
> This is not quite accurate.
> While very old blocks are indeed immutable-in-practice, chain tips are
> not, and are often replaced.
> At least specify that such data can only be referred to if buried under
> 100 blocks.
>
> --
>
> There are a number of other issues:
>
> * It strongly encourages pubkey reuse, reducing privacy.
> * There is a design-decision wherein a SCRIPT can only access data in the
> transaction that triggers its execution.
>   In particular, it cannot access data in the block the transaction is in,
> or in past blocks.
>   For example, `OP_CHECKLOCKTIMEVERIFY` does not check the blockheight of
> the block that the transaction is confirmed in, but instead checks only
> `nLockTime`, a field in the transaction.
>   * This lets us run SCRIPT in isolation on a transaction, exactly one
> time, when the transaction is about to be put into our mempool.
>     When a new block arrives, transactions in our mempool that are in the
> block do not need to have their SCRIPTs re-executed or re-validated.
>
> > In order for a client to make use of the PUBREF operations they’ll need
> access to a database that look up public-keys and resolve their PUBREF
> index.  A value can be resolved to an index with a hash-table lookup in
> O(1) constant time. Additionally, all instances of PUSHDATA can be indexed
> as an ordered list, resolution of a PUBREF index to the intended value
> would be an O(1) array lookup.  Although the data needed to build and
> resolve public references is already included with every full node,
> additional computational effort is needed to build and maintain these
> indices - a tradeoff which provides smaller transaction sizes and relieving
> the need to store repetitive data on the blockchain.
>
> This is not only necessary at the creator of the transaction --- it is
> also necessary at every validator.
>
> In particular, consider existing pruning nodes, which cannot refer to
> previous block data.
>
> We would need to have another new database containing every `PUSHDATA` in
> existence.
> And existing pruning nodes would need to restart from genesis, as this
> database would not exist yet.
>
> Regards,
> ZmnSCPxj
>


* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-27 20:03   ` Mike Brooks
@ 2019-07-29  1:46     ` ZmnSCPxj
  2019-07-29  2:19       ` Mike Brooks
  0 siblings, 1 reply; 13+ messages in thread
From: ZmnSCPxj @ 2019-07-29  1:46 UTC (permalink / raw)
  To: Mike Brooks; +Cc: Bitcoin Protocol Discussion, pieter.wuille

Good morning Mike,


Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Sunday, July 28, 2019 4:03 AM, Mike Brooks <m@ib•tc> wrote:

> Hey ZmnSCPxj,
>
> As to your first point.  I wasn't aware there was so much volatility at the tip, also 100 blocks is quite the difference!  I agree no one could references a transaction in a newly formed blocks, but I'm curious how this number was chosen. Do you have any documentation or code that you can share related to how re-orgs are handled? Do we have a kind of 'consensus checkpoint' when a re-org is no longer possible? This is a very interesting topic.
>

Miner coinbases need 100 blocks for maturity, which is the basis of my suggestion to use 100 blocks.
It might be too high, but I doubt there will be good reason to be less than 100.

There is a potential for a targeted attack where a large payout going to a `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed transaction finds that recently-confirmed transaction is replaced with one that pays to a different public key, via a history-rewrite attack.
Such an attack is doable by miners, and if we consider that we accept 100 blocks for miner coinbase maturity as "acceptably low risk" against miner shenanigans, then we might consider that 100 blocks might be acceptable for this also.
Whether 100 is too high or not largely depends on your risk appetite.
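The suggested maturity rule could be sketched as a simple depth check (hypothetical names; the 100-block constant mirrors coinbase maturity as suggested above):

```python
PUBREF_MATURITY = 100  # blocks, mirroring coinbase maturity

def pubref_target_valid(target_height, tip_height):
    """Allow a PUBREF only if its target is buried deep enough that a
    history-rewrite replacing it is considered acceptably unlikely."""
    confirmations = tip_height - target_height + 1
    return confirmations >= PUBREF_MATURITY
```

A consensus rule like this would make the referenced PUSHDATA effectively immutable from the script's point of view, at the cost of not being able to reference recent transactions.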

>  A validator only needs an array of PUSHDATA elements and can then validate any given SCRIPT at O(1).  

Data derived from > 220Gb of perpetually-growing blockchain is hardly, to my mind, "only needs an array".
Such an array would not fit in memory for many devices that today are practical for running fullnodes.
It is keeping that array and indexing it which is the problem, i.e. the devil in the details.

Reiterating also, current pruned nodes did not retain that data and would be forced to re-download the entire blockchain.
Unless you propose that we can refer only to `OP_PUSHDATA` after activation of `OP_PUSHREF`.

Regards,
ZmnSCPxj



* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-29  1:46     ` ZmnSCPxj
@ 2019-07-29  2:19       ` Mike Brooks
  2019-07-29  2:49         ` ZmnSCPxj
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Brooks @ 2019-07-29  2:19 UTC (permalink / raw)
  To: ZmnSCPxj; +Cc: Bitcoin Protocol Discussion, pieter.wuille


ZmnSCPxj,

 I think that this implication affects other applications built on the
blockchain, not just the PubRef proposal:

 > There is a potential for a targeted attack where a large payout going to
a `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed transaction
finds that recently-confirmed transaction is replaced with one that pays to
a different public key, via a history-rewrite attack.
 > Such an attack is doable by miners, and if we consider that we accept
100 blocks for miner coinbase maturity as "acceptably low risk" against
miner shenanigans, then we might consider that 100 blocks might be
acceptable for this also.
 > Whether 100 is too high or not largely depends on your risk appetite.

I agree 100% - this attack is unexpected and very interesting.  However, I
find the arbitrary '100' unsatisfying; I'll have to do some more digging.
It would be interesting to trigger this on testnet to see what happens.  Do
you know if anyone has pushed these limits?  I am so taken by this attack I
might attempt it.

 > Data derived from > 220Gb of perpetually-growing blockchain is hardly,
to my mind, "only needs an array".

There are other open-source projects that have had to deal with larger data
sets and have accounted for real-world limits on computability. Apache
HTTPD's bucket brigade comes to mind, which has been well tested and can
account for limited RAM when accessing linear data structures. For a more
general-purpose utility, LevelDB (BSD-licensed) provides random access to
arbitrary data collections.  Pruning can also be a real asset for PubRef:
if all transactions for a wallet have been pruned, then there is no need to
index those PubRef targets - a validator can safely skip over them.

Best Regards,
Mike

On Sun, Jul 28, 2019 at 6:46 PM ZmnSCPxj <ZmnSCPxj@protonmail•com> wrote:

> Good morning Mike,
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, July 28, 2019 4:03 AM, Mike Brooks <m@ib•tc> wrote:
>
> > Hey ZmnSCPxj,
> >
> > As to your first point.  I wasn't aware there was so much volatility at
> the tip, also 100 blocks is quite the difference!  I agree no one could
> references a transaction in a newly formed blocks, but I'm curious how this
> number was chosen. Do you have any documentation or code that you can share
> related to how re-orgs are handled? Do we have a kind of 'consensus
> checkpoint' when a re-org is no longer possible? This is a very interesting
> topic.
> >
>
> Miner coinbases need 100 blocks for maturity, which is the basis of my
> suggestion to use 100 blocks.
> It might be too high, but I doubt there will be good reason to be less
> than 100.
>
> There is a potential for a targeted attack where a large payout going to a
> `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed transaction
> finds that recently-confirmed transaction is replaced with one that pays to
> a different public key, via a history-rewrite attack.
> Such an attack is doable by miners, and if we consider that we accept 100
> blocks for miner coinbase maturity as "acceptably low risk" against miner
> shenanigans, then we might consider that 100 blocks might be acceptable for
> this also.
> Whether 100 is too high or not largely depends on your risk appetite.
>
> >  A validator only needs an array of PUSHDATA elements and can then
> validate any given SCRIPT at O(1).
>
> Data derived from > 220Gb of perpetually-growing blockchain is hardly, to
> my mind, "only needs an array".
> Such an array would not fit in memory for many devices that today are
> practical for running fullnodes.
> It is keeping that array and indexing it which is the problem, i.e. the
> devil in the details.
>
> Reiterating also, current pruned nodes did not retain that data and would
> be forced to re-download the entire blockchain.
> Unless you propose that we can refer only to `OP_PUSHDATA` after
> activation of `OP_PUSHREF`.
>
> Regards,
> ZmnSCPxj
>

[-- Attachment #2: Type: text/html, Size: 4664 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-29  2:19       ` Mike Brooks
@ 2019-07-29  2:49         ` ZmnSCPxj
  2019-07-29  3:07           ` Mike Brooks
  0 siblings, 1 reply; 13+ messages in thread
From: ZmnSCPxj @ 2019-07-29  2:49 UTC (permalink / raw)
  To: Mike Brooks; +Cc: Bitcoin Protocol Discussion, pieter.wuille

Good morning Mike,

>  I think that this implication affects other applications built on the blockchain, not just the PubRef proposal:
>

I believe not?
Current applications use txids to refer to previous transactions, so even a short-ranged history rewrite will mostly not affect them --- they can just rebroadcast the transactions they are spending and get those reconfirmed again.
There is admittedly a risk of double-spending, but each individual application can just spend deeply-confirmed transactions, and tune what it considers "deeply-confirmed" depending on how large the value being spent is.
The point is that history rewrites are costly, but if the value being put in a `scriptPubKey` that uses `OP_PUBREF` is large enough, it may justify the cost of history rewrites --- but if the value is small, the individual application (which refers to transactions by their txid anyway) can generally assume miners will not bother to history-rewrite.

Since `OP_PUBREF` would be a consensus rule, we need to select a "deeply-confirmed" point that is deep enough for *all* cases, unlike applications **on top of the blockchain** which can tune their rule of "deeply-confirmed" based on value.
Thus my suggestion to use 100, which we consider "deep enough" to risk allowing miners to sell their coins.

Lightning uses a "short channel ID" which is basically an index of block number + index of transaction + index of output to refer to channels.
This is not a problem, however, even in case of short-ranged history rewrites.
The short channel ID is only used for public routing.
Between the channel counterparties, no security is based on short channel ID being stable; it just loses you potential routing fees from the channel (and can be fixed by increasing your "deeply-confirmed" enough level before you announce the channel for public routing).
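The short channel ID referred to above packs its three coordinates into 8 bytes: 3 bytes of block height, 3 bytes of transaction index, and 2 bytes of output index (per BOLT 7). A sketch of the packing, for comparison with the PubRef operand:

```python
# Pack/unpack a Lightning-style short channel ID (3 + 3 + 2 bytes).
def encode_short_channel_id(block_height: int, tx_index: int, output_index: int) -> int:
    assert block_height < (1 << 24) and tx_index < (1 << 24) and output_index < (1 << 16)
    return (block_height << 40) | (tx_index << 16) | output_index

def decode_short_channel_id(scid: int):
    return (scid >> 40, (scid >> 16) & 0xFFFFFF, scid & 0xFFFF)

scid = encode_short_channel_id(590000, 1234, 1)
assert decode_short_channel_id(scid) == (590000, 1234, 1)
```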

>  > There is a potential for a targeted attack where a large payout going to a `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed transaction finds that recently-confirmed transaction is replaced with one that pays to a different public key, via a history-rewrite attack.
>  > Such an attack is doable by miners, and if we consider that we accept 100 blocks for miner coinbase maturity as "acceptably low risk" against miner shenanigans, then we might consider that 100 blocks might be acceptable for this also.
>  > Whether 100 is too high or not largely depends on your risk appetite.
>
> I agree 100% this attack is unexpected and very interesting.

It is precisely because of this possibility that we tend to avoid making SCRIPT validity dependent on anything that is not in the transaction.
We would have to re-evaluate the SCRIPT every time there is a chain tip reorganization (increasing validation CPU load), unless we do something like "only allow `OP_PUBREF` to data that is more than 100 blocks confirmed".

>  However, I find the arbitrary '100' to be unsatisfying - I'll have to do some more digging. It would be interesting to trigger this on the testnet to see what happens.  Do you know if anyone has pushed these limits?  I am so taken by this attack I might attempt it.
>
>  > Data derived from > 220Gb of perpetually-growing blockchain is hardly, to my mind, "only needs an array".
>
> There are other open source projects that have to deal with larger data sets and have accounted for the real-world limits on computability. Apache HTTPD's Bucket-Brigade comes to mind, which has been well tested and can account for limited RAM when accessing linear data structures. For a more general purpose utility leveldb (bsd-license) provides random access to arbitrary data collections.

Which is the point: we need to use something, the details need to be considered during implementation, implementation details may leak in the effective spec (e.g. DER-encoding), etc.

>  Pruning can also be a real asset for PubRef.  If all transactions for a wallet have been pruned, then there is no need to index this PubRef - a validator can safely skip over it.

What?
The problem with transactions being pruned is that the data in them might now be used in a *future* `OP_PUBREF`.

Further, pruned nodes are still full validators --- transactions may be pruned, but the pruned node will ***still*** validate any `OP_PUBREF` it uses, because it is still a full validator, it just does not archive old blocks in local storage.

Regards,
ZmnSCPxj



* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-29  2:49         ` ZmnSCPxj
@ 2019-07-29  3:07           ` Mike Brooks
  2019-07-29  3:39             ` ZmnSCPxj
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Brooks @ 2019-07-29  3:07 UTC (permalink / raw)
  To: ZmnSCPxj, bitcoin-dev

[-- Attachment #1: Type: text/plain, Size: 5712 bytes --]

ZmnSCPxj,

> Lightning uses a "short channel ID" which is basically an index of block
number + index of transaction + index of output to refer to channels.

Oh wow, this is very similar to the PUBREF proposal. In fact the OP_PUBREF4
operation could be modified to take the tuple: (block number, index of
transaction, index of PUSHDATA) and it would be functionally equivalent.
It looks like the construction of the short channel ID was chosen to make
the lookup efficient to resolve.
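A sketch of what resolving that tuple form might look like: instead of a single global ordinal, the validator walks a (hypothetical) parsed-block store down to the referenced PUSHDATA. The structure and names here are illustrative only.

```python
# Resolve a (block number, tx index, PUSHDATA index) tuple against a toy
# chain model: list of blocks; each block is a list of txs; each tx is a
# list of its PUSHDATA payloads in script order.
def resolve_pubref(chain, block_no: int, tx_index: int, push_index: int) -> bytes:
    return chain[block_no][tx_index][push_index]

# toy chain: one block, one tx, two PUSHDATA payloads
chain = [[[b"\x02" * 33, b"\x03" * 33]]]
assert resolve_pubref(chain, 0, 0, 1) == b"\x03" * 33
```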

  > The problem with transaction being pruned is that the data in them
might now be used in a *future* `OP_PUBREF`.

I can see how pruning is needed for scalability, and pruning can be made
compatible with a reference to a transaction. If a transaction is pruned,
then the key material used in a pruned block's PUSHDATA operations is of
no value.  A user of the network shouldn't need to make this kind of
PUBREF, and if a user did want to bring a wallet back from the dead - then
the utility of PUBREF wouldn't be available to them.

Best Regards,
Mike


On Sun, Jul 28, 2019 at 7:49 PM ZmnSCPxj <ZmnSCPxj@protonmail•com> wrote:

> Good morning Mike,
>
> >  I think that this implication affects other applications built on the
> blockchain, not just the PubRef proposal:
> >
>
> I believe not?
> Current applications use txids to refer to previous transactions, so even
> a short-ranged history rewrite will mostly not affect them --- they can
> just rebroadcast the transactions they are spending and get those
> reconfirmed again.
> There is admittedly a risk of double-spending, but each individual
> application can just spend deeply-confirmed transactions, and tune what it
> considers "deeply-confirmed" depending on how large the value being spent
> is.
> The point is that history rewrites are costly, but if the value being put
> in a `scriptPubKey` that uses `OP_PUBREF` is large enough, it may justify
> the cost of history rewrites --- but if the value is small, the individual
> application (which refers to transactions by their txid anyway) can
> generally assume miners will not bother to history-rewrite.
>
> Since `OP_PUBREF` would be a consensus rule, we need to select a
> "deeply-confirmed" point that is deep enough for *all* cases, unlike
> applications **on top of the blockchain** which can tune their rule of
> "deeply-confirmed" based on value.
> Thus my suggestion to use 100, which we consider "deep enough" to risk
> allowing miners to sell their coins.
>
> Lightning uses a "short channel ID" which is basically an index of block
> number + index of transaction + index of output to refer to channels.
> This is not a problem, however, even in case of short-ranged history
> rewrites.
> The short channel ID is only used for public routing.
> Between the channel counterparties, no security is based on short channel
> ID being stable; it just loses you potential routing fees from the channel
> (and can be fixed by increasing your "deeply-confirmed" enough level before
> you announce the channel for public routing).
>
> >  > There is a potential for a targeted attack where a large payout going
> to a `scriptPubKey` that uses `OP_PUBREF` on a recently-confirmed
> transaction finds that recently-confirmed transaction is replaced with one
> that pays to a different public key, via a history-rewrite attack.
> >  > Such an attack is doable by miners, and if we consider that we accept
> 100 blocks for miner coinbase maturity as "acceptably low risk" against
> miner shenanigans, then we might consider that 100 blocks might be
> acceptable for this also.
> >  > Whether 100 is too high or not largely depends on your risk appetite.
> >
> > I agree 100% this attack is unexpected and very interesting.
>
> It is precisely because of this possibility that we tend to avoid making
> SCRIPT validity dependent on anything that is not in the transaction.
> We would have to re-evaluate the SCRIPT every time there is a chain tip
> reorganization (increasing validation CPU load), unless we do something
> like "only allow `OP_PUBREF` to data that is more than 100 blocks
> confirmed".
>
> >  However, I find the arbitrary '100' to be unsatisfying - I'll have to
> do some more digging. It would be interesting to trigger this on the
> testnet to see what happens.  Do you know if anyone has pushed these
> limits?  I am so taken by this attack I might attempt it.
> >
> >  > Data derived from > 220Gb of perpetually-growing blockchain is
> hardly, to my mind, "only needs an array".
> >
> > There are other open source projects that have to deal with larger data
> sets and have accounted for the real-world limits on computability. Apache
> HTTPD's Bucket-Brigade comes to mind, which has been well tested and can
> account for limited RAM when accessing linear data structures. For a more
> general purpose utility leveldb (bsd-license) provides random access to
> arbitrary data collections.
>
> Which is the point: we need to use something, the details need to be
> considered during implementation, implementation details may leak in the
> effective spec (e.g. DER-encoding), etc.
>
> >  Pruning can also be a real asset for PubRef.  If all transactions for a
> wallet have been pruned, then there is no need to index this PubRef - a
> validator can safely skip over it.
>
> What?
> The problem with transaction being pruned is that the data in them might
> now be used in a *future* `OP_PUBREF`.
>
> Further, pruned nodes are still full validators --- transactions may be
> pruned, but the pruned node will ***still*** validate any `OP_PUBREF` it
> uses, because it is still a full validator, it just does not archive old
> blocks in local storage.
>
> Regards,
> ZmnSCPxj
>

[-- Attachment #2: Type: text/html, Size: 6417 bytes --]


* Re: [bitcoin-dev] PubRef - Script OP Code For Public Data References
  2019-07-29  3:07           ` Mike Brooks
@ 2019-07-29  3:39             ` ZmnSCPxj
  0 siblings, 0 replies; 13+ messages in thread
From: ZmnSCPxj @ 2019-07-29  3:39 UTC (permalink / raw)
  To: Mike Brooks; +Cc: bitcoin-dev

Good morning Mike,

>   > The problem with transaction being pruned is that the data in them might now be used in a *future* `OP_PUBREF`.
>
> I can see how pruning is needed for scalability, and pruning can be made compatible with a reference to a transaction. If a transaction is pruned, then the key material used in a pruned block's PUSHDATA operations is of no value.  A user of the network shouldn't need to make this kind of PUBREF, and if a user did want to bring a wallet back from the dead - then the utility of PUBREF wouldn't be available to them.

I believe you misunderstand the point completely.

Currently Bitcoin Core has a mode, called pruning, where:

1.  It validates ***all*** transactions.
2.  It throws away the transaction data of a block ***after*** validation.
3.  It keeps only the UTXO set of ***all*** addresses, not just a specific wallet.

Given the above, if a transaction in block 1000 `OP_PUBREF`s to an `OP_PUSHDATA` in block 100, how does the pruned validator get access to this data (which was pruned after the pruned validator handled block 100)?

Note that pruned nodes ***are*** fullnodes and validate all transactions, not just those that involve their wallet.
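The point can be made concrete with a toy model of a pruning node: it validates every block with full data, then discards older block data, keeping only the UTXO set. A later `OP_PUBREF` back into a pruned block then has nothing to resolve against. This is an illustrative sketch, not Bitcoin Core's actual pruning code.

```python
# Toy pruning node: validates all blocks, then discards old block data.
class PrunedNode:
    def __init__(self):
        self.blocks = {}    # height -> list of PUSHDATA payloads
        self.utxos = set()  # stand-in for the retained UTXO set

    def connect_block(self, height, pushdatas, utxos):
        self.blocks[height] = pushdatas   # validated with full data...
        self.utxos |= utxos
        self.blocks.pop(height - 1, None) # ...then older block data is pruned

    def resolve_pubref(self, height, index):
        if height not in self.blocks:
            raise LookupError(f"block {height} pruned; cannot resolve OP_PUBREF")
        return self.blocks[height][index]

node = PrunedNode()
node.connect_block(100, [b"\x02" * 33], {"utxo-a"})
node.connect_block(101, [], {"utxo-b"})  # pruning discards block 100's data
try:
    node.resolve_pubref(100, 0)          # the OP_PUBREF lookup now fails
    assert False, "should have raised"
except LookupError:
    pass
```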

Regards,
ZmnSCPxj



end of thread, other threads:[~2019-07-29  3:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-19  6:05 [bitcoin-dev] PubRef - Script OP Code For Public Data References Mike Brooks
2019-07-19 15:17 ` William Casarin
2019-07-19 19:17   ` Pieter Wuille
2019-07-19 17:45 ` Yuval Kogman
     [not found]   ` <CALFqKjTHT7XaREK7ewwKKBjrZcty7ueNBMtSLEW7B-o9uwXgmw@mail.gmail.com>
2019-07-19 22:48     ` Yuval Kogman
2019-07-24 19:49       ` Mike Brooks
2019-07-19 18:07 ` ZmnSCPxj
2019-07-27 20:03   ` Mike Brooks
2019-07-29  1:46     ` ZmnSCPxj
2019-07-29  2:19       ` Mike Brooks
2019-07-29  2:49         ` ZmnSCPxj
2019-07-29  3:07           ` Mike Brooks
2019-07-29  3:39             ` ZmnSCPxj
