Re: [bitcoindev] On (in)ability to embed data into Schnorr

From: waxwing/ AdamISZ <ekaggata@gmail•com>
To: Bitcoin Development Mailing List <bitcoindev@googlegroups.com>
Subject: Re: [bitcoindev] On (in)ability to embed data into Schnorr
Date: Tue, 7 Oct 2025 05:05:24 -0700 (PDT)
Message-ID: <e4d271ad-9ea3-41e5-96e2-6cb0118943e4n@googlegroups.com>
In-Reply-To: <aOTNvteE8PCm6yDd@erisian.com.au>

Hi aj,

Interesting points! Answers inline.

On Tuesday, October 7, 2025 at 6:38:40 AM UTC-3 Anthony Towns wrote:

On Wed, Oct 01, 2025 at 07:24:50AM -0700, waxwing/ AdamISZ wrote: 
> I'm curious about the case of P, R, s published in utxos to prevent usage 
> of utxos as data. I think this answers in the half-affirmative: you can 
> only embed data by leaking the privkey so that it (can) immediately fall 
> out of the utxo set. 

I think you can attack the setup here. 

If you allow scriptPubKeys in the utxo set whose spending conditions 
are HTLC/atomic-swap-like: 

(pubkey A and preimage reveal of X) 
OR (pubkey B and block height > H) 

then you either set H to be arbitrarily far in the future and reveal 
B's privkey, or choose a NUMS X with no known preimage, and reveal 
A's privkey.

Yes. In the paper (and my OP email) I'm trying to narrow it down completely 
to a P, R, s structure. I guess if we try to be realistic about this 
"publish a signature in the output always" horrible scenario, it would have 
to just ditch the NUMS variant of taproot, and I agree, that is a very Bad 
Thing (TM). (uh sorry you discuss this in the next paragraph but, w/e).

Alternative examples like multisig or hash lock in script to get the data 
leakage without losing control of the output (necessarily) have been 
mentioned but I like your 2-branch setup as a good flexible example.

If you don't allow those things (eg, by requiring such constructions 
also have a (pubkey musig(A,B)) path) then I think you rule out NUMS-IPK 
constructions, and end up making things like vaults ("hotkey with delay, 
coldkey anytime") difficult to send to ("I have to sign with my cold 
key to request funds?"), or, depending on what the utxo R,s is signing, 
encourage key reuse. 

> (To emphasize, this is different to the earlier observations (including by 
> me!) that just say it is *possible* to leak data by leaking the private 
> key; here I'm trying to prove that there is *no other way*). 

That seems right to me. 

I think if the signature scheme supported pubkey recovery (ie, s*G = R + 
H(R,m)*P, and our "m" didn't commit to P as well), you could get around 
this by just having P be the data, with no one, including the "signer" 
able to recover the private key. 

Yes, basically. I discuss this in the paper w.r.t. ECDSA. Your description 
of the relevance of pubkey recovery is good, but there are some nuances. 
You can't quite (with ECDSA) get P to be the data and have a valid sig, but 
you can get 's' to be the data simply by backsolving for the private key x. 
Lack of "pubkey prefixing" in the very funky 'commitment to the nonce' in 
ECDSA causes that. And the second nuance, you did actually mention: you get 
"not leaking the key" for free, here. But it's still only a 32/96-byte 
embedding rate, the way I count it.
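
A sketch of the backsolving step, assuming textbook ECDSA with group order n, a secret nonce k, and message hash H(m) (this notation is assumed here, not taken from the paper): fix k and the desired 32-byte value s, then solve the signing equation for the private key x.

```latex
\begin{aligned}
r &= (kG)_x \bmod n, \\
s &= k^{-1}\,\bigl(H(m) + r\,x\bigr) \bmod n \\
\Longrightarrow\quad x &= r^{-1}\,\bigl(s\,k - H(m)\bigr) \bmod n, \qquad P = xG .
\end{aligned}
```

Of the 96 bytes in (P, r, s), only s is freely chosen here, which is where the 32/96 count comes from; x stays private for as long as k does.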

> However I still am probably in the large majority that thinks it's 
> appalling to imagine a sig attached to every pubkey onchain. 

I think the only thing achieved by embedding data in the utxo set (vs 
an OP_RETURN output or witness data) is to bloat the utxo set; and if 
that's the goal, it can equally easily be done with spendable outputs 
that the attacker simply chooses not to ever spend. So that doesn't seem 
like a terribly interesting solution to anything.

I think the logic of that is not quite right. Suppose I want to embed 
pictures into the unpruneable utxo set specifically (and not only 'in 
transactions'). The starting point here was me trying to write out how you 
can't embed data in known-privkey (Schnorr) P, R, s tuples.

And not only pictures; as Andrew pointed out above, there's always the 
concern of some kind of virus-y "naughty" data.

As far as embedding data in signatures goes, I think the following 
scheme would allow you to publish data in a cryptographically-secure way, 
with minimal lost funds: 

0) Setup secret keys p and q, and a 32-byte secret k. H(a,b,..) is sha256 
of a,b,.. concatenated. 

1) Split your data into N 31 byte blocks, a1, a2, .., aN. 

2) Calculate r0 as H(k*G). Calculate r1, .., rN as: 

r(i+1) = H(p, r(i)) + a(i+1) 

3) Sign N+1 transactions in a chain spending pubkey p*G, using rN, r(N-1), 
.., r1, r0 as nonces. All but the final tx should pay to a p*G output to 
continue the chain; the final output should pay to q*G instead. 

4) Once all transactions are sufficiently confirmed, spend the final 
output with k as the secret nonce (and hence R=k*G as the public 
nonce). 

Recover the data using the following process: 

1) From the final transaction, recover R=k*G, and calculate r0 as H(R). 
Recover p from the previous transaction, p = (s0-r0)/H(r0*G, P, m0). 

2) Recover ri from each signature; ri = si - H(Ri, P, mi)*p. Recover 
the data ai as ai = ri - H(p,r(i-1)). 

Dealing with the points being 32-bytes might require carrying over a 
sign-bit; but that should be possible in the spare ~7 bits since each 
block was only 31 bytes not 32 bytes. Left as an exercise for the 
reader, etc. 
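
The encode and recover steps above can be sketched in Python as follows. This is a toy under stated assumptions, not a BIP340 implementation: points are hashed by x-coordinate only, the even-y "sign bit" handling is skipped entirely, the messages m_i are placeholder integers standing in for sighashes, and the names `encode`, `decode`, `sign` and `challenge` are hypothetical. Signatures are indexed by nonce (index i uses r_i), not by their order in the transaction chain. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import hashlib

# secp256k1 parameters
FIELD = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Affine point addition; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % FIELD == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % FIELD, -1, FIELD)
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % FIELD, -1, FIELD)
    lam %= FIELD
    x = (lam * lam - a[0] - b[0]) % FIELD
    return (x, (lam * (a[0] - x) - a[1]) % FIELD)

def mul(k, pt):
    """Double-and-add scalar multiplication (not constant time)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

def H(*ints):
    """sha256 over 32-byte big-endian encodings, reduced mod N."""
    data = b"".join(i.to_bytes(32, "big") for i in ints)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def challenge(R, P, m):
    # Assumption: points enter the hash by x-coordinate only.
    return H(R[0], P[0], m)

def sign(p, r, m):
    """Schnorr signature (R, s) with an explicitly chosen nonce r."""
    R = mul(r, G)
    return R, (r + challenge(R, mul(p, G), m) * p) % N

def encode(p, k, data, msgs):
    """Hide data in nonces: r0 = H(k*G), r(i+1) = H(p, r(i)) + a(i+1)."""
    blocks = [int.from_bytes(data[i:i + 31].ljust(31, b"\0"), "big")
              for i in range(0, len(data), 31)]
    r = [H(mul(k, G)[0])]
    for a in blocks:
        r.append((H(p, r[-1]) + a) % N)
    return [sign(p, ri, mi) for ri, mi in zip(r, msgs)]

def decode(R_final, P, sigs, msgs):
    """Recover p from the r0 signature, then each block from its nonce."""
    r0 = H(R_final[0])
    R0, s0 = sigs[0]
    p = (s0 - r0) * pow(challenge(R0, P, msgs[0]), -1, N) % N
    out, prev = b"", r0
    for (Ri, si), mi in zip(sigs[1:], msgs[1:]):
        ri = (si - challenge(Ri, P, mi) * p) % N
        out += ((ri - H(p, prev)) % N).to_bytes(31, "big")
        prev = ri
    return out
```

Because each block is under 2^248 and so well below the group order, the subtraction mod n returns it exactly; the carry issue mentioned above only arises once nonces may be negated to satisfy an even-y rule, which this sketch does not model.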

I believe that the privkey p is secure prior to k*G being revealed, 
since all the nonces are distinct hashes seeded by that privkey; and q 
remains secure because k is never revealed. 

If you wanted to not reuse the pubkey p*G repeatedly, you could tweak it 
to be p0 = p, p(i+1) = p + H(k*G, p(i)), or similar. That would allow you 
to use an n-of-n multisig to get multiple blocks in a single transaction 
without seeming weird, eg. 

I believe the only way to distinguish this from a normal transaction 
pattern where a wallet has a change output, is via the final transaction 
that reveals k*G, and detecting the relationship between k*G and the 
spending conditions of the transaction that created the coin being spent. 
That's already somewhat expensive to check for every spend, but could 
be made more so by publishing k*G on some other medium (ie the data is 
in the blockchain, but you obtain the txid and key to find the data 
from elsewhere), or by revealing (k+x)*G where x is a random 20-bit 
(?) number, and a significant but tractable amount of grinding is needed 
to recover the desired k*G and decode the data -- the idea being that 
that is tractable for someone who knows there is data at that txid, 
but not tractable when performed on every signature in the blockchain 
in order to filter data publication. 
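
A minimal sketch of the grinding idea, assuming the decoder recognises the right k*G by checking that H(k*G)*G equals the public nonce R0 from the signature that used r0. That check, the x-only hashing, and the name `grind` are assumptions made for illustration. Each candidate costs a full scalar multiplication, which is what makes the search expensive to run against every signature.

```python
import hashlib

# secp256k1 parameters
FIELD = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Affine point addition; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % FIELD == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % FIELD, -1, FIELD)
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % FIELD, -1, FIELD)
    lam %= FIELD
    x = (lam * lam - a[0] - b[0]) % FIELD
    return (x, (lam * (a[0] - x) - a[1]) % FIELD)

def mul(k, pt):
    """Double-and-add scalar multiplication (not constant time)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

def H(x):
    """sha256 of a 32-byte big-endian integer, reduced mod N."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(32, "big")).digest(), "big") % N

def grind(Q, R0, bits):
    """Given Q = (k+x)*G with x < 2**bits, search for x such that
    H(Q - x*G)*G == R0. Returns (x, k*G), or None if nothing matches."""
    neg_G = (G[0], FIELD - G[1])
    cand = Q
    for x in range(2 ** bits):
        if cand is not None and mul(H(cand[0]), G) == R0:
            return x, cand
        cand = add(cand, neg_G)  # step from Q - x*G to Q - (x+1)*G
    return None
```

With `bits=20` this is on the order of a million scalar multiplications per candidate transaction, fine for someone who already knows the txid but costly as a blanket filter.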

I think if you did 20 such transactions per block, each spending a single 
20-of-20 tapscript multisig, you'd get 12400 bytes of data per block 
(without violating standardness constraints), at a cost of ~11800vb, so 
much less efficient than inscriptions, but slightly more efficient than 
OP_RETURN, and significantly less detectable than either. I think Knots 
default policy currently allows up to 50-of-50 multisig in tapscript, 
which would give you 31kB of data in ~26.6kvB of tx weight in a block. 
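
The payload figures above are just a product of three numbers. A small sketch to reproduce them; the vbyte costs are taken as quoted in this message rather than recomputed, and `payload_bytes` is a hypothetical helper name.

```python
BLOCK_BYTES = 31  # payload carried by each signature nonce

def payload_bytes(txs_per_block, sigs_per_tx, block_bytes=BLOCK_BYTES):
    """Total embedded bytes when every signature carries one data block."""
    return txs_per_block * sigs_per_tx * block_bytes

standard = payload_bytes(20, 20)  # 20 txs, each a 20-of-20 multisig
knots = payload_bytes(20, 50)     # 20 txs, each a 50-of-50 multisig

print(standard, knots)  # 12400 31000

# Payload bytes per vbyte, using the quoted costs of ~11800 vB and ~26.6 kvB.
print(standard / 11800, knots / 26600)
```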

If you're regularly making payments from a particular wallet, I think 
that procedure would allow you to encode data in your change outputs at 
the rate of 32B/tx for no additional cost. Though the data would only be 
recoverable once complete, and it's probably worth noting that I haven't 
provided any security proofs...

Very nice example. I am glad you took the trouble to write it out: examples 
like that are worth working through because, as you say, they come closer 
to being properly indistinguishable from ordinary transaction patterns.

My analysis was narrower: output-side embedding (in a theoretical future of 
P,R,s outputs). But that's a little confusing because (P, R, s) is still 
there whether some of it is put in witness or not. So everyone seems to 
agree that privkey reveal is necessary for that, but everyone is also 
pointing out that with Bitcoin's actual consensus scripting system, that 
doesn't quite mean what it seems! And the embedding rate is not very good. 
In this framing, not much has changed in your "chained" example: once the 
privkey p is revealed, you get the k value per chain link, so it's still 
roughly a 1/3 ratio, or more realistically, as you mention (and I did 
upthread), it's per *transaction* which is a much lower rate.

Your points about limits and standardness constraints are well taken; those 
are the kinds of things that do actually matter today, but that I was not 
thinking about.

