--- Log opened Fri Dec 04 00:00:36 2020
00:33 -!- jeremyrubin [~jr@c-73-15-215-148.hsd1.ca.comcast.net] has quit [Ping timeout: 264 seconds]
07:58 -!- jeremyrubin [~jr@c-73-15-215-148.hsd1.ca.comcast.net] has joined ##miniscript
08:07 < andytoshi> shesek brings up an interesting issue https://github.com/rust-bitcoin/rust-miniscript/pull/197#issuecomment-738235908
08:07 < andytoshi> related old core issue https://github.com/bitcoin/bitcoin/issues/15740
08:09 < andytoshi> exposing two checksums does not seem reasonable to me
08:10 < shesek> I don't either, I found Bitcoin Core's behavior to be even more confusing...
08:10 < andytoshi> but i'm not too worried about descriptors having different forms and having different checksums... you encounter similar issues by providing different depths of keyorigin info
08:11 < andytoshi> i think we want a "descriptor id" which is not a checksum
08:11 < andytoshi> but that is potentially difficult, because i can encode keys in different ways if i know a hardened path
08:11 < andytoshi> such that 3rd parties cannot tell, even in principle, that they are the same key, because they cannot do the hardened steps
08:12 < andytoshi> (or maybe i am confused? i guess you never have hardened derivation steps in descriptors that you expect anyone to get an address out of)
08:12 < andytoshi> oh lol
08:12 < shesek> different depths is not quite the same though, you'll still get the same checksum given the same descriptor. with non-canonical encoding like with h/' you lose the checksum even for the exact same descriptor
08:12 < andytoshi> so ... one "canonical descriptor ID" may be the scriptpubkey
08:12 < andytoshi> consensus on the core issue seems to be that we should just transition away from ' everywhere
08:13 < andytoshi> so maybe it's ok if ' descriptors are second-class
08:14 < shesek> is '/h the only occurrence, though? are there any other valid non-canonical ways to encode descriptors that won't survive a deserialization followed by serialization?
08:14 < andytoshi> heh i don't know, and i don't know how to tell
08:14 < shesek> say, like extra spacing?
08:14 < andytoshi> i wonder if dmitry's alloy thing helps... i still haven't read that
08:14 < andytoshi> i think spaces are disallowed in descriptors
08:14 < andytoshi> everywhere
08:16 < shesek> if it really is just '/h, or assuming that the other cases are fixable, I would also be in favor of just transitioning away from '. in bwt I might even just outright reject them
08:16 < andytoshi> i guess, let me go disallow ' and then write a fuzztest at least..
08:16 < andytoshi> in a month i'll have a supercomputer like gmax does (well, about 1/4 the power :)) and then we can do some real fuzzing..
08:17 < shesek> yeah, some fuzzing would definitely help here. congrats on the new machine :)
08:18 < andytoshi> thanks :) i'm putting it together at my parents' house but i'm not even in the right country (and covid makes travel difficult) so it'll be a while to get everything together
08:18 < andytoshi> and i have the ram/ssd on my person so my family can't even build it without me :P
08:19 < andytoshi> bleh the ' vs h thing is in rust-bitcoin isn't it..
08:19 < shesek> would it make sense to add a function to rust-miniscript that verifies if a given descriptor is canonical? or perhaps a new Descriptor method that parses from a string and fails for non-canonical ones?
08:20 < andytoshi> if we can enumerate all the sources of canonicity yes..
08:20 < shesek> Descriptor::from_canonical_str() or whatever
08:20 < shesek> andytoshi, you don't need to, can just check that Descriptor::from_str(s)?.to_string() == s
08:21 < shesek> (that's what I did in bwt)
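The round-trip check above works for any type that implements both `FromStr` and `Display`. Below is a minimal std-only sketch. `is_canonical` and `ToyPath` are hypothetical names and are not part of the rust-miniscript API. `ToyPath` is a bare derivation path standing in for a full descriptor, and it is assumed to accept both `'` and `h` but always print `h`.

```rust
use std::fmt;
use std::str::FromStr;

/// "Canonical (as defined by the parser)": parsing then re-serializing
/// must reproduce the input byte for byte.
fn is_canonical<T: FromStr + fmt::Display>(s: &str) -> bool {
    match s.parse::<T>() {
        Ok(parsed) => parsed.to_string() == s,
        Err(_) => false,
    }
}

/// Hypothetical stand-in for a descriptor: a path like "m/84h/0h/0".
/// Accepts ' or h as the hardened marker, but always prints h.
#[derive(Debug, PartialEq)]
struct ToyPath(Vec<(u32, bool)>); // (index, hardened)

impl FromStr for ToyPath {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut parts = s.split('/');
        if parts.next() != Some("m") {
            return Err("path must start with m".to_string());
        }
        let mut steps = Vec::new();
        for p in parts {
            // Strip either hardened marker; remember only that it was hardened.
            let (num, hardened) = match p.strip_suffix('\'').or_else(|| p.strip_suffix('h')) {
                Some(n) => (n, true),
                None => (p, false),
            };
            let idx: u32 = num.parse().map_err(|_| format!("bad step {:?}", p))?;
            if idx >= 1 << 31 {
                return Err(format!("index out of range: {}", p));
            }
            steps.push((idx, hardened));
        }
        Ok(ToyPath(steps))
    }
}

impl fmt::Display for ToyPath {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "m")?;
        for (idx, hardened) in &self.0 {
            write!(f, "/{}{}", idx, if *hardened { "h" } else { "" })?;
        }
        Ok(())
    }
}
```

With this toy parser, `m/84h/0h/0` passes while `m/84'/0h/0` fails. `m/084h/0h/0` also fails, because the parser accepts leading zeros that it never prints. This illustrates andytoshi's objection below: the check flags non-canonical input without telling you what the complete list of non-canonical forms is.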
08:22 < andytoshi> i really don't like saying "what .to_string() does in rust-miniscript is "canonical", and no we don't know what that entails"
08:22 < andytoshi> but i suspect it is just h vs '
08:22 < shesek> right. I should've added "(as defined by rust-miniscript)" next to every "canonical"
08:24 < shesek> but I think that it is reasonable to treat whatever rust-miniscript is doing as some sort of canonicality that one should stick to, to avoid losing checksum information
08:24 < andytoshi> heh that feels like a lot of responsibility on rust-miniscript
08:25 < andytoshi> and i don't think it's reasonable for people to need Rust to have a notion of canonicity for anything
08:25 < shesek> well, as an app developer using rust-miniscript that doesn't want its users to experience weird behavior around checksums, what would you suggest I do? is there an alternative approach?
08:25 < andytoshi> rust itself has no spec and no reasonably-buildable recent compiler
08:26 < andytoshi> one alternative is to use scriptPubKeys as IDs
08:26 < andytoshi> can you do that? or do you not have enough data to compute scriptpubkeys
08:27 < andytoshi> i'm not sure what you mean by not wanting your users to "experience weird behavior" ... currently if they use ' vs h they'll see different checksums, and that's because of how the checksum is defined, and there's not really anything you or i can do about that
08:27 < shesek> you mean derive the spk at index 0 and use that? yeah, I could do that
08:27 < andytoshi> yeah
08:28 < andytoshi> i wouldn't mind calling that "the descriptor ID" even
08:28 < andytoshi> though we should wait for sipa to wake up and see what his thoughts are
08:29 < andytoshi> oh, does the checksum depend on the case of hex-encoded data?
08:29 < andytoshi> another source of weirdness is that the "bare descriptor" c:pk_k is equivalent to the pk descriptor
08:29 < andytoshi> and similar for pkh
08:30 < shesek> it seems that it does depend on it
08:30 < andytoshi> i wouldn't mind explicitly forbidding that as a bare descriptor
08:30 < andytoshi> do we support pk() throughout miniscript as an alias for c:pk_k?
08:31 < andytoshi> i think we might
08:32 < andytoshi> yeah we do ... and i'm sure that affects the checksum
08:33 < shesek> re "not sure what you mean by ... currently if they use": with bwt it'll currently reject the descriptor if they use h. the weird behavior is providing a checksummed descriptor, then seeing a different one in the APIs. it's of course okay that they'll get different checksums for ' vs h, as long as the software they're feeding it into sticks to it...
08:33 < andytoshi> i think there's a similar story for l: and or_i(X,0) though we may have forbidden the long form of that
08:33 < shesek> or at least tell them loudly that something is up
08:34 < andytoshi> i think, more than just the descriptor, having the user see a different descriptor in the API than what they input may be problematic
08:34 < shesek> more than just the checksum you mean
08:34 < shesek> ?
08:34 < andytoshi> yeah
08:35 < shesek> yeah, I agree, it's problematic either way. but it seemed more problematic with descriptors that have an explicit checksum, which suggests the user might care about that checksum
08:35 < andytoshi> i don't think the user should care about the checksum as anything other than a checksum
08:35 < andytoshi> we shouldn't be using it as a sort of descriptor id
08:36 < andytoshi> it's too short anyway (i also feel this way about key fingerprints, though with descriptors the danger is potentially more severe since descriptors are multiparty)
08:37 < andytoshi> i can easily come up with a descriptor that excludes you, but whose checksum matches one of your descriptors
08:38 < andytoshi> (we should have a general "am i excluded" function, see https://github.com/rust-bitcoin/rust-miniscript/issues/57 )
08:38 < shesek> hmm, yes, I tend to agree. the root cause of my issues is that checksums aren't really meant to be used as identifiers
08:39 < andytoshi> i think "scriptpubkey at index 0" is a reasonable identifier, though it also has the problem that you can replace /* with /0 and you'd get the same ID that way
08:39 < andytoshi> running fuzzer btw
08:39 < shesek> checksums have 40 bits which is at least better than bip32 fingerprints which are 32, but still not really enough
08:39 < andytoshi> yeah, i can still second-preimage 40 bits on my laptop if i let it go overnight
08:40 < andytoshi> (though actually, i am doing a 40-bit bech32 vanitygen search on my laptop which is at 13 hours and counting..)
08:42 < sipa> the checksum is a linear code
08:43 < sipa> it has no collision resistance whatsoever
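To make sipa's point concrete, here is a std-only Rust port of the descriptor checksum. It is transcribed from the `PolyMod` construction in Bitcoin Core's descriptor code, which is also specified in BIP 380. The function names are illustrative, and the constants should be checked against the reference before you rely on them.

```rust
// Character sets and generator constants of the descriptor checksum.
// Symbols are 5-bit values, held in u64 for convenience.
const INPUT_CHARSET: &str = concat!(
    "0123456789()[],'/*abcdefgh@:$%{}",
    "IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~",
    "ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "
);
const CHECKSUM_CHARSET: &[u8] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";
const GENERATOR: [u64; 5] = [0xf5dee51989, 0xa9fdca3312, 0x1bab10e32d, 0x3706b1677a, 0x644d626ffd];

/// BCH-style remainder over GF(32); returns a 40-bit state.
fn polymod(symbols: &[u64]) -> u64 {
    let mut chk: u64 = 1;
    for &v in symbols {
        let top = chk >> 35;
        chk = ((chk & 0x7_ffff_ffff) << 5) ^ v;
        for (i, g) in GENERATOR.iter().enumerate() {
            if (top >> i) & 1 == 1 {
                chk ^= *g;
            }
        }
    }
    chk
}

/// Low 5 bits of each character's position, plus one extra symbol per three
/// characters recording which third of INPUT_CHARSET each came from.
fn expand(s: &str) -> Option<Vec<u64>> {
    let mut symbols = Vec::new();
    let mut groups = Vec::new();
    for c in s.chars() {
        let v = INPUT_CHARSET.find(c)? as u64;
        symbols.push(v & 31);
        groups.push(v >> 5);
        if groups.len() == 3 {
            symbols.push(groups[0] * 9 + groups[1] * 3 + groups[2]);
            groups.clear();
        }
    }
    match groups.len() {
        1 => symbols.push(groups[0]),
        2 => symbols.push(groups[0] * 3 + groups[1]),
        _ => {}
    }
    Some(symbols)
}

/// The 8-character checksum (8 symbols x 5 bits = 40 bits).
fn checksum(s: &str) -> Option<String> {
    let mut symbols = expand(s)?;
    symbols.extend_from_slice(&[0; 8]);
    let c = polymod(&symbols) ^ 1;
    Some((0..8u32).map(|i| CHECKSUM_CHARSET[((c >> (5 * (7 - i))) & 31) as usize] as char).collect())
}

fn verify(s: &str, sum: &str) -> bool {
    if sum.len() != 8 {
        return false;
    }
    let mut symbols = match expand(s) {
        Some(v) => v,
        None => return false,
    };
    for b in sum.bytes() {
        match CHECKSUM_CHARSET.iter().position(|&x| x == b) {
            Some(p) => symbols.push(p as u64),
            None => return false,
        }
    }
    polymod(&symbols) == 1
}
```

Apart from the fixed starting value, every step of `polymod` is linear over GF(2). For equal-length inputs, the checksum state of `a ^ b ^ c` therefore equals the XOR of the three states. In principle, a different input with a matching checksum can be found with linear algebra rather than search, which is why the checksum cannot serve as an ID. Changing `h` to `'`, or changing the case of a single hex digit, alters exactly one symbol. This construction always detects a single-symbol change, so those spellings get different checksums.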
08:43 < andytoshi> oh lol
08:43 < andytoshi> ofc
08:51 < shesek> I like the spk-based-id approach. maybe it could be "spk at index 0 + a bit indicating whether each xpub in the descriptor is wildcard or not, in order"? a simpler but not as failproof approach is to derive at an unusual index, say the maximum non-hardened index
08:52 < shesek> would be useful to have a standard defined for something that could be used as a stable, collision-resistant identifier
08:52 < andytoshi> yeah
08:52 < sipa> that's an interesting idea
08:52 < andytoshi> i'd prefer to use a whole byte rather than a bit, for extension. e.g. for p2c contracts
08:52 < andytoshi> think we'll only ever have 8 ways to parameterize keys? lol
08:53 < sipa> what if you say do something like derive at index -1 or so
08:53 < andytoshi> hmm... so unfortunately i think bip32 uses the whole 32-bit space
08:54 < andytoshi> oh but it never does public derivation with the high bit set
08:54 < andytoshi> nice
08:54 < sipa> yes, but the high ones only with private derivation
08:54 < sipa> so you could use 0xffffffff with public derivation
08:54 < andytoshi> lol so that would give us 2^31 extensions
08:54 < sipa> though
08:54 < shesek> how about derive a non-hardened derivation with an index that's normally reserved for hardened derivation?
08:55 < andytoshi> shesek: yeah, i think that's what we're getting at
08:55 < shesek> which is actually what you just suggested ^_^
08:55 < shesek> yes
08:55 < andytoshi> :P
08:55 < sipa> but what if a descriptor has private derivation steps?
08:55 < andytoshi> oo fuzzer found something btw
08:56 < andytoshi> sipa: so, such a descriptor can't really be used by someone who lacks the xprivs
08:56 < sipa> yes
08:56 < andytoshi> so i guess, if you -did- have the xpriv you could "normalize" it by doing all the hardened steps. then compute the ID of the result
08:56 < andytoshi> oh, though you could have a /*h
08:57 < sipa> it's interesting to have a checksum that's independent of things like origin info etc
08:57 < andytoshi> oh but the spk for that would use /0h rather than /0
08:57 < andytoshi> so i think we can do the same thing
08:58 < sipa> which don't affect the spk
08:58 < sipa> well, identifier
08:58 < sipa> not checksum
08:58 < andytoshi> "derive the spk with all *s set to 0 and all *hs set to 0h, add a bitmap of what's hardened or not"
08:58 < andytoshi> and hash that i guess
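andytoshi's recipe can be sketched at the string level. The helper below is hypothetical. It only performs the wildcard substitution and collects one tag byte per wildcard, a byte rather than a bit as suggested earlier. Turning the rewritten descriptor into a scriptPubKey and hashing it together with the tags is left to a real descriptor library and hash function.

```rust
/// Kind of wildcard found in a key expression, stored as a whole byte so
/// later extensions have room.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum WildcardKind {
    Unhardened = 1,
    Hardened = 2,
}

/// Hypothetical helper: rewrites every `*` to `0` and every `*h` / `*'` to
/// `0h`, returning the rewritten text plus one tag byte per wildcard in
/// order of appearance.
fn normalize_wildcards(desc: &str) -> (String, Vec<u8>) {
    let mut out = String::with_capacity(desc.len());
    let mut tags = Vec::new();
    let mut chars = desc.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '*' {
            out.push(c);
            continue;
        }
        match chars.peek() {
            // Both hardened spellings normalize to the same text.
            Some(&'h') | Some(&'\'') => {
                chars.next();
                out.push_str("0h");
                tags.push(WildcardKind::Hardened as u8);
            }
            _ => {
                out.push('0');
                tags.push(WildcardKind::Unhardened as u8);
            }
        }
    }
    (out, tags)
}
```

The tag bytes are what separate `xpubA/0/*` from `xpubA/0/0`. Without them, both would produce the same index-0 scriptPubKey.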
08:58 < andytoshi> lol dammit, the fuzzer found a crash
08:58 < andytoshi> sanket1729_: justinmoon: if you try to do from_str on "sh(sortedmulti)" we get an "index out of bounds" error
08:59 < shesek> what was the other non-crash one?
08:59 < andytoshi> shesek: there wasn't a non-crash one
08:59 < andytoshi> i was just slow to open this one
09:00 < shesek> oh okay
09:01 < andytoshi> ok patched that, restarting
09:03 < sipa> so uh
09:03 < sipa> maybe this is all overkill
09:03 < sipa> if you're worried about roundtrippability, you can make anything roundtrip
09:04 < sipa> if we'd remember h vs ', that would roundtrip too
09:05 < andytoshi> and also remember c:pk_k vs pk and c:pk_h vs pkh
09:05 < andytoshi> though maybe we should forbid the long forms of those the way we forbid the long forms of l: and u:
09:05 < sipa> right, we'd need to outlaw those
09:06 < sipa> another "issue" is descriptors with private keys
09:06 < sipa> or if by roundtripping you mean can be inferred back... things like key order in sortedmulti
09:06 < andytoshi> i think roundtrippability might be an independent issue from having a descriptor ID
09:07 < andytoshi> ah yeah, i forgot about key ordering in sortedmulti
09:07 < andytoshi> we'll see if the fuzzer finds that..
09:07 < andytoshi> (or maybe we preserve key ordering in rust-miniscript right now? i think we do..)
09:07 < sipa> bitcoin core will preserve it i think; it's only sorted at derivation time
09:08 < andytoshi> i think roundtrippability is important for things like user surprise / integrity of backups / etc
09:08 < andytoshi> but having a "descriptor ID" is independently important for sanity checking "do you and i have the same descriptor"
09:08 < sipa> agree
09:08 < andytoshi> even if we have different privkeys, hardened paths, etc
09:09 < sipa> well, how about we just bite the bullet and associate a hash with every node in the tree
09:09 < andytoshi> heh hmmm
09:09 < andytoshi> i like that better than using spk
09:10 < andytoshi> mainly because i can imagine descriptor users who actually don't use the script serialization ever (maybe they just hand off to scantxoutset when they want to check the blockchain)
09:10 < sipa> with a few rules (drop origin info, sort children of sortedmulti, drop private keys, ...) and otherwise boring merkle tree hashing... you'd pretty much get what you want
09:10 < andytoshi> i like this
09:11 < andytoshi> also "turn * into -1, *h into ??"
09:11 < sipa> right
09:11 < andytoshi> hehe, we need to define ?? though. i guess, -2 works
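sipa's proposal, which hashes every node of the descriptor tree under a few normalization rules, can be sketched over a toy AST. Everything here is hypothetical. `Key`, `Node` and `node_id` are not rust-miniscript types, and only a handful of fragments are modeled. Private keys are not modeled at all. `DefaultHasher` (SipHash) is not collision resistant; it stands in for a cryptographic hash such as SHA256 only to keep the example std-only. The rules shown are: origin info is dropped, sortedmulti children are sorted, `*` becomes -1 and `*h` becomes -2.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy)]
enum Wildcard {
    Fixed,      // no wildcard
    Unhardened, // `*`
    Hardened,   // `*h`
}

#[allow(dead_code)] // `origin` exists only to show that it is ignored
struct Key {
    origin: Option<String>, // e.g. "[d34db33f/48h/0h/0h/2h]"
    xpub: String,
    path: Vec<u32>,
    wildcard: Wildcard,
}

/// Toy subset of the descriptor language.
enum Node {
    Pkh(Key),
    Wpkh(Key),
    SortedMulti(u32, Vec<Key>),
    Wsh(Box<Node>),
    Sh(Box<Node>),
}

/// Placeholder node hash: a tag plus the child hashes.
fn h(tag: &str, parts: &[u64]) -> u64 {
    let mut s = DefaultHasher::new();
    tag.hash(&mut s);
    parts.hash(&mut s);
    s.finish()
}

fn key_id(k: &Key) -> u64 {
    // Origin info is deliberately not hashed.
    let mut s = DefaultHasher::new();
    k.xpub.hash(&mut s);
    k.path.hash(&mut s);
    let wc: i64 = match k.wildcard {
        Wildcard::Fixed => 0,
        Wildcard::Unhardened => -1,
        Wildcard::Hardened => -2,
    };
    wc.hash(&mut s);
    s.finish()
}

fn node_id(n: &Node) -> u64 {
    match n {
        Node::Pkh(k) => h("pkh", &[key_id(k)]),
        Node::Wpkh(k) => h("wpkh", &[key_id(k)]),
        Node::SortedMulti(thresh, keys) => {
            // Key order does not affect a sortedmulti script, so sort child hashes.
            let mut ids: Vec<u64> = keys.iter().map(key_id).collect();
            ids.sort_unstable();
            let mut parts = vec![*thresh as u64];
            parts.extend(ids);
            h("sortedmulti", &parts)
        }
        Node::Wsh(inner) => h("wsh", &[node_id(inner)]),
        Node::Sh(inner) => h("sh", &[node_id(inner)]),
    }
}
```

Sorting is specific to sortedmulti. A plain `multi` node would hash its children in order, because there the order changes the script.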
09:12 < sipa> one annoying thing, but i don't think it's solvable... you can have a descriptor for an individual spk and for a range, where that spk belongs to that range
09:12 < sipa> and they'll have distinct ids
09:13 < andytoshi> yeah, i don't think that's solvable
09:13 < andytoshi> though it's an interesting question how we want to handle ranges.  probably by keeping them out of the id?
09:13 < andytoshi> as described, two ranged descriptors which are identical except for different ranges
09:13 < andytoshi> will get the same ID
09:13 < sipa> define 'keep out' ?
09:14 < sipa> what does that mean? different "ranges" ?
09:14 < andytoshi> i mean say "the id is 0xabcd1234[1..100]"
09:14 < sipa> different derivation paths?
09:14 < andytoshi> no
09:14 < andytoshi> i mean, subbing different subsets of integers for the *s
09:14 < andytoshi> maybe we don't have such a notion
09:14 < sipa> that's not part of descriptors
09:14 < andytoshi> ah cool
09:14 < sipa> application layer chooses which index to derive at
09:17 < shesek> it seems to me that a wildcard descriptor and a descriptor for an individual spk that could be derived from the wildcard should have different ids, no? why is that an annoyance?
09:19 < sipa> shesek: i think it depends on the application
09:19 < sipa> but yes, it's not unreasonable that they're different
09:25 < shesek> is there significance to having the descriptor id commit to the descriptor in a way that makes the id sufficient for verifying the original descriptor?
09:25 < shesek> say someone took part in a multi-party descriptor setup, kept the descriptor id, then lost the descriptor itself in a tragic boating accident. does it matter that someone could present him with a descriptor that has the right id, but which doesn't actually convey the information necessary for spending? (say, because all xpubs were replaced with the final single pubkey)
09:26 < shesek> (hmm, replacing with the final single pubkey would actually only be possible with non-wildcard descriptors)
09:27 < andytoshi> yeah
09:27 < andytoshi> i think the only worry is that keyorigin info is lost
09:27 < andytoshi> which is not a lot different, conceptually, from the boating-party guy losing his secret keys
09:30 < shesek> right, but the secret keys is something that only the user should have (and in this example, he still has them safe at his house), while the descriptor is something that could be recoverable from the other parties
09:32 < shesek> and for that recovery process, the id won't be sufficient to know that you really got the descriptor you wanted. which might be okay I guess...
09:33 < andytoshi> i think it's ok
09:34 < sipa> shesek: i mean, if you want something that's sufficient for having all the information for spending... you need the descriptor itself
09:34 < sipa> or a hash of it directly
09:36 < shesek> right, I guess a simple hash of the descriptor string will do it, the non-canonicality 
09:36 < shesek> .. doesn't matter much here
09:39 < shesek> it's just that... if we have a standard descriptor id that's widely understood by bitcoin software, it might be useful to have that same id commit to all the information needed for spending. but it's not really necessary either
09:39 < andytoshi> i don't think there's any canonical notion of "all the information needed for spending"
09:39 < andytoshi> aside from the descriptor itself
09:41 < sipa> right
09:41 < sipa> that's exactly what a descriptor is
09:42 < andytoshi> and re the descriptor, we should fix roundtrippability ... and i think we know how to do that
09:50 < gwillen> it sounds to me like the ID you want is computed in the same way as the checksum right now, just longer
09:50 < gwillen> if you want it to be sensitive to things like h vs ' (which I'm not sure if you actually should, but), which the checksum already is
09:51 < gwillen> I'm stupid for not reading the scrollback before talking, though, since you already went over that
09:53 < gwillen> btw if people actually want ' abolished in favor of h, you should respond to my questions about the implementation on https://github.com/bitcoin/bitcoin/issues/15740
09:54 < gwillen> but it sounds like consensus is in the direction of "don't just handle it in the interface, switch to h internally as well, and maybe abolish ' "
10:00 < sipa> gwillen: i'll respond, but since you're here: what do you think about making ' and h roundtrio?
10:00 < sipa> *roundtrip
10:26 < andytoshi> i'll bet we get pushback from rust-bitcoin about storing that in our bip32 types :/
10:26 < andytoshi> which is not a big deal really, we can hack it into rust-miniscript
10:26 < andytoshi> it looks like we already split paths by '/' in rust-miniscript for some reason, idk why rust-bitcoin isn't doing that..
10:29 < andytoshi> oh lol we are actually already looking at that character in our parsing code
10:30 < andytoshi> yeah we don't need rust-bitcoin onboard at all
10:31 < sipa> rust-bitcoin doesn't support descriptors?
10:32 < andytoshi> no, that's all in rust-miniscript
10:32 < sipa> oh, ok
10:33 < andytoshi> which i guess ought to be renamed to rust-descriptors
10:33 < andytoshi> haha sanket1729_ if we rename the project we can go back to being <1.0
10:50 < shesek> could splitting miniscript and descriptors into separate crates make sense?
10:51 < shesek> where the descriptor one depends on the miniscript one
11:02 < andytoshi> i don't really think so
11:02 < andytoshi> the descriptor stuff is a small amount of extra code without which miniscript is really not that useful
11:24 < gwillen> sipa: hmmmmmmm, I'm not sure
11:24 < gwillen> I actually hadn't realized before that they didn't (I only just when writing that comment discovered that it always uses ' internally)
11:24 < gwillen> it kind of sounded like people were maybe happy about getting rid of ' entirely, though, and making h canonical
11:25 < sipa> gwillen: they're deserialized into BIP32 paths, and both are turned into ' at serialization time
11:25 < sipa> so the internal representation doesn't distinguish between the two
11:25 < gwillen> well, except that it seems we do internally store string descriptors also, or at least at one time we did
11:25 < gwillen> I can't tell if we now store them redundantly and don't parse them, or don't store them
11:26 < sipa> i think the wallet stores descriptors as strings, because there is no other serialization
11:26 < gwillen> and those strings right now appear to always use '
11:26 < sipa> but the reason why commands like getdescriptorinfo don't is because they parse and reserialize
11:27 < gwillen> personally I think I would rather try to make things more canonical vs make the incidental textual features roundtrip, but I could see either way
11:27 < sipa> achow101: if you'd import a descriptor with a "h" in it, would it be stored as "h" or as "'" ?
11:27 < gwillen> I guess it's likely the RPC interface will always want to accept both forms, and you will always want the checksum to be textual, and that means there will never exactly be a canonical version, we'll always need to deal with both
11:28 < achow101> it will be stored as '
11:28 < achow101> it round trips through the Descriptor class so it gets written as ToString() outputs
11:29 < sipa> makes sense
11:29 < sipa> but afaict, neither idea would actually break anything
11:30 < gwillen> if we start using h for everything internally, we still need to cope with old wallets that have ' and are missing key origin info (meaning we need to parse the stored descriptor)
11:30 < gwillen> that's the only tricky part I could find
11:30 < sipa> the ideas being (a) making Descriptor store h vs ' and serialize back the way it was found  and  (b) always serializing with h
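The two options can be compared using a small path-step type. This is a minimal sketch. `Step` and `HardenedMarker` are hypothetical names and are not rust-bitcoin's `ChildNumber`; they assume the marker is stored next to the index. The `Display` impl implements option (a) and writes each step back the way it was parsed. `to_canonical_string` implements option (b) and always writes `h`.

```rust
use std::fmt;

const HARDENED_BIT: u32 = 0x8000_0000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HardenedMarker {
    Apostrophe, // parsed from "84'"
    H,          // parsed from "84h"
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Normal(u32),
    Hardened(u32, HardenedMarker),
}

impl Step {
    /// Parses one path component such as "0", "84'" or "84h".
    fn parse(s: &str) -> Option<Step> {
        // Indices must fit in 31 bits; the top bit marks hardened derivation.
        let num = |t: &str| t.parse::<u32>().ok().filter(|&i| i < HARDENED_BIT);
        if let Some(t) = s.strip_suffix('\'') {
            num(t).map(|i| Step::Hardened(i, HardenedMarker::Apostrophe))
        } else if let Some(t) = s.strip_suffix('h') {
            num(t).map(|i| Step::Hardened(i, HardenedMarker::H))
        } else {
            num(s).map(Step::Normal)
        }
    }

    /// BIP32 child index; the marker has no effect on derivation.
    fn index(self) -> u32 {
        match self {
            Step::Normal(i) => i,
            Step::Hardened(i, _) => i | HARDENED_BIT,
        }
    }

    /// Option (b): ignore the stored marker and always write `h`.
    fn to_canonical_string(self) -> String {
        match self {
            Step::Normal(i) => i.to_string(),
            Step::Hardened(i, _) => format!("{}h", i),
        }
    }
}

/// Option (a): serialize back the way the step was found.
impl fmt::Display for Step {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            Step::Normal(i) => write!(f, "{}", i),
            Step::Hardened(i, HardenedMarker::Apostrophe) => write!(f, "{}'", i),
            Step::Hardened(i, HardenedMarker::H) => write!(f, "{}h", i),
        }
    }
}
```

Under both options, `84'` and `84h` give the same `index()`, so derivation and scriptPubKeys are unaffected. Only the text, and therefore the checksum, differs. In this sketch the derived `PartialEq` treats the two spellings as different steps. An implementation would have to decide whether equality should compare the spelling or only the index.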
11:30 < gwillen> *nods*
14:12 < sanket1729_> andytoshi: oops, missed this channel all day. with the amount of breaking changes I think it makes sense to go rename crate
15:57 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has quit [Ping timeout: 240 seconds]
16:10 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has joined ##miniscript
16:44 < andytoshi> lol justinmoon darosior how would you feel if we did that "rust-miniscript has been superseded by rust-descriptors" and then we reset the version number back to 0.5 or something
16:44 < andytoshi> since we probably should not have gone 1.0 before rust-bitcoin did ... or before taproot ... or before having an API that supported privkeys ... or before having a coherent interpreter API ... or before figuring out the analyzability rules
16:44 < andytoshi> or etc etc
16:45 < andytoshi> it'd be exactly the same lib we'd just rename it
17:38 < aj> andytoshi: claim you're saving the planet and recycle the "0." that bitcoin is dropping
17:40 < andytoshi> lol, well, we sorta did that when we hit 1.0
17:40 < andytoshi> (and are now on 4.0 .. and will have 5.0 soon .. and so on)
17:41 < andytoshi> we're approaching google chrome levels of version turnover
17:54 < andytoshi> it's not the worst thing i suppose
17:54 < andytoshi> i guess X did something similar, it was at version 11 before it settled down and then they decided to be stable forever
19:13 -!- Netsplit *.net <-> *.split quits: midnight
19:18 -!- Netsplit over, joins: midnight
19:26 -!- midnight [~midnight@unaffiliated/midnightmagic] has quit [Max SendQ exceeded]
19:28 -!- midnight [~midnight@unaffiliated/midnightmagic] has joined ##miniscript
20:49 -!- shesek [~shesek@unaffiliated/shesek] has quit [Remote host closed the connection]
23:15 -!- jeremyrubin [~jr@c-73-15-215-148.hsd1.ca.comcast.net] has quit [Ping timeout: 260 seconds]
--- Log closed Sat Dec 05 00:00:36 2020