--- Log opened Fri Dec 04 00:00:36 2020
00:33 -!- jeremyrubin [~jr@c-73-15-215-148.hsd1.ca.comcast.net] has quit [Ping timeout: 264 seconds]
07:58 -!- jeremyrubin [~jr@c-73-15-215-148.hsd1.ca.comcast.net] has joined ##miniscript
08:07 < andytoshi> shesek brings up an interesting issue https://github.com/rust-bitcoin/rust-miniscript/pull/197#issuecomment-738235908
08:07 < andytoshi> related old core issue https://github.com/bitcoin/bitcoin/issues/15740
08:09 < andytoshi> exposing two checksums does not seem reasonable to me
08:10 < shesek> I don't either, I found Bitcoin Core's behavior to be even more confusing...
08:10 < andytoshi> but i'm not too worried about descriptors having different forms and having different checksums... you encounter similar issues by providing different depths of keyorigin info
08:11 < andytoshi> i think we want a "descriptor id" which is not a checksum
08:11 < andytoshi> but that is potentially difficult, because i can encode keys in different ways if i know a hardened path
08:11 < andytoshi> such that 3rd parties cannot tell, even in principle, that they are the same key. because they cannot do the hardened steps
08:12 < andytoshi> (or maybe i am confused? i guess you never have hardened derivation steps in descriptors that you expect anyone to get an address out of)
08:12 < andytoshi> oh lol
08:12 < shesek> different depths is not quite the same though, you'll still get the same checksum given the same descriptor. with non-canonical encoding like with h/' you lose the checksum even for the exact same descriptor
08:12 < andytoshi> so ... one "canonical descriptor ID" may be the scriptpubkey
08:12 < andytoshi> consensus on the core issue seems to be that we should just transition away from ' everywhere
08:13 < andytoshi> so maybe it's ok if ' descriptors are second-class
08:14 < shesek> is '/h the only occurrence, though? are there any other valid non-canonical ways to encode descriptors that won't survive a deserialization followed by serialization?
08:14 < andytoshi> heh i don't know, and i don't know how to tell
08:14 < shesek> say, like extra spacing?
08:14 < andytoshi> i wonder if dmitry's alloy thing helps... i still haven't read that
08:14 < andytoshi> i think spaces are disallowed in descriptors
08:14 < andytoshi> everywhere
08:16 < shesek> if it really is just '/h, or assuming that the other cases are fixable, I would also be in favor of just transitioning away from '. in bwt I might even just outright reject them
08:16 < andytoshi> i guess, let me go disallow ' and then write a fuzztest at least..
08:16 < andytoshi> in a month i'll have a supercomputer like gmax does (well, about 1/4 the power :)) and then we can do some real fuzzing..
08:17 < shesek> yeah, some fuzzing would definitely help here. congrats on the new machine :)
08:18 < andytoshi> thanks :) i'm putting it together at my parents' house but i'm not even in the right country (and covid makes travel difficult) so it'll be a while to get everything together
08:18 < andytoshi> and i have the ram/ssd on my person so my family can't even build it without me :P
08:19 < andytoshi> bleh the ' vs h thing is in rust-bitcoin isn't it..
08:19 < shesek> would it make sense to add a function to rust-miniscript that verifies if a given descriptor is canonical? or perhaps a new Descriptor method that parses from a string and fails for non-canonical ones?
08:20 < andytoshi> if we can enumerate all the sources of non-canonicity yes..
08:20 < shesek> Descriptor::from_canonical_str() or whatever
08:20 < shesek> andytoshi, you don't need to, can just check that Descriptor::from_str(s).to_string() == s
08:21 < shesek> (that's what I did in bwt)
08:22 < andytoshi> i really don't like saying "what .to_string() does in rust-miniscript is "canonical", and no we don't know what that entails"
08:22 < andytoshi> but i suspect it is just h vs '
08:22 < shesek> right. I should've added "(as defined by rust-miniscript)" next to every "canonical"
08:24 < shesek> but I think that it is reasonable to treat whatever rust-miniscript is doing as some sort of canonicality that one should stick to, to avoid losing checksum information
08:24 < andytoshi> heh that feels like a lot of responsibility on rust-miniscript
08:25 < andytoshi> and i don't think it's reasonable for people to need Rust to have a notion of canonicity for anything
08:25 < shesek> well, as an app developer using rust-miniscript that doesn't want its users to experience weird behavior around checksums, what would you suggest I do? is there an alternative approach?
08:25 < andytoshi> rust itself has no spec and no reasonably-buildable recent compiler
08:26 < andytoshi> one alternative is to use scriptPubKeys as IDs
08:26 < andytoshi> can you do that? or do you not have enough data to compute scriptpubkeys
08:27 < andytoshi> i'm not sure what you mean by not wanting your users to "experience weird behavior" ... currently if they use ' vs h they'll see different checksums, and that's because of how the checksum is defined, and there's not really anything you or i can do about that
08:27 < shesek> you mean derive the spk at index 0 and use that? yeah, I could do that
08:27 < andytoshi> yeah
08:28 < andytoshi> i wouldn't mind calling that "the descriptor ID" even
08:28 < andytoshi> though we should wait for sipa to wake up and see what his thoughts are
08:29 < andytoshi> oh, does the checksum depend on the case of hex-encoded data?
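[editor's sketch] shesek's round-trip test above (parse, serialize, compare with the input) can be illustrated without any real library. The snippet below is a minimal Python sketch, not rust-miniscript's behavior: `reserialize` and `is_canonical` are hypothetical names, the only normalization modeled is the h vs ' hardened marker, the choice of ' as the output form is an assumption taken from the chat, and `xpubEXAMPLE` is a placeholder rather than a real key.

```python
import re

# Hypothetical stand-in for "parse then serialize". It is NOT rust-miniscript's
# serializer: it only rewrites the hardened marker h as ' inside derivation paths.
_HARDENED_H = re.compile(r"/(\d+|\*)h")


def reserialize(descriptor: str) -> str:
    """Rewrite path steps such as /84h or /*h as /84' or /*'."""
    return _HARDENED_H.sub(r"/\1'", descriptor)


def is_canonical(descriptor: str) -> bool:
    """Treat a descriptor as canonical iff it survives the round trip unchanged."""
    return reserialize(descriptor) == descriptor


# Placeholder descriptors; xpubEXAMPLE is not a real extended key.
with_h = "wpkh([d34db33f/84h/0h/0h]xpubEXAMPLE/0/*)"
with_apostrophe = "wpkh([d34db33f/84'/0'/0']xpubEXAMPLE/0/*)"
```

With these definitions `is_canonical(with_h)` is false and `is_canonical(with_apostrophe)` is true. As andytoshi objects, this only defines "canonical" relative to one serializer; any source of non-canonicity the serializer does not normalize goes unnoticed.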
08:29 < andytoshi> another source of weirdness is that the "bare descriptor" c:pk_k is equivalent to the pk descriptor
08:29 < andytoshi> and similar for pkh
08:30 < shesek> it seems that it does depend on it
08:30 < andytoshi> i wouldn't mind explicitly forbidding that as a bare descriptor
08:30 < andytoshi> do we support pk() throughout miniscript as an alias for c:pk_k?
08:31 < andytoshi> i think we might
08:32 < andytoshi> yeah we do ... and i'm sure that affects the checksum
08:33 < shesek> re "not sure what you mean by ... currently if they use": with bwt it'll currently reject the descriptor if they use h. the weird behavior is providing a checksummed descriptor, then seeing a different one in the APIs. it's of course okay that they'll get different checksums for ' vs h, as long as the software they're feeding it into sticks to it...
08:33 < andytoshi> i think there's a similar story for l: and or_i(0,X) though we may have forbidden the long form of that
08:33 < shesek> or at least tell them loudly that something is up
08:34 < andytoshi> i think, more than just the descriptor, having the user see a different descriptor in the API than what they input may be problematic
08:34 < shesek> more than just the checksum you mean
08:34 < shesek> ?
08:34 < andytoshi> yeah
08:35 < shesek> yeah, I agree, it's problematic either way. but it seemed more problematic with descriptors that have an explicit checksum, which suggests the user might care about that checksum
08:35 < andytoshi> i don't think the user should care about the checksum as anything other than a checksum
08:35 < andytoshi> we shouldn't be using it as a sort of descriptor id
08:36 < andytoshi> it's too short anyway (i also feel this way about key fingerprints, though with descriptors the danger is potentially more severe since descriptors are multiparty)
08:37 < andytoshi> i can easily come up with a descriptor that excludes you, but whose checksum matches one of your descriptors
08:38 < andytoshi> (we should have a general "am i excluded" function, see https://github.com/rust-bitcoin/rust-miniscript/issues/57 )
08:38 < shesek> hmm, yes, I tend to agree. the root cause of my issues is that checksums aren't really meant to be used as identifiers
08:39 < andytoshi> i think "scriptpubkey at index 0" is a reasonable identifier, though it also has the problem that you can replace /* with /0 and you'd get the same ID that way
08:39 < andytoshi> running fuzzer btw
08:39 < shesek> checksums have 40 bits which is at least better than bip32 fingerprints which are 32, but still not really enough
08:39 < andytoshi> yeah, i can still second-preimage 40 bits on my laptop if i let it go overnight
08:40 < andytoshi> (though actually, i am doing a 40-bit bech32 vanitygen search on my laptop which is at 13 hours and counting..)
08:42 < sipa> the checksum is a linear code
08:43 < sipa> it has no collision resistance whatsoever
08:43 < andytoshi> oh lol
08:43 < andytoshi> ofc
08:51 < shesek> I like the spk-based-id approach. maybe it could be "spk at index 0 + a bit indicating whether each xpub in the descriptor is wildcard or not, in order"? a simpler but not as failproof approach is to derive at an unusual index, say the maximum non-hardened index
08:52 < shesek> would be useful to have a standard defined for something that could be used as a stable, collision-resistant identifier
08:52 < andytoshi> yeah
08:52 < sipa> that's an interesting idea
08:52 < andytoshi> i'd prefer to use a whole byte rather than a bit, for extension. e.g. for p2c contracts
08:52 < andytoshi> think we'll only ever have 8 ways to parameterize keys? lol
08:53 < sipa> what if you say do something like derive at index -1 or so
08:53 < andytoshi> hmm... so unfortunately i think bip32 uses the whole 32-bit space
08:54 < andytoshi> oh but it never does public derivation with the high bit set
08:54 < andytoshi> nice
08:54 < sipa> yes, but the high ones only with private derivation
08:54 < sipa> so you could use 0xffffffff with public derivation
08:54 < andytoshi> lol so that would give us 2^31 extensions
08:54 < sipa> though
08:54 < shesek> how about derive a non-hardened derivation with an index that's normally reserved for hardened derivation?
08:55 < andytoshi> shesek: yeah, i think that's what we're getting at
08:55 < shesek> which is actually what you just suggested ^_^
08:55 < shesek> yes
08:55 < andytoshi> :P
08:55 < sipa> but what if a descriptor has private derivation steps?
08:55 < andytoshi> oo fuzzer found something btw
08:56 < andytoshi> sipa: so, such a descriptor can't really be used by someone who lacks the xprivs
08:56 < sipa> yes
08:56 < andytoshi> so i guess, if you -did- have the xpriv you could "normalize" it by doing all the hardened steps. then compute the ID of the result
08:56 < andytoshi> oh, though you could have a /*h
08:57 < sipa> it's interesting to have a checksum that's independent of things like origin info etc
08:57 < andytoshi> oh but the spk for that would use /0h rather than /0
08:57 < andytoshi> so i think we can do the same thing
08:58 < sipa> which don't affect the spk
08:58 < sipa> well, identifier
08:58 < sipa> not checksum
08:58 < andytoshi> "derive the spk with all *s set to 0 and all *hs set to 0h, add a bitmap of what's hardened or not"
08:58 < andytoshi> and hash that i guess
08:58 < andytoshi> lol dammit, the fuzzer found a crash
08:58 < andytoshi> sanket1729_: justinmoon: if you try to do from_str on "sh(sortedmulti)" we get an "index out of bounds" error
08:59 < shesek> what was the other non-crash one?
08:59 < andytoshi> shesek: there wasn't a non-crash one
08:59 < andytoshi> i was just slow to open this one
09:00 < shesek> oh okay
09:01 < andytoshi> ok patched that, restarting
09:03 < sipa> so uh
09:03 < sipa> maybe this is all overkill
09:03 < sipa> if you're worried about roundtrippability, you can make anything roundtrip
09:04 < sipa> if we'd remember h vs ', that would roundtrip too
09:05 < andytoshi> and also remember c:pk_k vs pk and c:pk_h vs pkh
09:05 < andytoshi> though maybe we should forbid the long forms of those the way we forbid the long forms of l: and u:
09:05 < sipa> right, we'd need to outlaw those
09:06 < sipa> another "issue" is descriptors with private keys
09:06 < sipa> or if by roundtripping you mean can be inferred back... things like key order in sortedmulti
09:06 < andytoshi> i think roundtrippability might be an independent issue from having a descriptor ID
09:07 < andytoshi> ah yeah, i forgot about key ordering in sortedmulti
09:07 < andytoshi> we'll see if the fuzzer finds that..
09:07 < andytoshi> (or maybe we preserve key ordering in rust-miniscript right now? i think we do..)
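[editor's sketch] The rule andytoshi quotes above ("derive the spk with all *s set to 0 and all *hs set to 0h, add a bitmap of what's hardened or not") has a path-rewriting step that can be shown in isolation. The Python sketch below covers only that step; the actual scriptPubKey derivation needs BIP32 and secp256k1 and is deliberately left out. `fix_wildcards` and the 0/1/2 marker encoding are hypothetical choices made for illustration, not anything agreed in the channel. The one hard fact relied on is from BIP32: child indices with the high bit set (0x80000000 and above) are the hardened ones.

```python
HARDENED = 0x80000000  # BIP32: indices with the high bit set are hardened


def fix_wildcards(path: str):
    """Replace * with 0 and *h (or *') with 0h.

    Returns (indices, markers). indices are BIP32 child numbers; markers has
    one entry per step: 0 = fixed step, 1 = unhardened wildcard,
    2 = hardened wildcard. The marker encoding is an illustrative assumption.
    """
    indices, markers = [], []
    for step in filter(None, path.split("/")):
        hardened = step[-1] in "h'"
        body = step[:-1] if hardened else step
        wildcard = body == "*"
        number = 0 if wildcard else int(body)
        indices.append(number | HARDENED if hardened else number)
        markers.append((2 if hardened else 1) if wildcard else 0)
    return indices, markers
```

For example, `fix_wildcards("/0/*")` and `fix_wildcards("/0/0")` yield the same indices but different markers, which is what keeps a wildcard descriptor and its index-0 instance from sharing an ID. Using a small integer per step rather than a single bit follows andytoshi's preference for leaving room for extensions.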
09:07 < sipa> bitcoin core will preserve it i think; it's only sorted at derivation time
09:08 < andytoshi> i think roundtrippability is important for things like user surprise / integrity of backups / etc
09:08 < andytoshi> but having a "descriptor ID" is independently important for sanity checking "do you and i have the same descriptor"
09:08 < sipa> agree
09:08 < andytoshi> even if we have different privkeys, hardened paths, etc
09:09 < sipa> well, how about we just bite the bullet and associate a hash with every node in the tree
09:09 < andytoshi> heh hmmm
09:09 < andytoshi> i like that better than using spk
09:10 < andytoshi> mainly because i can imagine descriptor users who actually don't use the script serialization ever (maybe they just hand off to scantxoutset when they want to check the blockchain)
09:10 < sipa> with a few rules (drop origin info, sort children of sortedmulti, drop private keys, ...) and otherwise boring merkle tree hashing... you'd pretty much get what you want
09:10 < andytoshi> i like this
09:11 < andytoshi> also "turn * into -1, *h into ??"
09:11 < sipa> right
09:11 < andytoshi> hehe, we need to define ?? though. i guess, -2 works
09:12 < sipa> one annoying thing, but i don't think it's solvable... you can have a descriptor for an individual spk and for a range, where that spk belongs to that range
09:12 < sipa> and they'll have distinct ids
09:13 < andytoshi> yeah, i don't think that's solvable
09:13 < andytoshi> though it's an interesting question how we want to handle ranges. probably by keeping them out of the id?
09:13 < andytoshi> as described, two ranged descriptors which are identical except for different ranges
09:13 < andytoshi> will get the same ID
09:13 < sipa> define 'keep out' ?
09:14 < sipa> what does that mean? different "ranges" ?
09:14 < andytoshi> i mean say "the id is 0xabcd1234[1..100]"
09:14 < sipa> different derivation paths?
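[editor's sketch] sipa's proposal above (a hash for every node in the tree, with a few normalization rules) can be sketched on a toy tree. Everything in the Python below is an assumption made for illustration: the `Node` type, the `leaf:` and `node:` prefixes, SHA-256 as the hash, and the string handling of keys. It models three of the rules from the chat (drop origin info, sort the keys of sortedmulti, map * to -1 and *h to -2) and does not handle private keys. It is not a proposed standard and the byte encoding has not been vetted.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy descriptor tree: a fragment name plus children (Nodes or leaf strings)."""
    name: str
    children: list = field(default_factory=list)


def leaf_hash(leaf: str) -> bytes:
    # Drop key origin info such as [d34db33f/44h] so it cannot affect the ID.
    if leaf.startswith("["):
        leaf = leaf[leaf.index("]") + 1:]
    # Map wildcards to the sentinel indices floated in the chat: * -> -1, *h -> -2.
    leaf = leaf.replace("/*h", "/-2").replace("/*'", "/-2").replace("/*", "/-1")
    return hashlib.sha256(b"leaf:" + leaf.encode()).digest()


def node_id(node) -> bytes:
    """Hash a leaf string, or a Node from the hashes of its children."""
    if isinstance(node, str):
        return leaf_hash(node)
    hashes = [node_id(child) for child in node.children]
    if node.name == "sortedmulti":
        # Keep the threshold first; sort only the key hashes.
        hashes = hashes[:1] + sorted(hashes[1:])
    return hashlib.sha256(b"node:" + node.name.encode() + b"".join(hashes)).digest()
```

Under this sketch, two `sortedmulti` trees that differ only in key order or in key origin info get the same ID, while the same keys under `multi` do not, and a `/*` key differs from a `/*h` key. The keys in any usage would be placeholders such as `xpubA/0/*`.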
09:14 < andytoshi> no
09:14 < andytoshi> i mean, subbing different subsets of integers for the *s
09:14 < andytoshi> maybe we don't have such a notion
09:14 < sipa> that's not part of descriptors
09:14 < andytoshi> ah cool
09:14 < sipa> application layer chooses which index to derive at
09:17 < shesek> it seems to me that a wildcard descriptor and a descriptor for an individual spk that could be derived from the wildcard should have different ids, no? why is that an annoyance?
09:19 < sipa> shesek: i think it depends on the application
09:19 < sipa> but yes, it's not unreasonable that they're different
09:25 < shesek> is there significance to having the descriptor id commit to the descriptor in a way that makes the id sufficient for verifying the original descriptor?
09:25 < shesek> say someone took part in a multi-party descriptor setup, kept the descriptor id, then lost the descriptor itself in a tragic boating accident. does it matter that someone could present him with a descriptor that has the right id, but which doesn't actually convey the information necessary for spending? (say, because all xpubs were replaced with the final single pubkey)
09:26 < shesek> (hmm, replacing with the final single pubkey would actually only be possible with non-wildcard descriptors)
09:27 < andytoshi> yeah
09:27 < andytoshi> i think the only worry is that keyorigin info is lost
09:27 < andytoshi> which is not a lot different, conceptually, from the boating-party guy losing his secret keys
09:30 < shesek> right, but the secret keys are something that only the user should have (and in this example, he still has them safe at his house), while the descriptor is something that could be recoverable from the other parties
09:32 < shesek> and for that recovery process, the id won't be sufficient to know that you really got the descriptor you wanted. which might be okay I guess...
09:33 < andytoshi> i think it's ok
09:34 < sipa> shesek: i mean, if you want something that's sufficient for having all the information for spending... you need the descriptor itself
09:34 < sipa> or a hash of it directly
09:36 < shesek> right, I guess a simple hash of the descriptor string will do it, the non-canonicality
09:36 < shesek> .. doesn't matter much here
09:39 < shesek> it's just that... if we have a standard descriptor id that's widely understood by bitcoin software, it might be useful to have that same id commit to all the information needed for spending. but it's not really necessary either
09:39 < andytoshi> i don't think there's any canonical notion of "all the information needed for spending"
09:39 < andytoshi> aside from the descriptor itself
09:41 < sipa> right
09:41 < sipa> that's exactly what a descriptor is
09:42 < andytoshi> and re the descriptor, we should fix roundtrippability ... and i think we know how to do that
09:50 < gwillen> it sounds to me like the ID you want is computed in the same way as the checksum right now, just longer
09:50 < gwillen> if you want it to be sensitive to things like h vs ' (which I'm not sure if you actually should, but), which the checksum already is
09:51 < gwillen> I'm stupid for not reading the scrollback before talking, though, since you already went over that
09:53 < gwillen> btw if people actually want ' abolished in favor of h, you should respond to my questions about the implementation on https://github.com/bitcoin/bitcoin/issues/15740
09:54 < gwillen> but it sounds like consensus is in the direction of "don't just handle it in the interface, switch to h internally as well, and maybe abolish ' "
10:00 < sipa> gwillen: i'll respond, but since you're here: what do you think about making ' and h roundtrio?
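[editor's sketch] On gwillen's "same way as the checksum, just longer": sipa's earlier remark that the checksum is a linear code is the reason length alone would not help. The Python sketch below demonstrates the property with CRC-32 from the standard library as a stand-in. CRC-32 is not the descriptor checksum (that is a different linear code), so this only illustrates the general behavior of linear codes; `xor_bytes` is a helper written for this example.

```python
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


# Three arbitrary 16-byte messages (placeholders, not real descriptors).
a, b, c = b"descriptor-one!!", b"descriptor-two!!", b"descriptor-three"

combined = xor_bytes(a, b, c)
# For equal-length inputs the CRC of the XOR equals the XOR of the CRCs,
# so the checksum of `combined` is known without ever computing it.
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
```

Here `zlib.crc32(combined) == predicted` holds. Because checksums of combined inputs can be predicted algebraically like this, inputs with a chosen checksum can be constructed rather than searched for, whatever the checksum length. An identifier meant to resist collisions would need a cryptographic hash instead.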
10:00 < sipa> *roundtrip
10:26 < andytoshi> i'll bet we get pushback from rust-bitcoin about storing that in our bip32 types :/
10:26 < andytoshi> which is not a big deal really, we can hack it into rust-miniscript
10:26 < andytoshi> it looks like we already split paths by '/' in rust-miniscript for some reason, idk why rust-bitcoin isn't doing that..
10:29 < andytoshi> oh lol we are actually already looking at that character in our parsing code
10:30 < andytoshi> yeah we don't need rust-bitcoin onboard at all
10:31 < sipa> rust-bitcoin doesn't support descriptors?
10:32 < andytoshi> no, that's all in rust-miniscript
10:32 < sipa> oh, ok
10:33 < andytoshi> which i guess ought to be renamed to rust-descriptors
10:33 < andytoshi> haha sanket1729_ if we rename the project we can go back to being <1.0
10:50 < shesek> could splitting miniscript and descriptors into separate crates make sense?
10:51 < shesek> where the descriptor one depends on the miniscript one
11:02 < andytoshi> i don't really think so
11:02 < andytoshi> the descriptor stuff is a small amount of extra code without which miniscript is really not that useful
11:24 < gwillen> sipa: hmmmmmmm, I'm not sure
11:24 < gwillen> I actually hadn't realized before that they didn't (I only just when writing that comment discovered that it always uses ' internally)
11:24 < gwillen> it kind of sounded like people were maybe happy about getting rid of ' entirely, though, and making h canonical
11:25 < sipa> gwillen: they're deserialized into BIP32 paths, and both are turned into ' at serialization time
11:25 < sipa> so the internal representation doesn't distinguish between the two
11:25 < gwillen> well, except that it seems we do internally store string descriptors also, or at least at one time we did
11:25 < gwillen> I can't tell if we now store them redundantly and don't parse them, or don't store them
11:26 < sipa> i think the wallet stores descriptors as strings, because there is no other serialization
11:26 < gwillen> and those strings right now appear to always use '
11:26 < sipa> but the reason why commands like getdescriptorinfo don't is because they parse and reserialize
11:27 < gwillen> personally I think I would rather try to make things more canonical vs make the incidental textual features roundtrip, but I could see either way
11:27 < sipa> achow101: if you'd import a descriptor with a "h" in it, would it be stored as "h" or as "'" ?
11:27 < gwillen> I guess it's likely the RPC interface will always want to accept both forms, and you will always want the checksum to be textual, and that means there will never exactly be a canonical version, we'll always need to deal with both
11:28 < achow101> it will be stored as '
11:28 < achow101> it round trips through the Descriptor class so it gets written as ToString() outputs
11:29 < sipa> makes sense
11:29 < sipa> but afaict, neither idea would actually break anything
11:30 < gwillen> if we start using h for everything internally, we still need to cope with old wallets that have ' and are missing key origin info (meaning we need to parse the stored descriptor)
11:30 < gwillen> that's the only tricky part I could find
11:30 < sipa> the ideas being (a) making Descriptor store h vs ' and serialize back the way it was found and (b) always serializing with h
11:30 < gwillen> *nods*
14:12 < sanket1729_> andytoshi: oops, missed this channel all day. with the amount of breaking changes I think it makes sense to go rename crate
15:57 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has quit [Ping timeout: 240 seconds]
16:10 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has joined ##miniscript
16:44 < andytoshi> lol justinmoon darosior how would you feel if we did that "rust-miniscript has been superseded by rust-descriptors" and then we reset the version number back to 0.5 or something
16:44 < andytoshi> since we probably should not have gone 1.0 before rust-bitcoin did ... or before taproot ... or before having an API that supported privkeys ... or before having a coherent interpreter API ... or before figuring out the analyzability rules
16:44 < andytoshi> or etc etc
16:45 < andytoshi> it'd be exactly the same lib we'd just rename it
17:38 < aj> andytoshi: claim you're saving the planet and recycle the "0." that bitcoin is dropping
17:40 < andytoshi> lol, well, we sorta did that when we hit 1.0
17:40 < andytoshi> (and are now on 4.0 .. and will have 5.0 soon .. and so on)
17:41 < andytoshi> we're approaching google chrome levels of version turnover
17:54 < andytoshi> it's not the worst thing i suppose
17:54 < andytoshi> i guess X did something similar, it was at version 11 before it settled down and then they decided to be stable forever
19:13 -!- Netsplit *.net <-> *.split quits: midnight
19:18 -!- Netsplit over, joins: midnight
19:26 -!- midnight [~midnight@unaffiliated/midnightmagic] has quit [Max SendQ exceeded]
19:28 -!- midnight [~midnight@unaffiliated/midnightmagic] has joined ##miniscript
20:49 -!- shesek [~shesek@unaffiliated/shesek] has quit [Remote host closed the connection]
23:15 -!- jeremyrubin [~jr@c-73-15-215-148.hsd1.ca.comcast.net] has quit [Ping timeout: 260 seconds]
--- Log closed Sat Dec 05 00:00:36 2020