--- Log opened Wed Aug 21 00:00:46 2019 05:40 < nickler> elichai2: sorry for the late reply, re https://github.com/bitcoin-core/secp256k1/pull/558/files#diff-4ad2dd2b8b6840915f8f0e99d18e1dc8R657 this is not there to deal with constness but to create array of pointers to messages which is required by verify_batch instead of an array of messages. 06:08 -!- elichai2 [uid212594@gateway/web/irccloud.com/x-jlacsifbunngnxdy] has joined #secp256k1 06:35 < elichai2> Didn't know it's a thing to look up into structs using array indexes `&a[np]` (saw this in `secp256k1_ecmult_strauss_wnaf`) 07:14 -!- elichai2 [uid212594@gateway/web/irccloud.com/x-jlacsifbunngnxdy] has quit [Ping timeout: 250 seconds] 07:15 -!- Lightsword [~Lightswor@2604:a880:1:20::1d3:9001] has quit [Ping timeout: 250 seconds] 07:16 -!- eragmus [sid136308@gateway/web/irccloud.com/x-fwdtmrtipljtluit] has quit [Ping timeout: 250 seconds] 07:17 -!- Lightsword [~Lightswor@107.170.253.193] has joined #secp256k1 07:18 -!- eragmus [sid136308@gateway/web/irccloud.com/x-mbskwmzzvasdcmyt] has joined #secp256k1 07:19 -!- elichai2 [uid212594@gateway/web/irccloud.com/x-virceihzfqfvnmqe] has joined #secp256k1 08:12 < real_or_random> elichai2: yeah hm, I don't think it's good style 08:13 < real_or_random> it simply relies on the guarantee that you can cast a pointer to a struct into a pointer to the first element 08:17 < real_or_random> ah no! 08:20 < real_or_random> "const secp256k1_gej *a" ... so this is just an array subscript, so we're not looking into the struct 08:20 < real_or_random> it's an array of structs 08:21 < roconnor> &a[np] is equivalent to (a + np) 08:26 < sipa> or better: a[b] is equivalent to *(a + b) 08:32 < elichai2> real_or_random: ohh I missed that emult calls this with `1`. so I didn't realized it looks at it like an array 08:41 < elichai2> Is there a place I can put this function? I find it very handy with in my tests https://gist.github.com/rust-play/16f64d5d9d7310e7cc68e2b35b660b3e (otherwise I'll put it in a local file because I keep deleting it when I think i'm done with the tests and then write it all over again when something fails lol) 08:43 < elichai2> (maybe invalid instead of infinity is more correct but whatever) 09:31 -!- sipa [~pw@gateway/tor-sasl/sipa1024] has quit [Remote host closed the connection] 09:31 -!- sipa [~pw@gateway/tor-sasl/sipa1024] has joined #secp256k1 09:59 -!- afk11 [~afk11@gateway/tor-sasl/afk11] has quit [Ping timeout: 260 seconds] 10:00 -!- afk11 [~afk11@gateway/tor-sasl/afk11] has joined #secp256k1 10:52 -!- jonatack [~jon@2a01:e35:8aba:8220:6627:dad:d967:649d] has quit [Ping timeout: 250 seconds] 11:05 -!- jonatack [~jon@jau64-1-88-171-168-34.fbx.proxad.net] has joined #secp256k1 11:19 -!- jonatack [~jon@jau64-1-88-171-168-34.fbx.proxad.net] has quit [Ping timeout: 276 seconds] 11:20 -!- jonatack [~jon@2a01:e35:8aba:8220:6627:dad:d967:649d] has joined #secp256k1 11:44 < elichai2> So i'm pretty much finished. but not really satisfied with the benchmarks, `ecdsa_recover_batch: 150: min 82.0us / avg 82.5us / max 83.4us` though I don't see any obvious optimization here. (maybe thinking of a way to optimize `S=A-X, R=A+X`) 11:46 < sipa> how fast is it for a batch of 2 keys? 11:47 < elichai2> Same `ecdsa_recover_batch: 2: min 83.0us / avg 83.7us / max 85.2us` 11:48 < sipa> and an ecdsa verification on the same hardware? 11:49 < elichai2> `ecdsa_verify: min 49.0us / avg 51.7us / max 53.0us` 11:49 < elichai2> (the amounts are per signature) 11:52 < andytoshi> shouldn't the recover and verify numbers be the same? 11:52 < sipa> so this means it's basically only usable when k < 3/5*n 11:52 < andytoshi> maybe there's some decompression and comparisons involved 11:53 < andytoshi> or do we sometimes need to do 2x the work to recover? 11:53 < sipa> andytoshi: recovery needs two independent EC multiplications; verification needs one combined EC multiplication 11:53 < andytoshi> ah 11:54 < sipa> elichai2: can i see your code? 11:54 < elichai2> yes https://github.com/elichai/secp256k1/blob/2019-08-batch-recover/src/modules/recovery/main_impl.h#L123 11:55 < elichai2> Trying to think of a way to test how much time is spent on `secp256k1_gej_add_var` and if it's worth looking for an algorithm to optimize this 11:55 < sipa> you could batch the R.x inversions 11:56 < elichai2> you mean across signatures? 11:56 < sipa> yes 11:57 < elichai2> is there a function that already does this? 11:58 < elichai2> (didn't see any other batching functions except `secp256k1_ecmult` and `secp256k1_ecmult_multi_var` 12:00 < sipa> ah right, that would need scalar batch inversion 12:00 < sipa> not field batch inversion 12:02 < elichai2> oh right I see there's `secp256k1_fe_inv_all_var` for fields, should I try writing a similiar one for scalars? (do you think it's gonna give more performance?) altough this stats are with gmp. so it should already be fast (but without endo) 12:03 < sipa> without gmp the gains will be nigger i think 12:03 < sipa> *bigger 12:14 < elichai2> i'll try to profile this, see how much time is spent on inversions vs secp256k1_gej_add_var vs maybe something else 12:53 < roconnor> sipa: can you expand on "recovery needs two independent EC multiplications; verification needs one combined EC multiplication"? 12:56 < andytoshi> i believe he is referring to both branches of P = (1/r) (mG ± sR) 12:56 < andytoshi> er rather, computing both mG and sR separately 12:57 < sipa> right; you need to compute (m/r)G and (s/r)R, and then compute their sum and difference 12:58 < sipa> rather than a single multi-multiplication (s/r)R + (-m/r)G 12:58 < roconnor> ah because there are two pubkeys that need to be computed. 12:58 < sipa> yeah 12:58 < roconnor> dang 12:59 < sipa> of course, we _have_ the pubkeys to match against 12:59 < sipa> i wonder if we can avoid computing multiple versions by taking that information into account 13:06 < andytoshi> probbaly not without doing key-many EC ops 13:06 < sipa> yeah 13:07 < andytoshi> there are fast multiexp algorithms for computing all the different (m/r)Gs in parallel 13:07 < andytoshi> if you interpret pippinger as a matrix and take the transpose and then *handwave* *magic* 13:07 < elichai2> well in theory, we could replace some of the multiplication time with comparing time. i.e. passing also a list of pubkeys and if the recovered pubkey matches something on the list then don't compute the next pubkeys and remove that one from the list 13:08 < elichai2> but then we depend on luck 13:09 < andytoshi> well, the unused keys will never get removed from that list 13:09 < sipa> yeah 13:09 < andytoshi> so no matter how many comparisons we do i don't think it'll save us anything 13:09 < andytoshi> if i'm understanding your suggestion correctly 13:10 < elichai2> I mean go to the first signature, recover one pubkey, go over the pubkey list, if it's there you don't need to compute the next 1-3 keys 13:11 < sipa> we can also avoid the batch inversion entirely by providing the list of pubkeys 13:11 < elichai2> then there's ~50% that each key will require only the first recovery, without negating and adding again 13:11 < sipa> as you can compare affine with jacobian points efficiently 13:11 < andytoshi> sipa: oh that's a good point 13:11 < andytoshi> elichai2: ah i see 13:11 < sipa> you don't even need to decompress them actually 13:11 < sipa> just test the x coordinates 13:11 < sipa> and then sanity check y on match 13:11 < andytoshi> so it'd be a 50% chance it costs verify-much computation, 50% chance it costs as much as the current algo 13:11 < andytoshi> which sounds like a win 13:12 < andytoshi> sipa: yeah nice 13:12 < elichai2> how having the x helps with not inversing the rx? 13:14 < andytoshi> oh, it's not the average of the verify perf and the current perf 13:14 < andytoshi> it's the verify per + 50% chance of doing another mult-by-G 13:14 < andytoshi> which i -think- will be a win but i'm not sure 13:16 < sipa> i'm not sure either 13:16 < elichai2> hmm I don't think it's even a mult-by-G, I think it's point addition. becuase to go from the first pubkey to the second you negate the u2Xr point and add the u1 point 13:17 < andytoshi> you don't have the u2Xr point, if you're doing a combined mult 13:17 < elichai2> oh right 13:18 < roconnor> Isn't mult-by-G faster than ecmult? 13:19 < elichai2> usually it should be faster 13:20 < andytoshi> yes. i'm trying to hack the bench_ecmult code to get numbers 13:20 < roconnor> G comes with a giant precomputed table. 13:24 < elichai2> That's weird. I wrote a hardcoded benchmark so I can easily profile it. and in that benchmark I'm seeing `ecdsa_recover_batch: 10: min 56.1us / avg 58.7us / max 62.5us` 13:25 < elichai2> can the sig parsing/pubkey serializing take so much time?! something is wrong here 13:26 < andytoshi> both of those should be extremely fast. though if the pubkey "serializing" includes the gej→ge conversion that'll cost a few µs 13:27 < elichai2> why does it include gej->ge? 13:28 < andytoshi> it doesn't. i'm just speculating that you misspoke 13:28 < andytoshi> or are miscounting 13:28 < elichai2> makes more sense 13:28 < andytoshi> because parsing and serialization of field elems and scalars should be within the noise threshold 13:29 < elichai2> oh lol. I compiled with endo 13:29 < elichai2> that's the difference 13:29 < elichai2> forgot about that 13:29 < andytoshi> lol. sorry, i sholud've asked what you were actually comparing 13:29 < andytoshi> a 25-µs difference is definitely the endo 13:30 < elichai2> I'll compile without endo and without gmp as this are the defaults for bitcoin 13:32 < roconnor> oh the default for bitcoin is without gmp? I didn't know that. What's the reason? 13:32 < real_or_random> andytoshi says because it's a PITA in the build process 13:32 < roconnor> I can believe that. 13:32 < elichai2> I asked that in the past, andytoshi gave me some answer that I don't remember lol, but I have a diff for when I compile bitcoin for myself that enables gmp and endo and some other stuff 13:32 < elichai2> real_or_random: yeah 13:33 < sipa> also additional worry about bringing in gmp into consensus 13:33 < real_or_random> sipa: right, that was my first guess 13:33 < real_or_random> and it seems a reasonable thing 13:33 < real_or_random> e.g. after the openssl parsing bug 13:33 < real_or_random> "bug" 13:34 < elichai2> I think gmp is very different than openssl but nvm 13:34 < andytoshi> neither of them are written with consensus in mind 13:35 < real_or_random> some people would argue neither of them are written with security in mind 13:35 < real_or_random> (to be clear, I won't) 13:35 < sipa> gmp has some constant time functions 13:35 < sipa> for cryptographic applications 13:36 < elichai2> perf without endo, without gmp, and with a benchmark that basically has everything hardcoded except the actual call to `secp256k1_ecdsa_recover_batch` https://pastebin.com/raw/bphSMJUt. the only influence here is that I don't know how to get the context creation outside of the profile 13:36 < sipa> but still, additional code == additional attack surface 13:38 < real_or_random> well there are some old PRs that would avoid gmp entirely 13:38 < andytoshi> i was never able to get the perf to be comparable 13:39 < andytoshi> there are original mathematics in the gmp implementation as near as i can tell 13:39 < andytoshi> (the author of the gcd code wrote a bunch of papers but i think the papers don't tell the whole story) 13:40 < real_or_random> gmp-satoshi 13:40 < andytoshi> lol. he's a real guy who works at a university in sweden iirc 13:42 < elichai2> real_or_random: I would be very suprised to see something that can match gmp's performance 13:43 < andytoshi> elichai2: i mean, i read the papers and read the gmp code. in principle i should've been able to -beat- gmp's performance since we only work with small fixed fields and need no allocations 13:43 < andytoshi> but implementing the contents of the papers wasn't sufficient, and i wasn't able to distangle the gmp code enough to see how it differed 13:45 < elichai2> yeah. i'm cirious if the gmp guys have like an archive of all the papers they've used 13:45 < elichai2> would be interesting to spend a week-month-year reading lol 13:45 < andytoshi> lol. the gmp guys wrote the relevant papers. 14:01 < elichai2> sipa: it does seem that stuff related to scalar inverses are pretty high up (in particular `secp256k1_scalar_sqr_512`). though I don't like that I can't see it ordered by caller. i don't care about the inner functions, I care about the outer ones, that way I can know which algorithm/thing we should try and optimize :/ 14:02 < sipa> elichai2: you can only use the public scalar api 14:02 < sipa> all you need is multiplications and inverses 14:02 < elichai2> what do you mean? (I'm taking about the perf report stuff) 14:02 < sipa> oh 14:03 < sipa> sorry i thought you were tryinf to do batch scalar i version 14:03 < sipa> *inversion 14:04 < sipa> elichai2: i think you should try passing in the pubkeys and avoid inversions 14:05 < sipa> with a slight layer violation you can even avoid some square roots 14:06 < elichai2> I'm still not sure how does passing the pubkeys help avoid the inversion of rx 14:06 < elichai2> The pubkey is the sum of the point times the inverse of rx, right? So how does having the point helps? 14:08 < sipa> elichai2: oh you need the inverses of the r values still 14:08 < sipa> but not inverses to compute affine points 14:08 < elichai2> Ohhh that's to avoid the conversions from ge to gejs? 14:10 < sipa> yeah 14:10 < sipa> i think you can do even better 14:11 < sipa> the ECDSA validation equation is ±sR = mG + (R.x)P 14:11 < sipa> you can compute sR for every signature (as a gej) 14:11 < sipa> and mG for every distinct message (as a gej) 14:12 < sipa> oh nevermind 14:17 < elichai2> I think i'll give batching inversions a try to try and batch the r inversions together 14:37 < elichai2> got 10 us less with batching r inverses (no gmp, i'll check the difference with gmp) 14:37 < elichai2> but this requires another 2*n*sizeof(scalar) memory in scratch, altough I might be able to optimize some of it 14:40 < sipa> really, 1280 bytes is not a problem 14:40 < sipa> not even on the weakest of hardware 14:45 < elichai2> so 10us is pretty nice without having to pass the pubkey list. and with gmp it's a 4us improvement 14:46 < elichai2> now would love to understand how having the pubkey can help with checking if it's equal to a gej without converting to ge :) 14:48 < sipa> look at ge_equals_gej in tests.c 14:49 < sipa> the good old "verification does not require computation" trick 14:49 < sipa> for x1/z^2 == x2 to hold you can just check x1 == x2*z^2 14:50 < sipa> and for y1/z^3 == y2 to hold you can check y1 == y2*z^3 14:54 < elichai2> oh just like converting the pubkey to gej and comparing, but with the bonus that by using the existing z of the gej you don't need to normalize anything :) 14:55 < sipa> right 14:56 < sipa> i think you still need an API that can expose multiple resulting points per sig in case several of them are in the list of passed pubkeys 14:56 < sipa> because the logic of matching up sigs with keys should remain on the bitcoin core side 14:59 < elichai2> why? if I pass in the pubkeys+signatures I can just return a bool if all the signatures are valid to these pubkeys or not 14:59 < elichai2> that way there's a lot more to optimize 14:59 < elichai2> (not compute more than one key if the first one verifies, not convert gej->ge etc.) 14:59 < sipa> elichai2: but that's not the exact same semantics as OP_CHECKMULTISIG 14:59 < elichai2> hmm 14:59 < sipa> the keys need to be in the same order as the sigs 15:00 < elichai2> right, because of the order 15:00 < elichai2> yeah 15:00 < elichai2> and that will be weird semantics for secp 15:00 < sipa> you can have a secp256k1_implement_bitcoin_checkmultisig that has its exact semantics 15:00 < sipa> or try to be generic 15:00 < elichai2> lol yeah 15:01 < elichai2> I just don't like the idea that you both pass in pubkeys and return them back 15:01 < elichai2> (and return potentially 4 times more keys) 15:02 < sipa> it could return just the indexes of the matching pubkeys in the list 15:06 < elichai2> hmmm but then it can't return keys not in the list 15:06 < sipa> why would it need to? 15:06 < sipa> that's the point, i think? you pass in a list of pubkeys to filter the results by 15:07 < elichai2> you're right. What I do think is that you might need to return multiple indexes per signature 15:08 < sipa> yes 15:08 < sipa> so you need a way to return up to 4 indexes per input sig 15:08 < sipa> actually 15:09 < sipa> this API is great 15:09 < sipa> because it could be smart and use normal ECDSA verification 15:09 < sipa> ah, no :( 15:09 < sipa> it needs the "sigs-and-pubkeys-are-in-same-order" assumption to do that 15:10 < elichai2> right, here it does matching after recovery 15:11 < elichai2> I'm curious what's the cost of a single key recovery without batching the multiplication 15:11 < andytoshi> isn't there a bench_recover? 15:12 < sipa> single key recovery isn't really the right thing, as it doesn't find both keys 15:12 < elichai2> there is, i'm testing as i'm speaking :) 15:13 < elichai2> sipa: but it can match after the first find and with 50% probablity stop there. oh wait. you can give 2 different keys for the same signature nvm 15:13 < elichai2> lol 15:14 < andytoshi> you have to give the signature twice if you do that 15:14 < andytoshi> with checkmultisig 15:20 < elichai2> the answer is 3us (gej negation + gej addition) 15:20 < elichai2> no a lot 17:56 -!- Netsplit *.net <-> *.split quits: afk11, Lightsword, GAit, echonaut2, niftynei, fjahr, roasbeef, harding, meshcollider, jonatack, (+23 more, use /NETSPLIT to show all of them) 18:13 -!- Netsplit over, joins: GAit, meshcollider, Apocalyptic, waxwing, echonaut2, andytoshi 18:13 -!- Madars_ [~null@aero-astro-estates.mit.edu] has joined #secp256k1 18:13 -!- Netsplit over, joins: kallewoof 18:13 -!- eragmus_ [sid136308@gateway/web/irccloud.com/session] has joined #secp256k1 18:13 -!- Netsplit over, joins: jonasschnelli, nsh, jonatack 18:13 -!- CodeShark_ [sid126576@gateway/web/irccloud.com/x-qeqbstzegjpqgsew] has joined #secp256k1 18:13 -!- Netsplit over, joins: roasbeef, harding, wallet42, fjahr, TD-Linux, nickler, ariard, BlueMatt, roconnor, elichai2 (+1 more) 18:14 -!- eragmus_ [sid136308@gateway/web/irccloud.com/session] has quit [Changing host] 18:14 -!- eragmus_ [sid136308@gateway/web/irccloud.com/x-dzhgmkdqjtyobiic] has joined #secp256k1 18:14 -!- zmanian_ [sid113594@gateway/web/irccloud.com/session] has joined #secp256k1 18:15 -!- Netsplit over, joins: afk11, arubi, sipa, luke-jr, niftynei, Lightsword 18:15 -!- zmanian_ [sid113594@gateway/web/irccloud.com/session] has quit [Changing host] 18:15 -!- zmanian_ [sid113594@gateway/web/irccloud.com/x-ukxwhoqxxpaxzwib] has joined #secp256k1 18:20 -!- Madars_ is now known as Guest69260 18:22 -!- real_or_random [~real_or_r@2a02:c207:3002:7468::1] has joined #secp256k1 18:22 < elichai2> Opened a draft PR with current numbers in case anyone want to give early feedback, tomorrow i'll pass in the pubkeys and see what will be the performance in that case. 18:24 -!- midnightmagic [~midnightm@unaffiliated/midnightmagic] has joined #secp256k1 19:04 < roconnor> Great work. I'm personally not too worried about the added complexity. Even with my previously optimistic outlook, we would probably still want the orginal code path for n-of-n, and if so I don't see the big deal if the threshold to change algorithms lies below k=n-1. OTOH I don't get to decide this sort of thing either. :) 19:06 < roconnor> I was under the impression that key recovery was almost as fast as signature verification, but that seems to only be the case when you have the bits telling you which specific key to recover. It does seem like the cost of key recovery is closer to 1.5*k work in terms of signature verification units of work. 19:59 < elichai2> Yeah, without the recid you can't batch ecmult so it's a significant increase, although batching inverses can be a good countermeasure (batching the r inverse gave extra 10us for none gmp) 20:01 < elichai2> And thanks :) 22:17 -!- elichai2 [uid212594@gateway/web/irccloud.com/x-virceihzfqfvnmqe] has quit [Quit: Connection closed for inactivity] --- Log closed Thu Aug 22 00:00:48 2019