--- Log opened Wed Aug 21 00:00:46 2019
05:40 < nickler> elichai2: sorry for the late reply, re https://github.com/bitcoin-core/secp256k1/pull/558/files#diff-4ad2dd2b8b6840915f8f0e99d18e1dc8R657 this is not there to deal with constness but to create array of pointers to messages which is required by verify_batch instead of an array of messages.
06:08 -!- elichai2 [uid212594@gateway/web/irccloud.com/x-jlacsifbunngnxdy] has joined #secp256k1
06:35 < elichai2> Didn't know it's a thing to look up into structs using array indexes `&a[np]` (saw this in `secp256k1_ecmult_strauss_wnaf`)
07:14 -!- elichai2 [uid212594@gateway/web/irccloud.com/x-jlacsifbunngnxdy] has quit [Ping timeout: 250 seconds]
07:15 -!- Lightsword [~Lightswor@2604:a880:1:20::1d3:9001] has quit [Ping timeout: 250 seconds]
07:16 -!- eragmus [sid136308@gateway/web/irccloud.com/x-fwdtmrtipljtluit] has quit [Ping timeout: 250 seconds]
07:17 -!- Lightsword [~Lightswor@107.170.253.193] has joined #secp256k1
07:18 -!- eragmus [sid136308@gateway/web/irccloud.com/x-mbskwmzzvasdcmyt] has joined #secp256k1
07:19 -!- elichai2 [uid212594@gateway/web/irccloud.com/x-virceihzfqfvnmqe] has joined #secp256k1
08:12 < real_or_random> elichai2: yeah hm, I don't think it's good style
08:13 < real_or_random> it simply relies on the guarantee that you can cast a pointer to a struct into a pointer to the first element
08:17 < real_or_random> ah no!
08:20 < real_or_random> "const secp256k1_gej *a" ... so this is just an array subscript, so we're not looking into the struct
08:20 < real_or_random> it's an array of structs
08:21 < roconnor> &a[np] is equivalent to (a + np)
08:26 < sipa> or better: a[b] is equivalent to *(a + b)
08:32 < elichai2> real_or_random: ohh I missed that emult calls this with `1`. so I didn't realized it looks at it like an array
08:41 < elichai2> Is there a place I can put this function? I find it very handy with in my tests https://gist.github.com/rust-play/16f64d5d9d7310e7cc68e2b35b660b3e (otherwise I'll put it in a local file because I keep deleting it when I think i'm done with the tests and then write it all over again when something fails lol)
08:43 < elichai2> (maybe invalid instead of infinity is more correct but whatever)
09:31 -!- sipa [~pw@gateway/tor-sasl/sipa1024] has quit [Remote host closed the connection]
09:31 -!- sipa [~pw@gateway/tor-sasl/sipa1024] has joined #secp256k1
09:59 -!- afk11 [~afk11@gateway/tor-sasl/afk11] has quit [Ping timeout: 260 seconds]
10:00 -!- afk11 [~afk11@gateway/tor-sasl/afk11] has joined #secp256k1
10:52 -!- jonatack [~jon@2a01:e35:8aba:8220:6627:dad:d967:649d] has quit [Ping timeout: 250 seconds]
11:05 -!- jonatack [~jon@jau64-1-88-171-168-34.fbx.proxad.net] has joined #secp256k1
11:19 -!- jonatack [~jon@jau64-1-88-171-168-34.fbx.proxad.net] has quit [Ping timeout: 276 seconds]
11:20 -!- jonatack [~jon@2a01:e35:8aba:8220:6627:dad:d967:649d] has joined #secp256k1
11:44 < elichai2> So i'm pretty much finished. but not really satisfied with the benchmarks, `ecdsa_recover_batch: 150: min 82.0us / avg 82.5us / max 83.4us` though I don't see any obvious optimization here. (maybe thinking of a way to optimize `S=A-X, R=A+X`)
11:46 < sipa> how fast is it for a batch of 2 keys?
11:47 < elichai2> Same `ecdsa_recover_batch: 2: min 83.0us / avg 83.7us / max 85.2us`
11:48 < sipa> and an ecdsa verification on the same hardware?
11:49 < elichai2> `ecdsa_verify: min 49.0us / avg 51.7us / max 53.0us`
11:49 < elichai2> (the amounts are per signature)
11:52 < andytoshi> shouldn't the recover and verify numbers be the same?
11:52 < sipa> so this means it's basically only usable when k < 3/5*n
11:52 < andytoshi> maybe there's some decompression and comparisons involved
11:53 < andytoshi> or do we sometimes need to do 2x the work to recover?
11:53 < sipa> andytoshi: recovery needs two independent EC multiplications; verification needs one combined EC multiplication
11:53 < andytoshi> ah
11:54 < sipa> elichai2: can i see your code?
11:54 < elichai2> yes https://github.com/elichai/secp256k1/blob/2019-08-batch-recover/src/modules/recovery/main_impl.h#L123
11:55 < elichai2> Trying to think of a way to test how much time is spent on `secp256k1_gej_add_var` and if it's worth looking for an algorithm to optimize this
11:55 < sipa> you could batch the R.x inversions
11:56 < elichai2> you mean across signatures?
11:56 < sipa> yes
11:57 < elichai2> is there a function that already does this?
11:58 < elichai2> (didn't see any other batching functions except `secp256k1_ecmult` and `secp256k1_ecmult_multi_var`
12:00 < sipa> ah right, that would need scalar batch inversion
12:00 < sipa> not field batch inversion
12:02 < elichai2> oh right I see there's `secp256k1_fe_inv_all_var` for fields, should I try writing a similiar one for scalars? (do you think it's gonna give more performance?) altough this stats are with gmp. so it should already be fast (but without endo)
12:03 < sipa> without gmp the gains will be nigger i think
12:03 < sipa> *bigger
12:14 < elichai2> i'll try to profile this, see how much time is spent on inversions vs secp256k1_gej_add_var vs maybe something else
12:53 < roconnor> sipa: can you expand on "recovery needs two independent EC multiplications; verification needs one combined EC multiplication"?
12:56 < andytoshi> i believe he is referring to both branches of P = (1/r) (mG ± sR)
12:56 < andytoshi> er rather, computing both mG and sR separately
12:57 < sipa> right; you need to compute (m/r)G and (s/r)R, and then compute their sum and difference
12:58 < sipa> rather than a single multi-multiplication (s/r)R + (-m/r)G
12:58 < roconnor> ah because there are two pubkeys that need to be computed.
12:58 < sipa> yeah
12:58 < roconnor> dang
12:59 < sipa> of course, we _have_ the pubkeys to match against
12:59 < sipa> i wonder if we can avoid computing multiple versions by taking that information into account
13:06 < andytoshi> probbaly not without doing key-many EC ops
13:06 < sipa> yeah
13:07 < andytoshi> there are fast multiexp algorithms for computing all the different (m/r)Gs in parallel
13:07 < andytoshi> if you interpret pippinger as a matrix and take the transpose and then *handwave* *magic*
13:07 < elichai2> well in theory, we could replace some of the multiplication time with comparing time. i.e. passing also a list of pubkeys and if the recovered pubkey matches something on the list then don't compute the next pubkeys and remove that one from the list
13:08 < elichai2> but then we depend on luck
13:09 < andytoshi> well, the unused keys will never get removed from that list
13:09 < sipa> yeah
13:09 < andytoshi> so no matter how many comparisons we do i don't think it'll save us anything
13:09 < andytoshi> if i'm understanding your suggestion correctly
13:10 < elichai2> I mean go to the first signature, recover one pubkey, go over the pubkey list, if it's there you don't need to compute the next 1-3 keys
13:11 < sipa> we can also avoid the batch inversion entirely by providing the list of pubkeys
13:11 < elichai2> then there's ~50% that each key will require only the first recovery, without negating and adding again
13:11 < sipa> as you can compare affine with jacobian points efficiently
13:11 < andytoshi> sipa: oh that's a good point
13:11 < andytoshi> elichai2: ah i see
13:11 < sipa> you don't even need to decompress them actually
13:11 < sipa> just test the x coordinates
13:11 < sipa> and then sanity check y on match
13:11 < andytoshi> so it'd be a 50% chance it costs verify-much computation, 50% chance it costs as much as the current algo
13:11 < andytoshi> which sounds like a win
13:12 < andytoshi> sipa: yeah nice
13:12 < elichai2> how having the x helps with not inversing the rx?
13:14 < andytoshi> oh, it's not the average of the verify perf and the current perf
13:14 < andytoshi> it's the verify per + 50% chance of doing another mult-by-G
13:14 < andytoshi> which i -think- will be a win but i'm not sure
13:16 < sipa> i'm not sure either
13:16 < elichai2> hmm I don't think it's even a mult-by-G, I think it's point addition. becuase to go from the first pubkey to the second you negate the u2Xr point and add the u1 point
13:17 < andytoshi> you don't have the u2Xr point, if you're doing a combined mult
13:17 < elichai2> oh right
13:18 < roconnor> Isn't mult-by-G faster than ecmult?
13:19 < elichai2> usually it should be faster
13:20 < andytoshi> yes. i'm trying to hack the bench_ecmult code to get numbers
13:20 < roconnor> G comes with a giant precomputed table.
13:24 < elichai2> That's weird. I wrote a hardcoded benchmark so I can easily profile it. and in that benchmark I'm seeing `ecdsa_recover_batch: 10: min 56.1us / avg 58.7us / max 62.5us`
13:25 < elichai2> can the sig parsing/pubkey serializing take so much time?! something is wrong here
13:26 < andytoshi> both of those should be extremely fast. though if the pubkey "serializing" includes the gej→ge conversion that'll cost a few µs
13:27 < elichai2> why does it include gej->ge?
13:28 < andytoshi> it doesn't. i'm just speculating that you misspoke
13:28 < andytoshi> or are miscounting
13:28 < elichai2> makes more sense
13:28 < andytoshi> because parsing and serialization of field elems and scalars should be within the noise threshold
13:29 < elichai2> oh lol. I compiled with endo
13:29 < elichai2> that's the difference
13:29 < elichai2> forgot about that
13:29 < andytoshi> lol. sorry, i sholud've asked what you were actually comparing
13:29 < andytoshi> a 25-µs difference is definitely the endo
13:30 < elichai2> I'll compile without endo and without gmp as this are the defaults for bitcoin
13:32 < roconnor> oh the default for bitcoin is without gmp?  I didn't know that.  What's the reason?
13:32 < real_or_random> andytoshi says because it's a PITA in the build process
13:32 < roconnor> I can believe that.
13:32 < elichai2> I asked that in the past, andytoshi gave me some answer that I don't remember lol, but I have a diff for when I compile bitcoin for myself that enables gmp and endo and some other stuff
13:32 < elichai2> real_or_random: yeah
13:33 < sipa> also additional worry about bringing in gmp into consensus
13:33 < real_or_random> sipa: right, that was my first guess
13:33 < real_or_random> and it seems a reasonable thing
13:33 < real_or_random> e.g. after the openssl parsing bug
13:33 < real_or_random> "bug"
13:34 < elichai2> I think gmp is very different than openssl but nvm
13:34 < andytoshi> neither of them are written with consensus in mind
13:35 < real_or_random> some people would argue neither of them are written with security in  mind
13:35 < real_or_random> (to be clear, I won't)
13:35 < sipa> gmp has some constant time functions
13:35 < sipa> for cryptographic applications
13:36 < elichai2> perf without endo, without gmp, and with a benchmark that basically has everything hardcoded except the actual call to `secp256k1_ecdsa_recover_batch` https://pastebin.com/raw/bphSMJUt. the only influence here is that I don't know how to get the context creation outside of the profile
13:36 < sipa> but still, additional code == additional attack surface
13:38 < real_or_random> well there are some old PRs that would avoid gmp entirely
13:38 < andytoshi> i was never able to get the perf to be comparable
13:39 < andytoshi> there are original mathematics in the gmp implementation as near as i can tell
13:39 < andytoshi> (the author of the gcd code wrote a bunch of papers but i think the papers don't tell the whole story)
13:40 < real_or_random> gmp-satoshi
13:40 < andytoshi> lol. he's a real guy who works at a university in sweden iirc
13:42 < elichai2> real_or_random: I would be very suprised to see something that can match gmp's performance
13:43 < andytoshi> elichai2: i mean, i read the papers and read the gmp code. in principle i should've been able to -beat- gmp's performance since we only work with small fixed fields and need no allocations
13:43 < andytoshi> but implementing the contents of the papers wasn't sufficient, and i wasn't able to distangle the gmp code enough to see how it differed
13:45 < elichai2> yeah. i'm cirious if the gmp guys have like an archive of all the papers they've used
13:45 < elichai2> would be interesting to spend a week-month-year reading lol
13:45 < andytoshi> lol. the gmp guys wrote the relevant papers.
14:01 < elichai2> sipa: it does seem that stuff related to scalar inverses are pretty high up (in particular `secp256k1_scalar_sqr_512`). though I don't like that I can't see it ordered by caller. i don't care about the inner functions, I care about the outer ones, that way I can know which algorithm/thing we should try and optimize :/
14:02 < sipa> elichai2: you can only use the public scalar api
14:02 < sipa> all you need is multiplications and inverses
14:02 < elichai2> what do you mean? (I'm taking about the perf report stuff)
14:02 < sipa> oh
14:03 < sipa> sorry i thought you were tryinf to do batch scalar i version
14:03 < sipa> *inversion
14:04 < sipa> elichai2: i think you should try passing in the pubkeys and avoid inversions
14:05 < sipa> with a slight layer violation you can even avoid some square roots
14:06 < elichai2> I'm still not sure how does passing the pubkeys help avoid the inversion of rx
14:06 < elichai2> The pubkey is the sum of the point times the inverse of rx, right? So how does having the point helps?
14:08 < sipa> elichai2: oh you need the inverses of the r values still
14:08 < sipa> but not inverses to compute affine points
14:08 < elichai2> Ohhh that's to avoid the conversions from ge to gejs?
14:10 < sipa> yeah
14:10 < sipa> i think you can do even better
14:11 < sipa> the ECDSA validation equation is ±sR = mG + (R.x)P
14:11 < sipa> you can compute sR for every signature (as a gej)
14:11 < sipa> and mG for every distinct message (as a gej)
14:12 < sipa> oh nevermind
14:17 < elichai2> I think i'll give batching inversions a try to try and batch the r inversions together
14:37 < elichai2> got 10 us less with batching r inverses (no gmp, i'll check the difference with gmp)
14:37 < elichai2> but this requires another 2*n*sizeof(scalar) memory in scratch, altough I might be able to optimize some of it
14:40 < sipa> really, 1280 bytes is not a problem
14:40 < sipa> not even on the weakest of hardware
14:45 < elichai2> so 10us is pretty nice without having to pass the pubkey list. and with gmp it's a 4us improvement
14:46 < elichai2> now would love to understand how having the pubkey can help with checking if it's equal to a gej without converting to ge :)
14:48 < sipa> look at ge_equals_gej in tests.c
14:49 < sipa> the good old "verification does not require computation" trick
14:49 < sipa> for x1/z^2 == x2 to hold you can just check x1 == x2*z^2
14:50 < sipa> and for y1/z^3 == y2 to hold you can check y1 == y2*z^3
14:54 < elichai2> oh just like converting the pubkey to gej and comparing, but with the bonus that by using the existing z of the gej you don't need to normalize anything :)
14:55 < sipa> right
14:56 < sipa> i think you still need an API that can expose multiple resulting points per sig in case several of them are in the list of passed pubkeys
14:56 < sipa> because the logic of matching up sigs with keys should remain on the bitcoin core side
14:59 < elichai2> why? if I pass in the pubkeys+signatures I can just return a bool if all the signatures are valid to these pubkeys or not
14:59 < elichai2> that way there's a lot more to optimize
14:59 < elichai2> (not compute more than one key if the first one verifies, not convert gej->ge etc.)
14:59 < sipa> elichai2: but that's not the exact same semantics as OP_CHECKMULTISIG
14:59 < elichai2> hmm
14:59 < sipa> the keys need to be in the same order as the sigs
15:00 < elichai2> right, because of the order
15:00 < elichai2> yeah
15:00 < elichai2> and that will be weird semantics for secp
15:00 < sipa> you can have a secp256k1_implement_bitcoin_checkmultisig that has its exact semantics
15:00 < sipa> or try to be generic
15:00 < elichai2> lol yeah
15:01 < elichai2> I just don't like the idea that you both pass in pubkeys and return them back
15:01 < elichai2> (and return potentially 4 times more keys)
15:02 < sipa> it could return just the indexes of the matching pubkeys in the list
15:06 < elichai2> hmmm but then it can't return keys not in the list
15:06 < sipa> why would it need to?
15:06 < sipa> that's the point, i think? you pass in a list of pubkeys to filter the results by
15:07 < elichai2> you're right. What I do think is that you might need to return multiple indexes per signature
15:08 < sipa> yes
15:08 < sipa> so you need a way to return up to 4 indexes per input sig
15:08 < sipa> actually
15:09 < sipa> this API is great
15:09 < sipa> because it could be smart and use normal ECDSA verification
15:09 < sipa> ah, no :(
15:09 < sipa> it needs the "sigs-and-pubkeys-are-in-same-order" assumption to do that
15:10 < elichai2> right, here it does matching after recovery
15:11 < elichai2> I'm curious what's the cost of a single key recovery without batching the multiplication
15:11 < andytoshi> isn't there a bench_recover?
15:12 < sipa> single key recovery isn't really the right thing, as it doesn't find both keys
15:12 < elichai2> there is, i'm testing as i'm speaking :)
15:13 < elichai2> sipa: but it can match after the first find and with 50% probablity stop there. oh wait. you can give 2 different keys for the same signature nvm
15:13 < elichai2> lol
15:14 < andytoshi> you have to give the signature twice if you do that
15:14 < andytoshi> with checkmultisig
15:20 < elichai2> the answer is 3us (gej negation + gej addition)
15:20 < elichai2> no a lot
17:56 -!- Netsplit *.net <-> *.split quits: afk11, Lightsword, GAit, echonaut2, niftynei, fjahr, roasbeef, harding, meshcollider, jonatack,  (+23 more, use /NETSPLIT to show all of them)
18:13 -!- Netsplit over, joins: GAit, meshcollider, Apocalyptic, waxwing, echonaut2, andytoshi
18:13 -!- Madars_ [~null@aero-astro-estates.mit.edu] has joined #secp256k1
18:13 -!- Netsplit over, joins: kallewoof
18:13 -!- eragmus_ [sid136308@gateway/web/irccloud.com/session] has joined #secp256k1
18:13 -!- Netsplit over, joins: jonasschnelli, nsh, jonatack
18:13 -!- CodeShark_ [sid126576@gateway/web/irccloud.com/x-qeqbstzegjpqgsew] has joined #secp256k1
18:13 -!- Netsplit over, joins: roasbeef, harding, wallet42, fjahr, TD-Linux, nickler, ariard, BlueMatt, roconnor, elichai2 (+1 more)
18:14 -!- eragmus_ [sid136308@gateway/web/irccloud.com/session] has quit [Changing host]
18:14 -!- eragmus_ [sid136308@gateway/web/irccloud.com/x-dzhgmkdqjtyobiic] has joined #secp256k1
18:14 -!- zmanian_ [sid113594@gateway/web/irccloud.com/session] has joined #secp256k1
18:15 -!- Netsplit over, joins: afk11, arubi, sipa, luke-jr, niftynei, Lightsword
18:15 -!- zmanian_ [sid113594@gateway/web/irccloud.com/session] has quit [Changing host]
18:15 -!- zmanian_ [sid113594@gateway/web/irccloud.com/x-ukxwhoqxxpaxzwib] has joined #secp256k1
18:20 -!- Madars_ is now known as Guest69260
18:22 -!- real_or_random [~real_or_r@2a02:c207:3002:7468::1] has joined #secp256k1
18:22 < elichai2> Opened a draft PR with current numbers in case anyone want to give early feedback, tomorrow i'll pass in the pubkeys and see what will be the performance in that case.
18:24 -!- midnightmagic [~midnightm@unaffiliated/midnightmagic] has joined #secp256k1
19:04 < roconnor> Great work.  I'm personally not too worried about the added complexity.  Even with my previously optimistic outlook, we would probably still want the orginal code path for n-of-n, and if so I don't see the big deal if the threshold to change algorithms lies below k=n-1.  OTOH I don't get to decide this sort of thing either. :)
19:06 < roconnor> I was under the impression that key recovery was almost as fast as signature verification, but that seems to only be the case when you have the bits telling you which specific key to recover.  It does seem like the cost of key recovery is closer to 1.5*k work in terms of signature verification units of work.
19:59 < elichai2> Yeah, without the recid you can't batch ecmult so it's a significant increase, although batching inverses can be a good countermeasure (batching the r inverse gave extra 10us for none gmp)
20:01 < elichai2> And thanks :)
22:17 -!- elichai2 [uid212594@gateway/web/irccloud.com/x-virceihzfqfvnmqe] has quit [Quit: Connection closed for inactivity]
--- Log closed Thu Aug 22 00:00:48 2019