--- Day changed Thu Jun 29 2017
05:28 -!- arubi [~ese168@gateway/tor-sasl/ese168] has quit [Remote host closed the connection]
05:29 -!- arubi [~ese168@gateway/tor-sasl/ese168] has joined #secp256k1
06:16 < andytoshi> gmaxwell: if you can tell me what's wrong i'd prefer that
07:24 -!- arubi [~ese168@gateway/tor-sasl/ese168] has quit [Ping timeout: 248 seconds]
07:28 -!- arubi [~ese168@gateway/tor-sasl/ese168] has joined #secp256k1
11:54 -!- SopaXorzTaker [~SopaXorzT@unaffiliated/sopaxorztaker] has quit [Quit: Leaving]
13:54 < sipa> cfields: hi!
13:55 < cfields> heh, hi
13:55 < sipa> how much detail do you want?
13:56 < cfields> zero more detail :)
13:56 < cfields> I need to read and catch up first
13:56 < gmaxwell> I think you should step back and unbefuddle the different things we're talking about. :P
13:56 < gmaxwell> not give more detail.
13:57 < sipa> a Schnorr signature is verified by taking (R,s), doing some computation on s (together with the public key and message), and checking whether the result matches R
13:58 < cfields> yes
13:58 < sipa> if you have two Schnorr signatures to verify, you can just add up the left and right hand sides
13:59 < sipa> you need to multiply each by a random number first
13:59 < sipa> to avoid cancelling things out unintentionally, but apart from that, that's basically it
14:00 < cfields> ok, easy enough
14:01 < sipa> instead of checking R = s*G + P*H(m || P || R), you check h1*R1 + h2*R2 = h1*s1*G + h1*P1*H(m1 || P1 || R1) + h2*s2*G + h2*P2*H(m2 || P2 || R2)
14:02 < sipa> note that the result is really just one big multi-multiply-add
14:07 < gmaxwell> Maybe some overview?
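[editor's note] A minimal sketch of the random-linear-combination batch check sipa describes above. To keep it self-contained and checkable it uses a toy prime-order subgroup mod a small prime (written multiplicatively: g^s corresponds to s*G) rather than secp256k1; all parameters and function names here are illustrative, not libsecp256k1 API.

```python
import hashlib
import random

# Toy group (NOT secp256k1): p = 467 is prime, q = 233 is the prime order
# of the subgroup of Z_p^* generated by g = 4.
p, q, g = 467, 233, 4

def H(m, P, R):
    """e = H(m || P || R), reduced into the scalar field."""
    data = m + P.to_bytes(2, 'big') + R.to_bytes(2, 'big')
    return int.from_bytes(hashlib.sha256(data).digest(), 'big') % q

def sign(x, m, rng):
    """Schnorr sign with s = r - e*x, matching the log's R = s*G + e*P form."""
    r = rng.randrange(1, q)
    R = pow(g, r, p)
    P = pow(g, x, p)
    e = H(m, P, R)
    s = (r - e * x) % q
    return (R, s)

def verify(P, m, sig):
    """Single verification: R == g^s * P^e, i.e. R = s*G + e*P additively."""
    R, s = sig
    return R == pow(g, s, p) * pow(P, H(m, P, R), p) % p

def batch_verify(items, rng):
    """items: list of (P, m, (R, s)).  One combined equation:
       prod R_i^{a_i} == g^{sum a_i*s_i} * prod P_i^{a_i*e_i}  (mod p),
    with a fresh random multiplier a_i per signature to prevent terms
    from cancelling each other out."""
    lhs, s_acc, rhs = 1, 0, 1
    for P, m, (R, s) in items:
        a = rng.randrange(1, q)
        e = H(m, P, R)
        lhs = lhs * pow(R, a, p) % p
        s_acc = (s_acc + a * s) % q
        rhs = rhs * pow(P, a * e % q, p) % p
    return lhs == pow(g, s_acc, p) * rhs % p
```

As the log notes below, the batch check only says pass/fail for the whole set; it cannot point at which signature was invalid.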
14:07 < gmaxwell> ---
14:07 < gmaxwell> aggregate signature -- N keys, one signature, key holders interact, faster processing than N but slower than 1
14:07 < gmaxwell> batch signature -- N keys, N signatures, key holders do not interact, faster processing than N
14:07 < gmaxwell> batch aggregate -- the aggregates above can be batched, so you get non-interactive batching of multiple interactive batches
14:07 < gmaxwell> Block Wide agg. -- N keys, N messages, 1 signature. There are three kinds we know of:
14:07 < gmaxwell> == Interactive, this is just the first thing but users in the block all interact -- not realistic
14:07 < gmaxwell> == Non-interactive pairing crypto, also called OWAS. Needs pairing, which is an 8x slowdown, and new crypto assumptions
14:07 < gmaxwell> == Mimble Wimble style aggregation, no pairing but needs 32 bytes of additional data per original signature.
14:07 < gmaxwell> The blockwide agg stuffs all have negative interactions with caching -- because you can't validate the blockwide aggregate
14:07 < gmaxwell> without validating the whole thing. Though for OWAS part of the computation can be cached (though it requires about 400 bytes of memory
14:07 < gmaxwell> per transaction).
14:07 < gmaxwell> Batch/Agg don't really have any negative interaction with caching. Just don't batch things you've already checked.
14:07 < gmaxwell> The MW style blockwide aggregate only saves 32 bytes per signature and requires some gnarly layer violations, so it hasn't
14:07 < gmaxwell> been that exciting to me.
14:07 < gmaxwell> ----
14:11 < cfields> sipa: ok. took a bit to parse, but I'm following. just trivial factoring.
14:12 < sipa> cfields: now, if one of the signatures is invalid, that equation will fail
14:12 < sipa> cfields: it won't tell you which of the signatures was invalid
14:12 < cfields> right
14:14 < cfields> gmaxwell: yes, you're right. I was befuddled wrt terminology.
14:14 < cfields> thanks for that.
14:16 < sipa> so ultimately, everything here leads to one big equation with one term for each signature, and one term for each public key
14:17 < sipa> for example a batch of 20 aggregate signatures each composed of 5 signatures
14:17 < sipa> means you have 100 pubkeys total, and 20 signatures total
14:17 < sipa> so 120 terms in the equation
14:19 < cfields> makes sense
14:22 < sipa> now, if you want to compute a*A + b*B, you can rewrite that as (a-b)*A + b*(A+B)
14:22 < sipa> agree?
14:28 < cfields> yes
14:28 < sipa> so, if you have multiply-adds a*A + b*B + c*C + ...
14:28 < sipa> sort the a,b,c,...
14:28 < sipa> take the largest two, and apply the above transformation
14:29 < sipa> now you have strictly smaller integers to work with
14:29 < sipa> and repeat
14:29 < sipa> with very high probability, you end up with all 0s and a single 1
14:30 < cfields> hmm?
14:33 < sipa> say you start with 7*A + 5*B + 4*C
14:34 < sipa> in one step you get (7-5)*A + 5*(A+B) + 4*C
14:34 < sipa> or 2*A + 5*(A+B) + 4*C
14:34 < sipa> next step, 5 and 4 are largest
14:35 < cfields> ok i see
14:35 < sipa> so you get 2*A + (5-4)*(A+B) + 4*(C+A+B) = 2*A + 1*(A+B) + 4*(A+B+C)
14:35 < sipa> and so on
14:36 < sipa> in almost every step, you reduce a number by 1 bit
14:37 < cfields> right
14:47 < cfields> ok, I see the advantage. Very simple in those terms. Thanks for spelling it out.
17:36 < gmaxwell> well it's surprising how well it works.
17:56 < gmaxwell> andytoshi: so, counting doubling as 0.5 adds, our current ecmult approach should be faster than bos-coster up to size 15.
17:59 < gmaxwell> I'm ignoring the effective affine improvement too, so the cutoff may even be higher.
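[editor's note] The reduction sipa walks through here is the heart of the Bos-Coster multi-scalar multiplication algorithm. A small self-contained sketch, using plain integers as "points" with + as the group operation purely so the result is easy to check (a real implementation would operate on curve points):

```python
import heapq

def bos_coster(terms):
    """Compute sum(a_i * A_i) by repeatedly rewriting the two largest
    terms a*A + b*B (a >= b) as (a-b)*A + b*(A+B), until one term is
    left.  terms: list of (scalar, point) pairs, with integer 'points'.
    Returns (result, adds) where adds counts the point additions A+B."""
    # max-heap on the scalars, via negation
    heap = [(-a, A) for a, A in terms if a > 0]
    heapq.heapify(heap)
    adds = 0
    while len(heap) > 1:
        a, A = heapq.heappop(heap)   # largest scalar
        b, B = heapq.heappop(heap)   # second largest
        a, b = -a, -b                # a >= b by heap order
        heapq.heappush(heap, (-b, A + B))  # one point addition
        adds += 1
        if a - b > 0:
            heapq.heappush(heap, (-(a - b), A))
    if not heap:
        return 0, adds
    a, A = heap[0]
    # With high probability the leftover scalar is 1; otherwise a real
    # implementation would finish the last term with double-and-add.
    return -a * A, adds
```

On sipa's 7*A + 5*B + 4*C example this performs six point additions and returns the full sum; the point is that every addition folds one term's work into another instead of running an independent ladder per scalar.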
18:01 < gmaxwell> https://0bin.net/paste/h1TOB0Kd-2zq4doC#bqy1Me4xoRdqTX5FttuNZdyVfIWnXpygyoIeIIEG6pg is my stupid program that tries 100 bos coster runs (and has hardcoded numbers from measuring our current ecmult code in the verify program)
18:01 < gmaxwell> (no endomorphism)
18:02 < andytoshi> by "our current ecmult approach" you mean doing wnaf on every scalar?
18:03 < gmaxwell> andytoshi: yes, doing the WNAF Strauss.
18:04 < gmaxwell> to do 15 keys that way would be about 10kb of stack unless I'm mistaken.
18:09 < gmaxwell> oh, might be off by one (in a way that favors bos), checking
18:11 < gmaxwell> oh yea, I am.
18:13 < gmaxwell> uh, yea, I think bos-coster never wins. o_0
18:13 < gmaxwell> (well, not up to size 64)
18:15 < gmaxwell> oh actually. measuring is hard.
18:16 < gmaxwell> yea, forgot we use different G and A window sizes. fixing.
18:21 < gmaxwell> So with correct numbers, without endo bos-coster wins at a size of 59; that huge G table has a non-trivial impact. With endo bos-coster wins at 32.
18:21 < andytoshi> surprising. why would sigagg verify be so much faster than regular verify then
18:22 < gmaxwell> (assuming our multi-exp takes the form of aP1 + bP2 ... + zG)
18:22 < gmaxwell> because Strauss' algorithm is also much faster with a bigger multiexp.
18:22 < sipa> sigagg gets down to 30ish additions per element
18:22 < andytoshi> ah kk
18:25 < gmaxwell> here is the number of adds we do in the current code as a function of cnt (number of keys in)
18:25 < gmaxwell> (42.942+7.5)*(cnt-1)+15.494+(256/2.) #no-endo
18:26 < gmaxwell> (43.344+7.5)*(cnt-1)+16.936+128/2. #endo
18:26 < gmaxwell> the cnt-1 is because G is specially handled with a much bigger wnaf table.
18:27 < gmaxwell> (those numbers are just from instrumenting and measuring the additions in 500 verifies)
18:28 < sipa> that 42.942 number should be 256/6 = 42.6 on average
18:28 < sipa> likewise for the 43.344, actually
18:28 < gmaxwell> I agree on the first, but not the second.
18:28 < gmaxwell> I suspect there is a +1 in your wnaf math that is getting doubled.
18:28 < gmaxwell> I'll run more of them.
18:30 < gmaxwell> 43.2242 16.9992 < for 200k w/ endo.
18:33 < sipa> full jacobian add: 3, affine add: 2.1, double: 1.3
18:34 < gmaxwell> 42.9344 15.4929 < is for 200k runs w/o endo fwiw.
18:34 < gmaxwell> okay, I'll plug in those multipliers for the different add types
18:45 < gmaxwell> so without endo, using those numbers for effective affine, bos-coster is still 37% faster at a batch size of 4096.
18:46 < gmaxwell> (bos starts winning somewhere between 256 and 512)
18:49 < gmaxwell> w/ endo the breakeven is between 128 and 256.
18:53 < andytoshi> the stack space for the precomp starts getting onerous quite a bit before that
18:53 < andytoshi> should we just use the heap for aggsig?
18:54 < gmaxwell> andytoshi: sure, but to do 4096 with bos-coster we'll need a scratch space in any case. So we could have a scratch space context and use it for both.
19:09 < gmaxwell> w/ endo and batch of 4096 pubkeys, bos is a 48% speedup.