--- Day changed Thu Jun 29 2017
05:28 -!- arubi [~ese168@gateway/tor-sasl/ese168] has quit [Remote host closed the connection]
05:29 -!- arubi [~ese168@gateway/tor-sasl/ese168] has joined #secp256k1
06:16 < andytoshi> gmaxwell: if you can tell me what's wrong i'd prefer that
07:24 -!- arubi [~ese168@gateway/tor-sasl/ese168] has quit [Ping timeout: 248 seconds]
07:28 -!- arubi [~ese168@gateway/tor-sasl/ese168] has joined #secp256k1
11:54 -!- SopaXorzTaker [~SopaXorzT@unaffiliated/sopaxorztaker] has quit [Quit: Leaving]
13:54 < sipa> cfields: hi!
13:55 < cfields> heh, hi
13:55 < sipa> how much detail do you want?
13:56 < cfields> zero more detail :)
13:56 < cfields> I need to read and catch up first
13:56 < gmaxwell> I think you should step back and unbefuddle the different things we're talking about. :P
13:56 < gmaxwell> not give more detail.
13:57 < sipa> a Schnorr signature is verified by taking (R,s), doing some computation on s (together with the public key and message), and checking whether the result matches R
13:58 < cfields> yes
13:58 < sipa> if you have two Schnorr signatures to verify, you can just add up the left and right hand sides
13:59 < sipa> you need to multiply each by a random number first
13:59 < sipa> to avoid cancelling things out unintentionally, but apart from that, that's basically it
14:00 < cfields> ok, easy enough
14:01 < sipa> instead of checking R = s*G + P*H(m || P || R), you check h1*R1 + h2*R2 = h1*s1*G + h1*P1*H(m1 || P1 || R1) + h2*s2*G + h2*P2*H(m2 || P2 || R2)
14:02 < sipa> note that the result is really just one big multi-multiply-add
14:07 < gmaxwell> Maybe some overview?
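[editor's note] A minimal sketch of the random-linear-combination batch check sipa describes above. To keep it self-contained and checkable it uses a toy prime-order subgroup mod a small prime (written multiplicatively: g^s corresponds to s*G) rather than secp256k1; all parameters and function names here are illustrative, not libsecp256k1 API.

```python
import hashlib
import random

# Toy group (NOT secp256k1): p = 467 is prime, q = 233 is the prime order
# of the subgroup of Z_p^* generated by g = 4.
p, q, g = 467, 233, 4

def H(m, P, R):
    """e = H(m || P || R), reduced into the scalar field."""
    data = m + P.to_bytes(2, 'big') + R.to_bytes(2, 'big')
    return int.from_bytes(hashlib.sha256(data).digest(), 'big') % q

def sign(x, m, rng):
    """Schnorr sign with s = r - e*x, matching the log's R = s*G + e*P form."""
    r = rng.randrange(1, q)
    R = pow(g, r, p)
    P = pow(g, x, p)
    e = H(m, P, R)
    s = (r - e * x) % q
    return (R, s)

def verify(P, m, sig):
    """Single verification: R == g^s * P^e, i.e. R = s*G + e*P additively."""
    R, s = sig
    return R == pow(g, s, p) * pow(P, H(m, P, R), p) % p

def batch_verify(items, rng):
    """items: list of (P, m, (R, s)).  One combined equation:
       prod R_i^{a_i} == g^{sum a_i*s_i} * prod P_i^{a_i*e_i}  (mod p),
    with a fresh random multiplier a_i per signature to prevent terms
    from cancelling each other out."""
    lhs, s_acc, rhs = 1, 0, 1
    for P, m, (R, s) in items:
        a = rng.randrange(1, q)
        e = H(m, P, R)
        lhs = lhs * pow(R, a, p) % p
        s_acc = (s_acc + a * s) % q
        rhs = rhs * pow(P, a * e % q, p) % p
    return lhs == pow(g, s_acc, p) * rhs % p
```

As the log notes below, the batch check only says pass/fail for the whole set; it cannot point at which signature was invalid.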
14:07 < gmaxwell> ---
14:07 < gmaxwell> aggregate signature -- N keys, one signature, key holders interact, faster processing than N but slower than 1
14:07 < gmaxwell> batch signature -- N keys, N signatures, key holders do not interact, faster processing than N
14:07 < gmaxwell> batch aggregate -- the aggregates above can be batched, so you get non-interactive batching of multiple interactive batches
14:07 < gmaxwell> Block Wide agg. -- N keys, N messages, 1 signature. There are three kinds we know of:
14:07 < gmaxwell> == Interactive, this is just the first thing but users in the block all interact -- not realistic
14:07 < gmaxwell> == Non-interactive pairing crypto, also called OWAS. Needs pairing, which is an 8x slowdown, and new crypto assumptions
14:07 < gmaxwell> == Mimble Wimble style aggregation, no pairing but needs 32 bytes of additional data per original signature.
14:07 < gmaxwell> The blockwide agg stuffs all have negative interactions with caching -- because you can't validate the blockwide aggregate
14:07 < gmaxwell> without validating the whole thing. Though for OWAS part of the computation can be cached (though it requires about 400 bytes of memory
14:07 < gmaxwell> per transaction).
14:07 < gmaxwell> Batch/Agg don't really have any negative interaction with caching. Just don't batch things you've already checked.
14:07 < gmaxwell> The MW style blockwide aggregate only saves 32 bytes per signature and requires some gnarly layer violations, so it hasn't
14:07 < gmaxwell> been that exciting to me.
14:07 < gmaxwell> ----
14:11 < cfields> sipa: ok. took a bit to parse, but I'm following. just trivial factoring.
14:12 < sipa> cfields: now, if one of the signatures is invalid, that equation will fail
14:12 < sipa> cfields: it won't tell you which of the signatures was invalid
14:12 < cfields> right
14:14 < cfields> gmaxwell: yes, you're right. I was befuddled wrt terminology.
14:14 < cfields> thanks for that.
14:16 < sipa> so ultimately, everything here leads to one big equation with one term for each signature, and one term for each public key
14:17 < sipa> for example a batch of 20 aggregate signatures each composed of 5 signatures
14:17 < sipa> means you have 100 pubkeys total, and 20 signatures total
14:17 < sipa> so 120 terms in the equation
14:19 < cfields> makes sense
14:22 < sipa> now, if you want to compute a*A + b*B, you can rewrite that as (a-b)*A + b*(A+B)
14:22 < sipa> agree?
14:28 < cfields> yes
14:28 < sipa> so, if you have multiply-adds a*A + b*B + c*C + ...
14:28 < sipa> sort the a,b,c,...
14:28 < sipa> take the largest two, and apply the above transformation
14:29 < sipa> now you have strictly smaller integers to work with
14:29 < sipa> and repeat
14:29 < sipa> with very high probability, you end up with all 0s and a single 1
14:30 < cfields> hmm?
14:33 < sipa> say you start with 7*A + 5*B + 4*C
14:34 < sipa> in one step you get (7-5)*A + 5*(A+B) + 4*C
14:34 < sipa> or 2*A + 5*(A+B) + 4*C
14:34 < sipa> next step, 5 and 4 are largest
14:35 < cfields> ok i see
14:35 < sipa> so you get 2*A + (5-4)*(A+B) + 4*(C+A+B) = 2*A + 1*(A+B) + 4*(A+B+C)
14:35 < sipa> and so on
14:36 < sipa> in almost every step, you reduce a number by 1 bit
14:37 < cfields> right
14:47 < cfields> ok, I see the advantage. Very simple in those terms. Thanks for spelling it out.
17:36 < gmaxwell> well it's surprising how well it works.
17:56 < gmaxwell> andytoshi: so, counting doubling as 0.5 adds, our current ecmult approach should be faster than bos-coster up to size 15.
17:59 < gmaxwell> I'm ignoring the effective affine improvement too, so the cutoff may even be higher.
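[editor's note] The reduction sipa walks through here is the heart of the Bos-Coster multi-scalar multiplication algorithm. A small self-contained sketch, using plain integers as "points" with + as the group operation purely so the result is easy to check (a real implementation would operate on curve points):

```python
import heapq

def bos_coster(terms):
    """Compute sum(a_i * A_i) by repeatedly rewriting the two largest
    terms a*A + b*B (a >= b) as (a-b)*A + b*(A+B), until one term is
    left.  terms: list of (scalar, point) pairs, with integer 'points'.
    Returns (result, adds) where adds counts the point additions A+B."""
    # max-heap on the scalars, via negation
    heap = [(-a, A) for a, A in terms if a > 0]
    heapq.heapify(heap)
    adds = 0
    while len(heap) > 1:
        a, A = heapq.heappop(heap)   # largest scalar
        b, B = heapq.heappop(heap)   # second largest
        a, b = -a, -b                # a >= b by heap order
        heapq.heappush(heap, (-b, A + B))  # one point addition
        adds += 1
        if a - b > 0:
            heapq.heappush(heap, (-(a - b), A))
    if not heap:
        return 0, adds
    a, A = heap[0]
    # With high probability the leftover scalar is 1; otherwise a real
    # implementation would finish the last term with double-and-add.
    return -a * A, adds
```

On sipa's 7*A + 5*B + 4*C example this performs six point additions and returns the full sum; the point is that every addition folds one term's work into another instead of running an independent ladder per scalar.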
18:01 < gmaxwell> https://0bin.net/paste/h1TOB0Kd-2zq4doC#bqy1Me4xoRdqTX5FttuNZdyVfIWnXpygyoIeIIEG6pg is my stupid program that tries 100 bos coster runs (and has hardcoded numbers from measuring our current ecmult code in the verify program)
18:01 < gmaxwell> (no endomorphism)
18:02 < andytoshi> by "our current ecmult approach" you mean doing wnaf on every scalar?
18:03 < gmaxwell> andytoshi: yes, doing the WNAF Strauss.
18:04 < gmaxwell> to do 15 keys that way would be about 10kb of stack unless I'm mistaken.
18:09 < gmaxwell> oh, might be off by one (in a way that favors bos), checking
18:11 < gmaxwell> oh yea, I am.
18:13 < gmaxwell> uh, yea, I think bos-coster never wins. o_0
18:13 < gmaxwell> (well, not up to size 64)
18:15 < gmaxwell> oh actually. measuring is hard.
18:16 < gmaxwell> yea, forgot we use different G and A window sizes. fixing.
18:21 < gmaxwell> So with correct numbers, without endo bos-coster wins at a size of 59; that huge G table has a non-trivial impact. With endo bos-coster wins at 32.
18:21 < andytoshi> surprising. why would sigagg verify be so much faster than regular verify then
18:22 < gmaxwell> (assuming our multi-exp takes the form of aP1 + bP2 ... + zG)
18:22 < gmaxwell> because Strauss' algorithm is also much faster with a bigger multiexp.
18:22 < sipa> sigagg gets down to 30ish additions per element
18:22 < andytoshi> ah kk
18:25 < gmaxwell> here is the number of adds we do in the current code as a function of cnt (number of keys in)
18:25 < gmaxwell> (42.942+7.5)*(cnt-1)+15.494+(256/2.) #no-endo
18:26 < gmaxwell> (43.344+7.5)*(cnt-1)+16.936+128/2. #endo
18:26 < gmaxwell> the cnt-1 is because G is specially handled with a much bigger wnaf table.
18:27 < gmaxwell> (those numbers are just from instrumenting and measuring the additions in 500 verifies)
18:28 < sipa> that 42.942 number should be 256/6 = 42.6 on average
18:28 < sipa> likewise for the 43.344, actually
18:28 < gmaxwell> I agree on the first, but not the second.
18:28 < gmaxwell> I suspect there is a +1 in your wnaf math that is getting doubled.
18:28 < gmaxwell> I'll run more of them.
18:30 < gmaxwell> 43.2242 16.9992 < for 200k w/ endo.
18:33 < sipa> full jacobian add: 3, affine add: 2.1, double: 1.3
18:34 < gmaxwell> 42.9344 15.4929 < is for 200k runs w/o endo fwiw.
18:34 < gmaxwell> okay, I'll plug in those multipliers for the different add types
18:45 < gmaxwell> so without endo, using those numbers for effective affine, bos-coster is still 37% faster at a batch size of 4096.
18:46 < gmaxwell> (bos starts winning somewhere between 256 and 512)
18:49 < gmaxwell> w/ endo the breakeven is between 128 and 256.
18:53 < andytoshi> the stack space for the precomp starts getting onerous quite a bit before that
18:53 < andytoshi> should we just use the heap for aggsig?
18:54 < gmaxwell> andytoshi: sure, but to do 4096 with bos-coster we'll need a scratch space in any case. So we could have a scratch space context and use it for both.
19:09 < gmaxwell> w/ endo and batch of 4096 pubkeys, bos is a 48% speedup.