--- Day changed Mon Nov 27 2017
00:16 -!- nickler [~nickler@185.12.46.130] has quit [Ping timeout: 255 seconds]     
00:17 -!- nickler [~nickler@185.12.46.130] has joined #secp256k1     
02:35 -!- jtimon [~quassel@164.31.134.37.dynamic.jazztel.es] has joined #secp256k1     
04:13 -!- nickler [~nickler@185.12.46.130] has quit [Ping timeout: 252 seconds]     
04:13 -!- nickler [~nickler@185.12.46.130] has joined #secp256k1     
06:12 -!- SopaXorzTaker [~SopaXorzT@unaffiliated/sopaxorztaker] has joined #secp256k1     
06:32 -!- SopaXorzTaker [~SopaXorzT@unaffiliated/sopaxorztaker] has quit [Remote host closed the connection]     
06:32 -!- SopaXorzTaker [~SopaXorzT@unaffiliated/sopaxorztaker] has joined #secp256k1     
09:47 -!- arubi [~ese168@gateway/tor-sasl/ese168] has quit [Remote host closed the connection]     
09:54 -!- arubi [~ese168@gateway/tor-sasl/ese168] has joined #secp256k1     
10:48 -!- arubi [~ese168@gateway/tor-sasl/ese168] has quit [Ping timeout: 248 seconds]     
10:53 < andytoshi> so, i finally finished the rangeproof verify (had to add the missing 5 exponentiations of equation (61) and randomize those).. today my laptop is back to giving 130us/bit for the old rangeproofs, so 8250us for 64-bits, 4130 for 32-bit
10:54 < andytoshi> meanwhile the 64-bit bulletproof verifies in 2400us, for a 3.4x speedup
10:54 -!- arubi [~ese168@gateway/tor-sasl/ese168] has joined #secp256k1     
10:54 < sipa> w00t
10:54 < andytoshi> yeppers
10:54 < andytoshi> and this is for one proof, when we aggregate we can expect better
10:55 < andytoshi> we'd be going from a 145-point exp to a (145+128)-point exp for an aggregate of two proofs
10:56 < andytoshi> specifically, using N/log(N) and assuming all the verify time is in the multiexp, it looks like we go up to 4.07x
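The 4.07x figure can be reproduced with a short worked estimate. This is a sketch under the assumptions stated in the message above: multiexp cost scales as N/log(N), all verification time is spent in the multiexp, and the point counts are the 145 and 145+128 = 273 quoted earlier. The old rangeproofs are assumed to cost exactly twice as much for two proofs. The base of the logarithm cancels in the ratio.

```latex
\[
\frac{T_{\mathrm{bp}}(2)}{T_{\mathrm{bp}}(1)}
  = \frac{273/\ln 273}{145/\ln 145}
  = \frac{273}{145}\cdot\frac{\ln 145}{\ln 273}
  \approx 1.883 \times 0.887
  \approx 1.670
\]
\[
\text{speedup}(2) \approx 3.4 \times \frac{2}{1.670} \approx 4.07
\]
```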
11:00 < andytoshi> https://github.com/ElementsProject/secp256k1-zkp/pull/16
11:02 < andytoshi> oh, lemme rebase to not have so many extraneous commits
11:04 < andytoshi> better
11:04 < gmaxwell> Fantastic!
11:05 < andytoshi> BTW regarding 48-bit or whatever rangeproofs, benedikt and i chatted by email about this and i'm fairly confident that the prover can just round up to the nearest power of 2, using the identity in place of the extra generators. then the verifier doesn't change at all, it just truncates its multiexp after 48 generators rather than doing the full 64
11:06 < andytoshi> i want to update the proof in the paper and PR this so that the stanford people can take a look at it. i think i know how but i haven't worked out all of the details at once yet
11:07 < sipa> so... another 4/3 / log(4/3) speedup?
11:07 < gmaxwell> you're still using the jonasless multi-exp, right?  those are big enough that it'll be faster though not enormously so.
11:08 < andytoshi> gmaxwell: correct
11:08 < sipa> eh, 4/3 / (log(64)/log(48))
11:08 < andytoshi> sipa: yeah. though it's a little artificial since we're actually using 32-bit proofs in practice right now
11:08 < sipa> i see
11:09 < andytoshi> so even though it's unfair it might make more sense to compare the 32-bit proofs to 64-bit bulletproofs, so only a ~1.75x speedup
11:09 < gmaxwell> yes, though the 32-bit proofs are too small. they were going to drive us to implement private exponent, which would be a further slowdown.
11:09 < andytoshi> good point
11:10 < gmaxwell> esp when the goal of the CT is CJ, you can't just leak the LSBs of your amounts. (vs the goal being commercial contracting amount privacy, where you can)
11:10 < andytoshi> very good point
11:20 < andytoshi> with the endomorphism on, the old rangeproofs are 91.5us/bit (so 5856us for 64-bits). bulletproofs actually take 2-3% longer (matching sipa's graph). though with endo on, i'm firmly into "should be using pippenger" territory
11:21 < andytoshi> the speedup drops to a paltry 2.37x because the old rangeproofs do so much better
11:22 < sipa> right, multiexp with larger point count removes the advantages of endo
11:25 < andytoshi> i will email you two, and benedikt and dan, with these numbers
11:25 < gmaxwell> perhaps patch in jonas' code and try again? :P
11:25 < andytoshi> oo yeah, good call, that might be trivial
11:26 < gmaxwell> endo is useful longer with the wnaf-pippenger, IIRC.
11:26 < andytoshi> also, do either of you have a machine you can benchmark on? i am just running `./bench_rangeproof` vs `./bench_bulletproof`, it's super easy to do. my laptop is not a good benchmark machine, it's doing lots at once plus the CPU clock speed is constantly changing
11:26 < gmaxwell> well, you can pin the cpu speed.
11:26 < andytoshi> yeah, sipa's graph showing endo being useful with wnaf-pippenger out to thousands of points
11:27 < andytoshi> i'm not sure, a few kernel upgrades ago cpufreq stopped working because intel something something..
11:27 < sipa> andytoshi: you need to boot with kernel option intel_pstate=no
11:27 < andytoshi> oh, neat, thanks
11:28 < sipa> sorry, intel_pstate=disable
11:28 < sipa> https://www.kernel.org/doc/html/v4.12/admin-guide/pm/intel_pstate.html lists more values for that option; perhaps some other ones don't interfere with benchmarking either
11:36 -!- hdevalence [~hdevalenc@199-188-193-243.PUBLIC.monkeybrains.net] has joined #secp256k1     
11:37 -!- hdevalence is now known as hdevalence_     
11:41 < andytoshi> bleh, in dc055200 (get rid of precomputed H tables) sipa changed the WNAF_SIZE macro to take two arguments. jonas moves the one that takes 1 from ecmult_const_impl to ecmult_impl because he reuses it in strauss
11:42 < andytoshi> ok, just renamed one of them for now
11:51 < andytoshi> ok, with pippenger we drop from 2460us to 2260us, an additional 9%. total speedup 3.78x
11:52 < andytoshi> lemme reboot with that pstate option though and benchmark without a gui running
12:13 < andytoshi> ok, done. numbers are roughly the same: without the endomorphism both strauss and pippenger give 2360us for a 64-bit bulletproof. the old code would've taken 8200us.
12:14 < andytoshi> with the endomorphism, strauss slows down to 2420us while pippenger speeds up to 2230us. old code speeds up to 5840us.
12:15 < andytoshi> having said this, during compilation, but not during the benchmarks, i would get kernel messages whining about the CPU overheating and it throttling cores. i'd walk away from the system for a minute or two each time this happened, but it still makes me lose confidence in these numbers
12:17 < andytoshi> anyway i'll email y'all and dan/benedikt with this data
12:18 < sipa> well artificially reduce your cpu clock
12:19 < sipa> which may not be entirely representative, as you get relatively speaking faster RAM that way
12:19 < sipa> but it at least gives a fair comparison
12:22 < andytoshi> ah, i see, i should be able to set frequencies if pstate is disabled
12:23 < sipa> indeed
12:47 < andytoshi> awesome, thanks! TIL
12:47 < andytoshi> so, with my cpu set to 800mhz i get similar results (though more consistent ones after multiple runs)
12:48 < andytoshi> with endo on, a 64-bit rangeproof takes 13760us, vs 5706us for bulletproof-strauss and 5230us for bulletproof-pip. so 2.36x speedup
12:48 < andytoshi> without endo, the rangeproof takes 19584us, bf-strauss 5341us, and bf-pip 5554us. so 3.67x speedup
13:16 < andytoshi> ok, email sent
13:18 < maaku> andytoshi: each aggregation adds 2*b point exp, where b is the number of bits? so 17 + 2*b*n?
13:18 < sipa> andytoshi: i suspect you wrote "without" twice in the performance line of your mail
13:19 < andytoshi> oops
13:19 < andytoshi> maaku: yep
13:19 < maaku> cool thanks
13:20 < andytoshi> 17 seems high, i should investigate that, i had expected 9
13:21 < andytoshi> oh, i see, it's not a fixed 17
13:21 < andytoshi> it's 2*b*n + 2*log2(b*n) + 9
13:21 < andytoshi> err, +5
13:22 < andytoshi> and i'm undercounting by 1, it should be +6 (and my earlier "145" should've been 146). it's a quirk of the multiexp API that it takes a separate scalar to multiply the generator by
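The corrected count can be checked with a few lines of C. This is a minimal sketch, not code from the PR: the function names are hypothetical, the leading term is taken as 2*b*n (128 per proof when b = 64), the constant is the final +6, and b*n is assumed to be a power of two so that the logarithm is exact.

```c
#include <stddef.h>

/* Floor of log2(x) for x >= 1, computed with shifts only. */
size_t floor_log2(size_t x) {
    size_t r = 0;
    while (x > 1) {
        x >>= 1;
        r++;
    }
    return r;
}

/* Number of points in the verifier's multiexp for n aggregated b-bit
 * proofs, assuming the count 2*b*n + 2*log2(b*n) + 6 and that b*n is
 * a power of two. Hypothetical helper for illustration only. */
size_t multiexp_points(size_t b, size_t n) {
    return 2 * b * n + 2 * floor_log2(b * n) + 6;
}
```

With b = 64 and n = 1 this gives 128 + 12 + 6 = 146, matching the corrected figure for a single 64-bit proof.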
15:09 < andytoshi> in C, left-shifting by too many bits is actually UB right? even if i don't use the value ever?
15:45 < sipa> unsure about that
15:45 < sipa> the specification has a concept of "indeterminate value", which is not actually UB to produce, but is UB if you use it
16:30 < gmaxwell> I am vaguely thinking it is implementation defined.
16:36 < hdevalence_> it is UB
16:37 < hdevalence_> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf#page=574
16:38 < sipa> yup, shifting negative values is implementation defined
16:38 < sipa> shifting beyond the range is undefined
16:49 < gmaxwell> sipa: C89 and C99 differ on the negative values front.
16:51 < sipa> oh?     
16:57 -!- andytoshi-web [ac3a8820@gateway/web/freenode/ip.172.58.136.32] has joined #secp256k1     
16:59 < andytoshi-web> :/ that's annoying. can't do `const size_t n = 1 << d` without risking UB, can't declare it after checking because of declaration/statement ordering rules, can't declare it then check then initialize because of constness rules
16:59 < andytoshi-web> sometimes C89 really irritates me
17:01 -!- hdevalence_ [~hdevalenc@199-188-193-243.PUBLIC.monkeybrains.net] has quit [Quit: hdevalence_]     
17:01 < sipa> what's the problem with checking first and returning before defining?
17:04 < andytoshi-web> ?? if i return then my function won't do anything
17:05 < andytoshi-web> if i do `const size_t n; /* ...check... */ n = 1 << d;` the compiler claims i am assigning to an immutable variable
17:06 < sipa> but you can check whether d is outside of the range
17:06 < sipa> maybe i misunderstand what your conditional would do if it's outside of the range
17:07 < andytoshi-web> just return
17:07 < sipa> if (d within range) {
17:08 < andytoshi-web> the check is not the problem, the problem is that I can never assign n. If i do it before the check it's UB. if i do it after it's mutating a const
17:08 < sipa>     const size_t n = 1 << d;
17:08 < sipa>     ...
17:08 < sipa> } else {
17:08 < sipa>     return;
17:08 < sipa> }     
17:08 < andytoshi-web> ohh yes, i can wrap my entire function in an if
17:09 < andytoshi-web> that's a very hard-to-read sanity check pattern
17:10 < sipa> otherwise, wrap the function
17:10 < sipa> make a version that takes as input n
17:10 < sipa> and another function which takes as input d, and either returns immediately or calls the n version
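The two-function split described above might look like the following in C89. This is only a sketch: the function names and the placeholder work done with n are hypothetical, and the range check assumes size_t has no padding bits. Note also that the literal in `1 << d` has type int, so the shift would be performed at int width; the cast to size_t makes the full width of size_t available.

```c
#include <limits.h>
#include <stddef.h>

/* Inner version: takes n directly. Contract: n must be a power of two
 * produced by the outer function. The body is placeholder work. */
static size_t sum_below_n(const size_t n) {
    size_t i;
    size_t total = 0;
    for (i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Outer version: takes d, returns 0 immediately if the shift would be
 * undefined, otherwise computes n = 2^d and calls the n version. */
int sum_below_pow2(size_t d, size_t *out) {
    if (d >= sizeof(size_t) * CHAR_BIT) {
        return 0;
    }
    *out = sum_below_n((size_t)1 << d);
    return 1;
}
```

Here n is const inside the inner function and is never computed before the check, at the cost of the split contract between the two functions.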
17:11 < andytoshi-web> that is still verbose and moves logic far apart and now the inner function has a contract that can be violated by any direct callers
17:16 < sipa> perhaps just don't make it const :)
17:20 < andytoshi-web> Yeah, that'd be the cleanest thing I think :)
17:22 < sipa> constness is a tool provided by the compiler that helps you avoid certain mistakes
17:22 < sipa> but if the tool gets in the way of writing readable code, don't use it
17:29 -!- andytoshi-web [ac3a8820@gateway/web/freenode/ip.172.58.136.32] has quit [Ping timeout: 260 seconds]     
17:36 -!- andytoshi-web [ac3a89f5@gateway/web/freenode/ip.172.58.137.245] has joined #secp256k1     
17:37 < andytoshi-web> well, const on local variables is a tool to aid readability ... but i guess if it gets in the way of compileability I shouldn't use it
17:39 < sipa> in C99 you could have size_t val_mutable; if (cond) { return } else { val_mutable = ...; }; const size_t val = val_mutable; ...
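Spelled out as a compilable function, the C99 pattern above could look like this. It is a hedged sketch: the function name and the use made of the value are hypothetical, and it relies on C99 allowing a declaration after statements, so it would not compile as strict C89.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 0 if 2^d is not representable in size_t; otherwise stores
 * the mask 2^d - 1 in *out and returns 1. */
int mask_for_depth(size_t d, size_t *out) {
    size_t n_mutable;
    if (d >= sizeof(size_t) * CHAR_BIT) {
        return 0;
    } else {
        n_mutable = (size_t)1 << d;
    }
    /* C99 only: a const declaration after the check, initialized from
     * the mutable temporary. */
    const size_t n = n_mutable;
    *out = n - 1;
    return 1;
}
```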
17:41 -!- nickler [~nickler@185.12.46.130] has quit [Ping timeout: 268 seconds]     
17:44 < andytoshi-web> yeah, that'd be nice
17:49 -!- nickler [~nickler@185.12.46.130] has joined #secp256k1     
17:53 -!- andytoshi-web [ac3a89f5@gateway/web/freenode/ip.172.58.137.245] has quit [Ping timeout: 260 seconds]     
18:21 -!- kallewoof [~karl@67.205.138.199] has quit [Ping timeout: 260 seconds]     
18:22 -!- kallewoof [~karl@67.205.138.199] has joined #secp256k1     
19:10 -!- jtimon [~quassel@164.31.134.37.dynamic.jazztel.es] has quit [Ping timeout: 268 seconds]