--- Day changed Sat Aug 15 2015
07:49 <@andytoshi> with the lehmer stuff i linked, i have a 45% perf improvement on the const inversion :)
07:49 <@andytoshi> still far away from the 82% improvement that i see with gmp
07:50 <@sipa> wait, can you give actual timings?
07:51 <@andytoshi> for scalar_inverse_var: 13.6us constant, 2.6us gmp, 8.2us my code
07:51 <@sipa> ok, thanks
07:51 <@andytoshi> actually 7.8us my code, i found another optimization
07:51 <@sipa> and field inverse?
07:51 <@andytoshi> one sec
07:53 <@andytoshi> 6.23us constant, 3.03us gmp, 9.10us my code
07:53 <@andytoshi> weird, lemme check on that..
07:53 <@sipa> ha
07:53 <@sipa> the multiplication ladder used for field inverses is pretty efficient
07:54 <@sipa> due to the large number of 1s
07:55 <@andytoshi> that explains it
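
A rough sketch of sipa's point, in Python rather than the library's C. The field inverse is a^(p-2) mod p, and for the secp256k1 prime the exponent p-2 consists of a few long runs of 1 bits. A power whose exponent is a run of N ones, a^(2^N - 1), can be built from shorter runs by repeated squaring plus one multiply, so the whole exponentiation needs only a handful of multiplies on top of the squarings. The function names (sqr_mul, inverse_ladder) and the particular sequence of run lengths are illustrative choices, not taken from the branch discussed here.

```python
# secp256k1 field prime (public curve parameter).
P = 2**256 - 2**32 - 977


def sqr_mul(t, n, m, ops):
    """Return t^(2^n) * m mod P, tallying n squarings and 1 multiply in ops."""
    for _ in range(n):
        t = t * t % P
    ops["sqr"] += n
    ops["mul"] += 1
    return t * m % P


def inverse_ladder(a):
    """Compute a^(P-2) mod P with a fixed chain; returns (inverse, op counts).

    Hypothetical helper for illustration. xN below means a^(2^N - 1),
    i.e. the exponent is a run of N one-bits.
    """
    ops = {"sqr": 0, "mul": 0}
    a %= P
    x2 = sqr_mul(a, 1, a, ops)
    x3 = sqr_mul(x2, 1, a, ops)
    x6 = sqr_mul(x3, 3, x3, ops)
    x9 = sqr_mul(x6, 3, x3, ops)
    x11 = sqr_mul(x9, 2, x2, ops)
    x22 = sqr_mul(x11, 11, x11, ops)
    x44 = sqr_mul(x22, 22, x22, ops)
    x88 = sqr_mul(x44, 44, x44, ops)
    x176 = sqr_mul(x88, 88, x88, ops)
    x220 = sqr_mul(x176, 44, x44, ops)
    x223 = sqr_mul(x220, 3, x3, ops)
    # Low 33 bits of P-2: 0, then 22 ones, then 0000 1, then 0 11, then 0 1.
    t = sqr_mul(x223, 23, x22, ops)
    t = sqr_mul(t, 5, a, ops)
    t = sqr_mul(t, 3, x2, ops)
    t = sqr_mul(t, 2, a, ops)
    return t, ops


if __name__ == "__main__":
    inv, ops = inverse_ladder(12345)
    print("inverse correct:", inv * 12345 % P == 1)
    print("ladder ops:", ops)
    # Plain square-and-multiply does one multiply per 1 bit after the leading bit.
    print("plain square-and-multiply multiplies:", bin(P - 2).count("1") - 1)
```

With this chain the tally is 255 squarings and 15 multiplies, against 248 multiplies for plain left-to-right square-and-multiply over the same exponent. That gap is one way to read why the constant-time field inverse is hard to beat.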
07:56 <@andytoshi> i'll keep working, i'm sure there's low-hanging fruit still. (and there is some bug that causes it to infinite-loop about 25% of the times that i run the test binary, fixing that might also be a perf improvement)
08:29 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has joined #secp256k1
08:51 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has quit [Read error: Connection reset by peer]
08:51 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has joined #secp256k1
09:28 <@andytoshi> nope, found the bug, it was perf neutral
09:41 <@andytoshi> my current code is at https://github.com/apoelstra/secp256k1/tree/jacobi if you guys are curious
09:42 <@andytoshi> new stuff is in num_4x64_impl.h and num_native_impl.h; i'm using `bench_internal inverse` for benchmarks
09:49 <@sipa> i think the verify in native impl needs a subscript of num words rather than 4
09:50 <@andytoshi> yup, thx
09:52 <@andytoshi> and i never actually use that _verify function, initially i'd thought i wanted the top word to be 0 except inside of the div_mod algorithm, but it turned out i needed a couple extra bits of space in a bunch of places (including tests.c)
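
For context on where a div_mod step fits, here is a minimal sketch of a variable-time modular inverse using the plain extended Euclidean algorithm. This is not the Lehmer variant from the branch above; Lehmer-style algorithms speed up this same loop by working on leading words instead of full-width division. The function name inverse_var is hypothetical, and N is the secp256k1 group order, which is the modulus a scalar inverse would use.

```python
# secp256k1 group order (public curve parameter).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def inverse_var(a, m):
    """Variable-time inverse of a modulo m via extended Euclid.

    Illustrative only: running time depends on the input, so this is
    unsuitable wherever the value being inverted is secret.
    """
    r0, r1 = m, a % m
    s0, s1 = 0, 1  # invariant: r_i == s_i * a (mod m)
    while r1 != 0:
        q, rem = divmod(r0, r1)  # the div_mod step
        r0, r1 = r1, rem
        s0, s1 = s1, s0 - q * s1
    if r0 != 1:
        raise ValueError("a is not invertible modulo m")
    return s0 % m


if __name__ == "__main__":
    x = 0x1234567890ABCDEF
    print("inverse correct:", inverse_var(x, N) * x % N == 1)
```

The number of loop iterations varies with the input, which is why this family of algorithms is benchmarked as the "_var" (variable-time) inverse and compared against the constant-time exponentiation ladder.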
10:27 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has quit [Quit: Leaving.]
12:02 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has joined #secp256k1
12:51 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has quit [Read error: Connection reset by peer]
12:52 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has joined #secp256k1
13:59 -!- jtimon [~quassel@69.29.134.37.dynamic.jazztel.es] has quit [Ping timeout: 260 seconds]
15:44 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has quit [Quit: Leaving.]
16:26 <@andytoshi> i've pushed a new version. the numbers for my code are now 6.4us (scalar_inverse) and 6.7us (field_inverse). vs const-time ladder this is better than 50% speedup for scalar; almost caught up with field
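
The percentages can be checked against the timings quoted in this log. The sketch below treats "improvement" as the reduction in time per call relative to the constant-time baseline, which is an assumption about how the figures were computed; the helper name reduction_pct is hypothetical.

```python
def reduction_pct(baseline_us, new_us):
    """Percent reduction in time per call relative to the baseline."""
    return (baseline_us - new_us) / baseline_us * 100.0


# (label, constant-time baseline in us, compared timing in us), all from the log.
timings = [
    ("scalar, gmp", 13.6, 2.6),
    ("scalar, my code 07:51", 13.6, 7.8),
    ("scalar, my code 16:26", 13.6, 6.4),
    ("field, gmp", 6.23, 3.03),
    ("field, my code 07:53", 6.23, 9.10),
    ("field, my code 16:26", 6.23, 6.7),
]

if __name__ == "__main__":
    for label, base, new in timings:
        print(f"{label}: {reduction_pct(base, new):+.1f}%")
```

A negative result means the code is still slower than the constant-time version, which is the case for the 16:26 field timing (6.7us against 6.23us). The 45% and 82% figures quoted at 07:49 presumably came from a slightly different run, so they need not match these numbers exactly.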
16:29 <@andytoshi> i spent a while implementing this http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.5661&rep=rep1&type=pdf (sec 4.3) to optimize specifically for field inversions, but it was way slower for a variety of reasons, not all fixable
16:30 <@andytoshi> MPI and java bignum use that; gmp does not.
19:09 -!- jtimon [~quassel@69.29.134.37.dynamic.jazztel.es] has joined #secp256k1