--- Day changed Sat Aug 15 2015
07:49 <@andytoshi> with the lehmer stuff i linked, i have a 45% perf improvement on the const inversion :)
07:49 <@andytoshi> still far away from the 82% improvement that i see with gmp
07:50 <@sipa> wait, can you give actual timings?
07:51 <@andytoshi> far scalar_inverse_var: 13.6us constant, 2.6us gmp, 8.2us my code
07:51 <@sipa> ok, thanks
07:51 <@andytoshi> actually 7.8us my code, i found another optimization
07:51 <@sipa> and field inverse?
07:51 <@andytoshi> one sec
07:53 <@andytoshi> const 6.23us gmp 3.03us my code 9.10us
07:53 <@andytoshi> weeird, lemme check on that..
07:53 <@sipa> ha
07:53 <@sipa> the multiplication ladder used for field inverses is pretty efficient
07:54 <@sipa> due to the large number of 1s
07:55 <@andytoshi> that explains it
07:56 <@andytoshi> i'll keep working, i'm sure there's low-hanging fruit still. (and there is some bug that causes it to infinite-loop about 25% of the times that i run the test binary, fixing that might also be a perf improvement)
08:29 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has joined #secp256k1
08:51 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has quit [Read error: Connection reset by peer]
08:51 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has joined #secp256k1
09:28 <@andytoshi> nope, found the bug, it was perf neutral
09:41 <@andytoshi> my current code is at https://github.com/apoelstra/secp256k1/tree/jacobi if you guys are curious
09:42 <@andytoshi> new stuff is in num_4x64_impl.h and num_native_impl.h; i'm using `bench_internal inverse` for benchmarks
09:49 <@sipa> i think the verify in native impl needs a subscript of num words rather than 4
09:50 <@andytoshi> yup, thx
09:52 <@andytoshi> and i never actually use that _verify function, initially i'd thought i wanted the top word to be 0 except inside of the div_mod algorithm, but it turned out i needed a couple extra bits of space in a bunch of places (including tests.c)
10:27 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has quit [Quit: Leaving.]
12:02 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has joined #secp256k1
12:51 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has quit [Read error: Connection reset by peer]
12:52 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has joined #secp256k1
13:59 -!- jtimon [~quassel@69.29.134.37.dynamic.jazztel.es] has quit [Ping timeout: 260 seconds]
15:44 -!- GAit [~GAit@2-230-161-158.ip202.fastwebnet.it] has quit [Quit: Leaving.]
16:26 <@andytoshi> i've pushed a new version. the numbers for my code are now 6.4us (scalar_inverse) and 6.7us (field_inverse). vs const-time ladder this is better than 50% speedup for scalar; almost caught up with field
16:29 <@andytoshi> i spent a while implementing this http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.5661&rep=rep1&type=pdf (sec 4.3) to optimize specifically for field inversions, but it was waay slower for a variety of reasons, not all fixable
16:30 <@andytoshi> MPI and java bignum use that; gmp does not.
19:09 -!- jtimon [~quassel@69.29.134.37.dynamic.jazztel.es] has joined #secp256k1
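
[editor's note] The two inversion strategies compared in this log can be sketched in a few lines of Python. This is a rough illustration only, not andytoshi's or libsecp256k1's code: the names inverse_ladder and inverse_euclid are hypothetical, Python's built-in pow stands in for the constant-time multiplication ladder, and a plain extended Euclid loop stands in for the Lehmer-accelerated variant. P and N are assumed to be the standard secp256k1 field prime and group order. The bit counts at the end relate to sipa's remark about "the large number of 1s" in the field exponent.

```python
# Standard secp256k1 parameters (assumed here; not quoted in the log).
P = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F  # field prime
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order


def inverse_ladder(a, m):
    """Inverse via Fermat's little theorem: a^(m-2) mod m, for prime m.

    A fixed exponent means a fixed sequence of squarings and
    multiplications, which is what a constant-time ladder relies on.
    """
    return pow(a, m - 2, m)


def inverse_euclid(a, m):
    """Inverse via the extended Euclidean algorithm (variable time).

    Lehmer's method speeds up this same loop by working on the leading
    words of the operands; that optimization is not shown here.
    """
    r0, r1 = m, a % m
    t0, t1 = 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if r0 != 1:
        raise ValueError("a is not invertible modulo m")
    return t0 % m


def ones(x):
    """Number of 1 bits in x."""
    return bin(x).count("1")


if __name__ == "__main__":
    a = 0x1234567890ABCDEF
    for name, m in (("field", P), ("scalar", N)):
        assert inverse_ladder(a, m) == inverse_euclid(a, m)
        print(name, "exponent has", ones(m - 2), "one bits out of", (m - 2).bit_length())
```

The field exponent P - 2 is almost entirely 1 bits, so a hand-written addition chain can reuse long runs of the form x^(2^k - 1); the scalar exponent N - 2 has an irregular lower half, which is one plausible reason the ladder is comparatively slower there.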