--- Day changed Tue Sep 15 2015
01:48 -!- andytoshi [~andytoshi@unaffiliated/andytoshi] has quit [Ping timeout: 250 seconds]
--- Log opened Thu Sep 24 12:57:41 2015
12:57 -!- kanzure [~kanzure@unaffiliated/kanzure] has joined #secp256k1
12:57 -!- Irssi: #secp256k1: Total of 19 nicks [2 ops, 0 halfops, 0 voices, 17 normal]
12:57 -!- Irssi: Join to #secp256k1 was synced in 1 secs
12:57 -!- Guest24663 [~jorn@g227014.upc-g.chello.nl] has joined #secp256k1
12:58 -!- Guest24663 [~jorn@g227014.upc-g.chello.nl] has left #secp256k1 ["Konversation terminated!"]
12:58 -!- CodeShark [~CodeShark@cpe-76-167-237-202.san.res.rr.com] has joined #secp256k1
13:00 <@gmaxwell> I guess that brings us back to which DER subset it handles. Right now it's broad enough that it will not roundtrip.
13:02 < andytoshi> current consensus rules are too broad to roundtrip...if we can change consensus rules i think we should use the compact encoding
13:02 < andytoshi> (i think)
13:03 <@gmaxwell> andytoshi: In Bitcoin the BIP66 rules will round-trip.
13:03 <@sipa> unless the signature is {}
13:03 <@sipa> which is allowed by BIP66, but not a valid signature
13:05 <@gmaxwell> The context for my question is that I'm using a fuzz tester to generate parser test cases.  A round tripping test sounded good, but I knew it wouldn't work. And indeed it doesn't work. I'll probably move over the bip66 test from my older harness and test roundtripping only in that case.
13:05 < andytoshi> i think it'd be very useful if we could roundtrip
13:05 < andytoshi> except for the one bit in s, ECDSA is a strong signature. i'd hate to lose that property for the sake of encoding
13:05 <@sipa> gmaxwell: so you think we should document exactly which DER subset is supported?
13:06 <@gmaxwell> andytoshi: we do roundtrip if restricted to the BIP66 subset (minus the empty signature, as sipa notes).  Or at least we damn well should. I'll have results later. :)
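One reason a broad parser cannot round-trip is integer padding: a lenient parser can accept a DER INTEGER with redundant leading zero bytes, while a serializer emits only the minimal form, so the output bytes differ from the input. The following is a minimal sketch of that single rule, not the library's parser. Both helper names are hypothetical and not part of libsecp256k1, and the second helper assumes the lenient parser treats the bytes as an unsigned magnitude.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the len bytes at p are the minimal
 * big-endian encoding of a non-negative DER INTEGER, 0 otherwise. */
static int der_uint_is_minimal(const unsigned char *p, size_t len) {
    assert(p != NULL);
    if (len == 0) return 0;    /* an empty INTEGER is not valid DER */
    if (p[0] & 0x80) return 0; /* top bit set: would read as negative */
    /* A leading 0x00 is only allowed when the next byte has its top bit set. */
    if (len > 1 && p[0] == 0x00 && !(p[1] & 0x80)) return 0;
    return 1;
}

/* Hypothetical helper: length of the minimal re-encoding of the same value,
 * treating the input bytes as an unsigned magnitude (assumption). */
static size_t der_uint_minimal_len(const unsigned char *p, size_t len) {
    size_t i = 0;
    assert(p != NULL);
    if (len == 0) return 1; /* zero is encoded as a single 0x00 byte */
    /* Skip leading zero bytes, but always keep at least one byte. */
    while (i + 1 < len && p[i] == 0x00) i++;
    /* A first byte with the top bit set needs one 0x00 pad byte. */
    return (len - i) + ((p[i] & 0x80) ? 1u : 0u);
}
```

An input whose minimal length differs from its actual length is one that a parse-then-serialize round trip would change, which is why round-trip testing is only meaningful on a restricted subset.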
13:07 <@gmaxwell> sipa: In terms of safe interfaces our current interface encourages people to reproduce Bitcoin's mistakes.
13:07 < andytoshi> ok, so i'm suggesting that be the only subset we accept
13:07 <@sipa> gmaxwell: agree
13:07 <@gmaxwell> andytoshi: so we have a compatibility problem because in bitcoin we need to check the historical chain.
13:07 < andytoshi> ahh gross
13:08 <@gmaxwell> We could have separate parse functions, e.g. a normal one and a _lax.
13:08 <@sipa> gmaxwell: that is fine by me
13:09 < andytoshi> ditto
13:10 < andytoshi> that makes me a bit more comfortable since _lax wouldn't be "consensus code" except in the sense that it'd have to validate the historical chain, which is an easily determinable property
13:10 < andytoshi> s/a bit//
13:12 <@gmaxwell> We could even make the availability of lax a module. __ducks__
13:14 < andytoshi> hehe, actually i would like that
13:15 < andytoshi> it really ought to be bitcoin-specific, any new blockchains shouldn't need it
13:15 <@sipa> we can of course also switch to actually pure strict DER in libsecp256k1
13:38 -!- CoinMuncher [~jannes@178.132.211.90] has joined #secp256k1
14:10 -!- CoinMuncher [~jannes@178.132.211.90] has quit [Quit: Leaving.]
14:27 -!- zmanian [uid113594@gateway/web/irccloud.com/x-rdnzemvwameoygoy] has joined #secp256k1
14:31 <@gmaxwell> I have to say, the parsing split has made testing a lot easier.
14:41 <@gmaxwell> FWIW, currently generating parser test cases using AFL and https://people.xiph.org/~greg/parse_harness.patch ... opinions on putting tools like this (without build system integration?) in a verify directory?
14:57 <@sipa> sounds good
14:58 <@sipa> also, making it compile as part of the test system may be useful
14:58 <@sipa> so its code does not go uncompilable
14:59 <@gmaxwell> K. I guess I could add ifdefs for module support.
14:59 -!- btcdrak [uid115429@gateway/web/irccloud.com/x-mmesabtwospzgqsy] has joined #secp256k1
15:01 <@gmaxwell> hm. we don't really have a way to test if a pubkey or a signature object is invalid.
15:01 <@gmaxwell> I mean externally to the library.
15:02 <@sipa> of course not... they always are
15:02 <@sipa> (valid)
15:03 <@sipa> ah, no
15:03 <@gmaxwell> e.g. if parsing fails.
15:03 <@sipa> when pubkey_parse fails, it creates an explicitly invalid object
15:03 <@sipa> which is guaranteed to not result in undefined behaviour if used
15:03 <@sipa> yes, we should add that
15:04 <@gmaxwell> I want to test that in the api tests, but it seemed to me like something that should be public. ... ugh, more API surface area. :(
15:14 <@gmaxwell> sipa: we have some defines like DETERMINISTIC and VERIFY which should be namespaced.
15:14 <@sipa> ugh, yes
15:20 <@gmaxwell> speaking of ugh. so most of the gap remaining for MISRA conformance is small, one thing that is a little obnoxious is in the C spec (e.g. C99) it is only required that the first 31 characters of external identifiers are significant (in C90 this was actually only the first 6 which is nuts).  We violate the 31 rule, in enough places to be annoying but (somewhat surprisingly) not everywhere.
15:22 <@gmaxwell> E.g. secp256k1_fe_normalizes_to_zero_var is allowed to be interpreted the same as secp256k1_fe_normalizes_to_zero according to C99. You can have more characters, you just can't depend on them being distinct.
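The collision described above can be checked mechanically. This is a minimal sketch of such a check, not the MISRA checker tool mentioned in the discussion. The function name and the macro are hypothetical; the limit of 31 is the C99 minimum for external identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C99 minimum number of significant initial characters in an external
 * identifier. The macro name is made up for this sketch. */
#define C99_EXTERNAL_SIGNIFICANT 31

/* Hypothetical helper: returns 1 if two distinct names agree on their first
 * `significant` characters, so an implementation that honors only that many
 * characters may treat them as the same identifier. */
static int names_may_collide(const char *a, const char *b, size_t significant) {
    assert(a != NULL && b != NULL);
    if (strcmp(a, b) == 0) return 0; /* identical names are not a collision */
    return strncmp(a, b, significant) == 0;
}
```

Since secp256k1_fe_normalizes_to_zero is exactly 31 characters long, it and its _var variant collide under the 31-character limit, whereas a shortened pair such as secp256k1_fe_norms_to_zero and secp256k1_fe_norms_to_zero_var would not.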
15:22 <@sipa> ugh
15:23 <@sipa> let's prefix every internal function name by the base32 encoding of the md5 sum of what follows
15:23 <@gmaxwell> fortunately, a checker tool will tell me where all the cases are.... but they will not prevent you from wanting to kill me if I go fix it. :)
15:24 <@gmaxwell> I think a few abbreviations e.g. normalizes -> norms and likewise will handle it. At least for C99 (and fuck C90 on this, I've never personally encountered a compiler that would only look at 6 characters.. except maybe the DMR compiler on PDP)
15:25 <@sipa> ok
15:25 <@sipa> because certifications are awesome
15:25 <@sipa> and being able to say "We are MISRA <long version number sequence> compliant!" sounds almost as good as "We are ISO9001 compliant"
15:26 <@gmaxwell> beyond certifying, which is fun, I could imagine someone using some embedded compiler and getting some really nasty bugs.
15:26 < TD-Linux> gmaxwell, does that mean the entirety of daala's API with the "daala_" prefix is not C89 compliant?
15:27 <@gmaxwell> TD-Linux: I believe it's implementation defined, indeed.
15:29 <@sipa> gmaxwell: So, that means that having two functions with the same 31/6-character prefix in a codebase can lead to 1) working fine 2) result in the wrong function being called 3) link error (if the linker truncates the names)?
15:29 <@gmaxwell> sipa: Yes.
15:30 <@sipa> epic
15:30 <@gmaxwell> Fixing suckyness in C like this is why these standards exist.
15:30  * TD-Linux submits a patch to gold to speed up linking
15:30 <@gmaxwell> Fortunately GCC/clang sucks less than C, but who knows what awful compiler someone will use on the code.
15:31 <@gmaxwell> same people also probably will not bother running the tests (their target has no screen, how would they see their output?? :P )
15:32 <@gmaxwell> sipa: with respect to static linking, with libtool the source doesn't even get compiled twice for static and shared. That's part of why I really don't like having preprocessor macros for static vs not.
15:38 <@gmaxwell> In any case, most of the work required for MISRA left is documentation work. To claim compliance we'll need to write a requirements document (and all the software functionality should be traceable to the requirements),  and create a compliance matrix which documents deviations; here is what one looks like: www.state-machine.com/doc/AN_QP-C_MISRA.pdf  (though I'll make an ascii one... :P )
15:39 <@gmaxwell> I figure it doesn't need to hold up the release, we can just make progress and get any remaining disruptive changes in...
15:48 < maaku> there's no reason _lax has to be in the secp256k1 codebase. it could be a 'secp256k1_legacy.c' file in bitcoin repo
15:51 <@gmaxwell> that's true, though this would potentially make the library less useful for others.
15:52 <@gmaxwell> as until very recently openssl was also lax, though even laxer than what we'd implement. :-/
15:55 <@sipa> so there are 3 options
15:56 <@sipa> 1) have a as-wide-as-possible parser (in addition to maybe a strict der only one) in libsecp and require that consensus-critical callers do their own sanity checking on the input
15:56 <@sipa> 2) only strict DER, and put a parse-and-reencode-as-DER in bitcoin (which may be skipped post BIP66, but that's somewhat of a layer violation)
15:58 <@sipa> 3) have a bunch of flags to the parser to indicate what der violations are allowed
15:58 <@gmaxwell> (1) is a large development exercise, and would require a lot of testing. E.g. as it would really have to be a BER parser.  It also has the bad effect of presenting an unsafe default. Even outside of consensus most applications really do not want non-canonical signatures.
15:58 <@gmaxwell> oh lol we actually need to report another vulnerability to openssl.
15:58 <@gmaxwell> damnit.
16:02 <@gmaxwell> (3) is also perhaps just as bad as (1) depending on how far we go with it.
16:03 <@sipa> i guess we could do (2) and have an exposed parse+reencode function that is more lax, with no guarantees about what it actually supports
16:03 <@sipa> that will make it less convenient to use the non-safe behaviour in new apps
16:04 <@gmaxwell> I think that's my preference.  Or instead of parse/reencode.. just a second parser?
16:04 <@sipa> a second parser works too
16:04 <@sipa> (though is less annoying)
16:05 <@sipa> i feel like we shouldn't intentionally have anti features, though
16:05 <@sipa> so a second parser it is
16:05 <@gmaxwell> we can call it _risky. :)
16:05 <@gmaxwell> or _sloppy  ... who wants to use sloppy?
16:06 <@sipa> secp256k1_signature_parse_yes_i_have_read_the_terms_and_conditions(...);
16:06 <@gmaxwell> _postel_was_wrong_but_I_wont_admit_it()
16:06 <@sipa> _dont_use_if_uncertain()
16:06 <@gmaxwell> _ThErE_bE_dRaGoNs()
16:07 <@sipa> oh, 31-character limit :(
16:07 <@gmaxwell> you can go over, it just has to be unique before that point. :)
16:07 <@sipa> yes, so a suffix won't help
16:07 <@sipa> function names are case sensitive and preserving, right?
16:07 <@gmaxwell> Yes.
16:08 <@sipa> if so, we can encode one bit of checksum in each of the first 31 characters
16:08 <@sipa> in its lower/uppercaseness
16:08  * gmaxwell stabs
16:08 <@sipa> is there a word for that?
16:08 < TD-Linux> does that mean my type names aren't reserved if they end with a _t and are longer than 31 characters?
16:08 <@gmaxwell> sipa: abusive
16:08 <@gmaxwell> TD-Linux: LOL
16:08 <@sipa> gmaxwell: a word for "lower/upper caseness" i mean
16:09 <@gmaxwell> sipa: capitolization. (I likely spelled that wrong)
16:09 <@sipa> capitalization, i believe
16:10 <@sipa> capitolization would either be the act of turning a city into a capitol
16:10 <@sipa> or the influence of the california town capitola :p
16:13 <@gmaxwell> In secp256k1_ec_pubkey_serialize's docs... the flags parameter seems to not tell you how to get uncompressed. :P
16:14 <@sipa> we should list the flags' bits and their effect explicitly at the call site
16:14 <@sipa> also, they shouldn't be passed through to the module
16:15 <@sipa> module shouldn't depend on public api definitions, only the .c file
16:17 < cfields> sipa: i forgot to update yesterday.. i kinda gave up on the clang formatting thing. it was taking forever and no end in sight :\
16:17 < cfields> i was hoping to break it off into chunks, but i'm beginning to think that doing it all at once might actually be the less painful route
16:19 <@sipa> cfields: agree
16:20 <@gmaxwell> I'm fine with it happening at once, especially if doing it that way makes it easier to shed paint the settings without wasting your time. :)
16:20 < cfields> sipa: there is one quick/easy one for me though, if you're still in favor of moving the pregenerated file from .h -> .c
16:21 <@sipa> cfields: that's fine by me
16:21 < cfields> gmaxwell: yea, i think that's what sucked the life out of me. knowing that even after it was done, there would be a list of ~30 things that would just cause more bickering
16:22 < cfields> a good example was that it wanted to do (void*) -> (void *) all over the place
16:22 < TD-Linux> moving the generated file to .c makes sense. the only reason it wasn't was because the idea before was to only have one object file generated
16:23 <@sipa> we can support multiple object files just fine... but it prevents inter-module optimization, which we need
16:23 <@sipa> but for a bit of pregenerated data it is not useful
16:24 <@gmaxwell> cfields: yea, I was kinda worrying about that when I heard you were doing a lot of manual work. I doubt we'll care about most things, but I don't want to get stuck with some formatting decision that will be hard to stick with just because I don't want to burn you out with redoing work.
16:24 < cfields> sipa: not arguing that point, but out of curiosity, have you measured lto's effect on smarter inlining?
16:25 <@sipa> cfields: nope, i assume lto greatly avoids that
16:25 <@sipa> though what's the point, compiling is super fast
16:25 < cfields> gmaxwell: well it was basically just 'clang-format blah...' && git add -p. but i underestimated how long it'd take
16:25 < cfields> sipa: avoids smarter inlining?
16:25 <@gmaxwell> cfields: well currently it has ~no effect, so it's hard to measure. Still-- not available everywhere, and the inlining is really critical in this codebase.
16:26 < cfields> sipa: again, i wasn't arguing for it. just curious as to the effect
16:26 <@sipa> cfields: avoid the problem
16:26 <@sipa> cfields: so i think lto would work fine as a replacement for having everything in one compilation unit
16:26 <@gmaxwell> cfields: based on my experience elsewhere LTO should more or less work here.
16:26 <@sipa> cfields: but with little benefit
16:27 <@sipa> which re
16:27 < cfields> sipa: ah, thanks for clarifying
16:27 <@sipa> which reminds me: benchmark the effects of -O1 -O2 -O3 -Os etc
16:27 < cfields> gmaxwell: roger
16:28 < cfields> sipa: you're forgetting -Ofast for the ricers :)
16:28 <@sipa> oh
16:29 <@sipa> and -O42 -fomit-broken-code, i guess
16:29 <@gmaxwell> in GCC Ofast should be the same as O3 for us, we have no floating point!. :P
16:29 <@sipa> the gentoo default
16:29 <@gmaxwell> (IIRC Ofast is just O3 and ffast-math currently)
16:29 < cfields> gmaxwell: i was just about to ask that, actually. Are there no flags that make non-floating point operations less safe somehow?
16:30 <@gmaxwell> cfields: none that matter for us. There is a flag that breaks errno handling IIRC.
16:30 < cfields> seems like a ridiculous question as i ask it, but i'm usually surprised by what compilers let you get away with
16:32 <@gmaxwell> cfields: or another way to look at it, the 'less safe' flags are O2 and especially O3.
16:32 <@gmaxwell> shocking as that may sound.
16:32 <@gmaxwell> Due to C aliasing rules.
16:32 < cfields> on obscure platforms? or real world concerns?
16:32 <@sipa> the strict aliasing rules... are they C99?
16:32 <@gmaxwell> Oh real world, I'm just referring to the fact that C has very strict rules for what pointers can alias other pointers, _most_ C programmers are not very familiar with them, lots of code violates them.. and optimization with respect to them can cause miscompilation in practice. And exploiting these rules is enabled by default.
16:33 <@gmaxwell> sipa: no. they apply everywhere.
16:33 < cfields> sipa: we hit some vocal aliasing warnings in the sha256 code, i'm not sure what set those apart
16:34 <@gmaxwell> esp. people who try to write in-place parsing code that accesses the same memory with two different non-char types and without a union, this stuff actually does get 'miscompiled' in practice on modern compilers.
16:35 < cfields> gmaxwell: i assumed it mostly had to do with alignment where (for ex) a dereferenced int64 ptr has 32bit alignment. Is that the usual case you're referring to, or is there more black magic i've missed out on?
16:35 <@sipa> it's not about alignment
16:36 <@sipa> but more about the fact that some value written to a variable may still be in a register and not flushed to memory if you access the same memory address through a different type pointer
16:36 <@sipa> because the compiler cannot infer that you're referring to that same memory
16:36 <@gmaxwell> cfields: No-- though alignment is another thing that people get wrong (mostly because x86 is astonishingly permissive).
16:37 < cfields> oh wow
16:37  * cfields has some reading to do
16:37 <@gmaxwell> The simplest statement of the rule is that you may not have two pointers of different types to the same memory, except where one is a character type.
16:38 <@gmaxwell> The compiler is allowed to assume that pointers to non-character types never alias, and can optimize loops with respect to the assumption (e.g. leaving data in registers as sipa notes).
16:38 <@gmaxwell> C99 adds the restrict qualifier to get even _more_ strict aliasing control, where you say that a pointer doesn't even alias any other pointer of the same type.
16:39 < cfields> i see
16:39 <@sipa> which we actually use, and actually improves the generated code
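A minimal sketch of the C99 restrict qualifier discussed above. The function add_words is hypothetical and is not taken from libsecp256k1; it only illustrates the promise the caller makes to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example. The restrict qualifiers promise that, during the
 * call, the words written through out are not also accessed through a or b.
 * The compiler may then keep loaded values in registers instead of reloading
 * them after every store. Passing overlapping arrays is undefined behaviour. */
static void add_words(unsigned int *restrict out,
                      const unsigned int *restrict a,
                      const unsigned int *restrict b,
                      size_t n) {
    size_t i;
    assert(out != NULL && a != NULL && b != NULL);
    for (i = 0; i < n; i++) {
        out[i] = a[i] + b[i]; /* unsigned addition wraps, no overflow UB */
    }
}
```

The qualifier changes nothing about the result for non-overlapping inputs; it only widens what the optimizer is allowed to assume.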
16:39 <@gmaxwell> In general the strict aliasing stuff in C improves performance a lot, that's why the compiler writers are so keen to use it.
16:40 <@sipa> cfields: if the hashing code gives warnings we should definitely look at it
16:40 < cfields> mm, isn't that very common in (for ex) byte-swapping macros?
16:40 < cfields> sipa: no, you were right. that was an alignment issue, and you took care of it already
16:40 <@sipa> those typically have char pointers :)
16:40 <@sipa> ah!
16:41 <@gmaxwell> cfields: you should use a union for that.. hopefully. (or go through char). (there is even some language-lawyer debate over whether a union is enough to bypass the rules, but that debate is not taken seriously because deserialization code without using unions would be hell).
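The "go through char" route mentioned above can be sketched as follows. Both helpers are hypothetical and not taken from libsecp256k1. They access the bytes of an object only through unsigned char (directly or via memcpy), which the aliasing rules permit, rather than casting between pointers to unrelated non-character types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a big-endian 32-bit value byte by byte.
 * unsigned char may alias any object, so this is valid regardless of what
 * type the buffer was originally written as, and it has no alignment needs. */
static uint32_t load_be32(const unsigned char *p) {
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: reverses the byte order of x by copying it into a
 * byte array and back, instead of poking at it through a differently typed
 * pointer. */
static uint32_t bswap32(uint32_t x) {
    unsigned char b[4];
    unsigned char t;
    memcpy(b, &x, sizeof b);
    t = b[0]; b[0] = b[3]; b[3] = t;
    t = b[1]; b[1] = b[2]; b[2] = t;
    memcpy(&x, b, sizeof b);
    return x;
}
```

Compilers commonly reduce such small fixed-size memcpy calls to plain register moves, so this style usually costs nothing compared with the pointer-cast version.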
16:42 < cfields> gmaxwell: yea, i can't think of any actual cases off the top of my head. but i can swear i've seen macros that do high/low swapping of integral types that way
16:43 <@gmaxwell> Some of this gets wrapped up in this decades long debate where the language authors say that C is a language, with no promises that the commands have any relation to what the machine does.  Vs lots of engineers who think C is a fancy macro assembler that maps directly to the machine.  ... which is how compilers more or less worked... 30 years ago. :)
16:43 < cfields> well, that was great to learn. I'll read up on the details for sure. Thank you both for the quick tips :)
16:44 <@gmaxwell> cfields: oh yea, there is lots of code that is flat out wrong with respect to the aliasing rules.
16:44 < cfields> heh
17:50 < CodeShark> the direct mapping to machine code is a lot more relevant to those who work on embedded systems
17:50 < CodeShark> generally speaking, that is
17:50 < CodeShark> most people who program for PCs don't even know how to use a compiler :p
17:52 < CodeShark> I should add systems programming to that list, I suppose - not just embedded systems
17:52 < CodeShark> but to most app developers, meh :p
18:42 -!- Pierre_Rochard [~Pierre@unaffiliated/pierre-rochard/x-3593157] has joined #secp256k1
19:10 <@sipa> :)
19:13 <@gmaxwell> holy crap. debian installer is kind of offensive. It needlessly hard binds language/country/timezone... you pick a language and it limits what countries you can select, pick a country and it limits what timezones you can select.
19:13 <@gmaxwell> "screw you computer, I want US english + GMT"
19:15 <@sipa> hmm, never noticed!
19:15 <@sipa> but you can easily change the timezone later
19:15 <@sipa> independently of the rest
19:16 <@gmaxwell> yea, I know. just the assumption that language implies country implies timezone is really culturally/politically myopic. (also, I dunno why everyone doesn't keep the computer times in GMT; geesh.)
19:18 <@sipa> gmaxwell: that question is equivalent to "I dunno why everyone doesn't just use GMT everywhere; geesh"
19:18 <@gmaxwell> well I'm fine with using civil time generally, but as soon as you have infrastructure managed by multiple people all over the world...
19:37 < midnightmagic> Don't suppose y'all noticed my comment about the benchmark/unification+windowG override commit I stuffed into my secp branch..?
19:39 <@sipa> i have not looked at the commit at all
19:40 < midnightmagic> Okay. It's enough for me to know you're aware of it.
19:41 <@sipa> i am not aware of anything, and shall soon forget this conversation
19:49 -!- adam3us1 [~Adium@195.138.228.20] has quit [Quit: Leaving.]
19:58 < midnightmagic> MY KIND OF HUMAN
20:32 <@gmaxwell> heh. New desktop here does 392k ECDSA verifies per second.
20:41 <@sipa> 2.55us per verify?
20:41 <@sipa> is that a 32-core machine...?
20:45 <@gmaxwell> 24 cores of haswell v3.
20:45 <@gmaxwell> sipa: you are asleep.
20:55 <@sipa> is that actually benchmarked by running 24 bench_verify's in parallel?
20:55 <@sipa> or by taking the number from one and extrapolating
20:59 <@gmaxwell> 24 in parallel.
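The arithmetic behind the figures above, as a small sketch. It assumes the 392k verifies per second is the aggregate over the 24 parallel bench_verify processes, one per core, as stated; the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: average microseconds between completed verifications
 * across the whole machine. */
static double us_per_verify_aggregate(double verifies_per_sec) {
    assert(verifies_per_sec > 0.0);
    return 1e6 / verifies_per_sec;
}

/* Hypothetical helper: microseconds one core spends on a single
 * verification, assuming the load is spread evenly over `cores` cores. */
static double us_per_verify_per_core(double verifies_per_sec, unsigned int cores) {
    assert(verifies_per_sec > 0.0 && cores > 0u);
    return (double)cores * 1e6 / verifies_per_sec;
}
```

With 392000 verifies per second this gives roughly 2.55 microseconds per verification machine-wide, and roughly 61 microseconds per verification on each of the 24 cores.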
21:00 <@gmaxwell> I think these CPUs don't have turbo enabled in any case.
21:49 -!- Pierre_Rochard [~Pierre@unaffiliated/pierre-rochard/x-3593157] has quit [Quit: Pierre_Rochard]
22:07 -!- maaku [~quassel@173-228-107-141.dsl.static.fusionbroadband.com] has quit [Remote host closed the connection]
22:26 -!- maaku [~quassel@173-228-107-141.dsl.static.fusionbroadband.com] has joined #secp256k1
22:27 -!- maaku is now known as Guest16568