--- Log opened Mon Jul 20 00:00:26 2020
00:19 < kcalvinalvin> One piece of good news is that, besides the txvalidation, the accumulator is now the bottleneck on master
00:19 < kcalvinalvin> Those sha hashes do really add up once you get around block ~300,000
02:02 < kcalvinalvin> hmmm I did some more profiling and it's the gc that's taking up 40% of the time when CheckBlock is disabled on CSN
07:07 < dergoegge> adiabat: #161 fixes the unexpected EOF crash at the end of genproofs. i know you wanted to do this but i already had this in my live blocks PR.
07:51 < adiabat> dergoegge: cool will check it out
07:51 < adiabat> kcalvinalvin: 40% does seem kind of high but makes sense with pollard and all the pointers and stuff
07:52 < adiabat> what I'm trying to figure out now is the server taking so much CPU to serve blocks...
07:52 < adiabat> it feels like it shouldn't take any, but just reading the blocks off the disk takes 100% of a core
07:53 < adiabat> I think I need to write a like... send block to network thing, because right now it's running deserialize only to run serialize right after
07:53 < adiabat> so seems like if we know the length of the block we can just treat it as unknown bytes and send it directly from the disk to network.  will try that
08:17 < dergoegge> adiabat: additionally you could make it so pushBlocks reads multiple blocks at once instead of each block individually. something like GetRawBlocksFromDisk but without Deserialize.
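A minimal sketch of the "send it directly from the disk to network" idea, assuming the byte length of each block is already known (for example from an index). The names sendRawBlock and forwardFirst are hypothetical and are not the utreexo server's functions; a bytes.Reader stands in for the block file and a bytes.Buffer for the network connection.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// sendRawBlock copies exactly length bytes from src to dst without
// deserializing them. Hypothetical helper, not the actual server code.
func sendRawBlock(dst io.Writer, src io.Reader, length int64) error {
	n, err := io.CopyN(dst, src, length)
	if err != nil {
		return fmt.Errorf("copied %d of %d bytes: %w", n, length, err)
	}
	return nil
}

// forwardFirst wraps sendRawBlock with in-memory stand-ins: disk plays the
// role of the block file and the returned slice is what the peer would get.
func forwardFirst(disk []byte, length int64) ([]byte, error) {
	var network bytes.Buffer
	if err := sendRawBlock(&network, bytes.NewReader(disk), length); err != nil {
		return nil, err
	}
	return network.Bytes(), nil
}

func main() {
	// Two fake "blocks" stored back to back; the first is 7 bytes long.
	out, err := forwardFirst([]byte("block-Ablock-BB"), 7)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
}
```

With a real *os.File as the source and a net.Conn as the destination the same call shape applies, and reading several blocks in one pass would amount to copying one larger byte range.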
08:23 < kcalvinalvin> So I did a little benchmark on mainnet without txverify and it's slow...
08:24 < kcalvinalvin> Just working on optimizing CSN
08:30 < adiabat> is it GC stuff that gets bigger?  Or hashing
08:31 < adiabat> one huge performance issue is that it doesn't have any pollard caching right now
08:31 < adiabat> so it's not just re-transferring recent hashes but also re-computing them
08:32 < adiabat> and... hm, the main hard part of enabling the pollard caching stuff was synchronization between the server and client
08:32 < adiabat> but if we keep the server sending the whole thing, but have pollard caching on just to skip re-computing
08:32 < adiabat> that might make it a lot faster
08:33 < adiabat> and that seems like something I could change without too much trouble
08:33 < kcalvinalvin> Hashes do build up. Towards the tip of syncing mainnet, hashes pile up
08:33 < kcalvinalvin> but in the beginning it's a lot of gc
08:33 < kcalvinalvin> I guess it's just that hashes start becoming bigger while gc is still taking long
08:33 < adiabat> I think both would be alleviated with caching the pollard nodes instead of throwing them all away each time
08:33 < adiabat> maybe the GC less so, but it also would go down
08:33 < kcalvinalvin> *more hashes
08:34 < adiabat> ok so this afternoon will get server to not deserialize (might be easy) and try to enable some caching / ram usage for client
08:35 < adiabat> (that part might be harder, but hopefully not bad)
08:35 < kcalvinalvin> grabPos is one of the slower functions in the accumulator. I guess it's just that it's called a ton and things pile up
08:35 < adiabat> also I kindof want to get backwards validation running because it's super fun heh
08:35 < adiabat> yeah grabPos is inefficient as well
08:36 < kcalvinalvin> client uses a lot of memory too.. Not sure if we're leaking something but it took up 6gbs of ram in my testing to block ~400,000
08:36 < adiabat> a lot of times you grabPos leaves, and they're right next to each other and you end up descending down the same paths almost to the end
08:36 < adiabat> client uses 6GB? that's not right...
08:36 < adiabat> oh i think go's GC just like never actually frees ram or something
08:36 < kcalvinalvin> Yeah we're probably just pointing to trash
08:37 < adiabat> like it's never actually "using" much but since there's so much churn
08:37 < adiabat> I don't understand the intricacies of GC / OS interaction but I think it shows up in 'top' but won't actually cause OOM errors
08:37 < adiabat> at least I've seen that in go before
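One way to check this from inside the process is to compare the heap counters in runtime.MemStats. The sketch below is not from the utreexo code base: heapStats and sink are illustrative names. HeapInuse is memory in spans that are in use, HeapIdle is memory in idle spans, HeapReleased is the part the runtime has returned to the OS, and Sys is the total obtained from the OS. Idle memory that has not yet been released can still count toward the resident size that tools such as top report.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations below from being optimized away.
var sink []byte

// heapStats returns a few runtime.MemStats counters, all in bytes.
func heapStats() (inUse, idle, released, sys uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse, m.HeapIdle, m.HeapReleased, m.Sys
}

func main() {
	// Simulate churn: allocate many short-lived 1 MiB slices and drop them.
	for i := 0; i < 1000; i++ {
		sink = make([]byte, 1<<20)
	}
	sink = nil
	runtime.GC()

	inUse, idle, released, sys := heapStats()
	fmt.Printf("HeapInuse=%d HeapIdle=%d HeapReleased=%d Sys=%d\n",
		inUse, idle, released, sys)
}
```

If HeapInuse stays small while Sys is large, the process is holding on to memory rather than actively using it; a steadily growing HeapInuse would point more toward a real leak.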
08:38 < kcalvinalvin> https://github.com/mit-dci/utreexo/blob/d94de1beeda69594ee6bf6977e398131a56dee69/accumulator/pollard.go#L358
08:38 < kcalvinalvin> Also this is taking 5 seconds according to pprof..?
08:39 < kcalvinalvin> breakdown shows this `5.53s      5.53s   5407a2:                     XORL $0x1, BX`
08:39 < kcalvinalvin> Not sure why xor is taking forever
08:41 < adiabat> wait what's the line 358 thing?
08:41 < adiabat> if n.niece[lr^1] == nil {
08:41 < adiabat> is that if statement taking a lot of time?
08:42 < kcalvinalvin> Well, not even the if just the xor
08:42 < adiabat> or... yeah causing a ton of GC
08:42 < adiabat> nah it's gotta be the new(polNode) part
08:42 < adiabat> an xor can't take any time right?
08:43 < kcalvinalvin> So I did think that and I tested it by doing the new(polNode) part beforehand
08:43 < kcalvinalvin> pprof says it's the xor...
08:43 < adiabat> I assumed the lr^1 is not actually happening more than once, because the compiler optimizes that?
08:43 < adiabat> it's a straightforward optimization but if it doesn't, then sure we can say xlr := lr^1 at the top of that loop
08:44 < kcalvinalvin> The assembly breakdown shows lr^1 just being done every time
08:44 < adiabat> aw man
08:44 < adiabat> that is disappointing.
08:44 < adiabat> well, guess the go optimizer is not very good
08:45 < adiabat> lr := uint8(bits>>h) & 1; lrx := lr^1
08:45 < kcalvinalvin> One thing I also found out today is that any fmt will be allocated to the heap, even if it doesn't escape
08:45 < adiabat> or heck, compute all the lrs and lrxs before the loop
08:45 < kcalvinalvin> So every error fmt is putting on gc pressure
08:46 < adiabat> huh weird
08:46 < kcalvinalvin> We should make error bitflag stuff
08:46 < adiabat> well we can get rid of errors in grabPos then yeah
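A minimal sketch of one way to keep error construction out of a hot path, assuming the cost comes from building a new error value on every call (fmt.Errorf typically allocates each time it runs). A package-level sentinel created with errors.New is allocated once and then only returned by reference. The names walk and errMissingNode are hypothetical and do not come from the accumulator package.

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingNode is created once at package initialization, so returning it
// from a hot path does not build a new error value per call.
var errMissingNode = errors.New("missing node on path")

// walk is a hypothetical stand-in for a descent like grabPos: each entry in
// present says whether the next node on the path exists.
func walk(present []bool) error {
	for _, ok := range present {
		if !ok {
			return errMissingNode
		}
	}
	return nil
}

func main() {
	err := walk([]bool{true, false})
	// Callers can still tell the failure apart by comparing to the sentinel.
	fmt.Println(errors.Is(err, errMissingNode))
}
```

The trade-off is that a sentinel carries no per-call detail such as the position that failed; whether that matters, and whether the change actually shows up in pprof, would need measuring.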
08:46 < kcalvinalvin> Yeah there is a github issue about it (made in 2017) but they haven't really implemented anything yet
08:46 < adiabat> so on the one hand, make grabPos internally faster with not re-xoring
08:46 < adiabat> a go compiler issue?
08:47 < kcalvinalvin> not sure
08:47 < adiabat> about optimizations?  yeah that seems like a super simple optimization, I assumed all compilers did that
08:47 < adiabat> like if you keep saying f(a+b)
08:47 < adiabat> if (a+b) > 2, etc etc
08:47 < adiabat> then... don't add a and b each time
08:48 < adiabat> just... do it once.  Maybe it thinks lr can change
08:48 < kcalvinalvin> go compiler is also sort of an oddball in that they have a pseudo go-assembly that is translated into actual assembly. This was done for good reasons that I forgot
08:48 < adiabat> but even that... nothing changes lr in the loop which is also easy to see
08:48 < adiabat> anyway we can compute all the lrs and their complements before even entering the loop
08:48 < kcalvinalvin> Ok yeah I'll try that
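A minimal sketch of computing every lr and its complement before the loop, built on the expression quoted above (lr := uint8(bits>>h) & 1; lrx := lr^1). The function name branchBits, the rows parameter, and the top-down ordering are assumptions for illustration, not the layout used in pollard.go.

```go
package main

import "fmt"

// branchBits returns, for each height from rows-1 down to 0, the branch bit
// taken at that height (lrs) and its complement (lrxs). Hypothetical helper.
func branchBits(bits uint64, rows uint8) (lrs, lrxs []uint8) {
	lrs = make([]uint8, rows)
	lrxs = make([]uint8, rows)
	for i := uint8(0); i < rows; i++ {
		h := rows - 1 - i
		lr := uint8(bits>>h) & 1
		lrs[i] = lr
		lrxs[i] = lr ^ 1 // computed once here instead of inside the descent loop
	}
	return lrs, lrxs
}

func main() {
	lrs, lrxs := branchBits(5, 3) // 5 is binary 101
	fmt.Println(lrs, lrxs)
}
```

The two slices are themselves heap allocations, so in a GC-bound path fixed-size arrays or a reused buffer may be a better fit; profiling before and after is the only way to tell whether the change helps.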
08:49 < adiabat> the larger issue is that grabPos could be optimized a lot in that it could cache things
08:49 < adiabat> that's harder... i think I tried doing something like that and gave up a while back
08:49 < kcalvinalvin> When are we aiming to do the binary release? Still tuesday?
08:49 < adiabat> yeah I mean if it's slow that just makes performance improvements look better :)
08:50 < adiabat> v 0.1 takes 4 hours.  2 days later, v0.2 takes 3 hours
08:50 < adiabat> at this rate, in a few weeks IBD will be instantaneous! :)
08:50 < kcalvinalvin> gah well I hoped maybe we could release it and have it beat bitcoind by a lot
08:51 < kcalvinalvin> It's a lot slower than bitcoind at the moment even without sig verification
08:51 < kcalvinalvin> What's the most urgent thing to do right now?
08:54 < kcalvinalvin> Ah and also, proofs for bridgenode are sorta big. utree/ takes up 251gb when synced to #500,000. I ran out of space while I was trying to sync to the tip
08:54 < kcalvinalvin> I'm assuming you need 400gb of free space
09:04 < adiabat> yeah sounds about right
09:04 < adiabat> but I don't think we should touch mainnet initially
09:05 < adiabat> I think it's cool even if slower
09:05 < adiabat> because... you get this "pollardFile" which is 500 bytes! you can scp it between computers instantly and it starts right back up
09:06 < adiabat> you can xxd it and it doesn't even scroll
09:06 < adiabat> maybe we should have pollardFile saved in base64 or something so you can copy and paste it
09:06 < adiabat> yeah if I do `base64 pollardFile` it's 10 lines
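A minimal sketch of saving a small state file as copy-and-paste-friendly text with Go's encoding/base64. The 500-byte input is a zero-filled placeholder rather than real pollardFile contents, and encodeState / decodeState are hypothetical names.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeState turns raw state bytes into standard base64 text.
func encodeState(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

// decodeState reverses encodeState and reports malformed input as an error.
func decodeState(text string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(text)
}

func main() {
	raw := make([]byte, 500) // placeholder for the file's bytes
	text := encodeState(raw)

	back, err := decodeState(text)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(text), len(back))
}
```

Unlike the base64 command-line tool, EncodeToString does not insert line breaks, so the result is a single line that could be wrapped for display if desired.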
09:08 < dergoegge> if we are slower than bitcoind without CheckBlock then btcd/blockchain.ValidateTransactionScripts is a lot slower than the bitcoind equivalent right?
09:09 < kcalvinalvin> No doubt. It's just a way slower implementation in general from the script execution to the elliptic curve stuff
09:09 < dergoegge> i also think it's ok if it's slower right now. i mean if we point out the benefits e.g the scp thing then the slowness is just something to improve
09:10 < kcalvinalvin> Ethereum's bindings to libsecp256k1 were really just about the same speed as btcec since the calls to cgo were taking up half the time
09:10 < dergoegge> so there is no way to do fast ec in go?
09:11 < kcalvinalvin> Well you could batch the ec so you don't do one call per operation
09:11 < kcalvinalvin> But there really isn't any easy way per se
09:11 < kcalvinalvin> *one call per operation to cgo
09:21 < dergoegge> why does btcd or ethereum not do that? it should be a lot faster
09:27 < kcalvinalvin> idk about Ethereum but btcd doesn't really have active contributors nowadays. irc seems to be picking back up and dan is looking at things as well but not sure
09:28 < kcalvinalvin> There are some things btcec isn't doing that they have in an idle PR too so that would also help
19:56 -!- dergoegge [uid453889@gateway/web/irccloud.com/x-wqwekxwzcpkzfqdf] has quit [Quit: Connection closed for inactivity]
20:01 < adiabat> ok not super well tested but I'm merging the server skip-deserialize change
20:01 < adiabat> it makes the server take less CPU, but... it still takes a bunch.  it feels like it should take basically none...
20:01 < adiabat> maybe good enough for now
20:08 < adiabat> hm yeah I guess it's reasonable.  The server has pretty high CPU in the beginning, but after the blocks get bigger the client slows down
20:08 < adiabat> then the server only uses 5% of a core or so.  So that's pretty good.
20:09 < adiabat> I think before the server was actually slower than the client, so now maybe the client goes faster
20:15 < fanquake> utreexo meeting is the late one tonight isn't it?
20:39 -!- aaronc2347 [uid451246@gateway/web/irccloud.com/x-bpmwofjwmlinmloy] has joined #utreexo
20:41 < adiabat> yes the call will be at 12:00 EDT tomorrow
20:41 < adiabat> I think tomorrow is the "release".  There's still lots to improve but it does most of the basics
20:42 < adiabat> I might try to get it to do backwards validation tomorrow before release because that's a bit of a gimmick but fun / show-off-y
20:42 < adiabat> like if you can fly an airplane upside down it must be a pretty good airplane right?
20:43 < adiabat> and if you can verify the blockchain backwards, well clearly that's a very important blockchain validation method
20:44 < fanquake> 👍
--- Log closed Tue Jul 21 00:00:26 2020