--- Log opened Mon Jul 20 00:00:26 2020
00:19 < kcalvinalvin> One piece of good news is that besides the txvalidation, now the accumulator is the bottleneck on master
00:19 < kcalvinalvin> Those sha hashes do really add up once you get around block ~300,000
02:02 < kcalvinalvin> hmmm I did some more profiling and it's the gc that's taking up 40% of the time when CheckBlock is disabled on CSN
07:07 < dergoegge> adiabat: #161 fixes the unexpected EOF crash at the end of genproofs. i know you wanted to do this but i already had this in my live blocks PR.
07:51 < adiabat> dergoegge: cool will check it out
07:51 < adiabat> kcalvinalvin: 40% does seem kind of high but makes sense with pollard and all the pointers and stuff
07:52 < adiabat> what I'm trying to figure out now is the server taking so much CPU to serve blocks...
07:52 < adiabat> it feels like it shouldn't take any, but just reading the blocks off the disk takes 100% of a core
07:53 < adiabat> I think I need to write a like... send block to network thing, because right now it's running deserialize only to run serialize right after
07:53 < adiabat> so seems like if we know the length of the block we can just treat it as unknown bytes and send it directly from the disk to network. will try that
08:17 < dergoegge> adiabat: additionally you could make it so pushBlocks reads multiple blocks at once instead of each block individually. something like GetRawBlocksFromDisk but without Deserialize.
08:23 < kcalvinalvin> So I did a little benchmark on mainnet without txverify and it's slow...
08:24 < kcalvinalvin> Just working on optimizing CSN
08:30 < adiabat> is it GC stuff that gets bigger? Or hashing
08:31 < adiabat> one huge performance issue is that it doesn't have any pollard caching right now
08:31 < adiabat> so it's not just re-transferring recent hashes but also re-computing them
08:32 < adiabat> and... hm, the main hard part of enabling the pollard caching stuff was synchronization between the server and client
08:32 < adiabat> but if we keep the server sending the whole thing, but have pollard caching on just to skip re-computing
08:32 < adiabat> that might make it a lot faster
08:33 < adiabat> and that seems like something I could change without too much trouble
08:33 < kcalvinalvin> Hashes do build up. Towards the tip of syncing mainnet, hashes pile up
08:33 < kcalvinalvin> but in the beginning it's a lot of gc
08:33 < kcalvinalvin> I guess it'
08:33 < adiabat> I think both would be alleviated with caching the pollard nodes instead of throwing them all away each time
08:33 < kcalvinalvin> s just that hashes start becoming bigger while gc is still taking long
08:33 < adiabat> maybe the GC less so, but it also would go down
08:33 < kcalvinalvin> *more hashes
08:34 < adiabat> ok so this afternoon will get server to not deserialize (might be easy) and try to enable some caching / ram usage for client
08:35 < adiabat> (that part might be harder, but hopefully not bad)
08:35 < kcalvinalvin> grabPos is one of the slower functions in the accumulator. I guess it's just that it's called a ton and things pile up
08:35 < adiabat> also I kind of want to get backwards validation running because it's super fun heh
08:35 < adiabat> yeah grabPos is inefficient as well
08:36 < kcalvinalvin> client uses a lot of memory too.. Not sure if we're leaking something but it took up 6gbs of ram in my testing to block ~400,000
08:36 < adiabat> a lot of times you grabPos leaves, and they're right next to each other and you end up descending down the same paths almost to the end
08:36 < adiabat> client uses 6GB? that's not right...
08:36 < adiabat> oh i think go's GC just like never actually frees ram or something
08:36 < kcalvinalvin> Yeah we're probably just pointing to trash
08:37 < adiabat> like it's never actually "using" much but since there's so much churn
08:37 < adiabat> I don't understand the intricacies of GC / OS interaction but I think it shows up in
08:37 < adiabat> 'top' but won't actually cause OOM errors
08:37 < adiabat> at least I've seen that in go before
08:38 < kcalvinalvin> https://github.com/mit-dci/utreexo/blob/d94de1beeda69594ee6bf6977e398131a56dee69/accumulator/pollard.go#L358
08:38 < kcalvinalvin> Also this is taking 5 seconds according to pprof..?
08:39 < kcalvinalvin> breakdown shows this `5.53s 5.53s 5407a2: XORL $0x1, BX`
08:39 < kcalvinalvin> Not sure why xor is taking forever
08:41 < adiabat> wait what's the line 358 thing?
08:41 < adiabat> if n.niece[lr^1] == nil {
08:41 < adiabat> is that if statement taking a lot of time?
08:42 < kcalvinalvin> Well, not even the if just the xor
08:42 < adiabat> or... yeah causing a ton of GC
08:42 < adiabat> nah it's gotta be the new(polNode) part
08:42 < adiabat> an xor can't take any time right?
08:43 < kcalvinalvin> So I did think that and I tested it by doing the new(polNode) part beforehand
08:43 < kcalvinalvin> pprof says it's the xor...
08:43 < adiabat> I assumed the lr^1 is not actually happening more than once, because the compiler optimizes that?
08:43 < adiabat> it's a straightforward optimization but if it doesn't, then sure we can say xlr := lr^1 at the top of that loop
08:44 < kcalvinalvin> The assembly breakdown shows lr^1 just being done every time
08:44 < adiabat> aw man
08:44 < adiabat> that is disappointing.
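The `xlr := lr^1` idea above can be sketched roughly as follows. This is a minimal, hypothetical example: the `node` type and the `descend` function are invented for illustration and are far simpler than utreexo's `polNode` and `grabPos`. It only shows computing the complement bit once per level and reusing it, and makes no claim about how much time that saves.

```go
package main

import "fmt"

// node is a hypothetical, simplified tree node; it is not utreexo's polNode.
type node struct {
	niece [2]*node
}

// descend walks from root following the bits of pos, from bit height-1
// down to bit 0, creating any missing nodes along the way.
// It returns the node reached and the number of nodes it allocated.
func descend(root *node, pos uint64, height uint8) (*node, int) {
	n := root
	created := 0
	for h := int(height) - 1; h >= 0; h-- {
		lr := uint8(pos>>uint(h)) & 1
		lrx := lr ^ 1 // complement computed once per level, reused below
		if n.niece[lrx] == nil {
			n.niece[lrx] = new(node)
			created++
		}
		n = n.niece[lrx]
	}
	return n, created
}

func main() {
	root := new(node)
	_, first := descend(root, 5, 3)  // builds the whole path
	_, second := descend(root, 5, 3) // same path, nothing new to allocate
	fmt.Println(first, second)
}
```

Walking to a neighbouring position afterwards (for example `descend(root, 4, 3)`) allocates only the final node, since the upper part of the path is shared.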
08:44 < adiabat> well, guess the go optimizer is not very good
08:45 < adiabat> lr := uint8(bits>>h) & 1; lrx := lr^1
08:45 < kcalvinalvin> One thing I also found out today is that any fmt will be allocated to the heap, even if it doesn't escape
08:45 < adiabat> or heck, compute all the lrs and lrxs before the loop
08:45 < kcalvinalvin> So every error fmt is putting on gc pressure
08:46 < adiabat> huh weird
08:46 < kcalvinalvin> We should make error bitflag stuff
08:46 < adiabat> well we can get rid of errors in grabPos then yeah
08:46 < kcalvinalvin> Yeah there is a github issue about it (made in 2017) but they haven't really implemented anything yet
08:46 < adiabat> so on the one hand, make grabPos internally faster with not re-xoring
08:46 < adiabat> a go compiler issue?
08:47 < kcalvinalvin> not sure
08:47 < adiabat> about optimizations? yeah that seems like a super simple optimization, I assumed all compilers did that
08:47 < adiabat> like if you keep saying f(a+b)
08:47 < adiabat> if (a+b) > 2, etc etc
08:47 < adiabat> then... don't add a and b each time
08:48 < adiabat> just... do it once. Maybe it thinks lr can change
08:48 < kcalvinalvin> go compiler is also sort of an oddball in that they have a pseudo go-assembly that is translated into actual assembly. This was done for good reasons that I forgot
08:48 < adiabat> but even that... nothing changes lr in the loop which is also easy to see
08:48 < adiabat> anyway we can compute all the lrs and their complements before even entering the loop
08:48 < kcalvinalvin> Ok yeah I'll try that
08:49 < adiabat> the larger issue is that grabPos could be optimized a lot in that it could cache things
08:49 < adiabat> that's harder... i think I tried doing something like that and gave up a while back
08:49 < kcalvinalvin> When are we aiming to do the binary release? Still tuesday?
08:49 < adiabat> yeah I mean if it's slow that just makes performance improvements look better :)
08:50 < adiabat> v0.1 takes 4 hours. 2 days later, v0.2 takes 3 hours
08:50 < adiabat> at this rate, in a few weeks IBD will be instantaneous! :)
08:50 < kcalvinalvin> gah well I hoped maybe we could release it and have it beat bitcoind by a lot
08:51 < kcalvinalvin> It's a lot slower than bitcoind at the moment even without sig verification
08:51 < kcalvinalvin> What's the most urgent thing to do right now?
08:54 < kcalvinalvin> Ah and also, proofs for bridgenode are sorta big. utree/ takes up 251gb when synced to #500,000. I ran out of space while I was trying to sync to the tip
08:54 < kcalvinalvin> I'm assuming you need 400gb of free space
09:04 < adiabat> yeah sounds about right
09:04 < adiabat> but I don't think we should touch mainnet initially
09:05 < adiabat> I think it's cool even if slower
09:05 < adiabat> because... you get this "pollardFile" which is 500 bytes! you can scp it between computers instantly and it starts right back up
09:06 < adiabat> you can xxd it and it doesn't even scroll
09:06 < adiabat> maybe we should have pollardFile saved in base64 or something so you can copy and paste it
09:06 < adiabat> yeah if I do `base64 pollardFile` it's 10 lines
09:08 < dergoegge> if we are slower than bitcoind without CheckBlock then btcd/blockchain.ValidateTransactionScripts is a lot slower than the bitcoind equivalent right?
09:09 < kcalvinalvin> No doubt. It's just a way slower implementation in general from the script execution to the elliptic curve stuff
09:09 < dergoegge> i also think it's ok if it's slower right now. i mean if we point out the benefits e.g the scp thing then the slowness is just something to improve
09:10 < kcalvinalvin> Ethereum's bindings to libsecp256k1 were really just about the same as btcec since the calls to cgo were taking up half the time
09:10 < dergoegge> so there is no way to do fast ec in go?
09:11 < kcalvinalvin> Well you could batch the ec so you don't do one call per operation
09:11 < kcalvinalvin> But there really isn't any easy way per se
09:11 < kcalvinalvin> *one call per operation to cgo
09:21 < dergoegge> why does btcd or ethereum not do that? it should be a lot faster
09:27 < kcalvinalvin> idk about Ethereum but btcd doesn't really have active contributors nowadays. irc seems to be picking back up and dan is looking at things as well but not sure
09:28 < kcalvinalvin> There are some things btcec isn't doing that they have in an idle PR too so that would also help
19:56 -!- dergoegge [uid453889@gateway/web/irccloud.com/x-wqwekxwzcpkzfqdf] has quit [Quit: Connection closed for inactivity]
20:01 < adiabat> ok not super well tested but I'm merging the server skip-deserialize change
20:01 < adiabat> it makes the server take less CPU, but... it still takes a bunch. it feels like it should take basically none...
20:01 < adiabat> maybe good enough for now
20:08 < adiabat> hm yeah I guess it's reasonable. The server has pretty high CPU in the beginning, but after the blocks get bigger the client slows down
20:08 < adiabat> then the server only uses 5% of a core or so. So that's pretty good.
20:09 < adiabat> I think before the server was actually slower than the client, so now maybe the client goes faster
20:15 < fanquake> utreexo meeting is the late one tonight isn't it?
20:39 -!- aaronc2347 [uid451246@gateway/web/irccloud.com/x-bpmwofjwmlinmloy] has joined #utreexo
20:41 < adiabat> yes the call will be at 12:00 EDT tomorrow
20:41 < adiabat> I think tomorrow is the "release". There's still lots to improve but it does most of the basics
20:42 < adiabat> I might try to get it to do backwards validation tomorrow before release because that's a bit of a gimmick but fun / show-off-y
20:42 < adiabat> like if you can fly an airplane upside down it must be a pretty good airplane right?
20:43 < adiabat> and if you can verify the blockchain backwards, well clearly that's a very important blockchain validation method
20:44 < fanquake> 👍
--- Log closed Tue Jul 21 00:00:26 2020