--- Log opened Wed Sep 02 00:00:06 2020 00:33 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has quit [Remote host closed the connection] 00:34 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has joined #utreexo 14:04 -!- belcher [~belcher@unaffiliated/belcher] has joined #utreexo 14:55 < adiabat> a problem with changing forest.Transform to "read everything first, compute, then write everything back" 14:56 < adiabat> is that sometimes there are huge moves, where you're swapping two large subtrees, and so you don't calculate anything but you have to read and write a lot of data 14:57 < adiabat> so you end up calling swapHashRange and on the bottom that can be millions of hashes to move 14:59 < adiabat> and there isn't really a bound for how big a swap can be; if you have 1 big perfect tree and delete a signle element on the left side 15:00 < adiabat> then you end up moving the whole right side to the left side, which means you're re-writing the whole thing 15:00 < adiabat> so "read everything first" sometimes means read the whole forest into ram, do stuff, then write the whole thing back to disk 15:00 < adiabat> that's not going to work 15:01 < adiabat> an idea is... a pollard-like disk representation 15:02 < adiabat> where you have, say 40 byte chunks, the first 32 bytes being the hash, and the next 8 being 2 "pointers" 15:02 < adiabat> the pointers being byte offsets in the file 15:03 < adiabat> (well pointer*40 is the byte offset) 15:03 < adiabat> this might be a kind of 2-birds-1-stone thing since it would also give an on-disk persistent pollard 15:04 < adiabat> the hard part is fragmentation 15:04 < adiabat> basically you'd have to do your own garbage collection / memory management 15:05 < adiabat> as when a node is deleted, we need to mark that that disk location is empty and overwrite it 15:05 < adiabat> or stash it for overwriting later 15:06 < adiabat> I don't know if it's an improvement though... it would completely get rid of giant swaps on disk 15:07 < adiabat> but then you also get the problem pollard has with descendToPos 15:07 < adiabat> where you have to read the location, get the pointer, seek, read next location, all the way down 15:09 < adiabat> I think it would be faster but I guess we need to know more about what makes forest / genproofs slow 15:09 < adiabat> or what the distribution of sizes of swaps is, maybe big swaps don't happen much 15:10 < adiabat> there are probably also ways to tweak the remove algorithm to reduce the size / row height of the swaps but that seems like a bad idea 15:13 < adiabat> ... maybe the fragmentation / GC of an on-disk pollard wouldn't be that bad 15:13 < adiabat> since in most cases you can overwrite instead of mark for deletion, you probably only need to keep track of deleted nodes when numleaves goes down 15:14 < adiabat> usually it doesn't go down too much, and just keeping a 4 byte record of each wouldn't be that much 15:15 < adiabat> ... that only applies for a full representation (forest) though; for pollard usage deletions would work differently 15:15 < adiabat> but even there, you can replace a lot, so still only need to track (currentNodes - maxEverNodes) 15:16 < adiabat> ok well I'm leaning towards trying to implement this; I don't think changing Modify() to cache or have different read/write patterns will work or help 15:17 < adiabat> anyway kcalvinalvin you've been looking at ... not exactly similar but a different way to change up how forest writes to disk 15:18 < adiabat> I'm leaning towards this because it feels like it could help unify pollard & forest which would be nice, right now there's a bunch of code doing... almost but not quite the same thing 15:19 < adiabat> having a ForestData style interface that handled the descendToPos stuff would be nice 15:20 < adiabat> this also might be a bad idea but maybe have to start working on it to see. or if someone has a reason why it will be a bad idea could save me time :) 17:03 -!- achow101 [~achow101@unaffiliated/achow101] has quit [Read error: Connection reset by peer] 17:03 -!- achow101_ [~achow101@unaffiliated/achow101] has joined #utreexo 17:16 -!- achow101_ [~achow101@unaffiliated/achow101] has quit [Ping timeout: 256 seconds] 17:17 -!- achow101 [~achow101@unaffiliated/achow101] has joined #utreexo 18:56 < kcalvinalvin> adiabat heh PR #192 sounds awful lot like what you're describing here 18:58 < kcalvinalvin> I could go over this on the next call. Or maybe I think I can finish the implementation this week 19:19 < adiabat> from what I understand of #192 it's pretty different though 19:19 < adiabat> for example, if you want to swap something at row 10 19:20 < adiabat> in pollard, you change two pointers and you're done 19:20 < adiabat> for forest, you have to move 2047 hashes 19:25 < adiabat> I guess the cowtree wouldn't need to move the bottom 4K structures, just the pointers above? 19:28 < adiabat> but the cowtree design doesn't seem to lend itself to sparse representations, which would be nice because then pollard could use the same storage as forest... 20:36 < kcalvinalvin> adiabat If it's actually sparse, then you're wasting a lot of disk space 20:37 < kcalvinalvin> Since you do want to group the bottom hashes, you're gonna have to have a smallest representation, which would be the 127 node tree (forestRows = 6) 20:38 < kcalvinalvin> There would need to be a lot of empty nodes here for a sparse representation right? 20:38 < kcalvinalvin> I'm not sure if there is a clean design for both forest and pollard... --- Log closed Thu Sep 03 00:00:07 2020