--- Log opened Wed Sep 02 00:00:06 2020
00:33 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has quit [Remote host closed the connection]
00:34 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has joined #utreexo
14:04 -!- belcher [~belcher@unaffiliated/belcher] has joined #utreexo
14:55 < adiabat> a problem with changing forest.Transform to "read everything first, compute, then write everything back"
14:56 < adiabat> is that sometimes there are huge moves, where you're swapping two large subtrees, and so you don't calculate anything but you have to read and write a lot of data
14:57 < adiabat> so you end up calling swapHashRange and on the bottom that can be millions of hashes to move
14:59 < adiabat> and there isn't really a bound for how big a swap can be; if you have 1 big perfect tree and delete a signle element on the left side
15:00 < adiabat> then you end up moving the whole right side to the left side, which means you're re-writing the whole thing
15:00 < adiabat> so "read everything first" sometimes means read the whole forest into ram, do stuff, then write the whole thing back to disk
15:00 < adiabat> that's not going to work
15:01 < adiabat> an idea is... a pollard-like disk representation
15:02 < adiabat> where you have, say 40 byte chunks, the first 32 bytes being the hash, and the next 8 being 2 "pointers"
15:02 < adiabat> the pointers being byte offsets in the file
15:03 < adiabat> (well pointer*40 is the byte offset)
15:03 < adiabat> this might be a kind of 2-birds-1-stone thing since it would also give an on-disk persistent pollard
15:04 < adiabat> the hard part is fragmentation
15:04 < adiabat> basically you'd have to do your own garbage collection / memory management
15:05 < adiabat> as when a node is deleted, we need to mark that that disk location is empty and overwrite it
15:05 < adiabat> or stash it for overwriting later
15:06 < adiabat> I don't know if it's an improvement though... it would completely get rid of giant swaps on disk
15:07 < adiabat> but then you also get the problem pollard has with descendToPos
15:07 < adiabat> where you have to read the location, get the pointer, seek, read next location, all the way down
15:09 < adiabat> I think it would be faster but I guess we need to know more about what makes forest / genproofs slow
15:09 < adiabat> or what the distribution of sizes of swaps is, maybe big swaps don't happen much
15:10 < adiabat> there are probably also ways to tweak the remove algorithm to reduce the size / row height of the swaps but that seems like a bad idea
15:13 < adiabat> ... maybe the fragmentation / GC of an on-disk pollard wouldn't be that bad
15:13 < adiabat> since in most cases you can overwrite instead of mark for deletion, you probably only need to keep track of deleted nodes when numleaves goes down
15:14 < adiabat> usually it doesn't go down too much, and just keeping a 4 byte record of each wouldn't be that much
15:15 < adiabat> ... that only applies for a full representation (forest) though; for pollard usage deletions would work differently
15:15 < adiabat> but even there, you can replace a lot, so still only need to track (currentNodes - maxEverNodes)
15:16 < adiabat> ok well I'm leaning towards trying to implement this; I don't think changing Modify() to cache or have different read/write patterns will work or help
15:17 < adiabat> anyway kcalvinalvin you've been looking at ... not exactly similar but a different way to change up how forest writes to disk
15:18 < adiabat> I'm leaning towards this because it feels like it could help unify pollard & forest which would be nice, right now there's a bunch of code doing... almost but not quite the same thing
15:19 < adiabat> having a ForestData style interface that handled the descendToPos stuff would be nice
15:20 < adiabat> this also might be a bad idea but maybe have to start working on it to see.  or if someone has a reason why it will be a bad idea could save me time :)
17:03 -!- achow101 [~achow101@unaffiliated/achow101] has quit [Read error: Connection reset by peer]
17:03 -!- achow101_ [~achow101@unaffiliated/achow101] has joined #utreexo
17:16 -!- achow101_ [~achow101@unaffiliated/achow101] has quit [Ping timeout: 256 seconds]
17:17 -!- achow101 [~achow101@unaffiliated/achow101] has joined #utreexo
18:56 < kcalvinalvin> adiabat heh PR #192 sounds awful lot like what you're describing here
18:58 < kcalvinalvin> I could go over this on the next call. Or maybe I think I can finish the implementation this week
19:19 < adiabat> from what I understand of #192 it's pretty different though
19:19 < adiabat> for example, if you want to swap something at row 10
19:20 < adiabat> in pollard, you change two pointers and you're done
19:20 < adiabat> for forest, you have to move 2047 hashes
19:25 < adiabat> I guess the cowtree wouldn't need to move the bottom 4K structures, just the pointers above?
19:28 < adiabat> but the cowtree design doesn't seem to lend itself to sparse representations, which would be nice because then pollard could use the same storage as forest...
20:36 < kcalvinalvin> adiabat If it's actually sparse, then you're wasting a lot of disk space
20:37 < kcalvinalvin> Since you do want to group the bottom hashes, you're gonna have to have a smallest representation, which would be the 127 node tree (forestRows = 6)
20:38 < kcalvinalvin> There would need to be a lot of empty nodes here for a sparse representation right?
20:38 < kcalvinalvin> I'm not sure if there is a clean design for both forest and pollard...
--- Log closed Thu Sep 03 00:00:07 2020