--- Log opened Fri Jun 12 00:00:47 2020
00:39 -!- ThomasV [~thomasv@unaffiliated/thomasv] has joined #utreexo
02:48 < ThomasV> is deletion of leaves order-independent?
02:52 < ThomasV> also, does batch deletion result in the same tree as multiple DeleteOne operations?
05:04 -!- ThomasV [~thomasv@unaffiliated/thomasv] has quit [Ping timeout: 272 seconds]
05:52 -!- gojiHmPFPN [~textual@c-73-47-220-190.hsd1.ma.comcast.net] has joined #utreexo
06:03 -!- gojiHmPFPN [~textual@c-73-47-220-190.hsd1.ma.comcast.net] has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
06:17 -!- gojiHmPFPN [~textual@c-73-47-220-190.hsd1.ma.comcast.net] has joined #utreexo
06:25 -!- rafalcpp [~racalcppp@ip-178-214.ists.pl] has joined #utreexo
08:11 -!- ThomasV [~thomasv@unaffiliated/thomasv] has joined #utreexo
09:20 < adiabat> ThomasV: from the accumulator's point of view, deletion order does matter, but we delete everything in a block at the same time
09:21 < adiabat> I don't think DeleteOne is used at all... maybe we should get rid of it
09:21 < ThomasV> adiabat: ok, so it means an implementation that deletes in a different order would still be compatible
09:22 < adiabat> Possibly not; calling DeleteOne many times is much less efficient than deleting all at once, and it may result in the leaves being in a different order
09:22 < ThomasV> oh, sorry, I read "does not matter" :(
09:23 < adiabat> so that would change the proofs. It may end up with the same ordering, but there's no guarantee
09:23 < ThomasV> hmm, ok, so another implementation would have to go block after block
09:23 < adiabat> It's possible to get the same result deleting one at a time, but it would be a lot slower
09:24 < adiabat> because with each deletion, you need to hash all the way back up to the root
09:24 < ThomasV> yeah, that's what I'm currently doing :D
09:24 < adiabat> in a tree with say 1 million leaves, and you delete 1000, that'd be about 1000*20 hashes
09:25 < adiabat> but deleting all at the same time ends up being much fewer hashes
09:25 < adiabat> especially if they are near each other
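[Aside: a minimal Go sketch of the cost adiabat describes. It assumes a perfect tree in the classic heap layout (root at index 1, parent of node i at i/2); that is illustrative position arithmetic, not utreexo's actual forest layout, and parentHash/hashUp are made-up names.]

    package main

    import (
        "crypto/sha256"
        "fmt"
    )

    type Hash [32]byte

    // parentHash combines two child hashes into their parent hash.
    func parentHash(l, r Hash) Hash {
        return sha256.Sum256(append(l[:], r[:]...))
    }

    // hashUp re-hashes from one changed leaf all the way back up to the
    // root and returns how many hashes that took: ~log2(n) per deletion.
    func hashUp(tree []Hash, i int) int {
        hashes := 0
        for i > 1 {
            i /= 2
            tree[i] = parentHash(tree[2*i], tree[2*i+1])
            hashes++
        }
        return hashes
    }

    func main() {
        n := 1 << 20              // ~1 million leaves: the tree is 20 levels high
        tree := make([]Hash, 2*n) // leaves at tree[n:2n], root at tree[1]
        total := 0
        for k := 0; k < 1000; k++ { // delete one at a time: re-hash every time
            total += hashUp(tree, n+k)
        }
        fmt.Println(total) // 20000: the "about 1000*20 hashes" above
    }

Batch deletion wins because nodes near the top are re-hashed once instead of once per deletion, especially when the deleted leaves are near each other.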
09:26 < ThomasV> adiabat: in the paper you mention maintaining an index of utxo->path_to_leaf. how is that cheaper than using pointers and not having to maintain that index?
09:26 < adiabat> How would pointers work?
09:26 < ThomasV> (for the bridge node, where all hashes are written contiguously)
09:27 < adiabat> The main thing is: given a hash, you need to know where it is
09:27 < adiabat> so you could do a linear search, it just takes too long
09:27 < ThomasV> well, you could do a malloc for each node and have a pointer to the parent
09:28 < adiabat> yes, you can get there, but you don't know where "there" is. If I'm given some 32-byte leaf hash, I know that leaf is somewhere in my tree
09:28 < adiabat> but my tree has millions of leaves, and they're not in order
09:29 < ThomasV> oh, but you would still have an index, utxo->pointer to the node in memory
09:29 < adiabat> right, which is basically what it is
09:29 < ThomasV> the thing is, you would not need to maintain it
09:29 < adiabat> it's just a mapping of hash to position, which is a uint64
09:29 < ThomasV> yes, but you need to update those positions every time you shuffle a subtree
09:29 < adiabat> oh, if it was all pointer-based, the pointer would follow it, you mean
09:29 < ThomasV> yes
09:30 < adiabat> yeah, I actually had pollard working that way for a while
09:30 < adiabat> and changed it because it was trickier... it does work though
09:30 < ThomasV> was it slower?
09:30 < adiabat> I don't think so
09:31 < adiabat> but... it's not really needed in pollard (the sparse forest)
09:31 < adiabat> the full forest is where it's needed, and I haven't tried it there
09:31 < ThomasV> maybe batch deletion is easier with the array approach, because you can easily sort leaves
09:32 < adiabat> I can see how it might be faster, you just swap pointers when moving things
09:32 < adiabat> but it also seems a bit harder to code, but maybe worth it
09:33 < ThomasV> hmm.. it's actually easier to code.. unless there is something I don't get in the array approach
09:33 < adiabat> I'm not sure how much the position map slows things down... I don't *think* it's a bottleneck compared to hashing, but I haven't measured
09:33 < ThomasV> btw, why go and not c++/bitcoind?
09:34 < adiabat> I just like working with go better :) Eventually I will need to make a c++ version, I think
09:35 < adiabat> the part that would be harder with pointers is keeping it all on disk. In RAM it's not too bad, but you then need to write to disk sometimes, maybe not every block
09:36 < adiabat> also, forest currently has no pointers involved, so switching to pointers is 50% more memory usage
09:36 < adiabat> (every 32-byte hash gets an extra two 8-byte pointers, 32 -> 48 bytes per node in the tree)
09:39 < ThomasV> how do you update the position map? do you maintain a list of all leaves that have been shuffled in the current batch?
09:42 < adiabat> every time a leaf moves (anything on the bottom row) the position map is updated
09:42 < adiabat> https://github.com/mit-dci/utreexo/blob/master/accumulator/forest.go#L226
09:43 < adiabat> I think that function is the only place it happens
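[Aside: a minimal sketch of the bookkeeping just described: a flat array of hashes plus a hash -> position map that must be updated whenever a bottom-row leaf moves. Type and method names here are made up for illustration; the linked forest.go is the real thing.]

    package main

    import "fmt"

    type Hash [32]byte

    // forest stores hashes contiguously and keeps a map from leaf hash to
    // its uint64 position. This map is what a pure pointer design would
    // avoid maintaining, at the cost of ~16 bytes of pointers per node.
    type forest struct {
        data      []Hash
        positions map[Hash]uint64
    }

    // swapLeaves exchanges two bottom-row leaves and updates both map
    // entries: the "every time a leaf moves" update discussed above.
    func (f *forest) swapLeaves(a, b uint64) {
        f.data[a], f.data[b] = f.data[b], f.data[a]
        f.positions[f.data[a]] = a
        f.positions[f.data[b]] = b
    }

    func main() {
        f := &forest{
            data:      []Hash{{1}, {2}, {3}, {4}},
            positions: map[Hash]uint64{{1}: 0, {2}: 1, {3}: 2, {4}: 3},
        }
        f.swapLeaves(0, 3)
        fmt.Println(f.positions[Hash{1}], f.positions[Hash{4}]) // 3 0
    }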
09:47 < ThomasV> adiabat: another question. when you do a swap, I guess you swap with the closest leaf on the row. choosing another candidate would probably affect the result, just like deletion is order-dependent
09:47 < ThomasV> so, to perform swaps in a compatible way, I kind of need the position index, regardless of using pointers
09:49 < adiabat> The swaps are not always with the closest
09:49 < adiabat> there's some freedom in how the swaps work, especially when deleting many things at once
09:49 < ThomasV> ok.. but the choice would affect the result, right?
09:49 < adiabat> with DeleteOne there's basically only one way to do it
09:49 < adiabat> yeah, that would affect the ordering of the leaves
09:50 < adiabat> It's very possible that there are better ways to swap things, that result in shorter proofs by keeping leaves together
09:50 < ThomasV> so it would be good to document how the candidate is chosen :-)
09:50 < adiabat> I haven't played around with it, and it probably doesn't make a huge difference, but it could be something to optimize
09:51 < ThomasV> my question is not about performance, it's about reproducibility
09:51 < adiabat> ah, OK, then yeah, everyone has to do it the same way, otherwise they'll get different proofs
09:52 < ThomasV> exactly
09:52 < adiabat> right now it's pretty simple: it's "extract twins", so ignore two deletions that are siblings
09:53 < adiabat> then the swaps are with the sibling of the next deletion to the right
09:53 < ThomasV> ok, that's what I thought :-)
09:53 < adiabat> so that's "closest" but only looking at the next one over; there could be an algorithm that minimizes swap distance over the whole range
09:53 < adiabat> https://github.com/mit-dci/utreexo/blob/master/accumulator/transform.go#L104
09:53 < ThomasV> but you can see that if I only use pointers, I will not know who is the 'next to the right'
09:54 < adiabat> that's the swap algo now, which is pretty simple
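[Aside: a toy Go reading of the two steps adiabat just described: "extract twins" first, then swap in the sibling of the next deletion to the right. The position arithmetic (sibling = p^1, parent = p/2) is a simplification, not utreexo's row-offset numbering, so treat this as a sketch of the idea; the linked transform.go is the authoritative algorithm.]

    package main

    import "fmt"

    // arrow records one swap: the node at 'from' moves to position 'to'.
    type arrow struct{ from, to uint64 }

    // extractTwins scans sorted bottom-row deletion positions: when both
    // siblings (p and p^1) are deleted, no swap is needed and the
    // deletion simply moves up to their parent on the next row.
    func extractTwins(dels []uint64) (rem, parents []uint64) {
        for i := 0; i < len(dels); i++ {
            if i+1 < len(dels) && dels[i]^1 == dels[i+1] {
                parents = append(parents, dels[i]/2)
                i++ // skip the twin
                continue
            }
            rem = append(rem, dels[i])
        }
        return rem, parents
    }

    // makeSwaps pairs the surviving deletions left to right: the sibling
    // of the next deletion to the right fills the current deletion's
    // slot, and the pair's right parent becomes a deletion one row up.
    func makeSwaps(dels []uint64) (swaps []arrow, parents []uint64) {
        for len(dels) >= 2 {
            swaps = append(swaps, arrow{from: dels[1] ^ 1, to: dels[0]})
            parents = append(parents, dels[1]/2)
            dels = dels[2:]
        }
        return swaps, parents
    }

    func main() {
        rem, up := extractTwins([]uint64{2, 3, 5, 8}) // 2 and 3 are twins
        fmt.Println(rem, up)                          // [5 8] [1]
        swaps, more := makeSwaps(rem)
        fmt.Println(swaps, more) // [{9 5}] [4]: 8's sibling 9 fills slot 5
    }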
09:54 < adiabat> well, even with pointers it should be possible, I think
09:55 < adiabat> you can tell position by the path you take to get to a node: how many left / right pointers you take
09:55 < ThomasV> I think so too.. I just need to know what you are doing
09:56 < adiabat> the current way is to look at the positions of all the deletions
09:56 < adiabat> and operate first on those positions to get a list of what to swap
09:57 < adiabat> in DeleteOne, you can't actually swap anything, you can only move roots
09:58 < adiabat> swapping is the optimization deleting many at a time gives you, as you can move everything around into the final ordering and then re-hash everything from there
09:58 < adiabat> interestingly, when *adding* leaves, adding one at a time is just as efficient as adding many at once
10:01 < adiabat> there are a bunch of things where you could definitely do it differently and it would still work, and maybe even be better
10:01 < adiabat> one of the biggest: right now it takes the block, deletes all inputs, then adds all outputs
10:01 < adiabat> this seems simplest, but might not be most efficient!
10:02 < adiabat> doing add first then delete would be simple to change, but I think that would be worse / slower
10:02 < adiabat> but what might be better is to mark all the deletions, overwrite them with the new additions, and then only do deletions / swapping / etc. if there are more deletions than additions
10:03 < adiabat> that would minimize the swapping part; in many (most!) blocks there wouldn't be any
10:03 < ThomasV> oh, but it would be hard to maintain compatibility with your implementation then
10:03 < adiabat> the downside is the leaves would not be grouped together as much by creation time. So maybe it ends up being worse due to longer proofs
10:03 < adiabat> yeah, this would be totally incompatible
10:04 < adiabat> right now though, nobody's using it in the wild yet, so
10:04 < adiabat> easy to switch :)
10:05 < adiabat> I haven't tried looking at the performance of different ways, maybe I should... my guess is it won't make much difference, so I've just left it the simplest way
10:05 < adiabat> but if it ends up being 20% faster or something, definitely worth switching
10:09 < ja> with the c bindings i made, it would be easier for any alternate implementation to ensure compatibility ;)
10:10 < adiabat> ja: yeah, bindings are easiest. I think it's also cool if there's a different-language rewrite, because then we might find some edge cases & nail down the spec better
10:11 < adiabat> but yeah, it is a "consensus" kind of thing; if two different computers do different things with the accumulator, their proofs won't work for each other
10:12 < ThomasV> heh
10:14 < adiabat> hmm, also if we do find better ways to do it, there can be different versions, so when they connect they should announce versions to check compatibility
10:14 < adiabat> like maybe people start using it and a year later there's a better algorithm
10:15 < adiabat> for nodes to upgrade / update to a different algo, they can be presented with a proof
10:15 < adiabat> that whatever accumulator state they have now, here's the equivalent with the new algo
10:16 < adiabat> the proof would be pretty big though. But it beats having to re-sync the whole blockchain
10:22 < ThomasV> adiabat: suppose I have a light node with only the accumulator. I get a block, with a proof for each utxo that is spent in the block. I guess I will also need to receive a special ordering, that matches the batch deletion process?
10:24 < ThomasV> or at least I will need to implement a strategy that mimics the batch deletion
10:24 < adiabat> the way we have it now, it gets the proof, and the utxo data for each input
10:25 < adiabat> and the proof does contain a list of positions within the forest
10:26 < ThomasV> oh, but the swap candidate is part of the proof, so I guess that's how you can do it
10:26 < adiabat> yes, the deletion algorithm is such that only the proofs are needed / used for swaps
10:26 < ThomasV> ...my brain hurts..
10:29 < adiabat> heh :) You can kind of think of "verify proof" and "delete" as the same function;
10:29 < adiabat> they both use the exact same data
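[Aside: a toy single-leaf illustration of that last point, assuming sha256 parent hashing and pos&1 sibling directions; utreexo's real batch verify/delete handles many leaves at once. The point is that one pass over the same proof hashes both checks the leaf against the root and yields the root you get once the leaf is gone (its sibling moves up).]

    package main

    import (
        "crypto/sha256"
        "fmt"
    )

    type Hash [32]byte

    func parentHash(l, r Hash) Hash {
        return sha256.Sum256(append(l[:], r[:]...))
    }

    // verifyAndDelete folds the proof upward once. 'cur' reproduces the
    // root to verify the leaf; 'del' computes the post-deletion root by
    // starting from the leaf's sibling instead of the leaf. Same data,
    // one function: "verify proof" and "delete" together.
    func verifyAndDelete(root, leaf Hash, pos uint64, proof []Hash) (Hash, bool) {
        if len(proof) == 0 { // single-leaf tree: deleting empties it
            return Hash{}, leaf == root
        }
        cur, del := leaf, proof[0]
        for i, sib := range proof {
            if pos&1 == 0 {
                cur = parentHash(cur, sib)
                if i > 0 {
                    del = parentHash(del, sib)
                }
            } else {
                cur = parentHash(sib, cur)
                if i > 0 {
                    del = parentHash(sib, del)
                }
            }
            pos >>= 1
        }
        return del, cur == root
    }

    func main() {
        leaves := []Hash{{1}, {2}, {3}, {4}}
        n01 := parentHash(leaves[0], leaves[1])
        n23 := parentHash(leaves[2], leaves[3])
        root := parentHash(n01, n23)
        // prove leaf 2 at position 2: proof is its sibling, then n01
        newRoot, ok := verifyAndDelete(root, leaves[2], 2, []Hash{leaves[3], n01})
        fmt.Println(ok, newRoot == parentHash(n01, leaves[3])) // true true
    }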
12:57 -!- ThomasV [~thomasv@unaffiliated/thomasv] has quit [Ping timeout: 272 seconds]
13:37 -!- ThomasV [~thomasv@unaffiliated/thomasv] has joined #utreexo
13:53 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has quit [Remote host closed the connection]
13:54 -!- jb55 [~jb55@gateway/tor-sasl/jb55] has joined #utreexo
14:18 -!- ThomasV [~thomasv@unaffiliated/thomasv] has quit [Ping timeout: 272 seconds]
14:54 -!- ThomasV [~thomasv@unaffiliated/thomasv] has joined #utreexo
15:21 -!- ThomasV [~thomasv@unaffiliated/thomasv] has quit [Ping timeout: 246 seconds]
19:05 -!- gojiHmPFPN [~textual@c-73-47-220-190.hsd1.ma.comcast.net] has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
19:07 -!- gojiHmPFPN [~textual@c-73-47-220-190.hsd1.ma.comcast.net] has joined #utreexo
19:35 -!- fanquake [sid369002@gateway/web/irccloud.com/x-dpentetezvtqljkq] has quit [Ping timeout: 256 seconds]
20:03 -!- fanquake [sid369002@gateway/web/irccloud.com/x-atimgvzlvyuuhsrv] has joined #utreexo
20:09 -!- gojiHmPFPN [~textual@c-73-47-220-190.hsd1.ma.comcast.net] has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
20:54 -!- fanquake [sid369002@gateway/web/irccloud.com/x-atimgvzlvyuuhsrv] has quit [Ping timeout: 264 seconds]
20:59 -!- ThomasV [~thomasv@unaffiliated/thomasv] has joined #utreexo
21:18 -!- fanquake [sid369002@gateway/web/irccloud.com/x-zpfuishxfwzmfiwj] has joined #utreexo
22:02 -!- ThomasV [~thomasv@unaffiliated/thomasv] has quit [Ping timeout: 256 seconds]
22:34 -!- ThomasV [~thomasv@unaffiliated/thomasv] has joined #utreexo
23:17 -!- ThomasV [~thomasv@unaffiliated/thomasv] has quit [Ping timeout: 264 seconds]
--- Log closed Sat Jun 13 00:00:48 2020