--- Day changed Wed Apr 22 2020
08:35 < jonatack> vasild: regarding your unit test for PR #17994, you posted diffs but I did not find a link to a commit to pull in to try your test... could you share a commit link?
08:35 < vasild> hmm
08:36 < jonatack> (I'm referring to this gist https://gist.github.com/vasild/8c06b3dbc493522f683a671d71b4c122 which is only a diff)
08:36 < jonatack> (If you don't have it in a commit somewhere, never mind)
08:37 < vasild> I did not push it to github and post a link because at some point I will clean up that branch in my bitcoin/ fork and the link will be broken. This is why I posted the patch on GH as a comment - it will stay there forever. Let me see if I can push it somewhere...
08:38 < vasild> hmm, a gist is not permanent either. I later found out that one can attach files to GH pull requests; maybe that would have been best.
08:42 < vasild> jonatack: I found some commits locally, but I am not sure they are the correct ones because I made multiple changes over time. However, I am sure that what I uploaded to github / gist is the correct one. It would be safer to download the patch and apply it with `git apply`.
08:44 < jonatack> If feasible, I think a test to verify behavior and catch future regressions would be valuable here. Ideally as an added commit to this PR or as a follow-up PR.
08:49 < vasild> jonatack: like unit_test.diff from https://gist.github.com/vasild/8c06b3dbc493522f683a671d71b4c122
08:50 < vasild> but that required some tweaks to the source code, mainly to reduce the max blk file size from 128MiB - I did not want to create 100s of full blocks in order to fill it, because then analyzing the debug printouts would be too tedious
08:50 < vasild> and I also analyzed the debug printouts manually
08:56 < jonatack> Yes, saw that. The code wasn't designed for the components to be easily tested separately.
09:24 < Blank44> Hey everyone!
09:25 < vasild> Blank44: Hello!
09:55 < vasild> Meeting in 5 minutes
09:59 < vasild> Before we start - you are welcome to ask questions at any time. Feel free to jump right into the discussion.
10:00 < vasild> #startmeeting
10:00 < jnewbery> hi
10:00 < kanzure> hi
10:00 < robot-visions> hi
10:00 < emzy> hi
10:00 < willcl_ark> hi
10:00 < felixweis> hi
10:00 < lightlike> hi
10:00 < fjahr> hi
10:00 < pinheadmz> hi
10:00 < vasild> hi
10:01 < nehan_> hi
10:01 < vasild> Let's rock!
10:01 < vasild> Did you get a chance to review the pull request or take a look at it?
10:01 < jnewbery> y
10:01 < nehan_> y
10:01 < fjahr> y
10:01 < vasild> y
10:01 < lightlike> n
10:01 < amiti> hi
10:01 < amiti> n
10:01 < willcl_ark> Only a quick look this week :|
10:02 < felixweis> y
10:02 < emzy> n
10:02 < pinheadmz> n :-/ just lurking this week
10:02 < robot-visions> y, but have some questions before submitting review
10:02 < jonatack> hi
10:02 < vasild> robot-visions: yes?
10:02 < jonatack> y
10:03 < vasild> robot-visions: feel free to ask here now, or later.
10:03 < robot-visions> Is this bug fix concerned with data loss, or with wasting space?
10:03 < robot-visions> (Which is the primary goal of the fix)
10:03 < vasild> It is about wasting space.
10:03 < emzy> about 1MB per file.
10:03 < theStack> hi
10:03 < robot-visions> Great, thanks! That makes sense
10:04 < jkczyz> hi
10:04 < vasild> So, let's scratch the surface a little bit - Can we create an undo file for a given block without having all prior blocks?
10:05 < jules6> yes i think?
10:05 < oyster> Don't see why it wouldn't be possible
10:05 < robot-visions> I think we cannot, because without prior blocks, we would not be able to restore the UTXOs spent by the block (we'd have the `COutPoint`s, but not the `CTxOut`s)
10:06 < pinheadmz> the undo data is a set of coins SPENT by a block (so that we can un-spend if it gets disconnected)
10:06 < pinheadmz> you need the utxo set to determine which coins are spent by a block
10:06 < theStack> i would think the same as robot-visions: no, it's not possible
10:06 < pinheadmz> so actually a pruned node could do this: you need the UTXO set, not prior blocks
10:07 < robot-visions> pinheadmz: that makes sense to me!
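The exchange above can be made concrete with a toy model (plain Python; the dict-based UTXO set and block format here are illustrative, not Bitcoin Core's actual structures). Producing undo data requires the full outputs being spent, and those only exist in the UTXO set built from prior blocks:

```python
# Toy model of block undo data. The UTXO set maps outpoints to outputs.
# Connecting a block consumes the outputs it spends and records them as
# "undo" data; disconnecting replays the undo data to restore them.

def connect_block(utxo, block):
    """Spend inputs, add outputs; return undo data (the spent outputs)."""
    undo = []
    for outpoint in block["spends"]:
        # Without prior blocks this pop would fail: we'd know the
        # COutPoint but not the CTxOut it refers to.
        undo.append((outpoint, utxo.pop(outpoint)))
    for outpoint, txout in block["creates"]:
        utxo[outpoint] = txout
    return undo

def disconnect_block(utxo, block, undo):
    """Remove the block's outputs and restore the ones it spent."""
    for outpoint, _ in block["creates"]:
        del utxo[outpoint]
    for outpoint, txout in undo:
        utxo[outpoint] = txout
```

Note that `connect_block` is the only place the spent outputs are still in hand, which is exactly why the undo data for a block can only be produced while that block is being connected.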
10:07 < vasild> right, now that I think about it - the question is misleading. We need the utxo set, for which we need the prior blocks. So the answer would be "no", but if we somehow have the utxo set - then we are good.
10:08 < pinheadmz> it's a good question :-)
10:08 < robot-visions> Agreed :) (good question), a lot of interesting nuance
10:08 < jnewbery> vasild: the undo data is written in ConnectBlock(), which we only call for the tip block, so we need to have processed all the blocks up to that point
10:08 < sipa> well, you need *all* utxos spent by a block, which may include utxos created the block before, so it can't be created before at least having validated the previous block
10:08 < sipa> and after the block itself is activated, those utxos are gone
10:09 < sipa> so the only time the undo data can be created for block B is in the narrow timespan between blocks B-1 and B+1
10:09 < pinheadmz> this is making me wonder what happens if a node is pruning with depth (say) 100, and 101 blocks get disconnected
10:10 < sipa> pinheadmz: all hell breaks loose; a pruned node cannot reorg past its prune point
10:10 < jnewbery> perhaps it's a trick question. If you're pruning, then you may have discarded the old raw blocks when you connected blocks at the tip
10:10 < pinheadmz> i suppose it doesn't make a difference right?
10:10 < pinheadmz> sipa: is that really right?
10:10 < jnewbery> pinheadmz: it is. You'd need to redownload the entire chain
10:10 < pinheadmz> shit!
10:11 < pinheadmz> and it's because we don't have the undo data for block tip-101
10:11 < sipa> that's why you cannot prune less than 288 blocks deep
10:11 < sipa> undo data is useless without the corresponding block data
10:11 < jnewbery> that's why we have a healthy margin of error: https://github.com/bitcoin/bitcoin/blob/5dcb0615898216c503e965a01d855a5999a586b5/src/validation.cpp#L3951
10:11 < sipa> as it doesn't contain the txids
10:11 < vasild> luckily the minimum prune data one has to keep is 500MB IIRC, so that would allow a reorg of like 500MB worth of blocks. 500 blocks * 10 minutes = 3.5 days
10:11 < pinheadmz> i always thought the 288 rule was just so we could still relay recent blocks to peers
10:11 < pinheadmz> hadn't considered this
10:12 < sipa> pinheadmz: the pruned relay came much later; for several major releases pruning disabled all block fetching
10:12 < pinheadmz> sure makes sense
10:12 < vasild> Do we ever modify existing data in the blocks database, or do we just append new blocks and their undos?
10:12 < felixweis> to restore the utxo we need key=(txid,index) value=(script,amount,blockheight,is_coinbase). some info (like original txid) is inferred from the last block (in blk*.dat), the rest is stored in rev*.dat, inputs are in the same canonical order as the inputs in the block spending. (iirc)
10:13 < pinheadmz> was the decision about the 288 value controversial at the time?
10:13 < sipa> vasild: the blocks leveldb db?
10:13 < felixweis> i think if there is ever a 288 block reorg, all hell breaks loose outside of the software realm
10:14 < sipa> pruning modifies existing blocks in it
10:14 < vasild> I mean the blk*.dat and rev*.dat
10:14 < sipa> no, they're write once & delete when pruning
10:14 < robot-visions> vasild: I think we never modify data already written to the blk*.dat and rev*.dat files (but is it possible to prune / delete later)?
10:14 < robot-visions> thanks sipa: for going back in time and answering my question before I wrote it
10:15 < vasild> right, so we only append to blk*.dat and rev*.dat and never update/modify or delete (except deletion when pruning)
10:15 < sipa> in which case files are deleted as a whole
10:16 < jnewbery> I found it confusing that ThreadImport is calling a function called SaveBlockToDisk (through AcceptBlock): https://doxygen.bitcoincore.org/validation_8cpp.html#a0042314b4feb61589ccfda9208b15306
10:17 < oyster> vasild: how does "not modifying existing blk* and rev* data" square with: "In addition to the blk*.dat files, we also generate and store “undo” information for each block in corresponding rev*.dat files. This can be used to revert all the transactions from the block if we need to disconnect the block from the chain during a reorg. Unlike blocks, this information is always stored in block height order."
10:17 < jnewbery> (actually I found the whole passing down of `const FlatFilePos* dbp` through that call stack confusing)
10:17 < oyster> Does it mean that, in order to ensure "information is always stored in block height order", the rev data isn't written until all the blocks up to that point are processed?
10:17 < vasild> oyster: you mean what happens with blk*.dat and rev*.dat when we disconnect a block?
10:18 < sipa> oyster: the undo (rev) data does not *exist* until a block is activated
10:18 < vasild> oyster: yes!
10:18 < sipa> and it can only be activated when its parent is active
10:18 < oyster> sipa ok that's what I'm thinking
10:18 < felixweis> i'm still struggling to understand how blocks in blk* are out of order, but rev is in order. the pruning mechanism deletes blk and rev of the same number? what would happen if the tip is artificially stored in blk000000.dat right after the genesis block?
10:18 < jnewbery> oyster: correct. See scrollback. A block's undo data can only be written as the block is connected
10:18 < sipa> felixweis: then blk00000.dat will not be pruned
10:19 < vasild> felixweis: it helps if you ignore pruning, it is not very relevant for this
10:19 < sipa> (and perhaps nothing would be pruned, unsure)
10:19 < robot-visions> Could heights in rev* be out of order if there was a disconnect?
10:19 < felixweis> thanks, sorry for derailing
10:19 < sipa> robot-visions: rev* is partially ordered (a block's parent always comes before the child)
10:20 < robot-visions> Makes sense, thanks!
10:20 < sipa> in the case of multiple branches it's possible that parent and child blocks are not exactly adjacent
10:20 < oyster> so if it's the case that rev data isn't written until the block is activated/connected, how is it the case that blk*.dat and rev*.dat files always contain the same block info? What if a node starts and only hears about 128MB worth of blocks that are beyond its tip? wouldn't those be written to blk* but couldn't be written to the corresponding rev number?
10:20 < vasild> so, the point is - we dump blocks in blk*.dat as they come from the network, which is usually out of order, but we cannot do the same with undo - which we can only generate once all previous blocks are activated
10:20 < pinheadmz> sipa: there is an index right? block hash => filename and start position ?
10:21 < sipa> pinheadmz: yes, the block index
10:21 < oyster> I think my question might be the same as felixweis's
10:21 < pinheadmz> so file position doesn't really matter
10:21 < robot-visions> oyster: if a block was written to blk_N.dat, then when we're writing the undo data, we open rev_N.dat and append it there
10:22 < sipa> blk* files are size limited
10:22 < robot-visions> rev_N are not size limited though?
10:22 < vasild> oyster: we come back later, when we have the undo, and append it to the proper rev*.dat file - the one which corresponds to the block's blk*.dat file. Notice that this is also always the last rev*.dat file.
10:22 < sipa> rev* files just contain the undo data for the corresponding blocks in the same numbered blk file
10:22 < sipa> indeed, rev* are not size limited
10:22 < oyster> ok thanks
10:22 < sipa> vasild: not necessarily in case of reorgs
10:22 < nehan_> sipa: they implicitly are because blk files are, right?
10:23 < vasild> sipa: hmm, right!
10:23 < sipa> nehan_: sure, but there is no explicit "rev files are limited to N MB" check in the code
10:23 < vasild> How is space allocated in the files when new data is appended? Why?
10:24 < felixweis> to have the data less fragmented
10:24 < sipa> you could in theory have a rev file up to 250 MB or so
10:24 < robot-visions> It's allocated in chunks (1 MB for undo, 16 MB for block), I'm guessing to reduce the number of allocations?
10:24 < sipa> robot-visions: yeah, to reduce fragmentation
10:25 < vasild> filesystem fragmentation
10:25 < robot-visions> basic question: why does allocating data in chunks of 1 MB reduce filesystem fragmentation?
10:25 < sipa> sorry, rev data for one block up to 250 MB, if it spent 25000 outputs all with a 10000 byte scriptPubKey
10:25 < felixweis> i see how this is super useful with HDDs, what if the underlying disk is an SSD? have there been benchmarks done?
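The chunked allocation robot-visions describes (1 MB chunks for undo files, 16 MB for block files) boils down to rounding arithmetic: the allocated size is kept at a multiple of the chunk size, and a write only grows the file when it crosses a chunk boundary. A sketch, with a helper name that is illustrative rather than Bitcoin Core's:

```python
# Sketch of chunked preallocation: space is claimed in fixed-size chunks
# so the filesystem can hand out larger contiguous extents (less
# fragmentation) and so a "disk full" condition surfaces early.

BLOCKFILE_CHUNK = 16 * 1024 * 1024  # 16 MiB for blk*.dat
UNDOFILE_CHUNK = 1 * 1024 * 1024    # 1 MiB for rev*.dat

def bytes_to_preallocate(pos, add_size, chunk):
    """Extra bytes to allocate so a write of add_size at pos fits.

    The allocated size is always a whole number of chunks; returns 0
    when the write already fits in the current allocation.
    """
    old_chunks = (pos + chunk - 1) // chunk
    new_chunks = (pos + add_size + chunk - 1) // chunk
    return (new_chunks - old_chunks) * chunk
```

Most appends land inside an already-allocated chunk and cost nothing; only the write that crosses a boundary triggers a new (large, contiguous) allocation.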
10:25 < robot-visions> (what would happen if you always just extended by just enough to write the next piece of data)
10:25 < sipa> robot-visions: it depends on the filesystem
10:26 < oyster> seems like it depends on the FS block size
10:26 < sipa> felixweis: certainly filesystems on SSDs wouldn't be impacted as much
10:26 < vasild> filesystem != HDD vs SSD
10:27 < sipa> but filesystem fragmentation just increases overhead for the filesystem
10:27 < vasild> also, this pre-allocation guarantees that we will not get a "disk full" error when writing the data later.
10:27 < sipa> the preallocation is essentially us letting the OS know that this file will grow in the future
10:28 < vasild> So, we preallocate some space in advance, even if we don't know if we will fully utilize it. What does it mean to "finalize" a file?
10:30 < felixweis> vasild: crashing during pre-allocation if the disk is full is a nice side effect.
10:30 < robot-visions> I believe "finalize" means truncate unused space that was preallocated (in addition to the usual flush + sync)
10:30 < willcl_ark> am I right in thinking that blk*.dat files are never re-processed to re-order the blocks within them (after being finalised)?
10:30 < nehan_> vasild: flush the stream to disk
10:31 < nehan_> er, to the filesystem. i'm not sure where fsync() is called
10:31 < vasild> felixweis: yes :) I don't know if this guarding against disk full is very useful. We can still get other IO errors and we have to be prepared for that anyway, even if disk full is not one of them.
10:31 < jnewbery> willcl_ark: yes, that's right. They remain in the order in which they arrived.
10:31 < vasild> willcl_ark: yes, we never modify them later
10:32 < vasild> willcl_ark: notice that if we tried to reorder them later we may have to move blocks across blk files, if e.g. block 100 is in blk5 and block 101 is in blk4
10:33 < willcl_ark> ah, interesting
10:33 < sipa> felixweis: at the time it was written i doubt it was tested on SSDs
10:34 < robot-visions> nehan_: does the `fsync` happen as part of the `fclose`?
10:34 < vasild> nehan_: that is what the Flush() method does (fflush() + fsync()). Finalize is that + return the claimed space to the FS, because we know that we will not need it - we will not append to this file again.
10:34 < sipa> we fsync explicitly when flushing
10:34 < jnewbery> sipa: has there ever been discussion about storing compressed blocks in the blk files? Any idea how much space that could save?
10:35 < vasild> btw, it is funny that we do fflush() right after fopen()
10:35 < nehan_> ah, i see -- fsync() is in FileCommit which is always called during a flush. even if finalize is false?
10:36 < vasild> jnewbery: compressed as in a general compression algo, e.g. zlib?
10:36 < theStack> if one would delete all blk*/rev* files from a (stopped) node and replace them with a copy from another node, would that cause any problems?
10:36 < nehan_> robot-visions: fclose calls fflush but not fsync
10:36 < pinheadmz> jnewbery: I thought bitcoin data was too random to benefit from compression (all those hashes)
10:36 < vasild> nehan_: yes
10:37 < vasild> theStack: hah! I guess it will be out of sync with blocks/index/
10:37 < jnewbery> theStack: yes, because the leveldb blocks database stores where in the files the blocks are
10:37 < theStack> vasild: jnewbery: would starting with -reindex help in this case?
10:38 < vasild> I think it should fix the issue, but I am not very confident about that
10:38 < jnewbery> theStack: I believe so, but I'm also not confident
10:39 < jnewbery> pinheadmz: yeah you're right.
Headers can be compressed a bit, but block data not very much
10:39 < vasild> to be sure, I would `rm -fr blocks/index/` so at least if it breaks, it breaks with some "file not found" error instead of some obscure error from trying to look up a block at position X in file Y and finding something else there
10:40 < fjahr> theStack: I was pretty confident until I saw jnewbery's and vasild's answers ;) But -reindex rebuilds the block index, so I don't know why it wouldn't work
10:40 < theStack> i remember years ago people were offering blk/rev files via bittorrent, that's why i'm asking
10:40 < pinheadmz> theStack: that was before headers-first sync i believe
10:40 < theStack> i was always assuming that every node has the exact same block files, i.e. their hashes would match. today i learned that this is obviously not the case :)
10:41 < sipa> jnewbery: sure, i even worked on a custom compression format with gmaxwell before
10:41 < pinheadmz> now bitcoind pulls blocks in parallel just like bittorrent would
10:41 < vasild> I am using zfs with lz4 compression for my blocks directory. "compressratio 1.11x"
10:41 < sipa> jnewbery: savings are only 20-30% though; if block space is an issue, pruning is probably more important
10:41 < felixweis> sipa: the notes for that would be so awesomely interesting xD
10:42 < vasild> Main question: What is the bug that the PR is fixing?
10:42 < sipa> felixweis: blockstream is using it for satellite block broadcast
10:42 < felixweis> oh wow
10:42 < vasild> sipa: it is 1.11x compression ratio in my case (lz4)
10:43 < jnewbery> sipa: 20-30% saving on the header or the entire block?
10:43 < pinheadmz> sipa: does it introspect the bitcoin data?
for example i can imagine pruning default values like nlocktime or nsequence, maybe squeeze some bytes out of DER format
10:43 < vasild> That is counting everything inside ~/.bitcoin, but other stuff is too small
10:43 < robot-visions> vasild: If we're receiving new block data faster than the chain tip is moving, we could run into situations where we (1) finalize a rev* file, (2) write some more undo data to the file later, (3) don't finalize it again
10:44 < sipa> pinheadmz: yes, exactly
10:44 < pinheadmz> brilliant
10:44 < theStack> jnewbery: the header consists mostly of hashes (64 out of 80 bytes), i guess there is not much possible with compression
10:44 < sipa> jnewbery: whole block, but that number includes crazy stuff like exploiting pubkey recovery for p2pkh/p2wpkh
10:45 < sipa> which makes decompression slower than full validation
10:46 < jnewbery> sorry, I've derailed the conversation a bit. vasild's last question was: What is the bug that the PR is fixing?
10:46 < vasild> robot-visions: right, except it happens during IBD when we receive blocks out of order and finalize revN whenever we finalize blkN, but later come back and append stuff to revN and forget to finalize it
10:46 < robot-visions> (y)
10:46 < felixweis> i got 1.427x with bzip2 -9 on blk00010.dat to blk00019.dat (earlier blk files have lower ratios)
10:48 < felixweis> but takes 7s to decompress
10:48 < vasild> So, we should not finalize revN whenever finalizing blkN, unless we are sure that we will not come back to append more undo to it.
10:48 < emzy> so blocks on btrfs with lz4 compression is a good option?
10:48 < vasild> How would you reproduce the bug or show it exists?
10:49 < robot-visions> vasild: Is there harm in keeping the "finalize revN whenever finalizing blkN", but also adding additional revN finalizations as needed?
10:49 < robot-visions> (I think the benefit would be making the code simpler, and the downside would be sometimes an extra finalization—I'm not sure how much that would affect fragmentation) 10:50 < sipa> or we could benchmark how much the pre-allocation helps, and maybe get rid of ot entirely 10:50 < jonatack> +1 on benchmarking 10:50 < jnewbery> vasild: pedantically, we can't be *sure*, unless we've tried to connect every block in the blk file 10:50 < vasild> robot-visions: this is what the fix does - but not unconditionally "finalize revN whenever finalizing blkN". It is possible to detect if we will be going back to append 10:51 < vasild> I suspect the preallocation may turn out to have no effect on performance 10:52 < vasild> hmm, we already answered the next question: "How is the bug being fixed in the PR?" 10:52 < vasild> we natually moved to "how to improve things further" :) 10:52 < vasild> remove preallocation! :) 10:52 < robot-visions> :) 10:52 < robot-visions> Just to make sure I understand correctly: Could you flush more often than needed in the rare edge case where the same file has multiple blocks with `nHeight == vinfoBlockFile[_pos.nFile].nHeightLast`? 10:52 < jnewbery> (e.g. if there are two blocks of the same height in the blk file, there's always a chance that we might re-org to the other one) 10:53 < robot-visions> s/flush/finalize 10:53 < emzy> I think modern filesystems like zfs will cache / preallocate anyway in the background. 10:53 < vasild> What about even removing rev*.dat files altogether? We only disconnect the block from the tip and at this point we have the utxo, so we can generate the undo on the fly as needed, no? 10:54 < jnewbery> vasild: we can't do that. We need to recreate the utxos that were used up in the block we're disconnecting. 10:54 < felixweis> vasild: you'd need a -txindex 10:54 < nehan_> vasild: I don't see how you have the UTXO. 
10:55 < felixweis> or it will be painfully slow to find all those prevouts
10:55 < sipa> vasild: after the block is connected, the utxos it spent are gone from the utxo set
10:55 < sipa> so we need to keep them around somewhere
10:55 < sipa> that's what the undo files do
10:56 < vasild> hmm
10:57 < sipa> with txindex and no pruning they could in theory be recovered from the block data itself, but it would be slow (as they're scattered all over), and not be generically applicable
10:58 < vasild> it would save 29GB of disk space from rev*.dat
10:59 < vasild> what about pruning just the rev*.dat files then? it is not like we will ever need the undo for some blocks from years ago
10:59 < emzy> The reorg must be fast. So good to have it handy.
10:59 < sipa> vasild: why not prune everything?
11:00 < felixweis> vasild: even more on OSX pre 0.20 https://github.com/bitcoin/bitcoin/issues/17827
11:00 < sipa> if you care about disk space, you should prune
11:00 < vasild> well, yes, if we just prune rev* then we still have all the complications of maintaining them
11:01 < vasild> Time is up. Takeaway - try to ditch preallocation completely, but not rev* files completely :)
11:01 < sipa> well we need rev* data
11:01 < felixweis> Undo data is awesome
11:02 < jnewbery> that's time!
11:02 < felixweis> ultraprune ftw! (listen to the podcast and read all there is on bitcoin.stackexchange.com)
11:02 < felixweis> thanks everyone!
11:02 < theStack> thanks for hosting
11:02 < vasild> Thank you!
11:02 < nehan_> thanks!
11:02 < sipa> thanks!
11:02 < jnewbery> thanks everyone!
11:02 < willcl_ark> thanks!
11:02 < emzy> Thanks!!
11:03 < robot-visions> thanks!
11:03 < jonatack> Thanks vasild and everyone! Great meeting
11:03 < jnewbery> thanks vasild. Great job!
11:03 < oyster> thanks vasild
11:03 < vasild> #endmeeting
11:04 < jnewbery> I'll post the log later today. If anyone wants to host in the upcoming weeks, please let me know
11:04 < vasild> jnewbery: thanks for setting this whole thing up :)
11:06 < emzy> I will try the blocks on btrfs with lz4. I hope it saves 11% and is maybe a little bit faster.
11:07 < vasild> "so blocks on btrfs with lz4 compression is a good option?"
11:07 < vasild> emzy: you have to try
11:09 < vasild> a long time ago I enabled lz4 on everything, because the decompression is so fast that "read compressed from disk + uncompress" is faster than "read uncompressed from disk". Of course it depends on the data itself, disk and CPU speed.
11:10 < emzy> Yes, my thinking. And I have more than one VM that is at the disk space limit.
11:11 < vasild> ah, yes, I forgot - it also saves disk space!
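The modest ratios reported in this thread (lz4 ~1.11x, bzip2 ~1.4x) follow from block data being dominated by hashes and signatures, which are close to incompressible. A quick sketch using stand-in data (seeded pseudo-random bytes stand in for hash-like content; this is an illustration, not a measurement on real block files):

```python
# High-entropy data (like txids and signatures) barely compresses;
# structured, repetitive data compresses dramatically.
import random
import zlib

random.seed(0)
hash_like = bytes(random.getrandbits(8) for _ in range(100_000))
repetitive = b"\x00" * 100_000

ratio_hashes = len(hash_like) / len(zlib.compress(hash_like, 9))
ratio_zeros = len(repetitive) / len(zlib.compress(repetitive, 9))
# ratio_hashes is ~1.0; ratio_zeros is in the hundreds
```

Real blocks sit between these extremes: the serialization has some structure (script templates, default nSequence values, varint patterns), which is where the 10-40% savings come from.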
11:11 < emzy> 11% will help a little
11:14 < sipa> hmm, i wonder how well zopfli can compress block data
11:17 < emzy> zpaq has super good compression ratios http://mattmahoney.net/dc/zpaq.html
11:24 < sipa> -rw------- 1 pw pw 133925560 Apr 21 20:09 blk02043.dat
11:24 < sipa> -rw-rw-r-- 1 pw pw 111908911 Apr 22 11:14 blk02043.dat.bz2
11:24 < sipa> -rw-rw-r-- 1 pw pw 111091123 Apr 22 11:16 blk02043.dat.gzip.gz
11:24 < sipa> -rw-rw-r-- 1 pw pw 110891285 Apr 22 11:18 blk02043.dat.zopfli.gz
11:24 < sipa> -rw-rw-r-- 1 pw pw 100159172 Apr 22 11:12 blk02043.dat.xz
11:24 < sipa> -rw-rw-r-- 1 pw pw 99945288 Apr 22 11:23 blk02043.dat.zpaq
11:24 < emzy> wow, you are fast. And zpaq won :)
11:25 < sipa> -rw-rw-r-- 1 pw pw 113306837 Apr 22 11:24 blk02043.dat.lz4
11:25 < sipa> actually, these are unfair numbers
11:25 < sipa> if integrated into core, they'd be compressing block by block, rather than entire blk files
11:26 < sipa> and if used at the filesystem level, they'd be compressing per fs block
11:26 < emzy> right.
11:27 < emzy> but bz2 and lz4 are somehow block based.
12:07 < robot-visions> Hi, I have two follow-up questions from today's PR review club:
12:07 < robot-visions> 1) Is the UTXO set persisted to disk somewhere?
12:08 < robot-visions> 2) Could bitcoind crash after updating the UTXO set but before writing the undo data for a block?
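[Editor's note] sipa's per-file numbers above can be approximated with Python's standard-library codecs (lz4 and zpaq would need third-party packages, so only gzip/zlib, bz2, and xz/lzma are shown, and on a synthetic stand-in payload rather than a real blk*.dat file, so the ratios will differ from his):

```python
import bz2
import lzma
import zlib

# Stand-in for a serialized block file; real block data (hashes,
# signatures) is far less compressible than this repetitive payload.
payload = b"example block bytes " * 50_000

results = {
    "gzip/zlib": len(zlib.compress(payload, 9)),
    "bz2":       len(bz2.compress(payload, compresslevel=9)),
    "xz/lzma":   len(lzma.compress(payload, preset=9)),
}

# Smallest first, with the compressed size as a fraction of the original.
for name, size in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {size:8d} bytes ({size / len(payload):.1%} of original)")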
12:12 < sipa> 1) yes, the chainstate directory (reconstructing it from the block data would take hours, and be impossible when pruned)
12:13 < sipa> 2) shouldn't be, as the block data is always flushed before writing the block index, and the block index is always flushed before writing the chainstate
12:13 < sipa> (though now i want to go check)
12:14 < robot-visions> Thanks! For (2) I'm looking at CChainState::ConnectBlock, and in particular where WriteUndoDataForBlock is called
13:36 < jnewbery> Today's meeting log is posted: https://bitcoincore.reviews/17994.html
13:37 < jnewbery> I've just realised that today was review club meeting number 50 :)
13:37 < pinheadmz> woohoo!
13:37 < pinheadmz> 🚀
13:56 < jonatack> congrats!
🏆
14:03 < emzy> wooohooo