--- Log opened Wed Sep 21 00:00:26 2022
05:26 < NorrinRadd> 1700 UTC?
09:54 < larryruane_> yes, 1700 UTC https://bitcoincore.reviews/
10:00 < stickies-v> #startmeeting
10:00 < larryruane_> hi!
10:00 < alecc> hi
10:00 < amovfx> hello
10:01 < stickies-v> welcome everyone!
10:01 < stickies-v> Unfortunately, glozow isn't able to host the meeting today, so dergoegge and I will be guiding you through the wonderful world of block download stalling
10:01 < stickies-v> two hosts for the price of one!
10:01 < amovfx> osom
10:01 < dergoegge> Hi!
10:01 < adam2k> 🎉
10:02 < lightlike> hi
10:02 < stickies-v> the PR we're looking at is #25880, authored by lightlike (mzumsande), who luckily is here as well! The notes and questions are available on https://bitcoincore.reviews/25880
10:02 < stickies-v> anyone joining us for the first time today? even if you're just lurking, feel free to say hi!
10:03 < brunoerg> hi
10:03 < michaelfolkson> hi
10:03 < kouloumos> hi
10:05 < stickies-v> who got the chance to review the PR or read the notes? (y/n) And for those that were able to review it, would you give it a Concept ACK, approach ACK, tested ACK, or NACK?
10:05 < sipa> hi
10:05 < amovfx> y
10:05 < amovfx> concept ack, tested ack
10:05 < alecc> y
10:06 < larryruane_> review 0.5, at least concept ACK
10:06 < alecc> concept ack, didn't get a chance to test
10:06 < yashraj> y
10:06 < dergoegge> Concept ACK, did a rough first pass on the code
10:06 < brunoerg> concept ack
10:06 < stickies-v> amovfx: awesome that you tested it too - could you share a bit about how you did that?
10:07 < adam2k> concept ACK
10:07 < BlueMoon> Hello!!
10:07 < amovfx> I just got the branch and ran the functional test
10:07 < amovfx> maybe that doesn't count?
10:07 < amovfx> would a proper test be me creating my own test?
10:08 < stickies-v> alright, that's a great start! usually for a tested ACK we mean testing that goes beyond the functional tests, though, because the CI runs the functional tests anyway - so them failing would get picked up pretty quickly
10:09 < amovfx> ah okay, good to know
10:09 < stickies-v> what "testing" means depends on the PR, your understanding of it, and how creative you can get in trying to poke holes in it, so there's no one definition. In general, it's also good to describe in your review how you tested something, so people can replicate and/or test other angles
10:09 < amovfx> great tip, ty
10:09 < lightlike> what I did to test this at some point was to locally reduce MAX_BLOCKS_IN_TRANSIT_PER_PEER=2 and BLOCK_DOWNLOAD_WINDOW=16 (or similar), and run with -onlynet=tor - that way, I would get into stalling situations very soon.
10:09 < amovfx> I'm going to have to get a lot better before I can do that
10:09 < larryruane_> I'm running the PR now, and one fun thing to do during IBD is `bitcoin-cli getpeerinfo` - you can see how the outstanding block requests are more or less interleaved among the peers (no 2 peers downloading the same block)
10:10 < amovfx> what network did you do that on lightlike? reg or test?
10:10 < larryruane_> to get a simplified view: `bitcoin-cli getpeerinfo | jq '.[]|.inflight'`
10:11 < dergoegge> Alright, let's get started with the first question: Without looking at the PR, what solutions would you consider?
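To make lightlike's testing recipe above concrete: it amounts to locally shrinking two block-download constants in net_processing.cpp and then syncing over a slow network such as Tor. The snippet below is a rough, local-only sketch; the declarations are paraphrased rather than copied from the source tree, and a change like this is for experimentation only, never for merging.

```cpp
// Local-only tweak to reproduce block download stalling quickly, following
// lightlike's suggestion above. Declarations are paraphrased; check
// net_processing.cpp for the real ones, and never commit values like these.
static const int MAX_BLOCKS_IN_TRANSIT_PER_PEER = 2;  // upstream default: 16
static const int BLOCK_DOWNLOAD_WINDOW = 16;          // upstream default: 1024
```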
10:11 < dergoegge> (Feel free to be creative)
10:11 < lightlike> amovfx: I did it on mainnet (to have blocks that are large and take more than 2s to download)
10:11 < amovfx> ty
10:11 < amovfx> I think Thompson sampling would be good for something like this
10:11 < amovfx> prolly way overkill though
10:11 < amovfx> https://en.wikipedia.org/wiki/Thompson_sampling
10:12 < amovfx> Helps with multi-armed bandit problems
10:13 < dergoegge> amovfx: haven't heard about that before, is there a simple explanation of how that would apply to our problem here?
10:13 < stickies-v> oh dear, I didn't know we had armed bandits on the bitcoin network, let alone multi-armed
10:13 < larryruane_> there's a great podcast on this topic, featuring one of our esteemed attendees here today! https://podcast.chaincode.com/2020/01/27/pieter-wuille-1.html
10:14 < larryruane_> (podcast isn't about Thompson sampling, it's about IBD, initial block download)
10:14 < alecc> some clarification on the PR, why does it keep stalling on newly connected peers after only one stalls? i.e. is it intentional to assume new peers are stalling? i was thinking a possible alternative to adjusting the stall window would be to change this assumption? maybe i'm misinterpreting some part
10:14 < amovfx> I think it would help with selecting nodes that have good behavior: it randomly samples and starts building a confidence metric on the nodes that are returning what you want, in this case blocks
10:15 < amovfx> so a node that stalls out would have a drop in confidence, one that doesn't gains increased confidence
10:15 < amovfx> the higher the confidence, the more you sample from that node
10:16 < larryruane_> alecc: I think you've hit upon an alternate design idea, maybe when we hit this stall situation, temporarily increase the 1024 window ... then somehow reduce it gradually
10:16 < larryruane_> amovfx: really interesting, thanks
10:17 < stickies-v> dergoegge: perhaps one alternative solution would be, instead of disconnecting the peer, allowing one (or more) extra peers to download the same block that's being stalled? that would increase bandwidth usage though, so probably less preferable than what this PR proposes
10:17 < stickies-v> larryruane_: alecc: interesting idea. you'd still have to eventually force something if the peer keeps not sending the block until infinity though, I think?
10:18 < dergoegge> stickies-v: yea that might work in some cases, but if you are on a slow connection it would probably make things worse
10:18 < amovfx> yea, this PR helps when the operator happens to have a bad connection, correct?
10:18 < lightlike> I think if we have a peer that is significantly slower than the rest of the peers, such that it stalls the download, we probably want to disconnect it and exchange it for a better peer. It's not like we want to keep our current peers at all costs.
10:19 < dergoegge> a really simple alternative to the PR is to just increase the stalling timeout instead of making it adaptive
10:19 < stickies-v> (the question didn't state to come up with GOOD solutions :-D )
10:19 < larryruane_> well, I guess I was thinking, when we hit the stall condition, still boot that peer (as we do today), but when we then connect the replacement peer, have the window be larger ... (not sure how to shrink it again)
10:19 < amovfx> Yea, it seems like this PR would find the fastest peers faster
10:20 < amovfx> this would converge on a stall time correct?
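For readers curious what amovfx's Thompson-sampling suggestion could look like, the sketch below is purely illustrative and is not part of PR #25880 or Bitcoin Core; every name in it is hypothetical. It shows the multi-armed-bandit shape of the idea: keep a Beta posterior of "delivered the requested block in time" per peer, sample from each posterior, and prefer the peer with the highest sample.

```cpp
// Hypothetical illustration of amovfx's Thompson-sampling idea (NOT the
// approach taken by PR #25880): treat each peer as a bandit arm whose reward
// is delivering a requested block in time, keep a Beta posterior per peer,
// and prefer peers whose sampled success probability is highest.
#include <cstddef>
#include <random>
#include <vector>

struct PeerStats {
    double ok{1.0};      // Beta alpha: successes + 1 (uniform prior)
    double stalled{1.0}; // Beta beta:  failures + 1
};

// Sample Beta(a, b) as X/(X+Y) with X~Gamma(a,1), Y~Gamma(b,1).
static double SampleBeta(std::mt19937& rng, double a, double b)
{
    std::gamma_distribution<double> ga(a, 1.0), gb(b, 1.0);
    const double x = ga(rng), y = gb(rng);
    return x / (x + y);
}

// Pick the peer (index) to request the next block from; assumes peers is non-empty.
std::size_t PickPeer(std::mt19937& rng, const std::vector<PeerStats>& peers)
{
    std::size_t best = 0;
    double best_sample = -1.0;
    for (std::size_t i = 0; i < peers.size(); ++i) {
        const double s = SampleBeta(rng, peers[i].ok, peers[i].stalled);
        if (s > best_sample) { best_sample = s; best = i; }
    }
    return best;
}

// After the request resolves, update that peer's posterior.
void RecordResult(PeerStats& peer, bool delivered_in_time)
{
    (delivered_in_time ? peer.ok : peer.stalled) += 1.0;
}
```

As lightlike notes later in the discussion, per-peer download-time data is not currently tracked, so an approach like this would be a more complicated alternative than the adaptive timeout the PR actually takes.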
10:20 < stickies-v> alright, I'll launch the next question already, but as always - feel free to continue talking about previous/side points. we're async, whoo
10:20 < stickies-v> under what circumstances does a node consider block download to be stalling?
10:20 < larryruane_> I think that's answered in the notes? (maybe there's more to it)
10:21 < stickies-v> in your own words, maybe? quick summary? always helpful to distill info!
10:21 < amovfx> if it hist the timeout?
10:21 < amovfx> hits*
10:21 < stickies-v> what's it? what's the timeout?
10:21 < yashraj> block window does not move?
10:21 < larryruane_> stickies-v: yes, of course, sorry! :) ... one peer is holding us back from making progress
10:22 < amovfx> +1 yashraj
10:22 < larryruane_> so if it wasn't for the ONE peer, we'd be able to jump ahead 1024 blocks
10:22 < alecc> for the actual code, when it's processing a getdata message in `net_processing.cpp`, in `FindNextBlocksToDownload` it sets the stalling bit, and then sets `m_stalling_since` to the current time based on that
10:23 < alecc> and then checks `m_stalling_since` against the current time minus the timeout to determine whether to disconnect
10:23 < stickies-v> larryruane_: yes, except that there could be more than one stalling peer
10:24 < larryruane_> I did have a question about the text in the notes though, which is "The node is unable to connect a new chain of blocks past the current tip, e.g. if the tip is at height i and blocks [i+2: i+1024] have arrived, but block i+1 hasn't." .... my question is, couldn't there be other missing blocks in [i+2: i+1024] BUT all those blocks are assigned to the same peer (as block i+1 is assigned to)? ... small difference
10:24 < stickies-v> or rather, more than one peer that hasn't yet delivered all blocks in time
10:25 < yashraj> rookie question - what does tip mean exactly? highest validated block?
10:25 < stickies-v> yashraj: yeah, exactly, the window doesn't move. To add to that: we've *requested* all blocks in the window, but we've not fully received all blocks in the window yet
10:25 < amovfx> yashraj: yea
10:26 < larryruane_> stickies-v: "more than one peer" ... okay I definitely know less than I thought! (not usual!)
10:26 < lightlike> larryruane_: yes, that is possible (and probably the most usual case) that a couple of blocks from the 1024 window were assigned to the staller.
10:26 < stickies-v> larryruane_: yeah I think there could be more missing blocks after that, but would they all need to be assigned to the same peer?
10:27 < larryruane_> stickies-v: well I thought if they weren't all assigned to the same peer, then we wouldn't be considering ourselves stalled?
10:27 < sipa> lightlike: Agree, there is no reason to keep our peers at all costs, though this is dependent on the fact that this is only relevant during IBD, when we mostly care about bandwidth and less about other properties (like partition resistance, latency, ...). After IBD, we're much more careful about when to disconnect.
10:28 < amovfx> that's neat, like IBD is a state of the node
10:28 < stickies-v> larryruane_: interesting, I thought we did. lightlike, what do you think?
10:29 < yashraj> stickies-v: if i+1 arrived but some of the rest have not, we still consider it stalled?
10:29 < sipa> And because I see people talking about increasing the 1024 window: I don't think that helps. For most of IBD, 1024 blocks is way too much; that value was set at a time when blocks were tiny. It still makes sense early in the chain, but arguably we should be shrinking it.
10:29 < larryruane_> amovfx: yes, and just as a side note, a node *always* starts in IBD state, but if it's synced with the block chain (or nearly so) then it's only in IBD momentarily
10:29 < amovfx> ty
10:29 < lightlike> larryruane_, stickies-v: we consider ourselves stalled if we have a peer with an open slot to which we want to assign a block, but can't because all of the blocks in the 1024 window are already either present or assigned to someone else. So yes, I think it's possible to have multiple stallers.
10:30 < stickies-v> ty for clearing it up!
10:30 < larryruane_> lightlike: excellent explanation, thanks
10:30 < dergoegge> Next question: What problem is this PR trying to address? How might you be able to observe this issue, and how common could it be?
10:31 < larryruane_> very basically, we disconnect peers who are not necessarily that bad! so lots of churn, and we stall more than we should
10:31 < alecc> yashraj: i think it's the opposite - if all but i+1 arrived (i+2 -> i+1024) and we haven't seen i+1 for a certain amount of time, we consider ourselves stalled. i guess i'm not super sure exactly what happens if i+1 arrives and then the others are taking a while
10:31 < amovfx> I think previously we could fill up with really slow peers
10:31 < amovfx> and we are slowed for a long time
10:31 < amovfx> oh nm
10:32 < sipa> lightlike: The staller is the peer we have requested the first block in the window from, so there can only be one at a time.
10:32 < sipa> (because that's the one that prevents the window from moving forward)
10:32 < larryruane_> you can observe this issue by looking at `debug.log` for "disconnecting" messages during IBD
10:32 < alecc> dergoegge: i saw in the PR notes it mentioned using tor was one way to observe, as one example of a situation with slower network speeds
10:32 < amovfx> larryruane_: so prior to this PR, there would be far more disconnecting peers
10:33 < dergoegge> alecc: yes! if you run a node on a slow network you will likely run into the issue the PR is trying to address
10:33 < larryruane_> amovfx: yes, that's my understanding
10:34 < lightlike> sipa: yes, that is true. I meant that it could be a possible situation that multiple peers are actually stalling us (withholding blocks from us); they will get marked as stallers one after another - but only one at the same time.
10:34 < dergoegge> larryruane_: i think the bigger issue is having a slow connection yourself and then churning through peers, not necessarily individual slow peers
10:36 < larryruane_> dergoegge: yes, because that 2 seconds for the new (replacement) peer to respond might always be exceeded if our own network is slow!
10:36 < amovfx> so this would be an attack vector too, a bad actor could eclipse a new node and just stall sending blocks
10:36 < lightlike> I ran into this when I attempted IBD earlier this year on a slow connection with debug=net enabled - the log consisted of minutes of peers being connected and disconnected for stalling.
10:37 < larryruane_> is there an easy way to artificially give your node a slow connection?
10:37 < larryruane_> (for testing)
10:37 < dergoegge> amovfx: would you say that the PR makes that attack vector worse?
10:38 < amovfx> I don't have the understanding to say whether it does or if it could be exploited.
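To tie lightlike's and sipa's explanations together, here is a simplified, self-contained sketch of the stalling check. It is not the real net_processing.cpp logic; all names other than BLOCK_DOWNLOAD_WINDOW are illustrative, and the bookkeeping is deliberately reduced to the bare minimum needed to show the condition.

```cpp
// Simplified sketch of the stalling condition described above: if a peer
// still has a free download slot but every block in the 1024-block window is
// either already received or in flight with some other peer, the download is
// stalled, and the staller is the peer we requested the first missing block
// from (the block that keeps the window from moving forward).
#include <map>
#include <optional>
#include <set>

using NodeId = int;
using Height = int;

constexpr int BLOCK_DOWNLOAD_WINDOW{1024};

struct WindowState {
    Height tip;                          // last connected block
    std::set<Height> received;           // received but not yet connected
    std::map<Height, NodeId> in_flight;  // outstanding requests per height
};

// Called when `peer` has a free slot but we are deciding what to give it.
// Returns the staller to mark, if the window turns out to be exhausted.
std::optional<NodeId> DetectStaller(const WindowState& s, NodeId peer)
{
    for (Height h = s.tip + 1; h <= s.tip + BLOCK_DOWNLOAD_WINDOW; ++h) {
        if (!s.received.count(h) && !s.in_flight.count(h)) {
            return std::nullopt; // something is left to assign: not stalled
        }
    }
    // Window exhausted: the peer holding the first not-yet-received block is
    // the one preventing the window from moving forward.
    for (Height h = s.tip + 1; h <= s.tip + BLOCK_DOWNLOAD_WINDOW; ++h) {
        if (!s.received.count(h)) {
            const auto it = s.in_flight.find(h);
            if (it != s.in_flight.end() && it->second != peer) return it->second;
            return std::nullopt; // the first missing block is our own request
        }
    }
    return std::nullopt; // whole window received: it will simply move forward
}
```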
10:38 < amovfx> I think it would make it more secure as we have an adaptive time out, so the attacker could use the max time out consistently
10:38 < amovfx> my intuition tells me this is an improvement
10:39 < alecc> dergoegge: it seems the PR is generally leaning towards assuming some nodes are not stalling, so I'd imagine if you were eclipsed, by extending the allowed stall window it would take longer to maybe escape the eclipse, I'm thinking
10:39 < stickies-v> larryruane_: perhaps one option is to give all the peers of your node (assuming you control them in your test setup) a low `-maxuploadtarget`?
10:39 < dergoegge> amovfx: you can do much worse than stalling IBD if you are able to eclipse someone, so i think this is not really a concern
10:39 < amovfx> cool, good to know
10:39 < alecc> that makes sense
10:40 < stickies-v> next question: before this PR, if a node has such a slow internet connection that it cannot download blocks in less than 2 seconds, will it be continuously stalled and churning through peers? Why or why not?
10:40 < lightlike> alecc: I would say this PR leans towards assuming not __all of our peers__ are stalling. The first staller will be disconnected as fast as before, just subsequent peers will be given more time if that happened.
10:41 < lightlike> so yes, you are right.
10:41 < amovfx> stickies: I believe so, because all peers hit the max time out window of 2s
10:41 < larryruane_> stickies-v: I would say no, because if our connection to ALL peers is more or less uniformly slow, then we wouldn't get into this situation where we're waiting for only ONE peer
10:42 < alecc> lightlike: makes sense, good to clarify that
10:43 < stickies-v> larryruane_: the uniformly slow is a pretty specific assumption though
10:43 < stickies-v> in a window of 1024 blocks and 16 blocks per peer, I'd say at the end of the window there's likely quite a bit of variation?
10:43 < larryruane_> stickies-v: sorry, I meant if *our* network is slow, then all of our peers would look slow to us
10:44 < lightlike> but not equally slow. some peers might be even slower than we are in this situation.
10:44 < amovfx> larryruane_: that's what I thought this PR would fix, if the operator is on potato internet, they get more time to connect
10:44 < larryruane_> to be clear, I was trying to say why, even without this PR, we *wouldn't* continuously stall and churn, that we *would* make progress (but i'm not entirely sure)
10:45 < stickies-v> yeah, but still I think there would be enough variation to generally end up with at least one peer being marked as stalling
10:45 < yashraj> if we increased the timeout to 4s, downloaded the block, and so reduced it to 2s again, wouldn't we likely get stalled again?
10:46 < yashraj> sorry if I side-tracked a bit there
10:47 < lightlike> yashraj: That's a great question! I think it's possible, but we would have made some progress along the way, connected some blocks and moved the window forward a bit.
10:47 < dergoegge> yashraj: the current state of the PR decreases the timeout by a factor of 0.85, not 0.5 (the PR description should be updated)
10:48 < lightlike> if we have a slow connection, that doesn't mean we are more likely to get into stalling situations - for that to happen, we need some peers that are much slower compared to others.
10:48 < alecc> yashraj: to elaborate on what lightlike said, the stall timeout also decreases at a slower rate than it increases, so it would work against the situation where you just flip back and forth between increasing the window and decreasing it
10:48 < stickies-v> if we can't download any block in < 2 seconds, then as soon as we end up with one stalling peer, we would disconnect that peer. afterwards, we would give every new peer 2 seconds to catch up, but that would never happen, so we would be stuck forever I think?
10:48 < yashraj> dergoegge: oh yeah, sorry, forgot about that comment
10:49 < lightlike> but if we have a slow connection, we would have a hard time getting out of a stalling situation.
10:49 < larryruane_> lightlike: yes I agree, remember that the 2s timeout (or with the PR, it can be greater) comes into play ONLY if we've entered this "stalled" condition ... with similarly slow peers (or like if our own network connection is slow), we may never enter that state
10:49 < larryruane_> lightlike: ah, interesting observation
10:49 < yashraj> great point alecc, and dergoegge: thanks
10:49 < stickies-v> bonus question: are there any (edge) cases where we wouldn't detect we have a stalling node?
10:50 < alecc> someone might've answered this already, but when considering whether we're stalling, does it consider the download times of all other peers (compare if one is relatively slower)?
10:50 < larryruane_> alecc: i don't think so
10:50 < stickies-v> no
10:50 < amovfx> I can't see one
10:51 < dergoegge> alecc: that could have been an alternative to the approach of the PR
10:51 < lightlike> alecc: this data isn't currently tracked. It would be a (more complicated) alternative approach to gather this data, and have some algorithm based on that.
10:51 < larryruane_> stickies-v: on the bonus question, you mean with the PR?
10:52 < yashraj> can a peer serve other blocks within 2s but not the i+1?
10:52 < stickies-v> larryruane_: the edge case I had in mind applies both with and without this PR, but either would be interesting to hear!
10:52 < alecc> dergoegge: lightlike: makes sense
10:53 < dergoegge> yashraj: yes, but the stalling timeout is only reset if a requested block is received
10:54 < larryruane_> stickies-v: that's a toughie .. i can't think of it
10:55 < stickies-v> I think if all of our nodes are just not sending us any blocks at all (probably malicious, or some serious technical issues), then we would just not make any progress at all, and no single node would be marked as stalling
10:55 < stickies-v> it's a very unlikely scenario though, just... a theoretical edge case
10:55 < amovfx> if you are eclipsed, that could happen
10:55 < larryruane_> good one!
10:56 < alecc> stickies-v: would churning through not eventually find an honest node? or in the case that you're eclipsed, there's not much you could even do, right
10:56 < alecc> or i guess does churning through allow you to repeat connections to the same nodes (that would never be marked stalling)?
10:57 < dergoegge> stickies-v: we have a separate timeout for this case (Hint: have a look at m_downloading_since), after which we disconnect those peers.
10:57 < dergoegge> Last question: What approach does this PR take? Is it an appropriate solution?
10:57 < yashraj> can we run out of peers by churning? like for outbound-only?
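A minimal sketch of the per-peer stall timer being discussed: when it starts, when it is reset, and when it triggers a disconnect. This is a simplification under the assumptions stated in the comments, not the actual Bitcoin Core code; only the 2s default and the "reset on a received requested block" behaviour come from the discussion above.

```cpp
// Simplified sketch (not the real net_processing.cpp code) of the stall
// timer: it starts when a peer is identified as the staller, is cleared when
// that peer delivers a requested block, and the peer is disconnected once the
// timer exceeds the stalling timeout (2s by default before this PR). A peer
// set that sends us nothing at all never trips this check; that case is
// handled by a separate, slower download timeout (see m_downloading_since),
// whose details are omitted here.
#include <chrono>
#include <optional>

using namespace std::chrono_literals;

constexpr auto BLOCK_STALLING_TIMEOUT{2s};

struct PeerDownloadState {
    // When this peer was marked as stalling the download window, if ever.
    std::optional<std::chrono::steady_clock::time_point> m_stalling_since;
};

void MarkStalling(PeerDownloadState& peer, std::chrono::steady_clock::time_point now)
{
    if (!peer.m_stalling_since) peer.m_stalling_since = now; // start the clock once
}

void OnRequestedBlockReceived(PeerDownloadState& peer)
{
    peer.m_stalling_since.reset(); // the window can move again; stop the clock
}

bool ShouldDisconnectForStalling(const PeerDownloadState& peer,
                                 std::chrono::steady_clock::time_point now)
{
    return peer.m_stalling_since && now - *peer.m_stalling_since > BLOCK_STALLING_TIMEOUT;
}
```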
10:58 < stickies-v> alecc: so we wouldn't disconnect them for stalling, but luckily, as dergoegge points out, there are other mechanisms that disconnect nodes when they're not sending us anything - and hopefully eventually we'd stumble upon a node that's able to send us something! that second mechanism is slower than the stalling disconnect, though
10:58 < amovfx> dergoegge: adding an adaptive timer? I think it is an appropriate solution.
10:58 < dergoegge> yashraj: your node will make new outbound connections continuously, so running out is not really possible i think
10:58 < amovfx> Sure hope we get to that atomic question
10:58 < alecc> stickies-v: ahh
10:59 < amovfx> my concurrency game is weak, I need that explained
10:59 < lightlike> yashraj: unlikely, if we know about enough peers in our addrman. also, I don't think we ban these peers, so nothing prevents us from re-connecting to a previously disconnected peer.
10:59 < dergoegge> amovfx: yes, how does it adapt though?
10:59 < amovfx> when a peer is dropped, the disconnect time increases
10:59 < amovfx> for the next peer
10:59 < amovfx> ?
11:00 < larryruane_> dergoegge: "Last question" -- we still disconnect the peer that's stalling us, as before, but we increase the 2s timeout for the replacement peer .. and increase it again if it happens again, etc. (exponentially)
11:00 < yashraj> thanks dergoegge: and lightlike:
11:00 < dergoegge> larryruane_: right!
11:00 < amovfx> and if a peer gives a block, the window shrinks?
11:00 < alecc> amovfx: from what i could find, using std::atomic allows for some atomic operations that let different threads edit memory at the same time without unexpected behavior (compare_exchange_strong is used in the PR for this reason). i think this is easier than using another mutex/locking? there's probably more to it
11:00 < amovfx> window = timeout window
11:01 < dergoegge> amovfx: the timeout shrinks (by a factor of 0.85) if a new block is connected
11:01 < amovfx> rgr
11:01 < amovfx> alecc: ty
11:02 < stickies-v> alecc: you're right that we use std::atomic to protect against UB during threading, but std::atomic doesn't allow multiple threads to edit memory at the same time - it just prevents data races from happening by using atomic operations, i.e. it makes sure that one thread waits for the other one to finish before proceeding
11:03 < dergoegge> #endmeeting
11:03 < amovfx> did we need that because of the cs_main lock on m_num_preferred_download_peers? What are the threads that are accessing this data here?
11:03 < alecc> stickies-v: ohhh
11:03 < dergoegge> thanks everyone for coming!!!
11:03 < amovfx> Thanks stickies and dergoegge!
11:03 < amovfx> I learned a lot this week
11:03 < larryruane_> dergoegge: stickies-v: thanks, this was really educational!
11:03 < alecc> thanks stickies-v, thanks dergoegge!
11:03 < amovfx> I don't suppose anyone can give pointers on thread sanitizers on M1 mac?
11:03 < amovfx> I think I might have to start developing in a docker container
11:04 < yashraj> thanks everyone, these sessions are mind-expanding, especially for those of us that need it expanded a lot.
11:04 < stickies-v> amovfx: yeah, we have one thread, `threadMessageHandler`, that's continuously calling `SendMessages()` for each peer, which reads and writes `m_block_stalling_timeout`. And then additionally, `m_block_stalling_timeout` is also used in `PeerManagerImpl::BlockConnected()`
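Putting larryruane_'s, dergoegge's, and stickies-v's points together, the adaptive timeout roughly has the following shape. This is a sketch, not the PR's code: the 2s default and the 0.85 shrink factor come from the discussion above, while the 64s cap and the exact doubling step are assumptions made for illustration. The atomic compare-and-exchange is there because the value is read and written from both the message handler thread (`SendMessages()`) and the validation callback (`PeerManagerImpl::BlockConnected()`).

```cpp
// Rough sketch of the adaptive stalling timeout described above, NOT the
// PR's exact code. The 2s default and 0.85 factor come from the discussion;
// the 64s cap and the doubling step are illustrative assumptions.
#include <algorithm>
#include <atomic>
#include <chrono>

using namespace std::chrono_literals;

constexpr auto BLOCK_STALLING_TIMEOUT_DEFAULT{2s};
constexpr auto BLOCK_STALLING_TIMEOUT_MAX{64s}; // assumed cap for this sketch

std::atomic<std::chrono::seconds> m_block_stalling_timeout{BLOCK_STALLING_TIMEOUT_DEFAULT};

// Called from the message handler thread when a staller is disconnected:
// give the replacement peer more time, growing exponentially up to the cap.
void OnStallerDisconnected()
{
    auto stalling_timeout = m_block_stalling_timeout.load();
    const auto new_timeout = std::min(2 * stalling_timeout, BLOCK_STALLING_TIMEOUT_MAX);
    // Only bump the value if no other thread changed it in the meantime.
    m_block_stalling_timeout.compare_exchange_strong(stalling_timeout, new_timeout);
}

// Called from the validation callback thread when a new block is connected:
// download is making progress, so decay the timeout back towards the default.
void OnBlockConnected()
{
    auto stalling_timeout = m_block_stalling_timeout.load();
    const auto new_timeout = std::max(
        std::chrono::duration_cast<std::chrono::seconds>(stalling_timeout * 0.85),
        BLOCK_STALLING_TIMEOUT_DEFAULT);
    m_block_stalling_timeout.compare_exchange_strong(stalling_timeout, new_timeout);
}
```

Using a `std::atomic` here avoids taking a lock on the hot block-connected path while still preventing a data race between the two threads, which is the trade-off alecc and stickies-v discuss above.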
11:05 < amovfx> okay ty
11:05 < amovfx> that really clears it up
11:05 < amovfx> man, so much to learn
11:05 < stickies-v> thanks everyone for coming, dergoegge for being an awesome co-host, lightlike for authoring this interesting PR, and glozow for doing all the hard prep work and getting the notes/questions up!
11:05 < amovfx> yea, thanks to all for making this possible
--- Log closed Thu Sep 22 00:00:26 2022