--- Log opened Fri Nov 20 00:00:22 2020
00:52 -!- gribble [~gribble@unaffiliated/nanotube/bot/gribble] has quit [Remote host closed the connection]
01:04 -!- gribble [~gribble@unaffiliated/nanotube/bot/gribble] has joined #bitcoin-builds
01:52 -!- jonatack [~jon@213.152.162.79] has quit [Ping timeout: 260 seconds]
03:22 -!- Mariam62Schroede [~Mariam62S@static.57.1.216.95.clients.your-server.de] has joined #bitcoin-builds
05:02 -!- molz_ [~mol@unaffiliated/molly] has joined #bitcoin-builds
05:03 -!- mol [~mol@unaffiliated/molly] has quit [Ping timeout: 260 seconds]
05:04 -!- mol [~mol@unaffiliated/molly] has joined #bitcoin-builds
05:07 -!- molz_ [~mol@unaffiliated/molly] has quit [Ping timeout: 256 seconds]
05:10 -!- mol [~mol@unaffiliated/molly] has quit [Ping timeout: 260 seconds]
05:31 -!- mol [~mol@unaffiliated/molly] has joined #bitcoin-builds
06:50 -!- mol [~mol@unaffiliated/molly] has quit [Ping timeout: 256 seconds]
06:57 -!- gribble [~gribble@unaffiliated/nanotube/bot/gribble] has quit [Read error: Connection reset by peer]
07:11 -!- gribble [~gribble@unaffiliated/nanotube/bot/gribble] has joined #bitcoin-builds
08:25 < dongcarl> Going to dump more details on the build non-determinism we saw here for everyone.
08:25 < dongcarl> I first encountered this while finalizing Guix's macOS reproducibility. I found that sometimes builds were identical, but sometimes they weren't.
08:26 < dongcarl> When they weren't, it was the bitcoin-qt binary that differed. It differed in 2 places: 1. a word early in the binary, which I assumed was the checksum, and 2. a few words in the text section, which I assumed was code
08:27 < dongcarl> Then, I ran 2 builds of our depends Qt package, and saw that there was a difference in `qpaintengine_raster.o` that matched the difference in bitcoin-qt exactly
08:28 < dongcarl> For those who want to play with this, qpaintengine_raster.o was archived into libQt5Gui.a
08:29 < dongcarl> Examining that object file with otool, we see that the `qt_intersect_spans` function is where the non-determinism comes from
08:30 < dongcarl> I'm not the best at brain-disassembly, so if anyone's somewhat good at that please let me know.
08:31 < dongcarl> However, in the meantime Cory found that llvm-9/clang-9 did not have this problem
08:31 < dongcarl> So Cory bisected from llvm-8 to llvm-9, and found the fix: https://reviews.llvm.org/D64601
08:32 < dongcarl> For Guix builds, I can just apply the patch to our llvm and rebuild the runtime and clang based on that fixed llvm. However, Gitian is another story
08:33 < achow101> Can we just use clang 9?
08:33 < achow101> Or is 8.x a hard requirement?
08:34 < dongcarl> 8.x is somewhat of a requirement because Xcode ships with a "blessed" apple-clang, and we've determined that "blessed" version corresponds to "clang 8"
08:35 < dongcarl> The reason we're careful about this is that Apple's toolchain ecosystem is very tightly knit, and doesn't pretend to promise that things will just work together fine
08:37 < dongcarl> However, it can be argued that this is a somewhat theoretical concern, and my cross builds using clang-9 seem to work fine (there may still be corner cases we're not seeing)
08:38 < dongcarl> wumpus fanquake achow101 luke-jr / anyone else interested
08:39 < achow101> So what are the current potential solutions?
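A minimal sketch of the comparison dongcarl describes at 08:26 and 08:27: hash two build outputs and, if the digests differ, list the byte ranges that differ, which shows whether the changes sit early in the file or further in (as with the two places found in `bitcoin-qt`). The script and its function names (`sha256_of`, `differing_ranges`) are hypothetical; standard tools such as `cmp -l` report the same information.

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open [start, end) byte ranges where a and b differ.

    Bytes past the end of the shorter input are counted as differing.
    """
    ranges: list[tuple[int, int]] = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if len(a) != len(b):
        # Extend an open run (or start a new one) through the longer tail.
        ranges.append((common if start is None else start, max(len(a), len(b))))
    elif start is not None:
        ranges.append((start, common))
    return ranges


def main(argv: list[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} BUILD_A BUILD_B", file=sys.stderr)
        return 2
    path_a, path_b = Path(argv[1]), Path(argv[2])
    if sha256_of(path_a) == sha256_of(path_b):
        print("identical")
        return 0
    a, b = path_a.read_bytes(), path_b.read_bytes()
    for start, end in differing_ranges(a, b):
        print(f"0x{start:08x}-0x{end:08x} ({end - start} bytes)")
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Assuming the script is saved as `compare_builds.py`, run it as `python3 compare_builds.py build-a/bitcoin-qt build-b/bitcoin-qt`. To narrow the search to one object, as dongcarl did, the member can first be extracted from each build's archive with `ar x libQt5Gui.a qpaintengine_raster.o`.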
08:40 < dongcarl> I think the best way would be to figure out how to _not_ trigger this non-deterministic llvm codepath in qt's qt_intersect_spans
08:41 < dongcarl> And patch qt appropriately
08:41 < dongcarl> Given that we need this fixed somewhat soon
08:50 -!- emzy [~quassel@unaffiliated/emzy] has joined #bitcoin-builds
08:53 < wumpus> does anything we do use that function? easiest would be to patch it out :)
08:54 < dongcarl> Not a terrible idea... Hopefully it's not super-critical in qt itself...
08:54 < dongcarl> Looking
08:56 < wumpus> if that turns out to be a can of worms my vote would also be to try to upgrade the compiler version
08:59 < dongcarl> It's... used in quite a lot of places :-(
08:59 < wumpus> bleh
08:59 < dongcarl> Funny that you guys run into this the day after Cory and I started debugging it haha
09:01 < achow101> dongcarl: can you link to the llvm fix?
09:02 < dongcarl> achow101: bruh scroll up: https://reviews.llvm.org/D64601
09:02 < achow101> nah
09:02 < achow101> scrolling is hard
09:03 < wumpus> do you know what the source of this nondeterminism is? if it's, say, malloc'ed memory, there's some glibc malloc debug option that clears all memory after allocation that could be used as a workaround
09:04 < wumpus> we've used this in the past to work around nondeterminism in mingw
09:04 < wumpus> if it's a stack variable that ofc won't work
09:04 < dongcarl> wumpus: I'll be honest, I don't fully understand the clang patch; it's at the LLVM bitcode level and I don't have any familiarity with that :-(
09:05 < dongcarl> It looks like it has something to do with clang's MemorySSA feature: https://llvm.org/docs/MemorySSA.html
09:07 < wumpus> FWIW the setting was "export MALLOC_PERTURB_=255"
09:09 < dongcarl> Ah right I remember seeing that somewhere
09:10 < wumpus> if that plugs the non-determinism for now it'd be a quick workaround
09:14 < achow101> from a quick look at the code comments, it seems like they were using SmallPtrSet which doesn't track insertion order. They switched to SmallSetVector which does track insertion order. Presumably the insertion order affects some ordering later
09:16 < dongcarl> I'm going to guess that it has something to do with "dead blocks" in `qt_intersect_spans`'s while loop
09:16 < dongcarl> I asked on llvm, but no answer...
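achow101's reading of the patch at 09:14 points at a common source of compiler non-determinism: iterating a set whose order depends on pointer values (where objects happened to be allocated) instead of on the order they were inserted. The Python sketch below models this with a hypothetical `Block` type whose `address` field stands in for a pointer. Sorting by that fake address is a simplification (pointer-keyed hash sets generally order elements by a hash of the pointer rather than by the pointer itself, but either way the order can change between runs), and none of this is LLVM's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for a compiler IR node held by pointer."""

    name: str
    address: int  # models the pointer value, which may differ between runs


def pointer_ordered(blocks: list[Block]) -> list[str]:
    """Iterate like a pointer-keyed set: order follows addresses, not insertion."""
    return [b.name for b in sorted(set(blocks), key=lambda b: b.address)]


def insertion_ordered(blocks: list[Block]) -> list[str]:
    """Iterate like a set-vector: drop duplicates, keep first-insertion order."""
    return [b.name for b in dict.fromkeys(blocks)]


# The same three blocks, inserted in the same order, in two separate "runs"
# where the allocator happened to place them at different addresses.
RUN_1 = [Block("entry", 0x1000), Block("loop", 0x2000), Block("exit", 0x3000)]
RUN_2 = [Block("entry", 0x3000), Block("loop", 0x1000), Block("exit", 0x2000)]

if __name__ == "__main__":
    print("pointer order:  ", pointer_ordered(RUN_1), pointer_ordered(RUN_2))
    print("insertion order:", insertion_ordered(RUN_1), insertion_ordered(RUN_2))
```

Running it gives two different pointer orders but one insertion order for both runs. If a compiler pass makes decisions while walking a set of the first kind, that difference can surface as different instructions in otherwise identical builds.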
09:17 < wumpus> pretty sure their answer will be 'upgrade your compiler'
09:25 < wumpus> I mean, compare someone asking something about bitcoin core 0.17 or such :)
09:28 -!- mol [~mol@unaffiliated/molly] has joined #bitcoin-builds
09:43 < dongcarl> :)
10:51 -!- jonatack [~jon@88.124.242.136] has joined #bitcoin-builds
10:52 -!- glozow [uid453516@gateway/web/irccloud.com/x-rherywbkfykihtyw] has joined #bitcoin-builds
10:56 -!- jonatack [~jon@88.124.242.136] has quit [Ping timeout: 265 seconds]
10:58 -!- jonatack [~jon@88.124.242.136] has joined #bitcoin-builds
11:03 -!- jonatack [~jon@88.124.242.136] has quit [Ping timeout: 240 seconds]
11:04 -!- jonatack [~jon@213.152.162.181] has joined #bitcoin-builds
11:45 < achow101> i've rebuilt gitian with cleaned caches multiple times now and i'm getting the other build result now
11:45 < achow101> istm if this llvm issue is the non-determinism, this is a strange result
11:49 < achow101> dongcarl: can you post a diff of the nondeterminism you were investigating
11:50 < dongcarl> https://imgur.com/a/09mOypX
11:51 < dongcarl> achow101: ^
11:52 < dongcarl> achow101: lmk if you want otool output
11:52 < achow101> nah, the diff is almost identical lol
11:53 < achow101> the problem is literally one opcode
11:55 < dongcarl> achow101: https://paste.sr.ht/~dongcarl/c840f05046bebf9f071639beb52cd0bb3c04ddc7
11:56 < dongcarl> 3 differences: movswq/movswl, nopl/nopw, movslq
11:58 < achow101> i don't see the movslq difference in mine
11:59 < achow101> oh i see it now
12:01 < dongcarl> achow101: https://github.com/bitcoin/bitcoin/pull/20436
12:03 < achow101> thanks i hate it
13:05 < achow101> I may have a patch to qt_intersect_spans
13:55 < achow101> nope i don't
23:11 -!- jonatack [~jon@213.152.162.181] has quit [Ping timeout: 260 seconds]
23:13 -!- jonatack [~jon@88.124.242.136] has joined #bitcoin-builds
--- Log closed Sat Nov 21 00:00:23 2020
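Between 11:52 and 11:56 the two builds are compared by diffing otool disassembly of the affected code. A minimal sketch of that comparison, assuming each listing was saved to a text file (for example from `otool -tV qpaintengine_raster.o`) and that instruction lines begin with a hexadecimal address column. Stripping the address first means that if one instruction's encoding length changes, the shifted addresses of every later instruction do not flood the diff. The script and function names are hypothetical.

```python
from __future__ import annotations

import difflib
import re
import sys

# Assumed listing layout: a hex address of 8+ digits, whitespace, then the instruction.
# Requiring 8+ digits avoids stripping mnemonics made only of hex letters (e.g. "add").
_ADDRESS = re.compile(r"^[0-9a-fA-F]{8,}\s+")


def normalize(line: str) -> str:
    """Drop the leading address column and the trailing newline."""
    return _ADDRESS.sub("", line.rstrip("\n"))


def instruction_diff(lines_a: list[str], lines_b: list[str]) -> list[str]:
    """Unified diff (no context lines) of two listings, ignoring addresses."""
    a = [normalize(line) for line in lines_a]
    b = [normalize(line) for line in lines_b]
    return list(difflib.unified_diff(a, b, "build-a", "build-b", n=0, lineterm=""))


def main(argv: list[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} LISTING_A LISTING_B", file=sys.stderr)
        return 2
    with open(argv[1]) as fa, open(argv[2]) as fb:
        diff = instruction_diff(fa.readlines(), fb.readlines())
    print("\n".join(diff) if diff else "no instruction differences")
    return 1 if diff else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Each `-`/`+` pair in the output is one changed instruction, which is the kind of summary dongcarl gives at 11:56.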