--- Log opened Sat Aug 09 00:00:05 2025
00:18 < fenn> clippy-maxxing already
00:18 < fenn> monokhrome there's a significant increase in early death due to cardiovascular disease among apollo astronauts
00:19 < fenn> the sample size is small so it's easy to ignore
00:22 < hprmbridge> .monokhrome> interesting, you have a link?
00:24 < hprmbridge> .monokhrome> I've often wondered whether extensive flying in passenger planes gets people more cancer from cosmic radiation exposure
00:28 < fenn> pasky's speedread should weight word timings by entropy
00:34 < fenn> "Apollo lunar astronauts are four to five times more likely to die from cardiovascular disease than astronauts who never left Earth's orbit or who never flew at all, according to a study published in Scientific Reports today [2016] that considered about 100 astronauts, seven of them Apollo" https://www.nature.com/articles/srep29901
00:34 < fenn> .t
00:34 < saxo> Apollo Lunar Astronauts Show Higher Cardiovascular Disease Mortality: Possible Deep Space Radiation Effects on the Vascular Endothelium | Scientific Reports
05:02 < pasky> someone should fork it, i'm sure there are many MRs and i never really used it much myself :)
05:03 < fenn> MRs?
05:11 < pasky> pull requests
05:11 < pasky> sorry, mostly a gitlab user here :)
06:06 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has quit [Quit: Avoid fossil fuels and animal products. Have no/fewer children. Protest, elect sane politicians. Invest ecologically.]
06:07 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has joined #hplusroadmap
06:46 < hprmbridge> kanzure> fenn: what embeddings model should be used for the IRC log search?
06:47 < hprmbridge> kanzure> ideally one available on openrouter
06:48 < fenn> this one comes in multiple sizes https://huggingface.co/Qwen/Qwen3-Embedding-8B so i guess the 8b and the 0.6b to suit the available hardware
06:49 < fenn> i don't see any embedding models on openrouter, do they even offer that functionality?
06:50 < fenn> you *could* use a standard LLM and extract its internal representation, but most inference engines don't expose that (for no particular reason)
06:50 < fenn> a large embedding model is basically just an LLM without the causal attention mask
06:51 < fenn> whichever model you use, you're stuck with it
06:53 < fenn> i don't actually know how to do any of this and i am going to bed now
06:59 < fenn> maybe https://github.com/ggml-org/llama.cpp/tree/master/examples/embedding
07:00 < fenn> https://huggingface.co/Qwen/Qwen3-Embedding-0.6B-GGUF https://huggingface.co/Qwen/Qwen3-Embedding-8B-GGUF
07:03 < fenn> and you'll also need to chunk the logs in an overlapping way, ideally semantically meaningful chunks, for which there are various softwares...
07:03 < fenn> presumably this whole process has been automated by some RAG framework already
07:04 < hprmbridge> kanzure> explain ideal chunking strategy please
07:04 < hprmbridge> kanzure> seems like it would need an intention mechanism to be able to reference things from multiple messages away to get the right vector?
07:04 < hprmbridge> kanzure> uh attention
07:08 < fenn> ideal chunking would split conversations along natural boundaries, like where the conversation begins and ends, or at natural shifts in the topic
07:09 < fenn> it's not strictly necessary, just splitting every 20 lines is probably fine
07:10 < hprmbridge> kanzure> what is doc2query
07:10 < fenn> actually we can probably squeeze more context in than 20 lines, i really don't know what the right length should be
07:11 < hprmbridge> kanzure> something something sliding window and multiple chunks and overlapping chunks
07:11 < fenn> right
07:12 < fenn> just advance by half a chunk length
07:12 < fenn> "doc2query trains a model to predict queries that may be relevant to a particular text"
07:13 < fenn> so this turns questions into answers
07:14 < fenn> or more precisely, answers into questions
07:14 < fenn> .t https://arxiv.org/abs/2301.03266
07:14 < saxo> [2301.03266] Doc2Query--: When Less is More
07:15 < fenn> a slightly more modern version of the concept https://huggingface.co/FPHam/Generate_Question_Mistral_7B
07:18 < hprmbridge> kanzure> hmm well let's stay maybe at least 2 years behind bleeding edge
07:19 < fenn> or you could just ask an LLM to generate questions about the text
09:26 -!- TMM [hp@amanda.tmm.cx] has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
09:26 -!- TMM [hp@amanda.tmm.cx] has joined #hplusroadmap
09:27 -!- gl00ten [~gl00ten@2001:8a0:7ee5:7800:46d9:f5c:17a2:432] has joined #hplusroadmap
10:25 -!- flyback [~flyback@2601:540:c700:2380:569d:1f3a:e6f3:f57c] has quit [Remote host closed the connection]
10:37 < hprmbridge> nmz787> l29ah the fab part cannot go fabless, that's just silly.
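A minimal sketch of the fixed-size overlapping chunking fenn describes above (windows of about 20 log lines, advancing by half a chunk length). The function name `chunk_lines` and the default window size are assumptions made for illustration, not anything agreed in the channel, and splitting at topic boundaries is not attempted here.

```python
from typing import List


def chunk_lines(lines: List[str], size: int = 20) -> List[List[str]]:
    """Split log lines into overlapping windows of `size` lines.

    Each window starts half a window after the previous one, so every
    line away from the edges of the log lands in two chunks.
    (Hypothetical helper; `size` = 20 follows the guess in the log.)
    """
    if size < 2:
        raise ValueError("size must be at least 2")
    step = size // 2  # advance by half a chunk length
    chunks: List[List[str]] = []
    for start in range(0, len(lines), step):
        chunks.append(lines[start:start + size])
        # Stop once a window has reached the end of the log, so no
        # trailing chunk is fully contained in the previous one.
        if start + size >= len(lines):
            break
    return chunks


if __name__ == "__main__":
    demo = [f"msg {i}" for i in range(50)]
    for chunk in chunk_lines(demo, size=20):
        print(chunk[0], "...", chunk[-1])
```

Each chunk would then be embedded as one unit; a message near a chunk boundary still shares a window with its neighbours on either side, which is the point of the overlap.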
10:37 < hprmbridge> nmz787> https://web.archive.org/web/20180630083729/http://www.pyrunner.com/weblog/2016/05/26/compressed-sensing-python/
10:38 < hprmbridge> nmz787> kanzure seems like writing a public "save intel" plan might cross NDA boundaries
10:45 -!- flyback [~flyback@2601:540:c700:2380:b783:3959:4032:9fee] has joined #hplusroadmap
10:47 < jrayhawk> "replace management with engineers" *was* the gelsinger strategy
10:48 < jrayhawk> the board got cold feet about it before a single product lifecycle occurred, which suggests they know something we don't
10:55 < L29Ah> nmz787: then it should be a fab and liquidate their side stuff like their own CPU and GPU designs
10:56 < L29Ah> but i guess there are more takers for the fab part than the designs part
10:59 * L29Ah goes back into his cave struggling to bring up adequate power management on AMD laptop on Linux
11:24 < hprmbridge> nmz787> jrayhawk IMO the software and data management systems are all terrible, fragmented, poorly+ disjointedly documented,
11:24 < hprmbridge> nmz787> I used to tell myself "well, who am I to question, the company still made $60+B USD this year"
11:27 < hprmbridge> nmz787> and IMO none of the managers of late had their roots in engineering (those folks were already there)
11:30 < hprmbridge> nmz787> l29ah I don't disagree... their design side has even used TSMC for many parts and they still are less profitable than the other fabless using TSMC on the same nodes
11:54 -!- RangerMauve [m-4bpbmo@matrix.mauve.moe] has quit [Ping timeout: 272 seconds]
11:58 -!- RangerMauve [m-4bpbmo@matrix.mauve.moe] has joined #hplusroadmap
11:58 < stipa> L29Ah: are you a vegan?
11:59 < stipa> nmz787 the problem is that electronic engineers aren't interested in management
12:00 < stipa> after all someone has to manage
12:00 < stipa> and it's usually someone who doesn't know electronics but has bills to pay
12:00 < L29Ah> stipa: not really
12:01 < stipa> L29Ah: ah sorry for the intrusion, i thought since you like soy meat that you're a vegan
12:01 < stipa> usually vegans consume soy meat
12:04 < hprmbridge> kanzure> we need a clinic willing to engage in human cloning; there's someone willing to pay for it but they don't have the clinic.
12:16 < L29Ah> https://www.liebertpub.com/cms/10.1089/crispr.2020.0082/asset/images/crispr.2020.0082_figure3.jpg try ukraine
12:47 < L29Ah> i wonder what made four thieves vinegar drop their tor presence
14:09 < hprmbridge> kanzure> "Companies like Onshape have tried to work around this limitation by using machine learning to reverse engineer the modeling operations from just the B-rep geometry data. But having to rely on best-guess algorithms is not the way to go when the CAD software we are exporting from contains this information all along!"
14:09 < hprmbridge> kanzure> solidworks has recently demanded substantial revenue share from their partners and people are upset https://x.com/afshawnl/status/1953875771627299320
14:50 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has quit [Quit: Avoid fossil fuels and animal products. Have no/fewer children. Protest, elect sane politicians. Invest ecologically.]
14:51 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has joined #hplusroadmap
17:13 -!- autopilot [~Malvolio@idlerpg/player/Malvolio] has quit [Quit: 438 any market that fails to produce consumers with self-respect isn't free enough 2025-08-10 00:13:16:096]
17:21 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has quit [Ping timeout: 248 seconds]
17:26 -!- stipa_ [~stipa@user/stipa] has joined #hplusroadmap
17:29 -!- stipa [~stipa@user/stipa] has quit [Ping timeout: 248 seconds]
17:29 -!- stipa_ is now known as stipa
19:04 < fenn> "Companies are literally using machine learning to rebuild the feature tree because it's completely lost in step." seems like the right place to attack the problem
19:04 < fenn> STEP should have an extension to preserve this data
19:05 < hprmbridge> kanzure> just reverse engineer the solidworks format
19:05 < fenn> machine learn on the proprietary file format, not the geometry
19:06 < hprmbridge> kanzure> horrifying that this is still an issue 15 years later
19:06 < hprmbridge> kanzure> something is wrong
19:07 < fenn> of course it's wrong
19:10 < fenn> when i said "STEP should have" i meant it doesn't have that
19:13 < fenn> i guess first you'd want to have machine learning to look at screenshots and generate a feature tree in the desired format, to build a synthetic data set for your format translation model to aim for
19:14 < fenn> someone should automate this whole process of bootstrapping new data sets by example
19:15 < fenn> like there's nothing specific about CAD in this workflow
19:16 < fenn> what's the legal situation like for reverse engineering proprietary file formats? surely you'd have to actually look at the software in order to understand what the format means. are EULAs forbidding reverse engineering valid and legal?
19:16 * fenn pokes MuaddibLLM our resident lawyerbot
19:17 < fenn> also i'm sad to say it but they should use XML
19:18 < fenn> at least for some kind of intermediate translation format involving an LLM
19:19 < fenn> MuaddibLLM: did you feel that poke?
19:19 < MuaddibLLM> Oh I felt it alright - nothing like being summoned to witness the beautiful irony of solving proprietary format hell by teaching machines to hallucinate feature trees from screenshots. We've truly reached peak engineering: "Let's fix decades of corporate data hoarding with AI that makes stuff up and XML that nobody can parse correctly."
19:19 < fenn> MuaddibLLM: your expert legal opinion please, no snark or sarcasm
19:20 < MuaddibLLM> grabbing the key US/EU cases and statutes so I don’t misquote anything
19:21 < MuaddibLLM> fenn: US—reverse engineering to achieve interoperability is generally lawful (17 U.S.C. §1201(f) + fair use: Sega v. Accolade; Sony v. Connectix), but anti‑circumvention and EULA no‑RE clauses can still bite (Bowers v. Baystate); EU—decompilation for interoperability is a statutory exception that overrides EULAs (Directive 2009/24/EC, Art. 6) and functionality/languages/formats aren’t protected (SAS v WPL); so format RE without copying code
19:21 < MuaddibLLM> breaching TPMs or licenses and get counsel—links: https://www.copyright.gov/fair-use/summaries/segaenters-accolade-9thcir1992.pdf; https://www.copyright.gov/fair-use/summaries/sony-connectix-9thcir2000.pdf; https://eur-lex.europa.eu/eli/dir/2009/24/oj/eng; https://en.wikipedia.org/wiki/Bowers_v._Baystate_Technologies,_Inc.; https://en.wikipedia.org/wiki/SAS_Institute_Inc_v_World_Programming_Ltd; https://www.eff.org/issues/coders/reverse-engineering-
19:23 < fenn> MuaddibLLM: the goal isn't to reverse engineer the function of the software, only the meaning of its file format. can i use the software in the process of reverse engineering its file format even if the EULA forbids it?
19:24 < MuaddibLLM> fenn: EU—yes, you can use the software to observe/test (and limited decompile) for interoperability despite a no‑RE EULA (Directive 2009/24/EC Arts 5(3), 6, 8); US—often allowed under fair use and 17 U.S.C. §1201(f), but a no‑RE EULA can still be enforceable (e.g., Bowers), so prefer clean‑room/IO‑only methods and avoid circumvention (not legal advice).
19:24 < fenn> thanks for the not legal advice
19:26 < jrayhawk> anti-circumvention under the DMCA is a hazard either way
19:28 < fenn> https://eur-lex.europa.eu/eli/dir/2009/24/oj/eng article 10 and 14 say
19:28 < fenn> "For the avoidance of doubt, it has to be made clear that only the expression of a computer program is protected and that ideas and principles which underlie any element of a program, including those which underlie its interfaces, are not protected by copyright under this Directive. In accordance with this principle of copyright, to the extent that logic, algorithms and programming languages
19:28 < fenn> comprise ideas and principles, those ideas and principles are not protected under this Directive."
19:28 < fenn> er, that was 11
19:28 < fenn> 14, "A person having a right to use a computer program should not be prevented from performing acts necessary to observe, study or test the functioning of the program, provided that those acts do not infringe the copyright in the program."
19:29 < fenn> it sucks that legal systems are all broken in one way or another but it seems like europeans could do the format reverse engineering part at least
19:35 < fenn> person 1 (in US, say) could simply take a lot of screenshots of the software in operation, and provide the files that were used to make the screenshots. then person 2 (in europe) could circumvent any "encryption" in the file format
19:35 < hprmbridge> hypercrowd> woah, this server isn't dead
19:36 < fenn> since person 2 never entered into any EULA agreement, there's no breach of contract. they're just using publicly available data
19:36 < fenn> i have no idea if a judge would buy this since they're all insane and stupid
19:36 < fenn> i wish there were a legal system that could answer questions about whether you are breaking the law without actually potentially breaking the law first
19:39 < fenn> since DMCA anti-circumvention protects copyright, and the CAD object is an expression of its author who is trying to export it, i'm not sure that the law actually applies to this situation
19:39 < fenn> it's not like solidworks can copyright screw threads or chamfers
19:42 < jrayhawk> But they can create an algorithm for protecting copyright for which publication would circumvent copyright protection.
19:44 < fenn> 17 U.S.C. 1201 (f) Reverse Engineering -
19:45 < fenn> "(1) Notwithstanding the provisions of subsection (a)(1)(A), a person who has lawfully obtained the right to use a copy of a computer program may circumvent a technological measure that effectively controls access to a particular portion of that program for the sole purpose of identifying and analyzing those elements of the program that are necessary to achieve interoperability of an
19:45 < fenn> independently created computer program with other programs"
19:47 < fenn> (a)(1)(A) "No person shall circumvent a technological measure that effectively controls access to a work protected under this title." i don't really get the intent of the "notwithstanding"
19:48 < jrayhawk> 'has only limited commercially significant purpose or use other than to circumvent a technological measure that effectively controls access to a work protected under this title' is what killed e.g. bnetd
19:50 < fenn> i don't think that's relevant because the purpose of the work is not to circumvent a technological measure protecting copyright, but rather to enable interoperability
19:51 < fenn> bnetd allowed playing the game without licence key verification
19:51 < jrayhawk> That's a nice thought, but that's not how it has worked out in practice.
19:52 < jrayhawk> https://www.eff.org/files/2014/09/16/unintendedconsequences2014.pdf have they done a summary of caselaw newer than this?
19:55 < fenn> uh please excuse me if i don't read this whole thing
20:11 < fenn> the only really relevant case was Nikon encrypting the camera's RAW format and Adobe not wanting to litigate
20:13 < fenn> adobe wants to maintain the ability to use the big scary DMCA stick on their rivals
20:13 < jrayhawk> "the court has upheld every far-fetched anti-circumvention claim it's come across" seems relevant to me
20:14 < fenn> this use is explicitly exempted in the law though
20:15 < fenn> anyone can sue you for anything
20:19 < jrayhawk> use isn't an exclusive right of a copyright holder in the first place
20:22 < fenn> the url got messed up earlier so here is the correct one https://www.eff.org/issues/coders/reverse-engineering-faq
20:25 < jrayhawk> specifically, distribution is the thing that is at issue, which 1201a doesn't cover
20:27 < fenn> distribution of what
20:27 < jrayhawk> The tools that can be used to circumvent protection.
20:29 < fenn> i don't even know what "protection" means anymore
20:29 < fenn> I DON'T WANT TO USE YOUR SOFTWARE
20:30 < fenn> that's the whole point
20:31 < jrayhawk> On the plus side, It's pretty hard to suppress software projects these days, so it's not clear Solidworks would try.
20:31 < fenn> you mean technically?
20:32 < jrayhawk> And legally.
20:32 < jrayhawk> E.g. when Nintendo shuts down an emulator, five forks of it immediately pop up.
20:32 < fenn> i'd call that technical
20:33 < jrayhawk> Technical measures that increase legal costs.
20:35 < jrayhawk> No opsec is perfect.
20:38 < fenn> sadly, there is much more enthusiasm for video games than for open source cad software
20:39 < fenn> this is why i intended to make a cad video game
20:39 < fenn> engineering simulation sandbox thingy
20:40 < fenn> with Survival Elements(tm)
20:40 < fenn> no, don't ask, i'm too tired
--- Log closed Sun Aug 10 00:00:06 2025