--- Log opened Wed Jan 29 00:00:02 2025
00:00 < TMA> wire resistance is not futile. it is not even negligible. if the wire is "long" (in a cpu die, "long" might be on the order of 1e-10 furlongs) it matters a lot. that's why wrong-gauge extension cords are such a hazard when improperly used
00:16 < jrayhawk> macrogridding the U.S. alone is estimated at 8 billion dollars in transmission lines https://www.nrel.gov/docs/fy21osti/76850.pdf with a 30-year payoff time
00:16 < jrayhawk> numbers get less favorable once oceans get involved
00:40 -!- gl00ten [~gl00ten@2001:8a0:7ee5:7800:46d9:f5c:17a2:432] has joined #hplusroadmap
00:41 -!- gl00ten [~gl00ten@2001:8a0:7ee5:7800:46d9:f5c:17a2:432] has quit [Read error: Connection reset by peer]
02:26 -!- TMM [hp@amanda.tmm.cx] has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
02:26 -!- TMM [hp@amanda.tmm.cx] has joined #hplusroadmap
02:32 < L29Ah> https://en.wikipedia.org/wiki/HVDC_Ekibastuz–Centre sad story of an especially long and powerful grid transmission line
02:32 < L29Ah> the corona losses were <1% @ 5.5GW @ 2400km tho
02:34 < L29Ah> hmm, no, not this one, https://en.wikipedia.org/wiki/Ekibastuz–Kokshetau_high-voltage_line
02:36 < fenn> unifying the world political situation is the harder task
02:36 < L29Ah> so @ 432km
02:38 < L29Ah> fenn: i don't think the energy market needs much political unification as it's quite attractive for every participant
02:38 < fenn> ewan mcgregor did a "long way round" motorcycle trip from scotland through europe, russia, asia, alaska, and canada, which follows the path with the shortest sea stretches
02:38 < L29Ah> the whole world trades in oil these days
02:42 < fenn> 8 billion is the price of one nuclear power plant these days
03:03 < hprmbridge> kanzure> even the modular ones?
03:04 < fenn> https://en.wikipedia.org/wiki/Olkiluoto_Nuclear_Power_Plant
03:50 < hprmbridge> kanzure> https://unsloth.ai/blog/deepseekr1-dynamic
03:53 < fenn> it would be cool if we actually did that thing with bitnet where you don't do matrix multiplication, only addition
03:54 < fenn> IQ1_M is not "good" even with their fancy dynamic pixie dust
03:55 < L29Ah> apparently Akademik Lomonosovs are planned to be sold at $1.7G apiece
03:56 < fenn> those are the nuclear barges?
03:56 < L29Ah> yes
03:56 < fenn> $1.7G for 70MW is not a good deal
03:57 < fenn> on the other hand there can't be any cost overruns
03:58 < fenn> "The Akademik Lomonosov originally was expected to cost about $140 million."
03:58 < fenn> perhaps i should say "any more cost overruns"
04:20 < fenn> sigh. i really expected to see some hard numbers on KL divergence or perplexity loss in that unsloth dynamic quantization post
04:24 < kanzure> fenn: what local llm hardware should i get?
04:24 < kanzure> https://x.com/carrigmat/status/1884244369907278106
04:25 < fenn> 2x3090
04:25 < kanzure> not 5090?
04:25 < kanzure> "24 x 32GB DDR5-RDIMM modules"
04:26 < kanzure> (in that previous link!)
04:27 < fenn> so far all the usable open-weights models are in the 70B range, and you don't get anything by adding a few bits of bit depth. more gigs helps with longer contexts i guess
04:28 < fenn> can you actually buy a 5090?
04:28 < fenn> i would not bother with that CPU-only stuff, it's too slow for a reasoning model
04:29 < L29Ah> i wonder if it's faster on lower-bit quants
04:30 < fenn> almost certainly
04:30 < L29Ah> on my laptop q4 is twice as fast as q8
04:30 < L29Ah> (1ch DDR4)
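The q4-vs-q8 speedup falls out of simple arithmetic: single-user token generation is mostly memory-bandwidth-bound, so halving the bits per weight roughly doubles tokens per second. A minimal sketch, assuming an 8B-parameter model and a ~20 GB/s single-channel DDR4 figure (both numbers are illustrative assumptions, not measurements from the chat):

```python
# Back-of-envelope for bandwidth-bound LLM decoding:
# tokens/s ~= memory bandwidth / bytes of weights streamed per token.
# All numbers are illustrative assumptions.

PARAMS = 8e9            # assumed laptop-sized model, 8B parameters
BANDWIDTH = 20e9        # assumed single-channel DDR4, ~20 GB/s

def tokens_per_second(bits_per_weight: float) -> float:
    bytes_per_token = PARAMS * bits_per_weight / 8  # every weight read once per token
    return BANDWIDTH / bytes_per_token

for quant, bits in (("q8", 8), ("q4", 4)):
    print(f"{quant}: ~{tokens_per_second(bits):.1f} tok/s")
# q4 streams half the bytes of q8, hence roughly twice the tokens/s,
# matching the laptop observation above.
```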
04:30 < kanzure> also very unclear to me if nvidia digits is a good idea. hacker news thought so, but reddit r/LocalLLaMA thought it was a weird joke.
04:31 < fenn> so you go from a 2 minute waiting time to a 1 minute wait
04:31 < kanzure> i think that you could get some amount of privacy without going local by not using any of the major consumer LLM products, such as through huggingface or cloud providers
04:32 < fenn> digits doesn't seem useful for LLMs
04:32 < fenn> i don't know what it's for. it's small and cute i guess
04:32 < kanzure> but i haven't seen a cost analysis as to the total cost of ownership of "rent cloud GPU / other non-major-API-provider" vs local hardware
04:32 < kanzure> by "rent cloud GPU" i mean fractionally
04:32 < fenn> uhh well renting is several orders of magnitude cheaper
04:33 < fenn> payback time for owning your own 4090s is like 8 months if you use them 100% of the time
04:33 < kanzure> chatgpt-4o gets low prices by parallel gpu hardware utilization between multiple customers, i thought?
04:33 < fenn> yes, and you will have no way of utilizing your hardware to anywhere near that extent, not least because of batch size
04:34 < kanzure> right, so local should always be more expensive
04:34 < fenn> yes
04:34 < fenn> it might have worked out differently if project north star (spiking neural network ASIC) had continued development
04:34 < kanzure> but maybe trade off some cost for privacy and get something that isn't exactly local hardware but not exactly "go to chatgpt.com"
04:35 < fenn> at least with a vast.ai node you don't know for certain that your data is being hoovered up
04:35 < fenn> like, it's probably just a guy with some computers
04:36 < fenn> and the new one, salad
04:36 < fenn> nobody uses petals and you can't sell GPU time on petals, it annoys me. that's a zillion dollar startup idea right there
04:37 < kanzure> besides the privacy tradeoff there's also $/(access to bleeding-edge models that don't run on whatever local hardware you picked), which is probably more useful in the long run to target (e.g. better models, various advancements) than total local privacy
04:37 < fenn> instead of hosting an entire LLM on a single box, you have layers 1-10 on box A, 11-20 on box B, etc. and send partially processed tokens over the internet (4kB/token)
04:39 < fenn> basically recreating the fancy datacenter topologies over the internet
04:40 < fenn> most gamers have one GPU, not 8
04:41 < kanzure> where does 4 kB/token come from here?
04:42 < fenn> the size of the output of each transformer block, and the activation bit depth
04:42 < fenn> i calculated it once, might be wrong by an order of magnitude
04:42 < L29Ah> https://github.com/b4rtaz/distributed-llama
04:44 < L29Ah> i wonder if https://stablehorde.net/ is any good
04:44 < fenn> no
04:46 < fenn> L29Ah: i had that distributed-llama bookmarked but hadn't actually looked at it in depth. it's cutting the layer cake the other direction, splitting each layer in half or quarters etc, which uses way more bandwidth than cutting the layers cleanly at the joints
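The 4 kB/token figure is checkable: at a pipeline cut, the only data that must cross the network is one hidden-state vector per token, so the size is the hidden dimension times bytes per activation. A sketch under assumed dimensions (a Llama-2-70B-class hidden size of 8192; none of these numbers are from the chat):

```python
# What crosses each box boundary in pipeline-parallel inference is one
# hidden-state vector per token: hidden_dim * bytes_per_activation.
# HIDDEN_DIM here is an assumption (Llama-2-70B-like), not from the log.

HIDDEN_DIM = 8192

for fmt, bytes_per_act in (("fp16", 2.0), ("fp8", 1.0), ("int4", 0.5)):
    kb = HIDDEN_DIM * bytes_per_act / 1024
    print(f"{fmt}: {kb:.0f} kB/token per pipeline cut")
# fp16 -> 16 kB, fp8 -> 8 kB, int4 -> 4 kB: "4kB/token" is plausible for
# quantized activations, and "wrong by an order of magnitude" covers fp16.
```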
04:46 < kanzure> "come and take it" is a stronger argument for local hardware than just privacy. remote access won't completely go away, because vpn and international market access, but i do think available publicly purchasable/rentable supply can dry up pretty fast, especially if you have spiky usage or want to have an SLA with yourself.
04:46 < kanzure> so i've argued myself back to local hardware
04:47 < fenn> yes well you can do both
04:47 < kanzure> hm okay
04:47 < fenn> it's much easier to set up weird stuff on local hardware and fall asleep in the chair and come back later to a working system
04:48 < fenn> like it doesn't have to be capable of running eleventy billion parameter models
04:48 < fenn> 32b is probably the sweet spot, unfortunately qwen is the only recent model in that size range
04:48 < kanzure> how does "you can do both" change what hardware to target for local? keeping up with leading models locally seems like a losing strategy in terms of dollars spent over time.
04:49 < kanzure> hm
04:49 < kanzure> i need one of those multi-variable phase change diagrams for this
04:49 < kanzure> with funny xkcd annotations to keep my attention while reading it
04:50 < fenn> for experimental and privacy-sensitive things you run a local model and accept the performance hit
04:51 < fenn> if it's really such a hard problem that the local model can't solve it, you use the local model to strip out privacy-relevant things and ask a big model to do it online. if you can't do this you have to use your brain
04:51 < fenn> i've heard a 5090 is just a 4090 with 30% more stuff and 30% more price
04:52 < kanzure> there's also weird sequence-of-returns risk with buying into the market at the wrong time, where waiting an extra 6 or 12 months might put you on a better upgrade cycle
04:52 < kanzure> (i've had this with smartphones before -- perpetually 3-6 months off from the next model release but needing a new phone because phonedeath)
04:52 < fenn> so far the GPUs seem to hold their value as long as it's not the latest model
04:53 < kanzure> what about actually renting equipment that gets sent to you locally?
04:53 < fenn> those cheap M40s are now 2x the price ($210 from some guy in china)
04:53 < kanzure> local llm enjoyers would rent equipment and pay monthly to have local equipment, and later send it back for upgraded equipment, and the downmarket gets lower prices on last year's equipment
04:54 < fenn> who would send you $20k in GPUs, and how much would that cost?
04:54 < kanzure> less than a car payment
04:54 < fenn> so leasing, basically
04:54 < fenn> odd that english doesn't have words for these things
04:55 < fenn> you can rent a GPU online billed by the second
04:55 < fenn> shipping will take at least a day
04:56 < fenn> also power infrastructure requirements that may not exist in residential settings
04:56 < kanzure> rent online just means you're back to vast. also, it's not the same as leasing or having possession.
04:56 < kanzure> or, er, having title to the gpu
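The rent-vs-buy arithmetic behind fenn's earlier "payback time ... is like 8 months if you use them 100% of the time" is worth making explicit, since utilization is the whole game. A sketch with assumed prices (purchase and hourly rates are illustrative guesses, and electricity is ignored):

```python
# Break-even time for buying a GPU instead of renting one by the hour.
# PURCHASE_PRICE and RENTAL_RATE are assumptions, not quotes from anywhere.

PURCHASE_PRICE = 1800.0   # assumed 4090 street price, USD
RENTAL_RATE = 0.35        # assumed marketplace rate, USD/hour

def payback_months(utilization: float) -> float:
    """Months until avoided rental fees equal the purchase price."""
    savings_per_month = RENTAL_RATE * utilization * 24 * 30
    return PURCHASE_PRICE / savings_per_month

for u in (1.0, 0.5, 0.1):
    print(f"{u:4.0%} utilization -> {payback_months(u):5.1f} months")
# 100% -> ~7 months (close to fenn's figure); at a more realistic 10%
# personal duty cycle it is ~6 years, which is why renting usually wins.
```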
04:58 < fenn> there are probably lots of fun ways to exfiltrate data if you have physical access before and after
04:59 < fenn> i'm not sure what the threat model is here
04:59 < kanzure> at the end of this rabbit hole is "trust nvidia or TSMC with a TPM or TrustZone/SGX thing"
05:00 < kanzure> well, not threat model so much as wanting to spend dollars in a way that optimizes across a handful of different goals, including price per query, keeping up with recent models (not necessarily bleeding edge), physical possession if necessary, hedging against regulatory risk, and privacy
05:01 < fenn> bleeding-edge model is way more important than bleeding-edge GPU
05:02 < fenn> software moves faster than hardware
05:02 < kanzure> maybe, but there's also an extent to which having physical possession of a somewhat-recent GPU is also important there, like you don't care what exact model of GPU you end up with when the game of musical chairs ends, but you do care that you have something
05:02 < fenn> i think we are in the self-improving-AI phase already
05:03 < kanzure> leasing GPUs/workstations is different from leasing cars, because car speed isn't exponentially increasing every few years
05:04 < kanzure> nobody would care what car they have as long as they were also enjoying recent car speed improvements
05:05 < fenn> there's this hypothetical lawson criterion thing i've half-baked, it's like the product of (compute * memory bandwidth * interconnect bandwidth)/(price * power)
05:06 < kanzure> i'll admit that power requirements are a thorn in any of this. for local llm hardware leasing yeah you need to specialize in low power :\
05:06 < kanzure> until distributed gets worked out.
05:06 < fenn> you COULD hook up 80 raspberry pi nodes and run a giant model at 2 seconds per token
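fenn's half-baked "lawson criterion" is easy to turn into a scoring function. The sketch below uses rough public spec numbers and guessed street prices (all assumptions, and the choice of FP32 TFLOPS for "compute" is arbitrary), but it illustrates why the used 3090 keeps coming up in this conversation:

```python
# fenn's hypothetical figure of merit for LLM hardware:
#   (compute * memory bandwidth * interconnect bandwidth) / (price * power)
# Specs are rough public numbers; prices are guesses. Units cancel into
# nothing meaningful, so only the ranking matters, not the absolute scores.

def merit(tflops: float, mem_bw: float, link_bw: float,
          price: float, power: float) -> float:
    return (tflops * mem_bw * link_bw) / (price * power)

cards = {
    #        TFLOPS  mem GB/s  link GB/s   USD    W
    "3090": (  36,     936,      32,        800,  350),  # assumed used price, PCIe 4.0 x16
    "4090": (  83,    1008,      32,       1800,  450),
    "M40":  (   7,     288,      16,        210,  250),  # the $210 card mentioned above
}

for name, spec in sorted(cards.items(), key=lambda kv: -merit(*kv[1])):
    print(f"{name}: merit = {merit(*spec):.2f}")
# 3090 ~3.85 > 4090 ~3.31 > M40 ~0.61: by this metric the used 3090
# edges out the 4090, consistent with fenn's "2x3090" recommendation.
```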
05:07 < kanzure> https://qwenlm.github.io/blog/qwen2.5-max/
05:08 < fenn> nooo i'm trying to go to bed
05:08 < fenn> too much progress!
05:09 < fenn> oh it's just their big paywalled model
05:10 < fenn> when you see benchmark scores >90 it's a useless benchmark
05:11 < kanzure> goodwin's law?
05:11 < kanzure> uh sorry not that one
05:11 < kanzure> goodhart's law
05:11 < fenn> goodhart
05:12 < kanzure> hm, this is slightly different, goodhart is any time you measure anything it ceases to be a good measurement
05:12 < fenn> no it's not that, it's just that the benchmarks often contain errors or impossible-to-answer questions
05:12 < kanzure> you seem to be proposing zeno's torture nexus
05:13 < fenn> i'd like to show you a chart in a recent AI Explained video showing how the time to saturate benchmarks is decreasing over time, but i'm youtube-challenged with this laptop at the moment
05:13 < fenn> just imagine a bunch of lines with increasing slopes, and time on the X axis
05:14 < fenn> the most recent "hard" benchmark, GPQA, is now at like 80% solved. they've come up with "humanity's last stand" (exam) which is supposed to be ludicrously difficult
05:15 < fenn> GPQA is really fuckin hard. i would respect anyone that could solve even one question
05:16 < fenn> there is probably some method of having a dumb LLM talk to you and extract all the necessary info to construct a good prompt to feed into a reasoning model and get good results 99% of the time
05:17 < fenn> this way you could run a big slow model and not feel impatient
05:24 < juri_> fenn: was that a diagnosis, above? :P
05:24 < L29Ah> > highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof")
05:24 < L29Ah> doesn't seem too respectable
05:25 < kanzure> zeno's postmodern paradox: "someone keeps moving these god damned goalposts"
06:47 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has joined #hplusroadmap
07:22 -!- TMM [hp@amanda.tmm.cx] has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
07:22 -!- TMM [hp@amanda.tmm.cx] has joined #hplusroadmap
07:42 < juri_> fenn: oh. i guess "ai infrastructure in europe" is a diagnosis, too. glad i have fans. :)
08:13 < kanzure> $/token is kind of a weird metric. if we are selling intelligence, shouldn't value/token be the metric? higher intelligence should be able to achieve outsized results with fewer, more meaningful, more valuable tokens.
08:14 < L29Ah> i suspect all the deepseek-likes produce a lot more tokens for a result compared to prior art, due to their lengthy thinking
10:16 < kanzure> https://letsencrypt.org/2025/01/22/ending-expiration-emails/
10:21 < L29Ah> lol
13:49 -!- flyback [~flyback@2601:540:c701:900:f5f4:8e30:8217:dfde] has quit [Ping timeout: 260 seconds]
13:52 -!- flyback [~flyback@2601:540:c701:900:f5f4:8e30:8217:dfde] has joined #hplusroadmap
14:00 -!- delthas [~cc0@2a01:4f9:c010:cf0b::1] has quit [Remote host closed the connection]
14:01 -!- delthas [~cc0@2a01:4f9:c010:cf0b::1] has joined #hplusroadmap
14:23 -!- delthas [~cc0@2a01:4f9:c010:cf0b::1] has quit [Remote host closed the connection]
14:26 -!- delthas [~cc0@2a01:4f9:c010:cf0b::1] has joined #hplusroadmap
15:53 < fenn> sign up to our mailing list because we hate email so much
15:53 < fenn> ok
15:54 < fenn> juri_: no, the EU comment was not about you, sorry
16:04 -!- TMA [tma@twin.jikos.cz] has quit [Ping timeout: 252 seconds]
16:14 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has quit [Remote host closed the connection]
16:15 -!- juri_ [~juri@implicitcad.org] has quit [Ping timeout: 260 seconds]
16:16 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has joined #hplusroadmap
16:32 -!- juri_ [~juri@implicitcad.org] has joined #hplusroadmap
18:04 -!- srat3 [~srat3@user/srat3] has quit [Ping timeout: 248 seconds]
18:04 -!- srat3 [~srat3@user/srat3] has joined #hplusroadmap
18:11 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has quit [Ping timeout: 244 seconds]
19:52 -!- TMM [hp@amanda.tmm.cx] has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
19:53 -!- TMM [hp@amanda.tmm.cx] has joined #hplusroadmap
20:49 -!- EnabrinTain_ [sid11525@id-11525.helmsley.irccloud.com] has quit [Ping timeout: 245 seconds]
20:53 -!- EnabrinTain_ [sid11525@id-11525.helmsley.irccloud.com] has joined #hplusroadmap
21:33 < hprmbridge> kanzure> https://link.springer.com/chapter/10.1007/978-3-031-57430-6_11
23:26 -!- L29Ah [~L29Ah@wikipedia/L29Ah] has quit [Ping timeout: 272 seconds]
--- Log closed Thu Jan 30 00:00:03 2025