--- Log opened Wed Jan 29 00:00:02 2025
00:00 < TMA> wire resistance is not futile. it is not even negligible. if the wire is "long" (in a cpu die, "long" might be on the order of 1e-10 furlongs) it matters a lot. that's why wrong-gauge extension cords are such a hazard when improperly used
00:16 < jrayhawk> macrogridding the U.S. alone is estimated at 8 billion dollars in transmission lines https://www.nrel.gov/docs/fy21osti/76850.pdf with a 30-year payoff time
00:16 < jrayhawk> numbers get less favorable once oceans get involved
00:40 -!- gl00ten [~gl00ten@2001:8a0:7ee5:7800:46d9:f5c:17a2:432] has joined #hplusroadmap
00:41 -!- gl00ten [~gl00ten@2001:8a0:7ee5:7800:46d9:f5c:17a2:432] has quit [Read error: Connection reset by peer]
02:26 -!- TMM [hp@amanda.tmm.cx] has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
02:26 -!- TMM [hp@amanda.tmm.cx] has joined #hplusroadmap
02:32 < L29Ah> https://en.wikipedia.org/wiki/HVDC_Ekibastuz–Centre sad story of an especially long and powerful grid transmission line
02:32 < L29Ah> the corona losses were <1% @ 5.5GW @ 2400km tho
02:34 < L29Ah> hmm, no, not this one, https://en.wikipedia.org/wiki/Ekibastuz–Kokshetau_high-voltage_line
02:36 < fenn> unifying the world political situation is the harder task
02:36 < L29Ah> so @ 432km
02:38 < L29Ah> fenn: i don't think the energy market needs much political unification as it's quite attractive for every participant
02:38 < fenn> ewan mcgregor did a "long way round" motorcycle trip from scotland through europe, russia, asia, alaska, and canada, which follows the path with the shortest sea stretches
02:38 < L29Ah> the whole world trades in oil these days
02:42 < fenn> 8 billion is the price of one nuclear power plant these days
03:03 < hprmbridge> kanzure> even the modular ones?
03:04 < fenn> https://en.wikipedia.org/wiki/Olkiluoto_Nuclear_Power_Plant
03:50 < hprmbridge> kanzure> https://unsloth.ai/blog/deepseekr1-dynamic
03:53 < fenn> it would be cool if we actually did that thing with bitnet where you don't do matrix multiplication, only addition
03:54 < fenn> IQ1_M is not "good" even with their fancy dynamic pixie dust
03:55 < L29Ah> apparently Akademik Lomonosovs are planned to be sold at $1.7G apiece
03:56 < fenn> those are the nuclear barges?
03:56 < L29Ah> yes
03:56 < fenn> $1.7G for 70MW is not a good deal
03:57 < fenn> on the other hand there can't be any cost overruns
03:58 < fenn> "The Akademik Lomonosov originally was expected to cost about $140 million."
03:58 < fenn> perhaps i should say "any more cost overruns"
04:20 < fenn> sigh. i really expected to see some hard numbers on KL divergence or perplexity loss in that unsloth dynamic quantization post
04:24 < kanzure> fenn: what local llm hardware should i get?
04:24 < kanzure> https://x.com/carrigmat/status/1884244369907278106
04:25 < fenn> 2x3090
04:25 < kanzure> not 5090?
04:25 < kanzure> "24 x 32GB DDR5-RDIMM modules"
04:26 < kanzure> (in that previous link!)
04:27 < fenn> so far all the usable open-weights models are in the 70B range, and you don't get anything by adding a few bits of bit depth. more gigs helps with longer contexts i guess
04:28 < fenn> can you actually buy a 5090?
04:28 < fenn> i would not bother with that CPU-only stuff, it's too slow for a reasoning model
04:29 < L29Ah> i wonder if it's faster on lower-bit quants
04:30 < fenn> almost certainly
04:30 < L29Ah> on my laptop q4 is twice as fast as q8
04:30 < L29Ah> (1ch DDR4)
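The q4-vs-q8 speedup falls out of simple arithmetic: single-user token generation is mostly memory-bandwidth-bound, so halving the bits per weight roughly doubles tokens per second. A minimal sketch, assuming an 8B-parameter model and a ~20 GB/s single-channel DDR4 figure (both numbers are illustrative assumptions, not measurements from the chat):

```python
# Back-of-envelope for bandwidth-bound LLM decoding:
# tokens/s ~= memory bandwidth / bytes of weights streamed per token.
# All numbers are illustrative assumptions.

PARAMS = 8e9            # assumed laptop-sized model, 8B parameters
BANDWIDTH = 20e9        # assumed single-channel DDR4, ~20 GB/s

def tokens_per_second(bits_per_weight: float) -> float:
    bytes_per_token = PARAMS * bits_per_weight / 8  # every weight read once per token
    return BANDWIDTH / bytes_per_token

for quant, bits in (("q8", 8), ("q4", 4)):
    print(f"{quant}: ~{tokens_per_second(bits):.1f} tok/s")
# q4 streams half the bytes of q8, hence roughly twice the tokens/s,
# matching the laptop observation above.
```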
04:30 < kanzure> also very unclear to me if nvidia digits is a good idea. hacker news thought so, but reddit r/LocalLLaMA thought it was a weird joke.
04:31 < fenn> so you go from a 2 minute waiting time to a 1 minute wait
04:31 < kanzure> i think that you could get some amount of privacy without going local by not using any of the major consumer LLM products, such as through huggingface or cloud providers
04:32 < fenn> digits doesn't seem useful for LLMs
04:32 < fenn> i don't know what it's for. it's small and cute i guess
04:32 < kanzure> but i haven't seen a cost analysis as to the total cost of ownership of "rent cloud GPU / other non-major-API-provider" vs local hardware
04:32 < kanzure> by "rent cloud GPU" i mean fractionally
04:32 < fenn> uhh well renting is several orders of magnitude cheaper
04:33 < fenn> payback time for owning your own 4090s is like 8 months if you use them 100% of the time
04:33 < kanzure> chatgpt-4o gets low prices by parallel gpu hardware utilization between multiple customers, i thought?
04:33 < fenn> yes, and you will have no way of utilizing your hardware to anywhere near that extent, not least because of batch size
04:34 < kanzure> right, so local should always be more expensive
04:34 < fenn> yes
04:34 < fenn> it might have worked out differently if project north star (spiking neural network ASIC) had continued development
04:34 < kanzure> but maybe trade off some cost for privacy and get something that isn't exactly local hardware but not exactly "go to chatgpt.com"
04:35 < fenn> at least with a vast.ai node you don't know for certain that your data is being hoovered up
04:35 < fenn> like, it's probably just a guy with some computers
04:36 < fenn> and the new one, salad
04:36 < fenn> nobody uses petals and you can't sell GPU time on petals, it annoys me. that's a zillion dollar startup idea right there
04:37 < kanzure> besides the privacy tradeoff there's also $/(access to bleeding-edge models that don't run on whatever local hardware you picked), which is probably more useful in the long run to target (e.g. better models, various advancements) than total local privacy
04:37 < fenn> instead of hosting an entire LLM on a single box, you have layers 1-10 on box A, 11-20 on box B, etc. and send partially processed tokens over the internet (4kB/token)
04:39 < fenn> basically recreating the fancy datacenter topologies over the internet
04:40 < fenn> most gamers have one GPU, not 8
04:41 < kanzure> where does 4 kB/token come from here?
04:42 < fenn> the size of the output of each transformer block, and the activation bit depth
04:42 < fenn> i calculated it once, might be wrong by an order of magnitude
04:42 < L29Ah> https://github.com/b4rtaz/distributed-llama
04:44 < L29Ah> i wonder if https://stablehorde.net/ is any good
04:44 < fenn> no
04:46 < fenn> L29Ah: i had that distributed-llama bookmarked but hadn't actually looked at it in depth. it's cutting the layer cake the other direction, splitting each layer in half or quarters etc, which uses way more bandwidth than cutting the layers cleanly at the joints
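The 4 kB/token figure is checkable: at a pipeline cut, the only data that must cross the network is one hidden-state vector per token, so the size is the hidden dimension times bytes per activation. A sketch under assumed dimensions (a Llama-2-70B-class hidden size of 8192; none of these numbers are from the chat):

```python
# What crosses each box boundary in pipeline-parallel inference is one
# hidden-state vector per token: hidden_dim * bytes_per_activation.
# HIDDEN_DIM here is an assumption (Llama-2-70B-like), not from the log.

HIDDEN_DIM = 8192

for fmt, bytes_per_act in (("fp16", 2.0), ("fp8", 1.0), ("int4", 0.5)):
    kb = HIDDEN_DIM * bytes_per_act / 1024
    print(f"{fmt}: {kb:.0f} kB/token per pipeline cut")
# fp16 -> 16 kB, fp8 -> 8 kB, int4 -> 4 kB: "4kB/token" is plausible for
# quantized activations, and "wrong by an order of magnitude" covers fp16.
```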
04:46 < kanzure> "come and take it" is a stronger argument for local hardware than just privacy. remote access won't completely go away, because vpn and international market access, but i do think available publicly purchasable/rentable supply can dry up pretty fast, especially if you have spiky usage or want to have an SLA with yourself.
04:46 < kanzure> so i've argued myself back to local hardware
04:47 < fenn> yes well you can do both
04:47 < kanzure> hm okay
04:47 < fenn> it's much easier to set up weird stuff on local hardware and fall asleep in the chair and come back later to a working system
04:48 < fenn> like it doesn't have to be capable of running eleventy billion parameter models
04:48 < fenn> 32b is probably the sweet spot, unfortunately qwen is the only recent model in that size range
04:48 < kanzure> how does "you can do both" change what hardware to target for local? keeping up with leading models locally seems like a losing strategy in terms of dollars spent over time.
04:49 < kanzure> hm
04:49 < kanzure> i need one of those multi-variable phase change diagrams for this
04:49 < kanzure> with funny xkcd annotations to keep my attention while reading it
04:50 < fenn> for experimental and privacy-sensitive things you run a local model and accept the performance hit
04:51 < fenn> if it's really such a hard problem that the local model can't solve it, you use the local model to strip out privacy-relevant things and ask a big model to do it online. if you can't do this you have to use your brain
04:51 < fenn> i've heard a 5090 is just a 4090 with 30% more stuff and 30% more price
04:52 < kanzure> there's also weird sequence-of-returns risk with buying into the market at the wrong time, where waiting an extra 6 or 12 months might put you on a better upgrade cycle
04:52 < kanzure> (i've had this with smartphones before -- perpetually 3-6 months off from the next model release but needing a new phone because phonedeath)
04:52 < fenn> so far the GPUs seem to hold their value as long as it's not the latest model
04:53 < kanzure> what about actually renting equipment that gets sent to you locally?
04:53 < fenn> those cheap M40s are now 2x the price ($210 from some guy in china)
04:53 < kanzure> local llm enjoyers would rent equipment and pay monthly to have local equipment, and later send it back for upgraded equipment, and the downmarket gets lower prices on last year's equipment
04:54 < fenn> who would send you $20k in GPUs, and how much would that cost?
04:54 < kanzure> less than a car payment
04:54 < fenn> so leasing, basically
04:54 < fenn> odd that english doesn't have words for these things
04:55 < fenn> you can rent a GPU online billed by the second
04:55 < fenn> shipping will take at least a day
04:56 < fenn> also power infrastructure requirements that may not exist in residential settings
04:56 < kanzure> rent online just means you're back to vast. also, it's not the same as leasing or having possession.
04:56 < kanzure> or, er, having title to the gpu
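The rent-vs-buy arithmetic behind fenn's earlier "payback time ... is like 8 months if you use them 100% of the time" is worth making explicit, since utilization is the whole game. A sketch with assumed prices (purchase and hourly rates are illustrative guesses, and electricity is ignored):

```python
# Break-even time for buying a GPU instead of renting one by the hour.
# PURCHASE_PRICE and RENTAL_RATE are assumptions, not quotes from anywhere.

PURCHASE_PRICE = 1800.0   # assumed 4090 street price, USD
RENTAL_RATE = 0.35        # assumed marketplace rate, USD/hour

def payback_months(utilization: float) -> float:
    """Months until avoided rental fees equal the purchase price."""
    savings_per_month = RENTAL_RATE * utilization * 24 * 30
    return PURCHASE_PRICE / savings_per_month

for u in (1.0, 0.5, 0.1):
    print(f"{u:4.0%} utilization -> {payback_months(u):5.1f} months")
# 100% -> ~7 months (close to fenn's figure); at a more realistic 10%
# personal duty cycle it is ~6 years, which is why renting usually wins.
```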
04:58 < fenn> there are probably lots of fun ways to exfiltrate data if you have physical access before and after
04:59 < fenn> i'm not sure what the threat model is here
04:59 < kanzure> at the end of this rabbit hole is "trust nvidia or TSMC with a TPM or TrustZone/SGX thing"
05:00 < kanzure> well, not threat model so much as wanting to spend dollars in a way that optimizes across a handful of different goals, including price per query, keeping up with recent models (not necessarily bleeding edge), physical possession if necessary, hedging against regulatory risk, and privacy
05:01 < fenn> bleeding-edge model is way more important than bleeding-edge GPU
05:02 < fenn> software moves faster than hardware
05:02 < kanzure> maybe, but there's also an extent to which having physical possession of a somewhat-recent GPU is also important there, like you don't care what exact model of GPU you end up with when the game of musical chairs ends, but you do care that you have something
05:02 < fenn> i think we are in the self-improving-AI phase already
05:03 < kanzure> leasing GPUs/workstations is different from leasing cars, because car speed isn't exponentially increasing every few years
05:04 < kanzure> nobody would care what car they have as long as they were also enjoying recent car speed improvements
05:05 < fenn> there's this hypothetical lawson criterion thing i've half-baked, it's like the product of (compute * memory bandwidth * interconnect bandwidth)/(price * power)
05:06 < kanzure> i'll admit that power requirements are a thorn in any of this. for local llm hardware leasing yeah you need to specialize in low power :\
05:06 < kanzure> until distributed gets worked out.
05:06 < fenn> you COULD hook up 80 raspberry pi nodes and run a giant model at 2 seconds per token
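fenn's half-baked "lawson criterion" is easy to turn into a scoring function. The sketch below uses rough public spec numbers and guessed street prices (all assumptions, and the choice of FP32 TFLOPS for "compute" is arbitrary), but it illustrates why the used 3090 keeps coming up in this conversation:

```python
# fenn's hypothetical figure of merit for LLM hardware:
#   (compute * memory bandwidth * interconnect bandwidth) / (price * power)
# Specs are rough public numbers; prices are guesses. Units cancel into
# nothing meaningful, so only the ranking matters, not the absolute scores.

def merit(tflops: float, mem_bw: float, link_bw: float,
          price: float, power: float) -> float:
    return (tflops * mem_bw * link_bw) / (price * power)

cards = {
    #        TFLOPS  mem GB/s  link GB/s   USD    W
    "3090": (  36,     936,      32,        800,  350),  # assumed used price, PCIe 4.0 x16
    "4090": (  83,    1008,      32,       1800,  450),
    "M40":  (   7,     288,      16,        210,  250),  # the $210 card mentioned above
}

for name, spec in sorted(cards.items(), key=lambda kv: -merit(*kv[1])):
    print(f"{name}: merit = {merit(*spec):.2f}")
# 3090 ~3.85 > 4090 ~3.31 > M40 ~0.61: by this metric the used 3090
# edges out the 4090, consistent with fenn's "2x3090" recommendation.
```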
05:07 < kanzure> https://qwenlm.github.io/blog/qwen2.5-max/
05:08 < fenn> nooo i'm trying to go to bed
05:08 < fenn> too much progress!
05:09 < fenn> oh it's just their big paywalled model
05:10 < fenn> when you see benchmark scores >90 it's a useless benchmark
05:11 < kanzure> goodwin's law?
05:11 < kanzure> uh sorry not that one
05:11 < kanzure> goodhart's law
05:11 < fenn> goodhart
05:12 < kanzure> hm, this is slightly different, goodhart is any time you measure anything it ceases to be a good measurement
05:12 < fenn> no it's not that, it's just that the benchmarks often contain errors or impossible-to-answer questions
05:12 < kanzure> you seem to be proposing zeno's torture nexus
05:13 < fenn> i'd like to show you a chart in a recent AI Explained video showing how the time to saturate benchmarks is decreasing over time, but i'm youtube-challenged with this laptop at the moment
05:13 < fenn> just imagine a bunch of lines with increasing slopes, and time on the X axis
05:14 < fenn> the most recent "hard" benchmark, GPQA, is now at like 80% solved. they've come up with "humanity's last stand" (exam) which is supposed to be ludicrously difficult
05:15 < fenn> GPQA is really fuckin hard. i would respect anyone that could solve even one question
05:16 < fenn> there is probably some method of having a dumb LLM talk to you and extract all the necessary info to construct a good prompt to feed into a reasoning model and get good results 99% of the time
05:17 < fenn> this way you could run a big slow model and not feel impatient
05:24 < juri_> fenn: was that a diagnosis, above? :P
05:24 < L29Ah> > highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof")
05:24 < L29Ah> doesn't seem too respectable
05:25 < kanzure> zeno's postmodern paradox: "someone keeps moving these god damned goalposts"
06:47 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has joined #hplusroadmap
07:22 -!- TMM [hp@amanda.tmm.cx] has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
07:22 -!- TMM [hp@amanda.tmm.cx] has joined #hplusroadmap
07:42 < juri_> fenn: oh. i guess "ai infrastructure in europe" is a diagnosis, too. glad i have fans. :)
08:13 < kanzure> $/token is kind of a weird metric. if we are selling intelligence, shouldn't value/token be the metric? higher intelligence should be able to achieve outsized results with fewer, more meaningful, more valuable tokens.
08:14 < L29Ah> i suspect all the deepseek-likes produce a lot more tokens for a result compared to prior art, due to their lengthy thinking
10:16 < kanzure> https://letsencrypt.org/2025/01/22/ending-expiration-emails/
10:21 < L29Ah> lol
13:49 -!- flyback [~flyback@2601:540:c701:900:f5f4:8e30:8217:dfde] has quit [Ping timeout: 260 seconds]
13:52 -!- flyback [~flyback@2601:540:c701:900:f5f4:8e30:8217:dfde] has joined #hplusroadmap
14:00 -!- delthas [~cc0@2a01:4f9:c010:cf0b::1] has quit [Remote host closed the connection]
14:01 -!- delthas [~cc0@2a01:4f9:c010:cf0b::1] has joined #hplusroadmap
14:23 -!- delthas [~cc0@2a01:4f9:c010:cf0b::1] has quit [Remote host closed the connection]
14:26 -!- delthas [~cc0@2a01:4f9:c010:cf0b::1] has joined #hplusroadmap
15:53 < fenn> sign up to our mailing list because we hate email so much
15:53 < fenn> ok
15:54 < fenn> juri_: no, the EU comment was not about you, sorry
16:04 -!- TMA [tma@twin.jikos.cz] has quit [Ping timeout: 252 seconds]
16:14 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has quit [Remote host closed the connection]
16:15 -!- juri_ [~juri@implicitcad.org] has quit [Ping timeout: 260 seconds]
16:16 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has joined #hplusroadmap
16:32 -!- juri_ [~juri@implicitcad.org] has joined #hplusroadmap
18:04 -!- srat3 [~srat3@user/srat3] has quit [Ping timeout: 248 seconds]
18:04 -!- srat3 [~srat3@user/srat3] has joined #hplusroadmap
18:11 -!- darsie [~darsie@84-113-82-174.cable.dynamic.surfer.at] has quit [Ping timeout: 244 seconds]
19:52 -!- TMM [hp@amanda.tmm.cx] has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
19:53 -!- TMM [hp@amanda.tmm.cx] has joined #hplusroadmap
20:49 -!- EnabrinTain_ [sid11525@id-11525.helmsley.irccloud.com] has quit [Ping timeout: 245 seconds]
20:53 -!- EnabrinTain_ [sid11525@id-11525.helmsley.irccloud.com] has joined #hplusroadmap
21:33 < hprmbridge> kanzure> https://link.springer.com/chapter/10.1007/978-3-031-57430-6_11
23:26 -!- L29Ah [~L29Ah@wikipedia/L29Ah] has quit [Ping timeout: 272 seconds]
--- Log closed Thu Jan 30 00:00:03 2025