bigyabai · 2026-06-26T00:30:58 1782433858

> I have zero doubt it'll reach mainstream adoption

Be careful. I said this 10 years ago about OpenCL, and I've eaten crow ever since.

bigyabai · 2026-06-25T23:22:27 1782429747

Without a "zero regulation" option, you're just begging people to select one of your troll answers.

krapp · 2026-06-25T23:24:19 1782429859

Fair enough, I added that option.

bigyabai · 2026-06-25T23:22:19 1782429739

The integrated GPU. Not enough compute onboard to handle prefill for 100GB+ models, and the decode is constrained by memory bandwidth that's lower than most dGPUs at that price.

Apple would be in a much stronger spot right now if they didn't pretend like eGPUs were inconceivable black magic that Macs are incompatible with.

aroman · 2026-06-26T00:07:07 1782432427

I'm not sure I follow - 614 GB/sec is pretty squarely in dGPU territory (~5070 level). External GPUs can definitely exceed that on the very high end, but it seems pretty competitive, no?

bigyabai · 2026-06-26T00:27:11 1782433631

Competitive for 16-24GB dGPUs, but for 100GB+ inference workloads it's going to be a decode bottleneck. For smaller models it'd be fine, but the same goes for the smaller GPUs.

In particular though, the fatal bottleneck is the weakness of the iGPU. Filling a KV cache on a 100GB+ model could take a few minutes, or even hours if you're trying to restore a 256k-to-1M token session.
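
A rough sketch of the decode ceiling being argued about here, assuming decode is purely memory-bandwidth bound and that a dense model streams every weight once per generated token. The function name and the dense-model simplification are assumptions for illustration; MoE models only read their active experts per token, so their ceiling is higher.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when memory bandwidth is the only limit."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# 614 GB/s is the bandwidth figure quoted upthread; 100 GB stands in for a
# dense "100GB+" model whose weights are all read for every token (assumption).
print(f"{decode_ceiling_tok_s(614, 100):.2f} tok/s")
```

This is only a ceiling: real decode speed also depends on compute, KV cache reads and runtime overhead, none of which are modeled here.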

bigyabai · 2026-06-25T22:24:52 1782426292

They very well could have. Apple was the only company poised to take on CUDA with OpenCL, and they got pantsed so hard by the HPC industry that the Mac Pro got discontinued entirely.

Apple could have added a couple trillion to their valuation if they weren't addicted to service revenue. But today's Apple is too fat to see their shoes, let alone where the puck is headed.

Danox · 2026-06-26T03:30:01 1782444601

The Mac Pro was a stopgap, and they weren't going to work with Nvidia, Intel and AMD anymore. Good riddance in the long run.

bigyabai · 2026-06-25T20:51:25 1782420685

How are people getting so upset over this? OpenAI asked for regulation to validate their scaremongering. Now they've allied with the government, just like they said they would.

It's 2026 now, you can't pin your hopes and dreams on a random business that treats you like exit liquidity. When you pray to the cannibal king, you get what you ask for.

bigyabai · 2026-06-25T20:42:39 1782420159

> And in the end, everyone is worse of.

I don't know, it really depends on the product. If we were all still paying for proprietary UNIX in the current year, then it's likely that these startups would never exist in the first place. Sometimes a SaaS has to admit that it's not providing real value before actual innovation becomes the status quo. I don't weep many tears for dying businesses because nobody lives forever.

bigyabai · 2026-06-25T19:20:23 1782415223

Opencode has a toggle for planning and edit modes, you might like that.

bigyabai · 2026-06-25T17:35:48 1782408948

The US and allies should have banned Chinese tech when they cloned STM32s without ARM licenses and refused to pay royalties for the IP.

You're a day late and a dollar short, if this isn't an ironic shitpost intended to foment nationalist outrage.

bigyabai · 2026-06-25T17:18:34 1782407914

We have to get real, here - most people are not replacing GPT or Claude with local inference, even on M5. If you can afford to do that (RAM shortage or not), then you are in the minority of customers.

Alleviating the memory constraint would only really make Nvidia a danger to cloud margins, and their consumer sales are neutered while they focus on the datacenter segment. It feels facetious to insinuate that people would be doing inference on their MacBook Neo or Wintel laptop if they only had a gorbillion gigabytes of memory and a 400W accelerator card plugged into the wall outlet.

kamranjon · 2026-06-25T17:31:45 1782408705

You’re out of the loop if you don’t think M-series chips with unified memory are one of the best platforms for running local inference.

bigyabai · 2026-06-25T17:39:11 1782409151

They aren't. Apple Silicon is unusable for interactive prefill and decode speeds in agentic workflows and SOTA LLMs.

kamranjon · 2026-06-25T17:56:00 1782410160

You’re just out of the loop, and that’s fine but it’s worth learning about.

There is a pretty large and growing community of us using entirely local models for our agentic flows: from GLM 4.7 Flash on 32GB machines at >60 tok/s, to Gemma and Qwen dense and MoE models on 64GB machines, all the way up to DeepSeek V4 Flash on 128GB machines with 450 tok/s prefill and 25-30 tok/s decode.

I use DS4 on the daily - it’s become my main model.

I know it’s in fashion to talk trash about Apple, but their hardware outperforms other options like DGX Spark when it comes to local inference. They've got the unified memory, memory bandwidth and the GPU cores to actually be useful in a way that most other hardware just isn’t.

aroman · 2026-06-25T18:42:39 1782412959

My hardware isn't powerful enough to try, so I'm asking out of genuine curiosity, not to push back: what do you use DS4 for? Did it replace e.g Claude Code with Opus for you, or was it replacing something else?

kamranjon · 2026-06-25T22:39:39 1782427179

I use it as my main coding agent: the DS4 server runs on my 128GB MBP, and I run the pi coding agent on my other machine, which calls out to it. Mostly Go and TypeScript work.

I also use it in local agent mode if I'm coding directly on the machine, which is nice because you can save sessions and resume them. For personal projects and training-related stuff it's been great.

Even got an autoresearch loop going where the agent looks at the previous run, adjusts parameters and code if needed, and then hands off training to another script (so full system resources are available for training), ad infinitum - it works really well - what antirez has done with that project is pretty incredible.

johncalvinyoung · 2026-06-25T20:02:14 1782417734

Isn't Deepseek V4 Flash still like 150+ GB even at Q4?

kamranjon · 2026-06-26T10:12:36 1782468756

I think most people running DeepSeek locally are using DS4: https://github.com/antirez/ds4

It provides a 2-bit quant and a mixed 2-bit/4-bit quant, which range from 80GB to 97GB.

DeepSeek is particularly well suited for quantization, and the quality of the 2-bit quants antirez has provided has been validated by folks like ggerganov of the llama.cpp project.

bigyabai · 2026-06-25T20:20:30 1782418830

> From GLM 4.7 flash

GLM 4.7 Flash is a 30B model that was far behind SOTA at launch, and I know that because I pay for z.ai inference and have run the model locally. Qwen and DeepSeek V4 Flash have the same issue, and raise the question: are you really going to process a 64k agentic context at 450 tok/s? That's 2+ minutes that you spend waiting for the first token to generate! Of course nobody can sell that as competitive inference, and it only gets worse with larger models. We're talking about non-interactive speeds, here.
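
The "2+ minutes" figure is just context length divided by prefill throughput. A minimal sketch, assuming the whole context is prefilled from scratch at a constant rate; the function name is made up for illustration.

```python
def prefill_seconds(context_tokens: int, prefill_tok_s: float) -> float:
    """Time to first token if the entire context is prefilled at a constant rate."""
    if prefill_tok_s <= 0:
        raise ValueError("prefill_tok_s must be positive")
    return context_tokens / prefill_tok_s


# 64k tokens and 450 tok/s are the numbers from this thread.
secs = prefill_seconds(64_000, 450)
print(f"{secs:.0f} s (~{secs / 60:.1f} min)")
```

Swap in your own measured prefill rate to see where the wait stops feeling interactive for your context sizes.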

If you're satisfied with small local models, more power to you. It puts you in the same barrel as Strix Halo enthusiasts or the guys that bought 2x3090s on Reddit. You are completely ignoring the market if you think that any of those SoCs are unprecedented or unparalleled for inference workloads, though. The free DS4 API is faster at prefill and decode; you could not give away Mac inference at zero cost and compete with what China provides for free. That's how far behind Macs are for local inference, to put things into perspective.

kamranjon · 2026-06-26T09:42:07 1782466927

I think you’re confused. Nobody running local models is concerned about SOTA; that’s just marketing hype from large providers. We are interested in data governance, security, control and freedom. You can’t compare hosted services to local inference: these are two very different things, and on principle we aren’t interested in handing our codebases over to untrustworthy third parties.

On your first point, nobody is pasting 64k tokens at once as context; if you are, you’ll experience very similar wait times even with hosted providers. Context is built up piecemeal and, by virtue of being context, does not need to be replaced constantly. This is how all agents work.
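
A rough sketch of that piecemeal argument, assuming the server keeps the KV cache for the already-processed prefix so each turn only prefills its new tokens. Whether a given runtime actually does this is an assumption, and the turn sizes below are hypothetical.

```python
def per_turn_wait_seconds(new_tokens_per_turn: list[int], prefill_tok_s: float) -> list[float]:
    """Prefill wait per turn when only the newly added tokens are processed."""
    if prefill_tok_s <= 0:
        raise ValueError("prefill_tok_s must be positive")
    return [n / prefill_tok_s for n in new_tokens_per_turn]


# Hypothetical agent session: tokens appended to the context on each turn.
turns = [4_000, 1_500, 6_000, 2_500]
for i, wait in enumerate(per_turn_wait_seconds(turns, 450), start=1):
    print(f"turn {i}: {wait:.1f} s")
```

The total prefill work is the same as for one big paste; it is just spread across turns, and it comes back all at once if the cache is lost and the session has to be restored.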

100% local models are not SOTA, but they are good enough to be incredibly useful if you are a skilled engineer, and I understand the industry is pushing engineers to offload more and more of their work to incentivize higher token spending, but talented engineers can absolutely be just as productive using local models today - it’s just a different style of working that folks who have become accustomed to large providers can’t really comprehend at this point. They’ve vendor-locked themselves into a delusion that they absolutely need SOTA for everything and as a result see everything as black and white.

Danox · 2026-06-26T04:05:05 1782446705

You sound like IBM in the mainframe era...

bigyabai · 2026-06-25T17:01:45 1782406905

The flip side is that the pot is now boiling. Windows and macOS are both replete with advertisements and service upsell, which is something that nontechnical and technical users both pick up on. It's been expanding the discussion of alternatives, and gave Linux a piece of the spotlight in the PC gaming world. Normies that watch LTT, Gamers Nexus or JayzTwoCents have been exposed to Linux already. Many of them bought a Steam Deck and switched to the desktop, getting their first "preinstalled" Linux desktop experience.

The Year of the Linux Desktop won't be when everyone switches to Linux. You can't save everyone; there will always be iPads and gaming laptops that will never see proper Linux support. OP's point seems to be that higher device prices will push people to get more mileage out of depreciated Intel MacBooks and Windows 10 desktops. Price increases will outright prevent some customers from engaging in the upgrade cycle altogether, which is why a lot of enthusiasts and gamers have already switched to Linux distros for extended support.

If this squeeze continues, more and more low-income computer users will defect from the upgrade/service treadmill. It won't be a firehose of defectors, but it's already enough to make an impact.

thewebguyd · 2026-06-25T18:05:29 1782410729

Fair point, but I'd say

> Normies that watch LTT, Gamers Nexus or Jayztwocents have been exposed to Linux already.

aren't normies at all. The 80% that are functionally computer illiterate aren't watching LTT. Someone with enough interest to follow gaming/tech YouTube channels can probably already handle installing Linux with a little handholding.

I agree on your other point though: you can't save everyone. We'll just bifurcate. That 80% just won't own a general-purpose computer at all outside of what is provided by their employer. They'll use their smartphone, and maybe an iPad. The desktop/general-purpose market will shrink, but Linux is definitely ripe to take nearly that entire market, as it is now effectively becoming an enthusiast-only market.