hadlock · 2026-06-14T12:22:47 1781439767

They also stopped releasing 100b+ model weights after firing him

hadlock · 2026-06-10T16:42:48 1781109768

2% is good, anything over 3% is not good, anything over 4% is bad, 5% and higher is really bad. Hope that clears things up for you.

neves · 2026-06-10T18:23:26 1781115806

It depends on the country. Brazil, due to its hyperinflation days, has a lot of indexed prices. These are prices that automatically increase with inflation. This gives the country so-called inertial inflation (current inflation caused by past inflation), and it also makes the economy more robust to higher inflation.

embedding-shape · 2026-06-10T16:51:07 1781110267

Can you really say that based only on the inflation? What if wages increased 6%? Then 3% inflation wouldn't be as bad as 2% inflation with wages increasing only 0.1%. At least if you think about purchasing power, I suppose. But I won't claim to be an expert on this, happy to be educated by those who are :)
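To put numbers on the purchasing-power point, here is a minimal Python sketch. It assumes the usual definition of real growth, (1 + nominal) / (1 + inflation) - 1; the function name `real_growth` is made up for illustration.

```python
def real_growth(nominal_growth: float, inflation: float) -> float:
    """Return inflation-adjusted growth as a fraction (0.03 means 3%)."""
    return (1 + nominal_growth) / (1 + inflation) - 1


# The two scenarios from the comment above.
scenario_a = real_growth(nominal_growth=0.06, inflation=0.03)   # wages +6%, prices +3%
scenario_b = real_growth(nominal_growth=0.001, inflation=0.02)  # wages +0.1%, prices +2%

print(f"6% wages, 3% inflation:   {scenario_a:+.2%}")
print(f"0.1% wages, 2% inflation: {scenario_b:+.2%}")
```

Under that definition the first scenario is a real raise of roughly 2.9%, while the second is a real pay cut of roughly 1.9%, even though its inflation rate is lower.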

recursivecaveat · 2026-06-11T04:22:13 1781151733

Part of it is just expectations btw. If the value of money is jumping up and down rapidly it is bad for business. Like if I'm going to sell you a 30 year fixed rate mortgage we require an accurate expectation of future inflation for one of us not to effectively lose their shirt on the deal. Imagine shops shuffling their prices up and down constantly, unions renegotiating contracts all the time, you sign up for 12 months of netflix but the price implicitly assumes that money will be worth N% less by month 12, etc. (imo a lot of these things should already be pegged, but people don't like doing that) It's basically just much more annoying to transact using a currency whose future value is unpredictable. So given that 2% is the stated target, which expectations are presumably largely built around, significant deviation is a failure to manage that process.

anticorporate · 2026-06-10T17:05:52 1781111152

In general, higher inflation has a negative impact on consumer sentiment even if wage growth matches the inflation, which it rarely does.

But the bigger issue is that inflation is generally distributed much more evenly than wage increases. Very few employers offer a COLA that is automatic, so wages almost always trail inflationary pressure.

tastyfreeze · 2026-06-10T16:54:36 1781110476

The Fed says that 2% is good. 2% is not good. Their target of 2% per year means our currency is devalued by 2% annually, compounding.
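For a sense of scale, a small Python sketch of what a constant 2% rate compounds to. The constant rate and the chosen horizons are assumptions for illustration, and the helper names are hypothetical.

```python
def price_level(rate: float, years: int) -> float:
    """Price level after `years` of constant inflation, starting from 1.0."""
    return (1 + rate) ** years


def purchasing_power(rate: float, years: int) -> float:
    """What one unit of cash held for `years` still buys, relative to today."""
    return 1 / price_level(rate, years)


for years in (1, 7, 10, 35):
    prices = price_level(0.02, years)
    power = purchasing_power(0.02, years)
    print(f"{years:>2} years: prices x{prices:.3f}, cash buys {power:.1%} of what it did")
```

At exactly 2% a year, prices are up about 15% after 7 years, and cash loses about half its purchasing power over roughly 35 years.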

DennisP · 2026-06-10T17:05:01 1781111101

It's fine as long as t-bill rates match or exceed inflation. Then you can avoid losing purchasing power by just putting your money in the world's safest investment. Over the past century, t-bill returns have slightly exceeded inflation on average, though there have been periods when they didn't.

Stash paper cash in your safe and sure, you lose purchasing power. Use fiat money the way it's designed to be used, instead of using it like gold coins, and it works better.

aidenn0 · 2026-06-10T18:16:55 1781115415

US debt as a fraction of GDP has doubled this century and roughly quadrupled in my lifetime. It would seem to me that eventually t-bills will not be safe.

DennisP · 2026-06-10T23:13:34 1781133214

True, but that's a different problem and to hedge against that you probably need hard assets.

throwwwll · 2026-06-11T09:22:52 1781169772

They did double and quadruple before, if you know what I mean.

xenadu02 · 2026-06-10T18:21:07 1781115667

That's by design.

If currency doesn't devalue then stuffing it under a mattress looks like a reasonable alternative to investing. If we hit deflation you can receive gains for "free" and borrowed money becomes more expensive over time. Neither of which our economic system is set up to handle.

We punish people who hoard cash by devaluing it thus encouraging them to put the money to work.

greyface- · 2026-06-10T18:29:50 1781116190

One side says this design is necessary to sustain growth. The other says it's unfair because the gains from the growth are unevenly distributed. Neither is wrong.

HDThoreaun · 2026-06-10T16:56:35 1781110595

Why is that not good? When inflation is close to 0 real interest rates increase which causes the economy to slow down. It seems clear to me that the optimal rate of inflation is always above 0.

dmoy · 2026-06-10T17:33:24 1781112804

The real problem imo is that below 0% is really bad, and has the potential to spiral. So the fed does not target anything close to 0%, but instead targets some buffer above it.

So it's not that "2% is good", but more that "2% is the best buffer we've decided above the <0% super scary threshold"

HDThoreaun · 2026-06-10T17:49:38 1781113778

Yes, of course below 0% is especially bad, but I don't think that's the whole story. Even if central banks were able to set inflation with 100% certainty, I still think targeting a number close to 0% is a bad idea. Nominal interest rates have a floor due to defaults and servicing costs. As inflation approaches 0 that floor is hit and monetary policy loses its ability to control real interest rates. Keeping nominal rates above their floor is key to ensuring small businesses can obtain liquidity; as the floor is approached it makes less sense for lenders to write small loans.

There are many other reasons a positive inflation rate is better than one near 0. One common complaint about inflation is that it erodes real wages because nominal wages are sticky, but this is actually a good thing. It gives businesses room to breathe during downturns without cutting nominal wages or having to cut staff. Positive inflation also forces cash into productive uses, which helps monetary policy because it keeps the actual money supply more stable.

somenameforme · 2026-06-10T18:16:07 1781115367

The Fed did a study some time back estimating CPI levels since 1800. [1] They found that from 1800 to 1950 the CPI never shifted more than 25 points from the starting base of 51, so it always stayed within +/- ~50% of that baseline. That's through the Civil War, both World Wars, Spanish Flu, and much more. And obviously the US economy increased in size quite exponentially from 1800 to 1950, with no persistent inflation whatsoever.

It's even more interesting to contrast this from 1971 onward. 1971 is when Bretton Woods ended and the government was given a free hand to start 'printing money' so to speak, and inflation became the new policy. Since then the CPI has increased by more than 800 points, 1600% more than our baseline. And it's only increasing faster now - to the point that these numbers I'm giving are already rather outdated.

[1] - https://www.minneapolisfed.org/about-us/monetary-policy/infl...

HDThoreaun · 2026-06-10T19:50:44 1781121044

The US economy faced repeated economic catastrophes from 1800-1950 largely because the government was unable to enact monetary policy. The long depression of the 1870s happened pretty much solely because monetary supply contracted and populists got elected to fuck with the silver/gold standard. Causes of the great depression are more varied, but contraction of the money supply is certainly one of the leading ones.

Yes, the economy expanded greatly over this period, but you have to separate inflation from many other causes such as innovation, increasing labor supply, better education, increases in the amount of investment. I think it's pretty clear that the economy would've fared much better in the 1800-1950 period if the government was partaking in monetary policy that focused on small but positive inflation.

somenameforme · 2026-06-11T03:51:18 1781149878

Check out the data from the Fed and contrast it against events in the past. For instance you mention the long depression which happened from 1873 to 1879 and resulted in a decline in prices of about 30% followed by stabilization. And of course that was also the advent of the Gilded Age, where economic growth, wages, and so on all were skyrocketing, all while prices trended downward! It's difficult to even imagine something like that nowadays.

I don't think that's just a coincidence either. Economic issues in the US used to foreshadow booms to come, which makes sense in many ways as it's the ending of one generation of businesses and the start of another. By contrast in the US prices have increased by 30% over almost the same length of time as the 'long depression', and continue going up up and away. It'd be nonsensical to call it the long inflation or whatever because it's only slightly off the normal. 2% 'planned' inflation over the same 7 year period is a 15% increase in prices. And businesses going under? No, everything's huge now, and so everything's too big to fail. The government has taken on the responsibility of perpetually propping up failing businesses, forever inhibiting competition in the process.

And there's no boom waiting at the end, in no small part because there is no end. Each economic issue we face, which are becoming increasingly regular, just further magnifies the divides in society. The wealthy have sufficient assets and resources to turn this into profit, in no small part by dumping excesses of money into inflation resistant assets, but the middle and lower classes have no such option and so mostly just lose either badly or very badly. This [1] site lays out a bunch of the data since 1971 (when the dollar became completely unbacked following the end of Bretton Woods) and it's impossible not to see it as an enormous inflection point for all sorts of nasty stuff.

---

I suspect if we hadn't had the tech boom kicking off fairly shortly after 1971, driving a decades-long unprecedented economic boom, that this experiment would have long since reached its climactic failure. It only works with infinite exponential growth. For some time we had that. Now we not only no longer do, but also are seeing a fertility collapse at the same time. There's gonna be some fireworks.

[1] - https://wtfhappenedin1971.com/

kachnuv_ocasek · 2026-06-10T16:51:55 1781110315

Why those arbitrary thresholds?

JumpCrisscross · 2026-06-10T18:34:05 1781116445

> Why those arbitrary thresholds?

The broad idea is you want a number low enough that people don't price inflation expectations into day-to-day pricing but not so low that a hiccup causes deflation.

The empirical evidence around inflation persistence is a bit all over the place, but broadly suggests people start daily indexing between 2 and 5%. When that starts to happen, restraining inflation without causing a depression becomes incredibly hard, because people will actively countermand policy moves.

hadlock · 2026-06-10T18:31:37 1781116297

The fed has a dual mandate to maintain full employment and keep inflation at 2%. Others have already explained why 2% and not 0%. Up to 3% is expected, 4% means significant price shocks and they should consider acting quickly. 5% means they are at risk of losing control of inflation as it's more than doubled from their mandate and the fed risks losing credibility with markets

horsawlarway · 2026-06-10T17:29:21 1781112561

In complete seriousness:

An offhand remark made by New Zealand's Finance Minister, Roger Douglas, during a 1988 television interview.

hadlock · 2026-06-10T00:23:49 1781051029

From what I've gathered, Mythos is the uncensored version, for institutional use, and then Fable is the censored version for general public, that won't talk about biology, encryption or anything remotely interesting

hadlock · 2026-06-09T18:02:41 1781028161

GPT5.5 one-shotted an entire mandolin for me. Like, the whole thing, ready to 3d print or CNC. Well, two shotted, the fender style neck came out so good (I didn't think it could do it) I asked for the body and it made the matching body with correct neck pocket etc. with bolt holes. SCAD is really rough, I agree, but cadquery is great for this sort of thing for whatever reason. I linked the pastebin upstream in this same comments section if you want to check it out.

hadlock · 2026-06-09T17:47:54 1781027274

I've had pretty good success with using LLMs to generate basic shapes using python and cadquery (which generates real parametric step files you can edit in fusion, not glorified triangulated STLs). Yesterday I had GPT5.5 build a python script to generate fender style Mandolin with separate neck and body, with correct bolt holes for the neck, gibson style bridge, and stop-tail, even fretboard with the little dots and cutouts for the fret wire (I didn't ask for these, but it added them anyways). Everything looks correct and to scale. These should generate (after pip install cadquery) a .step and .stl which you can open in something like PrusaSlicer or Fusion:

Neck: https://pastebin.com/Sg3LmmUq Body: https://pastebin.com/FE9nikYB

edit: screenshot, too https://i.imgur.com/FZGyyVO.png

vmbm · 2026-06-09T18:04:37 1781028277

I am interested. Every few months, I loop back to using LLM's for this type of task but have always had fairly mixed results. Not sure if it is my prompt, model choice, or the part itself being too complex. And I haven't had the time to really dig into why things aren't working out. But would be nice to find a workflow that gets good results as I regularly 3D print stuff for hobby projects but find 3D modeling to be the most tedious and time intensive task.

ActorNightly · 2026-06-09T20:21:25 1781036485

My general go-to for tasks that are n-level complex is to have the agent store summaries after every generation. I do this for interacting with websites - I'll sit there and type text for the agent to correctly inject JS to do something on a website, and on every iteration it asynchronously writes a history, in a background thread, of what it has done and what the result was. On every invocation, it injects that context.
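A rough sketch of that pattern in Python, using only the standard library. Everything here (the `HistoryLog` class, the JSON-lines file, the prompt layout) is a hypothetical illustration of the idea, not the commenter's actual setup.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path


class HistoryLog:
    """Append-only log of (action, result) summaries, written off the main thread."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self) -> None:
        # Background thread: write each queued summary as one JSON line.
        while True:
            entry = self._queue.get()
            with self.path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")
            self._queue.task_done()

    def record(self, action: str, result: str) -> None:
        # Returns immediately; the worker thread does the disk write.
        self._queue.put({"action": action, "result": result})

    def context(self, last_n: int = 20) -> str:
        # Wait for pending writes, then format the most recent entries for the prompt.
        self._queue.join()
        if not self.path.exists():
            return ""
        lines = self.path.read_text(encoding="utf-8").splitlines()[-last_n:]
        entries = [json.loads(line) for line in lines]
        return "\n".join(f"- {e['action']} -> {e['result']}" for e in entries)


log = HistoryLog(Path(tempfile.mkdtemp()) / "history.jsonl")
log.record("inject JS to click #login", "button clicked, form appeared")
log.record("inject JS to fill #email", "field populated")

# On the next invocation, prepend the stored history to the new instruction.
prompt = "Previous steps:\n" + log.context() + "\n\nNext instruction: submit the form"
print(prompt)
```

The queue keeps the agent loop from blocking on disk writes, and `context()` joins the queue first so the injected history never misses the latest step.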

hadlock · 2026-06-08T23:21:21 1780960881

Adding to this, a lot of fiber installed in the 1990s is still dark. Multi-wavelength XYZ and other improvements mean the same fiber from 35 years ago can carry 100 or 1000x what it was originally designed for. Also, like Solar, all the cost is in labor. When they designed the Seattle/King County fiber network, they knew they would never have access/permits to go back and add more, so instead of running a single 12 fiber bundle the size of your pinkie, they ran 3 x 1024 bundles the size of your arm through the hollow bridges that span I-5 freeway and snakes through Seattle underground. Almost all of that sits dark today despite being in a very busy area, simply because fiber technology keeps getting better.

db48x · 2026-06-09T00:02:27 1780963347

Yea, fiber is great. They can do hundreds of terabits/s per fiber today, and petabits/s is not far away. Bandwidth is fundamentally cheap enough that my ISP offers 50Gbps residential service!

podocarp · 2026-06-09T01:52:29 1780969949

Can I ask where do you stay? Korea? 50G is insane, is it on qsfp? Also what's the pricing on that?

db48x · 2026-06-09T18:31:11 1781029871

I live in Oregon. The price was $900/month last time I checked. I believe they do provide a QSFP+ module to plug into your equipment. They also allow residential customers, at any tier of service, to announce their own IP blocks via BGP.

https://ziplyfiber.com/internet/multigig

hadlock · 2026-06-08T23:15:45 1780960545

Qwen's ~30B-class models are genuinely good enough for use if you can find a machine with enough memory bandwidth to run them at 30-90 tokens/second. It's been extremely telling that Qwen stopped releasing 120b class models. At some point in the next 10 years (maybe 3?) someone is going to release an Opus 4.5 class 256B model you can run locally. Right now our engineers use about $800/mo worth of opus tokens; at that rate the ROI for local LLM is ~10 months
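The break-even arithmetic can be sketched in a few lines of Python. The $8,000 hardware figure is an assumption backed out of the numbers above ($800/mo times roughly 10 months) and treats the spend as per seat. The running-cost figure is purely hypothetical.

```python
def breakeven_months(hardware_cost: float, monthly_api_spend: float,
                     monthly_running_cost: float = 0.0) -> float:
    """Months until hardware pays for itself versus paying for API tokens."""
    monthly_savings = monthly_api_spend - monthly_running_cost
    if monthly_savings <= 0:
        raise ValueError("local running cost meets or exceeds the API spend")
    return hardware_cost / monthly_savings


HARDWARE_COST = 8_000  # assumption: implied by the comment, not stated in it
API_SPEND = 800        # from the comment: ~$800/mo of Opus tokens

print(breakeven_months(HARDWARE_COST, API_SPEND))
# Hypothetical $50/mo of electricity pushes the break-even out slightly.
print(breakeven_months(HARDWARE_COST, API_SPEND, monthly_running_cost=50))
```

The result is only as good as the assumption that a local model can actually replace the API usage one for one.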

horsawlarway · 2026-06-09T01:53:08 1780969988

I want to echo this.

I've been on claude's opus 4.5/6/7 for work for a couple months, and I finally got back to running Qwen A3B 35B... it's incredibly performant and quite capable on semi-reasonable local hardware.

I get ~150 tokens/s on dual nvidia RTX 3090s and can fit the whole 300k context into gpu on a UD-Q4-K-XL quant gguf.

Combine that with Pi as a harness, and I'm surprised to find that it feels about as capable as claude did 8 months ago (their 3.x models).

It's not Opus 4.5 levels yet, but it's good enough for a LOT of basic work. I actually downgraded my personal anthropic subscription because Qwen is absolutely fine for implementation work. I still let a better model write a plan, but then I can just switch over to Qwen to implement.

I don't think we're 10 years away from opus 4.5 levels running on cheap consumer hardware. I think we're probably closer to 18 months away, and I suspect it'll be in the 30-60b range, not the 256b range.

PC manufacturers also seem to be betting on local, with a LOT of focus on 64 to 128gb unified RAM machines.

dofm · 2026-06-09T07:37:44 1780990664

I have come at this at a slightly different angle.

I am a fully-burned-out freelancer (in the last couple of years so severely and totally that I thought I had early onset dementia, and I am still not sure I don't). I don't really have an off-ramp to anything else yet, but the sea-change in the industry has been contributing to my feeling that I should knock it on the head.

I must get past broad understanding of AI to deep understanding, but I have to find a way to do this which sits well with freelancer ethics (sustainability, stability, control of destiny).

So I decided I would start out with that operating principle that ultimately this stuff is just going to be local: models will eventually hit some level of practicality for most tasks and technological progress guarantees that they will eventually run on desktops.

I decided to learn how to run models locally properly, see how far I get with opencode (and Pi and Zed experiments), and grow outwards from there to metered models (opencode go, openrouter etc.)

Knowledge first; what can I do that meaningfully changes my outcomes and confidence with no cost and no exposure to sudden change?

I have a secondhand M1 Max (excellent GPU bandwidth), and I am really shocked to find that arguably that level of practicality is already here.

Qwen 3.6 35B can really do a lot. And — not sure if you have tested it — but in some ways I think the Gemma 4 26B is better. Particularly for more commonplace dev tech — it is very knowledgeable about the sort of low-end web dev stack that is most common (Wordpress, PHP, MySQL).

I have been getting 75 tokens/sec with (GGUF) Gemma-4 26B QAT and MTP. (Can't get anywhere close with MLX, for some reason.)

A similar sort of speed with an MLX Qwen 3.6 35B. I have a sneaking suspicion that maybe llama.cpp is now faster than MLX on this older kit so I might try seeing what llama.cpp can do there, too.

Not blazing fast, but fast enough that there are plenty of experiments and small jobs I can do before I even get to using Big Pickle!

weirdcat · 2026-06-09T12:13:41 1781007221

How are you running that GGUF, and how many tokens/sec are you getting without MTP? My M1 Max gives me 65 t/s for non-MTP unsloth/gemma-4-26B-A4B-it-qat-GGUF (UD-Q4_K_XL), but with MTP that actually goes down to 56 t/s (at 63% accepted drafts).
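One way to see how drafting can end up slower is the standard back-of-the-envelope model from the speculative decoding literature. The Python sketch below makes several simplifying assumptions: the reported 63% is treated as an independent per-token acceptance probability, the draft length is taken to be 3, and `draft_cost` is a hypothetical ratio of one draft token's cost to one full forward pass. None of this is measured from llama.cpp.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification step (accepted drafts plus one)."""
    if accept_prob >= 1.0:
        return draft_len + 1
    return (1 - accept_prob ** (draft_len + 1)) / (1 - accept_prob)


def speedup(accept_prob: float, draft_len: int, draft_cost: float) -> float:
    """Throughput relative to plain decoding; below 1.0 means drafting hurts."""
    step_cost = 1 + draft_len * draft_cost  # one full pass plus the drafted tokens
    return expected_tokens_per_step(accept_prob, draft_len) / step_cost


def breakeven_draft_cost(accept_prob: float, draft_len: int) -> float:
    """Relative draft cost at which drafting stops paying for itself."""
    return (expected_tokens_per_step(accept_prob, draft_len) - 1) / draft_len


tokens = expected_tokens_per_step(0.63, 3)
limit = breakeven_draft_cost(0.63, 3)
print(f"expected tokens per step: {tokens:.2f}")
print(f"break-even draft cost:    {limit:.2f} of a full forward pass")
```

Under these assumptions each step yields about 2.3 tokens, so drafting only helps while a drafted token costs less than roughly 43% of a full pass. On a small sparse model, where a full pass is already cheap, that margin is easy to lose.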

dofm · 2026-06-09T14:09:44 1781014184

Just this guy's assistant running against the official Q4_0 GGUF:

  ./llama-server \
    -hf google/gemma-4-26B-A4B-it-qat-q4_0-gguf \
    --spec-draft-hf RachidAR/gemma-4-26B-A4B-it-qat-assistant-q4_0-gguf:Q4_0 \
    --spec-type draft-mtp \
    --spec-draft-n-max 3

I hadn't done any really radical testing so I've just had another look.

Without the MTP drafter, it is pretty consistently 75 tokens per second anyway, which is interesting.

With the MTP drafter it reaches well above 95 tokens per second handling the prompt and it will slowly drop to 65 or so with the output tokens as the prediction success rate slowly drops.

But with generated output it seems to me that the predictions are always going to drop dramatically over time.

I think my results here are broadly consistent with what people say about success rates with smaller and sparse models. I am going to test with n-max 4 in agentic situations at some point, and I may see whether it has much impact on the 31B model which is too slow to be practical otherwise.

I have a very unqualified feeling that MTP will matter more in agentic coding because of the larger prompts.

But my biggest issue since I installed it, I think, is that the combination is occasionally messing with markdown generation during thinking, and sometimes possibly losing the </think> at the end. I've seen it enough now to be fairly sure it is the Gemma MTP causing it. There is an open bug in the vLLM project about this and I wonder if something similar is going on in llama.cpp.

The speed without the MTP drafter is pretty solid so I am content to let more experienced people than me handle things while I learn other stuff, but I might go looking for some testing code that can prove it sometime.

dofm · 2026-06-09T15:11:53 1781017913

Just saw this:

https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gg...

Might see if Google has official drafters later.

maxdo · 2026-06-09T02:10:25 1780971025

The majority of my agentic setup is pi / Claude code, where every single Chinese model is not as good, except the commercial 1T models.

Local is a pipe dream. If you can run it cheaply once in a while, why can’t commercial companies run it cheaper 24/7 and lower their costs? The answer is simple: use cases are getting more demanding, so you need more from the model, not less.

Sure, if your task is a narrow labeling job on 1M records, a small optimized model is good. If you want to do complex things, it shifts with model advancements.

hparadiz · 2026-06-09T02:20:30 1780971630

This sounds like something someone at IBM in 1986 would say trying to sell their mainframes. "PCs will never be a thing. No one's gonna want a computer."

I'm seeing some impressive results from folks that can afford 10k+ GPUs right now. But those GPUs will all be hand me downs in 10 years. So pipe dream? Hmmm...... that's not how this industry works.

tyre · 2026-06-09T04:29:20 1780979360

Those are not GPUs available on iPhones. Will we get there eventually? Maybe! Maybe we end up with GPU clusters built on the edge (e.g. cell towers) for offloading, maybe it’s never economical, maybe a different model architecture makes it simpler, who knows.

But it doesn’t seem anywhere imminent with our current world state.

hparadiz · 2026-06-09T04:36:25 1780979785

My computer is 15,000 times faster and costs in inflation adjusted dollars half that of my computer in 1995. There's zero reason to think that won't happen over the next 30 years again.

For whatever reason every generation thinks they are the peak. Naw man. You're just a blip at the bottom of the logarithmic chart.

sgt101 · 2026-06-09T06:46:10 1780987570

For me there are a bunch of questions:

- was the pause in model scaling a result of the benefits of RL & SFT being easier to access and quicker than scaling, or was it genuinely the result of scaling being low ROI now?

- are the power densities necessary to provide high quality on-device inference possible? Can the best, technically feasible, architectures accommodate T scale models and run them off batteries that fit in your hand?

- will things slow down enough to allow edge deployments to realise value vs. centralised deployments?

- do edge use cases drive enough revenue to get this to happen?

- can local inference make up for model scale? Does that make sense in a latency/power race with the central infrastructure? Is there a sweet spot here?

I am not sure about any of the answers...

wqaatwt · 2026-06-09T06:01:47 1780984907

It has slowed down massively for CPUs at least. e.g. modern CPUs are hardly more than 3-5x faster than those from 10 years ago. There is zero reason to think that won’t happen over the next 10 years again.

horsawlarway · 2026-06-09T13:17:28 1781011048

This isn't a crazy statement (cpu performance metrics have mostly stalled their meteoric rise from prior to the 2000s)

But it also doesn't capture the entire picture.

CPU metrics mostly stalled for two reasons.

1. There wasn't much demand for the extra capacity. Even low end cpus from a decade ago are plenty capable for just browsing the web and typing up documents. It takes a novel use-case to drive demand again (or a desire to do things like play new games).

2. The interest in CPU development shifted in response to mobile. Given point #1 and the state of battery development.... the blocker wasn't "performance". It was "performance per watt". And on that metric you couldn't be more wrong.

Since ~2005, MIPS per watt has improved 15x to 30x.

Also - fun news is that the traditional CPU pipeline really isn't the bottleneck for AI workloads. So we're going to see incredible interest in things like memory bandwidth and other inference related hardware bottlenecks, which haven't already been optimized.
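To compare the growth figures quoted in this subthread on one scale, here is a small Python sketch that converts each into an implied doubling time. The time spans are assumptions: 30 years for the 15,000x desktop claim, 10 years for the 3-5x CPU claim, and about 20 years for "since ~2005".

```python
import math


def doubling_time(factor: float, years: float) -> float:
    """Years per doubling implied by an overall improvement factor."""
    return years / math.log2(factor)


claims = [
    ("desktop speed, 15,000x since 1995", 15_000, 30),
    ("CPU speed, 3x in 10 years", 3, 10),
    ("CPU speed, 5x in 10 years", 5, 10),
    ("MIPS per watt, 15x since ~2005", 15, 20),
    ("MIPS per watt, 30x since ~2005", 30, 20),
]

for label, factor, years in claims:
    print(f"{label}: doubles every {doubling_time(factor, years):.1f} years")
```

Read this way the claims are not contradictory: the long-run average is a doubling about every two years, while the recent CPU and efficiency figures imply a doubling every four to six years.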

BeetleB · 2026-06-09T14:12:42 1781014362

> There wasn't much demand for the extra capacity. Even low end cpus from a decade ago are plenty capable for just browsing the web and typing up documents.

It stalled before the rise of PC-as-Internet-portal.

I bought a high end PC in 2003, and 5 years later the PCs were not much faster - probably not even 2x. Around 2008-2010 was when most people started using PCs as a way to connect to the Internet.

It stalled because scaling got a lot more challenging. Not because of lack of demand.

horsawlarway · 2026-06-09T19:31:38 1781033498

Yes, but it only stalled along a single dimension - Single core clock speed.

I was building gaming machines in the early 2000s, I absolutely remember the 4ghz wall that cpus hit.

But it wasn't a real wall... because we then got one of the arguably most influential processors ever in the Core 2 duo. Which... blew the limit away by giving you two processors clocked at 2.93 GHz each.

And honestly, even then - it was lack of demand (we could go to 4+ghz, but we didn't want to pay the power bill for the rest of the system - the planned pentium 5 was 7-10ghz on paper, but they canceled the project because keeping it fed and cool was too hard for personal desktop machines).

Of Note - we did reach these speeds on consumer hardware (ex - in 2012, Andre Yang hit 8.794Ghz on an AMD FX-8350)

So it was never "impossible" to keep scaling. It just wasn't worth it compared to going multi-core.

---

And maybe it's because I was in my formative years at this time, but you're off by 5+ years with this:

> Around 2008-2010 was when most people started using PCs as a way to connect to the Internet.

Gmail was a web only email client released in 2004. Wikipedia was released in 2001. Web browsing was very much one of the "killer" apps for computers by the 2000s. What do you think the damn 2000s dot-com bubble crash was?

at the risk of aging myself - I was born in '89, and I literally do not remember a time where we didn't have DSL speeds and above (friends houses often still had dial-up until ~2005, though).

BeetleB · 2026-06-09T21:03:03 1781038983

> Gmail was a web only email client released in 2004

Well, Gmail was actually one of the last web based email clients people used :-) Yahoo mail, Hotmail, and so many others predate Gmail by years.

> Web browsing was very much one of the "killer" apps for computers by the 2000s.

One of them. People still used non-browser apps for all kinds of things: Media consumption (people didn't watch movies on Youtube), Office (Google Docs was very much a niche thing for many years), photo-editing (lots of pirated versions of Photoshop/Lightroom years after the iPhone release), etc.

Most non-mail, non-social media, non-shopping stuff people do on the web these days was a dedicated SW from the vendor in those days. Want to make a photobook? Download this Windows binary and set it up there. It will then communicate with the server for the order (no browser utilized).

> at the risk of aging myself - I was born in '89, and I literally do not remember a time where we didn't have DSL speeds and above (friends houses often still had dial-up until ~2005, though).

Spring chicken! My first online experience was on a 340 baud modem :-)

horsawlarway · 2026-06-09T13:01:21 1781010081

Because I have a fixed expenditure on my local machine, and I can be absolutely sure of the costs over a long horizon (5+ years, for low end hardware life, 10+ years with moderate care). Not something that's true for cloud costs.

Your argument is actually really similar to an argument around the time Uber started kicking into gear and expanding.

It went:

---

"Why should I own a car when it's actually cheaper to just Uber for all my rides, compared to the cost of buying, maintaining, and insuring a car?"

---

And that wasn't an insane argument at that exact moment. Uber was pricing itself in the range of $5-$7 a ride, was novel and high quality.

Except take a look around today... Uber in my area went from ~$5 a ride to ~$27 a ride for the same trip. Uber's quality has also degraded quite a bit. It went from primarily high end, new cars with immaculately clean interiors to "average".

So want to make a wager on what's going to happen with cloud costs over the next decade for inference?

Because my strong hunch is they're going to follow exactly the same trend. They will stop being subsidized, providers WILL downgrade model quality to improve operating costs (and you'll have no control over this outside of enterprise contracts), and companies will start exploring "additional revenue options"... which means they'll shove ads and sponsored content into your results.

Is it worth being ~10-18 months behind the latest and greatest to avoid that entire set of shenanigans? I'd vote yes... I pay one time up front, and get usage limited by my hardware for the cost of electricity over a 10 year timeline. That's a decent deal with no surprises.

You're welcome to rent, but renting makes you subject to the whims of the owners. They're being very nice right now to attract all the flies. That's not a mistake, and it's absolutely a trap.

---

Side note - if you're only able to do labeling tasks with a local model... you're holding something very, very wrong.

iwontberude · 2026-06-09T12:47:35 1781009255

Keep working on your agentfu because there is a sweet spot with subagents and parallelizable plans. It’s not about better, it’s about efficiency and picking the right model for the job. You can achieve the same results as frontier models with the right type of planning and context management on local Chinese models.

nozzlegear · 2026-06-09T16:25:47 1781022347

Depends on what you're doing, of course, but for the small and focused tasks where I'm using agentic AI, local models on my M1 Mac Studio are superb.

iwontberude · 2026-06-09T12:45:58 1781009158

I was freaked out being stuck with OpenAI and Anthropic. I setup qwen3.6:35b-mlx on my Mac Studio M1 Ultra and was blown away really. I am no longer afraid that Anthropic or OpenAI will be able to control the market.

strictnein · 2026-06-09T01:03:53 1780967033

Didn't Qwen stop releasing their more powerful models because they're commercializing them?

mswphd · 2026-06-09T03:35:00 1780976100

Yes and no.

Qwen 3.5 was released 3/2/2026. It includes models up to a 397B-A17B model

https://huggingface.co/collections/Qwen/qwen35

A day afterwards, a high-up technical leader working on Qwen was let go

https://techcrunch.com/2026/03/03/alibabas-qwen-tech-lead-st...

The more recent Qwen 3.6 was released on 4/16

https://huggingface.co/collections/Qwen/qwen36

This does not include any particularly large models. But the models it contains (Qwen3.6 27B and Qwen3.6 35B-A3B) are the local models people have been very excited about lately. So they didn't release any larger models, and the models people praise so much are from this most recent release.

tyre · 2026-06-09T04:30:37 1780979437

If they stop releasing their larger models because they want to monetize, would we expect them to release better small models that can outcompete those?

mswphd · 2026-06-10T14:11:53 1781100713

there's pros and cons to it for them. Clearly, they get good branding (at least in enthusiast circles). Perhaps more important is they get community work on optimization. There have been significant performance uplifts on the Qwen3.6 models from the open-source community since they were launched (at a minimum, multi-token prediction is now working with them. It is almost a 2x token generation speedup)

https://www.reddit.com/r/LocalLLM/comments/1ti9w4o/qwen3635b...

hadlock · 2026-06-08T20:59:12 1780952352

This is MS Word Macros all over again

hadlock · 2026-06-08T20:14:23 1780949663

Apps I can't uninstall on my Pixel 9 or 10 (I forget): Gemini, Google TV, My Pixel, Now Playing, Screenshots, Wallet, Weather, YT Music

Some of those might be system level stuff, but I don't know what the heck Google TV is, and I certainly am not signing up for whatever "YT Music" is. Probably some spotify subscription thing they will cancel like they did google play music and whatever came before that. But I can't uninstall any of these, or even delete the icons. They're just visual trash I can't hide from the launcher.

I don't trust the digital music stores anymore, cumulatively I have probably $100 worth of music I've bought across 3-4 music stores in the last 25 years and I can't access any of it anymore. Meanwhile my MP3 collection and WinAmp from high school continue to work without issue.

hadlock · 2026-06-03T18:00:39 1780509639

>The added cannibalization of releasing them is effectively zero, so the reputational benefits are likely to be worth it.

Nobody would be looking at Qwen if their ~30b class models weren't fantastically good, it's great advertising and builds significant goodwill with developers, who are going to be your biggest advocates.

The other thing is, all these models are already disposable grade, and in a year they'll all be outclassed by The Next Big Thing. "Open" models are less than 18 months behind SOTA right now and I can't imagine that will slow down much over the next two years, they may even begin to close the gap. Nobody even talks about llama 4 anymore despite only being a year old.