Hacker News | swingboy's comments

GPT-5.5 xhigh seems to benchmark about on par with Mythos for cybersecurity.

The Trump administration would never do anything to manipulate the markets. /s

I realize these models are locked up pretty tight and terabytes in size, but in a future like that, I don’t see them not being leaked via an insider. The weights have to be loaded into VRAM at some point.

It’s a pretty safe bet that every frontier lab has multiple foreign intelligence agencies running assets inside of it.

Every hyperscaler hosting these models outside of FedRAMP environments has been compromised by every regional power’s intelligence services. Fable was running all over the world until today.

AWS and friends are very good at providing excellent enterprise grade security, but it’s literal child’s play for nation state threat actors to exfil these models.

TEMPEST / EMSEC alone is a wide open door for unclassified datacenters when the Mossad’s out to get you.


I'm skeptical that you're going to be able to reliably exfiltrate ~10TB of model weights using TEMPEST. Which is not to say weights are secure, just that this isn't the threat model I would be concerned about.
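A rough back-of-the-envelope sketch of why the size matters for any low-bandwidth channel. The ~10 TB figure comes from the comment above; the time windows are illustrative assumptions, not measurements of any real side channel.

```python
# Sustained throughput needed to move a payload within a time window.
# Decimal units (1 TB = 10**12 bytes). The windows are illustrative assumptions.

def required_mbit_per_s(payload_tb: float, days: float) -> float:
    """Return the sustained rate in Mbit/s needed to move payload_tb in `days`."""
    bits = payload_tb * 1e12 * 8
    seconds = days * 86_400
    return bits / seconds / 1e6

if __name__ == "__main__":
    for days in (1, 30, 365):
        rate = required_mbit_per_s(10, days)
        print(f"10 TB in {days:>3} days needs ~{rate:,.1f} Mbit/s sustained")
```

Even with a full year to work with, the channel would have to sustain roughly 2.5 Mbit/s without interruption.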

That would depend on what gets leaked, as I'm not so sure that the weights by themselves would be enough to replicate the architecture. I imagine some part of the secret sauce will remain in the architecture, and the tensor dimensions may not be enough to decode it.

I'm sure if proprietary models continue to be a big thing, the methodology of their storage and loading on hardware will be obfuscated quite a bit.


But you can see this is not true (yet); competitors/Chinese labs are less than 6 months behind: either via leaks or by just stumbling on the same improvements with time/effort.

Which Chinese labs are on par with GPT-5.3 and Sonnet 4.6 that I can go and use today? (Granted, those are from 4 months ago, not 6, but nothing was released in Dec/Jan so I rounded up.)

I use GLM a lot; for what I do (coding), it is Sonnet quality. I haven't tried the newest 5.2 yet to compare it to Opus (some people are hinting at that). That's older Sonnet/Opus quality, not SOTA, but that already worked fine for me.

Hope it happens someday. That'd probably be the best possible outcome for all of humanity.

The gamers would really be complaining about why they can’t run Fable.torrent on their gaming PCs

I don't think it's a good idea to give the crowds that kind of weapon. The first thing they'd do is "liberate" the model, aka remove the guardrails and safety protocols, brag about it on X / Reddit, and throw it out into the public. That's only cool for a geek who doesn't think about the ethical impact of such a move. You'd basically become responsible for anything that is done with it, forever - have a good sleep. /s

As opposed to what, the US military, or better yet Israel (because we all know they won't be excluded) using that model to drive weaponry that kills people?

Your hypothetical implies that there is a better alternative, but when those models are "restricted", in practice that means that the only people who have access to them are precisely those who can and will use them for the worst kind of shit. So yes, releasing them to the public is a better deal, ethically speaking, at least then the playing field will be slightly more equal.


There are plenty of weapons (see custom-made viruses) which no state actor (or even an informal militia) would want to release, as these weapons attack everyone. But open access to the details of their construction leaves everyone vulnerable to the motivations of small groups of crazy individuals.

What if I told you there are no safety guardrails? I used GLM 5.1 and had Fable literally build a harness to avoid triggering guardrails. I built skills carefully and had Fable doing vuln research and exploit repro in a few hours. I called the project manhattan. The GLM models are down for almost anything so I named it Oppenheimer. It orchestrated the Fable CLI agents via tmux. This whole Fable/Mythos thing is such a fucking joke. It is all PR and theatre and they know it.

I’ve been doing pentesting with LLMs for a while and only hit a few “nope I won’t do that” and one “this conversation is flagged for being against the TOS”. No idea what the guardrails are, but they are trivially abused.

Same model that costs $12 in tokens to finally add “overflow-x: hidden;” to an element, by the way.

https://news.ycombinator.com/item?id=48498573


overflow is CSS 101

Immediately I thought “isn’t this just an overflow issue?” Amazing how far these models still have to go and also how many people don’t know basic CSS.

This is why I really like Karpathy's idea of LLMs having spiky intelligence.

We would assume that if tasks A and B are closely related, mastery of A would mean mastery of B, but that doesn't always hold for an LLM.


Yeah pretty crazy capability from the AI but also sad that we're at the point where web developers don't know right click->inspect element, and scrolling overflow properties (one of the most basic and common parts of CSS).

What's your theory on why the bug was present in Safari on macOS but absent in Chrome, Firefox, and WebKit for Playwright?

Browsers tend to not lay out things totally identically in my experience. Especially when it comes to scrollbars. So the bug probably was present on the other browsers but it just happened to not be hit. I'd have to play around with the dev tools to know for sure.

Also I'm not sure the fix is even correct. overflow-x: hidden means it just chops off any overflowing content which means you don't get a scroll bar, but if the user types too much it just goes into an invisible void they can't see.

See https://developer.mozilla.org/en-US/docs/Web/CSS/Reference/P...

So this could be a case of the AI doing its classic "the symptom is gone!" thing.


> Also I'm not sure the fix is even correct. overflow-x: hidden means it just chops off any overflowing content which means you don't get a scroll bar, but if the user types too much it just goes into an invisible void they can't see.

That's what I figured would happen too, but I tested it and it doesn't.


Learn to center a div

Copy and paste code from stack overflow until the div is centered

Ask AI to center it


$12 and 200k tokens!

Interesting that the `brew-rs` experiment has concluded and didn't find much of a performance increase. I suppose that is expected though with a lot of the bottleneck being network IO?

What file format(s) are giant LLM models distributed in? I’m surprised they don’t get leaked by employees.

These are terabyte sized files (realistically a multi hour transfer) that you're unlikely to have access to in the first place. Every organization has exfiltration checks these days. You may succeed but you'll want to be on a plane to a non-extradition country no more than hours after you kick off the transfer.
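A minimal sketch of the "multi hour transfer" arithmetic. The 10 TB size is the rough figure used elsewhere in this thread, and the link speeds are illustrative assumptions; real throughput is usually lower than the nominal link rate.

```python
# Transfer time for a large file at a given sustained link speed.
# Decimal units (1 TB = 10**12 bytes); protocol overhead is ignored.

def transfer_hours(size_tb: float, link_gbit_per_s: float) -> float:
    """Hours to move size_tb terabytes at link_gbit_per_s, with no overhead."""
    bits = size_tb * 1e12 * 8
    return bits / (link_gbit_per_s * 1e9) / 3600

if __name__ == "__main__":
    # Link speeds are assumptions chosen for illustration only.
    for gbps in (1, 10, 100):
        print(f"10 TB at {gbps:>3} Gbit/s: ~{transfer_hours(10, gbps):.1f} hours")
```

At 1 Gbit/s that works out to about 22 hours for 10 TB, and a little over 2 hours at 10 Gbit/s.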

The employees are hoping to become very, very rich after the IPO, once they are allowed to sell the shares given to them. Risking a likely multi-million dollar payout to leak a model that will be superseded by publicly available models in a couple of years is not a likely decision.

I assume they’re encrypted/DRM’ed when deployed on inference hardware, so only core researchers/sec admins would potentially have some access to unprotected weights, and they are far too well paid to risk leaking the model.

Incentives matter on average, but people are too unpredictable for categorical statements like that. They can always have other reasons beyond personal gain to leak secrets.

There was no shortage of spies and defectors leaking American nuclear secrets to the USSR during the Cold War.


I wouldn't be surprised if they encrypt them at rest, but at some point the weights have to be loaded into VRAM.

Newer NVIDIA cards (H100 and up) support both in-memory model encryption and a ‘trusted’ execution environment with remote attestation. I'm not sure how widely this is used in frontier model deployments, but at least the vendor-claimed perf overhead is ‘3%’ [0].

[0] https://www.spheron.network/blog/confidential-gpu-computing-...


What’s the point? Anthropic and other frontier vendors already provide their models on other services like Vertex, Bedrock, or OpenRouter.

It’s not like anyone can home lab one of these models without quite a bit of hardware.


Yeah we can probably figure out how to run it on xiaomi gpus

This guy is a goober.

Please don't post unsubstantive comments to Hacker News, regardless of who is a goober or you believe they are.

What is revising in this context? For Americans.

Studying previously learned material to prepare for an exam
