If people are interested in this stuff, this is the house style guide that I've ended up with in mid 2026. Its great-great-great grandparents were at Google, which informed Greg Badros and Mark Rabkin and Andrei Alexandrescu when they did the one at FB, which informed a bunch of trading work, which informed a bunch of GPU work.
It links itself to some things that really seem to have existed, like a straylight project linked to the ESA, and an old domain b7r6.net linked to another HN account. There are a lot of buzzwords there, but in aggregate it is nonsense. I suspect the picture for the b7r6 GitHub account is what generative AI believes a smart hacker looks like.
First off, I highly suggest that you expand this into a full-blown book. This could become a successor to a combination of {Adrian & Piotr's "Software Architecture with C++" + Fedor Pikus' "The Art of Writing Efficient Programs"} for the Agentic era.
I really like that you are using Lean4 for parts of code generation, tips for Agentic coding etc. which are all needed today. I myself have been thinking along these lines, i.e. using formal methods for specification and verification so that agent-generated code can be "correct-by-construction" and efficient. Your write-up is the first I have seen which tries to provide the overall picture.
"There are a lot more important problems than Sri Lanka to worry about. Well, we have to end apartheid, for one, slow down the nuclear arms race, stop terrorism and world hunger. We have to provide food and shelter for the homeless and oppose racial discrimination and promote civil rights, while also promoting equal rights for women. We have to encourage a return to traditional moral values. Most importantly, we have to promote general social concern and less materialism in young people."
Arithmetic mean is just really bad for latency conversations (ditto MTTR). Other averages have their place but for a legible, accessible chart that's 4 lines in anything: p50, p75, p95, p99.9 with the last having the SLA is IMHO the right thing to goal and alert on in a cross-functional setting that's attaching engineering outcomes to business outcomes.
There's better math for advanced introspection, but for stuff everyone in the room can intuit no matter their discipline, that's a really sweet spot.
And it's motivating: the p99.9 latency is a bunch of quick, high-impact wins if you haven't profiled it yet. A good time is had by all.
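To make the mean-versus-percentiles point concrete, here is a minimal sketch in Python. The synthetic samples, the function names, and the choice of the nearest-rank percentile method are all assumptions for illustration, not anything from a particular monitoring stack.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """The four-line chart from the comment above, plus the mean for contrast."""
    return {
        "mean": sum(samples_ms) / len(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p75": percentile(samples_ms, 75),
        "p95": percentile(samples_ms, 95),
        "p99.9": percentile(samples_ms, 99.9),
    }


# Synthetic data (an assumption): 9,980 requests at 10 ms, 20 stragglers at 2,000 ms.
samples = [10.0] * 9980 + [2000.0] * 20
summary = latency_summary(samples)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f} ms")
```

On this data the mean sits just under 14 ms, which looks healthy, while p99.9 lands on the 2,000 ms stragglers. That gap is the reason to alert on the tail rather than the average.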
Hallucination should be called "failure to ground".
Something about the cost model of US near frontier has the cattle prod out whenever a model is uncertain but thrashes on whether to search. Search flinch is roughly all hallucination.
I don't even wait for the model's turn, if there's a man page or Hoogle hit, stuff the last prefix cache cut point. You come out ahead.
Depends entirely on how competitive / adversarial the market segment is. And there are wild nonlinearities, just as you care a lot whether or not something fits in an 8-way set associative L1d but once you're too big or bank conflicted to make it the cost model drops sharply until you're about to spill the L2.
The hackers doing the drone software in Ukraine will go to enormous lengths to get something onto Jetson Orin, but once it needs Thor? New cost model.
I think what the parent might be asking is whether the cost of DRAM will be passed along to the most powerless actor in the system, and that depends on whether it's a real competition (war, HFT) or a pillow fight between frenemies (consumer Internet, too big to fail AI lab).
Us liberals have this quaint idea that good referees make capitalism an adversarial contest instead of a rolodex contest, but that idea is out of fashion at the moment.
Anthropic has a durable advantage that OpenAI does not, as much as they choose to squander it to pump the numbers.
They were first on a few things but the tell is the coherence of the thinking traces of Claude. You have to put a loss on that. GPT 5 series thinking traces are creepy, Gemini thinking traces are disturbing. They both represent forced discontinuities on the policy gradient.
Claude is good at tool use because it's gigantic and well-labeled, but the reason you pay the premium is for a thinking partner, not a tool user.
Claude Code is the cancer that will kill the patient, Boris is the Kardashian version of Karpathy, with less business sense.
Using a multi-trillion parameter softmax attention transformer to parse nested delimiters is a perverse thing to do. It is hard to imagine a sillier way to boil the oceans than feeding JSON to an LLM, a task that a pushdown automaton from the 1960s effortlessly did on a PDP-X.
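For anyone who hasn't seen how little machinery nested delimiters need, here is a minimal sketch of the pushdown idea in Python: one pass over the input and one stack. The function name is made up for illustration, and the sketch deliberately ignores string literals and escapes, so it is a delimiter checker, not a JSON parser.

```python
# Map each closing delimiter to the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def delimiters_balanced(text):
    """Return True if every (, [, { in text is closed in the right order.

    The stack is the only memory beyond the current character, which is
    what makes this a pushdown automaton in spirit. Caveat: brackets
    inside quoted strings are not skipped.
    """
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recently opened delimiter.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


print(delimiters_balanced('{"a": [1, 2, {"b": []}]}'))
print(delimiters_balanced('{"a": [1, 2}'))
```

The work is linear in the input length and the memory is bounded by the nesting depth.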
The API business throws a massive model, one that by definition can't be inferred efficiently (because nothing can across 4 different compute substrates), at a problem that DSv4 nails at or near 100%, while leaving most of the actual unique value of Claude on the table.
Claude should be in your house and car and your kid's classroom and shit.
Having it write tail -n5?
That's because Anthropic's A-Team is Meta's C-Team. Hell, I fired some of their stars myself.
It has graduated into being an active and fairly precisely targeted thought shaping tool.
I worked on Ads and Feed Ranking at FB/IG and we never dreamed of the scope for shaping behavior and opinion that is now routinely deployed by frontier and near frontier vendors. RLHF is basically feed ranking in the first place, preference gradient with no ground truth referee, late SFT on amplifying data sets, and affine injections into the residual stream with a fluent, earnest base model that the public has been conditioned to regard as omniscient and wise?
Yeah that's fucking mind control when applied at scale. We did some sketchy shit a decade ago, this is next level.
Derek Lowe is an extremely well-known and widely respected expert in the field of pharmaceutical drug discovery chemistry.
His series "In The Pipeline" has a cult following of experts and non-experts alike.
He is widely regarded as an authority on the chemistry of Alzheimer's.
For a fun introduction to his work, the "first hit is free" dopamine rush is his "Things I Won't Work With", a masterclass in bringing chemistry to life through the lens of synthesis actively dangerous to person and property.
He's the guy from FOOF[1]! I didn't immediately make the connection, thanks. Yeah, that series was kind of fun, even if it kind of reinforced my layman understanding that all chemists are explosion nerds :D
With 4.8 Claude has begun refusing to ground, leaking destabilizing injections into the web interface (in XML for some reason), and being generally argumentative.
By arguing he means trying to get a result that 4.6 just did and it was fun. You have to laboriously re-align 4.8 over incredibly dumb shit, especially if you're working on AI. And it's not meaningfully better at anything: the distribution is perturbed but net-net it's just shrinkflation.
It's basically identical to when GPT 5.1 went full corpo shill, something about the RLHF gradient necessary to do whatever IPO adjacent manipulation they need makes these things nasty and argumentative in general.
I caught the religion on using types in conjunction with LLMs about eighteen months ago, but I only really got serious about `lean4` like six to eight months ago and now I wouldn't even consider using AI assist in software work without a `CIC` proof substrate that has practical C/C++ (and therefore, everything) FFI.
We've banned everything from JSON to Python, rewritten `nix` to have a compiler, and almost everything we write is not only property tested and multiply fuzzed to within an inch of its life, but we have proofs in `lean4` that at a minimum drive property tests via `.olean` linkage, and when we have the bandwidth we prove exhaustiveness over the domain and property test that.
We skip the whole C++/Rust thing because all of the fast stuff is generated from `lean4` and so it doesn't really matter (C++ has advantages when bugs in it are actually bugs in `lean4` code, but you could go either way).
This is a big departure, and I certainly don't blame anyone who is skeptical: "ban JSON and Python wtf?!?!", but we've done millions of lines (checked) of this stuff and AI + formal systems is a dramatically bigger leap than no AI -> AI and Python. The latter in our experience is not monotone in progress, the former is almost always monotone in progress.
And you can do wild shit: this is a formal proof of the polyhedral tensor calculus that is modeled by things like ISL and CuTe, and using that we can get swizzling and tiling using `mdspan` in C++23 on device and prove it's right (up to some L'Hopital argument about the coverage, which this example doesn't demonstrate well): https://github.com/b7r6/mdspan-cute
That in turn, well, it goes real fast. On the first try.
- `continuity` is a `lean4` metaprogramming system that we use all kinds of ways but the real meat is it allows formal specification of codecs and state machines in ways that make security and performance properties proof amenable, the key trick here is to limit parser power to just what you need and no more. a very cool thing is that we can add targets to it, so when we do Zig for example, Zig will immediately get proven correct and frontier performance support for dozens of protocols that are not all mature right now.
- `libevring-cpp` (bound up into `libevring-hs` and `libevring-rs`) is a Trinity-inspired deterministic event replay system that replaces anything you would have done with `libuv` or whatever, and it's wired into `io_uring` (we're stuck with `kqueue` on darwin, eh). it interfaces with `continuity` machines and codecs (which are generated very carefully for the hardware they run on) and we have yet to find a way of measuring such programs where it doesn't resoundingly shatter the performance of any other asynchronous programming primitive in any language. i'm sure the community will prove us wrong when we release it, but it's real fast. and you get much stronger guarantees than in most such systems (Trinity derived, so if you can repro a bug, you can walk the event trace until it's sitting there in GDB and shit)
- `hyperconsole-cpp` (and `hyperconsole-hs` and `hyperconsole-rs`) is the TUI library idea taken to some deranged extreme on performance and supports everything in `notcurses`, it's pretty wild: https://youtu.be/YqgEtpJ8tGI
- straylight-nix is a complete rewrite of nix that fixes thousands of bugs, hundreds of them security adjacent and dozens of them we only talk to vendors about. it's a daemonless, dramatically faster, ground-up WASM-targeting compiler with a formal grammar, uses an extremely fast LSM-based store (it can read the legacy store but we don't write it) that fixes all the problems in floating CA, IFD is too cheap to care about, and recursive nix is no longer an issue (see daemonless). it supports tearing derivations into `REAPI` actions that you can feed to your friendly native NativeLink or whatever, which just goes through them like a woodchipper. KVM-based sandbox with snapshot and restore, really opens the world up on what your builders can be.
- `slide` is the reference implementation of a family of protocols called `SIGIL`: `SIGIL-LLM` is a binary encoding for LLM data that resets on ambiguity and drops the average bytes on the wire from e.g. OpenRouter to your harness from ~hundreds to ~1.5 per token, `SIGIL-API` is a bijection on OpenAPI 3.1.0 and AsyncAPI 3.0.0 that gives comparable improvements, and `SIGIL-SH` is such an encoding for a sensible subset of bash. this does about 1.5 billion tokens per second on a laptop and never emits partial frames, so you don't get speculative execution rollback problems in your harness that tilt agents off.
- `// WEAPON //` is an adversarial, vendor-skeptical, full-take surveillance agent harness built on `hyperconsole-cpp` and `SIGIL` (so, you could absorb the entire token output of OpenRouter on one machine if you wanted, at least on the wire and in the terminal, clearly the bottleneck rapidly becomes whatever the agents are calling, but it's `zmq4` transport underneath `SIGIL` so it's also trivial to full-take all of your data for fine tuning or whatever you want it for into e.g. `parquet` on R2). `// WEAPON //` does a bunch of stuff: the tool call surface is heavily optimized for AST-level edits that miss dramatically less, we intercept and manipulate shell commands (slice off the stupid `| tail -n5` that keeps the droid in a loop not seeing the error), pre-emptively ground using heuristics that have been tuned (defeats the search flinch), and always recover from any stall, or nag box, or anything else that would serve as an unannounced rate limit, it's fine if vendors rate limit but they need to put it in their ToS. it has a bunch of other primitives, agents run in real KVM sandboxes and speculate out as wide as you want to pay the tokens for. we hyper-manage things like the cache breakpoint geometry of Anthropic so e.g. Opus rarely misses in cache and always hands off edits to specialized tool use models. it's pretty extreme the difference in outcomes relative to all this React jank.
- `s4` is a general-purpose compiler from most any pytorch 2.0 model to `myelin`-level performance on NVIDIA (we only support NVIDIA Blackwell at the moment, that might change) and it's never worse than `myelin` because if we don't out-tune it we invoke it, but we out-tune it a lot because we've proven a lot of decidability theorems about tiling and scheduling on both 1CTA and 2CTA, so we can often arrive at a finite, enumerable set of schedule/tile choices. `myelin` mops up the random garbage around the big GEMMs just like in TRT-LLM.
- `sigil-trtllm` is inspired by TensorRT-LLM-Edge but designed from the ground up around Mellanox/ConnectX and in particular GPUDirect, so it can stream `SIGIL-LLM` tokens directly onto the wire whereas something like Dynamo is usually traversing both Python and NATS, which is super weird to us. this uses the `s4` compiler very heavily.
- `straylight-cas` is a geographically distributed content addressable store backed by any R2-compatible (so most any S3-compatible too) object store with multi-level LSM and extreme performance memtables, optimistic hinted handoff over `zmq4` to other geographies, and a really simple operational story, this is kind of the thing that powers the product surface.
... which is the thing i'm less ready to talk about because it's supposed to be a surprise.
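The `| tail -n5` interception mentioned in the `// WEAPON //` bullet is easy to picture with a toy version. The sketch below is an assumption-laden illustration in Python, not the harness's actual code: the function name and the regex are made up, and it only handles the simplest case of `tail` as the final pipeline stage.

```python
import re

# Hypothetical pattern: a final pipeline stage of the form
# `| tail -n5`, `| tail -n 5`, or `| tail -5`.
# The lookbehind keeps `||` (logical OR) from being mistaken for a pipe.
_TRAILING_TAIL = re.compile(r"(?<!\|)\|\s*tail\s+-(?:n\s*)?\d+\s*$")


def strip_trailing_tail(command):
    """Drop a trailing `| tail -nN` so the agent sees the full output.

    Caveat: this is plain text matching with no shell parsing, so quoted
    strings, `|&`, and tail with other flags are not handled.
    """
    return _TRAILING_TAIL.sub("", command).rstrip()


print(strip_trailing_tail("make test 2>&1 | tail -n5"))
print(strip_trailing_tail("cat build.log | tail -n5 | grep error"))
```

Only the last stage is touched, so a `tail` in the middle of a pipeline, or one that is the command itself, is left alone. A real implementation would presumably work on a parsed shell AST rather than a regex.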
No, we're pre-alpha. I guess most people would call it stealth but it's more like, not quite done and we don't want to waste people's time because our entire thesis is around correct outcomes in AI systems at a level that permits their use in outcome-critical regimes (we sometimes call it "insurable" AI as a north star, would Lloyd's of London or Swiss Re stand behind someone who was writing policies on this?).
Now a bunch of that is development tooling that copes with agent-scale software development, and a lot of that might become product surface, so we have a lot of usage denominated by like, bytes and agent hours and stuff because we build this stack in itself, but that's somewhat orthogonal to the north star vision.
We'll make sure to give the HN community the opportunity to see this stuff as early as anyone does if people find it interesting, most of the above will be open source fairly soon. Don't know if it'll make the front page, but the product will be called `ORBITAL`, so if you see that floating around that's us.
It's opinionated but it has served me well.
https://gist.github.com/b7r6/5dde648f5dc1dea1e9039f2211f5d40...