
dcsommer · 2026-06-06T02:48:10 1780714090

https://www.memorysafety.org/docs/memory-safety/

dcsommer · 2026-05-31T18:13:20 1780251200

Performance should not be priority #1. Security should be. Why do we slow down all CPUs to prevent SPECTRE attacks yet continue to write in C? As rav1d shows, the perf loss is far less to migrate from C to Rust than it is to apply SPECTRE mitigations, and adding a sandbox around a memory-unsafe codec is going to be way more expensive again than using Rust code to start.

Const-me · 2026-05-31T20:05:04 1780257904

> Performance should not be priority #1. Security should be.

For a web browser, or a server in a bank, sure. For anything else, questionable.

> adding a sandbox around a memory-unsafe codec is going to be way more expensive

In the modern world, the overhead of strong sandboxes is surprisingly small. The nuclear but most reliable option is a hardware-assisted VM. On modern computers with SLAT and virtualized IO, the overhead for most use cases is negligible. If you want something lighter weight, you can use the multi-user nature of all modern OS kernels and isolate the code into a separate process with restricted permissions. Sandboxing overhead is approximately zero.

cogman10 · 2026-05-31T19:30:03 1780255803

> As rav1d shows

rav1d is not a full rewrite of dav1d to rust. So it really doesn't show that. It's currently C + rust + asm.

I don't think we can say anything about what this does or does not prove about the performance of safe code.

> Performance should not be priority #1. Security should be.

Entirely depends on the application. The reason rust has `unsafe` is because there's some situations where performance needs to preempt potential security problems.

dcsommer · 2026-05-31T20:51:48 1780260708

Codecs are difficult and expensive to develop. Therefore they get reused in many contexts, including security-critical ones. Sandboxing is shown over and over to not be a great security solution, so what this means in practice is that security-critical software that needs software decoding gets pwned because software engineers don't care to prioritize it in the first place.

Why shouldn't safety be the default? If you really want to, it wouldn't be too hard to maintain a patch on top of rustc to drop the bounds checks if you want to compile object files without them.

Software decoding has a safety culture problem, and we need to talk about it.

cogman10 · 2026-05-31T21:05:01 1780261501

> Why shouldn't safety be the default?

Because safe code isn't fast enough to decode live video.

> If you really want to, it wouldn't be too hard to maintain a patch on top of rustc to drop the bounds checks if you want to compile object files without them.

Yeah, but then you are undermining safety in a critical way that does lead to security vulnerabilities (buffer overflow). And you are also now maintaining and requiring other devs for a project to use a custom version of rustc. That's certainly part of the reason that's simply not happened.

But another major part of it is that encoders end up with a lot of custom ASM regardless. That custom ASM is going to be where vulnerabilities end up. You don't really escape that by using rust.

If you are already abandoning where you critically need safety the most for performance, then why pick a language that additionally penalizes you for using unsafe constructs?

> Software decoding has a safety culture problem, and we need to talk about it.

Compilers and languages have an optimization problem that we need to talk about. SIMD optimizations remain a very hard thing for compilers to get right. We should talk about what it'd take to make compilers better and the reasons for why codec devs need to drop down to asm instead of using a high level compiler.

There might not be a solution to this problem, there are reasons for it.

Dylan16807 · 2026-06-01T02:33:24 1780281204

> Because safe code isn't fast enough to decode live video.

I strongly doubt that.

And if any implementation of AV2 can be "fast enough", then there should be no question at all that we can write "fast enough" safe decoders for every other codec. Absolutely no way safe code is inherently that much slower.

cogman10 · 2026-06-01T12:13:26 1780316006

Show me the AV1, H.265, or even H.264 decoder that doesn't ultimately rely heavily on hand written assembly to achieve "fast enough".

You can doubt all you like. Ultimately, there's a reason why dav1d includes hand coded SIMD for common platforms.

It's simply impossible to get a compiler to emit something like this [1].

[1] https://github.com/videolan/dav1d/blob/master/src/x86/ipred_...

Dylan16807 · 2026-06-01T13:06:28 1780319188

Is it simply impossible to get compiled code within a factor of five? That claim needs strong evidence.

More importantly, if you can show that your assembly code isn't altering pointers it shouldn't alter, and isn't going out of bounds on its reads, you're most of the way to having assembly in your verified safe code. And rough bounds checking with padding can be as cheap as a bitmask.
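
A minimal sketch of the bitmask idea in Rust, not taken from dav1d or rav1d. It assumes the table is padded up to a power of two so that any masked index lands inside it; `TABLE_LEN`, `MASK`, and `masked_lookup` are hypothetical names:

```rust
/// Hypothetical lookup table size, padded up to a power of two.
const TABLE_LEN: usize = 256;
const MASK: usize = TABLE_LEN - 1;

/// Reads from the padded table, wrapping out-of-range indices back into it
/// instead of branching on a bounds check.
fn masked_lookup(table: &[u8; TABLE_LEN], index: usize) -> u8 {
    // `index & MASK` is always < TABLE_LEN, so the compiler can usually
    // prove the access is in bounds and drop the panic branch.
    table[index & MASK]
}
```

The catch is that a bad index now silently reads the wrong (but in-bounds) element instead of failing, so this trades a crash for wrong pixels rather than for memory corruption.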

cogman10 · 2026-06-01T13:19:51 1780319991

> Is it simply impossible to get compiled code within a factor of five? That claim needs strong evidence.

1. I didn't make that claim.

2. A negative assertion doesn't require evidence. If I say "this is impossible to do" the burden to disprove me is showing it's actually possible. You can't prove a negative. For example, if I say "the tooth fairy doesn't exist" I don't need to provide evidence of the tooth fairy's non-existence. If you disagree, you need to provide evidence to the contrary.

Dylan16807 · 2026-06-01T14:16:13 1780323373

> 1. I didn't make that claim.

Then you didn't read my previous comment correctly. AV2 must be "fast enough" if the designers aren't crazy. And AV2 is 5x slower than AV1. Therefore if compiled code is within a factor of five of hand-written assembly, it's "fast enough" for AV1, and h.264, and probably h.265 too.

You were disagreeing with my claim that other codecs could be "fast enough" with a safe compiler, right? If you weren't disagreeing, I don't know why you challenged me to show you some particular kind of code.

> 2. A negative assertion doesn't require evidence. If I say "this is impossible to do" the burden to disprove me is showing it's actually possible. You can't prove a negative. For example, if I say "the tooth fairy doesn't exist" I don't need to provide evidence of the tooth fairy's non-existence. If you disagree, you need to provide evidence to the contrary.

You're saying it's "simply impossible" for a compiler to optimize instructions to a certain level. But anything one person can code, another person can teach a compiler to do in similar situations. I don't need to show you an example, I just need to point you at the Church-Turing thesis and related documents.

imtringued · 2026-06-01T07:34:03 1780299243

What's supposed to be the big source of unsafety in codecs though? Feels like the problem here is that C developers are ruining the reputation of C with their garbage code.

Bounds checking as a source of slowdown is overrated in a niche where you're working on fixed size blocks. It feels like the C developers are getting the parts outside the ASM kernels wrong.

cogman10 · 2026-06-01T12:28:46 1780316926

> What's supposed to be the big source of unsafety in codecs though?

Hand written assembly. It's quite easy to accidentally start reading or manipulating a block of memory you didn't intend to when doing complex SIMD transformations.

> Bounds checking as a source of slowdown is overrated in a niche where you're working on fixed size blocks.

I think you don't really understand how codecs work. It is not uncommon to see a transformation like `a = b[c[i] * 3 + offset];`. There's no way for a compiler to omit the bounds check because it can't prove the contents of `c` aren't going to exceed the bounds of `b`.

This isn't a "crappy C developer" problem. This is a "There isn't a language that does a great job at capturing high level SIMD expressions" problem.
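
For illustration, here is roughly what that data-dependent lookup looks like in safe Rust. This is a sketch under assumptions: `gather`, the element types, and the `Option` return are made up for the example, not taken from any real codec. Because each index comes from the contents of `c`, the range check has to happen per element at run time:

```rust
/// Data-dependent gather: the index into `b` is computed from the contents of `c`.
/// Returns `None` instead of panicking if any computed index is out of range.
fn gather(b: &[i16], c: &[u8], offset: usize) -> Option<Vec<i16>> {
    c.iter()
        .map(|&ci| {
            // `ci * 3` cannot overflow a usize; adding `offset` might, so check it.
            (usize::from(ci) * 3)
                .checked_add(offset)
                .and_then(|idx| b.get(idx))
                .copied()
        })
        .collect()
}
```

If the values in `c` can be validated once up front (for example when a table is built), the per-element check can in principle be moved out of the hot loop, but the compiler will not do that on its own.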

dcsommer · 2026-05-02T18:58:28 1777748308

We must not continue to develop media codecs in memory unsafe languages. Small, auditable sections can opt-out perhaps, but choosing default-unsafe for this type of software is close to professional negligence.

fguerraz · 2026-05-02T19:14:43 1777749283

Cryptography and video codecs are notable exceptions. They put a lot of effort into making the code provably memory safe: no recursion, limited use of stack variables, no dynamic allocations, etc. As a result, memory safe languages bring nothing but trouble by making it non-deterministic; that’s especially true for crypto, where compiler “optimisations” guarantee you side-channel attacks.

WhatIsDukkha · 2026-05-02T20:59:56 1777755596

Thank you for mentioning this.

I wonder IFF Rust had an effects system that a Jasmin MIR transform (ie like SPIRV is for shaders) would be useful?

https://github.com/jasmin-lang/jasmin

astrange · 2026-05-02T22:43:19 1777761799

Video codecs just don't need to do dynamic allocations because it's not relevant to the problem. There's still certainly plenty of opportunities for memory bugs because there's a lot of pointer math.

dcsommer · 2026-05-03T21:53:32 1777845212

How is this POV compatible with the exploitable vulnerabilities, caused by memory-safety bugs, found in openh264, x264, dav1d, and practically every video decoder out there?

izacus · 2026-05-04T11:04:18 1777892658

Easily. It's a tradeoff.

simonask · 2026-05-02T22:51:41 1777762301

What in the world do you mean by “non-deterministic”?

C compilers, Rust compilers, and assemblers are all deterministic.

fguerraz · 2026-05-03T06:19:44 1777789184

In cryptography, you want operations to run in constant time, even if it’s wasteful, otherwise an attacker could guess information about the key or plaintext by measuring execution times.

Modern compilers are extremely clever and will produce machine code that takes full advantage of modern CPU branch predictors, and reorder instructions to better take advantage of pipelining. This in itself will make the same code run at different speeds depending on the input data.

Then there is the whole issue of compiler version roulette. As a developer you have no idea which version of compilers your users and distros will use, and what new and wonderful optimisation they will bring.
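
As a rough sketch of what "constant time" means at the source level, here is a comparison that has no early exit on the first mismatch. `ct_eq` is a hypothetical helper, and the sketch assumes the lengths are public. Nothing in the language stops an optimizer from rewriting the loop into something data-dependent, which is exactly the concern above; real code tends to rely on vetted libraries (the `subtle` crate, for example) or on assembly:

```rust
/// Compares two byte slices without returning early on the first mismatch.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // lengths are assumed to be public information
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences; the loop always visits every byte.
        diff |= x ^ y;
    }
    diff == 0
}
```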

simonask · 2026-05-03T12:32:48 1777811568

I know that, but none of that makes the compiler output non-deterministic.

Determinism does not mean “easy to predict”, it just means “predictable”.

adgjlsfhk1 · 2026-05-03T02:57:45 1777777065

> C compilers, Rust compilers, and assemblers are all deterministic.

Within a version, yes, but not cross version. Different versions of GCC/Clang etc can give you completely different code.

kllrnohj · 2026-05-02T20:39:43 1777754383

For the codec itself, the majority of it is performance sensitive and often has a significant amount of assembly even, so a memory safe language doesn't change much.

However, for the container/extractor... those should absolutely be in a memory safe language, and those are where a lot of the exploits/crashes are, too, as metadata is more fuzzy.

As a practical example of this, see something like CrabbyAVIF. All the parser code is Rust, but it delegates to dav1d for the actual codec portion.

fishgoesblub · 2026-05-02T19:28:37 1777750117

Of the 3 software AV1 encoders, the only one that is fully dead is the Rust encoder (rav1e). If people truly wanted memory safe encoders/decoders, they would fund and develop them.

dataking · 2026-05-03T09:45:41 1777801541

https://github.com/memorysafety/rav1d got funded and developed. it is unfortunately a bit slower (typically by a single-digit percentage) than dav1d.

Sesse__ · 2026-05-03T10:47:21 1777805241

I can totally understand why people would want a memory-safe decoder, but a memory-safe encoder is niche. Finding a memory-safety bug in a decoder is a matter of finding a single unchecked integer field somewhere; finding a memory-safety bug in an encoder requires first finding some sort of logic bug in the encoder and then crafting an adversarial input that survives a number of highly lossy transformations.

Compare the number of CVEs against x264 (included decoders don't count!) and FFmpeg's H.264 decoder.

vlovich123 · 2026-05-02T20:08:54 1777752534

Fully dead in what sense? Seems like it still has active development to me.

fishgoesblub · 2026-05-02T20:11:27 1777752687

It hasn't had any proper quality/speed improvements in years. Only thing that has changed is updating deps and some bug fixes.

simonask · 2026-05-02T22:53:48 1777762428

Encoding is a way, way less risky thing to be doing compared to decoding.

snvzz · 2026-05-03T09:48:19 1777801699

There are many paths to memory safety, even if the one Rust project seems to be going nowhere.

There's other memory-safe languages, and there's formal verification.

e.g. seL4 favors pancake.

esseph · 2026-05-02T19:31:20 1777750280

> If people truly wanted memory safe encoders/decoders

Really? How many codecs have your neighbors contributed money for the development of, just curious.

computerbuster · 2026-05-02T20:46:12 1777754772

I think these conversations are directed by the parties funding the efforts. Example: "we (large company) want a fast AV2 decoder" -> they pay a specialized team to do it -> this team works in C for the most part, so it is done in C. If there were financial incentives to do it in Rust, they'd pay more for a Rust decoder.

esseph · 2026-05-03T15:34:44 1777822484

I'm more interested in the idea of general "people" (the commons) funding complex video encoders. I do wish that was the world we lived in, however :)

Telaneo · 2026-05-02T19:44:41 1777751081

Given Netflix's involvement with SVT-AV1, (not even that) indirectly, at least 1.

izacus · 2026-05-03T17:35:02 1777829702

Are you part of any codec development team to use "we" here?

maxloh · 2026-05-02T20:51:23 1777755083

Decoders written in Rust will be a lot slower than the equivalents in assembly.

dcsommer · 2026-02-09T05:11:18 1770613878

I think GP is simply identifying a potential popular niche that could be satisfied in a future city builder game, which seems quite on topic.

chongli · 2026-02-09T23:14:23 1770678863

Yeah, I don't want my preferred playstyle to be favoured, just treated fairly:

* Add all of the non-car transport options: walking paths (including underground and raised paths for walking between large buildings in the winter, a la PATH in Toronto), bike paths, buses, streetcars, light rail, subways, inter-city trains, high speed rail, ferries

* Add parking lots as a feature to all commercial and residential construction, require every car to be parked somewhere when not in use, but allow residents/property owners to decide whether to build parking or not

* Allow land-value taxes as an alternative to property taxes, as well as the possibility for things like street parking and pollution ordinances to give you levers to incentivize/disincentivize the construction of parking lots

* Simulate emissions appropriately from all transport methods

* Parking lots and heavy traffic should lower property values, as citizens complain about the ugliness, pollution, noise, and danger of excessive traffic

There's a whole conversation to be had about the design of games like SimCity and how it affects future urban planners, but that may be going too far afield. Still, I think it would be nice to have a game that doesn't reward car-centric planning while burying the drawbacks.

dcsommer · 2026-01-28T18:06:49 1769623609

Just for reference, Wamedia ships on the major Meta apps and on iOS, Android, Desktop, and Web platforms.

dcsommer · 2026-01-28T17:43:07 1769622187

We invested a lot into build system optimizations to bring this number down over time, although we did accept on the order of 200 KiB size overhead initially for the stdlib. We initially launched using a Gradle + CMake + Cargo with static linking of the stdlib and some basic linker optimizations. Transitioning WhatsApp Android to Buck2 has helped tremendously to bring the size down, for instance by improving LTO and getting the latest clang toolchain optimizations. Buck2 also hugely improved build times.

palata · 2026-01-28T22:49:56 1769640596

Thanks!

dcsommer · 2026-01-22T04:31:00 1769056260

Great reading to see beyond the clichéd, sanitized retellings of that era. It really makes you consider the prices paid for what some call progress.

hatmanstack · 2026-01-22T04:51:26 1769057486

That's the first piece by Didion I've read, after her death, I'd always meant to read her more. The mask-less account was refreshing, the only counter-weight to flower-power I knew about was Altamont, was getting heavy Hunter S. Thompson vibes.

dcsommer · 2025-11-16T15:35:45 1763307345

Great work by the MS team. It is great progress to shift OOB access into a controlled crash. These kinds of panic bugs are then easy to remediate, with clear stack traces, as we see in the turnaround time from the report.

ekidd · 2025-11-16T18:28:02 1763317682

This is my experience as well: Writing parsers for complex file formats in Rust often leaves a few edge cases which might cause controlled panics. But controlled panics are essentially denial of service attacks. And panics have good logging, making them easy to debug. Plus, you can fuzz for them at scale easily, using tools like "cargo fuzz".

This is a substantial improvement over the status quo.

Tools like WUFFS may be more appropriate for low level parsing logic when you're not willing to risk controlled panics, however.
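
To make the "controlled panic" point concrete, here is a small sketch with a made-up record layout (one length byte followed by that many payload bytes). Both function names are hypothetical. The indexing version panics cleanly on a lying length byte instead of reading out of bounds; the checked version turns the same edge case into a value the caller can handle:

```rust
/// Indexing version: a truncated or empty input triggers a controlled panic.
fn payload_indexing(input: &[u8]) -> &[u8] {
    let len = usize::from(input[0]);
    &input[1..1 + len]
}

/// Checked version: the same malformed input yields `None` instead of a panic.
fn payload_checked(input: &[u8]) -> Option<&[u8]> {
    let (&len, rest) = input.split_first()?;
    rest.get(..usize::from(len))
}
```

A fuzzer run against the first version would report the panic as a crash with a backtrace, which is what makes this class of bug quick to find and fix.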

tialaramex · 2025-11-16T16:02:21 1763308941

That's true, but really this kind of problem screams out for the approach taken in WUFFS. Have the programmer who is Wrangling Untrusted File Formats prove that what they wrote is correct as part of that exercise.

dcsommer · 2025-11-01T16:08:26 1762013306

> basically fine

How many type confusion 0 days and memory safety issues have we had in dynamic language engines again? I've really lost count.

gpm · 2025-11-01T16:22:03 1762014123

Are you counting ones that involve running malicious code in a sandbox and not just trusted code on untrusted input? Because then I'd agree, but that's a much harder and different problem.

My impression is that for the trusted code untrusted input case it hasn't been that many, but I could be wrong.

bigyabai · 2025-11-01T17:44:41 1762019081

It depends, what language was the sandbox written in?

gpm · 2025-11-01T17:47:53 1762019273

Sandboxes are difficult independent of language, see all the recent speculation vulnerabilities for instance. Sure, worse languages make it even harder, but I think we're straying from the original topic of "python/ruby" by considering sandboxes at all.

zahlman · 2025-11-02T01:59:27 1762048767

How many ways to cause a segmentation fault in CPython, that don't start with deliberate corruption of the bytecode, are you aware of?

How is "type confusion" a security issue?

dcsommer · 2025-10-04T01:38:29 1759541909

Is there a straightforward path to building Zig with polyglot build systems like Bazel and Buck2? I'm worried Zig's reliance on Turing complete build scripts will make building (and caching) such code difficult in those deterministic systems. In Rust, libraries that eschew build.rs are far preferable for this reason. Do Zig libraries typically have a lot of custom build setup?

rockwotj · 2025-10-04T02:20:57 1759544457

For bazel:

https://github.com/aherrmann/rules_zig

Real-world projects like ZML use it:

https://github.com/zml/zml

esjeon · 2025-10-04T14:50:12 1759589412

FYI, build scripts are completely optional. Zig can build and run individual source code files regardless of build scripts (`build.zig`). You may need to decipher the build script to extract flags, but that's pretty much it. You can integrate Zig into any workflow that accepts GCC and Clang. (Note: `zig` is also a drop-in replacement C compiler[1])

[1]: https://andrewkelley.me/post/zig-cc-powerful-drop-in-replace...