Performance should not be priority #1. Security should be. Why do we slow down all CPUs to prevent SPECTRE attacks yet continue to write in C? As rav1d shows, migrating from C to Rust costs far less performance than applying SPECTRE mitigations does, and adding a sandbox around a memory-unsafe codec is going to be way more expensive again than starting with Rust code.
> Performance should not be priority #1. Security should be.
For a web browser, or a server in a bank, sure. For anything else, questionable.
> adding a sandbox around a memory-unsafe codec is going to be way more expensive
In the modern world, the overhead of strong sandboxes is surprisingly small. The nuclear but most reliable option is a hardware-assisted VM. On modern computers with SLAT and virtualized I/O, the overhead for most use cases is negligible. If you want something lighter weight, you can use the multi-user nature of all modern OS kernels and isolate the codec into a separate process with restricted permissions. The sandboxing overhead is then approximately zero.
rav1d is not a full rewrite of dav1d in Rust, so it really doesn't show that. It's currently C + Rust + asm.
I don't think we can say anything about what this does or does not prove about the performance of safe code.
> Performance should not be priority #1. Security should be.
Entirely depends on the application. The reason Rust has `unsafe` is that there are some situations where performance needs to preempt potential security problems.
Codecs are difficult and expensive to develop. Therefore they get reused in many contexts, including security-critical ones. Sandboxing has been shown over and over not to be a great security solution, so what this means in practice is that security-critical software that needs software decoding gets pwned because software engineers don't care to prioritize safety in the first place.
Why shouldn't safety be the default? If you really want to, it wouldn't be too hard to maintain a patch on top of rustc to drop the bounds checks if you want to compile object files without them.
Software decoding has a safety culture problem, and we need to talk about it.
Because safe code isn't fast enough to decode live video.
> If you really want to, it wouldn't be too hard to maintain a patch on top of rustc to drop the bounds checks if you want to compile object files without them.
Yeah, but then you are undermining safety in a critical way that does lead to security vulnerabilities (buffer overflows). And you are also now maintaining a custom version of rustc and requiring other devs on the project to use it. That's certainly part of the reason it simply hasn't happened.
But another major part of it is that encoders end up with a lot of custom ASM regardless. That custom ASM is going to be where vulnerabilities end up. You don't really escape that by using Rust.
If you are already abandoning where you critically need safety the most for performance, then why pick a language that additionally penalizes you for using unsafe constructs?
> Software decoding has a safety culture problem, and we need to talk about it.
Compilers and languages have an optimization problem that we need to talk about. SIMD optimizations remain a very hard thing for compilers to get right. We should talk about what it'd take to make compilers better and why codec devs need to drop down to asm instead of relying on a high-level compiler.
There might not be a solution to this problem; there are reasons for it.
> Because safe code isn't fast enough to decode live video.
I strongly doubt that.
And if any implementation of AV2 can be "fast enough", then there should be no question at all that we can write "fast enough" safe decoders for every other codec. Absolutely no way safe code is inherently that much slower.
Is it simply impossible to get compiled code within a factor of five? That claim needs strong evidence.
More importantly, if you can show that your assembly code isn't altering pointers it shouldn't alter, and isn't going out of bounds on its reads, you're most of the way to having assembly in your verified safe code. And rough bounds checking with padding can be as cheap as a bitmask.
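To make the "as cheap as a bitmask" point concrete, here is a minimal sketch in Rust. The table size and the names `TABLE_LEN`, `MASK`, and `masked_lookup` are made up for illustration, and the approach assumes you can afford to pad the table to a power-of-two length. Note the trade-off: an out-of-range index reads the wrong (but in-bounds) element instead of failing, so this gives memory safety, not correctness.

```rust
/// Hypothetical lookup table, padded to a power-of-two length.
pub const TABLE_LEN: usize = 256;
/// All-ones mask covering exactly the valid index range 0..TABLE_LEN.
pub const MASK: usize = TABLE_LEN - 1;

/// Reads one table entry, forcing the index into range with a single AND
/// instead of a compare-and-branch.
pub fn masked_lookup(table: &[u8; TABLE_LEN], idx: usize) -> u8 {
    // `idx & MASK` is always < TABLE_LEN, so the optimizer can usually
    // drop the bounds check on this access (not guaranteed by the language).
    table[idx & MASK]
}
```

An index of `5 + 256` wraps around and reads entry 5, so a corrupted index produces garbage pixels rather than an out-of-bounds read.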
> Is it simply impossible to get compiled code within a factor of five? That claim needs strong evidence.
1. I didn't make that claim.
2. A negative assertion doesn't require evidence. If I say "this is impossible to do" the burden to disprove me is showing it's actually possible. You can't prove a negative. For example, if I say "the tooth fairy doesn't exist" I don't need to provide evidence of the tooth fairy's non-existence. If you disagree, you need to provide evidence to the contrary.
Then you didn't read my previous comment correctly. AV2 must be "fast enough" if the designers aren't crazy. And AV2 is 5x slower than AV1. Therefore if compiled code is within a factor of five of hand-written assembly, it's "fast enough" for AV1, and h.264, and probably h.265 too.
You were disagreeing with my claim that other codecs could be "fast enough" with a safe compiler, right? If you weren't disagreeing, I don't know why you challenged me to show you some particular kind of code.
> 2. A negative assertion doesn't require evidence. If I say "this is impossible to do" the burden to disprove me is showing it's actually possible. You can't prove a negative. For example, if I say "the tooth fairy doesn't exist" I don't need to provide evidence of the tooth fairy's non-existence. If you disagree, you need to provide evidence to the contrary.
You're saying it's "simply impossible" for a compiler to optimize instructions to a certain level. But anything one person can code, another person can teach a compiler to do in similar situations. I don't need to show you an example, I just need to point you at the Church-Turing thesis and related documents.
What's supposed to be the big source of unsafety in codecs though? Feels like the problem here is that C developers are ruining the reputation of C with their garbage code.
Bounds checking as a source of slowdown is overrated in a niche where you're working on fixed size blocks. It feels like the C developers are getting the parts outside the ASM kernels wrong.
> What's supposed to be the big source of unsafety in codecs though?
Hand written assembly. It's quite easy to accidentally start reading or manipulating a block of memory you didn't intend to when doing complex SIMD transformations.
> Bounds checking as a source of slowdown is overrated in a niche where you're working on fixed size blocks.
I think you don't really understand how codecs work. It is not uncommon to see a transformation like `a = b[c[i] * 3 + offset];`. There's no way for a compiler to omit the bounds check because it can't prove that the indices derived from the contents of `c` aren't going to exceed the bounds of `b`.
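For reference, this is roughly what that data-dependent gather looks like in safe Rust. It is only a sketch: the function name `gather` and the element types (`i16` samples, `u8` indices) are assumptions picked for illustration, not taken from any real codec. Every access goes through `slice::get`, so the check the compiler cannot remove is visible in the code.

```rust
/// Gathers `b[c[i] * 3 + offset]` for every entry of `c`.
/// Returns `None` as soon as any computed index falls outside `b`.
pub fn gather(b: &[i16], c: &[u8], offset: usize) -> Option<Vec<i16>> {
    c.iter()
        .map(|&ci| {
            // u8 * 3 cannot overflow usize; adding `offset` could, so check it.
            let idx = (usize::from(ci) * 3).checked_add(offset)?;
            // The index depends on runtime data, so this check stays in the loop.
            b.get(idx).copied()
        })
        .collect()
}
```

With a 20-element `b`, index entries `[0, 1, 2]` and `offset = 1` read positions 1, 4, and 7, while an index entry of 7 with `offset = 0` computes position 21 and makes the whole call return `None`.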
This isn't a "crappy C developer" problem. This is a "There isn't a language that does a great job at capturing high level SIMD expressions" problem.
We must not continue to develop media codecs in memory unsafe languages. Small, auditable sections can opt-out perhaps, but choosing default-unsafe for this type of software is close to professional negligence.
Cryptography and video codecs are notable exceptions. They put a lot of effort into making the code provably memory safe: no recursion, limited use of stack variables, no dynamic allocations, etc. As a result, memory-safe languages bring nothing but trouble by making the generated code non-deterministic; that’s especially true for crypto, where compiler “optimisations” guarantee you side-channel attacks.
Video codecs just don't need to do dynamic allocations because it's not relevant to the problem. There are still certainly plenty of opportunities for memory bugs because there's a lot of pointer math.
How is this POV compatible with the exploitable vulnerabilities, caused by memory-safety bugs, found in openh264, x264, dav1d, and practically every video decoder out there?
In cryptography, you want operations to run in constant time, even if it’s wasteful, otherwise an attacker could guess information about the key or plaintext by measuring execution times.
Modern compilers are extremely clever and will produce machine code that takes full advantage of modern CPU branch predictors, and reorder instructions to better take advantage of pipelining. This in itself will make the same code run at different speeds depending on the input data.
Then there is the whole issue of compiler version roulette. As a developer you have no idea which version of compilers your users and distros will use, and what new and wonderful optimisation they will bring.
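As an illustration of the constant-time concern above, here is a minimal sketch of a comparison written without an early exit. The name `ct_eq` is made up for this example, and the lengths are assumed to be public. As the comment above points out, nothing in the language guarantees the compiler keeps it branch-free, so treat it as the shape of the idea rather than something to ship; vetted crates such as `subtle` exist for real use.

```rust
/// Compares two byte slices without returning at the first mismatch.
/// The lengths are assumed to be public information.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate all differences so the loop always runs to the end,
    // regardless of where (or whether) the inputs differ.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}
```

A naive `a == b` may stop at the first differing byte, which is exactly the data-dependent timing an attacker can measure.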
For the codec itself, the majority of it is performance sensitive and often has a significant amount of assembly even, so a memory safe language doesn't change much.
However, for the container/extractor... those should absolutely be in a memory-safe language, and those are where a lot of the exploits/crashes are, too, as metadata is more fuzzy.
As a practical example of this, see something like CrabbyAVIF. All the parser code is Rust, but it delegates to dav1d for the actual codec portion.
Of the 3 software AV1 encoders, the only one that is fully dead is the Rust encoder (rav1e). If people truly wanted memory safe encoders/decoders, they would fund and develop them.
I can totally understand why people would want a memory-safe decoder, but a memory-safe encoder is niche. Finding a memory-safety bug in a decoder is a matter of finding a single unchecked integer field somewhere; finding a memory-safety bug in an encoder requires first finding some sort of logic bug in the encoder and then crafting an adversarial input that survives a number of highly lossy transformations.
Compare the number of CVEs against x264 (included decoders don't count!) and FFmpeg's H.264 decoder.
I think these conversations are directed by the parties funding the efforts. Example: "we (large company) want a fast AV2 decoder" -> they pay a specialized team to do it -> this team works in C for the most part, so it is done in C. If there were financial incentives to do it in Rust, they'd pay more for a Rust decoder.
Yeah, I don't want my preferred playstyle to be favoured, just treated fairly:
* Add all of the non-car transport options: walking paths (including underground and raised paths for walking between large buildings in the winter, a la PATH in Toronto), bike paths, buses, streetcars, light rail, subways, inter-city trains, high speed rail, ferries
* Add parking lots as a feature to all commercial and residential construction, require every car to be parked somewhere when not in use, but allow residents/property owners to decide whether to build parking or not
* Allow land-value taxes as an alternative to property taxes, as well as the possibility for things like street parking and pollution ordinances to give you levers to incentivize/disincentivize the construction of parking lots
* Simulate emissions appropriately from all transport methods
* Parking lots and heavy traffic should lower property values, as citizens complain about the ugliness, pollution, noise, and danger of excessive traffic
There's a whole conversation to be had about the design of games like SimCity and how it affects future urban planners, but that may be going too far afield. Still, I think it would be nice to have a game that doesn't reward car-centric planning while burying the drawbacks.
We invested a lot into build system optimizations to bring this number down over time, although we did accept on the order of 200 KiB size overhead initially for the stdlib. We initially launched using Gradle + CMake + Cargo with static linking of the stdlib and some basic linker optimizations. Transitioning WhatsApp Android to Buck2 has helped tremendously to bring the size down, for instance by improving LTO and getting the latest clang toolchain optimizations. Buck2 also hugely improved build times.
That's the first piece by Didion I've read; after her death, I'd always meant to read her more. The mask-less account was refreshing. The only counterweight to flower power I knew about was Altamont, and I was getting heavy Hunter S. Thompson vibes.
Great work by the MS team. It is great progress to shift OOB access into a controlled crash. These kinds of panic bugs are then easy to remediate, with clear stack traces, as we see in the turn around time from the report.
This is my experience as well: Writing parsers for complex file formats in Rust often leaves a few edge cases which might cause controlled panics. But controlled panics are essentially denial of service attacks. And panics have good logging, making them easy to debug. Plus, you can fuzz for them at scale easily, using tools like "cargo fuzz".
This is a substantial improvement over the status quo.
Tools like WUFFS may be more appropriate for low level parsing logic when you're not willing to risk controlled panics, however.
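A small sketch of the difference between a controlled panic and an explicit error, using a made-up chunk format (a 2-byte big-endian length followed by that many payload bytes). The names `ParseError` and `parse_chunk` are hypothetical. Writing `&rest[..len]` directly would panic on a lying length field; going through `slice::get` turns the same condition into an `Err` the caller can handle.

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The input ended before the declared length was satisfied.
    Truncated,
}

/// Parses one hypothetical chunk and returns `(payload, remaining_input)`.
pub fn parse_chunk(input: &[u8]) -> Result<(&[u8], &[u8]), ParseError> {
    // Checked slicing: a short header becomes an error, not a panic.
    let header = input.get(..2).ok_or(ParseError::Truncated)?;
    let len = usize::from(u16::from_be_bytes([header[0], header[1]]));
    let rest = &input[2..]; // cannot panic: the input has at least 2 bytes here
    // The length field is attacker-controlled, so it is checked as well.
    let payload = rest.get(..len).ok_or(ParseError::Truncated)?;
    Ok((payload, &rest[len..]))
}
```

A function with this shape (bytes in, `Result` out) is also a convenient fuzz target, since any panic the fuzzer finds is by construction a bug.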
That's true, but really this kind of problem screams out for the approach taken in WUFFS. Have the programmer who is Wrangling Untrusted File Formats prove that what they wrote is correct as part of that exercise.
Are you counting ones that involve running malicious code in a sandbox and not just trusted code on untrusted input? Because then I'd agree, but that's a much harder and different problem.
My impression is that for the trusted code untrusted input case it hasn't been that many, but I could be wrong.
Sandboxes are difficult independent of language, see all the recent speculation vulnerabilities for instance. Sure, worse languages make it even harder, but I think we're straying from the original topic of "python/ruby" by considering sandboxes at all.
Is there a straightforward path to building Zig with polyglot build systems like Bazel and Buck2? I'm worried Zig's reliance on Turing complete build scripts will make building (and caching) such code difficult in those deterministic systems. In Rust, libraries that eschew build.rs are far preferable for this reason. Do Zig libraries typically have a lot of custom build setup?
FYI, build scripts are completely optional. Zig can build and run individual source code files regardless of build scripts (`build.zig`). You may need to decipher the build script to extract flags, but that's pretty much it. You can integrate Zig into any workflow that accepts GCC and Clang. (Note: `zig` is also a drop-in replacement C compiler[1])