zadikian's comments

This is exciting. INSERT (SELECT ...) doesn't work though, right? The docs only mention VALUES inserts.

Not yet, but actively working on this as we speak.

The "transaction safety" part is confusing. What they mean is if you use SERIALIZABLE, transactions need to be retried, so your code inside the xact should be idempotent. I guess this is safer in Haskell because there are no variables, but that doesn't stop your code from having other side effects.
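A minimal sketch of the retry pattern being described, in Python rather than Haskell. `SerializationFailure` and `run_with_retry` are hypothetical names, not any driver's API; a real driver would raise its own error for SQLSTATE 40001. The fake transaction body shows the catch: anything it does outside the database happens once per attempt.

```python
class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's serialization error (SQLSTATE 40001)."""


def run_with_retry(txn_body, max_attempts=5):
    """Call txn_body until it succeeds or max_attempts is reached.

    txn_body may run several times, so side effects outside the
    transaction (emails, HTTP calls, in-memory state) get repeated.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller


# Demo: a fake transaction body that conflicts twice, then commits.
calls = []


def fake_transfer():
    calls.append("ran")  # side effect that escapes the transaction
    if len(calls) < 3:
        raise SerializationFailure()
    return "committed"


result = run_with_retry(fake_transfer)
```

Here `calls` ends up with three entries even though only one attempt committed, which is the non-idempotent side effect problem in miniature.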

> TBH I think I've seen more database use than not specifically because it serves as the central race-resolver in a system

But you usually don't need serializable for this, cause READ COMMITTED locks the rows during updates.


Cause I want relations and SQL. But also I kinda get what you mean and would not use serializable in Postgres.

In my experience, the performance hit is so bad that it's not feasible to use that way. It's also not strictly safer behavior-wise because retries can trip people up.

Tbh I always forget the specifics soon after reading them. Basically you can do an atomic UPDATE WHERE if there are no subqueries involved. 90% of the time that's good enough, and for anything else I end up refreshing on features like SELECT FOR UPDATE.

Well also I know Postgres UNIQUE indexes provide additional locking. Like you can do an INSERT... WHERE NOT EXISTS or INSERT... ON CONFLICT that is guaranteed to succeed.


> Well also I know Postgres UNIQUE indexes provide additional locking. Like you can do an INSERT... WHERE NOT EXISTS or INSERT... ON CONFLICT that is guaranteed to succeed.

That's true only for the latter (and even then only at an isolation level that's not too strict).


Oh I misremembered, yeah just tested and the second INSERT errors.
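For reference, a small sketch of the ON CONFLICT form. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere (assuming the bundled SQLite is 3.24 or newer, which added this syntax). The table and the `insert_if_absent` helper are made up for illustration, and a single connection only shows the syntax and the no-error outcome, not Postgres's behavior under concurrent writers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT NOT NULL)")


def insert_if_absent(conn, email, name):
    """Insert the row unless the unique key already exists; never raises on a duplicate."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO users (email, name) VALUES (?, ?) "
            "ON CONFLICT (email) DO NOTHING",
            (email, name),
        )


insert_if_absent(conn, "a@example.com", "first")
insert_if_absent(conn, "a@example.com", "second")  # duplicate key: silently skipped
rows = conn.execute("SELECT email, name FROM users").fetchall()
```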

Was curious about the Flexcoin hack, but the article wasn't loading, so here's an archive: https://web.archive.org/web/20240423000007/https://hackingdi... Supposedly it was this simple:

  mybalance = database.read("account-number")
  newbalance = mybalance - amount
  database.write("account-number", newbalance)
  dispense_cash(amount)   // or send bitcoins to customer
and MongoDB didn't even have a way to do this atomically? An RDBMS with read-committed would handle this fine if you did "read for update" on that row.
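A rough sketch of one atomic version. This is the single conditional UPDATE variant, not SELECT FOR UPDATE, and it uses Python's built-in sqlite3 purely so the example is runnable; the `accounts` table and `withdraw` helper are assumptions, not anything from the article. The balance check and the debit happen in one statement, so there is no window between read and write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('account-number', 100)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Debit the account only if it has enough funds; return True if debited."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
        # rowcount is 1 if the WHERE clause matched, 0 if funds were short
        return cur.rowcount == 1


first = withdraw(conn, "account-number", 80)   # enough funds
second = withdraw(conn, "account-number", 80)  # would overdraw, so no row matches
balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 'account-number'"
).fetchone()[0]
```

Only dispense cash when `withdraw` returns True.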

This is sorta how I've felt working the past ~7 years.

Simple example, we've been striving for 90% unit test coverage and thorough code review when there's 0% integration test coverage. I blame the metrics only looking at unit tests, but also many people think unit tests should come first. I would prioritize integration. There are some small pieces that need to work reliably, but if your system relies so hard on all of them working right, it's a bad system. That, and too many things will work in pieces but not overall.

Broadly I'm gonna assume that the team will later hire solid SWEs who don't necessarily know how our stuff works, and aren't going to read 100 docs about it. If this is a backend+DB combo, get your DB right and there won't be too many wrong ways to code against it in the future, get it wrong and it becomes a black hole for SWE-time. Or if someone on their first day can't run a system locally for debugging, no matter how elegant the code is, don't count on that system getting fixed quickly during an outage.


The home solution is supposed to be mDNS. I just checked right now, my mDNS isn't working on my LAN, idk why.


I'm fine with that. The part that makes it not really an abstraction is, you still deliver code in the end. It'd be different if your deliverable were prompt+conversation, and the code were merely an intermediate build artifact. Usually people throw away the convo. Some have tried making markdown files the deliverable instead, so far that doesn't really work.

It makes even less sense when people compare an LLM to a compiler. Imagine making a pull request that's just adding a binary because you threw the source code away.


The whole field of reproducible builds is only a field because compilers also have had trouble historically producing binary artifacts with guaranteed provenance and binary compatibility, even when built from the same source code.

If I assign a bug fix ticket to a human developer on my team, I won't be able to precisely replicate how they go about solving the bug but for many bugs I can at least be assured that the bug will get solved, and that I understand the basic approach the assigned dev would use to troubleshoot and resolve the ticket.

This is an organizational abstraction but it's an abstraction just the same, leaky as it is.


> The whole field of reproducible builds is only a field because compilers also have had trouble

No, this is not comparable. The reason reproducible builds are tricky is not because compilers are inherently prone to randomness, it's because binaries often bake-in things like timestamps and the exact pathnames of the system used to produce the build. People need to stop comparing LLMs to compilers, it's an embarrassingly poor analogy.


> The reason reproducible builds are tricky is not because compilers are inherently prone to randomness

And neither are LLMs. Having their output employ randomness by default is a choice, not a requirement, just like things like embedding timestamps into builds is a choice that can be unwound if you want the build to be reproducible.

> People need to stop comparing LLMs to compilers, it's an embarrassingly poor analogy.

They are certainly different things, but if you are going to criticize LLMs it would be better if you understood them.


> Having their output employ randomness by default is a choice, not a requirement

This is not really meaningfully true. E.g. batching, heterogeneous inference HW, and even differences in model versions can make a difference in what result you get, and these are hard to solve.


But again, these are all things that are also true of build systems.

GCC 16.1 vs. 15.2 will get you differences. GNU ld vs. gold vs. mold vs. lld will get you differences. Whether or not you employ LTO or other whole-program optimization gets you differences.

Have you never debugged a race condition that worked on your machine but didn't work in prod, based only on how things ended up compiled in the final binary?

I'm not saying these are identical but there's a lot more similarity than you all seem to understand. And we've made compilers work well enough that a lot of you believe that they give very routine, very deterministic outputs as part of broader build systems even though nothing could be further from the truth, even today.


Are you arguing that the output of an LLM isn’t random?


It is random if you select it to be (temperature != 0, etc.).

It is not random if you don't use random sampling in its output generation.

If the whole thing were actually stochastic then prompt caching would be impossible because having a cache of what the previous tokens transformed into to speed up future generation would be invalidated by the missing random state.

Look at llama.cpp, you can see what samplers are adjustable and if you use samplers that employ randomness you can see what settings disable the random sampling. Or you can employ randomness but fix the seed to get reproducible results.
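A toy sketch of the sampler step only, in plain Python (this is not llama.cpp's code, and the logits are invented). It assumes the logits themselves are identical from run to run, which is exactly what other replies in this thread dispute for GPU inference. Under that assumption, greedy selection has no randomness at all, and temperature sampling with a fixed seed repeats exactly.

```python
import math
import random


def softmax(logits, temperature):
    """Turn logits into probabilities, scaled by temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Pick the highest-logit token; no random state involved."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits, temperature, rng):
    """Draw a token index from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.5, 0.3, -1.0]  # made-up next-token scores

greedy_choice = greedy(logits)

rng_a = random.Random(42)
run_a = [sample(logits, 0.8, rng_a) for _ in range(20)]
rng_b = random.Random(42)  # same seed, fresh generator
run_b = [sample(logits, 0.8, rng_b) for _ in range(20)]
```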


Yes, it can still be random with temperature set to 0. It'll only be the same if you run it on exactly the same hardware every single time.


An LLM is a set of structured matrix multiplies and function applications. The only potentially non-deterministic step is selecting the next token from the final output and that can be done deterministically.


Matrix multiplication on GPUs is non-deterministic. As are things like cumsum()

https://docs.pytorch.org/docs/2.11/generated/torch.use_deter...

This comes down to map reduce and floating point's lack of associativity. You see the same thing with OpenMP on CPUs.
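A quick CPU-only illustration of the associativity point, in plain Python. `reduce_in_order` is just a hypothetical helper that adds left to right; the second half mimics a parallel reduction that splits the same four values into two chunks.

```python
# Same three numbers, different grouping, different result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)


def reduce_in_order(values):
    """Add values strictly left to right, like a single-threaded loop."""
    total = 0.0
    for v in values:
        total += v
    return total


# One worker adds everything in order: the first 1.0 is lost because
# 1e16 + 1.0 rounds back to 1e16.
sequential = reduce_in_order([1e16, 1.0, -1e16, 1.0])

# Two workers each reduce a chunk of the same values, then the partial
# sums are combined: the large terms cancel first and both 1.0s survive.
chunked = reduce_in_order([1e16, -1e16]) + reduce_in_order([1.0, 1.0])
```

`left` and `right` differ in the last bit, while `sequential` and `chunked` come out as 1.0 and 2.0. How a GPU or OpenMP runtime chunks the work can change between runs, so the grouping does too.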

People are constantly claiming determinism in LLMs that is just not there.


Even if it were reproducible, realistically most people are using some service like Claude that makes no guarantee that the model or hardware didn't change. Which is fine, it doesn't need reproducibility.

This is interesting though, I didn't know PyTorch had a debug mode for reproducibility.


Even with this debug mode, a different batch size can give different results for the same input - e.g. your tensor multiplies might use different blocking, hence different associativity.

I posted that to show that at a bare minimum, there is some pretty extreme nondeterminism (though probably mild in effect) in even the most pedestrian GPU inference, unless you go to the extreme of using the debug mode and taking the potential performance hit.


well just run all inference on the cpu, single threaded /s


random isnt the right term.

ill conditioned or unstable is better

a small change in the input can create a large difference in the output.


> And neither are LLMs.

This is not my claim, you're veering wildly off course here. I'm merely responding to the common, tiresome and, to be frank, stupid analogy of LLMs to compilers.


It's an abstraction for you, not the rest of that developer's team, who have to reproduce the same solution even after said developer has "won the lottery", so-to-speak.

inb4: "Don't worry, just use GPT to make the docs"


If you throw away the code then yeah, but I've never seen anyone do this.


but even if it didn't it still provided a binary that is mathematically proven (assuming no compiler bugs, which if found are fully fixable, unlike LLMs) to correspond to the code you wrote.


This is a great point. We’re very much in a transitional phase on this, but I personally do see signs in my own work with agents that we are heading toward the main deliverable being a readme/docs.

The code is still important, but I could see it becoming something that humans rarely engage with.

