
grayhatter · 2026-06-26T05:04:12 1782450252

Neither matters. The maintainer isn't required to change how they want to protect their repo, or to lower the amount of value they perceive it to have. They can reject any patch for any reason, or no reason at all. No one is owed any control over their repo. If you want your code in the repo, you fork the repo.

The hurdle you have to clear as the devil's advocate is what rules the maintainer is allowed to enforce about their repo, and what the limits are. The maintainer doesn't want to introduce LLM-generated code. Until you solve for that, nothing else matters.

uHuge · 2026-06-26T05:19:34 1782451174

Such an ultra-liberal approach seems to project an unfriendly collaboration culture..?

It seems to incentivize the contributor to try proposing their patch multiple times with varied introductory words, just to test whether the maintainer is having a good enough day to accept the improved code.

geocar · 2026-06-26T05:33:42 1782452022

No.

The “contributor” doesn’t have the ability to contribute; they do not have copyright over the code, so they can’t assign it.

The maintainer should not read it because then they could be tainted and perform accidental copyright infringement in the future.

grayhatter · 2026-06-26T05:23:43 1782451423

> It seems to incentivize the contributor to try [to do something unethical]

I find the argument that enforcing rules to protect the things you want to protect is the thing that incentivizes someone else to lie, stupid.

If I tell you no, I'm not encouraging you to try to figure out how to get around what I want. Why do you feel saying no counts as encouragement to ignore or evade it?

grayhatter · 2026-06-26T04:48:47 1782449327

I have an exceptionally strong, visceral, negative reaction to people who aren't offended by the arguments the author makes in this post.

Your patch was rejected because the maintainer objects to the source and tooling used to generate the patch. Whether you agree with the maintainer's opinions, or object because you wanna do it your way, does not matter.

Honesty didn't get your patch rejected; root cause analysis shows the origin of the rejection was that the patch was LLM generated. If the author had decided to lie, but the maintainer still knew it was LLM generated, it would still have been rejected. Honesty isn't implicated at all, and framing it as such is also dishonest.

The title of the post can only exist if the author would gladly lie to get what he wanted ignoring the others involved in the process. That behavior is extremely disgusting.

> I don't care about what you want, so I'll gladly lie to you about my submission so that I get what I want... what you care about, and what you want don't matter!

-- Przemysław Alexander Kamiński, presumably?

I'm embarrassed by proxy that the author^ was willing to write this, and then publish it on the internet, because this kinda behavior makes all of us working in and around software look bad. Please, adopt some personal ethics that include consideration and respect for others, and expend even a basic amount of thought on whether you're treating other humans with said respect. Because reading this, you're obviously not.

josteink · 2026-06-26T05:01:28 1782450088

He made a patch in good faith, not knowing about these rules.

His point is that because he was coming at this with an honest, open approach he saw his work rejected.

His observation is that this will reward dishonest submissions which are NOT made in good faith, i.e. rewarding the wrong things.

Incentives drive the outcome. What incentives does this give people?

geocar · 2026-06-26T05:37:38 1782452258

> not knowing about these rules.

He didn’t look at the t&c for whatever model he was using and didn’t understand he had no copyright claim over its output?

He doesn’t have any rights to the code; he can’t assign them to the FSF.

> What incentives does this give people?

Hopefully to learn how to code so they can make their own contributions.

grayhatter · 2026-06-26T05:12:22 1782450742

> Incentives drive the outcome. What incentives does this give people?

I understand this argument to be: if you stop people from doing something you don't want them to do to you, you're only incentivizing them to lie to you before they do that exact same thing to you.

Is that the argument you're trying to make? Because I don't think the solution to wanting to exclude LLM codegen, is to ... not reject LLM generated patches because it might force other people to lie to get around the exclusion.

But, just to be clear, I think the argument that enforcing rules would induce someone to lie, is an insane argument to try to make.

josteink · 2026-06-26T05:53:23 1782453203

I think a better frame would be «how could the maintainers have responded in a constructive, collaborative way upon learning about the tooling not being compliant with Emacs standards, in a way which would have helped land what was clearly a good faith effort aiming to make Emacs better?»

Outright rejecting the patches was IMO not a pragmatic or constructive choice and will drive the wrong incentives whether you morally approve of it or not.

cenazoic · 2026-06-26T14:39:06 1782484746

The patch author, while claiming LLM discussions are (implied "hidden") in "internal mailing lists", didn't link to the "constructive and collaborative" conversation that indeed occurred (publicly). That seems a sort of dishonesty itself.

https://lists.gnu.org/archive/html/emacs-devel/2026-06/msg00...

In a different thread about a different LLM-generated package, here's RMS himself:

> As a disclaimer it was generated with LLM. I think ELPA has a prudent policy to not allow this for now until there is legal clarity, but does this apply to nonGNU-ELPA also?

RMS: It does apply, for the time being. It is not a final decision, it is tentative.

(link: https://lists.gnu.org/archive/html/emacs-devel/2026-05/msg00...)

GNU (and/or the Emacs project in particular) does not seem to be acting in ludditic or reactionary ways, but thoughtful ones.

TurdF3rguson · 2026-06-26T05:02:15 1782450135

Entertaining rant, but he never said he was willing to lie, he was offering a hypothetical.

grayhatter · 2026-06-26T05:08:41 1782450521

> but he never said he was willing to lie, he was offering a hypothetical.

I disagree with your assessment.

> First of all, I could’ve hidden the fact of LLM usage, and yet decided to declare it explicitly. By being truthful I already lost my footing. This alone makes the policy stupid. If admittance is punished it’s better to push submissions without admitting. It punishes integrity, not usage per se.

The author values getting what he wants over interacting fairly and honestly with others. Literally saying "it’s better to push submissions without admitting [the truth]".

I think your predictions are inaccurate, but will gladly acknowledge the facts do show the author decided not to lie this time (perhaps because it was too late: he had already admitted it, and would have lied if he had known about the policy, which does seem more in line with the recommendations in the post). Unfortunately he was still willing to advocate and argue that honesty was the root cause of his patch being rejected. I think generally speaking, you wouldn't willingly encourage others to lie if you weren't willing to lie yourself, would you?

TurdF3rguson · 2026-06-26T06:48:20 1782456500

I don't think it's worth it in this case, it's certainly not clear to me who is in the wrong here. Lying is obviously problematic, but like I said, he didn't lie - it was purely hypothetical.

grayhatter · 2026-06-25T20:55:52 1782420952

Your example is incorrect. @ptrCast has the same (similar, if you want to be pedantic to the exclusion of good faith) result rules. If you need @as to @ptrCast, you'd need it to @bitCast as well.

> It is significantly worse to take address and deref afterwards.

How are you measuring worse? Because my understanding from the article is that's exactly the behavior @bitCast used to have. So, instead of worse, it'd be exactly the same?

If you mean it's simply more things that you have to type... You're describing a core language feature as "worse". For all the builtins, some of them can help the compiler emit better code, but "can for some" doesn't mean "will for all". As an example:

    const thing: f64 = @floatFromInt(int_ish);
    const result = thing + other_float;
    return @intFromFloat(result);

Could zig auto convert between these types? Yes, absolutely. But it doesn't as a design decision. On some arch, converting between float and int can be very expensive. A competent engineer will ensure they're type converting in a reasonable order. Zig requires this painfully verbose syntax in order to make it painful. Are there times where it is actually the only reasonable option? Yes, but even if there weren't it'd still need to exist, because I'm not rewriting my whole program to avoid a single float conversion. But because it's a bit painful, I will rewrite this one function to make it less painful.

And, yes having already made that exact mistake... I now write better code from the start because there's no way I'm gonna ruin all my beautiful code with a bunch of ugly, annoying, hard to read, casts.

I used to complain about unused variable errors, unhandled enum branch, var unmodified (hint: use const) errors, hell even result ignored or error ignored when I'm trying to test some unrelated single line of code. But now that I'm used to them, I emit better code without thinking. It's made me a better programmer. Is it annoying? abso-fucking-lutely but I'm better now than I used to be, so: worth it; and: thank you sir can I have another. :D

grayhatter · 2026-06-25T19:53:00 1782417180

How confident are you? I ask because I'm a zig zealot, and am constantly shilling for it. But I disagree with a number of ark's positions, and think of him as a bit of a shitter... So I don't think "cult of personality" accounts for it, despite how easy it would be for someone to be able to disregard zig if it was just a personality cult.

f13f1f1f1 · 2026-06-25T20:06:02 1782417962

I have no issues w/ the zig language and I'm not saying that's the only reason why people talk about it. There is however a subset of very vocal people who will go out of their way to bring stuff up and push something if they do see that is a part of it. Not that it's the only reason why either, just additional motivation for people to go out and push it that otherwise you might not get. All you need is a couple people who view that as a kind of campaign and they can radically increase the visibility of something on the internet, and turning programming into something that has some broader social or moral thing related to it, even just through the creator, is a very easy way of doing that. Rust has a similar thing. I don't view the instinct that leads to language zealotry or zealotry related to social issues (or say religion) as being that distinct, and it's probably a similar personality trait that encourages both, and it's generally one I find unpleasant regardless of the particular content. FP can also lean in that direction. If you get some narrative where you can say this language fixes stuff in a fundamental way + also can appeal to the social thing, it just riles up people who will go around talking about it online non stop.

grayhatter · 2026-06-25T19:48:35 1782416915

I can only answer for me, and while I do think it's more significant a metric for me, I equally assume it probably has some influence on others as well.

C3 uses :: for namespaces, that makes it a competitor with C++ more than C. Equally Odin's syntax is more at home among python, not systems programming.

The appeal of Zig is it feels like C. To many people, this is a downside. C is very very scary to them. But for people who feel at home in C, it's not a downside.

Additionally, the selling point for both is "C replacement", whereas the selling point of Zig is "good systems programming language"; C is only mentioned by its users as a heuristic.

If 2 groups are trying to replace a language that people are running away from, and that's their best selling point... I'd assume they're less likely to be as successful as a different language just trying to be as good as it can be.

I've even stopped comparing Zig to C, IMO, it does a disservice to both. And I say that as someone who likes C.

Full disclosure, I need to spend a bit more time with both odin and c3 to know exactly how this compares. But the reason I keep writing Zig, and still love it, is how simple it is. Zig is aggressively insistent on simplicity at the expense of functionality or comfort. The only other high level language I know of that is as aggressive about its design simplicity is in fact C. While I assume it's an accident when C does it, it's definitely not an accident in Zig.

ulbu · 2026-06-25T23:06:23 1782428783

C3 is a contender to C++ because of its namespace operator?

grayhatter · 2026-06-25T15:09:55 1782400195

> Quite long devlog coming up, apologies—I got a little carried away with this one!

mlugg, please don't apologize for creating something I actually want to read. I'm drowning in low effort garbage, the in depth technical explanation is a refreshing breath of fresh air.

Might as well apologize for creating a language without a garbage collector, sure most people are unwilling to think, but some of us like nice things and are actually willing to apply effort.

mlugg · 2026-06-25T16:27:54 1782404874

I appreciate the kind words :)

grayhatter · 2026-06-25T20:22:07 1782418927

BAH! and I forgot to say the most important part.

Much more important, thanks for not just the devlog, and explaining the changes. But also; thanks for fixing/improving this!

I appreciate all the work you've put in, I really enjoy watching the language I like constantly improve.

scuff3d · 2026-06-26T02:54:16 1782442456

Why I've moved more to a couple of language/software dev discords and away from Hacker News. Way too much uninteresting AI nonsense on here for a while now.

ind-igo · 2026-06-26T04:01:05 1782446465

Would like to join as well if you're willing to share

scuff3d · 2026-06-26T16:40:09 1782492009

Usually I bounce between the Zig and Odin discords, as well as the Handmade Network discord.

jeffbee · 2026-06-25T15:11:27 1782400287

It wasn't even long! It seemed much shorter than the typical LLM-expanded drivel that crosses the HN front page daily.

frail_figure · 2026-06-25T15:49:25 1782402565

[flagged]

grayhatter · 2026-06-25T16:14:59 1782404099

ryan_n · 2026-06-25T17:52:53 1782409973

Think they're just implying that the quoted text comes off as a bit pretentious..

grayhatter · 2026-06-25T19:34:09 1782416049

oh, I think it's mostly frustration over how eager everyone is to delegate their thinking to literally anything else, accelerated by [gestures at reality]. Is frustration with apathy really pretentious?

grayhatter · 2026-06-25T05:33:34 1782365614

Oh no, someone is profiting off the work of others?!

anyways...

grayhatter · 2026-06-24T20:56:22 1782334582

It's really impressive how you could apparently argue so strongly for Fender's defense: but the message that I took away was Fender is obviously the bad guy in this story, and I want nothing to do with them... and I haven't even clicked on the new tab for the story yet.

grayhatter · 2026-06-22T19:53:06 1782157986

> Eh, it depends on the workflow. Especially if you have certain stack based workflows.

I would normally assume there's 0 percent chance that `git` (the binary) has a significant impact on LLM based devel. The same applies to git, the protocol/format/tree.

I'd love to hear about what makes the workflow you have one where any part of git becomes a noticeable proportion of the process? Unless you mean your LLM just can't figure out how to use git?

grayhatter · 2026-06-20T02:56:21 1781924181

> Hallucination rate scores are a little tricky to interpret because they're conditional on the model not knowing the answer. That means they don't measure the probability of your encountering a hallucination in everyday use, since that also depends on the probability of the model not knowing the answer, as well as how well your distribution of tasks aligns with the distribution tested in the eval.

Do you have a cite for this?

If a human makes up some bullshit lie, I wouldn't accuse them of making it up only if they actually knew the correct answer. If you don't know, the only correct answer is I don't know. Any other answer is made up bullshit. Why is it only a hallucination if and only if the LLM contains the answer? If you make something up it's still wrong. It shouldn't matter if you could give the correct answer. You didn't, and instead invented some bullshit instead?

Follow up question, how can I apply this rule set to the next test I have to take? I'd love to be able to use "I didn't know" as the excuse for why I made something up.

edit:

> and it's not totally clear that this is the main metric that's worth tracking.

I don't know, the rate at which some model is willing to make up something feels useful. If the argument I see repeated on HN so much is that it's impossible to completely get rid of hallucinations; being able to choose a model that's less likely to invent some lie seems like a positive trait, no?

Either way, I'm happy to agree that a restrictive definition, where a lie doesn't count as a hallucination iff the model doesn't know the answer feels strictly, infinitely less useful than an exact error rate. What percentage of emitted tokens are misleading would be useful for me. Anyone know any group that's attempted to quantify the global error rate?

aesthesia · 2026-06-20T05:18:08 1781932688

This isn't quite the point. When comparing two different models' hallucination rates, the denominator is different. The evaluation works more or less like this: for each question, the model has the option to answer or abstain, so there are three possible outcomes: the model answers and gets it right, the model answers and gets it wrong (hallucination), or the model abstains. The hallucination rate is (model answers wrong) / (model answers wrong or abstains). So if a model A has 50 correct answers, 20 incorrect answers, and 30 abstentions, its hallucination rate is 40%, while a model with 20 correct answers, 20 incorrect answers, and 60 abstentions has a hallucination rate of 25%, even though it hallucinated exactly the same number of times. This is why hallucination rate is incomplete as a metric: it says nothing about the accuracy rate.
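The arithmetic in this comment can be checked with a minimal sketch. The `EvalCounts` class and its method names are hypothetical, made up for illustration; only the formula (wrong answers over wrong answers plus abstentions) and the two sets of counts come from the comment itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCounts:
    """Outcome counts for one model on an answer-or-abstain eval (hypothetical container)."""

    correct: int
    incorrect: int
    abstained: int

    def hallucination_rate(self) -> float:
        # As described in the comment: wrong answers / (wrong answers + abstentions).
        # Correct answers do not appear in this ratio at all.
        denominator = self.incorrect + self.abstained
        return self.incorrect / denominator if denominator else 0.0

    def accuracy(self) -> float:
        # Correct answers over all questions asked; reported separately.
        total = self.correct + self.incorrect + self.abstained
        return self.correct / total if total else 0.0


# The two models from the comment: same number of wrong answers, different rates.
model_a = EvalCounts(correct=50, incorrect=20, abstained=30)
model_b = EvalCounts(correct=20, incorrect=20, abstained=60)

for name, counts in (("A", model_a), ("B", model_b)):
    print(
        f"model {name}: hallucination rate {counts.hallucination_rate():.0%}, "
        f"accuracy {counts.accuracy():.0%}"
    )
```

With these counts the sketch gives 40% for model A and 25% for model B, matching the comment, while the accuracy figures (50% and 20%) show what the hallucination rate alone leaves out.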

grayhatter · 2026-06-20T18:41:17 1781980877

The way you define the evaluation criteria seems very problematic[^1].

I don't understand the point of describing it as 3 possible outcomes. I objected to it because the only reason I would do something like that would be to obscure the severity of the model defects. I'm sure I'm missing something, but the reason I suspect that's how it's done, is to [intentionally] obscure the actual meaningful metrics.

I would expect any engineer to evaluate any model using accuracy (error rate) and usefulness (definitive answer rate) as strictly independent metrics. Did it answer, and if it answered, did it emit incorrect or misleading information, and how many quantifiable bits of each?

The false negative rate (model confirmed to contain the requested output/information via other method but was unable to for the given test) is significant, but giving a non-definitive answer is significantly different from giving a definitive and incorrect answer. Why would you want to group hallucinations?

Number/rate of useful answers (correct and incorrect) and error rate (given any answer how often will that answer be defective in some way).

To be clear, I'm differentiating hallucination rate from eagerness to answer, even though they're obviously linked, because I believe presenting 20 correct answers, 20 incorrect answers, and 60 abstentions as a hallucination rate of 25% is obviously malicious. If I give you 40 answers, 20 correct and 20 incorrect, the error rate is 50%, and if it refused an additional 60 times, its usefulness rate would be 40%... arguably 20% depending on how strict you choose to be about the definition of useful. The matrix we should be using is a 2x2: true positive, false positive, true negative, false negative. But being that honest might make the model look bad!

[^1]: just in case it's unclear, I'm using you exclusively rhetorically. I don't think you personally are being misleading, only that you're explaining how it's done... but that's the problem isn't it.
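The alternative metrics proposed in this comment can be written down as a small sketch. The function names (`error_rate`, `answer_rate`, `strict_usefulness`) are hypothetical labels chosen here for illustration; the definitions and the 20/20/60 counts are taken from the comment.

```python
def error_rate(correct: int, incorrect: int) -> float:
    """Share of the answers actually given that were wrong (abstentions excluded)."""
    answered = correct + incorrect
    return incorrect / answered if answered else 0.0


def answer_rate(correct: int, incorrect: int, abstained: int) -> float:
    """Share of questions that got any definitive answer, right or wrong."""
    total = correct + incorrect + abstained
    return (correct + incorrect) / total if total else 0.0


def strict_usefulness(correct: int, incorrect: int, abstained: int) -> float:
    """Stricter reading of 'useful': only correct answers count."""
    total = correct + incorrect + abstained
    return correct / total if total else 0.0


# Counts from the comment: 20 correct, 20 incorrect, 60 abstentions.
correct, incorrect, abstained = 20, 20, 60

print(f"error rate:        {error_rate(correct, incorrect):.0%}")
print(f"answer rate:       {answer_rate(correct, incorrect, abstained):.0%}")
print(f"strict usefulness: {strict_usefulness(correct, incorrect, abstained):.0%}")
```

Under these definitions the same counts yield a 50% error rate, a 40% answer rate, and 20% on the strict reading, which are the figures the comment contrasts with the 25% hallucination rate.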

jpalomaki · 2026-06-20T10:36:40 1781951800

As a human I also give wrong answers even if I know the right one. Sometimes I also give answers even when I don’t really know them.

When pushed, I then start thinking and realise my mistake. System 1 vs 2?

grayhatter · 2026-06-20T14:55:41 1781967341

That's weird, why do you do that?

When someone asks a question, if I don't know the answer; I say I don't know.

System 1 vs 2 doesn't really matter... I won't use an LLM that's willing to make up random shit. Equally I also won't work with a human who does that. Trust and confidence that a system will function correctly is an important quality, in both humans and genai.

big_paps · 2026-06-20T16:53:31 1781974411

I realized that people from India often show this kind of behaviour in my experience. They superconfidently give you a wrong answer and walk away or even help by making things worse and then disappear shrugging... Are you from India?

luuundonjk · 2026-06-20T07:58:47 1781942327

there is a difference between a human knowingly bullshitting and being confident because he misremembers something

master-lincoln · 2026-06-20T09:54:57 1781949297

there is a difference in their intent, but not necessarily in the effect.

sgc · 2026-06-20T04:11:31 1781928691

Since models just output the most probable tokens and you can never accuse them of doing anything other than making it all up, I would like to see these tests run with a prompt that attempts to mitigate hallucination and finishes with something like: "Telling me that you don't have the relevant information or that the task is impossible is extremely useful to me and a valid answer", and see how much that changes the scoring - as well as the usefulness of the answers. There are so many skills like context7 that can be tweaked to improve these results as well.

In other words, you shouldn't choose the model that hallucinates the least without detailed prompting, since a well-crafted agents.md clause should go a long way to improving output, and almost certainly the top scoring order will be different. To the point that I don't find this type of raw comparison useful beyond maybe 'make sure you test that one with more explicit prompts'.

grayhatter · 2026-06-20T04:41:42 1781930502

> In other words, you shouldn't choose the model that hallucinates the least without detailed prompting

You're prompting it wrong is quickly becoming the new, you're holding it wrong.

It's wild how willing software engineers are to blame the user when the actual problem is their own defective design.

Ideally we all, as an industry, will stop accepting this as a reasonable excuse for the demonstrated incompetence.

ordersofmag · 2026-06-20T12:35:02 1781958902

It's not that you're prompting it wrong. It's that you're judging the output against a standard (human intelligence) that just isn't relevant--no matter how much we want it to be and no matter how much the fluency of the output tricks us into thinking there's a human-like mind behind it.

Now granted, if the boat salesmen were pushing hard on the idea that the boat would fly and even put little wings on the side and I bought the boat I might get really angry when I found out that it didn't fly. And I might angrily storm into the salesroom yelling about how the design is defective. But if someone pointed out 'hey, it's a boat, perhaps you should stick to sailing around in it and stop getting your undies in a bundle about it not flying' the correct response is probably to take a closer look, ignore the salesmen, and cruise around the lake. LLMs are quite handy at some things and have some weird limits. Learn the limits, enjoy your time at sea.

grayhatter · 2026-06-20T14:51:35 1781967095

> It's not that you're prompting it wrong. It's that you're judging the output against a standard (human intelligence) that just isn't relevant

It's not that you're holding it wrong, you're just wrong for expecting it to work the way it's described (able to one shot most problems these days).