Hacker News | joshka's comments

But calling them out in a partisan way may disincentivize half of the people from understanding the issue.

If those people want to treat political parties like sports teams then they aren't likely going to contribute much to the discussion

I love not informing the electorate!

A large portion of that half will continue to want the wrong thing anyway.

Parties were not called out and a large amount of ensuing Othering is happening anyway. Arguably, that proves that the EFF was sound in their decision to mitigate that by not calling out the parties/politicians in hopes to keep the focus on the bill itself, doesn't it? I've long suspected that we humans tend to lose the plot so often because we want to immediately sort everyone into buckets as though compartmentalizing them brings about complete understanding of the issue on the table.

Same - I plugged it into ChatGPT to check if I'd missed something contentful. I hadn't really. Not news, more survey of things that matter a bit. If you know those things already then this is just fluff. Nothing about the future, more just here's some things I like.

A couple of things to consider:

Start with a set of work that's well specified and chunked so that reviewing one chunk fits in the gaps while the agent works on another (1-4 tasks at a time). While the agent is working on task A, you're reading task B; only add a third task if you're waiting on the agent excessively. Give yourself some slack to think about side quests (maintenance, tech debt, planning future work on your system, etc.).

Make sure the handoffs are reasonably detailed (prompts / AGENTS.md instructions) - make your agents provide more context than normal - make them assume that you don't read all the text they spit out and need to be handed succinct summaries with information that helps reduce cognitive burdens (what I was doing, how I did it, what's next etc.)

It's reasonable to regularly ask the agent to summarize what you've worked on: choices made, options considered and discarded, and anything needed to rehydrate context. There are a lot of small tweaks you make when prompting interactively that get lost in drift rather than captured as a neat single list. Make the agent help track that stuff (at least prior to shipping, but often at intermediate steps too).

Write flows that keep software in a buildable state so you can run it and queue up changes based on what you see. Avoid long periods of broken refactoring (caller code written before callee, deletion before add in a move etc.) Run quick checks (e.g. rust's cargo check) after each change, not at the end.
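
A minimal sketch of the "check after each change, not at the end" idea. Everything here is hypothetical: `apply_checked`, `Outcome`, and the `check` closure are illustration names, and `check` just stands in for whatever quick check you run (e.g. `cargo check`). The point is only that checking per change tells you which change broke things and leaves you on the last good state.

```rust
/// Result of applying a queue of changes with a check after each one.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Every change was applied and the check passed each time.
    AllApplied,
    /// The change at `index` made the check fail and was rolled back.
    BrokeAt { index: usize },
}

/// Apply `changes` one at a time, running `check` after each.
/// On the first failing check, restore the last good state and stop.
fn apply_checked<S: Clone>(
    state: &mut S,
    changes: &[fn(&mut S)],
    check: impl Fn(&S) -> bool,
) -> Outcome {
    for (index, change) in changes.iter().enumerate() {
        let last_good = state.clone();
        change(state);
        if !check(state) {
            // Roll back so the "software" stays in a buildable state.
            *state = last_good;
            return Outcome::BrokeAt { index };
        }
    }
    Outcome::AllApplied
}
```

Running the check only once at the end would tell you that something broke, but not which change did it.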

Correction of agent errors should end up as future steering. If the agent makes a mistake once it's the agent's fault, twice it's your fault.

Leave time to stop and evaluate the current state regularly (where are we on the work). It's easy to mistake momentum for progress when you're the human part of an agentic loop.


> [US jurisdiction]: Anything in the result written by the LLM can not be copyright by anyone.

This is a bit stronger than what the report where this is discussed actually finds. See Part 2 at https://www.copyright.gov/ai/ for details, but TL;DR: parts where humans have control over the expression may be copyrightable. Working out which parts those are is likely a difficult question, though (it would likely require proof of provenance across many of those LLM sessions).


Tell me, can I create a copyrighted video that's not GPL licensed using ffmpeg? Now tell me how creating a rust library using the git test suite is different?

> using the git test suite

That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

But for the sake of argument: The test suite itself is copyrighted. To the extent the resulting work is a derivative of the test suite it is possibly infringing. For example you might expect that the agent would derive variable names, function names, and the structure, sequence and organization of the code from the test suite. It might even copy comments wholesale. Those are copyrightable things. (Which is of course just the first step in analyzing whether it is infringement; there would be interesting fair use, de minimis copying, etc. arguments following a conclusion that any of those were copyrighted. A product produced this way definitely could be infringing given the right facts though.)


> That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...

yeah fair - the "The canonical Git source code we're targeting to replicate the functionality of is in the git/ subdirectory." part makes this hard to argue against.

> To the extent the resulting work is a derivative of the test suite it is possibly infringing

It's this bit that I have a problem with. If I run the test, it fails and reports a failure. Now I write code and run the test again. What is the theory under which the code that I wrote infringes?

Simplify this down:

Assume the following is copyrighted:

    #[test]
    fn test_sum() {
        assert_eq!(sum(1, 1), 2);
    }
Does writing the following code:

    fn sum(a: u8, b: u8) -> u8 {
        a + b
    }
infringe on the test copyright?

Writing

    fn sum(a: u8, b: u8) -> u8 {
        a + b
    }
Doesn't infringe upon copyright period, because there's no creative element in that work.

Imagine a more substantial example though. Perhaps you have a test that checks that some file written in a binary format is correct, and gives names (creative elements) to each field of the format that it prints when you mess up the field, and has comments describing why the bytes are laid out like they are (the comments being copyrightable even if the facts they describe aren't), and the LLM copies those field names and comments verbatim... Now it's quite likely that the LLM's work is a derivative of the test suite.


> Doesn't infringe upon copyright period, because there's no creative element in that work.

There's likely a threshold at some point. It's helpful to look at the minimal case and then continue from there though.

I'm curious if there's case law that supports your assertions here?


For that assertion in particular I believe I'm practically parroting a ruling by the district court in Oracle vs Google about some extremely simple Java functions that Oracle claimed Google copied. Though I can't say I checked to make sure I'm remembering right.

You're recalling it right, but there's a nice quote from Judge Alsup in that case that talks about this exact situation:

> “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification...”

Here given that this is rust and the original expression is C, the implementations cannot be the same by definition.
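
A toy illustration of that quote, in the spirit of the `test_sum` example upthread (a sketch of the technical distinction only, not a legal conclusion). The names `check_sum`, `sum_direct`, and `sum_by_counting` are made up for this example: one specification, two differently expressed implementations that both satisfy it.

```rust
/// The "specification": what any `sum` implementation must do.
/// It takes the implementation as a parameter so the same check
/// can be run against several of them.
fn check_sum(sum: fn(u8, u8) -> u8) -> bool {
    sum(1, 1) == 2 && sum(2, 3) == 5 && sum(0, 0) == 0
}

/// Implementation A: the direct expression.
fn sum_direct(a: u8, b: u8) -> u8 {
    a + b
}

/// Implementation B: same function, different expression.
/// Starts at `a` and adds one, `b` times.
fn sum_by_counting(a: u8, b: u8) -> u8 {
    (0..b).fold(a, |acc, _| acc + 1)
}
```

Both `check_sum(sum_direct)` and `check_sum(sum_by_counting)` pass, even though neither function body resembles the other or the check itself.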


That's essentially the same thing as modding a game, though. I know there have been lawsuits to stop modding, but I don't think any were successful.

I'd challenge you to think about this in terms of the specific legal aspects rather than reaching for similarities. "Similar" is often meaningless in law or contracts, where specific acts are codified rather than generalized ones.

I'd say what we're talking about here is probably a fair bit different to modding a game in most aspects.


I haven't followed any relevant cases but I would be surprised if there's any serious dispute that the common methods of modding games generally create derivative works. I think the dispute would be downstream of that as to whether or not the mods are covered by fair use.

If you did it in a loop until the test passed, maybe?

Your result is essentially impossible without the original. With ffmpeg, your result does not depend on ffmpeg specifically - you can use any video creation tool.


Repetition isn't really a factor in deciding whether something is infringing or not - check the copyright law in your jurisdiction. Here if you look specifically at what an LLM's sampling stage is doing, it's choosing non infringing tokens (i.e. rust source code) over infringing ones (i.e. C source code). So it's making an intentional choice to do something similar rather than creating something that has the same expression. That doesn't seem like it's copyright infringement to me.

A GPL tool that processes data doesn't virally transfer the license to its output. Copyrighted ffmpeg code isn't incorporated into the video output. The LLM didn't just conjure up equivalent behavior to git without ingesting the code and transforming it as new output. There is no other behavioral description that would reproduce all needed functionality.

> There is no other behavioral description that would reproduce all needed functionality.

Tests often are exactly the information necessary to understand exactly what the output should be. See https://github.com/git/git/blob/master/t/t0000-basic.sh for an example of how detailed these tests are.

It would be reasonable to point an LLM at these and use them with a basic knowledge of git to produce a rust version of git in a non-infringing manner.

If you did this manually it would take a long time.


Medium, substitutability, basics of copyright law.

Fair point on medium - this was a lazy example.

Substitutability probably doesn't apply here in the way you're implying, and if it did it would likely be hampered by the 9th Circuit's findings about transformation in Sony v. Connectix. Arguments here would likely look at Rust not having a stable ABI, and hence not being inherently substitutable as a library (grit-lib); it's less clear as an executable (grit-cli).

basics of copyright law - the fundamental thing being protected is the expression... is a rust program's expression the same expression as a c program? I'd say generally not.


The test suite could test aspects of the architecture/design of the codebase that are not necessary for interoperability and constitute novel expression of a piece of software in a way that is not at all language specific.

By definition a test suite is about testing interoperability with the test suite. An HTTP test suite should likely test for whether response code 418 is implemented a particular way and while humorous it would still be an interop test no?

No, the git test suite is about testing the git codebase. If you want something like that, you need a conformance suite, which does not exist for git.

I suspect the issue is more likely that the LLM code doesn't have an author and hence some parts of it can't be licensed; it's less likely that it's infringing on git's copyright, for various reasons. (I am not a lawyer, but I do read copyright law for funsies.)

https://www.copyright.gov/newsnet/2025/1060.html

> It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements. This can include situations where a human-authored work is perceptible in an AI output, or a human makes creative arrangements or modifications of the output, but not the mere provision of prompts.

Well that's interesting.


Also "just" the legal opinion of a government office. It has yet to be tested in court

why wouldn't it? If you run git through a compiler it's still copyright the git devs, same if you run it through an LLM.

What makes you think that's what the article says it did? There's a lot of specific nuance and it doesn't say that anywhere. In fact it speaks only of making a test suite pass. This is the classic clean-room "BIOS from specs" approach, but with no need to extract a spec, as the test suite is available to run, and there's nothing in the GPL that suggests that running a test suite infects the software you run it on.

Surely git’s source is already in LLM’s training corpus. So this is far from clean room approach.

You've read books and they are in your brain's corpus. You only infringe copyright if you reproduce the same actual words from the books in your memory (and then do infringing acts defined by copyright laws with that output).

Here that's not happening. The code being produced by the LLM is Rust, not C.


I think you're saying that you don't believe in the freedoms to use the GPL licensed test suite for certain purposes which are explicitly allowed by the GPL.

You don't get to choose a license and then add extra terms to it when you don't feel like it's up to scratch. That's something explicitly not allowed by the GPL license.


Where does the GPL say you have the freedom to relicense code or derivatives under MIT by fiat?

Isn’t having to stay under the GPL a very big part of the GPL license?


> Where does the GPL say you have the freedom to relicense code or derivatives under MIT by fiat?

The first part of this sentence (where in the GPL) is unreached if the second part of it is unmet (relicense code or derivatives) which I contend it likely is. You're begging the question.

However:

> The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work

earlier:

> A “covered work” means either the unmodified Program or a work based on the Program.

It's that element that would be difficult to prove "work based on the Program"


Asking an LLM "here's a thing, rewrite it in Rust" is pretty clearly creating either a derivative work or a different form of the same work, just like asking a transpiler would.

There's no evidence that "here's a thing, rewrite it in Rust" is the technique Scott used here.

"here's a test suite, write code in rust that makes that suite pass" is reasonably supported by the article. That would likely not be a derivative work.



Ew. So it tells the LLM where the git source is for the thing they’re duplicating, but I don’t see instructions saying not to read or copy those files or algorithms.

I could have missed them. I didn’t read everything. I did some quick searches.

But the fact they’re not obvious is kind of troubling. Or that they didn’t just copy the tests and documentation for the LLM and not the source to prevent it from looking would hurt any case they had for clean-room privileges in my eyes, ignoring my other comment with concerns about using the tests at all.


If we assume an unlicensed or MIT licensed test suite, an LLM could develop from that and documentation and you’d get something you could license MIT.

IMO, IANAL, etc.

And we’ll ignore the question of what the fact the LLM has certainly seen the git code during training means.

But the test suite would have to stay under the original license. And if you use a GPL test suite as the kernel to develop a program from, can you license it non-GPL? I’d question that personally. Same acronyms above apply.


This is the exact thing I'm not sure about. See https://news.ycombinator.com/item?id=48470397 where I posit a simpler question: if a `test_sum()` function is copyrighted, does writing a `sum(a, b)` function infringe on the copyright of the software product that `test_sum()` is a part of? I'd say no. There's another part of the GPL that applies here:

> A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an “aggregate” if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

So assuming that `sum(a, b)` is non-infringing and not combined with the tests to form a larger program (i.e. the tests aren't compiled into the grit code), the GPL explicitly doesn't apply to this use.


test_sum is assumedly relatively trivial. So as a lay person I’d expect some sort of obviousness test to apply. Like so much of the stuff in the Google/Oracle lawsuit.

But if you take all the individual tests used to test git as a whole, that seems far more unique. Seems like at that point you’re really having to duplicate the actual git internals, and that seems like it should be covered.


> test_sum is assumedly relatively trivial. So as a lay person I’d expect some sort of obviousness test to apply. Like so much of the stuff in the Google/Oracle lawsuit.

Feel free to extrapolate to the threshold where it's not and at that point apply.

> you’re really having to duplicate the actual git internals

Copyright covers the expression, not the method. So the Rust function:

    fn sum(a: u8, b: u8) -> u8 {
        a + b
    }
is distinct from the C function:

    int sum(int a, int b) 
    {
        return a + b;
    }

That's not copyrightable because it's trivial.

Please feel free to steelman. Extend the argument until it hits a point of non-triviality and then apply reasoning.

This might be more suitable as a basis for this sort of thing... https://git-meta.com/


This feels like the sort of architecture that starts clean and then gradually grows most of the things a workflow-native system already has. I've seen systems like this, seen companies that are built out of this idea, and built small systems like this over time.

Once you need retries, backoff, timeouts, cancellation, versioning, visibility, task routing, rate limits, leases, heartbeats, stuck-worker detection, replay/debugging semantics, workflow migration, fanout/fanin, long timers, audit trails, and operator tooling, the “just use a database” story becomes “build a poor copy of a workflow engine plus a bunch of workers.” pretty quick.

That may still be a good tradeoff for many applications, especially if Postgres is already the core operational dependency. But the comparison shouldn’t be “database vs overcomplicated orchestrator.” It’s more like “what complexity do you want to own, and what do you want to buy / offload to a professional system?”
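
To make that concrete, here's a rough sketch of just the first two items on that list (retries and backoff). `RetryPolicy` and `run_with_retries` are names I made up, not from any particular workflow engine, and this is in-memory only: the attempt count is lost if the process dies, which is exactly where leases, heartbeats, and durable state start creeping in.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical retry policy; field names are illustrative.
struct RetryPolicy {
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
}

impl RetryPolicy {
    /// Delay after failed attempt `attempt` (1-based):
    /// base_delay * 2^(attempt - 1), capped at max_delay.
    fn delay_for(&self, attempt: u32) -> Duration {
        let factor = 2u32.saturating_pow(attempt.saturating_sub(1));
        self.base_delay.saturating_mul(factor).min(self.max_delay)
    }
}

/// Run `step` until it succeeds or `max_attempts` is reached,
/// sleeping with exponential backoff between attempts.
/// `step` receives the 1-based attempt number.
fn run_with_retries<T, E>(
    policy: &RetryPolicy,
    mut step: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match step(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error.
            Err(e) if attempt >= policy.max_attempts => return Err(e),
            Err(_) => {
                sleep(policy.delay_for(attempt));
                attempt += 1;
            }
        }
    }
}
```

That's a few dozen lines for two of the sixteen or so items, with no timeouts, cancellation, or visibility yet.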


Yeah, we've observed that too: people start implementing their own retry logic, idempotency, etc. But then they grow a hard to maintain, complex stack that's not their core business logic. There's a reason why there is a dedicated team building DBOS, every day. Because it's not that easy to build a solid durable workflows engine on Postgres.


Comments like this by people who know exactly what they are talking about are why I love Hackernews



The SKIP LOCKED pattern is fine until the worker count climbs. Then vacuum can't keep up. Dead tuples pile up, visibility map turns to swiss cheese. Queue table is tiny on disk but the planner thinks it's huge and stops using the index. It gets ugly fast.


Ridiculously good analysis! HN is a national treasure because of posts like this.


What was so revolutionary to you in their post to cause you to describe it as a “ridiculously good analysis”?


Did I say revolutionary?


revolutionary, revelatory whatever. but no your comment was just empty platitudes

I suppose you're right. What I liked about it was this very specific graf, which gave me a lot to think about as a (potential) future implementor. It tells me this person has thought deeply about these issues, and I feel like I have a much better grasp of the concept of a durable workflow than I did after reading TFA. Thanks kindly for spending so much time on my comment.

> Once you need retries, backoff, timeouts, cancellation, versioning, visibility, task routing, rate limits, leases, heartbeats, stuck-worker detection, replay/debugging semantics, workflow migration, fanout/fanin, long timers, audit trails, and operator tooling, the “just use a database” story becomes “build a poor copy of a workflow engine plus a bunch of workers.” pretty quick.


Bingo, not even mentioning the blog post assumes all steps to be serializable.

I feel like this is the usual "just use postgres" garbage post that lacks any kind of nuance.

In fact you could replace that post with any other db and the statements keep being true, and naive.

