Attempts to mandate that PRs always be small might get me to argue that optimal change sizes are normally distributed.
If you look at how simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) is done, the average jump size shrinks over time, as in the Wikipedia annealing animation, but there's always _some_ probability of a large jump in the optimized Metropolis-Hastings process: the jumps are still normally distributed, just with shrinking variance.
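To make the annealing point concrete, here is a minimal sketch, assuming a one-dimensional toy objective and a linear cooling schedule. The names `bumpy`, `anneal` and `big_jump_fraction` and all the constants are made up for illustration; this is not the setup behind the Wikipedia animation.

```python
import math
import random


def bumpy(x):
    """Hypothetical objective: a parabola with ripples, so it has local minima."""
    return x * x + 3.0 * math.sin(5.0 * x)


def anneal(f, x0, steps=5000, sigma0=2.0, temp0=2.0, seed=0):
    """Metropolis-style annealing with Gaussian proposals of shrinking variance."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    jumps = []  # (sigma, |proposed step|) for every iteration
    for k in range(steps):
        frac = 1.0 - k / steps          # decays linearly from 1 toward 0 (assumed schedule)
        sigma = sigma0 * frac + 1e-3    # proposal standard deviation shrinks
        temp = temp0 * frac + 1e-6      # temperature shrinks too
        step = rng.gauss(0.0, sigma)    # the jump is still normally distributed
        cand = x + step
        fc = f(cand)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        jumps.append((sigma, abs(step)))
    return best_x, best_fx, jumps


def big_jump_fraction(jumps, start, stop):
    """Fraction of proposals in jumps[start:stop] that exceed 2 sigma."""
    window = jumps[start:stop]
    return sum(1 for sigma, size in window if size > 2.0 * sigma) / len(window)


if __name__ == "__main__":
    best_x, best_fx, jumps = anneal(bumpy, x0=4.0)
    print("best x, f(x):", best_x, best_fx)
    print("mean |jump| early:", sum(s for _, s in jumps[:500]) / 500)
    print("mean |jump| late: ", sum(s for _, s in jumps[-500:]) / 500)
    print("share of >2 sigma jumps late:", big_jump_fraction(jumps, -500, None))
```

The average jump gets much smaller late in the run, but the share of proposals beyond two standard deviations stays roughly constant, which is the "mostly small changes, occasionally a big one" pattern.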
"[I split a large PR into multiple] But also, I could not have developed the quotas feature in real life in that artificial order. The grants structure evolved as my understanding of pricing and quota enforcement evolved. The original quota semantics sucked, so I rewound back to the data structures, which affected how the pricing got imported, which changed how the quotas were stored. The code reviewers didn't have to worry about that but I did."
This is also one way LLMs are fundamentally different from prior language models, which worked by searching over parse trees top-down or bottom-up, trying to fit independently evolved pieces together. LLMs lay everything out in a large matrix of randomized weights and try to slide everything into place jointly.
This means organizing all the pieces well into a single context window unlocks a special AI power: to efficiently jointly converge these pieces to fit better with each other (like a smart human having loaded up on context would do). Splitting the work into multiple PRs or contexts might stymie this powerful aspect of AI.
It is a challenge and somewhat of an art to pack and organize the information in a context window to exploit this type of reasoning LLMs are made for.
One underrated reason that CLIs are often better than MCP is that Unix tools seem to have a layout that is close to information-theoretically optimal for enabling reasoning. They are concise, in the Solomonoff/Kolmogorov sense.
This means that the related parts in the inputs and outputs are recursively as close together as possible.
There's a reason humans don't type and read HTTP/JSON on the command line. It's hard to read and reason over that type of syntax. JSON is made to be easy to parse for simple algorithms, not to organise info in an easy-to-reason-about layout.
AIs benefit from the easy-to-reason-about layout. It's not just about fitting in the context window but about the contents of that context window being organized such that the attention mechanism doesn't have to stretch itself out trying to connect related pieces that are far apart. It doesn't have to match brackets to disambiguate information. CLIs tend to use formats that are obvious at a glance for humans and LLMs alike.
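A rough way to see the layout difference is to render the same records both ways and count characters. This is a sketch with a made-up file listing; character counts and punctuation counts are only crude proxies, not a measurement of tokens or of what attention actually does.

```python
import json

# Hypothetical directory listing, used only for illustration.
records = [
    {"name": "notes.txt", "size": 1204, "owner": "alice"},
    {"name": "build.log", "size": 88213, "owner": "bob"},
    {"name": "main.py", "size": 5321, "owner": "alice"},
]


def as_json(rows):
    """Serialize the way an HTTP/JSON API typically would."""
    return json.dumps(rows)


def as_columns(rows):
    """Serialize as whitespace-aligned columns, in the style of many Unix tools."""
    headers = list(rows[0])
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]
    lines = []
    for values in [headers] + [[str(r[h]) for h in headers] for r in rows]:
        lines.append("  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip())
    return "\n".join(lines)


def structural_chars(text):
    """Count brackets, quotes and separators that carry no data of their own."""
    return sum(text.count(c) for c in '{}[]":,')


if __name__ == "__main__":
    for label, text in (("json", as_json(records)), ("columns", as_columns(records))):
        print(label, "chars:", len(text), "structural:", structural_chars(text))
    print(as_columns(records))
```

In the column layout each key appears once in the header and related values sit on the same short line, so nothing has to be matched across brackets.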
Yeah, I always wondered: if I ever switched to solar panels, would there be a way to accumulate heat for use in the cold Canadian months that have little sunlight? The closest I found was electric thermal storage based on heating bricks. They can store more energy than water since they can go to higher temperatures. For example, these say they go to 1300°F or 700°C: https://steffes.com/ets/roomheater/ . They don't seem to have large models that could heat a house for months, however.
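As a back-of-the-envelope check on the brick versus water comparison, here is a small sketch. Every number in it is an assumption: rough textbook values for specific heat and density (not figures from the linked product page), a 20°C floor for both materials, a 90°C ceiling for unpressurized water, and a purely hypothetical seasonal heating demand. Losses are ignored.

```python
def kwh_per_m3(specific_heat_kj_per_kg_k, density_kg_per_m3, t_low_c, t_high_c):
    """Sensible heat stored per cubic metre between two temperatures, in kWh."""
    kj = specific_heat_kj_per_kg_k * density_kg_per_m3 * (t_high_c - t_low_c)
    return kj / 3600.0  # 1 kWh = 3600 kJ


# Assumed material properties (rough textbook values, not manufacturer data).
WATER = kwh_per_m3(4.18, 1000.0, 20.0, 90.0)
BRICK = kwh_per_m3(0.84, 2000.0, 20.0, 700.0)

# Hypothetical heating demand for one cold season, in kWh.
SEASON_DEMAND_KWH = 10_000.0


def volume_needed_m3(demand_kwh, storage_kwh_per_m3):
    """Storage volume needed to cover the demand, ignoring all heat losses."""
    return demand_kwh / storage_kwh_per_m3


if __name__ == "__main__":
    print("water kWh/m3:", round(WATER, 1))
    print("brick kWh/m3:", round(BRICK, 1))
    print("brick m3 for the season:", round(volume_needed_m3(SEASON_DEMAND_KWH, BRICK), 1))
    print("water m3 for the season:", round(volume_needed_m3(SEASON_DEMAND_KWH, WATER), 1))
```

Under these assumptions the bricks hold several times more heat per unit volume than water, yet a full season would still take tens of cubic metres before counting losses, which fits with room-sized units being sold rather than seasonal ones.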
This matches my intuition. Systematic misalignment seems like it could be prevented by somewhat simple rules like the Hippocratic Oath or Asimov's Laws of Robotics, or rather probabilistic Bayesian versions of these rules that take into account error bounds and risk.
The probabilistic version of "Do No Harm" is "Do not take excessive risk of harm".
This should work as AIs become smarter, because intelligence implies becoming better Bayesians, which implies being great at calibrating the confidence intervals of their interpretations and reasoning, and basically gaining a superhuman ability to evaluate the bounds of ambiguity and risk.
Now this doesn't mean that AIs won't be misaligned, only that it should be possible to align them. Not every AI maker will necessarily bother to align them properly, especially in adversarial, military applications.
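One way to picture the probabilistic "Do not take excessive risk of harm" rule from above is a toy sketch that acts only when an upper credible bound on the harm probability is under a threshold, rather than the point estimate. The Beta posterior with a uniform prior, the 95% level, the 5% threshold and the function names are all assumptions picked for illustration, not a proposal for how a real system should do it.

```python
import random


def posterior_mean(harmful, harmless):
    """Mean of the Beta(1 + harmful, 1 + harmless) posterior on P(harm)."""
    return (1 + harmful) / (2 + harmful + harmless)


def harm_upper_bound(harmful, harmless, level=0.95, samples=20000, seed=0):
    """Monte Carlo estimate of an upper credible bound on P(harm).

    Assumes a uniform prior updated with observed outcomes, which gives a
    Beta(1 + harmful, 1 + harmless) posterior.
    """
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(1 + harmful, 1 + harmless) for _ in range(samples))
    return draws[int(level * (samples - 1))]


def may_act(harmful, harmless, max_risk=0.05):
    """Act only if even the pessimistic end of the estimate is acceptable."""
    return harm_upper_bound(harmful, harmless) <= max_risk


if __name__ == "__main__":
    for harmful, harmless in ((0, 29), (0, 999)):
        print(
            "observed", harmful, "harmful /", harmless, "harmless:",
            "mean", round(posterior_mean(harmful, harmless), 4),
            "upper bound", round(harm_upper_bound(harmful, harmless), 4),
            "act" if may_act(harmful, harmless) else "hold off",
        )
```

With only 29 clean observations the average estimate is already under 5%, but the error bound is not, so the rule holds off. With 999 clean observations both are comfortably under the threshold.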
The US has had an unfair advantage in tech, defense, science and finance because it hosted the global hubs of the free world. This attracted eye-watering amounts of money to places like SF and NY. With the newfound isolationism, tariffs, threats etc. reducing the viability of hosting the global hubs, there are massive opportunities opening in Europe and elsewhere, especially if governments can help bootstrap these sectors with efforts like these.
AI middle managers are coming. The highest-level corporate authority can and will continue to exist as a person who makes sure the AI systems are running correctly and skims profits off the top of the AI substructure, with the lowest stratum being an underclass precariat doing hands-on tickets from an AI agent at a continuously adjusted market price for the task.
That doesn't make Hacker News European. It is American. Y Combinator is American even if pg is originally British. Stripe is American but its founders are Irish.
Yeah, I know. My response was a clarification that BenoitEssiambre was referring to the founder, not the site itself. My interpretation of the "so there's that" part of the message was an acknowledgement that Hacker News is hosted in the US, but if nothing else the founder is living in the UK.
It is a sad reality. The US has recently threatened to annex Denmark and Canada. Some of us are suddenly keenly aware that the US is in a position to take control of most of our computers and phones via software updates.
Open source is the global alternative you're looking for. There are even interesting hardware options like https://starlabs.systems/
The US has also had an unfair advantage in tech/defense and finance because it hosted the global hubs of the free world. This attracted eye-watering amounts of money to places like SF and NY. With this newfound isolationism, tariffs etc. reducing the viability of hosting the global hubs, there are massive opportunities opening in Europe and elsewhere.