dr_kiszonka · 2026-06-12T19:50:56 1781293856

Maybe it only occurs in certain browsers? It does in my Chrome for Android [...]

dr_kiszonka · 2026-06-11T20:06:54 1781208414

Qwen is a really good name.

dr_kiszonka · 2026-06-10T22:03:31 1781129011

Maybe with very fast models you could request animation frames, e.g., frame 1) right foot at 12, left foot at 6; frame 2) right foot at 3, left foot at 9, etc.?

And instead of reporting tps, you would - of course! - report pfps (pelican frames per second).

dr_kiszonka · 2026-06-05T17:10:27 1780679427

Does Mouseless support multiple monitors?

I have been trying out similar software for a few years but haven't seen one that would let me "click" outside the main monitor on Windows.

dr_kiszonka · 2026-06-03T08:41:43 1780476103

When would a larger model have a smaller inference footprint? If the larger was MoE and the smaller was dense?

Centigonal · 2026-06-03T16:10:04 1780503004

Yes, MoE reduces the inference compute requirements (inference memory reqs remain the same).

rajveerb · 2026-06-04T03:13:37 1780542817

As someone who has spent quite a lot of time on inference, I would add a small note:

Deployment looks very different for MoE than for dense models, so I would say that it is more nuanced than "inference memory reqs remain the same". Memory requirements can be very different for MoE models.
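
To make the compute-versus-memory distinction in this exchange concrete, here is a rough back-of-the-envelope sketch. All parameter counts are made up for illustration, and the `ModelSpec` class and helper functions are hypothetical. It only counts weights at 2 bytes per parameter and uses the common approximation of about 2 FLOPs per active parameter per token; it ignores KV cache, activations, and expert parallelism or offloading, which is where the deployment nuance comes in.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    """Toy description of a model; all sizes are in billions of parameters."""
    name: str
    shared_params: float      # used by every token (attention, embeddings, ...)
    expert_params: float      # size of a single expert (0 for a dense model)
    num_experts: int          # experts stored in the model
    experts_per_token: int    # experts the router activates per token

    @property
    def total_params(self) -> float:
        # Everything that has to be stored somewhere to serve the model.
        return self.shared_params + self.expert_params * self.num_experts

    @property
    def active_params(self) -> float:
        # What a single token actually passes through.
        return self.shared_params + self.expert_params * self.experts_per_token


def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    # 1e9 params * bytes per param / 1e9 bytes per GB (weights only).
    return params_billion * bytes_per_param


def flops_per_token(params_billion: float) -> float:
    # Rough rule of thumb: ~2 FLOPs per active parameter per generated token.
    return 2.0 * params_billion * 1e9


# Hypothetical models: a small dense one and a larger MoE one.
dense = ModelSpec("dense-8B", shared_params=8.0, expert_params=0.0,
                  num_experts=0, experts_per_token=0)
moe = ModelSpec("moe-34B", shared_params=2.0, expert_params=1.0,
                num_experts=32, experts_per_token=2)

for m in (dense, moe):
    print(f"{m.name}: total={m.total_params}B, active={m.active_params}B, "
          f"weights={weight_memory_gb(m.total_params)} GB, "
          f"flops/token={flops_per_token(m.active_params):.1e}")
```

Under these assumed numbers, the larger MoE model does less compute per token than the smaller dense one (4B active versus 8B) but needs far more memory just to hold its weights (68 GB versus 16 GB).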

dr_kiszonka · 2026-06-02T15:40:29 1780414829

This is great! I haven't heard of gmailctl before.

dr_kiszonka · 2026-05-31T20:54:54 1780260894

> than a structured collection of md files.

I think it depends on what a memory system includes. Those that automatically inject relevant information into context are in my experience better than just md docs because agents often ignore, forget to read, or don't read md files in full.

dr_kiszonka · 2026-05-31T20:45:48 1780260348

It isn't ideal, but I am starting to write code (with AI tab completions) while waiting for LLMs. The tab completions are sometimes overeager and I wish I had more control over them, but at least I am not staring at "Thinking" all day. Having said that, sometimes you have to monitor the AI because some tools, e.g., AGY CLI, often go off the rails completely, including writing code outside of the "sandbox."

nyxtom · 2026-06-01T02:26:50 1780280810

This right here ^ x 1000s. I can’t get into flow state with AI.

dr_kiszonka · 2026-05-29T17:15:44 1780074944

Thank you for explaining. Do you think there are still opportunities for stack optimizations to meaningfully speed up inference on single consumer-grade GPUs?

gaeld · 2026-05-29T18:13:32 1780078412

I'm sure there are, and I really hope we can work on consumer-grade GPUs at some point.

It should be possible to apply the same methodology (digging deep into the hardware details to understand all its little characteristics, and rethinking the inference stack around that).

dr_kiszonka · 2026-05-19T15:26:29 1779204389

For Codex, that is ChatGPT? https://openai.com/index/work-with-codex-from-anywhere/

Or do you want it to speak to you too? I think this would have to be TTS on your phone. You can have ChatGPT speak to you but I don't see that feature in Codex.

CSMastermind · 2026-05-19T16:11:22 1779207082

Sure I speak to ChatGPT all the time and I've used the feature you've linked but it can't do the things I described. It won't be like, "hey let me go look into that" and then come back in 3 minutes with an answer. It's essentially a dictation feature.

dr_kiszonka · 2026-05-19T22:23:02 1779229382

I am lost. Codex can't look up stuff for you in your codebase? GitHub Copilot can't look up PRs for you?