legohead · 2026-05-29T23:56:25 1780098985

I guess I'll play devil's advocate here, don't shoot me.

Over the course of my career I've had to deal with multiple hacks, DDoS attacks, and even situations working with the FBI. It's a mess, and extremely frustrating and unfair to those of us who are just trying to do a good job and make a living. Those of you who are throwing stones at Microsoft's coding, how confident are you that your code is safe in this new AI age?

Obviously MS handled this poorly; even after reading this article, it's not clear how MS handles bug bounties. But that doesn’t mean this “researcher” deserves a pass.

Releasing 0-days, especially working exploit code for unpatched vulnerabilities, is extremely unethical. It has real potential to cause a lot of harm to regular engineers, and users who had nothing to do with the dispute.

nemomarx · 2026-05-30T00:06:17 1780099577

I don't think it's their fault for not making code without exploits. I do think they should try and close them in a timely fashion when the exploit is pointed out though - the longer they wait the more chance bad actors find it in addition to the security researchers. Ultimately they need to cooperate here for users to be safe.

rileymat2 · 2026-05-30T03:17:50 1780111070

> I do think they should try and close them in a timely fashion when the exploit is pointed out though - the longer they wait the more chance bad actors find it in addition to the security researchers.

You are assuming it is not already being actively exploited and there will be a timely response to fix it, which is why we have these ticking clocks.

thewebguyd · 2026-05-30T02:20:00 1780107600

They should also be fully transparent and not silently patch, and only issue a CVE weeks later after being called out like they did with RedSun, from this same researcher.

That Microsoft releases vulnerable software isn't the issue (that's a known quantity at this point), it's their lack of transparency and refusal to hold themselves accountable.

cdud3 · 2026-05-31T07:56:40 1780214200

Moving it from "only a small group of people/companies exploit it" to the public is how you get it fixed. In this case it seems that was the only way left after Microsoft refused to cooperate. What counts are the results: these are fixed now.

legohead · 2026-05-12T17:16:51 1778606211

the new bottleneck for development at work is code reviews. devs are creating whole features that would take months in only a couple weeks, but code reviewing that is a slow, painful process

Marsymars · 2026-05-12T17:26:16 1778606776

The bottleneck at my work for development was already code review before LLMs.

asdfman123 · 2026-05-12T17:34:44 1778607284

This is why I'm not that excited about vibe coding. The bottleneck has always been understanding what the heck is going on.

In my view you should use AI as a tool to 1) help you learn and 2) write boilerplate you could have easily written yourself. Getting it to think for you is counterproductive (at least until it replaces us entirely).

legohead · 2026-05-04T20:52:00 1777927920

The low latency is more of a pain point than a good thing, the way they have it implemented. Trying to have a casual conversation with it, as humans we naturally pause, and GPT will take this as you are "done" and start blabbing away.

I also suffer from finding the appropriate word I want as I've gotten older and slower, and this fast-voice-gpt just ends up frustrating me more than helping. I have to sit there and think out the whole sentence in my head before I say anything -- not very natural.

zamadatix · 2026-05-04T21:04:44 1777928684

I think these are 2 different layers of "latency". The latency in the article is referring to the transport of the audio stream itself while the latency in your scenario is about how quickly to start responding inside the audio stream.

hun3 · 2026-05-04T23:53:00 1777938780

They are orthogonal.

Suppose you have 100ms audio latency and no wait time. Then a natural pause will trigger a response immediately, but you won't notice it has started until after ~200ms (round-trip time). Twice as annoying.
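A rough sketch of the arithmetic above, assuming "audio latency" means one-way network delay and ignoring model compute time. The function name and the `silence_wait_ms` knob are illustrative, not anything from the article or a real API.

```python
def ms_until_user_hears_reply(one_way_latency_ms: int, silence_wait_ms: int = 0) -> int:
    """Time from the start of the user's pause until reply audio reaches them.

    Simplified model (an assumption): the pause travels to the server,
    the server waits `silence_wait_ms` of silence before responding,
    and the reply audio travels back over the same link.
    """
    uplink = one_way_latency_ms    # the pause reaches the server
    downlink = one_way_latency_ms  # the reply reaches the user
    return uplink + silence_wait_ms + downlink


# 100 ms each way, no wait: the reply has been triggered by the pause,
# but the user only hears it a full round trip later.
print(ms_until_user_hears_reply(100))       # 200
# The silence wait adds on top of the round trip; it is a separate knob.
print(ms_until_user_hears_reply(100, 500))  # 700
```

In this model the two terms are independent: lowering the network delay shortens the round trip, while the silence wait alone decides how long a pause has to be before the server treats it as the end of a turn.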

ericmcer · 2026-05-04T23:06:20 1777935980

I think he’s saying they are doing an insane level of complexity to shave ~100ms off response times in a scenario where that isn’t important and might even be a negative

zamadatix · 2026-05-04T23:27:13 1777937233

When GP mentioned reducing conversational latency as a negative that made sense (and should probably be done IMO), it just wasn't the same category of latency the article talks about reducing. I.e. increasing "network latency" just makes the conversation feel more and more out of sync, it doesn't change the rate at which the AI will interrupt ("turn latency") because the latter is based on the duration of the pause in the audio stream, not the duration it took to deliver that audio stream.

If you meant there is a case where reducing the network latency at the same delivery reliability for a given audio stream is actually a negative then I'd love to hear more about it as I'm a network guy always in search of an excuse for latency :D.

svnt · 2026-05-05T16:13:24 1777997604

> If you meant there is a case where reducing the network latency […] is actually a negative then I'd love to hear more about it

That is exactly what parent did:

> they are doing an insane level of complexity to shave off ~100 ms

The downside is everything they had to do to achieve it, and maintaining that capability going forward, when the product can tolerate much more. It is the definition of premature optimization.

It just maybe isn’t at a level where it is relevant in your argument/decision space in IT.

theptip · 2026-05-05T03:58:23 1777953503

But you want to be able to interject “hold on…” and have it immediately stop talking, when it goes off the rails.

And GP is correctly pointing out that the only negative here (silence waiting latency maybe being too low) is tunable separately from the network latency number.

ButlerianJihad · 2026-05-05T04:04:04 1777953844

I want to be able to click the "Stop" button on my earphones remote. I want to be able to interject "woah" or "stop!" or "wait!" or that it would detect that I've inhaled a breath, or that my eyes glazed over. I want the LLM to figure out that every speed setting for its voice output is in "auctioneer" territory rather than "lecturing university professor with tenure and a pension" pacing.

But we won't get any of that, because the prime directive of LLMs is to burn tokens like there's no tomorrow. Burn tokens on a naïve answer without asking clarifying questions. Burn tokens on writing, debugging, and running a Python script or accessing and parsing 10 websites without asking for consent. Burn tokens on half-baked images with misspellings and 31 fingers. Burn tokens arguing "how many 'r's in strawberry?". Burn tokens asking a followup question at the end of every single answer, begging the user to re-engage and burn more tokens.

There is a little red "Stop" control when text output is being produced, at least, but does "Stop" halt everything and throw away the context? Re-prompt from the beginning?

The "maximize tokens burnt" prime directive is not to be found in any system prompt or user documentation. It is seemingly a common feature of the training for any consumer model.

Currently, if I'm using voice for an LLM, I use the voice dictation in the keyboard feature, because then the response is in text. There is no way to prevent "responding in kind" if I query the thing with audio. Or in Swahili.

spongebobstoes · 2026-05-05T09:59:13 1777975153

newer models tend to use fewer thinking tokens to solve the same problems, which is a strong counterexample to your entire comment

ericmcer · 2026-05-05T16:45:46 1777999546

you actually don’t want it to immediately stop because people say things like “hm” “yeh” during machine output. Maybe you say “no” to someone next to you and don’t want to interrupt output.

To confidently interrupt I would want to assert that the user has been speaking for > N time. You could do other things like parse a streaming transcription for keywords but generally it feels like bad UX to me to cut output the second input is detected. Letting the user talk for 1-2s gives a much stronger signal and it isn’t too weird for someone to keep talking for 1.5s after you start.
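A minimal sketch of the "user has been speaking for > N time" check, assuming some VAD upstream yields one speech/non-speech flag per fixed-size audio frame. The class name, the 20 ms frame size, and the 1.5 s default are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class BargeInGate:
    """Interrupt output only after sustained user speech (hypothetical helper)."""

    min_speech_ms: int = 1500  # assumed threshold, within the 1-2 s range above
    frame_ms: int = 20         # assumed VAD frame duration
    _speech_run_ms: int = 0    # length of the current unbroken speech run

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD frame; return True once output should be interrupted."""
        if is_speech:
            self._speech_run_ms += self.frame_ms
        else:
            # Simplification: any silent frame resets the run, so a short
            # "hm" or "yeh" never accumulates enough speech to interrupt.
            self._speech_run_ms = 0
        return self._speech_run_ms >= self.min_speech_ms


gate = BargeInGate()
short_hm = [gate.push(True) for _ in range(15)]   # 300 ms of "hm"
gate.push(False)                                  # user goes quiet again
long_talk = [gate.push(True) for _ in range(75)]  # 1500 ms of real speech
print(any(short_hm), long_talk[-1])               # False True
```

A real implementation would probably tolerate brief gaps inside a speech run instead of resetting on a single silent frame; that is left out here to keep the sketch short.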

janalsncm · 2026-05-04T21:29:38 1777930178

I’ve also experienced this and it’s really annoying. There is this pressure to keep talking if I’m not done with my thought that feels pretty unnatural at least for me. If I’m searching for the right word, I want the opportunity to find it.

I think the solution is to handle pauses more intelligently rather than having a higher latency protocol. With low latency you can interrupt and the bot can immediately stop rambling.

wnmurphy · 2026-05-04T23:18:18 1777936698

100%. I have to hold the floor by filling the space with "ummmmmmmm.... uhhhh...." which inevitably distracts me from my point altogether. Poor user experience.

Gracana · 2026-05-05T01:35:41 1777944941

Seems like there's a big risk of having that habit leak into human conversation. A lot of people try really hard to train themselves not to add those fillers.

taneq · 2026-05-05T01:21:09 1777944069

I find this is a problem even with human conversations. Some people just aren’t very good at telegraphing when they’ve finished ‘their turn’ talking. Or worse yet, aren’t willing to take turns in the first place.

discordance · 2026-05-05T00:12:10 1777939930

Have you tried telling it to pause to let you think?

I often use it while I’m walking and tell it to not respond until I initiate a conversation.

pottertheotter · 2026-05-05T00:40:19 1777941619

I’ve tried this and it says it will but just keeps cutting in. I hate this feature so much.

650REDHAIR · 2026-05-05T04:27:44 1777955264

If anyone has an alternative I’m all ears.

This would be a killer feature for me and something I’ve tried to use on cross-country road trips.

notrealyme123 · 2026-05-05T06:58:31 1777964311

I know it's not the perfect solution for you, but I use a voice recorder and send the LLM the transcript. And my god is it working great.

Usually I just explain the things I want it to do. The longest was 30 minutes of rambling, explaining the methods section of a paper in non-chronological order. It worked unbelievably well for me.

momotheritz · 2026-05-05T12:12:23 1777983143

If you're setting this up yourself instead of using a lab's built-in speech functionality, you can run a small LLM in parallel, either a local model or a small hosted model like Haiku, that acts as a gate for either doing TTS on the response or not. Its only job is to decide whether the transcription it receives is of someone who is done talking or who is likely still mid-thought or mid-sentence.
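A hedged sketch of how such a gate might be wired up. The classifier is a pluggable callable; `heuristic_turn_complete` is only a toy stand-in so the example runs, and in the setup described above it would be replaced by a call to the small model. Every name here is hypothetical.

```python
from typing import Callable, Optional

# Toy list of words that suggest the speaker is mid-sentence (an assumption).
TRAILING_FILLERS = {"and", "but", "so", "because", "or", "um", "uh", "the", "a", "to"}


def heuristic_turn_complete(transcript: str) -> bool:
    """Stand-in for the small-LLM call: guess whether the speaker is done."""
    text = transcript.strip()
    if not text or text.endswith(("...", "…", ",")):
        return False  # empty or trailing off
    words = text.lower().rstrip(".?! ").split()
    return bool(words) and words[-1] not in TRAILING_FILLERS


def gate_tts(
    transcript: str,
    reply: str,
    is_turn_complete: Callable[[str], bool] = heuristic_turn_complete,
) -> Optional[str]:
    """Return the reply to send to TTS, or None to stay silent and keep listening."""
    return reply if is_turn_complete(transcript) else None


print(gate_tts("Can you summarize the paper?", "Sure, here it is."))  # Sure, here it is.
print(gate_tts("So what I want is, um", "Sure, here it is."))         # None
```

Keeping the classifier behind a plain callable means the toy heuristic can be swapped for a model call without touching the code that decides whether to speak.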

throwuxiytayq · 2026-05-04T21:25:26 1777929926

With higher latency this would be even more of an issue. When you pause and start talking again, the model wouldn't catch that until it has already interrupted you.

The actual implementation is at fault. I had some luck with instructing the model to only respond with "Mhm" until I've explicitly finished my thought and asked it a question. Makes this much less of an issue.

But I've decided that their voice mode is completely unusable for a different reason: the model feels incredibly dumb to interact with, keeps repeating and re-phrasing what I said, ends every single answer with a "hook" making the entire interaction idiotically robotic, completely ignores instructions when you ask it to stop that, and - most importantly - doesn't feel helpful for brainstorming. I was completely surprised how bad it is in practice; this should be their killer app but the model feels incredibly badly tuned.

dtran · 2026-05-04T21:58:25 1777931905

This has more to do with Voice Activity Detection (VAD) than the latency described in the article

lxgr · 2026-05-04T23:08:14 1777936094

That seems to be the issue: VAD is insufficient here.

Knowing when to respond requires semantic understanding, which probably only the model itself is capable of.

Maybe it’s hard for them to train it to only respond once it seems appropriate to do so?

Sean-Der · 2026-05-04T23:35:24 1777937724

I am excited for VAD to go away. PersonaPlex totally seems like the future.

However, for things like a 'call center helpline', turn-based actually seems better! You don't want to be interrupted when giving information back and forth (I think?)

wnmurphy · 2026-05-04T23:18:52 1777936732

Exactly. It's a tangent, but clearly a pain point for enough users.

ehnto · 2026-05-05T06:06:50 1777961210

There's a really interesting project in Japanese natural language processing called J-Moshi that had a novel approach and in my opinion good results.

They tried to make it mimic the way Japanese is full of really quick acknowledgement sounds and it seems to allow it to handle those pauses and interruptions really well.

https://en.nagoya-u.ac.jp/news/articles/say-hello-to-j-moshi... (english)

https://nu-dialogue.github.io/j-moshi/ (japanese and english)

I must admit it's a bit weird when LLMs laugh. I don't really know how I feel about that, but it seems to laugh at the right times. Very tangential, but cockatoos have been known to mimic the right time to laugh, presumably based on tonal cues that a joke was just made (I have experienced this first hand with rescue birds who live amongst humans).

saturdaysaint · 2026-05-04T21:24:59 1777929899

In voice conversations I tell it not to reply at all or only say “Understood” until I use some kind of code word. Not perfect, but less intrusive.

jdironman · 2026-05-04T22:40:40 1777934440

Roger that, over.

futureshock · 2026-05-05T07:55:31 1777967731

Reducing the network latency helps with this exactly. OpenAI can make better timed decisions when to begin responding so it'll feel less like an interruption. I've also seen some research on full duplex voice models that handle interruption more like an organic conversation and low latency will help there as well

richardw · 2026-05-04T21:05:29 1777928729

Hard problem. I find myself adding in filler to stop the thing from jabbering.

I also think it spends most of its iq on sounding good rather than thinking about the problem. “Yeah absolutely I can see why you’d like to…” etc. This is likely because it’s on a timer and maybe voice is more expensive to process? Text responses spend more time on the task.

lxgr · 2026-05-04T23:10:26 1777936226

Their voice capable model is several generations behind the state of the art text-only one, as far as I know.

I don’t think it even has reasoning tokens, so it’s no surprise that it’s at most as smart as the “instant” models (i.e., not very).

asdfman123 · 2026-05-04T21:55:22 1777931722

Fwiw you can prompt it to respond differently to you.

jameshush · 2026-05-05T01:10:05 1777943405

This is more of a VAD/turn detection issue. It's gotten a lot better over the last few years, but it's a hard problem. The extra ~100ms of latency makes a huge difference otherwise, especially when you have use cases that require tool calling that can easily add 500ms+ of latency.

angry_octet · 2026-05-05T02:43:11 1777948991

It seems that tool calling shouldn't be 500ms of latency?

hobofan · 2026-05-05T07:55:39 1777967739

If you have tool calling complex enough that it necessitates a higher reasoning level, and you would otherwise have reasoning set to "none", this can easily come out to 500ms.

miki_oomiri · 2026-05-05T02:38:53 1777948733

People are migrating to the "End Of Thought" triggers. Deepgram does that wonderfully.

christophilus · 2026-05-05T13:12:44 1777986764

Agreed. It’s stressful. I think they need to have an option to adopt a suffix, so they don’t start babbling until there is an “over” followed by a pause like in the old army walkie talkie days.

wnmurphy · 2026-05-04T23:17:10 1777936630

Strongly agree, some of us like to choose our words more carefully when interacting with an LLM.

I've tried to convey this to OpenAI through various available channels (dev forums, app feedback, etc.).

Grok solves this by having an optional push-to-talk mode, but this is not hands-free and thus more cumbersome than just having a user-configurable variable like seconds_delay_before_sending_voice_input.

ericmcer · 2026-05-04T23:05:15 1777935915

yeh exactly, you cannot get a strong signal that a user is done speaking without some amount of “wait for 500ms of silence”. You could kick off processing and abandon it if they continued talking, but that seems over-optimized.

1-2s replies feel natural and like you pointed out pausing for 2-3s mid sentence is super normal.
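For illustration, a minimal sketch of the "wait for 500ms of silence" rule, assuming a VAD that emits one boolean per 20 ms frame. The function name and defaults are assumptions for this example, not any vendor's API.

```python
from typing import Iterable, Optional


def find_end_of_turn(
    vad_frames: Iterable[bool], frame_ms: int = 20, silence_ms: int = 500
) -> Optional[int]:
    """Return the frame index at which the turn is declared over, else None.

    The turn ends once `silence_ms` of continuous silence follows some speech.
    """
    needed = -(-silence_ms // frame_ms)  # ceiling division: frames of silence required
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(vad_frames):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech restarts the silence clock
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


# 200 ms of speech, a 200 ms mid-sentence pause, 200 ms more speech, then silence.
frames = [True] * 10 + [False] * 10 + [True] * 10 + [False] * 30
print(find_end_of_turn(frames))                  # 54: the short pause is ignored
print(find_end_of_turn(frames, silence_ms=200))  # 19: fires during the mid-sentence pause
```

The second call shows the trade-off discussed in this thread: a shorter silence threshold responds sooner but cuts in on ordinary mid-sentence pauses.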

charcircuit · 2026-05-05T01:08:42 1777943322

The AI should be able to model a probability for when is a natural moment to start talking.

MagicMoonlight · 2026-05-04T21:13:11 1777929191

It’s possible to change the amount of time it waits if you’re using the API

legohead · 2026-04-16T18:55:36 1776365736

Just happened to me and I was really confused. First time I've seen any malware callouts so it had me worried for a minute.

> This file is clearly not malware

Yeah, it's all my code, that you've seen before...

legohead · 2026-04-14T20:48:22 1776199702

waste of time and resources. you aren't going to win a fight against 3d printers. might as well outlaw the printers completely.

legohead · 2026-04-08T04:40:30 1775623230

adapt or die

waiting on the govt to do something is a path of failure

Frieren · 2026-04-08T14:34:42 1775658882

> waiting on the govt to do something is a path of failure

To keep the government accountable is a duty of every citizen and the only way to have a functioning society. The failure is to let the government be arbitrary and cater to the powerful instead of following the rule of law and applying it equally at all levels.

legohead · 2026-04-09T17:14:31 1775754871

we have failed a long time ago in that regard

legohead · 2026-04-06T19:26:42 1775503602

only a couple steps away from Idiocracy

1whizkid1 · 2026-04-06T20:03:09 1775505789

what does this mean lol

legohead · 2026-02-03T23:32:30 1770161550

what thermometer would you use to measure the temperature of space?

iancmceachern · 2026-02-04T06:16:03 1770185763

Thermocouples

legohead · 2026-01-28T17:20:28 1769620828

Wasn't 2048 first posted on HN before it got big?

morserer · 2026-01-28T18:29:44 1769624984

https://news.ycombinator.com/item?id=7373566

:)

dang · 2026-01-28T18:31:45 1769625105

Thanks! Massive list of related threads as of Oct 2024: https://news.ycombinator.com/item?id=41938249

legohead · 2026-01-21T19:39:58 1769024398

Battery tech

KellyCriterion · 2026-01-21T19:48:47 1769024927

Blockchain & DeFi!

Bingo!

:-D