Thinking Machines Lab unveils full‑duplex AI voice model with sub‑second replies

Key Points
- Thinking Machines Lab introduced TML‑Interaction‑Small, a full‑duplex AI voice model.
- The model generates replies in about 0.40 seconds, approaching natural conversation speed.
- Full‑duplex design lets the system listen and speak simultaneously, reducing pauses.
- A research preview with limited access is planned for the coming months.
- A broader release is expected later this year; pricing and platform details are still pending.
- The company claims the speed outpaces comparable models from OpenAI and Google.
- Success depends on balancing rapid response with accurate timing to avoid interruptions.
- Potential applications include smartphones, smart speakers, and car infotainment systems.
Thinking Machines Lab, the startup founded by former OpenAI CTO Mira Murati, announced a full‑duplex interaction model that can listen and speak simultaneously. The TML‑Interaction‑Small model generates responses in about 0.40 seconds, a speed the company says approaches natural human conversation. The technology is currently in a research preview phase, with limited access slated for the coming months and a wider release planned later this year. If the model delivers on its promise, AI voice assistants could become noticeably more fluid and less prone to awkward pauses.
The company, which Murati launched last year after leaving OpenAI, rolled out its first full‑duplex interaction model on Tuesday. Named TML‑Interaction‑Small, the system processes incoming speech while it crafts a reply, cutting response time to roughly 0.40 seconds. That latency, the company claims, puts the model near the pace of ordinary human back‑and‑forth.
Full‑duplex capability marks a shift from the traditional push‑to‑talk style of most AI assistants, which wait for a speaker to finish before generating an answer. By overlapping listening and speaking, the new model aims to reduce the pauses that make voice assistants feel artificial. In a demonstration, the system answered follow‑up questions without the usual half‑second lag, giving the impression of a more natural dialogue.
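To make the contrast concrete, the pattern can be sketched in a few lines of Python: a full‑duplex loop runs its listening and speaking tasks concurrently over shared context instead of taking strict turns. This is purely illustrative; the function names, timings, and simulated speech fragments below are invented for the example and do not describe how TML‑Interaction‑Small is actually built.

```python
import asyncio

# Illustrative sketch only: a full-duplex loop overlaps listening and speaking
# rather than waiting for the user to finish. All names are hypothetical.

async def listen(context: list[str]) -> None:
    """Continuously append incoming speech fragments to shared context."""
    for fragment in ["what's the", "weather", "in Paris", "tomorrow?"]:
        await asyncio.sleep(0.15)          # simulated audio frames arriving
        context.append(fragment)
        print(f"[heard] {fragment}")

async def speak(context: list[str]) -> None:
    """Start replying as soon as enough context has accumulated."""
    while len(context) < 3:                # don't wait for the full utterance
        await asyncio.sleep(0.05)
    print("[reply] Checking the Paris forecast now...")

async def main() -> None:
    context: list[str] = []
    # A half-duplex assistant would await listen() fully, then call speak().
    # Full-duplex overlaps both, trimming the dead air between turns.
    await asyncio.gather(listen(context), speak(context))

asyncio.run(main())
```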
Murati’s team says the speed advantage also eclipses comparable offerings from major AI labs such as OpenAI and Google, though the claim rests on internal benchmarks. The company has not released pricing, platform support, or performance data outside controlled tests. Those details remain pending as the technology moves from research preview to broader availability.
Access to TML‑Interaction‑Small will be limited at first. The firm plans a research preview in the next few months, followed by a wider rollout later this year. Interested developers and enterprises can apply for early access, but the company has not disclosed selection criteria. The preview will let participants evaluate whether the sub‑second responses translate into smoother user experiences in real‑world applications.
Industry observers note that faster turn‑taking alone does not guarantee a better conversation. An assistant that speaks too early may interrupt the user or misinterpret partial input, creating new friction points. The challenge, Murati acknowledges, is to balance speed with timing precision, ensuring the AI intervenes only when it has enough context to respond accurately.
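That tradeoff can be pictured with a toy gating check, again a hypothetical sketch rather than anything Thinking Machines Lab has described: the assistant replies only when both the observed silence and its confidence that the user's turn is over clear a threshold, and tuning those thresholds trades latency against interruptions.

```python
# Hypothetical illustration of the speed-vs-timing tradeoff; the thresholds
# and scoring are invented for the example, not TML's actual method.

def should_respond(silence_ms: float, end_of_turn_prob: float,
                   min_silence_ms: float = 200.0,
                   min_confidence: float = 0.8) -> bool:
    """Reply only when both silence and end-of-turn confidence clear thresholds.

    silence_ms        -- how long the user has been quiet
    end_of_turn_prob  -- the model's estimate that the utterance is complete
    """
    return silence_ms >= min_silence_ms and end_of_turn_prob >= min_confidence

# Lower thresholds shave latency but risk interrupting; higher ones add lag.
print(should_respond(silence_ms=250, end_of_turn_prob=0.90))  # True: safe to reply
print(should_respond(silence_ms=120, end_of_turn_prob=0.95))  # False: too soon
```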
If the model performs as advertised, the impact could be felt across smartphones, smart speakers, and automotive infotainment systems. Users who rely on voice assistants for quick clarifications or hands‑free tasks would experience fewer awkward silences. For developers, the technology opens the door to more dynamic voice‑first applications that feel conversational rather than scripted.
While the announcement generated excitement, the practical test will come when the preview participants put the model through everyday scenarios. The AI community will watch closely to see whether the 0.40‑second benchmark holds up under diverse accents, background noise, and complex queries. Until then, Thinking Machines Lab’s full‑duplex breakthrough remains a promising glimpse into a future where talking to machines feels as natural as talking to another person.