OpenAI adds real‑time voice, translation and transcription to its API

TechCrunch

Key Points

  • OpenAI adds three voice‑AI models to its API: GPT‑Realtime‑2, GPT‑Realtime‑Translate, GPT‑Realtime‑Whisper.
  • GPT‑Realtime‑2 uses GPT‑5‑class reasoning for complex conversational tasks.
  • Translate model supports 70+ input languages and 13 output languages in real time.
  • Whisper provides live speech‑to‑text transcription billed by the minute.
  • Pricing: translation and transcription per minute; conversational model per token.
  • Target users include customer‑service, education, media, events and creator platforms.
  • Built‑in guardrails halt interactions that breach harmful‑content guidelines.
  • All models accessed via OpenAI’s Realtime API.

OpenAI announced Thursday that its API now supports three new voice‑focused models: GPT‑Realtime‑2, GPT‑Realtime‑Translate and GPT‑Realtime‑Whisper. The suite lets developers build applications that converse, translate and transcribe speech on the fly, with support for more than 70 input languages and 13 output languages. Billing is split between per‑minute rates for translation and transcription and token‑based pricing for the conversational model. OpenAI says the tools target customer‑service, education, media and creator platforms, and all three include guardrails to curb misuse.

OpenAI unveiled a trio of voice‑intelligence models for its API on Thursday, signaling a shift from simple call‑and‑response systems to more versatile audio interfaces. The flagship, GPT‑Realtime‑2, builds on the earlier GPT‑Realtime‑1.5 but runs on GPT‑5‑class reasoning, allowing it to handle complex user requests while maintaining a natural conversational tone.

Alongside the new conversational model, OpenAI introduced GPT‑Realtime‑Translate, a real‑time translation engine that supports over 70 source languages and can output speech in 13 target languages. The company describes the service as keeping pace with a speaker, delivering fluent, context‑aware translations as a dialogue unfolds.
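As a rough illustration of how a developer might select languages for such a session, the sketch below serializes a hypothetical session‑configuration event. The model identifier, event type and field names are assumptions for illustration; OpenAI's actual Realtime API schema is not specified in this article.

```python
import json

# Hypothetical session configuration for a real-time translation stream.
# The model name and field names below are illustrative assumptions,
# not a documented schema.
def build_translate_session(source_lang: str, target_lang: str) -> str:
    """Serialize a session-update event selecting input/output languages."""
    event = {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",  # assumed identifier
            "input_language": source_lang,       # one of 70+ supported inputs
            "output_language": target_lang,      # one of 13 speech outputs
            "modalities": ["audio", "text"],
        },
    }
    return json.dumps(event)

payload = build_translate_session("ja", "en")
```

A client would typically send a payload like this over the API's streaming connection before audio begins, so the service knows which language pair to translate between.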

The third addition, GPT‑Realtime‑Whisper, provides live speech‑to‑text conversion. Users can capture spoken words as they occur, turning audio streams into accurate transcripts without a separate post‑processing step.
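Avoiding a post‑processing step usually means folding incremental results into a transcript as they arrive. The pattern below shows that accumulation; the event names (`transcript.delta`, `transcript.done`) are assumptions for illustration and may not match the real event schema.

```python
# Sketch of consuming a live transcription stream. The event names are
# illustrative assumptions; the actual Realtime API schema may differ.
def collect_transcript(events):
    """Fold incremental transcript deltas into a final string."""
    parts = []
    for event in events:
        if event.get("type") == "transcript.delta":
            parts.append(event["text"])
        elif event.get("type") == "transcript.done":
            break
    return "".join(parts)

# Simulated stream of incremental transcription events
stream = [
    {"type": "transcript.delta", "text": "Hello, "},
    {"type": "transcript.delta", "text": "world."},
    {"type": "transcript.done"},
]
print(collect_transcript(stream))  # Hello, world.
```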

All three models are accessible through OpenAI’s Realtime API. Pricing differs by function: translation and transcription are billed by the minute, while the conversational model follows token‑based consumption. This structure gives developers flexibility in managing costs based on usage patterns.
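The split billing model can be sketched with a back‑of‑the‑envelope cost comparison. The rates below are placeholders invented for illustration; the article does not disclose OpenAI's actual prices.

```python
# Back-of-the-envelope cost model for the split billing described above.
# Both rates are PLACEHOLDERS, not OpenAI's published prices.
PER_MINUTE_RATE = 0.06      # $/min placeholder: translation, transcription
PER_TOKEN_RATE = 0.00001    # $/token placeholder: conversational model

def minute_billed_cost(minutes: float) -> float:
    """Cost of a translation or transcription session billed by duration."""
    return minutes * PER_MINUTE_RATE

def token_billed_cost(tokens: int) -> float:
    """Cost of a conversational session billed by token consumption."""
    return tokens * PER_TOKEN_RATE

# e.g. a one-hour translated call vs. a 250k-token voice-agent session
print(round(minute_billed_cost(60), 2))      # 3.6
print(round(token_billed_cost(250_000), 2))  # 2.5
```

The point of the split is that duration‑priced workloads (captioning a fixed‑length event) are predictable up front, while token pricing scales with how much the conversational model actually says and reasons.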

OpenAI highlighted several sectors that could benefit from the new capabilities. Customer‑service platforms can deploy voice agents that listen, reason and act within a single interaction. Educational tools may use real‑time translation to bridge language barriers, while media outlets and event organizers can automate captioning and multilingual coverage. Creator platforms stand to gain from seamless voice integration that enhances user engagement.

Recognizing the potential for abuse, OpenAI embedded safeguards into the models. Specific triggers pause conversations that violate the company’s harmful‑content policy, aiming to prevent spam, fraud and other malicious activities. The firm emphasized that these guardrails are part of its broader effort to ensure responsible deployment of powerful AI tools.

Industry observers note that the announcement expands OpenAI’s competitive edge in the rapidly growing voice‑AI market. By offering a single API that handles conversation, translation and transcription, the company reduces the need for developers to stitch together multiple services. The move could accelerate adoption of voice interfaces across a range of applications, from virtual assistants to real‑time multilingual support.

#OpenAI #API #VoiceAI #RealTimeTranslation #SpeechToText #GPT‑Realtime‑2 #GPT‑Realtime‑Translate #GPT‑Realtime‑Whisper #DeveloperTools #ArtificialIntelligence