OpenAI Launches Three Real-Time Audio Models with Reasoning, Translation, and Transcription Capabilities
OpenAI Unveils GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper
OpenAI today released three new audio models through its Realtime API, marking a major leap in live voice applications. All three are available immediately: GPT-Realtime-2 for voice agents with reasoning, GPT-Realtime-Translate for live speech translation, and GPT-Realtime-Whisper for streaming transcription. The Realtime API also exits beta and is now generally available for production use.

“These models push voice applications beyond simple Q&A loops,” said an OpenAI spokesperson. “They listen, reason, translate, transcribe, and act within a single conversation.” Developers can test all three in the Playground.
GPT-Realtime-2: First Voice Model with GPT-5-Class Reasoning
GPT-Realtime-2 is the flagship release, described as OpenAI’s first voice model with GPT-5-class reasoning. It handles complex requests, manages interruptions, and maintains natural conversation flow. The context window expands from 32K to 128K tokens, enabling longer, context-rich interactions.
“Previous voice models stalled on multi-step requests or lost context in long sessions,” the spokesperson noted. “GPT-Realtime-2 keeps the conversation moving while reasoning through a request.” Developers can add short preamble phrases like “let me check that” to signal processing, avoiding awkward silence.
The model also supports tool calling and narrates actions in real time. Adjustable reasoning effort (minimal, low, medium, high, xhigh) lets teams tune the trade-off between speed and accuracy. Tone control adapts the speaking style (calm, empathetic, or upbeat) to the scenario. On Big Bench Audio, GPT-Realtime-2 with high reasoning scored 96.6%, up from 81.4% for GPT-Realtime-1.
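OpenAI has not published sample code for the new model, but the flow described above maps onto the Realtime API's existing session.update event. The Python sketch below shows how an agent session might be configured with a spoken preamble, a reasoning-effort setting, and a single tool. The model name comes from this announcement, and the "reasoning", "tone", and "lookup_order" fields are illustrative assumptions, not documented parameters.

```python
import asyncio
import json
import os

import websockets  # pip install websockets (use extra_headers on versions < 14)


async def configure_agent_session():
    # Model name taken from this announcement; not a currently documented model ID.
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with websockets.connect(url, additional_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                # Preamble phrasing keeps the line from going silent while the
                # model reasons through a multi-step request.
                "instructions": (
                    "You are a support agent. Before long lookups, say a short "
                    "preamble such as 'let me check that', then continue."
                ),
                "reasoning": {"effort": "high"},  # hypothetical field: minimal|low|medium|high|xhigh
                "tone": "empathetic",             # hypothetical tone-control field
                "tools": [{
                    "type": "function",
                    "name": "lookup_order",       # example tool the agent can call and narrate
                    "description": "Look up an order by its ID.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                }],
            },
        }))
        ack = json.loads(await ws.recv())
        print(ack["type"])  # expect a session.updated (or error) event


asyncio.run(configure_agent_session())
```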
GPT-Realtime-Translate: Live Speech Translation Across 100+ Languages
GPT-Realtime-Translate enables simultaneous speech translation for conversational scenarios. It supports over 100 languages and processes speech-to-speech in near real time, preserving tone and intention.
“This model breaks language barriers in live conversations,” an industry analyst commented. “It’s designed for customer support, international meetings, and tourism.” The model handles code-switching and idiomatic expressions, though OpenAI advises auditing for specialized domains.
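A minimal sketch of what a live translation session could look like over the same WebSocket interface follows. The model name is taken from this announcement, plain instructions stand in for any dedicated translation parameter, and the output-audio event name is assumed from the Realtime API's current event naming.

```python
import asyncio
import base64
import json
import os

import websockets  # pip install websockets


async def translate_speech(pcm_chunks):
    """Send caller-supplied 16-bit PCM chunks and collect translated audio."""
    # Model name taken from this announcement; not a currently documented model ID.
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-translate"
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with websockets.connect(url, additional_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                # Plain instructions stand in for any dedicated translation setting.
                "instructions": "Translate everything the speaker says into Spanish, "
                                "preserving tone and intent.",
            },
        }))
        for chunk in pcm_chunks:  # stream microphone audio into the input buffer
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(chunk).decode(),
            }))
        translated = bytearray()
        async for raw in ws:  # server-side turn detection is assumed to trigger replies
            event = json.loads(raw)
            if event["type"].endswith("audio.delta"):  # assumed output-audio event name
                translated.extend(base64.b64decode(event["delta"]))
            elif event["type"] == "response.done":
                break
        return bytes(translated)
```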

GPT-Realtime-Whisper: Streaming Transcription for Low-Latency Applications
GPT-Realtime-Whisper focuses on streaming transcription, delivering text in near real time. It is optimized for low latency, making it suitable for live captioning, meeting notes, and voice-controlled interfaces.
“Transcription is foundational for voice apps,” the spokesperson said. “This model processes audio as it arrives, minimizing delay.” It leverages OpenAI’s Whisper architecture with streaming improvements.
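For live captioning, the same append-audio pattern applies, with text coming back instead of speech. In this sketch the model name follows the announcement, and the transcription-delta event name is assumed from the Realtime API's existing input-audio transcription events.

```python
import asyncio
import base64
import json
import os

import websockets  # pip install websockets


async def live_captions(audio_chunks):
    """Stream 16-bit PCM chunks and print caption text as it arrives."""
    # Model name taken from this announcement; not a currently documented model ID.
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-whisper"
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with websockets.connect(url, additional_headers=headers) as ws:

        async def send_audio():
            # Push small chunks as they are captured so latency stays low.
            for chunk in audio_chunks:
                await ws.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": base64.b64encode(chunk).decode(),
                }))

        async def print_captions():
            # Runs until the server closes the connection.
            async for raw in ws:
                event = json.loads(raw)
                # Assumed event name, modeled on the Realtime API's existing
                # transcription deltas.
                if event["type"].endswith("transcription.delta"):
                    print(event.get("delta", ""), end="", flush=True)

        await asyncio.gather(send_audio(), print_captions())
```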
Background
OpenAI’s Realtime API launched in beta in late 2024, offering developers early access to voice capabilities. The API enables building voice agents, translation tools, and transcription services. Today’s release marks the first major upgrade since the beta, with three specialized models replacing earlier generic offerings.
The company has been investing heavily in voice AI, competing with Google’s Chirp and Amazon’s Alexa Foundation models. This release signals OpenAI’s commitment to production-grade voice solutions.
What This Means
For developers, the general availability of the Realtime API and the new models means they can now build production systems without beta uncertainties. GPT-Realtime-2’s reasoning and tone control could revolutionize customer service and virtual assistants. GPT-Realtime-Translate opens up live international communication, while GPT-Realtime-Whisper improves real-time accessibility.
“These models bridge the gap between experimental voice AI and enterprise-ready tools,” the analyst noted. “Expect rapid adoption in healthcare, finance, and customer support.” Businesses should evaluate the adjustable reasoning effort to balance speed and accuracy.