OpenAI rolls out real-time voice models that can reason, translate & multitask

OpenAI introduced three new audio models for its developer platform, GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper, designed to make voice-based software agents more conversational and capable of completing tasks in real time.

The new application programming interface (API) models move the ChatGPT maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.

The new models are now available for testing in OpenAI’s developer playground.

GPT-Realtime-2 is built to handle harder requests, call tools, manage interruptions, and maintain context during longer voice interactions.

GPT-Realtime-Translate supports translation from more than 70 languages into 13 output languages, targeting customer support, education and other settings.

GPT-Realtime-Whisper delivers live speech-to-text capabilities, allowing captions, meeting notes, and workflow updates to be generated in real time as a person speaks.

Early customers testing the models include Zillow, Priceline, and Deutsche Telekom.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper costs $0.017 per minute.
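
To put those rates in perspective, the short Python sketch below estimates costs from the quoted prices. The usage volumes (1,000 minutes of audio, 250,000 audio input tokens) are hypothetical figures chosen for illustration, not numbers from OpenAI. The GPT-Realtime-2 figure covers audio input only and uses the "starts at" price, so it should be read as a lower bound.

```python
# Rates quoted in the article, in US dollars.
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017
REALTIME2_PER_MILLION_INPUT_TOKENS = 32.0  # "starts at" price, audio input only


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute-billed model, rounded to the nearest cent."""
    return round(minutes * rate_per_minute, 2)


def audio_input_cost(tokens: int,
                     rate_per_million: float = REALTIME2_PER_MILLION_INPUT_TOKENS) -> float:
    """Cost of audio input tokens at a per-million-token rate, rounded to the cent."""
    return round(tokens / 1_000_000 * rate_per_million, 2)


if __name__ == "__main__":
    # Hypothetical monthly usage, assumed for illustration only.
    minutes = 1_000
    tokens = 250_000

    print(f"Translate, {minutes} min: ${per_minute_cost(minutes, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, {minutes} min: ${per_minute_cost(minutes, WHISPER_PER_MINUTE):.2f}")
    print(f"Realtime-2, {tokens} input tokens: ${audio_input_cost(tokens):.2f}")
```

At these assumed volumes, translation comes to about $34, transcription to about $17, and the audio input portion of GPT-Realtime-2 to about $8.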