OpenAI has introduced three new audio models for its developer platform, GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper, designed to make voice-based software agents more conversational and capable of completing tasks in real time.
The new application programming interface (API) moves the ChatGPT-maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.
The new models are now available for testing in OpenAI’s developer playground.
GPT-Realtime-2 is built to handle harder requests, call tools, manage interruptions, and maintain context during longer voice interactions.
GPT-Realtime-Translate supports translation from more than 70 languages into 13 output languages, targeting customer support, education and other settings.
GPT-Realtime-Whisper delivers live speech-to-text capabilities, allowing captions, meeting notes, and workflow updates to be generated in real time as a person speaks.
Early customers testing the models include Zillow, Priceline, and Deutsche Telekom.
Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens; GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute.
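To put those rates in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: the function and constant names are hypothetical, the rates are the ones quoted above, and the GPT-Realtime-2 figure covers audio input tokens alone, since no other rates are given here. It does not call any OpenAI API.

```python
# Rates as quoted in the article (USD). All names here are illustrative.
REALTIME_2_PER_MILLION_INPUT_TOKENS = 32.00  # GPT-Realtime-2 starting price, audio input only
TRANSLATE_PER_MINUTE = 0.034                 # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017                   # GPT-Realtime-Whisper


def realtime_2_input_cost(audio_input_tokens: int) -> float:
    """Cost of audio input tokens at the quoted starting rate."""
    return audio_input_tokens / 1_000_000 * REALTIME_2_PER_MILLION_INPUT_TOKENS


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute-billed model for a given duration."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # One hour of audio on each per-minute model.
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min:   ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
    # Half a million audio input tokens on GPT-Realtime-2.
    print(f"Realtime-2, 500k input tokens: ${realtime_2_input_cost(500_000):.2f}")
```

At these rates, an hour of live translation costs roughly twice as much as an hour of live transcription. The GPT-Realtime-2 figure depends on how many tokens a given stretch of audio produces, which is not stated here.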




