
OpenAI could soon launch a multimodal AI digital assistant

OpenAI showed some of its customers a new multimodal AI model that can both talk to you and recognize objects, according to a new report from The Information. Citing anonymous sources who have seen it, the outlet says the model could be part of what the company plans to show off on Monday.

The new model would reportedly offer faster, more accurate interpretation of images and audio than OpenAI's existing separate transcription and text-to-speech models. It would apparently be able to help customer service agents “better understand the intonation of callers’ voices or if they are being sarcastic,” and, “in theory,” The Information writes, the model could help students with math or translate real-world signs.

The outlet’s sources say the model can outperform GPT-4 Turbo at “answering certain types of questions,” but that it’s still prone to confidently getting things wrong.

It’s possible that OpenAI is also preparing a new built-in ChatGPT feature for making phone calls, according to developer Ananay Arora, who posted a screenshot of call-related code. Arora also spotted evidence that OpenAI had provisioned servers intended for real-time audio and video communication.

Whatever is released next week, it won’t be GPT-5. CEO Sam Altman has explicitly denied that his upcoming announcement has anything to do with the model that’s supposedly “materially better” than GPT-4. The Information writes that GPT-5 could be released to the public by the end of the year.

News Source: www.theverge.com
