September 27, 2023
OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem.
“We’re rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks,” OpenAI writes in a blog post that says access for developers and other users will follow. While voice is debuting only on iOS and Android, image input is being made available on all platforms.
The voice capability “is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech,” OpenAI says, adding that it “collaborated with professional voice actors to create each of the voices.”
The technology also draws on Whisper, OpenAI’s open-source speech recognition system that transcribes spoken words into text. These are the same technologies Spotify announced it is using to voice podcasts in other languages.
“OpenAI’s excellent Whisper model does a lot of the speech-to-text work, and the company is rolling out a new text-to-speech model it says can generate ‘human-like audio from just text and a few seconds of sample speech,’” The Verge reports, noting that while voices are currently limited to five options, “OpenAI seems to think the model has vastly more potential than that.”
While The Verge says the voice chat should be familiar to anyone using Alexa or Google Assistant, “the image search, meanwhile, is a bit like Google Lens.”
A ChatGPT drawing tool can also be used as part of the visual query, and a combination of speech, text and images can help refine a search, which The Verge cites as an example where AI’s “back-and-forth nature is helpful; rather than doing a search, getting the wrong answer, and then doing another search, you can prompt the bot and refine the answer as you go.”
CNBC calls this OpenAI’s most significant ChatGPT upgrade since GPT-4 (still not available to non-paying users) and says it arrives “alongside ever-rising stakes of the AI arms race among chatbot leaders such as OpenAI, Microsoft, Google and Anthropic.”
OpenAI’s GPT-4 with Vision Still Has Flaws, Paper Reveals, TechCrunch, 9/26/23