OpenAI Voice Cloning Tool Needs Only a 15-Second Sample

OpenAI has debuted a new text-to-voice generation platform called Voice Engine, available on a limited-access basis. Voice Engine can generate a synthetic voice from a 15-second clip of someone’s speech. The synthetic voice can then read a provided text aloud and can even translate it into other languages. For now, only a handful of companies are using the tech under a strict usage policy as OpenAI grapples with the potential for misuse. “These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI explained.

Companies helping OpenAI test the technology include “the education technology company Age of Learning, visual storytelling platform HeyGen, frontline health software maker Dimagi, AI communication app creator Livox, and health system Lifespan,” according to The Verge.

Sample content from OpenAI showcases “what Age of Learning has been doing with the technology to generate pre-scripted voice-over content, as well as reading out ‘real-time, personalized responses’ to students written by GPT-4.”

In a blog post showcasing various tests, OpenAI describes Voice Engine use cases including translating content, global communications, support for people who are non-verbal, and helping with recovery from degenerative speech conditions.

Entertainment applications like dubbing and automatic dialogue replacement (ADR) are obvious opportunities.

OpenAI emphasizes safety. “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” writes OpenAI. “We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

“As deepfakes proliferate, OpenAI is refining the tech used to clone voices — but the company insists it’s doing so responsibly,” reports TechCrunch, noting that Voice Engine, “an expansion of the company’s existing text-to-speech API,” has been under development for about two years.

“We want to make sure that everyone feels good about how it’s being deployed — that we understand the landscape of where this tech is dangerous and we have mitigations in place for that,” OpenAI Voice Engine product staffer Jeff Harris told TechCrunch.

The model that powers Voice Engine is called Voice Generation, according to The Verge, which says it is the same model that drives the Read Aloud feature in ChatGPT.
