March 29, 2018
Members of Google’s Brain and Machine Perception teams say researchers at the tech giant have developed “ways to make machine-generated speech sound more natural to humans,” and have shared examples of the more expressive speech in a company blog post, reports VentureBeat. Google also announced the release of its Cloud Text-to-Speech service, which could “be used to bring more natural speech to devices, apps or digital services that utilize voice control or voice computing,” the article explains.
Cloud Text-to-Speech, which is powered by DeepMind’s WaveNet, provides “customers the same speech synthesis used by Google Assistant,” adds VentureBeat.
News of these techniques for making computer-generated voices sound more human comes from two papers published by Google. The papers describe how the researchers incorporated prosody (patterns of stress and intonation) into synthesized voices, building on the company’s earlier work.
“Both papers document techniques that build on top of Tacotron 2, an AI system that uses neural networks trained to mimic human speech that made its debut last December,” explains VentureBeat. “Though Tacotron sounds like a human voice to the majority of people in an initial test with 800 subjects, it’s unable to imitate things like stress or a speaker’s natural intonation.”
Advances like these could be applied to existing products such as Google Assistant, the company’s voice-activated virtual assistant.
“Getting away from monotonous voices without range appears to be part of the strategy for tech giants with assistants like Alexa, Siri, and Google Assistant. Siri got a more expressive voice last year, and last April, Alexa got SSML tags for voice app developers to add expression to the assistant’s voice like a pause, whisper, or expressions like ‘BOOM’ or ‘Bada bing.’ SSML has also been made available for the makers of Google Assistant actions,” according to VentureBeat.