Facebook Reveals New AI-Powered Text-to-Speech System

Facebook introduced an AI text-to-speech (TTS) system that produces a second of audio in 500 milliseconds. According to Facebook, the system, combined with a new approach to data collection, enabled the creation of a British-accented voice in six months, versus the more than a year required for other voices. The TTS system is now used in Facebook’s Portal smart displays. It can run in real time on ordinary processors and is also available as a service for other apps, including Facebook’s VR.

Google and IBM Create Advanced Text-to-Speech Systems

Both IBM and Google recently advanced the development of text-to-speech (TTS) systems that create high-quality digital speech. OpenAI found that, since 2012, the compute used in the largest AI training runs has grown more than 300,000-fold. IBM created a much less compute-intensive model for speech synthesis, stating that it can run in real time and adapt to new speaking styles with little data. Google and Imperial College London created a generative adversarial network (GAN) that produces high-quality synthetic speech.

New Alexa Speaking Style Created by Neural Text-to-Speech

Amazon is training Alexa to speak like a newscaster, a feature that will roll out in a few weeks. The new speaking style is based on Amazon’s neural text-to-speech (NTTS) developments. The new voice style doesn’t sound human, but it does stress words as a TV or radio announcer would. Before creating this voice, Amazon conducted a survey showing that users prefer the newscaster style when listening to articles. The new voice is also an example of “the next generation of speech synthesis,” based on machine learning.

Adobe Project VoCo Audio Editor Offers Photoshop-Like Tools

Adobe Research and Princeton University are collaborating on software that acts like Photoshop for audio, including the ability to add words not found in the original audio file. Adobe developer Zeyu Jin, who spoke at the Adobe MAX conference, described the would-be product, codenamed Project VoCo, as a “sneak peek.” Project VoCo is intended to be an audio editing application with more typical speech editing and noise cancellation features, but the Photoshop-like tool also raises potential ethical issues regarding the use of doctored audio clips.
