Speech Synthesis Archives

Auto-GPT Generates Social Sizzle, Ushers in Era of AI Agents

By Paula Parisi
April 25, 2023

Auto-GPT, an open-source app that uses OpenAI’s text-generating models, is generating a great deal of social media attention. The program can act somewhat autonomously in that it creates its own feedback loop, asking itself a series of questions to help build a more nuanced and complete response to a text prompt. In short, a task that would take a user multiple prompts in ChatGPT could be accomplished with a single request to Auto-GPT, which could independently explore a subject before returning a comprehensive response.

Facebook Reveals New AI-Powered Text-to-Speech System

By Debra Kaufman
May 22, 2020

Facebook introduced an AI text-to-speech (TTS) system that produces a second of audio in 500 milliseconds. According to Facebook, the system, paired with a new approach to data collection, enabled the creation of a British-accented voice in six months, versus more than a year for previous voices. The TTS system is now used in Facebook’s Portal smart displays. It can run in real time on ordinary processors and is also available as a service for other apps, including Facebook’s VR.

Google and IBM Create Advanced Text-to-Speech Systems

By Debra Kaufman
October 2, 2019

Both IBM and Google recently advanced the development of text-to-speech (TTS) systems that create high-quality digital speech. OpenAI found that, since 2012, the compute used to train the largest AI models has grown more than 300,000-fold. IBM created a much less compute-intensive model for speech synthesis, stating that it can synthesize speech in real time and adapt to new speaking styles with little data. Google and Imperial College London created a generative adversarial network (GAN) to produce high-quality synthetic speech.

New Alexa Speaking Style Created by Neural Text-to-Speech

By Debra Kaufman
November 27, 2018

Amazon is training Alexa to speak like a newscaster, a feature that will roll out in a few weeks. The new speaking style is based on Amazon’s neural text-to-speech (NTTS) developments. The new voice style doesn’t sound human, but it does stress words as a TV or radio announcer would. Before creating this voice, Amazon conducted a survey showing that users prefer the newscaster style when listening to articles. The new voice is also an example of “the next generation of speech synthesis,” based on machine learning.

Adobe Project VoCo Audio Editor Offers Photoshop-Like Tools

By Debra Kaufman
November 7, 2016

Adobe Research and Princeton University are collaborating on software that acts like Photoshop for audio, including the ability to add words not found in the original audio file. Adobe developer Zeyu Jin, who spoke at the Adobe MAX conference, described the would-be product, codenamed Project VoCo, as a “sneak peek.” Project VoCo is intended to be an audio editing application, with more typical speech editing and noise cancellation features, but the Photoshop-like tool also raises potential ethical issues regarding the use of doctored audio clips.
