Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech voice AI that can be used for voice AI agents, the company says. Developers can pair it with Deepgram’s Nova-2 speech-to-text API to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram.

Pika Taps ElevenLabs Audio App to Add Lip Sync to AI Video

Following ElevenLabs’ demo of a text-to-sound app, which was unveiled using clips generated by OpenAI’s text-to-video artificial intelligence platform Sora, Pika Labs is releasing a feature called Lip Sync. It lets paid subscribers use the ElevenLabs app to add AI-generated voices and dialogue to Pika-generated videos, with the characters’ lips moving in sync with the speech. Pika Lip Sync supports both uploaded audio files and text-to-audio AI, allowing users to type or record dialogue, or use pre-existing sound files, then apply AI to change the voicing style.

ElevenLabs Promotes Its Latest Advances in AI Audio Effects

“What if you could describe a sound and generate it with AI?” asks startup ElevenLabs, which set out to do just that, and says it has succeeded. The two-year-old company explains it “used text prompts like ‘waves crashing,’ ‘metal clanging,’ ‘birds chirping,’ and ‘racing car engine’ to generate audio.” Best known for using machine learning to clone voices, the AI firm founded by Google and Palantir alums has yet to make its new text-to-sound model publicly available but began teasing it by releasing online demos this week. Some see the technology as a natural complement to the latest wave of image generators.

Amazon Claims ‘Emergent Abilities’ for Text-to-Speech Model

Researchers at Amazon have trained what they are calling the largest text-to-speech model ever created, which they claim is exhibiting “emergent” qualities, namely an inherent ability to improve at speaking complex sentences naturally. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English as well as some German, Dutch and Spanish. Text-to-speech models are used in developing voice assistants for smart devices and apps, as well as for accessibility.

Meta AI Seamless Translator Converts Nearly 100 Languages

The research division of Meta AI has developed Seamless Communication, a suite of artificial intelligence models that generate what the company says is natural and authentic communication across languages, facilitating what amounts to real-time universal speech translation. The models were released with accompanying research papers and data. The flagship model, Seamless, merges capabilities from a trio of models (SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2) into a single system that can translate between almost 100 spoken and written languages, preserving idioms, emotion and the speaker’s vocal style, Meta says.

Meta’s Multimodal AI Model Translates Nearly 100 Languages

Meta Platforms is releasing SeamlessM4T, the world’s “first all-in-one multilingual multimodal AI translation and transcription model,” according to the company. SeamlessM4T can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages, depending on the task. “Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” Meta claims, adding that SeamlessM4T “implicitly recognizes the source languages without the need for a separate language identification model.”

QuickVid Uses AI to Create Short Videos from Text Prompts

QuickVid is a new AI-driven text-to-video platform aiming for a mass-market user base. The tool draws on various generative AI systems to automatically create short-form videos for YouTube, Instagram, TikTok and other platforms. Created by former Meta Platforms programmer Daniel Habib “in a matter of weeks,” QuickVid is quite rudimentary, though Habib says he plans to continue fine-tuning it and adding features. Unlike Google and Meta with their nascent text-to-video systems, QuickVid has bypassed the formalities of research papers and industry previews and jumped directly to a public-facing website.

Google Brings Personalization Features to Your News Update

Google is adding new features to Your News Update, its news aggregation service, to personalize 90-minute news feeds from each user’s preferred sources. The goal is to create a seamless listening experience akin to a customized song playlist. Each news playlist, similar to those on public radio, will begin with short clips about the major headlines before moving into longer stories. The end product, available only in the U.S., will compile radio, podcast clips and text-to-speech translations tailored to the individual user.

Facebook Reveals New AI-Powered Text-to-Speech System

Facebook introduced an AI text-to-speech (TTS) system that produces a second of audio in 500 milliseconds. According to Facebook, the system, which is used with a new approach to data collection, powered the creation of a British-accented voice in six months, versus the more than a year required for other voices. The TTS system is now used for Facebook’s Portal smart display brand. It can be hosted in real time on ordinary processors and is also available as a service for other apps, including Facebook’s VR.

Amazon Licenses Original Interactive Audio Series for Alexa

Amazon has inked an exclusive license for “Tala’s World,” a seven-episode young adult adventure series produced by audio startup Xandra, which has produced Alexa skills for HBO, Sesame Workshop and Ubisoft. In the new adventure series, listeners help elf-like character Blobby find his missing best friend Tala by making decisions, collecting clues, and interrogating suspects. The series is available exclusively on Alexa; Amazon recently released the first episode and plans to release the second episode on December 13.

Google and IBM Create Advanced Text-to-Speech Systems

Both IBM and Google recently advanced development of text-to-speech (TTS) systems to create high-quality digital speech. OpenAI found that, since 2012, the compute power needed to train TTS models has grown by more than 300,000 times. IBM created a much less compute-intensive model for speech synthesis, stating that it can synthesize speech in real time and adapt to new speaking styles with little data. Google and Imperial College London created a generative adversarial network (GAN) to create high-quality synthetic speech.

Publishers and Authors Guild Oppose Audible Text Feature

Audible, the audiobook app owned by Amazon, is using machine learning to transcribe audio recordings, so listeners can also read along with the narrator. Audible is promoting it as an educational feature, but some publishers are up in arms, demanding their books be excluded because captions are “unauthorized and brazen infringements of the rights of authors and publishers.” Publishers are concerned that fewer people will buy physical books or e-books if they can get the text with an Audible audiobook.

Text-to-Speech System Quickly Mimics Hundreds of Accents

As another example of the significant advances we have been following in artificial intelligence and deep learning, Chinese search giant Baidu has introduced Deep Voice 2, the second iteration of its compelling text-to-speech system. The company introduced Deep Voice just three months ago, with the ability to produce speech “in near real time” that was “nearly indistinguishable from an actual human voice,” according to The Verge. While the first system was limited to learning one voice at a time, “and required many hours of audio or more from which to build a sample,” the updated version “can learn the nuances of a person’s voice with just half an hour of audio, and a single system can learn to imitate hundreds of different speakers.”

Google Redoubles its Cloud Ambitions, Offering AI Programs

Cloud computing is booming, and Google is losing ground to Amazon and Microsoft. As the business of renting computer servers to outside businesses grows more lucrative, Google has decided to promote its artificial intelligence software to enterprise customers. Now, potential customers of Google’s cloud offering can also take advantage of two software programs (one converting text to speech, the other extracting meaning from text) that, up until now, have only been used internally. Rivals Amazon and Microsoft offer competing AI products.

VideoDubber Automatically Dubs Video into 30+ Languages

Foreign film fans may have a new reason to get excited. Israeli startup VideoDubber is introducing a new technology that could address complaints about subtitles in media content. The company claims that its TruDub technology can automatically dub films, TV shows and video into more than 30 languages including Arabic, Chinese, Spanish, and four dialects of English. The service uses synthetic voices that it says sound natural since they are based on professional voice talent.