By Paula Parisi, September 30, 2025

Nvidia has made Audio2Face open source, a potential boon for game developers and other 3D applications such as customer service avatars. The generative AI facial animation system brings lifelike speech and expression to avatars using accelerated real-time facial animation and lip-sync. It works by analyzing acoustic features to create a stream of animation data that is then mapped onto a character’s facial poses, translating to “accurate lip-sync and emotional expressions,” says Nvidia. The imagery can be rendered offline for pre-scripted content or streamed in real time for dynamic characters. Continue reading Nvidia Audio2Face AI Avatar-Generator Is Now Open Source
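The audio-to-animation pipeline described above can be sketched as a toy example: slice audio into frames, extract an acoustic feature per frame, and map features to facial pose weights. All function names and the energy-to-jaw mapping here are illustrative assumptions, not Nvidia's actual Audio2Face API.

```python
# Toy sketch of an audio-driven facial animation pipeline.
# NOT Nvidia's API: names and the feature-to-pose mapping are hypothetical.
import numpy as np

def extract_acoustic_features(audio: np.ndarray, frame_len: int = 512) -> np.ndarray:
    """Slice audio into fixed-size frames and compute per-frame RMS energy."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1, keepdims=True))

def features_to_blendshapes(features: np.ndarray, n_shapes: int = 52) -> np.ndarray:
    """Map acoustic features to blendshape weights; here, louder audio opens the jaw."""
    weights = np.zeros((features.shape[0], n_shapes))
    weights[:, 0] = np.clip(features[:, 0] * 10.0, 0.0, 1.0)  # shape 0 = "jaw open"
    return weights

# One second of synthetic 48 kHz audio yields a stream of per-frame pose data.
audio = np.sin(np.linspace(0, 200 * np.pi, 48000))
stream = features_to_blendshapes(extract_acoustic_features(audio))
print(stream.shape)  # one row of blendshape weights per audio frame
```

In a real system, the per-frame weight stream would then be applied to a rigged character, either rendered offline or streamed in real time, as the article notes.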

By Paula Parisi, August 27, 2025

Instagram has started to let creators link multiple Reels, effectively creating their own “series.” The new capability, which TikTok already offers, makes it more convenient for viewers to discover and follow related content. The new “Link a Reel” feature lets creators either link generally related content or connect sequential material, as in Part 2, Part 3, etc. Meta Platforms has also added a free AI-powered voice translation feature. Now available globally on Facebook and Instagram, it automatically dubs and lip-syncs Reels into other languages using the sound and tone of the creator’s own voice. Continue reading Meta Updates Features for Reels on Facebook and Instagram

By Paula Parisi, May 22, 2025

Google is in a filmmaking frame of mind. The search giant introduced Veo 3, the latest version of its generative video model, loading it with cinematic capabilities including a new AI storytelling tool called Flow. At the Google I/O conference, the company also debuted an upgraded image generator, Imagen 4, and announced expanded access to the AI music tool Lyria 2. Veo 3 can generate videos with audio, a Google first, adding things like background traffic noises, birds singing, “even dialogue between characters.” It offers improved consistency of characters, scenes and objects, while gaining camera controls, outpainting and object add/remove. Continue reading Google Upgrades GenAI Models, Debuts AI Storyteller ‘Flow’

By Paula Parisi, March 12, 2025

Amazon is experimenting with AI dubbing so Prime Video customers globally can experience content from other territories, gaining quicker, more efficient access to licensed films and TV series. The company is using a hybrid “AI-aided” system in which localization professionals oversee the AI output to ensure quality control. Currently limited to a dozen movies and series that will be AI-dubbed in English and Latin American Spanish, the pilot will expand if the results prove popular with audiences. In December, Netflix experienced backlash against AI-assisted dubbing, with viewers complaining that generative mouth adjustments looked unnatural. Continue reading Amazon Prime Video Tests AI Dubbing for Movies and Series

By Paula Parisi, February 6, 2025

ByteDance has developed a generative model that can use a single photo to generate photorealistic video of humans in motion. Called OmniHuman-1, the multimodal system supports various visual and audio styles and can generate people singing, dancing, speaking and moving in a natural fashion. ByteDance says its new technology clears hurdles that hinder existing human video generators, such as short play times and over-reliance on high-quality training data. The diffusion transformer-based OmniHuman addressed those challenges by mixing motion-related conditions into the training phase, a solution ByteDance researchers claim is new. Continue reading ByteDance’s AI Model Can Generate Video from Single Image

By Paula Parisi, November 12, 2024

Panjaya is an AI startup that aims to disrupt the world of video dubbing with a way to generate “hyperrealistic” recreations of a person’s voice speaking a new language. The system also automatically modifies the imagery so lip and other physical movements match the new speech patterns. Called BodyTalk, the technique is the launch point for Panjaya as it emerges from stealth after three years of R&D, backed by $9.5 million from venture funds and angel backers. The startup describes BodyTalk as “AI dubbing that looks and feels as natural as the original.” Continue reading BodyTalk Dubs into 29 Languages with Facial Moves to Match

By Paula Parisi, August 23, 2024

D-ID, a platform that uses AI to generate digital humans, has announced general availability of D-ID Video Translate. The tool lets businesses and content creators automatically re-voice videos in multiple languages, “cloning the speaker’s voice and adapting their lip movements from a single upload.” D-ID is making the Video Translate tool, which accommodates 30 different languages, free to D-ID subscribers for a limited time, available through the D-ID Studio or the company’s API. Languages include Arabic, Mandarin, Japanese, Hindi and Ukrainian, in addition to Spanish, German, French and Italian. Users can translate content into multiple languages at once using bulk translation. Continue reading D-ID Employs AI to Translate Videos into Multiple Languages

By Paula Parisi, August 5, 2024

AI media firm Runway has launched Gen-3 Alpha, building on its text-to-video model by using images to prompt realistic videos generated in seconds. Navigate to Runway’s web-based interface and click “try Gen-3 Alpha” to land on a screen with an image uploader, as well as a text box for those who either prefer that approach or want to use natural language to tweak results. Runway lets users generate up to 10 seconds of contiguous video using a credit system. “Image to Video is a major update that greatly improves the artistic control,” Runway said in an announcement. Continue reading Runway’s Gen-3 Alpha Creates Realistic Video from Still Image

By Paula Parisi, June 27, 2024

Synthesia, which uses AI to create business avatars for content such as training, presentation and customer service videos, has announced a major platform update. “Coming soon” with Synthesia 2.0 are full-body avatars that include hands capable of a wide range of motions. Users can animate motion using skeletal sequences, onto which a persona selected from the catalog is then automatically mapped. Starting next month, the Nvidia-backed UK company will offer the ability to incorporate brand identity — including typography, colors and logos — into templated videos. A new translation tool automatically applies updates to all languages. Continue reading Lifelike AI Avatars to Get New Features with Synthesia Update

By Paula Parisi, June 6, 2024

            ElevenLabs has launched its text-to-sound generator Sound Effects for all users, available now at the company’s website. The new AI tool can create audio effects, short instrumental tracks, soundscapes and even character voices. Sound Effects “has been designed to help creators — including film and television studios, video game developers, and social media content creators — generate rich and immersive soundscapes quickly, affordably and at scale,” according to the startup, which developed the tool in partnership with Shutterstock, using its library of licensed audio tracks. Continue reading ElevenLabs Launches an AI Tool for Generating Sound Effects
London-based AI startup Synthesia, which creates avatars for enterprise-level generative video presentations, has added “Expressive Avatars” to its feature kit. Powered by Synthesia’s new Express-1 model, these fourth-generation avatars have achieved a new benchmark in realism by using contextual expressions that approximate human emotion, the company says. Express-1 has been trained “to understand the intricate relationship between what we say and how we say it,” allowing Expressive Avatars to perform a script with the correct vocal tone, body language and lip movement, “like a real actor,” according to Synthesia. Continue reading Synthesia Express-1 Model Gives ‘Expressive Avatars’ Emotion

By ETCentric Staff, April 22, 2024

Microsoft has developed VASA, a framework for generating lifelike virtual characters with vocal capabilities including speaking and singing. The premier model, VASA-1, can perform the feat in real time from a single static image and a vocalization clip. The research demo showcases realistic audio-enhanced faces that can be fine-tuned to look in different directions or change expression in video clips of up to one minute at 512 x 512 pixels and up to 40 fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.” Continue reading Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

By ETCentric Staff, March 15, 2024

Artificial intelligence imaging service Midjourney has been embraced by storytellers, who have been clamoring for a feature that enables characters to regenerate consistently across new requests. Now Midjourney is delivering that functionality with the addition of the new “--cref” tag (short for Character Reference), available for those using Midjourney v6 on the Discord server. Users can achieve the effect by adding the tag to the end of a text prompt, followed by a URL of the master image that subsequent generations should match. Midjourney will then attempt to match the particulars of the character’s face, body and clothing. Continue reading Midjourney Creates a Feature to Advance Image Consistency
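The tag usage described above looks like the following in practice; the prompt text and image URL are hypothetical examples, not from the article:

```
/imagine a knight in silver armor resting in a sunlit meadow --cref https://example.com/my-character.png
```

Here the URL after `--cref` points to the master image of the character that the new generation should match.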

By ETCentric Staff, March 11, 2024

Alibaba is touting a new artificial intelligence system that can animate portraits, making people sing and talk in realistic fashion. Researchers at the Alibaba Group’s Institute for Intelligent Computing developed the generative video framework, calling it EMO, short for Emote Portrait Alive. Input a single reference image along with “vocal audio,” as in talking or singing, and “our method can generate vocal avatar videos with expressive facial expressions and various head poses,” the researchers say, adding that EMO can generate videos of any duration, “depending on the length of audio input.” Continue reading Alibaba’s EMO Can Generate Performance Video from Images

By ETCentric Staff, March 1, 2024

ElevenLabs recently demoed a text-to-sound app using clips generated by OpenAI’s text-to-video artificial intelligence platform Sora. On its heels, Pika Labs is releasing a feature called Lip Sync that lets its paid subscribers use the ElevenLabs app to add AI-generated voices and dialogue to Pika-generated videos, with the characters’ lips moving in sync with the speech. Pika Lip Sync supports both uploaded audio files and text-to-audio AI, allowing users to type or record dialogue, or use pre-existing sound files, then apply AI to change the voicing style. Continue reading Pika Taps ElevenLabs Audio App to Add Lip Sync to AI Video