By Paula Parisi, September 30, 2025

Nvidia has made Audio2Face open source, a potential boon for game developers and other 3D applications such as customer service avatars. The generative AI facial animation system brings lifelike speech and expression to avatars using accelerated real-time facial animation and lip-sync. It works by analyzing acoustic features to create a stream of animation data that is then mapped onto a character’s facial poses, translating to “accurate lip-sync and emotional expressions,” says Nvidia. The imagery can be rendered offline for pre-scripted content or streamed in real time for dynamic characters. Continue reading Nvidia Audio2Face AI Avatar-Generator Is Now Open Source
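The audio-to-animation pipeline described above can be sketched as a toy example: slice audio into frames, extract an acoustic feature per frame, and map features to facial pose weights. All function names and the energy-to-jaw mapping here are illustrative assumptions, not Nvidia's actual Audio2Face API.

```python
# Toy sketch of an audio-driven facial animation pipeline.
# NOT Nvidia's API: names and the feature-to-pose mapping are hypothetical.
import numpy as np

def extract_acoustic_features(audio: np.ndarray, frame_len: int = 512) -> np.ndarray:
    """Slice audio into fixed-size frames and compute per-frame RMS energy."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1, keepdims=True))

def features_to_blendshapes(features: np.ndarray, n_shapes: int = 52) -> np.ndarray:
    """Map acoustic features to blendshape weights; here, louder audio opens the jaw."""
    weights = np.zeros((features.shape[0], n_shapes))
    weights[:, 0] = np.clip(features[:, 0] * 10.0, 0.0, 1.0)  # shape 0 = "jaw open"
    return weights

# One second of synthetic 48 kHz audio yields a stream of per-frame pose data.
audio = np.sin(np.linspace(0, 200 * np.pi, 48000))
stream = features_to_blendshapes(extract_acoustic_features(audio))
print(stream.shape)  # one row of blendshape weights per audio frame
```

In a real system, the per-frame weight stream would then be applied to a rigged character, either rendered offline or streamed in real time, as the article notes.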

By Paula Parisi, August 27, 2025

Instagram has started to let creators link multiple Reels, effectively creating their own “series.” The new capability, which TikTok already offers, makes it more convenient for viewers to discover and follow related content. The new “Link a Reel” feature lets creators either link generally related content or connect sequential material, as in Part 2, Part 3, etc. Meta Platforms has also added a free AI-powered voice translation feature. Now available globally on Facebook and Instagram, it automatically dubs and lip-syncs Reels into other languages using the sound and tone of the creator’s own voice. Continue reading Meta Updates Features for Reels on Facebook and Instagram

By Paula Parisi, May 22, 2025

Google is in a filmmaking frame of mind. The search giant introduced Veo 3, the latest version of its generative video model, loading it with cinematic capabilities including a new AI storytelling tool called Flow. At the Google I/O conference, the company also debuted an upgraded image generator, Imagen 4, and announced expanded access to the AI music tool Lyria 2. Veo 3 can generate videos with audio, a Google first, adding things like background traffic noises, birds singing, “even dialogue between characters.” It offers improved consistency of characters, scenes and objects, while gaining camera controls, outpainting and object add/remove. Continue reading Google Upgrades GenAI Models, Debuts AI Storyteller ‘Flow’

By Paula Parisi, March 12, 2025

Amazon is experimenting with AI dubbing so Prime Video customers globally can experience content from other territories, gaining quicker, more efficient access to licensed films and TV series. The company is using a hybrid “AI-aided” system in which localization professionals oversee the AI output to ensure quality control. Currently limited to a dozen movies and series that will be AI-dubbed in English and Latin American Spanish, the pilot will expand if the results prove popular with audiences. In December, Netflix experienced backlash against AI-assisted dubbing, with viewers complaining that generative mouth adjustments looked unnatural. Continue reading Amazon Prime Video Tests AI Dubbing for Movies and Series

By Paula Parisi, February 6, 2025

ByteDance has developed a generative model that can use a single photo to generate photorealistic video of humans in motion. Called OmniHuman-1, the multimodal system supports various visual and audio styles and can generate people singing, dancing, speaking and moving in a natural fashion. ByteDance says its new technology clears hurdles that hinder existing human video generators, such as short play times and over-reliance on high-quality training data. The diffusion transformer-based OmniHuman addressed those challenges by mixing motion-related conditions into the training phase, a solution ByteDance researchers claim is new. Continue reading ByteDance’s AI Model Can Generate Video from Single Image

By Paula Parisi, November 12, 2024

Panjaya is an AI startup that aims to disrupt the world of video dubbing with a way to generate “hyperrealistic” recreations of a person’s voice speaking a new language. The system also automatically modifies the imagery so lip and other physical movements match the new speech patterns. Called BodyTalk, the technique is the launch point for Panjaya as it emerges from stealth after three years of R&D, backed by $9.5 million from venture funds and angel backers. The startup describes BodyTalk as “AI dubbing that looks and feels as natural as the original.” Continue reading BodyTalk Dubs into 29 Languages with Facial Moves to Match

By Paula Parisi, August 23, 2024

D-ID, a platform that uses AI to generate digital humans, has announced general availability of D-ID Video Translate. The tool lets businesses and content creators automatically re-voice videos in multiple languages, “cloning the speaker’s voice and adapting their lip movements from a single upload.” D-ID is making the Video Translate tool, which accommodates 30 different languages, free to D-ID subscribers for a limited time, available through the D-ID Studio or the company’s API. Languages include Arabic, Mandarin, Japanese, Hindi and Ukrainian, in addition to Spanish, German, French and Italian. Users can translate content into multiple languages at once using bulk translation. Continue reading D-ID Employs AI to Translate Videos into Multiple Languages

By Paula Parisi, August 5, 2024

AI media firm Runway has launched Gen-3 Alpha, building on its text-to-video model by using images to prompt realistic videos generated in seconds. Navigate to Runway’s web-based interface and click “try Gen-3 Alpha” to land on a screen with an image uploader, as well as a text box for those who either prefer that approach or want to use natural language to tweak results. Runway lets users generate up to 10 seconds of contiguous video using a credit system. “Image to Video is a major update that greatly improves the artistic control,” Runway said in an announcement. Continue reading Runway’s Gen-3 Alpha Creates Realistic Video from Still Image

By Paula Parisi, June 27, 2024

Synthesia, which uses AI to create business avatars for content such as training, presentation and customer service videos, has announced a major platform update. “Coming soon” with Synthesia 2.0 are full-body avatars that include hands capable of a wide range of motions. Users can animate motion using skeletal sequences, onto which a persona selected from the catalog is then automatically mapped. Starting next month, the Nvidia-backed UK company will offer the ability to incorporate brand identity — including typography, colors and logos — into templated videos. A new translation tool automatically applies updates to all languages. Continue reading Lifelike AI Avatars to Get New Features with Synthesia Update

By Paula Parisi, June 6, 2024

            ElevenLabs has launched its text-to-sound generator Sound Effects for all users, available now at the company’s website. The new AI tool can create audio effects, short instrumental tracks, soundscapes and even character voices. Sound Effects “has been designed to help creators — including film and television studios, video game developers, and social media content creators — generate rich and immersive soundscapes quickly, affordably and at scale,” according to the startup, which developed the tool in partnership with Shutterstock, using its library of licensed audio tracks. Continue reading ElevenLabs Launches an AI Tool for Generating Sound Effects
London-based AI startup Synthesia, which creates avatars for enterprise-level generative video presentations, has added “Expressive Avatars” to its feature kit. Powered by Synthesia’s new Express-1 model, these fourth-generation avatars have achieved a new benchmark in realism by using contextual expressions that approximate human emotion, the company says. Express-1 has been trained “to understand the intricate relationship between what we say and how we say it,” allowing Expressive Avatars to perform a script with the correct vocal tone, body language and lip movement, “like a real actor,” according to Synthesia. Continue reading Synthesia Express-1 Model Gives ‘Expressive Avatars’ Emotion

By ETCentric Staff, April 22, 2024

Microsoft has developed VASA, a framework for generating lifelike virtual characters with vocal capabilities including speaking and singing. The premier model, VASA-1, can perform the feat in real time from a single static image and a vocalization clip. The research demo showcases realistic audio-enhanced faces that can be fine-tuned to look in different directions or change expression in video clips of up to one minute at 512 x 512 pixels and up to 40 fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.” Continue reading Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

By ETCentric Staff, March 15, 2024

Artificial intelligence imaging service Midjourney has been embraced by storytellers, who have been clamoring for a feature that enables characters to regenerate consistently across new requests. Now Midjourney is delivering that functionality with the addition of the new “--cref” tag (short for Character Reference), available for those using Midjourney v6 on the Discord server. Users can achieve the effect by adding the tag to the end of a text prompt, followed by a URL of the master image that subsequent generations should match. Midjourney will then attempt to match the particulars of the character’s face, body and clothing. Continue reading Midjourney Creates a Feature to Advance Image Consistency
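The tag usage described above looks like the following in practice; the prompt text and image URL are hypothetical examples, not from the article:

```
/imagine a knight in silver armor resting in a sunlit meadow --cref https://example.com/my-character.png
```

Here the URL after `--cref` points to the master image of the character that the new generation should match.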

By ETCentric Staff, March 11, 2024

Alibaba is touting a new artificial intelligence system that can animate portraits, making people sing and talk in realistic fashion. Researchers at the Alibaba Group’s Institute for Intelligent Computing developed the generative video framework, calling it EMO, short for Emote Portrait Alive. Input a single reference image along with “vocal audio,” as in talking or singing, and “our method can generate vocal avatar videos with expressive facial expressions and various head poses,” the researchers say, adding that EMO can generate videos of any duration, “depending on the length of audio input.” Continue reading Alibaba’s EMO Can Generate Performance Video from Images

By ETCentric Staff, March 1, 2024

ElevenLabs recently demoed a text-to-sound app using clips generated by OpenAI’s text-to-video artificial intelligence platform Sora. On its heels, Pika Labs is releasing a feature called Lip Sync that lets its paid subscribers use the ElevenLabs app to add AI-generated voices and dialogue to Pika-generated videos, with the characters’ lips moving in sync with the speech. Pika Lip Sync supports both uploaded audio files and text-to-audio AI, allowing users to type or record dialogue, or use pre-existing sound files, then apply AI to change the voicing style. Continue reading Pika Taps ElevenLabs Audio App to Add Lip Sync to AI Video