VideoPoet: Google Launches a Multimodal AI Video Generator

Google has unveiled a new large language model designed to advance video generation. VideoPoet is capable of text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. “The leading video generation models are almost exclusively diffusion-based,” Google says, citing Imagen Video as an example. Google finds this counterintuitive, since “LLMs are widely recognized as the de facto standard due to their exceptional learning capabilities across various modalities.” VideoPoet eschews the diffusion approach, which relies on separately trained components, in favor of integrating many video generation capabilities within a single LLM.
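To make the single-model idea concrete, below is a minimal, hypothetical sketch of the kind of architecture the announcement describes: one decoder-only transformer trained with next-token prediction over a shared vocabulary of discrete tokens, so that text-to-video, image-to-video and other tasks differ only in how the prompt token stream is laid out. The names, sizes and vocabulary here are illustrative assumptions, not VideoPoet's actual configuration.

```python
import torch
import torch.nn as nn

VOCAB = 1024          # assumed size of a shared multimodal codebook
D_MODEL, N_LAYERS = 256, 4

class MultimodalDecoder(nn.Module):
    """Toy decoder-only LM over interleaved text/image/video/audio tokens."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, N_LAYERS)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Causal mask: each position attends only to earlier tokens,
        # which makes this a decoder despite the Encoder class names.
        n = tokens.size(1)
        mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        return self.head(self.blocks(self.embed(tokens), mask=mask))

# One model, many tasks: the task is encoded in the prompt layout, e.g.
#   [text tokens]  -> predict video tokens   (text-to-video)
#   [image tokens] -> predict video tokens   (image-to-video)
#   [video tokens] -> predict audio tokens   (video-to-audio)
model = MultimodalDecoder()
prompt = torch.randint(0, VOCAB, (1, 16))     # stand-in for a tokenized prompt
next_token_logits = model(prompt)             # shape: (1, 16, VOCAB)
```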

Stability Introduces GenAI Video Model: Stable Video Diffusion

Stability AI has opened a research preview of its first foundation model for generative video, Stable Video Diffusion, offering text-to-video and image-to-video. Based on the company’s Stable Diffusion text-to-image model, the new open-source model generates video by animating existing still frames, including “multi-view synthesis.” While the company plans to enhance and extend the model’s capabilities, it currently comes in two versions: SVD, which transforms stills into 576×1024 videos of 14 frames, and SVD-XT, which generates up to 25 frames, each version running at between three and 30 frames per second.
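For readers who want to try it, here is a minimal image-to-video sketch. It assumes the Hugging Face diffusers integration of SVD-XT (model id "stabilityai/stable-video-diffusion-img2vid-xt"), one distribution channel for the research preview; the input image path is a placeholder.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# SVD is image-to-video: it animates an existing still frame.
image = load_image("still_frame.png").resize((1024, 576))  # placeholder input

# fps is a conditioning signal within the 3-30 fps range noted above;
# decode_chunk_size trades GPU memory for decoding speed.
frames = pipe(image, fps=7, decode_chunk_size=8).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```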

Meta Touts Its Emu Foundational Model for Video and Editing

Having made the leap from image generation to video generation over the course of a few months in 2022, Meta Platforms introduces Emu, its first visual foundational model, along with Emu Video and Emu Edit, positioned as milestones on the trek to AI moviemaking. Emu Video uses just two diffusion models to generate 512×512, four-second videos at 16 frames per second, Meta said, comparing that to 2022’s Make-A-Video, which requires a “cascade” of five models. Internal research found Emu Video’s generations were “strongly preferred” over those of the Make-A-Video model on quality (96 percent) and prompt fidelity (85 percent).
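The two-model claim refers to a factorized pipeline: first generate a keyframe from text, then animate it conditioned on both the image and the text. Below is a hedged sketch with stub functions; the function names are hypothetical, and only the output dimensions (512×512, four seconds at 16 fps, i.e. 64 frames) come from the article.

```python
import numpy as np

def text_to_image(prompt: str) -> np.ndarray:
    """Stub for diffusion model #1: text -> one 512x512 RGB keyframe."""
    return np.zeros((512, 512, 3), dtype=np.uint8)

def image_to_video(keyframe: np.ndarray, prompt: str) -> np.ndarray:
    """Stub for diffusion model #2: keyframe + text -> 4 s x 16 fps = 64 frames."""
    return np.repeat(keyframe[None, ...], 4 * 16, axis=0)

prompt = "a red fox running through snow"
keyframe = text_to_image(prompt)            # step 1: text-to-image
video = image_to_video(keyframe, prompt)    # step 2: animate the keyframe
assert video.shape == (64, 512, 512, 3)
```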

Social Startup Plai Labs Debuts Free Text-to-Video Generator

The entrepreneurs behind the Myspace social network and gaming company Jam City have shifted their focus to generative AI and Web3 with a new venture, Plai Labs, a social platform that provides AI tools for collaboration and connectivity. Plai Labs has released a free text-to-video generator, PlaiDay, which will compete with GenAI offerings from the likes of OpenAI (DALL-E 2), Google (Imagen), Meta Platforms (Make-A-Video) and Stability AI (Stable Diffusion). PlaiDay hopes to set itself apart by offering the ability to personalize videos with selfie likenesses.

Startup Kaiber Launches Mobile GenAI App for Music Videos

Kaiber, the AI-powered creative studio whose credits include music video collaborations with artists such as Kid Cudi and Linkin Park, has launched a mobile version of its creator tools, designed to give musicians and graphic artists on-the-go access to its suite of GenAI tools for text-to-video, image-to-video and video-to-video, “now with curated music to reimagine the music video creation process.” Users can select artist tracks to accompany visuals to build a music video “with as much or little AI collaboration as they wish,” or upload their own music or audio and tap Kaiber for visuals.

Samsung Next Invests in Irreverent Labs’ Text-to-Video Tech

Seattle-area startup Irreverent Labs has shifted its focus from blockchain-based video games and NFTs to artificial intelligence. Specifically, it wants to build foundation models for text-to-video generation and related content creation tools. Text-to-video is being explored by several companies but remains early-stage technology, and Samsung Next was intrigued enough by the proposition to invest an undisclosed sum in Irreverent. While several apps can output cartoonish results, more ambitious efforts remain limited: models that aim for photorealism, such as Meta’s Make-A-Video and Runway’s Gen-2, can output only four or five seconds of video at a time.

Runway Makes Next Advance in Consumer Text-to-Video AI

Google-backed AI startup Runway has released Gen-2, an early entry among commercially available text-to-video models. Gen-2 was previously waitlisted in limited release, and its commercial availability is significant: text-to-video is widely expected to be the next major advance in artificial intelligence, following the explosion of AI-generated text and images. While Runway’s model may not yet be ready to serve as a professional video tool, it is the next step in the development of technology expected to reshape media and entertainment. Filmmaker Joe Russo recently predicted that AI may be able to create feature films within the next two years.

Runway Opens Waitlist for Its Gen-2 Text-to-Video AI System

New York-based Runway is releasing its Gen-2 system, which generates video clips of up to a few seconds from text- or image-based user prompts. The company, which specializes in artificial intelligence-enhanced film and editing tools, has opened a waitlist for the new product, which will be accessed through a private Discord channel by an audience that will grow over time. Last year, Meta Platforms and Google both previewed text-to-video software in the research stage, but neither detailed plans to make their platforms public. Bloomberg called Runway’s limited launch “the most high-profile instance of such text-to-video generation outside of a lab.”

QuickVid Uses AI to Create Short Videos from Text Prompts

QuickVid is a new AI-driven text-to-video platform aiming for a mass-market user base. The tool draws on various generative AI systems to automatically create short-form videos for YouTube, Instagram, TikTok and other platforms. Created by former Meta Platforms programmer Daniel Habib “in a matter of weeks,” QuickVid is quite rudimentary, though Habib says he plans to continue fine-tuning it and adding features. Unlike Google and Meta, which introduced their nascent text-to-video systems through research papers and industry previews, QuickVid has bypassed those formalities and jumped directly to a public-facing website.

OpenAI’s Point-E Offers a New Take on Text-to-3D Modeling

In the wake of overwhelming public response to its recent offerings DALL-E 2 and ChatGPT, OpenAI this week introduced Point-E, a text-to-3D model generator that is garnering positive feedback. Faster and less resource-intensive than comparable systems, it is still in the early stages and prone to occasionally disjointed results, but it advances the state of the art. Using a single Nvidia V100 GPU, Point-E can create a 3D model in under two minutes, generating “point clouds,” data sets of points that represent a 3D shape. Point clouds are computationally easier to work with than the wire-frame meshes traditionally used to model 3D objects.
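The computational appeal is easy to illustrate: a point cloud is just an array of XYZ coordinates (Point-E’s also carry per-point color), so whole-shape operations are simple vectorized array math, whereas a mesh must additionally maintain face-connectivity data. A minimal sketch with illustrative numbers, not Point-E’s actual output:

```python
import numpy as np

# Point cloud: N points, each (x, y, z) -- no connectivity to maintain.
points = np.random.rand(1024, 3)

# A mesh also needs faces, i.e. triangles indexing into the vertex list,
# and any edit must keep that index structure valid.
vertices = np.random.rand(8, 3)
faces = np.array([[0, 1, 2], [2, 3, 0]])   # two triangles

# Transforming the whole point cloud is a single array operation:
centered = points - points.mean(axis=0)    # recenter at the origin
scaled = centered * 0.5                    # uniform scale
```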

Google Shows Off Impressive Range of AI at NY Media Event

Google Research is touting new advances in artificial intelligence, which can now generate code and write fiction in addition to improved text-to-video and language translation. At a New York media event at Google’s Pier 57 office, which opened earlier this year as the company’s third Manhattan outpost, roughly a dozen projects in various stages of development were on display, with robot learning, LaMDA (Language Model for Dialogue Applications) and text-generated 3D images sharing the spotlight with practical AI for tasks like disaster management, weather forecasting and healthcare.

Google and Meta Are Developing AI Text-to-Video Generators

AI image generators like OpenAI’s DALL-E 2 and Google’s Imagen have been generating a lot of attention recently. Now AI text-to-video generators are edging into the spotlight, with Google debuting Imagen Video on the heels of Meta AI’s Make-A-Video rollout last month. Imagen Video has been used to generate clips of 128 frames (about five seconds) at 24 fps and 1280×768 pixels. The model was trained “on a combination of an internal dataset consisting of 14 million video-text pairs and 60 million image-text pairs,” resulting in some unusual functionality, according to Google Research.