Google and Meta Are Developing AI Text-to-Video Generators

AI image generators like OpenAI’s DALL-E 2 and Google’s Imagen have been generating a lot of attention recently. Now AI text-to-video generators are edging into the spotlight, with Google debuting Imagen Video on the heels of Meta AI’s Make-A-Video rollout last month. Imagen Video has been used to generate clips of roughly 5.3 seconds (128 frames) at 24 fps and 1280×768 resolution. Imagen Video was trained “on a combination of an internal dataset consisting of 14 million video-text pairs and 60 million image-text pairs,” resulting in some unusual functionality, according to Google Research.

“We find that Imagen Video is capable of generating high fidelity video” and offers “several unique capabilities that are not traditionally found in unstructured generative models learned purely from data,” the Google Research Brain Team writes in a scholarly paper.

For example, Imagen Video can generate “videos with artistic styles learned from image information, such as videos in the style of van Gogh paintings or watercolor paintings.” Imagen Video also understands 3D structure, producing “videos of objects rotating while roughly preserving structure.”

Imagen Video’s model training also includes the publicly available LAION-400M image-text dataset, which TechCrunch says “enabled it to generalize to a range of aesthetics,” pointing out that “a portion of LAION was used to train Stable Diffusion,” the image generator released by Stability AI this summer.

Imagen Video is based on a cascade of video diffusion models. “Currently, it’s in a research phase, but its appearance five months after Google Imagen points to the rapid development of video synthesis models,” writes Ars Technica. The results thus far “aren’t perfect,” according to TechCrunch, which notes that “the looping clips the system generates tend to have artifacts and noise,” that the results “are jittery and distorted in parts,” and that they contain “objects that blend together in physically unnatural — and impossible — ways.”
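To make the “cascade of video diffusion models” concrete, here is a minimal, hedged sketch of the general idea: a base model produces a short, low-resolution clip, and separate spatial and temporal super-resolution stages progressively upsample it. The function names, stage counts, and shapes below are illustrative assumptions, not Google’s actual architecture, and the “models” are stand-ins that only track tensor shapes.

```python
# Illustrative sketch (NOT Google's code): a cascaded text-to-video
# pipeline as shape bookkeeping. A base model generates a low-resolution,
# low-frame-count clip; temporal and spatial super-resolution stages
# then upsample it step by step. All shapes/stage counts are assumptions.
import numpy as np

def base_model(prompt, frames=16, height=24, width=40):
    """Stand-in for a base video diffusion model: returns a random
    'video' tensor of shape (frames, height, width, 3)."""
    rng = np.random.default_rng(0)
    return rng.random((frames, height, width, 3))

def temporal_sr(video, scale=2):
    """Stand-in temporal super-resolution: duplicates frames to raise
    the frame count (a real model would synthesize in-between frames)."""
    return video.repeat(scale, axis=0)

def spatial_sr(video, scale=2):
    """Stand-in spatial super-resolution: nearest-neighbor upsampling
    of each frame in height and width."""
    return video.repeat(scale, axis=1).repeat(scale, axis=2)

# Cascade: base -> temporal SR -> repeated spatial SR stages.
clip = base_model("a corgi flying over a city")
clip = temporal_sr(clip, scale=2)     # 16 -> 32 frames
for _ in range(3):
    clip = spatial_sr(clip, scale=2)  # 24x40 -> 192x320 over three stages

print(clip.shape)  # (32, 192, 320, 3)
```

The design point the sketch captures is that no single model has to generate high-resolution, high-frame-rate video directly; each stage solves a smaller conditional upsampling problem.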

Because these systems learn from data, output quality should improve as training data and compute scale up. Google says the program has “a high degree of controllability and world knowledge.”

TechCrunch concedes Imagen Video represents “a significant leap over the previous state-of-the-art, showing an aptitude for animating captions that existing systems would have trouble understanding.”

Meanwhile, Meta’s Make-A-Video looks, if not quite ready for prime time, at least prepared for a social media close-up. A flying corgi sporting sunglasses and a superhero cape seems like the stuff virality is made of. Make-A-Video is also in the research stage.

Google and Meta aren’t the only ones experimenting with AI-generated video. “Earlier this year, a group of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence released CogVideo, which can translate text into reasonably high-fidelity short clips,” TechCrunch writes.

And last month, Adobe released AI-powered versions of Photoshop Elements and Premiere Elements that add motion and artistic styles to stills and video.