November 20, 2023
Having made the leap from image generation to video generation over the course of a few months in 2022, Meta Platforms introduces Emu, its first visual foundational model, along with Emu Video and Emu Edit, positioned as milestones in the trek to AI moviemaking. Emu Video uses just two diffusion models to generate four-second, 512×512 videos at 16 frames per second, Meta says, comparing that to 2022’s Make-A-Video, which requires a “cascade” of five models. Internal research found Emu Video generations were “strongly preferred” over those of Make-A-Video based on quality (96 percent) and prompt fidelity (85 percent).
“With Emu Video, which leverages our Emu model, we present a simple method for text-to-video generation based on diffusion models,” Meta says in a blog post, explaining that its new unified architecture for video-generation tasks “can respond to a variety of inputs: text only, image only, and both text and image.”
Emu (which stands for Expressive Media Universe) was announced at Meta Connect in September, and the company is now more fully demonstrating its capabilities and positioning products, while still describing Emu as “purely fundamental research.”
“Emu Video’s clips can be edited with a complementary AI model called Emu Edit,” writes TechCrunch, explaining “users can describe the modifications they want to make to Emu Edit in natural language — e.g. ‘the same clip, but in slow motion’ — and see the changes reflected in a newly generated video.”
“Imagine generating your own animated stickers or clever GIFs on the fly to send in the group chat rather than having to search for the perfect media for your reply,” Meta writes. “Or editing your own photos and images, no technical skills required. Or adding some extra oomph to your Instagram posts by animating static photos.”
“Emu Edit was trained on a massive dataset consisting of 10 million synthesized samples, making it capable of delivering high-quality results in terms of instruction faithfulness and image quality,” reports VentureBeat, noting that “a user could input the text ‘Aloha!’ to be added to an image of a baseball cap, and Emu Edit would accomplish this task without altering the cap itself.”
Meta has provided a research paper on the new model.