VideoPoet: Google Launches a Multimodal AI Video Generator

Google has unveiled a new large language model designed to advance video generation. VideoPoet is capable of text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. “The leading video generation models are almost exclusively diffusion-based,” Google says, citing Imagen Video as an example. Google finds this counter intuitive, since “LLMs are widely recognized as the de facto standard due to their exceptional learning capabilities across various modalities.” VideoPoet eschews the diffusion approach of relying on separately trained tasks in favor of integrating many video generation capabilities in a single LLM. Continue reading VideoPoet: Google Launches a Multimodal AI Video Generator