Stability AI Advances Image Generation with Stable Cascade

Stability AI, purveyor of the popular Stable Diffusion image generator, has introduced a completely new model called Stable Cascade. Now in preview, Stable Cascade uses a different architecture than Stable Diffusion’s SDXL that the UK company’s researchers say is more efficient. Cascade builds on a compression architecture called Würstchen (German for “sausage”) that Stability began sharing in research papers early last year. Würstchen is a three-stage process that includes two-step encoding. It uses fewer parameters, meaning less data to train on, greater speed and reduced costs. Continue reading Stability AI Advances Image Generation with Stable Cascade

Google Takes New Approach to Create Video with Lumiere AI

Google has come up with a new approach to high resolution AI video generation with Lumiere. While most GenAI video models output individual high resolution frames at various points in the sequence (called “distant keyframes”), fill in the missing frames with low-res images to create motion (known as “temporal super-resolution,” or TSR), then up-res that connective tissue (“spatial super-resolution,” or SSR) of non-overlapping frames, Lumiere takes what Google calls a “Space-Time U-Net architecture,” which processes all frames at once, “without a cascade of TSR models, allowing us to learn globally coherent motion.” Continue reading Google Takes New Approach to Create Video with Lumiere AI

VideoPoet: Google Launches a Multimodal AI Video Generator

Google has unveiled a new large language model designed to advance video generation. VideoPoet is capable of text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. “The leading video generation models are almost exclusively diffusion-based,” Google says, citing Imagen Video as an example. Google finds this counter intuitive, since “LLMs are widely recognized as the de facto standard due to their exceptional learning capabilities across various modalities.” VideoPoet eschews the diffusion approach of relying on separately trained tasks in favor of integrating many video generation capabilities in a single LLM. Continue reading VideoPoet: Google Launches a Multimodal AI Video Generator

Nvidia’s New AI Method Can Reconstruct an Image in Seconds

Nvidia debuted a deep learning method that can edit or reconstruct an image that is missing pixels or has holes via a process called “image inpainting.” The model can handle holes of “any shape, size, location or distance from image borders,” and could be integrated in photo editing software to remove undesirable imagery and replace it with a realistic digital image – instantly and with great accuracy. Previous AI-based approaches focused on rectangular regions in the image’s center and required post processing. Continue reading Nvidia’s New AI Method Can Reconstruct an Image in Seconds