Stable Diffusion Archives

Consistency Is Key: Lessons on Generative AI via ‘The Bends’

By Ben Abergel and Rachel Jobin
September 23, 2025

In less than three years, generative AI has evolved from an experimental toy to a regular presence in studio pitches, previs workflows, and even the festival circuit. Yet one challenge has stymied the full adoption of generative AI in long-form storytelling: establishing and maintaining control over outputs. This challenge also fuels many of the anxieties surrounding the use of artificial intelligence in media production. How can artists maintain their creative voice when a machine is doing all the artistic work, and often doing so with inconsistent results? The Entertainment Technology Center at USC set out to tackle these and related challenges with a new film project, “The Bends.”

DeepSeek Follows Its R1 LLM Debut with Multimodal Janus-Pro

By Paula Parisi
January 30, 2025

Less than a week after sending tremors through Silicon Valley and across the media landscape with an affordable large language model called DeepSeek-R1, the Chinese AI startup behind that technology has debuted another new product: the multimodal Janus-Pro-7B, with an aptitude for image generation. Further mining the vein of efficiency that made R1 impressive to many, Janus-Pro-7B utilizes “a single, unified transformer architecture for processing.” Emphasizing “simplicity, high flexibility and effectiveness,” DeepSeek says Janus Pro is positioned to be a frontrunner among next-generation unified multimodal models.

Black Forest Labs Announces Suite of Text-to-Image Models

By Paula Parisi
August 6, 2024

A new generative AI startup called Black Forest Labs has hit the scene, debuting with a suite of text-to-image models branded FLUX.1. Based in Germany, Black Forest was founded by some of the researchers involved in developing Stable Diffusion and has raised $31 million in funding from principal investor Andreessen Horowitz and angels including CAA founder and former talent agent Michael Ovitz. The FLUX.1 suite focuses on “image detail, prompt adherence, style diversity and scene complexity,” the company says of its three initial variants: FLUX.1 [pro], FLUX.1 [dev] and FLUX.1 [schnell].

Canva Aims to Boost Its GenAI Efforts with Leonardo Purchase

By Rob Scott
August 1, 2024

Graphic design company Canva announced it is acquiring fellow Australian startup Leonardo AI with plans to have Leonardo’s 120 employees, including executives, join the Canva AI team. Financial terms of the deal were not disclosed. Sydney-based Leonardo has been gaining attention for its advanced generative AI platform that helps users create images and art based on the open-source Stable Diffusion model developed by Stability AI. The Leonardo team claims its offering is different from other AI art platforms since it provides users with more control. Users can experiment with text prompts and quick sketches as Leonardo.ai creates photorealistic images in real time.

Stable Video 4D Adds Time Dimension to Generative Imagery

By Paula Parisi
July 29, 2024

Stability AI has unveiled an experimental new model, Stable Video 4D, which generates photorealistic 3D video. Building on what it created with Stable Video Diffusion, released in November, this latest model can take moving image data of an object and iterate it from multiple angles, generating up to eight different perspectives. Stable Video 4D can generate five frames across eight views in about 40 seconds using a single inference, according to the company, which says the model has “future applications in game development, video editing, and virtual reality.” Users begin by uploading a single video and specifying desired 3D camera poses.

New Prototype Is the World’s First AI-Powered Movie Camera

By Paula Parisi
July 1, 2024

The world’s first AI-powered movie camera has surfaced. Still in development, it aims to enable filmmakers to turn footage into AI imagery in real time while shooting. Called the CMR-M1, for camera model 1, it is the product of creative tech agency SpecialGuestX and media firm 1stAveMachine, with the goal of providing creatives with a familiar interface for AI imagemaking. It was inspired by the Cine-Kodak device, the first portable 16mm camera. “We designed a camera that serves as a physical interface to AI models,” said Miguel Espada, co-founder and executive creative technologist at SpecialGuestX, a company that does not think directors will use AI sitting at a keyboard.

New Tech from MIT, Adobe Advances Generative AI Imaging

By ETCentric Staff
March 28, 2024

Researchers from the Massachusetts Institute of Technology and Adobe have unveiled a new AI acceleration tool that makes generative apps like DALL-E 3 and Stable Diffusion up to 30x faster by reducing the process to a single step. The new approach, called distribution matching distillation, or DMD, maintains or enhances image quality while greatly streamlining the process. Theoretically, the technique “marries the principles of generative adversarial networks (GANs) with those of diffusion models,” consolidating “the hundred steps of iterative refinement required by current diffusion models” into one step, MIT PhD student and project lead Tianwei Yin says.

Stable Video 3D Generates Orbital Animation from One Image

By ETCentric Staff
March 25, 2024

Stability AI has released Stable Video 3D, a generative video model based on the company’s foundation model Stable Video Diffusion. SV3D, as it’s called, comes in two versions. Both can generate and animate multi-view 3D meshes from a single image. The more advanced version also lets users set “specified camera paths” for a “filmed” look to the video generation. “By adapting our Stable Video Diffusion image-to-video diffusion model with the addition of camera path conditioning, Stable Video 3D is able to generate multi-view videos of an object,” the company explains.

Alibaba’s EMO Can Generate Performance Video from Images

By ETCentric Staff
March 11, 2024

Alibaba is touting a new artificial intelligence system that can animate portraits, making people sing and talk in realistic fashion. Researchers at the Alibaba Group’s Institute for Intelligent Computing developed the generative video framework, calling it EMO, short for Emote Portrait Alive. Input a single reference image along with “vocal audio,” as in talking or singing, and “our method can generate vocal avatar videos with expressive facial expressions and various head poses,” the researchers say, adding that EMO can generate videos of any duration, “depending on the length of video input.”

Stability AI Advances Image Generation with Stable Cascade

By ETCentric Staff
February 16, 2024

Stability AI, purveyor of the popular Stable Diffusion image generator, has introduced a completely new model called Stable Cascade. Now in preview, Stable Cascade uses a different architecture than Stable Diffusion’s SDXL that the UK company’s researchers say is more efficient. Cascade builds on a compression architecture called Würstchen (German for “sausage”) that Stability began sharing in research papers early last year. Würstchen is a three-stage process that includes two-step encoding. It uses fewer parameters, meaning less data to train on, greater speed and reduced costs.

Google Takes New Approach to Create Video with Lumiere AI

By Paula Parisi
January 26, 2024

Google has come up with a new approach to high-resolution AI video generation with Lumiere. Most GenAI video models output individual high-resolution frames at various points in the sequence (called “distant keyframes”), fill in the missing frames with low-res images to create motion (known as “temporal super-resolution,” or TSR), then up-res that connective tissue (“spatial super-resolution,” or SSR) of non-overlapping frames. Lumiere instead uses what Google calls a “Space-Time U-Net architecture,” which processes all frames at once, “without a cascade of TSR models, allowing us to learn globally coherent motion.”

CES: The Asus ROG Phone 8 Series Highlights Mobile Gaming

By Paula Parisi
January 24, 2024

The Asus ROG Phone 8 series, demonstrated at CES 2024 in Las Vegas last week, is generating excellent reviews for its gaming capabilities and additional praise for its functionality as a smartphone. The devices start at $1,100 and tick up to an entry level of $1,500 for the ROG Phone 8 Pro. Asus calls the ROG Phone 8 series “the biggest redesign in its history,” and says it has evolved from just a gaming phone into a device suitable for streamers and content creators. At the heart of that is Qualcomm’s Snapdragon 8 Gen 3 Mobile Platform, supported by 8,533 Mbps LPDDR5X RAM and UFS 4.0 storage.

CES: HP Spectre Laptops Get Intel Core Ultra, 9MP Webcam

By Paula Parisi
January 23, 2024

HP has updated its popular flagship laptop, the HP Spectre x360, and the early reviews are quite impressive. HP has added Intel Core Ultra processors with neural processing for AI tasks, a 9MP webcam and Wi-Fi 7 capability. The Spectre x360 14 features a 14-inch screen and Intel Arc integrated graphics processing, while the Spectre x360 16 screen is two inches larger and includes the option to add an Nvidia GeForce RTX 4050 GPU. Both OLED screens display at 2,880 x 1,800, 120 Hz, with VESA True Black HDR 400. The 2-in-1 laptops use Intel’s latest H series chips, which are 14th generation, Meteor Lake, integrating both x86 and Arm cores on the same chip.

VideoPoet: Google Launches a Multimodal AI Video Generator

By Paula Parisi
December 22, 2023

Google has unveiled a new large language model designed to advance video generation. VideoPoet is capable of text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. “The leading video generation models are almost exclusively diffusion-based,” Google says, citing Imagen Video as an example. Google finds this counterintuitive, since “LLMs are widely recognized as the de facto standard due to their exceptional learning capabilities across various modalities.” VideoPoet eschews the diffusion approach of relying on separately trained tasks in favor of integrating many video generation capabilities in a single LLM.

Standalone Image Generator Is Among New AI Tools by Meta

By Paula Parisi
December 8, 2023

Meta Platforms is moving Imagine with Meta from its test bed as a generative AI experience in chats to a standalone experience on the Web that allows users to create high-resolution images using natural language text prompts. That is one of more than 20 generative AI features Meta is deploying to create new business opportunities globally leveraging AI across search, ads, business messaging and more. While most will wind up on Facebook, Instagram, Messenger and WhatsApp, some say Meta’s popular Facebook and Instagram platforms have plateaued at 2 to 3 billion users per month, circumscribing ad growth.