Generative Audio Archives

Google Veo 3.1 Advances Generative Video in Flow and Vertex

By Paula Parisi
October 24, 2025

Google has released Veo 3.1 and Veo 3.1 Fast in paid preview, adding new capabilities to the generative video model that is already a leader in the field. Creative and technical upgrades include richer native audio from dialogue to sound effects, greater understanding of cinematic styles and better prompt adherence. The two new models are available via the Gemini API in Google AI Studio and Vertex AI, with Veo 3.1 also available in the Gemini app and the storytelling tool Flow, which now gets native audio. Flow has generated more than 275 million videos since its release at Google I/O in May, according to the company.

OpenAI Sora 2 Vid Generator Has Sound and Social Features

By Paula Parisi
October 2, 2025

Sora 2 is here, “marking a giant leap forward in realism,” claims OpenAI. And it includes sound and dialogue generation, catching up to Google’s Veo 3. Coming nearly two years after Sora was first introduced, the new model is being released in conjunction with a free iOS social app with a vertical feed and “swipe-and-scroll” functionality like TikTok, YouTube Shorts and Instagram Reels. Available in the U.S. and Canada, the free version, which currently requires an invitation, is also available at sora.com. ChatGPT Pro subscribers can access an experimental, higher-quality Sora 2 Pro model online only.

Alibaba’s Qwen3-Omni AI Ingests Text, Images, Audio, Video

By Paula Parisi
September 24, 2025

Alibaba Cloud’s newest AI model, Qwen3-Omni-30B-A3B, has debuted with a splash. The Chinese company is touting it as “the first natively end-to-end omni-modal AI unifying text, image, audio & video in one model.” While Qwen3-Omni can accept prompts of text, image, audio and video, it only outputs text and audio. Alibaba Cloud has released three versions of Qwen3-Omni so users can select based on their needs, choosing between general multimodal capabilities, deep reasoning or specialized audio understanding. Alibaba has also developed an AI chip called T-Head that performs comparably to Nvidia’s H20.

YouTube Unveils Array of New AI Tools, Including Video Model

By Paula Parisi
September 18, 2025

AI-powered creator tools were a central focus at this week’s Made on YouTube event, where the company unveiled Veo 3 Fast, a “quicker and more cost effective version” of Google’s powerful video model, Veo 3, that is being made available free for creators in YouTube Shorts. Veo 3 Fast generates outputs with lower latency at 480p and also generates sound, a first for YouTube Shorts. YouTube created the mobile-first Veo 3 Fast model in partnership with artificial intelligence research lab Google DeepMind. Video creators will soon be able to leverage Veo 3 Fast to add backgrounds and objects or change styles using prompts.

Adobe Adds Generative Audio and Text-to-Avatar to Firefly AI

By Paula Parisi
July 21, 2025

Adobe’s Firefly Video model has introduced updates including Generate Sound Effects, in beta, and a text-to-avatar feature that lets users turn scripts into avatar-led videos “in just a few clicks.” Firefly becomes the second video model to generate audio, joining Veo 3, although unlike Google’s AI video tool, Firefly does not yet generate dialogue. What it can do is output foley-like sound and sound effects, while text-to-avatar can generate speech. As with Firefly’s generative visuals, Adobe says Generate Sound Effects is “commercially safe,” which means it is trained only on licensed or publicly available material.

OpenAI: sCM Generates Media 50x Faster Than Other Models

By Paula Parisi
October 28, 2024

OpenAI is taking a new approach to generating media that it says is 50 times faster than the models commonly used today. Called sCM, the approach is a “consistency model,” a variation on the diffusion method used by many leading systems. OpenAI claims its new model is ideal for training on large-scale datasets and generating video, audio and images that are of “comparable sample quality to leading diffusion models.” Diffusion models often require hundreds of sampling steps, creating challenges when it comes to real-time applications. OpenAI aims to change this with a faster system that requires less power.