Google Is Testing ‘Hosted’ GenAI Audio Summaries in Search

Google is testing podcast-like audio search summaries generated by AI. Audio Overviews uses Google’s latest Gemini models to generate “quick, conversational audio overviews for certain search queries.” It can be enabled through Google Labs, the company’s public-facing portal to AI experiments. An Audio Overview “can help you get a lay of the land, offering a convenient, hands-free way to absorb information,” Google says, noting that the feature displays search results “right within the audio player” to make it easy to delve further. Google already had AI audio summaries in NotebookLM and Gemini. Like those, Search features AI discussion “hosts.”

Character.AI Goes Wide with AvatarFX, Adds Mobile Features

Chatbot platform Character.AI is rolling out its video generator, AvatarFX, in general release after a month in closed beta. It’s also adding a sharing feature called Scenes and Streams that will serve content to Character.AI’s community feed, coming soon to mobile. Users can now tap AvatarFX to create up to five videos per day, starting by uploading a photo, choosing a voice and writing dialogue for the character. Character.AI started as 1:1 text chat in the summer of 2023. Now the company is “expanding into a multi-modal world” with “more ways for creators to build immersive narratives and dynamic experiences.”

Amazon Tests Conversational AI ‘Hear the Highlights’ Feature

Amazon is testing audio product summaries that make “AI shopping experts” available for interactive pre-purchase exploration, guiding customers through the retail experience by highlighting key product features and analyzing customer reviews. The feature — launching in the U.S. for select products — is designed to “make product research fun and convenient, like having helpful friends discuss potential purchases to make shopping easier,” the company says. The initial focus is on “products that typically require consideration before purchase,” saving time through focused discussion. Customers can tap the “Hear the Highlights” button on product detail pages in the Amazon Shopping app.

Google Upgrades GenAI Models, Debuts AI Storyteller ‘Flow’

Google is in a filmmaking frame of mind. The search giant introduced Veo 3, the latest version of its generative video model, loading it with cinematic capabilities including a new AI storytelling tool called Flow. At the Google I/O conference the company also debuted an upgraded image generator, Imagen 4, and announced expanded access to the AI music tool Lyria 2. Veo 3 can generate videos with audio, a Google first, adding things like background traffic noises, birds singing, “even dialogue between characters.” It offers improved consistency of characters, scenes and objects, while gaining camera controls, outpainting and object add/remove.

Stability AI Releases a Fast Stereo Audio-Generator for Mobile

Stability AI has released a stereo audio-generation model that is quick and lightweight enough to run on mobile devices. Called Stable Audio Open Small, the open-source model is the result of a collaboration between the AI startup and chipmaker Arm. While there are several AI-powered apps that generate audio — Suno and Udio among them — most rely on cloud processing and thus can’t be used offline. Stability says Stable Audio Open Small is also IP safe due to being trained entirely on audio from the royalty-free libraries Free Music Archive and Freesound.

Audible Using AI Narration and Translation to Expand Catalog

Amazon’s Audible audiobook service is partnering with select publishers to bring more print and e-books into the spoken word realm and is leveraging AI narration and translation to help it happen at scale. This move aims to quickly boost Audible’s product offerings so it can compete more effectively against streamers like Apple and Spotify that have rapidly expanded their literary market share. “Audiobooks are the fastest-growing format in publishing,” yet of the millions of titles available today in print and as e-books, only 2-5 percent exist in audio form, according to the company.

Character.AI Introduces New Video Generator in Closed Beta

Character.AI, a platform offering AI chatbots for socializing and role play, has released a video generation model called AvatarFX in closed beta. Promising the ability to make photorealistic images “come to life — speak, sing and emote — all with the click of a button,” the technology combines audio and video to create a variety of visual styles and voices, from realistic 3D — including “non-human faces (like a favorite pet)” — to 2D animations, according to the company. AvatarFX also has the ability “to maintain strong temporal consistency with face, hand and body movement” and can “power videos with multiple speakers.”

Instagram ‘Edits’ Video App Is Released for iOS and Android

Instagram has released a standalone video editing tool called Edits that is being described as a full-fledged suite that also has camera capabilities. The resulting content can be released on any social platform, not just those from Meta Platforms, though an Instagram account is required to access Edits. Available worldwide for iOS and Android, Edits is positioned as a way for social videographers to level up their Instagram or Facebook Reels, but also as a tool for professionals who want a simple mobile solution for short-form videos. Edits also offers analytics so creators can see how their work is performing.

Vertex AI Movie Studio Can Create Videos from Start to Score

The many tech advancements unveiled at Google Cloud Next include a major generative media upgrade to Vertex AI, Google Cloud’s managed AI development platform. The new Vertex AI Media Studio lets enterprise users generate complete videos from scratch using text prompts. Lyria, Google’s text-to-music model, is now available on Vertex in private preview. Both are subject to an “allowlist.” Chirp 3 now creates custom voices with just 10 seconds of audio input, while Imagen 3 has gained improved abilities for reconstructing missing or damaged portions of an image.

Netflix Expands Dubbing and Subtitle Options to 30 Languages

Netflix has gone multilingual, adding a feature that lets viewers choose from a list of more than 30 languages for dubbing or subtitles on any title. The option had previously been available only via mobile and Web browsers, with TV options limited to a handful of choices deemed relevant based on geographic location. Referencing some of its most popular programming — such as South Korea’s “Squid Game,” Spain’s “Berlin” and France’s “Lupin” — Netflix explains, “we know that language availability is what helped these stories and characters find fans beyond their country of origin.”

Patreon Signs Podcasting Deals with Wondery and Sony Music

Patreon, a subscription platform popular among individual creators and small companies, is expanding beyond boutique service with a network initiative that has inked Wondery and Sony Music Entertainment to podcasting deals. Patreon says podcasting is its largest category, with participants earning more than $472 million from over 6.7 million paid memberships. The figure marked a 35 percent increase from 2023. With more than 100 million total memberships, Patreon says it is “the best place on the Internet for independent podcasters and media networks alike.” The 12-year-old company provides tools for creators to connect directly with fans.

Nvidia Forges AI Initiative to Streamline Production Workflows

During Nvidia’s GTC AI Conference in San Jose earlier this month, VP and GM of Media & Entertainment Richard Kerris presented the Nvidia Media2 initiative that builds on the company’s Blackwell GPU foundation to enable real-time AI solutions for all aspects of media production workflows. His talk showcased a broad range of generative AI breakthroughs in real-time ray tracing and VFX, video search and summarization, and musically based sound effects (SFX). Kerris also shared insights on the media industry’s reception of AI thus far and humbly implored the audience to consider using such technology as an effective new tool for storytelling.

Alibaba’s Powerful Multimodal Qwen Model Is Built for Mobile

Alibaba Cloud has released Qwen2.5-Omni-7B, a new AI model the company claims is efficient enough to run on edge devices like mobile phones and laptops. Boasting a relatively light 7-billion parameter footprint, Qwen2.5-Omni-7B understands text, images, audio and video and generates real-time responses in text and natural speech. Alibaba says its combination of compact size and multimodal capabilities is “unique,” offering “the perfect foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications.” One example would be using a phone’s camera to help a vision-impaired person navigate their environment.

Google Debuts Next-Gen Reasoning Models with Gemini 2.5

Google has released what it calls its most intelligent AI model yet, Gemini 2.5. The first 2.5 model release, an experimental version of Gemini 2.5 Pro, is a next-gen reasoning model that Google says outperformed OpenAI o3-mini and Claude 3.7 Sonnet from Anthropic on common benchmarks “by meaningful margins.” Gemini 2.5 models “are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy,” according to Google. The new model comes just three months after Google released Gemini 2.0 with reasoning and agentic capabilities.

Google Launches Agentspace in the UK and Promotes Chirp 3

Google is expanding its AI presence in the UK market, hosting a splashy launch event there for Agentspace. Google in December launched Agentspace, an AI agent hub that makes it easy for enterprises to build, manage and deploy custom agents using Gemini. The gathering was hosted by Google DeepMind CEO Demis Hassabis and Google Cloud CEO Thomas Kurian, and included participation by local customers BT Group and advertising powerhouse WPP. Google invited UK businesses to store cloud data locally using its $1 billion data center, opening there this year. The company also promoted its new Chirp 3 audio generator, which offers HD voice synthesis.