Google Offers Public Preview of Gemini Pro for Cloud Clients

Google is moving its most powerful artificial intelligence model, Gemini 1.5 Pro, into public preview for developers and Google Cloud customers. Gemini 1.5 Pro includes what Google claims is a breakthrough in long context understanding, with the ability to process 1 million tokens of information, “opening up new possibilities for enterprises to create, discover and build using AI.” Gemini’s multimodal capabilities allow it to process audio, video, text, code and more, which, when combined with long context, “enables enterprises to do things that just weren’t possible with AI before,” according to Google.

Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech voice AI that can be used for voice AI agents, the company says. Paired with Deepgram’s Nova-2 speech-to-text API, developers can use it to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram.

Alibaba’s EMO Can Generate Performance Video from Images

Alibaba is touting a new artificial intelligence system that can animate portraits, making people sing and talk in realistic fashion. Researchers at the Alibaba Group’s Institute for Intelligent Computing developed the generative video framework, calling it EMO, short for Emote Portrait Alive. Input a single reference image along with “vocal audio,” as in talking or singing, and “our method can generate vocal avatar videos with expressive facial expressions and various head poses,” the researchers say, adding that EMO can generate videos of any duration, “depending on the length of video input.”

Pika Taps ElevenLabs Audio App to Add Lip Sync to AI Video

Pika Labs is releasing a feature called Lip Sync that lets its paid subscribers use the ElevenLabs app to add AI-generated voices and dialogue to Pika-generated videos, with the characters’ lips moving in sync with the speech. The release comes on the heels of ElevenLabs’ demo of a text-to-sound app, unveiled using clips generated by Sora, OpenAI’s text-to-video artificial intelligence platform. Pika Lip Sync supports both uploaded audio files and text-to-audio AI, allowing users to type or record dialogue, or use pre-existing sound files, then apply AI to change the voicing style.

Adobe’s Prototype AI Tool Is a ‘Photoshop for Music-Making’

Project Music GenAI Control, an experimental work from Adobe Research, is setting out to change how people create and edit custom audio and music. The prototype tool lets creators generate music from text prompts, “and then have fine-grained control to edit that audio for their precise needs,” according to Adobe. Designed to help create music for broadcasts, podcasts or other “audio that’s just the right mood, tone, and length,” it can generate music from text prompts like “powerful rock,” “happy dance” or “sad jazz,” says Adobe Research Senior Research Scientist Nicholas Bryan, a creator of the technology.

Spotify In-House Agency AUX to Connect Brands with Music

Spotify is rolling out AUX, an in-house music advisory agency for brands. “With AUX, we’ll use our deep expertise to counsel brands about how best to use music to enrich their campaigns and connect them with emerging artists to help them reach new audiences,” Spotify announced, joining Meta Platforms, YouTube, Snapchat and others in connecting creatives with brands. AUX aims to provide emerging artists with an avenue to another potential income source, as well as a path to wider exposure, as the idea is to get brands to pay Spotify to access the new service.

Latest Disney Accelerator Backs AI, VR, Autonomous Vehicles

The Walt Disney Company has selected five companies to be in its annual Accelerator program, three of them AI startups, one in robotics and one developing VR. The program, now in its tenth year, identifies promising new tech companies to benefit from Disney funding and mentorship in exchange for an inside track on talent and acquisitions. The class of 2024 includes AudioShake, which leverages AI to aid in mixing and dubbing audio tracks; ElevenLabs, which has a text-to-speech app for GenAI voicing; and Promethean AI, a digital archives search platform that informs prototype design.

ElevenLabs Promotes Its Latest Advances in AI Audio Effects

“What if you could describe a sound and generate it with AI?” asks startup ElevenLabs, which set out to do just that, and says it has succeeded. The two-year-old company explains it “used text prompts like ‘waves crashing,’ ‘metal clanging,’ ‘birds chirping,’ and ‘racing car engine’ to generate audio.” Best known for using machine learning to clone voices, the AI firm founded by Google and Palantir alums has yet to make its new text-to-sound model publicly available but began teasing it by releasing online demos this week. Some see the technology as a natural complement to the latest wave of image generators.

Amazon Claims ‘Emergent Abilities’ for Text-to-Speech Model

Researchers at Amazon have trained what they are calling the largest text-to-speech model ever created, which they claim is exhibiting “emergent” qualities: the ability to inherently improve itself at speaking complex sentences naturally. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English as well as some German, Dutch and Spanish. Text-to-speech models are used in developing voice assistants for smart devices and apps, as well as accessibility tools.

CES: Sennheiser Touts Its New Wireless Momentum Earbuds

Sennheiser has updated its flagship Momentum True Wireless earbuds, adding support for Qualcomm’s aptX audio tech. The company also debuted a Momentum Sport edition that tracks heart rate and body temperature. The Momentum True Wireless 4 promises “unparalleled sound,” combining Sennheiser’s audio expertise with Qualcomm’s S5 Sound Gen 2 platform and Snapdragon Sound Technology with aptX for lossless sound and ultra-low latency. Boasting 7.5 hours of continuous listening, the new buds come in black copper, metallic silver, and graphite for $300. The more rugged Momentum Sport with biometric features lists for $330.

CES: Samsung Updates Frame TV, Debuts New Music Frame

Samsung Electronics has updated its most popular lifestyle television, debuting The Frame TV 2024 at CES and spinning off The Music Frame, a wireless speaker with Dolby Atmos capability that also displays favorite photos or artwork. The Frame TV offers improved energy efficiency and a larger selection of display images. The Music Frame, which takes its inspiration from The Frame TV, features built-in woofers and intelligent audio processing for “a premium audio experience.” It can serve as a standalone wireless speaker or, using Q-Symphony, can provide surround sound when paired with 2024 Samsung TVs and soundbars.

CES: Voiseed Upgrades Its Platform for Expressive AI Voices

Milan-based Voiseed demonstrated its web-based Revoiceit platform at CES, pitched as the best way to manage synthetic voice actors, particularly ensuring that synthetic voices present realistic emotions. The company describes it as a cloud-based solution that uses “generative AI to infuse virtual voices with human emotions and prosody, creating highly expressive, lifelike audio experiences.” While Revoiceit’s most obvious feature is its Studio (imagine Adobe Audition devoted to second-by-second management of voices), it may well be the product’s forthcoming API that provides real value to developers of entertainment technology products.

CES: Will.i.am Discusses the Intersection of Music and Tech

Musician will.i.am of the Black Eyed Peas, who is also a noted technologist, entrepreneur, investor and philanthropist, discussed his work with Mercedes-AMG, why he attends the CES conference each year in Las Vegas, and his vision of the future. In 2022 he was asked by Mercedes to reimagine a vehicle. He loves pattern-matching, he said, and seeing how things align. After developing ideas with his team and auctioning off the working prototype WILL.I.AMG to raise funds for his inner-city education philanthropy, he went back to Mercedes with a simple but powerful pitch focused on audio.

CES: Circana Foresees an Increase in Tech Spending in 2024

The technology sector had a tough 2023. During a CES session in Las Vegas yesterday, Circana Vice President Paul Gagnon and Circana Industry Analyst Ben Arnold revealed the details of what they’ve been tracking over the last year, noting both areas of decline and “pockets of growth.” They encouraged tech vendors to look for innovative ways to sell products, targeting consumers via age group and income bracket as well as looking at geographic zones such as Mexico that are experiencing growth. The good news, they stressed, was the return of growth in the latter half of 2023.

Roku to Demo Its Pro Series TVs and Smart Picture AI at CES

Roku is following up the budget-priced, self-branded TV sets it introduced in January last year at CES with the more ambitious Roku Pro Series TV lineup debuting at next week’s CES 2024 and shipping later this spring. The 4K QLED Pro TVs will come in 55-inch, 65-inch, and 75-inch sizes retailing for under $1,500. Included are features like Mini LED local dimming for heightened contrast and deeper blacks. The Pro TVs also tap artificial intelligence for a Smart Picture feature that automatically adjusts picture and audio. The feature is scheduled to roll out to all Roku TVs this year.