Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

Microsoft has developed VASA, a framework for generating lifelike virtual characters with vocal capabilities including speaking and singing. The premiere model, VASA-1, can perform the feat in real time from a single static image and a vocalization clip. The research demo showcases realistic audio-enhanced faces that can be fine-tuned to look in different directions or change expression in video clips of up to one minute at 512 x 512 pixels and up to 40fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”

Amazon Claims ‘Emergent Abilities’ for Text-to-Speech Model

Researchers at Amazon have trained what they are calling the largest text-to-speech model ever created, which they claim is exhibiting “emergent” qualities: the ability to inherently improve itself at speaking complex sentences naturally. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English as well as some German, Dutch and Spanish. Text-to-speech models are used in developing voice assistants for smart devices and apps, and in accessibility tools.

Meta AI Seamless Translator Converts Nearly 100 Languages

The research division of Meta AI has developed Seamless Communication, a suite of artificial intelligence models that generate what the company says is natural and authentic communication across languages, facilitating what amounts to real-time universal speech translation. The models were released with accompanying research papers and data. The flagship model, Seamless, merges capabilities from a trio of models (SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2) into a single system that can translate between almost 100 spoken and written languages, preserving idioms, emotion and the speaker’s vocal style, Meta says.

Adobe Reveals Its New AI Tool for Editing Problematic Audio

Adobe has unveiled Project Sound Lift, an AI-powered technology that separates speech recordings into discrete tracks of voices, non-speech sounds and other background noise in video. The company describes Project Sound Lift as “a one-click solution” that leverages AI to help users easily manipulate audio recordings “across a range of scenarios” to “enhance, transform, and control speech and sound independently.” Adobe’s existing Enhance Speech technology, available in the company’s Premiere Pro editing program, has been integrated within Project Sound Lift to aid creators in producing studio-quality audio content.

OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search

OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem.

Meta Creates Voicebox Generative AI Model for Audio Synth

Meta Platforms has unveiled Voicebox, an AI model that can produce high-quality audio clips and edit pre-recorded audio. It also uses artificial intelligence for speech generation efforts, using what Meta calls “in-context learning” to accomplish tasks it was not specifically trained for. The company says Voicebox is first in class with this type of generalized learning for audio. Untrained tasks include sampling, stylizing and editing. As an editor, it can isolate and remove sounds like car horns and background animal noise while preserving the content and style of the source audio. The multilingual model generates speech in six languages.

CES: Startup Leverages AI to Address Problematic Acoustics

A growing number of companies are working on technologies that strive to make a person’s voice more intelligible to the listener over speakers, headphones, hearing aids and other consumer audio devices. Augmented Hearing, a Danish startup launched two years ago, is one of the more interesting companies at CES 2023 focusing on this space. The firm’s software-based solution runs on iOS, Windows and other CE operating systems. Its solution could mitigate the current trend of people across all age groups turning on closed captioning because they often find video dialogue difficult to understand.

Microsoft Project Oxford Updates Could Bring AI to More Apps

Following announcements that Google is releasing its TensorFlow machine learning platform so developers can create their own artificial intelligence programs, and Nvidia has made a significant update to its Jetson TX1 supercomputer-on-a-chip, Microsoft is the latest with major AI news. The company has updated its Project Oxford suite of AI tools with powerful new features and programs designed to identify human emotions and voices, for example, that could make their way into the apps we use on a daily basis.

Google Using RankBrain Artificial Intelligence Tech for Search

Google is now relying on artificial intelligence, with a system dubbed RankBrain, for a small but significant part of its search business. Since Google is identified with search, keeping on the bleeding edge of search technology is critical to its dominance, and Google has been researching artificial intelligence (software that learns about the world) for over five years. Prior to launching RankBrain for search, Google had been a big corporate sponsor of AI, investing in it for videos, speech and translation.

Breakthrough in AI Technology Mimics Synapses in the Brain

Researchers from Nanyang Technological University in Singapore have developed a microfiber technology that enables them to build brain-like computers. “Photonic synapses” are collections of microfibers that pass electronic signals. The optical fibers can send signals at the speed of light, much faster than the neurons in real brains. This breakthrough could provide a boost to both robotics and AI technology. Improved vehicle control, speech, and search are just some of the possible applications.

Building Tomorrow’s Search Engines to Sense as Humans Do

In the past decade and a half, there have been only minimal modifications to Google Search. The popular search engine functions as it always has; one enters a query into the type box and in return is given a list of instantaneous results based on the keywords. Although the search engine continues to be effective, Stefan Weitz, senior director of search at Microsoft’s Bing, predicts the search engine of tomorrow will be much more advanced and proactive than anything we have today.

Social: Facebook Acquires Startup with Siri-Like Technology

Facebook has reportedly acquired Wit.ai, a Palo Alto-based startup with an API that allows developers to make use of voice recognition and natural language processing technology for their products. Although Facebook has not yet disclosed any details about Wit moving forward, it seems that the social network’s instant messaging app Messenger will likely be a part of the plan. The voice recognition technology that Wit provides Facebook is in line with that of Apple’s Siri.

FCC Chairman Explains Next Steps to Protect an Open Internet

In a blog post yesterday, FCC Chairman and former telecom lobbyist Tom Wheeler wrote that he is “a strong believer in the importance of an Open Internet.” In response to what Wheeler views as “misinformed” commentaries regarding the Open Internet Notice of Proposed Rulemaking (NPRM) currently before the FCC, he offers two points of clarification: 1) This is not a final decision, but a formal request for input on the proposal, and 2) “all options for protecting and promoting an Open Internet are on the table.”

Newsbeat Creates Custom Radio Show Based on Your Interests

Last week the Tribune Company released a new iOS and Android app called Newsbeat, which aims to change how we consume our daily news by offering a more personalized podcast-like experience. Newsbeat has access to more than 7,000 sources, from major newspapers to smaller blogs. Users can specify what types of stories and publications they are interested in, and the app will create a customized newscast by using Pandora-like artificial intelligence technology.

Will Wearable Tech Have a Future in Entertainment Media?

Even a cursory look at the news coming out of CES makes it clear that wearables have garnered a lot of the buzz. Smartwatches, augmented reality headsets, digital health solutions and fitness tracking monitors are all the rage here. What’s not clear is if wearables will ever intersect with the entertainment industry. Although the question itself may seem risible, it’s worth remembering that most people dismissed the mobile phone as an entertainment device only a few years ago.