Speech Recognition Archives

Amazon Adds Agentic AI to ‘Connect’ Customer Service Tool

By Paula Parisi
December 2, 2025

Amazon has added agentic AI capabilities to Amazon Connect, the neural text-to-speech tool that provides AI-powered customer service support and analytics in real time. Connect is capable of neural text-to-speech in more than 30 languages and also delivers automated speech recognition. Leveraging advanced speech models from Nova Sonic, the Connect agents “deliver natural, human-like conversations, responding with the right pace, tone, and understanding across multiple languages and accents,” Amazon says. The company has also integrated third-party automated speech recognition and text-to-speech solutions from Deepgram and ElevenLabs with Connect.

OpenAI Rolls Out Open-Source Speech Recognition System

By Paula Parisi
September 26, 2022

OpenAI has released a new open-source AI speech recognition model called Whisper that can transcribe and translate audio with accuracy and robustness the company says approach human levels. Use cases include transcription of speeches, interviews, podcasts and conversations. “Moreover, it enables transcription in multiple languages, as well as translation from those languages into English,” says OpenAI, which is open-sourcing models and inference code on GitHub “to serve as a foundation for building useful applications and for further research on robust speech processing.”

Companies Turn to AI for New Approaches to Audio Solutions

By Paula Parisi
January 18, 2022

To understand speech visually, by reading lips, in addition to aurally, is an advantage for which AI has been waiting, according to researchers at Meta Platforms (formerly Facebook). The company says it has developed a framework that learns by watching, Audio-Visual Hidden Unit BERT (AV-HuBERT), and that it is 75 percent more accurate than competing automated speech recognition systems on several metrics. Meta claims that AV-HuBERT outperforms the former best audiovisual speech recognition system with only one-tenth the input, which makes it potentially useful for languages with little or no audio data.

Microsoft to Buy AI and Speech Recognition Provider Nuance

By Debra Kaufman
April 14, 2021

Microsoft is on track to acquire Nuance Communications, an AI and speech recognition software company, for about $16 billion. The company intends to expand its offerings in medical computing; Nuance already has speech and text data related to healthcare, an established customer base and the transcription tool Dragon. According to Microsoft, the purchase will “double the size of the healthcare market where it competed to almost $500 billion.” With the purchase, Microsoft could also develop advanced AI solutions for the workplace across numerous industries. Microsoft’s last big purchase was LinkedIn, for $26.2 billion in 2016.

Facebook Using Self-Supervised Models to Build AI Systems

By Debra Kaufman
March 16, 2021

Facebook debuted Learning from Videos, a project designed to learn audio, images and text from publicly available Facebook videos to improve its core AI systems. By culling data from hundreds of languages and countries, said Facebook, the project will also help to enable “entirely new experiences.” Learning from Videos, which began in 2020, has also helped to improve recommendations in Instagram Reels. Facebook, Google and others are focused on self-supervised techniques rather than labeled datasets to improve AI.

CES: Panel Examines Issues of Gender and Racial Bias in AI

By Debra Kaufman
January 13, 2021

During a CES 2021 panel moderated by The Female Quotient chief executive Shelley Zalis, AI industry executives probed issues related to gender and racial bias in artificial intelligence. Google head of product inclusion Annie Jean-Baptiste, SureStart founder and chief executive Dr. Taniya Mishra and ResMed senior director of health economics and outcomes research Kimberly Sterling described the parameters of such bias. At Google, Jean-Baptiste noted that, “the most important thing we need to remember is that inclusion inputs lead to inclusion outputs.”

BBC Partners with Microsoft to Release Beeb Voice Assistant

By Debra Kaufman
June 4, 2020

The BBC partnered with Microsoft to release an early version of Beeb, its digital voice assistant. Its U.K. debut will be part of Microsoft’s Windows Insider Program to encourage users to help improve Beeb prior to a wider rollout. The BBC first announced Beeb last year, noting that the aim was to integrate voice services into all its products. The public broadcaster will collect data by requiring users to log in to Beeb with their BBC accounts, but such data will not be used for targeted ads.

Apple Researchers Improving Accuracy of Virtual Assistant

By Debra Kaufman
February 5, 2020

Over 500 million people worldwide use Apple’s virtual assistant Siri. Apple, focused on improving Siri’s capabilities, published research on how to improve voice trigger detection, speaker verification and language identification for multiple speakers. Apple researchers suggest training a single AI model for both automatic speech recognition and speaker recognition. Rather than approach these as two independent tasks, the researchers showed that the tasks can help one another to “estimate both properties.”

CES 2020: The Next Decade Brings the Intelligence of Things

By Debra Kaufman
January 6, 2020

At Sunday’s opening CES event, CTA’s VP of research Steve Koenig and director of research Lesley Rohrbaugh revealed trends for CES 2020, as we move “into the data age.” “In the previous decade, we could describe the dynamic in hardware, software, apps and even content as IoT, the Internet of Things,” said Koenig. “In the new decade, we’ll be increasingly confronted with a new IoT: the Intelligence of Things. This new IoT bears testimony to the fact that AI is permeating commerce and culture.”

Congress Calls For End to Tech Firms’ Audio Transcriptions

By Debra Kaufman
August 16, 2019

A bipartisan group of Congress members castigated Facebook for hiring contractors to transcribe audio clips and urged regulation to prevent it in the future. The transcriptions were made to help Facebook improve its artificial intelligence-enabled speech recognition, and are part of a move to improve the capabilities of voice assistants (Amazon, Apple and Google are among companies that have taken similar approaches). Last year, Senator Ron Wyden (D-Oregon) circulated a draft law that would impose steep fines and even prison terms on executives who failed to protect users’ personal data.

Google Stops Human Review of Assistant Voice Clips in EU

By Debra Kaufman
August 7, 2019

Google is pausing Google Assistant voice transcriptions in the European Union for at least three months. In mid-July, it admitted that about 1,000 private communications were made available to human contractors evaluating Assistant’s speech recognition accuracy, revealing personal and private information. A Google spokesperson reported that the company ceased voice transcription involving human moderators after learning of additional leaks in the Netherlands. Amazon will allow Alexa users to opt out of the human review of recordings and Apple has halted its program allowing human contractors to listen in on Siri recordings.

Google and Amazon Use AI to Improve Speech Recognition

By Debra Kaufman
April 24, 2019

Google’s artificial intelligence researchers made an unexpected discovery with their new SpecAugment data augmentation model for automatic speech recognition. Rather than augmenting input audio waveforms, SpecAugment applies augmentation directly to the audio spectrogram. Researchers discovered, to their surprise, that models trained with SpecAugment outperformed all other speech recognition methods, even without a language model. Amazon also revealed research on improving Alexa’s speech recognition by 15 percent.
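
To illustrate what applying augmentation directly to the spectrogram can look like, here is a minimal sketch of the masking idea in Python. It is not Google's implementation: the function name `spec_augment`, its parameters and the list-of-rows spectrogram layout are assumptions made for illustration, and the time-warping step of the published method is left out.

```python
import random


def spec_augment(spectrogram, max_freq_mask=2, max_time_mask=3, rng=None):
    """Return a copy of the spectrogram with one frequency band and one time span zeroed.

    `spectrogram` is assumed to be a list of rows, one per frequency bin,
    with each row holding one value per time frame (hypothetical layout).
    """
    rng = rng or random.Random()
    num_freq = len(spectrogram)
    num_time = len(spectrogram[0])
    # Copy every row so the caller's data is left untouched.
    augmented = [row[:] for row in spectrogram]

    # Frequency mask: zero a randomly placed band of adjacent frequency bins.
    f_width = rng.randint(0, min(max_freq_mask, num_freq))
    f_start = rng.randint(0, num_freq - f_width)
    for f in range(f_start, f_start + f_width):
        augmented[f] = [0.0] * num_time

    # Time mask: zero a randomly placed span of adjacent time frames.
    t_width = rng.randint(0, min(max_time_mask, num_time))
    t_start = rng.randint(0, num_time - t_width)
    for row in augmented:
        for t in range(t_start, t_start + t_width):
            row[t] = 0.0

    return augmented
```

Calling `spec_augment` on each training example with a fresh random state yields a differently masked spectrogram every time, so a model cannot rely on any single frequency band or moment in time.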

Startup Within to Release Augmented Reality App for Children

By Debra Kaufman
November 9, 2018

Los Angeles-based immersive media startup Within plans to release Wonderscope, an augmented reality app for children, later this month. With Wonderscope, mobile AR superimposes characters, scenes and stories onto an iPad camera view of a real-world environment. Within chief executive Chris Milk noted that, with Wonderscope and a smartphone, anyone can have “this new magical ability.” “It’s like a lens for invisible magical things that you couldn’t see with your naked eye,” he added.

Microsoft Builds on Existing Tech, Voices Moral Conscience

By Debra Kaufman
May 9, 2018

At its Build developer conference this week, Microsoft is showing products that highlight its changed direction under the aegis of chief executive Satya Nadella. Among them is a DJI drone loaded with Microsoft software to identify oil pipeline faults without an Internet connection. Although Microsoft is helping customers enhance their existing gear, the company promised “big things ahead” to those entirely in the Microsoft ecosystem. Because Microsoft has been uninvolved in recent data scandals, some deem it the tech industry’s moral conscience.

NAB 2018: IBM Watson on Refining AI for Closed Captioning

By Debra Kaufman
April 13, 2018

Closed captioning isn’t just for the hard-of-hearing anymore. According to Digiday, 85 percent of Facebook video is viewed without sound. That signals a trend of viewers who prefer to watch closed captioning, putting the heat on solutions providers to come up with compliant systems that are also accurate and speedy. With artificial intelligence, says IBM Watson Media senior offering manager David Kulczar, closed captioning can be enhanced to go beyond transcription, and automatically identify background audio descriptions.