Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech voice AI that can be used for voice AI agents, the company says. Paired with Deepgram’s Nova-2 speech-to-text API, developers can use it to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram.

OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search

OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem.

Meta’s Multimodal AI Model Translates Nearly 100 Languages

Meta Platforms is releasing SeamlessM4T, the world’s “first all-in-one multilingual multimodal AI translation and transcription model,” according to the company. SeamlessM4T can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages, depending on the task. “Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” Meta claims, adding that SeamlessM4T “implicitly recognizes the source languages without the need for a separate language identification model.”

OpenAI Targets Affordable AI with ChatGPT and Whisper APIs

OpenAI is now allowing third-party developers to integrate ChatGPT into their apps, a solution the company says will be a more cost-effective alternative. The language model can be used for more than chat, says OpenAI, which also has a new speech-to-text model called Whisper. The company is also touting gpt-3.5-turbo, calling it the “best model for many non-chat use cases.” With a major investment from Microsoft, and the eyes of the industry on it, OpenAI seems to be feeling some pressure to add earnings to the success it has as a thought leader.

NAB 2018: IBM Watson on Refining AI for Closed Captioning

Closed captioning isn’t just for the hard-of-hearing anymore. According to Digiday, 85 percent of Facebook video is viewed without sound. That signals a trend of viewers who prefer to watch closed captioning, putting the heat on solutions providers to come up with compliant systems that are also accurate and speedy. With artificial intelligence, says IBM Watson Media senior offering manager David Kulczar, closed captioning can be enhanced to go beyond transcription, and automatically identify background audio descriptions.

Mozilla Intros Open-Source Speech Recognition, Voice Dataset

Mozilla unveiled Project DeepSpeech and Project Common Voice to leverage the capabilities of speech recognition. The company says it has just reached “two important milestones” in the project out of its Machine Learning Group. Mozilla is releasing its open source speech recognition model, which it states is nearly as accurate as what humans can perceive from the same recordings, and is also unveiling the world’s second largest publicly available voice dataset, with contributions by almost 20,000 people around the world.

Apple Clips Launches: Cool Features, But Not Always Intuitive

Apple is debuting a standalone video app called Apple Clips that allows users to shoot, edit and share video clips for mobile phones. Apple Clips, for iOS 10.3 or higher, features real-time captioning and facial recognition as well as giant emoji, cartoon filters and lively title screens, and the end results can be distributed to iMessage contacts. Automatic captioning, dubbed Live Titles, allows the user to choose a font and style; after hitting record, the app transcribes speech to text. But less ideal features mar the app, say critics.

Microsoft Stream Offers Familiar Video Tools for Businesses

Microsoft introduced Stream, a service that lets businesses share internal video easily and securely. Now available as a free preview, Stream offers the same easy-to-use, flexible tools as YouTube, but with security tools for enterprise content. Office 365 already has a Video tool, and Microsoft’s idea is to eventually merge the two services seamlessly. Unlike Office 365, Stream will make use of tools (including likes, comments, and recommendations) found in consumer platforms such as Vimeo and YouTube.

New Hound App Could Prove Rival to Siri, Cortana, Google Now

As the battle among tech companies over artificial intelligence and digital assistants heats up, SoundHound released an app this week called “Hound” that promises to enhance voice search with its ability to quickly and efficiently handle complex questions. According to Keyvan Mohajer, SoundHound founder and chief exec, Hound has a leg up on the competition since it performs voice recognition and natural-language processing in a single step, as opposed to translating speech to text and then performing a search using that text.

Amazon Purchasing Yap: Possible Siri Rival for the Kindle Fire?

  • In a quiet acquisition deal, Amazon is purchasing Yap, a speech-to-text startup whose voice recognition technology may find its way into future Kindle products.
  • “Yap is truly a leader in freeform speech recognition and driving innovation in the mobile user experience,” says Paul Grim of SunBridge Partners, which funded Yap in 2008.
  • “Yap’s technology may give Amazon the ability to add voice controls to its tablets capable of understanding far more than the rudimentary commands currently supported by Android software, potentially allowing the company to erode Apple’s dominance,” reports Forbes.
  • Apple has yet to make a move toward installing Siri on its iPad, so Amazon could get a jump start. “If Amazon puts Yap’s technology to good use and releases tablets with intuitive voice recognition in the near future, it may give Android-powered tablets a stronger handhold in the market,” suggests the article.