Otter Adds New Generative AI Features to Its Meeting Assistant

Web-based transcription service Otter.ai is expanding its toolkit with Meeting GenAI, aimed at corporate customers who want to increase meeting productivity while decreasing effort. Multi-meeting capabilities have been added using Otter AI Chat, which can respond to queries like “What did I miss in the meetings from the past two weeks?” Conversation Summary View summarizes meetings in real time and lists automatically identified action items, each with an assigned owner, deadline and tracking. Otter is positioning itself as a David versus the Goliaths of AI meeting assistants: Microsoft Copilot, Zoom AI Companion and Google’s Gemini for Workspace.

Meta’s Multimodal AI Model Translates Nearly 100 Languages

Meta Platforms is releasing SeamlessM4T, the world’s “first all-in-one multilingual multimodal AI translation and transcription model,” according to the company. SeamlessM4T can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages, depending on the task. “Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” Meta claims, adding that SeamlessM4T “implicitly recognizes the source languages without the need for a separate language identification model.”

OpenAI Rolls Out Open-Source Speech Recognition System

OpenAI has released a new open-source AI speech recognition model called Whisper that can recognize and translate audio at levels it says compare in accuracy and robustness to human abilities. Use cases include transcription of speeches, interviews, podcasts and conversations. “Moreover, it enables transcription in multiple languages, as well as translation from those languages into English,” says OpenAI, which is open-sourcing models and inference code on GitHub “to serve as a foundation for building useful applications and for further research on robust speech processing.”

Twitter Intros Ephemeral Tweets, Gathering Spaces for Audio

Twitter is launching Fleets, a feature that allows users to post photos or text that will disappear after 24 hours. Snapchat pioneered the ephemeral post, followed by Instagram and Facebook. Rollout of the Stories-like feature is moving forward, but has been scaled back as Twitter addresses “some performance and stability problems.” The platform’s “global town square” continues to be its “marquee product” but, said Twitter director of design Joshua Harris, the Fleets feature creates a space with less pressure for users who lurk but don’t post. The company is also testing Spaces, a new audio feature similar to Clubhouse, a startup that debuted earlier this year.

Google Open-Sources Technology For Real-Time Captions

Google is looking to help developers create real-time captioning for long-form conversations in multiple languages. The company recently open-sourced the speech engine used for Live Transcribe, its Android speech-to-text transcription app designed for those who are deaf or hard of hearing, and posted the source code on GitHub. Live Transcribe, launched in February, uses machine learning algorithms to convert speech in more than 70 languages and dialects into captions in real time.

Publishers and Authors Guild Oppose Audible Text Feature

Audible, the audiobook app owned by Amazon, is using machine learning to transcribe audio recordings, so listeners can also read along with the narrator. Audible is promoting it as an educational feature, but some publishers are up in arms, demanding their books be excluded because captions are “unauthorized and brazen infringements of the rights of authors and publishers.” Publishers are concerned that this will lead to fewer people buying physical or e-books if they can get the text with an Audible audiobook.

Android Q Live Caption Feature Enables Real-Time Subtitles

During Google’s I/O 2019 developers conference this week, the company demonstrated an impressive new feature for mobile operating system Android Q. Called Live Caption, the feature enables real-time transcription for any video or audio that users play on their smartphones. Whether they’re listening or watching via YouTube, Skype, Instagram, Pocket Casts, or other applications, Live Caption overlays the text on top of whatever app is being used. Additionally, Live Caption will work on top of original video or audio recordings on users’ phones.


AI Firm Shows Multilingual Translator That Fits in Your Pocket

The iFLYTEK Translator 2.0 is a handheld spoken language translator developed with Chinese AI technology and training. The size of a mobile phone, it can translate between any two of 63 languages and is trained in a number of “professional vocabularies.” The device touts a 5-hour battery life and, at $450, would be a useful and affordable business and personal tool. This Chinese tech also raises some interesting privacy and geopolitical issues. In addition to the upgraded Translator 2.0, the company also announced its iFLYREC Series voice-to-text products, AI Note for recording and transcription, and iFLYOS voice-interaction system at CES.

NAB 2018: IBM Watson on Refining AI for Closed Captioning

Closed captioning isn’t just for the hard-of-hearing anymore. According to Digiday, 85 percent of Facebook video is viewed without sound. That signals a trend of viewers who prefer to watch closed captioning, putting the heat on solutions providers to come up with compliant systems that are also accurate and speedy. With artificial intelligence, says IBM Watson Media senior offering manager David Kulczar, closed captioning can be enhanced to go beyond transcription, and automatically identify background audio descriptions.

Facebook Messenger Will Roll Out Voice-to-Text Capabilities

Facebook will continue to improve its Messenger app this year. The standalone app already has more than 500 million monthly users, but the company is hoping to get to a billion users by the end of the year. One attractive new feature will be voice-to-text transcription. A release date has yet to be announced, but the company is already testing it. Also, Facebook will experiment with ways to generate revenue and give people a way to communicate with businesses on the Messenger app.