Yasa-1: Startup Reka Launches New AI Multimodal Assistant

Startup Reka AI is releasing in preview its first artificial intelligence assistant, Yasa-1. The multimodal AI is described as “a language assistant with visual and auditory sensors.” The year-old company says it “trained Yasa-1 from scratch,” including pretraining foundation models “from ground zero,” then aligning them and optimizing to its training and server infrastructures. “Yasa-1 is not just a text assistant, it also understands images, short videos and audio (yes, sounds too),” said Reka AI co-founder and Chief Scientist Yi Tay. Yasa-1 is available via Reka’s APIs and as docker containers for on-site or virtual private cloud deployment. Continue reading Yasa-1: Startup Reka Launches New AI Multimodal Assistant

Meta’s Multimodal AI Model Translates Nearly 100 Languages

Meta Platforms is releasing SeamlessM4T, the world’s “first all-in-one multilingual multimodal AI translation and transcription model,” according to the company. SeamlessM4T can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages, depending on the task. “Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively,” Meta claims, adding that SeamlessM4T “implicitly recognizes the source languages without the need for a separate language identification model.” Continue reading Meta’s Multimodal AI Model Translates Nearly 100 Languages

YouTube Introduces Multi-Language Audio Tracks Worldwide

Following several months of tests, YouTube is launching is multi-language audio track feature worldwide, with popular vlogger MrBeast helping to promote the new feature’s benefits. MrBeast, who has over 135 million global subscribers, is hoping to attract new subscribers to his channel now that the most popular videos are dubbed into 11 different languages. The multi-language audio feature allows creators to dub new and existing videos. YouTube says more than 3,500 multi-language videos have been uploaded to the site in 40-plus languages since January of this year. Continue reading YouTube Introduces Multi-Language Audio Tracks Worldwide

Facebook Adds 24 Languages to Rosetta Translation Feature

Facebook’s Rosetta is a machine learning system that extracts text in many languages from over one billion images in a real time. Facebook built its own optical character recognition system that can process such huge amount of content, day in and day out. In a recent blog post, Facebook explained how Rosetta works, using a convolutional neural network to recognize and transcribe text, even non-Latin alphabets and non-English words. The system was trained with a mix of human- and machine-annotated public images. Continue reading Facebook Adds 24 Languages to Rosetta Translation Feature