Meta AI Seamless Translator Covers Nearly 100 Languages

Meta’s AI research division has developed Seamless Communication, a suite of artificial intelligence models that the company says enable natural, authentic communication across languages, amounting to real-time universal speech translation. The models were released with accompanying research papers and data. The flagship model, Seamless, merges the capabilities of three component models (SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2) into a single system that can translate between nearly 100 spoken and written languages while preserving idioms, emotion and the speaker’s vocal style, Meta says.

“The Seamless translator represents a new frontier in the use of AI for communication across languages,” writes VentureBeat, noting that the three models each target a different aspect of communication, with SeamlessExpressive focused on maintaining the vocal style and emotional nuance of the speaker’s voice.

As described in the Meta AI research paper: “Translations should capture the nuances of human expression. While existing translation tools are skilled at capturing the content within a conversation, they typically rely on monotone, robotic text-to-speech systems for their output.”

SeamlessStreaming offers “near realtime translation with only about two seconds of latency,” VentureBeat says, underscoring the researchers’ claims that “it is the ‘first massively multilingual model’ to deliver such fast translation speeds across nearly 100 spoken and written languages.”
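
To make the latency claim concrete: streaming translation consumes audio in small increments and emits partial output as it goes, so the delay is bounded by the chunk size rather than by the length of the utterance. The loop below is a purely illustrative sketch of that idea; translate_chunk() is a hypothetical stand-in, not SeamlessStreaming’s actual interface.

```python
# Illustrative sketch only: chunked ("streaming") inference caps latency at
# roughly the chunk length. translate_chunk() is a hypothetical stand-in and
# is NOT SeamlessStreaming's real API.
SAMPLE_RATE = 16_000                              # input audio samples per second
CHUNK_SECONDS = 2.0                               # budget comparable to the ~2s cited above
CHUNK_SAMPLES = int(SAMPLE_RATE * CHUNK_SECONDS)

def translate_chunk(samples):
    """Hypothetical: a real streaming model refines a running hypothesis here."""
    return f"<partial translation covering {len(samples) / SAMPLE_RATE:.1f}s of speech>"

def stream_translate(audio):
    """Emit partial output every CHUNK_SECONDS instead of after the full utterance."""
    for start in range(0, len(audio), CHUNK_SAMPLES):
        yield translate_chunk(audio[start:start + CHUNK_SAMPLES])

# A 10-second utterance yields five partials, each ready ~2s after its audio
# arrives, rather than one translation arriving ~10s after the speaker stops.
for partial in stream_translate([0.0] * (SAMPLE_RATE * 10)):
    print(partial)
```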

SeamlessM4T v2 is essentially the foundation model for the other two. An upgraded version of last year’s SeamlessM4T, it has a new architecture that provides “improved consistency between text and speech output,” Meta researchers say, suggesting the Seamless tech advances the idea of a Universal Speech Translator “from a science fiction concept into a real-world technology.”
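
The article doesn’t show usage, but since the models were released publicly, a minimal sketch is possible. The snippet below assumes the Hugging Face transformers integration and the “facebook/seamless-m4t-v2-large” checkpoint from Meta’s public release (details not drawn from this article); it translates English text into Spanish speech with SeamlessM4T v2.

```python
# Minimal sketch: text-to-speech translation with SeamlessM4T v2 through the
# Hugging Face transformers integration (assumes the public
# "facebook/seamless-m4t-v2-large" checkpoint; not described in the article).
import scipy.io.wavfile
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# English text in, Spanish speech out; src_lang/tgt_lang take three-letter codes.
inputs = processor(text="Translations should capture the nuances of human expression.",
                   src_lang="eng", return_tensors="pt")
waveform = model.generate(**inputs, tgt_lang="spa")[0].cpu().numpy().squeeze()

# The model generates 16 kHz audio (exposed as model.config.sampling_rate).
scipy.io.wavfile.write("translated_spa.wav", rate=model.config.sampling_rate, data=waveform)
```

In the same integration, speech input follows the same pattern, with a 16 kHz waveform passed to the processor via its audios argument instead of text.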

Engadget calls Meta’s Seamless Communication suite “impressive,” giving it the edge over similar tools from Google and Samsung. “There’s no word on when the public will be able to utilize these new features,” notes Engadget, speculating about “Meta baking them into its smart glasses some day, making them even more practical than ever.”

In a Facebook blog post, Meta also shared advancements in Ego-Exo4D and Audiobox. The latest Ego-Exo4D “simultaneously captures first-person (egocentric) views from a wearable camera, as well as external (exocentric) views from cameras surrounding the person,” Meta explains.

Audiobox lets users generate sounds or types of speech from voice prompts or text descriptions, advancing the concept behind Voicebox, a GenAI model Meta introduced earlier this year.
