Meta unveils AI-powered tool for translating speech across multiple languages
Meta Platforms, the parent company of Facebook, has unveiled an AI model that can translate and transcribe speech in dozens of languages, a potential building block for tools enabling real-time communication across language barriers.
The company said in a blog post that its SeamlessM4T model can support translations between text and speech in nearly 100 languages, as well as full speech-to-speech translation for 35 languages, combining technology that was previously available only in separate models.
CEO Mark Zuckerberg has said he sees such tools facilitating interactions between users from around the world in the metaverse, the set of interconnected virtual worlds on which he is betting the company's future.
Meta is making the model available to the public for non-commercial use, the blog post said.
The world's largest social media company has released a slew of mostly free AI models this year, including a large language model called Llama that poses a serious challenge to proprietary models sold by Microsoft-backed OpenAI and Alphabet's Google.
Zuckerberg has said an open AI ecosystem works to Meta's advantage, as the company has more to gain by effectively crowdsourcing the creation of consumer-facing tools for its social platforms than by charging for access to the models.
Nevertheless, Meta faces the same legal questions as the rest of the industry over the training data used to build its models.
In July, comedian Sarah Silverman and two other authors filed copyright infringement lawsuits against both Meta and OpenAI, accusing the companies of using their books as training data without permission.
For the SeamlessM4T model, the Meta researchers said in their research paper that they gathered audio training data from 4 million hours of "raw audio originating from a publicly available repository of crawled web data," without specifying which repository.
A Meta spokesperson did not respond to questions about the provenance of the audio data.
The text data came from datasets created last year that extracted content from Wikipedia and related websites, the research paper said.