Meta’s AI Model Enhances Text and Speech Translation Efficiency
Meta has announced a new AI model called SeamlessM4T, which is designed to help users translate text and speech across languages more efficiently.
According to the company, SeamlessM4T is the first all-in-one multimodal and multilingual AI translation model. It recognizes speech in nearly 100 languages and translates speech into text in nearly 100 input and output languages. It also supports text-to-text translation, text-to-speech translation and even speech-to-speech translation.
Meta is making SeamlessM4T publicly available under a research license to allow researchers to build on existing work.
“Building a universal language translator like the fictional Babel Fish in The Hitchhiker’s Guide to the Galaxy is challenging because the existing speech-to-speech and speech-to-text systems only cover a small part of the world’s languages. But we believe that the work we announced today is a significant step forward on this journey,” Meta stated.
The company also said that, compared with “approaches that use separate models, SeamlessM4T’s single-system approach reduces errors and delays, which increases the efficiency and quality of the translation process. This allows people who speak different languages to communicate with each other more effectively.”
Meta also said that the model is part of its effort to create a “universal translator”, and that it draws on some of the company’s recent models, such as No Language Left Behind and Massively Multilingual Speech.
“In the future, we want to explore how this foundational model can enable new communication capabilities, ultimately bringing us closer to a world where everyone can be understood,” Meta said.
In related news, Meta also recently announced its AudioCraft AI tool, which allows users to create original audio and music from text-based prompts. The tool consists of three models: AudioGen, MusicGen and EnCodec. AudioGen generates audio from text prompts and was trained on public sound effects, while MusicGen does the same for music and was trained on music owned or licensed by Meta. The EnCodec decoder enables the creation of higher quality music with fewer artifacts.