Meta Launches Open Source AI Toolkit for Generating Audio from Text Inputs
Meta, the company behind Facebook, has launched AudioCraft, an open source toolkit that uses artificial intelligence (AI) to simplify audio production for musicians and sound designers. The kit bundles three of the company's existing generative AI models: AudioGen, which creates sound effects from text descriptions; MusicGen, which composes music from text descriptions; and EnCodec, a neural codec that compresses audio while preserving quality. Together, they give creators a single set of tools for producing the sounds they need.
The release includes pre-trained AudioGen models for those who want to get started quickly, while more technical users get the full AudioCraft code and model weights. The open source debut gives professionals and researchers the opportunity to train the models on their own data, Meta says. All pre-trained models were built on either public or Meta-owned material, which the company says should keep copyright disputes off the table.
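For a sense of what "getting started quickly" looks like, here is a minimal sketch of generating a short clip with the toolkit's MusicGen model, following the usage pattern in AudioCraft's public repository; the checkpoint name, clip duration, and text prompt below are illustrative choices, not prescriptions:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pre-trained MusicGen checkpoint; the small variant keeps memory needs modest.
model = MusicGen.get_pretrained('facebook/musicgen-small')

# Generation parameters control output length (here: an 8-second clip).
model.set_generation_params(duration=8)

# One clip is generated per text description in the batch.
descriptions = ['warm acoustic folk with hand claps and light percussion']
wav = model.generate(descriptions)

# Write each generated waveform to disk as a loudness-normalized audio file.
for idx, one_wav in enumerate(wav):
    audio_write(f'clip_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```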
The tech company pitches AudioCraft as a way to make generative audio AI simpler and more accessible. While AI-generated images and text have taken off, Meta believes sound has lagged "a bit behind." Existing projects tend to be complex and are often closed source. In theory, the new suite gives creators the chance to tweak the models themselves and otherwise stretch what's possible.
This isn't the only open generative audio AI on the market: Google opened up access to its MusicLM model in May. Meta's system also isn't designed for everyday users – you still need to be technically inclined to use AudioCraft properly, and the company says it's aimed more at research. The developers are also working to improve the models' performance and the ways users can control them, expanding their potential.
Even in its current state, however, AudioCraft hints at the role AI may come to play in music. You may not see artists using AI to replace their own creativity outright (even experimentalists like Holly Herndon remain deeply involved in their work), but they're getting more tools for creating backing tracks, samples, and other elements with relatively little effort.