Meta’s MusicGen AI Generates Song Genre Mashups Using Text Input
According to The Decoder, Meta’s Audiocraft research team has launched MusicGen, an open source deep learning language model that generates new music from text prompts and can also be conditioned on an existing melody. Much like a ChatGPT for audio, users describe the style of music they want, optionally add an existing melody, and click “Generate.” After a considerable wait (approximately 160 seconds), MusicGen produces a brief, original piece of music based on the user’s text prompt and melody.
The MusicGen demo on Hugging Face lets you describe the music you want, offering a handful of example prompts such as “80s driving pop song with heavy drums and synth pads in the background.” You can then optionally condition the output on an existing song up to 30 seconds long, with controls that let you select a specific part of the track. Press “Generate” and it produces a high-quality sample lasting up to 12 seconds.
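For those who would rather skip the hosted demo, the same text-to-music workflow is available through the open source code announced below. A minimal sketch, assuming the audiocraft package’s published MusicGen interface (checkpoint names and call signatures may differ between releases):

# Minimal local sketch of text-to-music generation, assuming the
# audiocraft package's published MusicGen interface.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Pretrained checkpoints include 'small', 'medium', 'large', and 'melody'.
model = MusicGen.get_pretrained('small')

# Match the demo's cap of roughly 12 seconds of output.
model.set_generation_params(duration=12)

prompts = ['80s driving pop song with heavy drums and synth pads in the background']
wav = model.generate(prompts)  # batch of waveforms at model.sample_rate

for i, clip in enumerate(wav):
    # Saves sample_0.wav (and so on) with loudness normalization.
    audio_write(f'sample_{i}', clip.cpu(), model.sample_rate, strategy='loudness')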
We present MusicGen: A simple and controllable music generation model. MusicGen can be prompted by both text and melody.
We release code (MIT) and models (CC-BY NC) for open research, reproducibility, and for the music community: https://t.co/OkYjL4xDN7 pic.twitter.com/h1l4LGzYgf
— Felix Kreuk (@FelixKreuk) June 9, 2023
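The melody prompting mentioned in the announcement maps to a separate call on the dedicated melody checkpoint. Another short sketch under the same assumptions about the released audiocraft interface; the file path is illustrative:

# Sketch of melody conditioning with the 'melody' checkpoint,
# again assuming the released audiocraft API.
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=12)

# Load a reference clip (the hosted demo accepts up to 30 seconds of audio).
melody, sr = torchaudio.load('reference_clip.wav')

descriptions = ['80s driving pop song with heavy drums and synth pads in the background']
# Generate audio that follows the reference melody and the text description.
wav = model.generate_with_chroma(descriptions, melody[None], sr)

audio_write('melody_sample', wav[0].cpu(), model.sample_rate, strategy='loudness')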
The team used 20,000 hours of licensed music for training, including 10,000 high-quality music tracks from an internal dataset, as well as Shutterstock and Pond5 tracks. To speed up processing, they used Meta’s 32kHz EnCodec codec to break the audio into smaller tokens that can be processed in parallel. “Unlike existing methods like MusicLM, MusicGen does not require a self-supervised semantic representation [and] only has 50 auto-regressive steps per second,” wrote Hugging Face ML engineer Ahsen Khaliq in a tweet.
Last month, Google released a similar music generator called MusicLM, but MusicGen seems to produce slightly better results. On the project’s example page, the researchers compare MusicGen’s output with MusicLM and two other models, Riffusion and Mousai, to support this claim. MusicGen can be run locally (a GPU with at least 16GB of VRAM is recommended) and is available in four model sizes, from small (300 million parameters) to large (3.3 billion parameters), with the largest having the greatest potential for producing complex music.
As mentioned, MusicGen’s code is open source, while the model weights are released for non-commercial use (I tried it with “Ode to Joy” and several of the suggested genres, and the results were… mixed). Still, it’s the latest example of the breathtaking pace of AI development over the past six months, with deep learning models moving into yet another creative field.