Meta’s Voicebox AI Is a Dall-E for Text-to-Speech
Meta has brought us a step closer to the long-promised immortal celebrity future with the unveiling of Voicebox, a generative text-to-speech model that aims to do for the spoken word what ChatGPT and Dall-E did for text and image generation. It is essentially a text-to-output generator like GPT or Dall-E: instead of generating prose or pretty pictures, it spits out audio clips. Meta defines the system as “a non-autoregressive flow matching model trained to complete speech based on audio context and text”. It has been trained on over 50,000 hours of unfiltered…