OpenAI unveils 'Voice Engine': A remarkable tool cloning voices with just a 15-second audio snippet. (AP)

OpenAI introduces ‘Voice Engine’ that replicates human speech using only 15-second audio clips

Jim HendlerMarch 30, 24 Artificial Intelligence

OpenAI, known for its groundbreaking advancements in AI technology such as Sora, a video generator, has unveiled a cutting-edge tool called ‘Voice Engine’ for voice cloning. This impressive audio model can mimic human speech intricacies like intonation and speech patterns using only a short 15-second sample of the original voice. Despite high excitement, OpenAI has chosen to keep this new feature confidential due to worries about potential misuse and the spread of fake content on the internet.

Remarkable efficiency and accuracy

“Incredibly, our Voice Engine can create emotive and lifelike sounds using just one 15-second sample,” the company stated in a recent blog post.

OpenAI’s audio engine versus industry standards

In contrast, existing AI voice platforms such as ElevenLabs typically require longer samples, and their instant voice cloning tool requires at least a minute of audio to work. To achieve the best result, we recommend about 10 minutes of continuous talk, especially for professional-level services.

OpenAI demonstrated the capabilities of the Voice Engine with various demonstrations, including a poignant example where the voice of a young patient who had lost much of his ability to speak due to a brain tumor was replicated using an older recording from a school project. Technology allowed her to communicate with her own voice, which enabled her to work with Lifespan, a non-profit affiliated with Brown University’s medical school.

In addition, OpenAI revealed partnerships with organizations such as HeyGen, demonstrating how the Voice Engine facilitates natural-sounding speech translations from one language to another.

According to OpenAI, the Voice Engine was originally developed in late 2022 and is already integrated with the preset voices available in OpenAI’s text-to-speech API, as well as ChatGPT’s Voice and Read Aloud feature. With these latest advances, the company is treading carefully before a wider release.