The Azure AI Speech text-to-speech avatar was unveiled at the Ignite 2023 conference. (Microsoft)

Introducing the Future: Microsoft Unveils AI Text-to-Speech Avatar at Ignite 2023!

Microsoft has been integrating artificial intelligence (AI) across its range of products, including Microsoft Office for consumers and Microsoft 365 Copilot for businesses. During the Ignite 2023 conference, the tech giant unveiled several new AI-driven products, including Copilot Studio and Windows AI Studio. Additionally, Bing Chat has been renamed Copilot, and a text-to-speech avatar feature has been added to Azure AI Speech, allowing the creation of talking avatar videos. The feature is currently in public preview.

Microsoft Azure AI Speech

The text-to-speech avatar is a feature of Azure AI Speech that converts text into a 2D video of a human-like speaking avatar. Microsoft says the neural text-to-speech avatar models are trained by deep neural networks on video recordings of real people, and the avatar's voice is provided by a text-to-speech voice model. From text input alone, users can create training videos, product demos, customer feedback videos, and more.

How does it work?

The Azure AI Speech avatar content creation workflow involves three components: a text analyzer, a TTS voice synthesizer, and a TTS avatar video synthesizer. First, the user provides text input, and the text analyzer converts it into a phoneme sequence. Then the TTS voice synthesizer predicts the acoustic features of the input text and synthesizes the voice. These first two components are provided by the text-to-speech voice models.

Finally, a neural text-to-speech avatar model uses the acoustic features to predict lip-synced images of the avatar, and these images are assembled into the synthetic video.
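The data flow through these three stages can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the lexicon, the fixed 80 ms duration, the vowel-based mouth shapes, and every function and class name here are hypothetical stand-ins, not Microsoft's models or API. The real service uses trained neural networks at each stage.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon; a real text analyzer uses a trained model.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
# Hypothetical rule: vowel-like phonemes open the mouth.
VOWELS = {"AH", "OW", "ER"}


@dataclass
class AcousticFrame:
    phoneme: str
    duration_ms: int


@dataclass
class VideoFrame:
    time_ms: int
    mouth_shape: str


def analyze_text(text: str) -> List[str]:
    """Stage 1 (text analyzer): text -> phoneme sequence."""
    phonemes: List[str] = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        phonemes.extend(TOY_LEXICON.get(word, ["UNK"]))
    return phonemes


def synthesize_voice(phonemes: List[str]) -> List[AcousticFrame]:
    """Stage 2 (voice synthesizer): phonemes -> acoustic features.

    Here the only "feature" is a fixed duration per phoneme.
    """
    return [AcousticFrame(p, 80) for p in phonemes]


def synthesize_avatar(frames: List[AcousticFrame]) -> List[VideoFrame]:
    """Stage 3 (avatar synthesizer): acoustic features -> lip-synced frames."""
    video: List[VideoFrame] = []
    t = 0
    for f in frames:
        shape = "open" if f.phoneme in VOWELS else "closed"
        video.append(VideoFrame(t, shape))
        t += f.duration_ms
    return video


def text_to_avatar(text: str) -> List[VideoFrame]:
    """Chain the three stages in the order described above."""
    return synthesize_avatar(synthesize_voice(analyze_text(text)))
```

Calling text_to_avatar("Hello, world!") returns one frame per phoneme, each stamped with its start time, which shows why the avatar stage depends on the acoustic output rather than on the raw text.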

Azure AI Speech offers two kinds of text-to-speech voice. The first is prebuilt neural voice, a set of natural-sounding, ready-made voices. To use it, users create an Azure account and a Speech service subscription. They can then use the Speech SDK or visit the Speech Studio portal to select a prebuilt voice.
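For readers who prefer the SDK route, here is a minimal sketch of selecting a prebuilt voice from Python. It assumes the azure-cognitiveservices-speech package is installed and that you supply your own key and region. The helper names build_ssml and synthesize_to_wav are illustrative, and en-US-JennyNeural is just one example of a prebuilt voice. Note that this produces audio only. It does not generate avatar video.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in SSML that selects a prebuilt neural voice."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def synthesize_to_wav(text: str, key: str, region: str, out_path: str) -> None:
    """Send the SSML to the Speech service and save the audio as a WAV file."""
    # Imported lazily so build_ssml() can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioOutputConfig(filename=out_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = synthesizer.speak_ssml_async(build_ssml(text)).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Synthesis did not complete: {result.reason}")
```

With a valid subscription, a call such as synthesize_to_wav("Welcome to the demo.", key, region, "demo.wav") should write the spoken text to demo.wav. Escaping the text before embedding it keeps characters like & from breaking the SSML.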

Microsoft also lets customers create their own voices through a feature called Custom Neural Voice, an easy-to-use self-service tool for building a natural-sounding brand voice. To encourage responsible use, Microsoft currently provides only limited access to this feature.