Microsoft AI Group Accidentally Exposes 38TB of Internal Data
Microsoft has suffered an embarrassing security lapse this week, with its AI research division accidentally exposing 38TB of internal data, including Microsoft employees’ personal information.
The leak was discovered by cybersecurity firm Wiz, and worryingly, the exposed data includes passwords for Microsoft services, secret keys, and more than 30,000 internal Teams messages from Microsoft employees. Microsoft says no customer data was exposed and that no internal services were put at risk.
According to the report, the exposure traces back to a link that was published so researchers could download open-source data for training their own AI models; sharing that data was entirely intentional on Microsoft’s part.
The link was created using an Azure feature called SAS (Shared Access Signature) tokens, which let account holders generate shareable links to data held in their Azure Storage account.
The problem is that the link Microsoft shared granted access to the entire storage account rather than just the intended training data, which is where this leak went wrong. The Wiz team reported the issue to Microsoft on June 22, and the company revoked the SAS token two days later.
Data leakage of this kind is a real concern, and Microsoft has acknowledged that care must be taken when creating SAS credentials to avoid bigger accidents in the future. The company has since published guidance on best practices for using SAS tokens, and it will presumably follow those recommendations itself.
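To illustrate what that guidance amounts to in practice, here is a minimal sketch using the Python azure-storage-blob SDK that issues a SAS token scoped to a single blob, read-only, with a short expiry, instead of a token covering the whole storage account. The account, container, and blob names here are placeholders for illustration, not the ones involved in the incident.

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

# Placeholder names for illustration only.
account_name = "exampleaccount"
container_name = "ai-training-data"
blob_name = "dataset.zip"
account_key = "<storage-account-key>"

# Scope the token to one blob, read-only, expiring in 48 hours,
# rather than granting broad access to the entire storage account.
sas_token = generate_blob_sas(
    account_name=account_name,
    container_name=container_name,
    blob_name=blob_name,
    account_key=account_key,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=48),
)

# The shareable URL is the blob URL with the SAS token appended as a query string.
url = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}?{sas_token}"
print(url)
```

A link built this way only exposes the one file it points at, and only until the expiry time passes, so even a leaked URL causes far less damage than a long-lived token with account-wide permissions.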
Microsoft is building a large repository of data to train its own AI models, even though it has so far relied on OpenAI’s ChatGPT technology in its consumer and enterprise products. AI chatbots are already built into Microsoft Edge, Bing Search, and even Office.