Apple clarifies that Apple Intelligence was not trained on stolen YouTube content
Apple, Nvidia, and other major tech companies are facing scrutiny following a report on how their AI models were trained. The report, published by Wired, found that these companies used subtitle files from more than 170,000 YouTube videos, including videos from popular creators such as Marques Brownlee (MKBHD), PewDiePie, and MrBeast. Apple drew particular backlash because it recently released the iOS 18 public beta, and its AI-based features are generating buzz. Although Apple obtained the data through a third-party non-profit organization, the company has issued a statement distancing Apple Intelligence from the situation.
How Apple ended up in this situation
Apple’s high-profile AI model, OpenELM, was trained on a dataset called the Pile. Published by the non-profit EleutherAI, the Pile contains unethically sourced transcripts from thousands of YouTube videos. Although the organization says it aims to help small developers and researchers train AI models, its datasets are open and available to anyone with enough computing power and storage to use them. Apple is reportedly one of the companies that used the data to train an AI model.
Why Apple Intelligence is different
While Apple has not denied using the Pile dataset to train OpenELM, it has confirmed to 9to5Mac that OpenELM does not power any of its machine learning or artificial intelligence features, including Apple Intelligence. Apple says it released the OpenELM model to support research and promote open source development of large language models. Described by Apple researchers as a state-of-the-art open language model, OpenELM is designed specifically for research purposes and is not integrated with Apple Intelligence services.
That means datasets like the Pile, which contain YouTube video transcripts, aren’t part of Apple Intelligence, the iPhone’s next big thing. Instead, according to Apple, Apple Intelligence relies on licensed data, including curated data, as well as publicly available data collected by the company’s web crawler. OpenELM itself is openly available through Apple’s machine learning research website, which Apple presents as part of its commitment to advancing the broader scientific community’s understanding of language modeling.