Google’s New Privacy Policy Allows for Use of Public Data to Train AI Models
Google has revised its privacy policy to state that it may use publicly available data to help train its AI models. The company changed the policy's wording, replacing “language models” with “AI models.” It also said it may use publicly available information to build not only features but full products, such as “Google Translate, Bard, and Cloud AI capabilities.” With this update, Google is telling users that anything they post publicly online could be used to train Bard, its future versions, and any other generative AI products Google develops.
The tech giant has highlighted the changes to its privacy policy in its archives.
Critics have raised concerns about companies using data published online to train the large language models behind their generative AI products. Recently, a proposed class-action lawsuit was filed against OpenAI, accusing it of “scraping massive amounts of personal data from the Internet,” including “stolen private data,” to train its GPT models without prior permission. As Search Engine Journal points out, similar lawsuits are likely to follow as more companies develop their own generative AI products.
Owners of websites that serve as the public squares of the digital age have also taken steps to either block or profit from the AI boom. Reddit has begun charging for access to its API, which led several third-party clients to shut down over the weekend. Meanwhile, Twitter has limited how many tweets a user can see per day “to address extreme data scraping [and] system manipulation.”