Websites claim that AI startup Anthropic is circumventing their anti-scraping regulations and protocols.
Freelancer.com has accused Anthropic, the AI startup responsible for the Claude large language models, of disregarding its “do not crawl” robots.txt protocol to scrape the website’s data. Additionally, iFixit CEO Kyle Wiens said Anthropic has violated the website’s policy against using its content for AI model training. Freelancer CEO Matt Barrie described Anthropic’s ClaudeBot as the most aggressive scraper the site has seen, with the company’s crawler allegedly generating 3.5 million visits to his website in just four hours. Wiens also reported that Anthropic’s bot hit iFixit’s servers a million times in a 24-hour period, straining the company’s DevOps resources.
In June, Wired accused another AI company, Perplexity, of crawling its website despite the Robots Exclusion Protocol, or robots.txt. A robots.txt file typically contains instructions telling crawlers which pages they can and cannot access. Compliance is voluntary, however, and bad bots have largely ignored it. After Wired’s piece came out, a startup called TollBit, which connects AI companies with content publishers, reported that Perplexity is not the only one bypassing robots.txt signals. Although TollBit did not name names, Business Insider said it had learned that OpenAI and Anthropic were also ignoring the protocol.
Barrie said Freelancer initially tried to refuse the bot’s access requests, but eventually had to block Anthropic’s crawler entirely. “This is egregious scraping [that] slows down the site for everyone who uses it and ultimately affects our revenue,” he added. As for iFixit, Wiens said the website has alarms set for high traffic, and his people were woken up at 3 a.m. by Anthropic’s activity. Anthropic’s crawler stopped hammering iFixit only after the site added a line to its robots.txt file specifically blocking the bot.
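iFixit has not published the exact line it added, but a robots.txt rule blocking a single crawler by its user-agent string is a standard mechanism. A minimal sketch, assuming the bot identifies itself as “ClaudeBot” (the name Barrie cites above), might look like this:

```
# Hypothetical robots.txt rule: deny one specific crawler site-wide
User-agent: ClaudeBot
Disallow: /

# All other crawlers remain unrestricted
User-agent: *
Disallow:
```

As the article notes, though, such rules are only advisory: a crawler that chooses not to honor the Robots Exclusion Protocol can ignore them entirely, which is why Freelancer ultimately resorted to blocking the bot at the network level.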
The AI startup told The Information that it respects the robots.txt file and that its crawler “respected the signal when it was deployed by iFixit.” It also said it seeks to “minimize disruption by considering how quickly [it indexes] the same domains,” which is why it is now investigating the incident.
AI companies use crawlers to gather content from websites for training their generative AI models. This practice has made them the target of several lawsuits, with publishers accusing them of copyright infringement. To head off further litigation, companies like OpenAI have entered into agreements with publishers and websites. OpenAI’s content partners so far include News Corp, Vox Media, the Financial Times and Reddit. iFixit’s Wiens appears open to a similar deal for the articles on his repair guide website, telling Anthropic in a tweet that he is willing to discuss licensing the content for commercial use.
If any of those requests accessed our terms of service, they would have told you that use of our content is expressly forbidden. But don’t ask me, ask Claude!
If you want to have a conversation about licensing our content for commercial use, we’re right here. pic.twitter.com/CAkOQDnLjD
— Kyle Wiens (@kwiens) July 24, 2024