Reddit sues AI company Anthropic for allegedly ‘scraping’ user comments to train chatbot Claude

Social media platform Reddit sued the artificial intelligence company Anthropic on Wednesday, alleging that it is illegally “scraping” the comments of millions of Reddit users to train its chatbot Claude. Reddit claims that Anthropic has used automated bots to access Reddit’s content despite being asked not to do so, and “intentionally trained on the personal data of Reddit users without ever requesting their consent.”

Anthropic said in a statement that it disagreed with Reddit’s claims “and will defend ourselves vigorously.”

Reddit filed the lawsuit Wednesday in California Superior Court in San Francisco, where both companies are based.

“AI companies should not be allowed to scrape information and content from people without clear limitations on how they can use that data,” said Ben Lee, Reddit’s chief legal officer, in a statement Wednesday.

Reddit has previously entered licensing agreements with Google, OpenAI and other companies that are paying to be able to train their AI systems on the public commentary of Reddit’s more than 100 million daily users. Those agreements “enable us to enforce meaningful protections for our users, including the right to delete your content, user privacy protections, and preventing users from being spammed using this content,” Lee said.

The licensing deals also helped the 20-year-old online platform raise money ahead of its Wall Street debut as a publicly traded company last year.

Anthropic was formed by former OpenAI executives in 2021 and its flagship Claude chatbot remains a key competitor to OpenAI’s ChatGPT. While OpenAI has close ties to Microsoft, Anthropic’s primary commercial partner is Amazon, which is using Claude to improve its widely used Alexa voice assistant.

Much like other AI companies, Anthropic has relied heavily on websites such as Wikipedia and Reddit that are deep troves of written materials that can help teach an AI assistant the patterns of human language. In a 2021 paper co-authored by Anthropic CEO Dario Amodei — cited in the lawsuit — researchers at the company identified the subreddits, or subject-matter forums, that contained the highest quality AI training data, such as those focused on gardening, history, relationship advice or thoughts people have in the shower.

Anthropic in 2023 argued in a letter to the U.S. Copyright Office that the “way Claude was trained qualifies as a quintessentially lawful use of materials,” by making copies of information to perform a statistical analysis of a large body of data. The company is already battling a lawsuit from major music publishers alleging that Claude regurgitates the lyrics of copyrighted songs.

But Reddit’s lawsuit is different from others brought against AI companies because it doesn’t allege copyright infringement. Instead, it focuses on the alleged breach of Reddit’s terms of use and the unfair competition that Reddit says resulted.
