Reddit Sues Perplexity and Three Other Firms Over Unauthorized Data Scraping

Reddit has filed a lawsuit against AI startup Perplexity and three data-scraping companies (SerpApi, Oxylabs and AWMProxy), accusing them of scraping Reddit content from search-engine results without a license. The complaint alleges that the defendants used the scraped material to power AI answer engines, in violation of Reddit's licensing terms. Reddit, which has begun licensing its data to major tech firms, is seeking damages and an injunction to stop further unauthorized use. The case underscores the growing tension between online platforms and AI developers over using publicly available content to train and power AI models.

Background

Reddit, a major online community platform, has increasingly sought to monetize its vast repository of user-generated posts by licensing that data to technology companies. The platform has signed agreements with prominent AI developers, including Google and OpenAI, and has launched its own AI answer tool, Reddit Answers, which draws on Reddit content. To protect this data, Reddit has also taken steps to limit unauthorized crawling and scraping of its site.

The Lawsuit

In the new legal action, Reddit alleges that four companies (Perplexity, SerpApi, Oxylabs and AWMProxy) scraped Reddit posts from search-engine results and incorporated that material into AI services without obtaining a license. The complaint claims the defendants bypassed Reddit's licensing system, depriving the platform of revenue and violating its terms of use. Reddit is seeking financial damages and a permanent injunction barring the defendants from selling or using the scraped content in the future.

Companies Involved

Perplexity, an AI answer-engine startup, relies on content gathered from the web to generate its answers. The lawsuit asserts that Perplexity quickly reproduced a test Reddit post that had been set up so that only a search engine could crawl it, which Reddit says shows the content was obtained by scraping search results. The other three defendants, SerpApi, Oxylabs and AWMProxy, are described as firms whose business models center on collecting data from search results and reselling it to clients, including AI developers.

Reddit’s Response

Reddit says it sent Perplexity a cease-and-desist letter; Perplexity replied that it did not use Reddit data, yet it continued to cite the platform in its answers. Reddit's legal team presented evidence that the test post was reproduced by the defendants' systems shortly after it was indexed, which Reddit argues supports its claim of unauthorized scraping. The company has also taken technical measures, such as rate-limiting unknown bots and restricting access by certain web archives.

Implications for the AI Industry

The lawsuit highlights a broader conflict between online platforms that host large volumes of user content and AI companies that depend on such content to train and run their models. As platforms like Reddit move toward licensing agreements, they are asserting greater control over how their data is used. The outcome of this case could set precedents for how AI developers must obtain and pay for data, and may encourage stricter compliance with robots.txt and other web-crawling standards.