Cloudflare Accuses Perplexity AI of Unauthorized Web Scraping; Startup Denies Allegations
2025-08-06
Cloudflare, a leading internet infrastructure provider, has accused AI startup Perplexity AI, led by CEO Aravind Srinivas, of covertly scraping data from websites that explicitly prohibit such actions. According to Cloudflare’s research blog, Perplexity allegedly used deceptive methods to bypass website restrictions, raising serious concerns about ethical AI practices and compliance with web standards.
What Did Cloudflare Discover?
Cloudflare stated that Perplexity initially crawls with its declared user agent but, when blocked, masks its identity to continue accessing content. This activity was reportedly detected across tens of thousands of domains, amounting to millions of requests per day. Cloudflare said its customers reported that Perplexity’s bots kept bypassing robots.txt restrictions, even after the customers had blocked them. Using machine learning and network analysis, Cloudflare claims it fingerprinted the Perplexity crawlers engaged in this activity.
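To illustrate why a masked identity matters, the sketch below shows how robots.txt rules are matched against the user agent a crawler reports about itself. It is a minimal illustration using Python's standard `urllib.robotparser`, not Cloudflare's detection method or Perplexity's code. The crawler name `ExampleAIBot`, the robots.txt contents, and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a publisher disallows one declared AI crawler
# and allows everyone else. "ExampleAIBot" is an invented name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/article"  # hypothetical page

# The crawler identifies itself honestly and is told to stay out.
declared = is_allowed(ROBOTS_TXT, "ExampleAIBot", URL)

# The same request under a generic browser-style user agent matches
# the wildcard rule instead, so the file no longer excludes it.
generic = is_allowed(ROBOTS_TXT, "Mozilla/5.0", URL)

print(declared, generic)
```

Because robots.txt is advisory and keyed on a self-reported name, it only restrains crawlers that identify themselves consistently. That is why the alleged switch away from a declared user agent is central to Cloudflare's complaint.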
The report emphasized that AI services like Perplexity rely heavily on large-scale scraping of text, images, and videos. However, Cloudflare alleges that Perplexity’s methods circumvent publishers’ stated preferences, potentially crossing ethical and legal boundaries.
Perplexity AI Responds
In response, Perplexity denied the allegations via X (formerly Twitter), calling Cloudflare’s claims misleading and exaggerated. The startup said its approach differs from traditional web crawling: it uses user-driven agents that fetch content only when a real user specifically requests it, and that do not store or train on the retrieved data.
Perplexity stated, “Unlike mass crawlers that systematically index web pages, our agents fetch content only for immediate use, ensuring user queries are addressed without large-scale data hoarding.”
This controversy underscores the growing tension between AI companies’ data collection practices and publishers’ rights to control their content. As AI-driven platforms like Perplexity continue to expand, ethical data usage, transparency, and compliance with web standards are becoming central topics of industry discussion. With AI web scraping at the center of regulatory debates, the clash between Cloudflare and Perplexity highlights the need for clearer guidelines on AI data collection and usage.