Cloudflare Accuses Perplexity AI of Evading Blocks and Impersonating Browsers to Scrape Websites
Cloudflare, the company safeguarding millions of websites from digital threats, has leveled serious accusations against the AI search engine Perplexity AI. According to its investigation, this popular AI assistant uses covert methods to extract information from websites, flouting established norms and disregarding site owners’ directives.
At the heart of the controversy lies the manner in which Perplexity collects data to generate its responses. When a website explicitly blocks automated access using the robots.txt file, a virtual “No Entry” sign for bots, Perplexity refuses to heed the warning. Instead, it disguises itself as a standard Chrome browser running on macOS, attempting to bypass protective barriers through deception.
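To make the “No Entry” sign concrete, here is a minimal Python sketch of how a blanket robots.txt rule is interpreted, using the standard library's urllib.robotparser. The file contents, the example.com URL, and the browser-style user-agent string are illustrative assumptions, not taken from Cloudflare's test sites.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that closes the whole site to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Illustrative browser-style user-agent string (an assumption for this sketch).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/page"
    print(is_allowed(ROBOTS_TXT, "Perplexity-User", page))
    print(is_allowed(ROBOTS_TXT, BROWSER_UA, page))
```

Under a wildcard rule like this, the parser refuses both user agents, so the instruction itself is unambiguous. robots.txt is only advisory, though: nothing in the protocol stops a client from skipping the check and sending the request anyway.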
To validate their suspicions, Cloudflare’s experts devised a clever experiment. They created several brand-new, undisclosed websites that were entirely off-limits to crawlers. Then, they queried Perplexity about the contents of these hidden sites. To their astonishment, the AI returned detailed summaries—despite having no legitimate access to such information.
Cloudflare found that Perplexity operates through a two-pronged approach. It first presents itself transparently as “Perplexity-User,” making approximately 20–25 million requests daily. But when this declared bot is blocked, a shadow agent takes over, one that masquerades as a regular browser and issues an additional 3–6 million requests per day under false pretenses.
Worse still, when even this disguise is thwarted, the system resorts to cycling through various IP addresses and internet service providers to circumvent restrictions. It’s akin to a barred intruder who, when denied entry at the front door, attempts to sneak in through a window—or perhaps the back entrance.
Cloudflare drew a stark contrast between this behavior and that of OpenAI, the creator of ChatGPT. When subjected to the same test, OpenAI’s crawler, “ChatGPT-User,” reviewed the site’s robots.txt, recognized the restriction, and ceased its attempts: no subterfuge, no evasion, just adherence to the publisher’s wishes.
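The compliant behavior described here amounts to a check-before-fetch pattern. The following Python sketch shows that pattern under stated assumptions: polite_fetch and its injected fetch callable are hypothetical names for illustration, and this is not OpenAI's actual crawler code.

```python
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser


def polite_fetch(
    url: str,
    user_agent: str,
    robots_txt: str,
    fetch: Callable[[str], str],
) -> Optional[str]:
    """Fetch url only if robots_txt allows user_agent to do so.

    fetch is an injected callable (hypothetical) that performs the actual
    request, which keeps this sketch free of network access.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # Disallowed: stop here, with no retry under another identity.
        return None
    return fetch(url)


if __name__ == "__main__":
    blocked = "User-agent: *\nDisallow: /\n"
    result = polite_fetch(
        "https://example.com/a", "ChatGPT-User", blocked, lambda u: "body"
    )
    print(result)
```

The important design choice is that the disallowed branch returns without ever calling fetch. A crawler that instead retries with a different user agent or IP address is the pattern Cloudflare's report objects to.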
For over thirty years, the internet has been built on a foundation of trust. There exists an unwritten code: bots should be transparent, clearly identify themselves, perform well-defined tasks, and—most critically—respect the intentions of website owners. Perplexity’s conduct, Cloudflare contends, runs counter to these foundational principles.
In response to these violations, Cloudflare has removed Perplexity from its list of verified bots and implemented specialized algorithms to detect and block its covert crawling activity. Moreover, the company has introduced new protective measures, empowering website operators to effortlessly block undesired AI traffic.
This incident has brought to light a pressing issue in today’s digital ecosystem. On one hand, AI systems require vast datasets to function and evolve. On the other, content creators must retain sovereignty over how their work is used. Already, more than 2.5 million websites have opted out of having their content used for AI training.
Cloudflare warns that evasive tactics will likely grow more sophisticated, but says it is prepared to evolve its defenses accordingly. The company is currently collaborating with international technical organizations to establish clear, enforceable standards for AI bot behavior across the web.