AI Bots Are Now Flooding the Web, Straining Servers and Skewing Analytics
In its latest Threat Insights report, Fastly analyzed more than 6.5 trillion web requests per month to uncover emerging patterns in AI bot traffic. This rapidly expanding class of automated systems is already having a significant impact on internet infrastructure, website performance, and how content is accessed.
According to the findings, individual bots can peak at 39,000 requests per minute to a single site, roughly 650 requests per second, which is enough to overwhelm even large-scale servers and produce effects comparable to a DDoS attack. The most frequent targets are platforms in e-commerce, entertainment, and high tech, whose dynamic databases and constantly updated catalogs are particularly valuable to developers of language models. As a result, site owners face rising operational costs, skewed analytics, and degraded performance.
The bulk of this traffic, roughly 80%, comes from crawlers harvesting content to train AI models. Among crawlers, Meta accounts for over half of requests, followed by Google at 23% and OpenAI at 20%. Fetchers, bots that retrieve pages in real time to answer a user's query, make up the remaining one-fifth of activity, yet they are the most disruptive: the segment belongs almost entirely to OpenAI, whose ChatGPT-User and OAI-SearchBot generate 98% of live queries. Competitors such as Perplexity currently contribute smaller volumes but are steadily gaining ground.
The geography of data sources further shapes the ecosystem. Most training content originates in North America, embedding cultural and political biases into many models. By contrast, Diffbot and ICC Crawler demonstrate broader coverage, drawing from Europe, the Middle East, and Africa, while Japanese actors like SoftBank and NICT focus on local internet domains.
Regional and sectoral differences are striking. In North America, nearly 90% of AI bot traffic comes from crawlers, while in Europe fetchers dominate at 59%. In education, fetchers are the primary concern: heavy reliance on ChatGPT by students and researchers translates directly into higher load on institutional resources. Media and entertainment see similar spikes as bots repeatedly pull fresh publications and breaking news. In healthcare, government, and e-commerce, by contrast, crawlers account for up to 96% of activity.
Fastly underscores that 87% of overall bot traffic is malicious, ranging from credential theft to ad fraud. With AI bots, the risks extend to unchecked content exploitation and covert monetization of third-party resources. To mitigate these threats, the company recommends a multi-layered defense: standards such as robots.txt and the X-Robots-Tag header, CAPTCHAs, rate limits, and specialized bot-management solutions. It also suggests redirecting bot traffic to licensed content platforms, giving site owners both access control and a revenue stream when their content is used for model training.
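Of those measures, rate limiting most directly addresses the 39,000-requests-per-minute spikes described earlier. Below is a minimal sketch of a token-bucket limiter in Python; the class, its parameters, and the choice to key buckets by bot identity are illustrative assumptions, not a description of Fastly's products or the report's implementation.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Token-bucket limiter: each client may burst up to `capacity`
    requests, with tokens refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate                             # tokens refilled per second
        self.capacity = capacity                     # maximum burst size
        self.tokens = defaultdict(lambda: capacity)  # per-client token balance
        self.updated = defaultdict(time.monotonic)   # per-client last-refill time

    def allow(self, client_id: str) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = time.monotonic()
        elapsed = now - self.updated[client_id]
        self.updated[client_id] = now
        # Refill tokens for the elapsed time, never exceeding capacity.
        self.tokens[client_id] = min(self.capacity,
                                     self.tokens[client_id] + elapsed * self.rate)
        if self.tokens[client_id] >= 1.0:
            self.tokens[client_id] -= 1.0
            return True
        return False

# Example: allow a given bot ~10 requests/second with bursts of up to 20.
limiter = TokenBucket(rate=10.0, capacity=20.0)
client = "ExampleBot/1.0"  # in practice, key by User-Agent, IP range, or both
for i in range(25):
    if not limiter.allow(client):
        print(f"request {i}: throttled (server would return HTTP 429)")
```

In production this logic typically runs at the CDN or bot-management layer rather than inside the application, but the bucket parameters are exactly what a "rate limit" rule configures.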
Finally, the report emphasizes the role of responsible operators, urging them to adopt transparency measures: publish their IP ranges, use clearly identifiable User-Agent strings, honor robots.txt rules, and throttle request frequency. OpenAI sets a positive example by publicly disclosing its bots' IP ranges, and Common Crawl follows a predictable scanning schedule that lets site owners prepare. Operators who ignore these practices invite blocking and growing mistrust; those who embrace transparency foster a more sustainable relationship between AI developers and the broader internet community.
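The etiquette the report asks of crawler operators maps almost line for line onto client code. The sketch below shows a well-behaved fetch loop built on Python's standard urllib.robotparser; the bot name, contact URL, target site, and paths are hypothetical placeholders.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical identity for illustration; a real operator would publish
# this bot name, its contact page, and its IP ranges.
USER_AGENT = "ExampleAIBot/1.0 (+https://example.org/bot)"
SITE = "https://example.com"

# Parse the site's robots.txt before fetching anything else.
robots = urllib.robotparser.RobotFileParser()
robots.set_url(f"{SITE}/robots.txt")
robots.read()

# Honor an explicit Crawl-delay directive if present; default to 1 second.
delay = robots.crawl_delay(USER_AGENT) or 1.0

for path in ["/", "/catalog", "/private"]:
    url = f"{SITE}{path}"
    if not robots.can_fetch(USER_AGENT, url):
        print(f"skipping {url}: disallowed by robots.txt")
        continue
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        print(f"fetched {url}: HTTP {response.status}")
    time.sleep(delay)  # regulate request frequency between fetches
```

A crawler that identifies itself this way, checks can_fetch before every request, and sleeps between fetches satisfies three of the four transparency measures above; publishing IP ranges is the remaining step and happens outside the code.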