Cloudflare Just Changed the Rules for AI Web Crawlers

July 3, 2026 · 5 min read

What Cloudflare Actually Did

On July 1 and 2, 2026, Cloudflare published a press release and accompanying blog posts laying out a significant shift in how it handles AI crawler traffic by default. The short version: starting September 15, 2026, new Cloudflare customers, newly added domains, and anyone on the free plan will have mixed-use AI crawlers blocked automatically on ad-supported pages, unless those crawlers separate their search indexing function from their AI training and agent functions.

That distinction matters, and it is easy to miss in the initial wave of coverage. Cloudflare is not announcing a blanket block on all AI bots. Search crawlers that index content for search results are still allowed by default. What gets blocked under the new defaults are crawlers that bundle search indexing together with AI training data collection or agentic use, specifically on pages where the site owner is running ads. The logic is direct: if your content is generating ad revenue for you, a crawler that simultaneously scrapes it for model training or autonomous agent use is extracting value without any exchange.

Cloudflare has been building toward this position for a while. Since it introduced optional blocking tools in 2024, the company has blocked over 416 billion AI bot requests. What changes in September is that blocking mixed-use crawlers stops being something site owners have to opt into and becomes the starting position.

The Three-Category Split

The classification system itself is worth understanding in detail, because how Cloudflare draws the lines determines who gets blocked and who doesn't.

Search crawlers get the most permissive treatment under the new defaults. These are bots whose declared and verified function is indexing content for search results — the kind of crawling that, in theory, sends traffic back to your site. Cloudflare allows these by default because the exchange has some logic to it: you get indexed, you get found, someone visits.

Training crawlers are at the other end. Bots that exist to harvest content for model training data get blocked by default under the new policy on ad-supported pages. No traffic in return, no payment, no negotiation — just extraction. Cloudflare's position is that site owners running ads on their content are losing direct value every time a training crawler pulls that content without compensation.

Agent crawlers — bots acting autonomously on behalf of users or AI systems to retrieve and act on content in real time — also get blocked by default in the same context.

The harder case is the mixed-use crawler, which is really where the September 15 change bites. If a single crawler combines search indexing with training or agent functions, the new defaults treat it as a training or agent crawler on ad-supported pages, not as a search crawler. Crawlers that want search-level access need to separate those functions cleanly.
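To make the rule concrete, here is a minimal sketch of the default decision as this article describes it. The names (Purpose, default_action) are hypothetical and are not part of any Cloudflare API. The sketch also assumes that pages without ads fall outside the announced defaults for training and agent crawlers, so it returns "unspecified" there rather than guessing.

```python
from enum import Enum


class Purpose(Enum):
    """Crawler purposes, mirroring the three categories described above."""
    SEARCH = "search"
    TRAINING = "training"
    AGENT = "agent"


def default_action(purposes, ad_supported):
    """Return "allow", "block", or "unspecified" for a crawler.

    This is an illustration of the article's summary, not Cloudflare's
    actual implementation.
    """
    purposes = set(purposes)
    if not purposes:
        # No declared purpose: the article does not say what happens.
        return "unspecified"
    if purposes == {Purpose.SEARCH}:
        # Search-only crawlers stay allowed by default.
        return "allow"
    if ad_supported:
        # Any training or agent purpose, alone or bundled with search,
        # is blocked by default on ad-supported pages.
        return "block"
    # Assumption: pages without ads are not covered by the new defaults.
    return "unspecified"
```

Under this reading, a crawler declaring both SEARCH and TRAINING on an ad-supported page gets the same result as a pure training crawler, which is exactly why bundling stops paying off.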

The pay-per-crawl beta fits here as a middle path. Publishers can issue a 402 response — the HTTP status code for payment required — and negotiate access with AI companies rather than simply blocking them outright. It is not a hard wall. It is a tollgate.
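For readers who have never seen a 402 in practice, the following is a rough sketch of the general idea using only Python's standard library. It is not Cloudflare's pay-per-crawl implementation, whose details are not covered here. The crawler names in TOLLED_CRAWLERS are made up for illustration, and matching on the User-Agent string is a simplifying assumption, since real systems would need to verify crawler identity more robustly.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical user-agent substrings; real crawler names will differ.
TOLLED_CRAWLERS = ("ExampleTrainingBot", "ExampleAgentBot")


def status_for(user_agent):
    """Return 402 for a tolled crawler and 200 for everyone else."""
    if any(token in user_agent for token in TOLLED_CRAWLERS):
        return HTTPStatus.PAYMENT_REQUIRED
    return HTTPStatus.OK


class TollgateHandler(BaseHTTPRequestHandler):
    """Serve a page to readers and a 402 to the listed crawlers."""

    def do_GET(self):
        status = status_for(self.headers.get("User-Agent", ""))
        if status is HTTPStatus.PAYMENT_REQUIRED:
            body = b"Payment required for crawler access.\n"
        else:
            body = b"Hello, reader.\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server locally until interrupted."""
    HTTPServer(("127.0.0.1", port), TollgateHandler).serve_forever()
```

Calling serve() and requesting the page with a User-Agent containing one of the listed names should produce a 402 instead of the content. How price and payment would then be negotiated is the part the sketch leaves out entirely.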

Who Gets Hit Hardest

The operators with the most exposure here are the ones running crawlers that do multiple jobs at once. That is a common architecture. A single bot that indexes content for search, collects training data, and handles agentic retrieval is more efficient to operate than three separate crawlers with distinct identities and declared purposes. Under the current system, that bundled approach has worked. After September 15, it stops working on ad-supported pages for anyone on Cloudflare's new defaults.

That means the companies most immediately disrupted are the ones who have not yet built the infrastructure to separate these functions cleanly. Larger AI labs with dedicated engineering resources can make that architectural change. Smaller operators running mixed-use crawlers on leaner budgets face a harder decision: redesign the crawler, negotiate access through the pay-per-crawl beta, or accept that a significant portion of the web just got harder to reach.

The 416 billion blocked requests figure is worth pausing on. That number has accumulated since Cloudflare introduced optional blocking tools in 2024, and it reflects sites that chose to opt in. The September change flips the logic for new customers, newly added domains, and free plan users: opt-out replaces opt-in. If 416 billion requests were blocked while blocking was still a choice site owners had to actively make, the volume after September is likely to climb well beyond that.

What Site Owners Should Do Now

September 15 is close enough that the decisions you make in the next few weeks will determine whether the change affects you at all.

The first thing to do is open your Cloudflare dashboard and check which plan you are on and whether you expect to add any domains on or after September 15. If you are on the free plan, the new defaults apply to you automatically when the date hits. If you are an existing paid customer with established domains, Cloudflare has indicated the change targets new customers and newly added domains first, but that window will not stay narrow indefinitely, and building a plan now costs nothing.

From there, pull up the bot management and crawler settings for your active domains. Cloudflare's granular controls let you make separate decisions for Search, Agent, and Training categories. Review each independently. If you want to allow search indexing but block training and agent crawlers across your ad-supported pages, you can set that explicitly rather than relying on the defaults to do it for you.

If you are running a publication with significant ad revenue and want to explore the pay-per-crawl beta instead of a hard block, that option exists as a middle path. It converts the access question from yes or no into a negotiation.

The one thing that does not make sense is waiting. The default is shifting, and your current settings will either match your intentions or they will not.
