Cloudflare Redraws the Line on AI Crawlers
Cloudflare will sort AI bots into three behaviors (Search, Agent, and Training) and, from September 15, block Training and Agent crawlers by default on ad-supported pages for newly onboarded domains. Multi-purpose crawlers like Googlebot will be judged by all their behaviors, pressuring operators to separate indexing from ingestion.
Cloudflare, which sits in front of roughly a fifth of the web, is redrawing the rules for how AI bots reach the sites it protects. On Tuesday, July 1, the company unveiled a new framework that stops treating "AI crawlers" as one undifferentiated swarm and instead sorts them into three named behaviors — and it will start blocking two of them by default on pages that carry ads.
The taxonomy is the heart of the change. Cloudflare now splits automated traffic into Search ("any behavior that collects or indexes your content, so it can answer questions about it later"), Agent ("automated behavior that is acting, usually in real time, on a person's behalf, to get something done right now"), and Training ("a crawler taking your content to train or fine-tune a model" where "your data is permanently absorbed"). Website owners can now allow or block each category independently, and eventually charge for it, rather than making one all-or-nothing choice.
Starting September 15, new domains onboarding to Cloudflare will get a default posture: Search stays allowed everywhere, while Training and Agent bots are blocked by default on ad-supported pages. The reasoning, Cloudflare says, is that "an ad is a signal that a website owner meant for a person to land there and see it — something monetizable that fuels the business." Existing customers keep their current settings, and everyone can opt out of the new defaults in their Security settings before the deadline. As the company puts it, "customer choice is paramount."
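As a rough illustration only, the default posture described above can be written as a small decision function. This is a sketch, not Cloudflare's implementation: the names `Behavior` and `default_action` are hypothetical, and the assumption that Training and Agent traffic is allowed by default on pages without ads is an inference from the announcement rather than something it states.

```python
from enum import Enum


class Behavior(Enum):
    """The three behaviors named in Cloudflare's taxonomy."""
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


def default_action(behavior: Behavior, page_has_ads: bool) -> str:
    """Return the assumed default for a newly onboarded domain.

    Hypothetical helper: models only the defaults described in the
    article, before any customer override.
    """
    # Search stays allowed everywhere.
    if behavior is Behavior.SEARCH:
        return "allow"
    # Training and Agent are blocked by default on ad-supported pages.
    # Assumption: pages without ads are left open by default.
    return "block" if page_has_ads else "allow"


if __name__ == "__main__":
    for b in Behavior:
        print(b.value, default_action(b, page_has_ads=True))
```

Existing customers would bypass this logic entirely, since they keep whatever settings they already have.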
The sharpest edge of the policy is aimed at multi-purpose crawlers. Bots like Googlebot, Bingbot, and Applebot fetch content for both search indexing and model training, and Cloudflare will now judge them "according to all of their behaviors" — meaning a site that opts to block Training will also block those crawlers, at the risk of vanishing from search results. That is the crux of the fight, as NBC News and The Register both noted: publishers have long been forced to accept training scrapes as the price of being found. Cloudflare's explicit hope is that the pressure pushes crawler operators to separate their search, agent, and training activity so sites can say yes to one without saying yes to all.
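To see why the "all of their behaviors" rule bites, consider a minimal sketch in which a crawler is admitted only if every behavior attributed to it is allowed by the site. The function name, the policy dictionary, and the behavior sets assigned to each crawler below are illustrative assumptions, not Cloudflare's actual data or API.

```python
def crawler_allowed(crawler_behaviors: set[str], site_policy: dict[str, str]) -> bool:
    """Admit a crawler only if all of its behaviors are allowed.

    Hypothetical helper: behaviors missing from the policy are
    treated as allowed, which is an assumption of this sketch.
    """
    return all(site_policy.get(b, "allow") == "allow" for b in crawler_behaviors)


# A site that keeps Search open but blocks Training.
policy = {"search": "allow", "agent": "allow", "training": "block"}

# Illustrative behavior sets, not an official classification.
search_only_bot = {"search"}
multi_purpose_bot = {"search", "training"}

print(crawler_allowed(search_only_bot, policy))    # True
print(crawler_allowed(multi_purpose_bot, policy))  # False
```

Under this model the multi-purpose crawler loses its search access the moment Training is blocked, which is the trade-off publishers would face until operators split the two activities into separate bots.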
The move extends Cloudflare's year-long campaign to build a business model for content on an AI-saturated web, following its "pay per crawl" experiments and the agent-provisioning work it did with Stripe. It arrives as publishers watch referral traffic erode while chatbots answer questions using material scraped from their pages, and it hands them the clearest technical lever yet to draw a line between being indexed and being ingested.