The llms.txt debate has split into two camps, and both are selling certainty. One camp calls it "the new robots.txt" and the next SEO land grab: add one file, win AI traffic. The other quotes Google and declares the whole idea dead on arrival. Both readings skip the data — and by mid-2026 the data is good enough to settle most of the argument.
Here is the honest version up front. Server logs show AI crawlers barely request the file. Google's Search guidance does not use it. And yet Google's own Chrome team now audits for it, the agentic-browsing layer it serves is real, and the cost of shipping one is roughly twenty minutes. We publish our own — /llms.txt and /llms-full.txt — and later in this guide we annotate it line by line, so you can see what a working file looks like and why we bothered.
What follows: the spec, the apparent May 2026 Google contradiction that few commentators reconcile, what 515 million bot events say about adoption, the full crawler-permission stack, the crawl-to-click economics behind the blocking debate, and a block-or-open decision matrix by business type.
What llms.txt is — and what it is not
llms.txt is a plain-markdown file at your domain root that gives language models a curated index of your site: who you are, which pages are canonical, where the authoritative answers live. Jeremy Howard, co-founder of Answer.AI and fast.ai, proposed the spec on September 3, 2024. The premise is practical rather than visionary: HTML built for humans is noisy — navigation, scripts, consent banners — and model context windows are finite, so hand the machine a clean map instead of making it excavate one.
The spec has two tiers. /llms.txt is the short index: a summary plus curated links. /llms-full.txt is the maximalist variant: the full content inlined into one machine-readable document, so an agent can load everything about you in a single request.
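To make the short tier concrete, here is a minimal sketch of a generator for it. It assumes the layout from Howard's proposal as we read it: an H1 with the site name, a blockquote summary, then H2 sections containing markdown link lists. The `Link` type, the `render_llms_txt` helper, and the example.com URLs are hypothetical names for illustration, not part of the spec.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One curated entry: a title, a canonical URL, and an optional note."""
    title: str
    url: str
    note: str = ""


def render_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render a short llms.txt index as markdown text."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data; replace with your own canonical pages.
example = render_llms_txt(
    name="Example Co",
    summary="Example Co sells widgets to industrial buyers.",
    sections={
        "Docs": [
            Link("Pricing", "https://example.com/pricing", "current plans and prices"),
            Link("Full profile", "https://example.com/llms-full.txt"),
        ],
    },
)
print(example)
```

Generating the file from structured data rather than writing it by hand is a design choice: it keeps the index in step with the pages it points to.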
Equally important is what the file is not. It is not robots.txt — it grants nothing and forbids nothing, and no enforcement mechanism exists behind it. It is not a ranking signal; no search engine has said it reads the file for ranking. And it is not access control: a crawler that ignores it loses nothing. robots.txt says "here is what you may fetch." llms.txt says "here is what is worth reading." Those are different jobs, and conflating them produces most of the bad takes.
The Google contradiction of May 2026
Within ten days in May 2026, Google made two moves that point in opposite directions — which is why both camps can quote Google with a straight face.
Move one: on May 5, 2026, Google added an llms.txt audit to Lighthouse, its site-quality tooling, under a new agentic browsing category. The audit flags your site if fetching /llms.txt returns a server error, and the documentation states the rationale plainly: "Without this file, agents may spend more time crawling the site to understand its high-level structure and primary content" (ppc.land).
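You can run a rough version of that check yourself. The sketch below is not Lighthouse's implementation; the only behavior taken from the audit description above is that a server error gets flagged. The function names and the outcome labels ("present", "absent", "other") are our own assumptions.

```python
import urllib.error
import urllib.request


def classify_llms_status(status: int) -> str:
    """Map an HTTP status for /llms.txt to a rough outcome label."""
    if 200 <= status < 300:
        return "present"
    if 500 <= status < 600:
        return "server-error"  # the case the audit is described as flagging
    if status in (404, 410):
        return "absent"
    return "other"


def fetch_llms_status(origin: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for <origin>/llms.txt (needs network access)."""
    request = urllib.request.Request(origin.rstrip("/") + "/llms.txt", method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx and 5xx responses
```

Calling `classify_llms_status(fetch_llms_status("https://your-domain.example"))` gives a one-word answer for a single origin. DNS failures and timeouts raise `urllib.error.URLError` and are deliberately left unhandled here.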
Move two: on May 15, 2026, Google published its official guidance on optimizing websites for generative AI features in Search — AI Overviews and AI Mode. llms.txt is absent from it. The guidance reaffirms what Google's search representatives have said since the spec appeared: standard technical SEO is what counts for AI features in Search, and the file is unnecessary for that purpose.
The reconciliation is that there is no contradiction — there are two layers. Google Search, including AI Overviews, ranks and cites content from its existing HTML index; llms.txt plays no role there today, and Google has been consistent about it. Agentic browsing — an AI agent visiting your site to complete a task on a user's behalf — is a different consumption pattern with different needs, and that is the layer the Chrome team started auditing. Anyone telling you "Google requires it" or "Google killed it" is quoting one layer and ignoring the other.
What server logs show: adoption reality
The adoption story has two halves: publishers increasingly ship the file, and crawlers mostly ignore it.
An aggregate analysis of 515 million bot events found that requests for /llms.txt amount to a negligible share of AI-crawler traffic — a rounding error against the volume of page fetches (aeo.press). GPTBot, ClaudeBot, and PerplexityBot overwhelmingly request HTML pages, the way search crawlers always have. The pipelines that feed training corpora and retrieval indexes are engineered for HTML at web scale; a parallel markdown file is an optimization those pipelines have not adopted.
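You can measure the same share in your own access logs. This is a minimal sketch, assuming logs in Combined Log Format and assuming the three crawlers identify themselves by the substrings shown; the `llms_txt_share` helper and the sample lines are invented for illustration and carry no real traffic data.

```python
import re
from collections import Counter

# Substrings that identify the crawlers named above in a User-Agent header.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined Log Format: ip - - [time] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def llms_txt_share(log_lines):
    """Return (ai_bot_requests, llms_txt_requests, share) for the given lines."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent")
        if not any(bot in agent for bot in AI_BOTS):
            continue  # not one of the AI crawlers we are counting
        counts["total"] += 1
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path in ("/llms.txt", "/llms-full.txt"):
            counts["llms"] += 1
    total, llms = counts["total"], counts["llms"]
    return total, llms, (llms / total if total else 0.0)


# Invented sample lines for illustration only.
sample = [
    '203.0.113.5 - - [01/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Jun/2026:10:00:01 +0000] "GET /llms.txt HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Jun/2026:10:00:02 +0000] "GET /blog HTTP/1.1" 200 7000 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '198.51.100.9 - - [01/Jun/2026:10:00:03 +0000] "GET / HTTP/1.1" 200 3000 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
total, llms, share = llms_txt_share(sample)
print(f"{llms} of {total} AI-crawler requests targeted llms.txt ({share:.1%})")
```

User-Agent matching counts only crawlers that declare themselves, so treat the result as a floor on AI traffic rather than a complete census.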
On the publishing side, 7.4 percent of Fortune 500 companies — 37 of 500 — had shipped an llms.txt by March 31, 2026 (ppc.land). Developer-documentation sites adopted it far faster, because coding agents are the one consumer that demonstrably reads these files today.
So the honest summary of the logs: shipping llms.txt does not measurably change how major AI crawlers read your site in 2026. Anyone selling it as an AI-traffic unlock is selling ahead of the evidence.
The 2026 AI-crawler permission stack
llms.txt gets debated in isolation, but it is one instrument in a five-layer stack that controls — or tries to control — what AI systems do with your content.
| Layer | What it controls | Who enforces it | Compliance reality | Our verdict |
|---|---|---|---|---|
| robots.txt directives (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) | Whether declared crawlers may fetch your pages; Google-Extended governs Gemini training, not Search | Nobody — a voluntary protocol | Major labs honor their declared bots; disputes exist — Cloudflare accused Perplexity of using undeclared crawlers to evade blocks in 2025 | Your actual on/off switch — configure it deliberately |
| Content Signals Policy (contentsignals.org) | Declares how fetched content may be used: search, ai-input, ai-train | Nobody technically; frames the signals as a reservation of rights | Too new to measure; spreading via Cloudflare-managed robots.txt | Costs nothing; speaks to lawyers more than to bots |
| Cloudflare default block (Nieman Lab) | Blocks known AI crawlers at the network edge for new domains, since July 1, 2025 | Cloudflare — blocked requests never reach your server | Actually enforced across a large share of the web | The only layer with teeth; flip it consciously, not by inheritance |
| Pay-per-crawl | Charges AI crawlers per request instead of blocking outright | Cloudflare's marketplace, in beta | Early-stage; depends on labs agreeing to pay | Relevant to large publishers, not to B2B sites |
| llms.txt / llms-full.txt | Nothing — an advisory reading list for models and agents | Nobody | Negligible fetch rates in server logs; Lighthouse now audits for its presence | Cheap insurance for the agentic web; zero SEO effect today |
Notice the pattern. The layers people argue about — llms.txt, Content Signals — are advisory. The layer that changed crawler behavior overnight is Cloudflare's edge, and it is the one most site owners never consciously configured.
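For the robots.txt layer, you can test what a compliant crawler would conclude before you deploy a change. The sketch below uses Python's standard `urllib.robotparser`; the policy text, the `allowed_agents` helper, and the example.com URL are hypothetical. It models only crawlers that honor the protocol, which is the limit of this layer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block two declared AI agents, leave everything else open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, per user agent, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Googlebot"],
    "https://example.com/pricing",
)
for agent, ok in report.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

With this policy, GPTBot and Google-Extended are blocked while ClaudeBot, PerplexityBot, and Googlebot fall through to the wildcard rule and stay allowed. An edge-level block such as Cloudflare's operates independently of whatever this file says.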
Crawl-to-click economics: what "giving content to AI" returns
The block-by-default instinct rests on an economic fact, so state it plainly. Cloudflare Radar's Q1 2026 data put the crawl-to-refer ratio — pages fetched per referred human click — at roughly 1,276:1 for OpenAI's GPTBot and roughly 23,951:1 for Anthropic's ClaudeBot (Cloudflare). Traditional search crawling repaid sites at ratios orders of magnitude lower. AI systems consume content at industrial scale and return almost no direct traffic.
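The ratios are easier to feel when inverted. This small sketch takes the two figures quoted above and converts them into referred clicks per million pages crawled; the `referrals_per_million` helper is our own illustrative name, and no data beyond those two ratios is assumed.

```python
# Crawl-to-refer ratios quoted above (pages fetched per referred click).
RATIOS = {"GPTBot": 1_276, "ClaudeBot": 23_951}


def referrals_per_million(ratio: float) -> float:
    """Expected referred clicks for every one million pages crawled."""
    return 1_000_000 / ratio


for bot, ratio in RATIOS.items():
    clicks = referrals_per_million(ratio)
    print(f"{bot}: about {clicks:.0f} clicks per million pages crawled")

# How many times more pages per click ClaudeBot fetches than GPTBot.
relative = RATIOS["ClaudeBot"] / RATIOS["GPTBot"]
print(f"ClaudeBot fetches about {relative:.1f}x more pages per click than GPTBot")
```

At these ratios a million crawled pages returns roughly 784 clicks from GPTBot and roughly 42 from ClaudeBot.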
If your business monetizes pageviews, that asymmetry is close to existential, and the publisher revolt — along with Cloudflare's pay-per-crawl experiment — follows logically from it.
But the ratio measures clicks, and clicks are not the only return. The crawl that never sends a visitor still determines whether the model knows you exist, describes you accurately, and names you when a buyer asks for a shortlist. For a B2B company, the AI answer often is the touchpoint: a prospect asks ChatGPT to compare vendors, gets an answer synthesized from whatever the crawlers could read, and your analytics never register the encounter. We unpacked that shift in AEO vs GEO vs SEO — the goal moves from winning the click to being the retrieved, accurately cited answer.
Block or open? A decision matrix by business type
There is no universal answer, because the crawl-to-click math cuts differently depending on what your content is for.
| Business type | Revenue logic | AI crawlers | llms.txt | Reasoning |
|---|---|---|---|---|
| Publisher / media | Pageviews and subscriptions are the product | Block or negotiate via pay-per-crawl | Skip | At 1,276:1 and worse, open access is a subsidy to someone else's product |
| B2B brand / services | The site is a sales asset; being known beats being visited | Open | Ship it | You want to be retrievable when buyers ask AI for vendors |
| E-commerce | Product data drives discovery; agents increasingly assist purchases | Open; watch infrastructure costs | Ship it, with product and policy URLs | Absence at the moment an agent compares options is lost revenue |
| Content licensing | The content itself is the asset being priced | Block, then negotiate | Skip | Scarcity is the negotiating leverage |
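If you manage several properties, the matrix above can be encoded as a plain lookup so the decision is recorded rather than remembered. This is a sketch only: the dictionary keys, the `Policy` type, and the `recommend` helper are hypothetical labels, and the values are transcribed from the table rows.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    """Recommended stance on AI crawlers and on shipping llms.txt."""
    ai_crawlers: str
    llms_txt: str


# Rows transcribed from the matrix above; the keys are illustrative labels.
DECISION_MATRIX = {
    "publisher": Policy("block or negotiate via pay-per-crawl", "skip"),
    "b2b": Policy("open", "ship"),
    "ecommerce": Policy("open; watch infrastructure costs", "ship with product and policy URLs"),
    "licensing": Policy("block, then negotiate", "skip"),
}


def recommend(business_type: str) -> Policy:
    """Look up the matrix row for a business type, failing loudly on unknown input."""
    try:
        return DECISION_MATRIX[business_type]
    except KeyError:
        known = ", ".join(sorted(DECISION_MATRIX))
        raise ValueError(
            f"unknown business type {business_type!r}; expected one of: {known}"
        ) from None


print(recommend("b2b"))
```

Raising on an unknown type is deliberate: a site that fits none of the four rows needs a conscious decision, not a silent default.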
For most B2B companies — our clients, and us — the answer is open. Your marketing site exists so the market knows what you do. An AI system that reads it and repeats it accurately to a prospect is doing the site's job for free. Blocking GPTBot to protect content whose entire purpose is to be known inverts the strategy. And as purchasing shifts toward business-to-agent (B2A) interactions, where software completes tasks a human used to do, retrievability compounds: the asset worth protecting is accuracy, not access. Keeping that accuracy across AI platforms is the core of AI visibility work.
The closing-web consequence: blocked sites make Wikipedia your proxy
Now the second-order effect almost nobody prices in. Cloudflare blocks AI crawlers by default for new domains. Publishers block or meter access. Pay-per-crawl prices what used to be free. The open web, as an AI crawler sees it, is shrinking.
Models still need grounding sources, so retrieval concentrates on the high-authority corpora that remain open by design: Wikipedia, Wikidata, public registries, academic repositories. Wikipedia's free license permits reuse, its content is structured and cited, and it sits behind no crawl wall. Every site that closes makes the sources that stay open weigh more in what AI systems know and say.
The consequence for a brand is direct. If your own site is dark to crawlers — by choice or by your CDN's defaults — then your Wikipedia article, your Wikidata entity, and the other open sources become the de facto record AI reads about you. That is the strategic tie the llms.txt debates miss, and it is why we treat encyclopedic presence as infrastructure rather than vanity: it is the part of your record that stays retrievable no matter how the permission stack evolves. The mechanics are covered in Wikipedia AEO and our Wikidata and knowledge-graph service; the broader tactical picture is in Wikipedia SEO tactics for 2026.
Our own llms.txt, annotated
We publish both tiers — wikibusines.net/llms.txt and wikibusines.net/llms-full.txt — regenerated from the site's canonical data, and you can read them live. Here are real lines from the short file, with the reasoning behind each choice:
# WikiBusines — LLM-readable summary
WikiBusines is a trust-infrastructure and AI-visibility company.
Full machine-readable profile (all services, prices, FAQ, blog index):
https://www.wikibusines.net/llms-full.txt
- Founded: 2010 — operating 15+ years
- Publication success rate (past year): 93%
…
- Wikipedia Notability Audit (€490 / €750 / €1,900, credited toward
project): https://www.wikibusines.net/wikipedia-notability-audit
…
## What we do not claim
- We do not guarantee Wikipedia publication. We run a risk-managed,
source-first process and recommend alternative routes when notability
is insufficient.
The first sentence defines the entity in one line. If a model reads only twenty tokens of your file, those tokens should say what you are. Write it like a dictionary definition, not a slogan.
The full-profile pointer implements the spec's two-tier design. The index stays skimmable; an agent that wants everything follows one link and gets every service, price, and FAQ answer in a single fetch.
Facts carry numbers and dates. "Founded: 2010" and "93%" are claims a model can retrieve and repeat precisely. Adjectives are not.
Service lines pair canonical URLs with prices. When an agent is asked what a notability audit costs, the answer and the destination sit on the same line.
The "What we do not claim" section is the part most companies would never write. Models echo their sources; if your file overclaims, the AI answer overclaims, and the prospect's first call begins with a correction. Stating the limits of your own service is accuracy insurance — the same honest-difference logic we apply on every page.
Total effort: about twenty minutes, plus regeneration when facts change. The realistic payoff in 2026 is agent readability and a clean Lighthouse audit, not rankings. We treat it as cheap insurance, priced accordingly.
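The annotations above translate into a few mechanical checks you can run whenever the file is regenerated. The sketch below is one possible reading of them, not a validator for the spec: the `lint_llms_txt` helper, its check names, and the Example Co sample file are all hypothetical.

```python
import re


def lint_llms_txt(text: str) -> dict[str, bool]:
    """Run four rough checks that mirror the annotations above."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    bullets = [line for line in lines if line.startswith("- ")]
    return {
        # The file opens with a markdown H1 naming the entity.
        "starts_with_h1": bool(lines) and lines[0].startswith("# "),
        # The short index points at the full-profile tier.
        "links_full_profile": "/llms-full.txt" in text,
        # At least one bullet carries a number or date a model can repeat.
        "has_numeric_fact": any(re.search(r"\d", line) for line in bullets),
        # A section heading states the limits of the service.
        "states_limits": any(
            line.startswith("## ") and "not claim" in line.lower() for line in lines
        ),
    }


# Hypothetical file modeled on the structure described above.
sample_file = """\
# Example Co: LLM-readable summary
Example Co is an industrial widget supplier.
Full machine-readable profile:
https://example.com/llms-full.txt
- Founded: 2012
- On-time delivery rate (past year): 97%
## What we do not claim
- We do not guarantee delivery dates for custom orders.
"""
for check, passed in lint_llms_txt(sample_file).items():
    print(f"{check}: {'ok' if passed else 'missing'}")
```

These checks confirm structure only. Whether the numbers in the file are still true is something only a regeneration from canonical data can guarantee.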
FAQ
Does llms.txt help SEO?
There is no evidence that it does. Google's May 2026 Search guidance does not use the file, and no search engine has announced reading it for ranking or for AI Overviews. If AI-search citations are the goal, the work remains conventional: crawlable HTML, structured data, and authoritative third-party sources about you.
Will ChatGPT actually read my llms.txt?
Rarely, on current evidence. The 515-million-event log analysis cited above shows GPTBot and its peers fetching HTML and largely ignoring /llms.txt. The file's near-term consumers are agentic browsers and coding tools, plus Lighthouse, whose audit signals where Google's Chrome team thinks this is heading.
Should a small company bother?
It costs about twenty minutes and changes nothing you can measure today, so treat it as optional, low-cost insurance. Skipping it is reasonable; doing it properly is cheap. If you ship one, keep it accurate and regenerate it when facts change — a stale file that misstates your prices is worse than no file.
Should I block AI crawlers while I decide?
First check whether you already are. If your domain joined Cloudflare after July 1, 2025, AI crawlers may be blocked by default without anyone at your company having decided anything. Whatever your position, make it a decision rather than an inherited setting.
llms.txt is the cheapest and least consequential layer of AI readability. The consequential layers are whether the sources AI systems trust — Wikipedia, Wikidata, the knowledge platforms — describe you accurately, and whether a machine-readable record of your company exists at all. That stack is what we build: see the LLM Hub for the full architecture, or start by opening our llms.txt next to your own domain's. If yours returns a 404, you now know precisely what that is and is not costing you.