How to Make a Website AI-Readable: A 2026 Step-by-Step

May 21, 2026 · 9 min read · Last updated May 28, 2026

If you want ChatGPT, Perplexity, and Claude to cite your site, you have to make it readable to them first. "Readable" doesn't mean keyword-stuffed — it means parseable without a browser, structured without ambiguity, and explicit about what each page is. Here's the order to do it in.

Step 1 — Server-render the HTML

Most LLM crawlers don't run your JavaScript. If your homepage is a React shell that fills in its content client-side, agents see an empty div. Use SSR, SSG, or a framework that ships complete HTML in the initial response. Test with: curl -sL https://yoursite.com | grep -i "your headline". If your headline isn't in the output, agents that don't execute JavaScript will never read it.
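If you'd rather script the curl check (say, in CI), here is a minimal Python sketch of the same idea. The function names and the User-Agent string are made up for illustration, and the URL and headline are placeholders.

```python
import urllib.request


def headline_in_html(raw_html: str, headline: str) -> bool:
    """Case-insensitive substring match, roughly what `grep -i` does."""
    return headline.lower() in raw_html.lower()


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the page as served; nothing here executes JavaScript."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "readability-check/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (placeholder values, needs network access):
#   html = fetch_raw_html("https://yoursite.com")
#   print(headline_in_html(html, "your headline"))
```

A False result on a page that looks fine in the browser usually means the content only exists after client-side rendering.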

Step 2 — Use semantic HTML

Replace div soup with real landmarks: header, nav, main, article, section, footer. Use one h1 per page, then h2/h3 in order, without skipping levels. Lists are ul/ol. Quotes are blockquote. Forms have labels. This isn't pedantry: markup is the strongest signal an LLM has about the shape of your page.
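A rough way to check these rules automatically is to walk the raw HTML and record landmarks and heading levels. The sketch below uses only the standard library; the class and function names are illustrative, and the three checks (a main landmark, exactly one h1, no skipped heading levels) are one reasonable reading of the advice above, not a complete accessibility audit.

```python
from html.parser import HTMLParser

LANDMARKS = {"header", "nav", "main", "article", "section", "footer"}


class StructureAudit(HTMLParser):
    """Collects landmark tags and heading levels in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.landmarks: set[str] = set()
        self.heading_levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            self.landmarks.add(tag)
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))


def audit_structure(raw_html: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    audit = StructureAudit()
    audit.feed(raw_html)
    problems = []
    if "main" not in audit.landmarks:
        problems.append("no <main> landmark")
    h1_count = audit.heading_levels.count(1)
    if h1_count != 1:
        problems.append(f"expected one h1, found {h1_count}")
    levels = audit.heading_levels
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:  # e.g. h1 followed directly by h3
            problems.append(f"heading jumps from h{previous} to h{current}")
    return problems
```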

Step 3 — Add JSON-LD on every page type

Minimum viable set:

Organization on the root layout — name, url, logo, sameAs links
WebSite with a SearchAction if you have search
Article on every blog post — headline, datePublished, author, image
Product or Service on commercial pages — name, description, offers
FAQPage where you have Q&A
BreadcrumbList on deep routes

Validate with Google's Rich Results Test. Invalid JSON-LD is worse than none: parsers silently ignore it, so you believe the markup is working when it isn't.
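To make the Article entry concrete, here is a small Python sketch that builds the object and wraps it in a script tag. The helper names are hypothetical, and it assumes a single Person author and a single image URL; real pages often need more fields.

```python
import json


def article_jsonld(headline: str, date_published: str, author_name: str, image_url: str) -> dict:
    """Build a schema.org Article with the four fields from the list above."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601, e.g. "2026-05-21"
        "author": {"@type": "Person", "name": author_name},  # assumes one human author
        "image": image_url,
    }


def jsonld_script_tag(data: dict) -> str:
    """Serialize to a script tag, escaping "<" so a value cannot close the tag early."""
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```

Generating the JSON with a serializer rather than a hand-written template is the design choice worth copying: it rules out the trailing commas and unescaped quotes that make JSON-LD invalid.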

Step 4 — Ship a /llms.txt

A markdown file at the root summarizing the site: a # heading with the site name, a one-line tagline as a blockquote, then ## sections for Services, Pricing, Pages, Contact. Pricing in plain numbers. Every public URL listed. No HTML, no JS, no images. Serve it as text/plain.
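If your site data already lives in code, you can generate the file instead of hand-editing it. This is a minimal sketch with a hypothetical function name; the leading # title line follows the llms.txt proposal, which expects one, and the section names and entries are whatever you pass in.

```python
def render_llms_txt(site_name: str, tagline: str, sections: dict[str, list[str]]) -> str:
    """Render a title, a blockquote tagline, then one ## section per key."""
    lines = [f"# {site_name}", "", f"> {tagline}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        lines.extend(f"- {entry}" for entry in entries)
        lines.append("")
    return "\n".join(lines)


# Usage (made-up company and prices):
#   render_llms_txt(
#       "Example Co",
#       "Web design for small teams.",
#       {"Pricing": ["Starter: $49/month"], "Pages": ["[Home](https://example.com/)"]},
#   )
```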

Step 5 — Allow the AI crawlers explicitly in robots.txt

Add a named User-agent group with Allow: / for each of GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, ClaudeBot, anthropic-ai, and Google-Extended. A default Allow: / under User-agent: * isn't enough: some bots honor only their named block.
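Here is one way the resulting file could be generated, as a short Python sketch. The function name is hypothetical, the wildcard group at the end is an assumption about what you want for every other crawler, and the Sitemap line is optional.

```python
from typing import Optional

AI_CRAWLERS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "PerplexityBot",
    "ClaudeBot",
    "anthropic-ai",
    "Google-Extended",
]


def render_robots_txt(user_agents: list[str], sitemap_url: Optional[str] = None) -> str:
    """Emit one User-agent group per crawler, each with its own Allow: / rule."""
    blocks = [f"User-agent: {agent}\nAllow: /" for agent in user_agents]
    blocks.append("User-agent: *\nAllow: /")  # assumed fallback for all other bots
    if sitemap_url:
        blocks.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(blocks) + "\n"
```

Calling render_robots_txt(AI_CRAWLERS, "https://example.com/sitemap.xml") produces eight groups separated by blank lines, followed by the Sitemap line.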

Step 6 — Keep your sitemap fresh

One entry per public route, with an accurate lastmod. Agents use the sitemap to discover, llms.txt to summarize, and crawled HTML to verify. All three have to agree.
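A sketch of both halves of this step: building the sitemap from a route-to-date mapping, and checking that another file (such as llms.txt) mentions every route. The function names are illustrative, the URL regex is deliberately simple, and where the dates come from (file mtimes, CMS timestamps, git history) is left to you.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_sitemap(routes: dict[str, date]) -> str:
    """Build a sitemap with one <url> per route and its last-modified date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in sorted(routes.items()):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()  # YYYY-MM-DD
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def urls_missing_from(text: str, routes: dict[str, date]) -> list[str]:
    """Sitemap routes that are never linked in `text`, e.g. the contents of llms.txt."""
    mentioned = set(re.findall(r"https?://[^\s)]+", text))
    return [loc for loc in sorted(routes) if loc not in mentioned]
```

An empty list from urls_missing_from means the two files agree on which pages exist; anything else is a route to add to llms.txt or drop from the sitemap.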

Step 7 — Audit and re-audit

Fetch any page with curl. Read the raw HTML. Can you answer "what is this page about, what does it offer, what does it cost?" without rendering anything? If yes, an agent can too. If not, work backwards through the steps above.
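Reading raw HTML by eye gets tedious, so it can help to strip it down to the text a non-rendering agent is left with. This is a minimal standard-library sketch; the names are hypothetical, and real crawlers almost certainly extract text differently, so treat the output as an approximation.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))  # collapse whitespace runs


def visible_text(raw_html: str) -> str:
    """Return one line per text node found outside scripts and styles."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return "\n".join(parser.chunks)
```

If the result can't answer the three questions above (what the page is about, what it offers, what it costs), neither can an agent working from the same bytes.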

The shortcut: score it

Run your URL through our free Agent Readability Checker at /check. It crawls the page the same way LLMs do and gives you a weighted score across all of the above, with the exact fixes prioritized.