If you want ChatGPT, Perplexity, and Claude to cite your site, you have to make it readable to them first. "Readable" doesn't mean keyword-stuffed — it means parseable without a browser, structured without ambiguity, and explicit about what each page is. Here's the order to do it in.
Step 1 — Server-render the HTML
Most LLM crawlers don't run your JavaScript. If your homepage is a React shell that fills in its content on the client, agents see an empty div. Use SSR, SSG, or any framework that ships complete HTML in the initial response. Test with: curl -sL https://yoursite.com | grep -i "your headline". If your headline isn't in the output, a crawler that doesn't render JavaScript will never see it.
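The curl test can also be scripted so it runs in CI. The following is a minimal sketch, not a definitive crawler simulation: the function names and the User-Agent string are made up for illustration. Unlike a plain grep, it ignores text that only appears inside script or style blocks, since a headline buried in a JS bundle is not readable content.

```python
import re
import urllib.request
from html import unescape


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with a single GET and no JavaScript execution."""
    # The User-Agent below is a hypothetical placeholder, not a real crawler's.
    request = urllib.request.Request(url, headers={"User-Agent": "raw-html-check/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def headline_in_raw_html(html: str, headline: str) -> bool:
    """Return True if the headline appears in the markup's text content."""
    # Drop script/style bodies so text that lives only in a JS bundle does not count.
    without_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    # Strip the remaining tags and decode entities such as &amp;.
    text = unescape(re.sub(r"(?s)<[^>]+>", " ", without_scripts))

    def normalize(value: str) -> str:
        # Collapse whitespace and ignore case, like grep -i would.
        return " ".join(value.split()).lower()

    return normalize(headline) in normalize(text)
```

Usage would look like headline_in_raw_html(fetch_raw_html("https://yoursite.com"), "your headline"), with the URL and headline swapped for your own.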
Step 2 — Use semantic HTML
Replace div soup with real landmarks: header, nav, main, article, section, footer. Use one h1 per page, then h2/h3 in order without skipping levels. Lists are ul/ol. Quotes are blockquote. Forms have labels. This isn't pedantry: the tags are the most reliable signal an LLM has about the shape of your page.
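A few of these rules are easy to lint automatically. Here is a rough sketch using Python's standard html.parser; the class and function names are hypothetical, and it checks only three things (a single h1, no skipped heading levels, a main landmark), so treat it as a starting point rather than a full accessibility audit.

```python
from html.parser import HTMLParser

LANDMARKS = ("header", "nav", "main", "article", "section", "footer")


class OutlineParser(HTMLParser):
    """Collects heading levels and landmark tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.heading_levels: list[int] = []
        self.landmarks: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            self.landmarks.add(tag)
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))


def outline_problems(html: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    parser = OutlineParser()
    parser.feed(html)
    problems = []
    h1_count = parser.heading_levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    # A heading may go deeper by at most one level at a time (h2 -> h3, not h2 -> h4).
    for previous, current in zip(parser.heading_levels, parser.heading_levels[1:]):
        if current > previous + 1:
            problems.append(f"heading jumps from h{previous} to h{current}")
    if "main" not in parser.landmarks:
        problems.append("no <main> landmark")
    return problems
```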
Step 3 — Add JSON-LD on every page type
Minimum viable set:
- Organization on the root layout — name, url, logo, sameAs links
- WebSite with a SearchAction if you have search
- Article on every blog post — headline, datePublished, author, image
- Product or Service on commercial pages — name, description, offers
- FAQPage where you have Q&A
- BreadcrumbList on deep routes
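Validate with Google's Rich Results Test, or with the Schema Markup Validator for types Google doesn't show as rich results. Invalid JSON-LD is worse than none: parsers skip it silently, so you believe you're covered when you aren't.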
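To make the Article entry concrete, the sketch below builds the script tag from a plain dictionary and then re-parses every JSON-LD block found in a page, so a syntax error fails loudly instead of being silently ignored. The function names and the sample values are illustrative assumptions; the property names (headline, datePublished, author, image) are the schema.org ones listed above. Parsing proves the JSON is well formed, not that the schema is complete, so the external validators still apply.

```python
import json
import re


def article_json_ld(headline: str, date_published: str, author_name: str, image_url: str) -> str:
    """Render a schema.org Article as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date, e.g. "2024-05-01"
        "author": {"@type": "Person", "name": author_name},
        "image": image_url,
    }
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


def parse_json_ld_blocks(html: str) -> list:
    """Parse every JSON-LD block in the HTML; raises ValueError on invalid JSON."""
    pattern = r'(?is)<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>'
    return [json.loads(body) for body in re.findall(pattern, html)]
```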
Step 4 — Ship a /llms.txt
A markdown file at the root summarizing the site: an H1 with the site name, a one-line tagline as a blockquote, then ## sections for Services, Pricing, Pages, Contact. Pricing in plain numbers. Every public URL listed. No HTML, no JS, no images. Serve it as text/plain.
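One way to keep the file from drifting is to generate it from the same data that drives your routes. This is a small sketch under that assumption: build_llms_txt and its input shape are invented for illustration, and the leading H1 title line follows the llms.txt proposal's convention of opening with the site name.

```python
def build_llms_txt(site_name: str, tagline: str, sections: dict) -> str:
    """Build llms.txt markdown from {heading: [(title, url, note), ...]}."""
    lines = [f"# {site_name}", "", f"> {tagline}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            # One markdown link per public URL, followed by a plain-text note.
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Exactly one trailing newline, no trailing blank lines.
    return "\n".join(lines).rstrip() + "\n"
```

For example, a Pricing section might carry the entry ("Plans", "https://example.com/pricing", "Starter is $9 per month"), which keeps the price in plain numbers as the step recommends.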
Step 5 — Allow the AI crawlers explicitly in robots.txt
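Add a named User-agent group with Allow: / for each of GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, ClaudeBot, anthropic-ai, and Google-Extended. A default Allow: / under the wildcard group isn't enough: a crawler follows the most specific group that matches its name, so a named group protects it from any restrictive wildcard rule, and some bots reportedly honor only their named block. Google-Extended is a control token rather than a separate crawler, but it is set the same way.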
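As a sketch of what the named groups might look like, the snippet below generates the file and checks it with Python's standard urllib.robotparser. The helper names are hypothetical, and robotparser is only one interpretation of the rules, so a passing check here does not guarantee that every real crawler reads the file the same way.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "PerplexityBot",
    "ClaudeBot",
    "anthropic-ai",
    "Google-Extended",
]


def build_robots_txt(agents: list[str], sitemap_url: str) -> str:
    """One named group per agent, then a wildcard group and the sitemap line."""
    blocks = [f"User-agent: {agent}\nAllow: /\n" for agent in agents]
    blocks.append("User-agent: *\nAllow: /\n")
    blocks.append(f"Sitemap: {sitemap_url}\n")
    # Blank lines between blocks separate the groups.
    return "\n".join(blocks)


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Ask the standard-library parser whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

The same allowed helper can be pointed at your existing robots.txt to confirm that a Disallow under the wildcard group is not catching an agent you meant to let in.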
Step 6 — Keep your sitemap fresh
One entry per public route, with an accurate lastmod. Agents use the sitemap to discover, llms.txt to summarize, and crawled HTML to verify. All three have to agree.
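Agreement between the sitemap and llms.txt can be checked mechanically. The sketch below compares the URLs in each, assuming a standard sitemaps.org urlset and markdown-style links in llms.txt; the function names are made up, and it does not handle sitemap index files or compare against the crawled HTML.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> under <url> in a sitemaps.org urlset."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def llms_txt_urls(llms_txt: str) -> set[str]:
    """Collect the targets of markdown links such as [Title](https://...)."""
    return set(re.findall(r"\]\((https?://[^)\s]+)\)", llms_txt))


def disagreements(sitemap_xml: str, llms_txt: str) -> dict:
    """List URLs that appear in one file but not the other."""
    in_sitemap = sitemap_urls(sitemap_xml)
    in_llms = llms_txt_urls(llms_txt)
    return {
        "missing_from_llms_txt": sorted(in_sitemap - in_llms),
        "missing_from_sitemap": sorted(in_llms - in_sitemap),
    }
```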
Step 7 — Audit and re-audit
Fetch any page with curl. Read the raw HTML. Can you answer "what is this page about, what does it offer, what does it cost?" without rendering anything? If yes, an agent can too. If not, work backwards through the steps above.
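The three audit questions can be roughly approximated in code. This sketch reads raw HTML without rendering it and uses crude proxies, all of them assumptions: the title for "about", the meta description for "offer", and currency-looking strings in the visible text for "cost". It is no substitute for reading the page yourself, but it catches the empty-shell case quickly.

```python
import re
from html.parser import HTMLParser


class RawPageReader(HTMLParser):
    """Reads title, meta description, and visible text without running scripts."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self.text_parts: list[str] = []
        self._stack: list[str] = []  # open title/script/style tags

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""
        if tag in ("title", "script", "style"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        current = self._stack[-1] if self._stack else None
        if current == "title":
            self.title += data.strip()
        elif current is None and data.strip():
            # Text outside title/script/style counts as visible content.
            self.text_parts.append(data.strip())


def audit(html: str) -> dict:
    """Answer about/offer/cost from raw HTML using simple proxies."""
    reader = RawPageReader()
    reader.feed(html)
    text = " ".join(reader.text_parts)
    return {
        "about": reader.title or None,
        "offer": reader.description or None,
        "cost": re.findall(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?", text),
    }
```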
The shortcut: score it
Run your URL through our free Agent Readability Checker at /check. It fetches the page the way LLM crawlers do and gives you a weighted score across all of the above, with the exact fixes prioritized.