// Reference
GEO Glossary
22 terms covering the agent-native web — llms.txt, MCP, JSON-LD, the major crawlers, and the metrics that decide whether AI engines cite a site. Each term is a citable canonical answer with DefinedTerm JSON-LD.
// Crawlers
- robots.txt: A text file at /robots.txt that tells crawlers which paths they may or may not fetch.
- OAI-SearchBot: OpenAI's crawler for ChatGPT search citations, distinct from GPTBot (training).
- PerplexityBot: Perplexity's primary crawler for live search citations. Its traffic is bursty, and it favors listicle-style pages.
- ClaudeBot / Claude-SearchBot: Anthropic's crawlers. ClaudeBot collects public web content that may be used for model training; Claude-SearchBot indexes pages for Claude's live web search.
- GPTBot: OpenAI's training-data crawler. Blocking it opts out of model training but does NOT block ChatGPT citations.
- Google-Extended: Google's training-data opt-out user-agent token. Blocking it does NOT affect Googlebot or AI Overviews.
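The training-versus-citation split described above can be checked locally before a robots.txt goes live. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the `allowed_agents` helper, and the `example.com` URL are illustrative assumptions, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training crawlers,
# keep the search/citation crawlers allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user-agent token, whether it may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    "https://example.com/glossary/",  # assumed URL, for illustration only
    ["GPTBot", "Google-Extended", "OAI-SearchBot", "PerplexityBot", "Googlebot"],
)
for agent, ok in result.items():
    print(f"{agent}: {'allowed' in str(ok) or ('allowed' if ok else 'blocked')}")
```

This only verifies how the standard-library parser reads the rules; each real crawler applies its own matching logic, so treat the result as a sanity check rather than a guarantee.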
// Infrastructure
// Metrics
// Optimization
- Generative Engine Optimization (GEO): The practice of optimizing content and infrastructure to be cited by AI engines like ChatGPT, Perplexity, and Google AI Overviews.
- Information Gain: How much novel information a page adds to what an AI engine already knows from training data.
- Citability: The structural and stylistic properties that make a passage easy for an LLM to lift verbatim.
- Agent-Native: Built from the ground up to be operated by AI agents, not just readable by them.
- AI Overviews: Google's AI-generated answer panel that appears above traditional search results on 48% of queries.
// Protocols
- llms.txt: A markdown file at /llms.txt that gives LLMs a curated map of a website's most important pages.
- llms-full.txt: A single markdown file containing a site's full content, sized for an LLM context window.
- Model Context Protocol (MCP): An open standard from Anthropic for connecting LLMs to external tools, data sources, and APIs.
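As an illustration of the llms.txt entry above, here is a small sketch that assembles such a file in the commonly used layout: an H1 title, a blockquote summary, then H2 sections of annotated links. The `build_llms_txt` helper, the site name, and the URLs are hypothetical.

```python
def build_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a curated page map as llms.txt-style markdown.

    `sections` maps a heading to (link text, URL, short note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Single trailing newline, no trailing blank lines.
    return "\n".join(lines).rstrip() + "\n"


llms_txt = build_llms_txt(
    "Example Site",  # assumed site name
    "Reference material on making websites readable by AI agents.",
    {
        "Reference": [
            ("GEO Glossary", "https://example.com/glossary", "Definitions of core terms"),
        ],
    },
)
print(llms_txt)
```

The output would be served as plain text at /llms.txt; an llms-full.txt could be produced the same way by appending full page bodies instead of links.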
// Schema
- JSON-LD: JSON-based structured data format used by schema.org to describe entities for search engines and LLMs.
- schema.org: A collaborative vocabulary of structured-data types maintained by Google, Microsoft, Yahoo, and Yandex.
- DefinedTerm (schema.org): A schema.org type for marking up a single glossary definition so LLMs can lift it as a definitional citation.
- FAQPage: A schema.org type for marking up question-and-answer pairs so search engines and LLMs can extract them directly.
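To make the DefinedTerm entry concrete, the sketch below builds JSON-LD for one glossary term and wraps it in a script tag for embedding in a page. The property names (`name`, `description`, `inDefinedTermSet`) come from schema.org; the helper functions, the glossary name, and the URL are assumptions for illustration and may not match how this glossary generates its own markup.

```python
import json


def defined_term_jsonld(name: str, description: str, set_name: str, set_url: str) -> dict:
    """Build a schema.org DefinedTerm object linked to its DefinedTermSet."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
        "inDefinedTermSet": {
            "@type": "DefinedTermSet",
            "name": set_name,
            "url": set_url,
        },
    }


def as_script_tag(data: dict) -> str:
    """Serialize JSON-LD for embedding in HTML.

    "</" is escaped as "<\\/" (still valid JSON) so that text inside a
    definition cannot close the script element early.
    """
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


term = defined_term_jsonld(
    "llms.txt",
    "A markdown file at /llms.txt that gives LLMs a curated map of a "
    "website's most important pages.",
    "GEO Glossary",
    "https://example.com/glossary",  # assumed URL
)
print(as_script_tag(term))
```

An FAQPage block could follow the same pattern with `mainEntity` holding `Question` items, each with an `acceptedAnswer`.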