Open the source of most marketing sites and you'll find a forest of div and span tags. It renders fine. It also tells crawlers and AI agents almost nothing about what's actually on the page. That's the bug semantic HTML fixes, and it's why a small wave of agencies is rebuilding sites around it.
What semantic HTML actually means
It's the difference between a styled div and an h1. Between a card div and an article. Between a wall of divs and a clear outline: header, nav, main, section, article, aside, footer.
Those tags are not decoration. They're the API your site exposes to every non-human reader: screen readers, Google, ChatGPT, Perplexity. When the tags match the meaning, machines understand the page. When they don't, machines guess — and guesses don't get cited.
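To make the "API" idea concrete, here is a minimal sketch using Python's standard html.parser module. The two snippets, the LANDMARKS set, and the landmarks helper are hypothetical, written for illustration only. The sketch does not model how any particular crawler or agent works; it only shows what a non-rendering reader can and cannot recover from each version.

```python
from html.parser import HTMLParser

# Landmark and sectioning elements named in the text above.
LANDMARKS = {"header", "nav", "main", "section", "article", "aside", "footer"}


class LandmarkCollector(HTMLParser):
    """Record landmark tags in document order, without rendering or CSS."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in LANDMARKS:
            self.found.append(tag)


def landmarks(html):
    """Return the landmark tags found in an HTML string."""
    collector = LandmarkCollector()
    collector.feed(html)
    collector.close()
    return collector.found


# Two hypothetical snippets with the same visible content.
DIV_SOUP = """
<div class="top"><div class="menu">Home</div></div>
<div class="content"><div class="card"><div class="title">Pricing</div></div></div>
<div class="bottom">Contact</div>
"""

SEMANTIC = """
<header><nav>Home</nav></header>
<main><article><h1>Pricing</h1></article></main>
<footer>Contact</footer>
"""

print(landmarks(DIV_SOUP))   # []
print(landmarks(SEMANTIC))   # ['header', 'nav', 'main', 'article', 'footer']
```

Both snippets could be styled to look identical. Only the second gives a machine anything to hold on to beyond class names it would have to guess at.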
Why this is back
For a decade, semantic HTML was treated as a nice-to-have, because Google was good enough at inferring structure from rendered pages and CSS classes. Many AI agents are less forgiving. They often work from the raw HTML and look for landmarks. A page built on h1 → h2 → p is a clean signal. A page built on styled divs is noise.
The sites that get cited by ChatGPT, Perplexity, and Claude tend to share a common trait: a parser can extract the structure without rendering. Semantic HTML is what makes that possible.
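"Extract the structure without rendering" can be sketched in a few lines. The example below is a rough illustration, not a description of any real agent's pipeline: it uses Python's standard html.parser to pull a heading outline out of a hypothetical page, with no browser, CSS, or JavaScript involved. OutlineParser, extract_outline, and PAGE are names invented for this sketch.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for every heading in the document."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading currently open, if any
        self._buffer = []    # text pieces seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level is not None:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            self.outline.append((self._level, text))
            self._level = None


def extract_outline(html):
    """Return the heading outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical page used only for this example.
PAGE = """
<main>
  <h1>Semantic HTML for agencies</h1>
  <p>Intro paragraph.</p>
  <h2>What it means</h2>
  <p>Body text.</p>
  <h2>The <em>semantic</em> checklist</h2>
</main>
"""

for level, text in extract_outline(PAGE):
    print("  " * (level - 1) + text)
```

Run the same function on a page whose headings are styled divs and the outline comes back empty, which is the "noise" case described above.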
The semantic checklist
- One h1 per page, matching the page topic
- Headings nest correctly — no skipping from h1 to h4
- main wraps the unique page content
- Lists are ul or ol, not divs with bullets
- Articles use article with a clear heading, typically an h2 under the page's h1
- Forms have labels tied to inputs by for/id
- Images have meaningful alt text, or an empty alt attribute (alt="") if decorative
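Several of these checks are mechanical enough to automate. What follows is a minimal sketch, assuming Python's standard html.parser and a hypothetical sample page. ChecklistAuditor, audit, and SAMPLE are illustrative names, not part of any real tool. It covers only the h1 count, heading nesting, main, alt attributes, and for/id label matching. It cannot judge whether alt text is meaningful, and it does not recognize a label that wraps its input, which is also valid HTML.

```python
from html.parser import HTMLParser


class ChecklistAuditor(HTMLParser):
    """Collect just enough structure to test a few checklist items."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []     # e.g. [1, 2, 3]
        self.main_count = 0
        self.images_missing_alt = 0
        self.label_targets = set()   # values of <label for="...">
        self.control_ids = []        # id of each form control, None if absent

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))
        elif tag == "main":
            self.main_count += 1
        elif tag == "img":
            # alt="" is fine for decorative images; a missing attribute is not.
            if "alt" not in attrs:
                self.images_missing_alt += 1
        elif tag == "label":
            if attrs.get("for"):
                self.label_targets.add(attrs["for"])
        elif tag in ("input", "select", "textarea"):
            if attrs.get("type") not in ("hidden", "submit", "button"):
                self.control_ids.append(attrs.get("id"))


def audit(html):
    """Return a list of human-readable problems found in an HTML string."""
    a = ChecklistAuditor()
    a.feed(html)
    a.close()
    problems = []
    h1_count = a.heading_levels.count(1)
    if h1_count != 1:
        problems.append(f"expected one h1, found {h1_count}")
    for prev, cur in zip(a.heading_levels, a.heading_levels[1:]):
        if cur > prev + 1:
            problems.append(f"heading skips from h{prev} to h{cur}")
    if a.main_count != 1:
        problems.append(f"expected one main, found {a.main_count}")
    if a.images_missing_alt:
        problems.append(f"{a.images_missing_alt} img without alt attribute")
    unlabeled = [c for c in a.control_ids if c not in a.label_targets]
    if unlabeled:
        problems.append(f"{len(unlabeled)} form control(s) without a matching label")
    return problems


# Hypothetical page with three deliberate mistakes.
SAMPLE = """
<main>
  <h1>Pricing</h1>
  <h4>Plans</h4>
  <img src="chart.png">
  <img src="divider.png" alt="">
  <form>
    <label for="email">Email</label>
    <input id="email" type="email">
    <input id="phone" type="tel">
  </form>
</main>
"""

for problem in audit(SAMPLE):
    print(problem)
```

A check like this could run in CI as a cheap first pass. It catches structural slips, not bad judgment, so it complements a manual review and does not replace one.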
What an agency does differently
A semantic-first agency builds the document outline before the visual design. The wireframe is an HTML tree, not a Figma frame. Tailwind classes go on top of the right element, not in place of it. The result reads as well in a browser's reader mode as it does in production.
This discipline is what separates an "agent-native" build from a pretty template. Both look fine. Only one gets cited.
The compounding return
Semantic HTML pays off three times: accessibility scores rise, organic search picks up, and AI agents start naming you in answers. None of those are line items. All of them are leverage.