
Schema & JSON-LD for AI Search: A Practical Setup

The schema types that actually help answer engines extract and cite you, with copy-pasteable JSON-LD, where to put it, and the mistakes that quietly break it.

By Matt Goren · Updated June 25, 2026 · 8 min read

Schema markup is the most under-used lever in AI search, and I think it's because people file it under "technical SEO chore" instead of "telling the machine exactly what to lift." When an answer engine reads your page, it's reconstructing meaning from HTML soup. JSON-LD hands it a clean, labeled summary: this is an article, here's the author, here's the published date, here are the questions and their answers. You're doing the extraction work for the engine, and engines reward content they can parse with confidence.

This is a practical setup guide. I'll cover which schema types matter for answer engines, give you copy-pasteable JSON-LD for each, tell you where to put it, and walk through the mistakes that quietly break the whole thing.

1. Why schema helps AI extract and cite

An answer engine has to answer two questions about your page before it can use you: "What is this?" and "Can I trust the specifics?" Rendered HTML is ambiguous: is that bold line a heading, an author name, or a product title? Schema removes the guesswork by stating it in a structured format the machine reads natively.

That matters for citation in three concrete ways. First, identity: Organization and author markup tell the engine who's behind the claim, which feeds the authority signal that decides whether you're a citable source at all. Second, extraction: FAQPage and HowTo markup pre-package your content into the exact question-answer and step shapes that answer engines emit. Third, disambiguation: dates, types, and entities let the engine resolve "which thing is this" without guessing.

Think of it from the engine's side for a second. It crawls thousands of pages an hour, each one a wall of divs and spans. The pages it can act on confidently are the ones that announce themselves. Schema is your page raising its hand and saying, in a language the parser speaks natively, "I am an article, by this author, answering these specific questions." You're not hoping the engine infers correctly — you're stating it. That shift from inference to declaration is the entire value, and it's why schema punches above its weight relative to how little effort it takes to add.

Schema doesn't force a citation; anyone promising that is overselling. What it does is make your content the path of least resistance. Pair it with crawlability (the technical side I cover in my guide to llms.txt and AI crawlers) and you've cleared both gates: the engine can fetch you, and it can understand you.

2. The schema types that matter

There are hundreds of schema.org types. For answer engines, six carry almost all the weight. Here they are, with markup you can adapt.

Article

Use on every editorial page — guides, blog posts, analyses. It establishes the page as a piece of authored content with a publisher and dates, which feeds both identity and freshness signals.

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Schema & JSON-LD for AI Search: A Practical Setup",
  "author": { "@type": "Person", "name": "Matt Goren" },
  "publisher": {
    "@type": "Organization",
    "name": "Matt Goren",
    "logo": { "@type": "ImageObject", "url": "https://mattgoren.com/logo.png" }
  },
  "datePublished": "2026-06-25",
  "dateModified": "2026-06-25",
  "mainEntityOfPage": "https://mattgoren.com/ai/schema-jsonld-for-ai-search"
}

Keep headline matched to your visible H1, and keep dateModified honest — engines use it to judge freshness, and stale-but-claimed-fresh is a trust hit.
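Both rules get easier when the markup is built from the same record that renders the page: the headline can't drift from the H1, and dateModified can be tied to whether the content actually changed. Here's a minimal Python sketch of that, assuming a hypothetical Page record your templates already use and a content hash you store per page. The names are illustrative, not any CMS's API.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page record: the same object the template renders from."""
    title: str       # rendered as the visible <h1>
    author: str
    url: str
    published: str   # ISO 8601 date, e.g. "2026-06-25"
    modified: str    # set from next_date_modified(), never from build time


def article_jsonld(page: Page, publisher: str, logo_url: str) -> dict:
    """Build Article JSON-LD from the page record so headline always equals the H1."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": page.title,
        "author": {"@type": "Person", "name": page.author},
        "publisher": {
            "@type": "Organization",
            "name": publisher,
            "logo": {"@type": "ImageObject", "url": logo_url},
        },
        "datePublished": page.published,
        "dateModified": page.modified,
        "mainEntityOfPage": page.url,
    }


def next_date_modified(body: str, prev_hash: str, prev_modified: str,
                       today: str) -> tuple[str, str]:
    """Hypothetical helper: return (dateModified, content_hash).

    The date moves only when the hashed body text differs from last time.
    """
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if digest == prev_hash:
        return prev_modified, digest  # untouched page: keep the honest old date
    return today, digest
```

Hash the article body rather than the full HTML, which may carry build timestamps, and store the returned hash with the page so a template-only redeploy leaves dateModified alone.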

FAQPage

This is the highest-leverage type for AEO. It maps your content directly onto the question-answer shape answer engines produce, so a well-built FAQ block is often the easiest thing on your page to lift verbatim.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does schema markup directly make AI cite me?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Not directly. Schema removes ambiguity so the engine can identify what your content is and which facts are answers, which makes citation more likely."
      }
    }
  ]
}

The hard rule: every Q&A in the markup must appear visibly on the page. FAQPage schema for invisible questions is exactly the kind of mismatch engines penalize.
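The simplest way to honor that rule is to render the visible FAQ and the FAQPage markup from one list, so a question can't exist in one without the other. A minimal sketch, assuming your FAQ lives as plain (question, answer) pairs; the FAQS list and the "faq" CSS class are hypothetical.

```python
import html

# Hypothetical single source of truth: (question, answer) pairs.
FAQS = [
    ("Does schema markup directly make AI cite me?",
     "Not directly. Schema removes ambiguity so the engine can identify what your "
     "content is and which facts are answers, which makes citation more likely."),
]


def faq_html(faqs):
    """Render the visible FAQ section, escaping text for HTML."""
    items = [f"<h3>{html.escape(q)}</h3>\n<p>{html.escape(a)}</p>" for q, a in faqs]
    return '<section class="faq">\n' + "\n".join(items) + "\n</section>"


def faq_jsonld(faqs):
    """Build FAQPage JSON-LD from the exact pairs rendered by faq_html()."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in faqs
        ],
    }
```

Adding or deleting a question then changes the page and the markup in the same edit.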

HowTo

For any step-by-step or procedural content. It packages your process into discrete, ordered steps — the shape an engine wants when a user asks "how do I do X?"

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to add JSON-LD to a page",
  "step": [
    { "@type": "HowToStep", "position": 1, "name": "Pick the type", "text": "Choose the schema type that matches the page content." },
    { "@type": "HowToStep", "position": 2, "name": "Generate the JSON", "text": "Build the JSON-LD block with real, visible values." },
    { "@type": "HowToStep", "position": 3, "name": "Validate", "text": "Run it through a schema validator before shipping." }
  ]
}
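Hand-numbered positions are easy to break when you reorder or delete a step. Deriving position from list order keeps the numbering gapless. A sketch, assuming steps are stored as (name, text) pairs in display order; howto_jsonld is a hypothetical helper name.

```python
def howto_jsonld(name, steps):
    """Hypothetical helper. steps: (step_name, step_text) pairs in page order."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        # enumerate(start=1) derives position from list order: no gaps, no duplicates.
        "step": [
            {"@type": "HowToStep", "position": i, "name": step_name, "text": text}
            for i, (step_name, text) in enumerate(steps, start=1)
        ],
    }


markup = howto_jsonld("How to add JSON-LD to a page", [
    ("Pick the type", "Choose the schema type that matches the page content."),
    ("Generate the JSON", "Build the JSON-LD block with real, visible values."),
    ("Validate", "Run it through a schema validator before shipping."),
])
```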

Breadcrumb

BreadcrumbList tells the engine where a page sits in your site's hierarchy. It's small but it helps with context and entity grouping — the engine understands this page is part of an "AI" hub, not a stray URL.

{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "AI", "item": "https://mattgoren.com/ai" },
    { "@type": "ListItem", "position": 2, "name": "Schema & JSON-LD for AI Search", "item": "https://mattgoren.com/ai/schema-jsonld-for-ai-search" }
  ]
}
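Breadcrumbs are easy to generate from the same trail your navigation already renders. A minimal sketch, assuming the trail is an ordered list of (name, path) pairs; breadcrumb_jsonld is a hypothetical helper, and urljoin from Python's standard library turns each path into the absolute URL the item field expects.

```python
from urllib.parse import urljoin


def breadcrumb_jsonld(base_url, trail):
    """Hypothetical helper. trail: (name, path) pairs from the top level down to this page."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {
                "@type": "ListItem",
                "position": i,
                "name": name,
                "item": urljoin(base_url, path),  # absolute URL, as in the example above
            }
            for i, (name, path) in enumerate(trail, start=1)
        ],
    }


crumbs = breadcrumb_jsonld("https://mattgoren.com", [
    ("AI", "/ai"),
    ("Schema & JSON-LD for AI Search", "/ai/schema-jsonld-for-ai-search"),
])
```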

Organization

Define this once for the whole site (your homepage is the natural home for it). It's your identity anchor: name, logo, URL, and sameAs links to your profiles. This is what lets an engine connect a page to a known entity it can attribute claims to.

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Matt Goren",
  "url": "https://mattgoren.com",
  "logo": "https://mattgoren.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/in/mattgoren",
    "https://www.youtube.com/@mattgoren"
  ]
}
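One way to keep that identity stable is to give the Organization an @id (a JSON-LD node identifier) and reuse the same identifier and name wherever the organization appears. A sketch, assuming a hypothetical "#organization" fragment as the identifier. I can't promise every engine merges nodes across pages by @id, but this at least guarantees your markup never describes the organization two different ways.

```python
SITE = "https://mattgoren.com"
ORG_ID = SITE + "/#organization"  # hypothetical identifier; any stable URI works

# Defined once and emitted unchanged wherever the full node is needed.
ORGANIZATION = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": ORG_ID,
    "name": "Matt Goren",
    "url": SITE,
    "logo": SITE + "/logo.png",
    "sameAs": [
        "https://www.linkedin.com/in/mattgoren",
        "https://www.youtube.com/@mattgoren",
    ],
}


def org_ref():
    """Hypothetical helper for other blocks (e.g. Article.publisher): same @id and name."""
    return {"@type": "Organization", "@id": ORG_ID, "name": ORGANIZATION["name"]}
```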

Product

For commerce or tool pages. It exposes the facts an engine needs to answer buying questions: name, description, price, availability, and rating. Ratings come with a catch, covered in the mistakes section below, because aggregateRating is the most-abused field in all of schema.

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Product",
  "description": "What it is and who it's for.",
  "brand": { "@type": "Brand", "name": "Example" },
  "offers": {
    "@type": "Offer",
    "price": "49.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
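The safe pattern for ratings is to compute them from review data you actually store and to omit the field entirely when there is none. A minimal sketch, assuming review_scores comes from your own review database (a hypothetical source); product_jsonld is a hypothetical helper, and the product values mirror the example above.

```python
from typing import Optional, Sequence


def product_jsonld(name: str, description: str, brand: str, price: float,
                   currency: str = "USD", in_stock: bool = True,
                   review_scores: Optional[Sequence[float]] = None) -> dict:
    """Hypothetical helper: Product JSON-LD with a rating only when reviews exist."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": ("https://schema.org/InStock" if in_stock
                             else "https://schema.org/OutOfStock"),
        },
    }
    # aggregateRating is added only when backed by real, stored reviews; never a placeholder.
    if review_scores:
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": round(sum(review_scores) / len(review_scores), 1),
            "reviewCount": len(review_scores),
        }
    return data
```

With no reviews the field simply isn't there, which is the honest answer.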

3. Where to put it

Use a <script type="application/ld+json"> block. It can live in the <head> or anywhere in the <body> — placement doesn't change how engines read it, so put it wherever your stack generates pages most cleanly.

<script type="application/ld+json">
{ "@context": "https://schema.org", "@type": "Article", "headline": "..." }
</script>
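If you generate the tag yourself, one detail matters: Python's json.dumps doesn't escape "<", so a string value containing "</script>" would end the tag early and leave broken JSON behind. A sketch of a safer serializer, assuming your JSON-LD is already a Python dict; jsonld_script_tag is a hypothetical helper name.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialize a JSON-LD dict into a <script type="application/ld+json"> tag."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # "<" can only occur inside JSON strings, so swapping it for the \u003c escape
    # keeps the JSON equivalent and stops "</script>" or "<!--" from closing the tag.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```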

Two practical patterns. Combine types per page by using an array or a @graph so one block describes the Article, its FAQPage, and the Breadcrumb together. And generate it from the same data that renders the page, not by hand — hand-maintained schema drifts out of sync with content, and drift is what breaks it.
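Here's what the @graph pattern can look like when the block is built from the page's own data. A minimal sketch, assuming a hypothetical page dict holding the title, URL, author, dates, FAQ pairs, and breadcrumb trail; the "#article", "#faq", and "#breadcrumb" @id fragments are an illustrative convention, not a requirement.

```python
def page_graph(page: dict) -> dict:
    """Hypothetical helper. page keys: title, url, author, published, modified, faqs, trail."""
    url = page["url"]
    nodes = [
        {
            "@type": "Article",
            "@id": url + "#article",
            "headline": page["title"],  # the same title the template renders as <h1>
            "author": {"@type": "Person", "name": page["author"]},
            "datePublished": page["published"],
            "dateModified": page["modified"],
            "mainEntityOfPage": url,
        },
    ]
    if page.get("faqs"):  # only pages that actually show a Q&A section
        nodes.append({
            "@type": "FAQPage",
            "@id": url + "#faq",
            "mainEntity": [
                {"@type": "Question", "name": q,
                 "acceptedAnswer": {"@type": "Answer", "text": a}}
                for q, a in page["faqs"]
            ],
        })
    nodes.append({
        "@type": "BreadcrumbList",
        "@id": url + "#breadcrumb",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": n, "item": item_url}
            for i, (n, item_url) in enumerate(page["trail"], start=1)
        ],
    })
    # One @context at the top applies to every node in the graph.
    return {"@context": "https://schema.org", "@graph": nodes}
```

Pages without a Q&A section skip the FAQPage node instead of shipping an empty one.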

4. Common mistakes that break it

I've audited a lot of schema. The failures cluster into a short list.

  • Marking up invisible content. FAQ or HowTo schema for text that isn't on the rendered page. Engines cross-check, and this reads as manipulation. Only mark up what a user can see.
  • Inflated or fake ratings. Stuffing aggregateRating with numbers you don't have is the classic abuse. If you don't have real, verifiable reviews backing it, leave it out — a caught fabrication poisons trust in all your other markup.
  • Inflated dateModified. Auto-bumping the modified date on pages you didn't actually touch. Engines lean on this for freshness; if they catch the lie, the signal inverts.
  • Mismatched headline. Schema headline that doesn't match the visible H1 creates exactly the ambiguity schema is supposed to remove.
  • Invalid JSON. A single trailing comma or unescaped quote and the whole block is silently ignored. Always run it through a validator before shipping.
  • Orphan entities. Author and publisher with no consistent identity across the site. Your Organization and Person markup should be stable everywhere they appear so the engine resolves them to one entity.
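Several of these failures can be caught mechanically before shipping. Here's a rough pre-publish check using only Python's standard library that flags invalid JSON, a headline that doesn't match the H1, and FAQ questions missing from the page text. It's a sketch under loose assumptions: "visible" here only means outside script and style tags, so it won't catch text hidden with CSS, and it doesn't replace a real schema validator. PageScanner and lint_page are hypothetical names.

```python
import json
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects JSON-LD script bodies, the first <h1>, and text outside script/style."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.h1 = None
        self.text_parts = []
        self._in_jsonld = False
        self._in_h1 = False
        self._skip_depth = 0  # inside a non-JSON-LD <script> or a <style>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld, self._buf = True, []
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "h1" and self.h1 is None:
            self._in_h1, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buf))
            self._in_jsonld = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "h1" and self._in_h1:
            self.h1 = " ".join("".join(self._buf).split())  # whitespace-normalized
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_jsonld or self._in_h1:
            self._buf.append(data)
        if not self._in_jsonld and not self._skip_depth:
            self.text_parts.append(data)


def lint_page(page_html):
    """Return a list of human-readable problems with the page's JSON-LD."""
    scanner = PageScanner()
    scanner.feed(page_html)
    scanner.close()
    visible = " ".join(" ".join(scanner.text_parts).split())
    problems = []
    for raw in scanner.jsonld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            # e.g. a trailing comma: engines would silently ignore this block
            problems.append(f"invalid JSON: {err.msg} (line {err.lineno})")
            continue
        # A block may be one node, a top-level array, or an object with @graph.
        nodes = data if isinstance(data, list) else data.get("@graph", [data])
        for node in nodes:
            kind = node.get("@type")
            if kind == "Article" and node.get("headline") != scanner.h1:
                problems.append(
                    f"headline {node.get('headline')!r} does not match H1 {scanner.h1!r}")
            if kind == "FAQPage":
                for question in node.get("mainEntity", []):
                    name = " ".join(question.get("name", "").split())
                    if name and name not in visible:
                        problems.append(f"FAQ question not visible on page: {name!r}")
    return problems
```

Running it over every template in CI is the "validate once, then trust the generator" step in practice.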

5. The order of operations

If you're starting from zero, do this in sequence. Add Organization sitewide first — that's your identity anchor. Then Article + Breadcrumb on every content page as a template. Then FAQPage on anything with a Q&A section, which is where you'll see the most AEO lift. Layer in HowTo on procedural pages and Product on commerce pages as needed. Validate every template once, then trust the generator.

Schema is the connective tissue between good content and confident extraction. It won't rescue thin material — but for content that already answers the question well, it's the difference between an engine guessing about you and an engine quoting you. For how this fits the larger picture, the answer engine optimization playbook puts schema in its place alongside structure, authority, and crawlability.

FAQ

Does schema markup directly make AI cite me? Not directly. Schema doesn't force a citation. It removes ambiguity so the engine can confidently identify what your content is, who published it, and which facts are answers. Clear extraction makes citation more likely, but the content still has to be good.

What schema types matter most for AI search? Article, FAQPage, and HowTo do the heavy lifting because they map cleanly to question-and-answer extraction. Organization, Breadcrumb, and Product add identity, context, and commerce facts. Start with the first three.

Should I use JSON-LD or microdata? JSON-LD. It's Google's recommended format, it lives in one clean block in the head or body, and it's far easier to generate, validate, and maintain than inline microdata scattered through your HTML.

Will fake or mismatched schema hurt me? Yes. Marking up content that isn't visible on the page, or inflating ratings and facts, is a real risk. Engines cross-check schema against rendered content, and mismatches erode trust. Only mark up what a user can actually see.

Where do I put the JSON-LD on the page? In a script tag with type "application/ld+json", either in the head or anywhere in the body. Placement doesn't affect how engines read it, so put it wherever your stack makes it easiest to generate per-page.
