Knowledge base
Everything we know about getting cited by AI engines.
85 reference articles. Methodology, criteria, technical setup, engine-specific tactics. Updated as AI engines evolve.
Foundations
- AEO Site Rank: How We Calculate Your 0-100 Rating Your AEO Site Rank is not a guess. It is a deterministic score built from 48 criteria grouped into five fixed-weight pillars. Each criterion is scored 0-10, converted into an effective weight after confidence and overlap controls, and rolled into a base site score. AEORank then applies the topic-coherence gate and blends that result with a page-fleet score so thin templates cannot hide behind strong sitewide infrastructure.
- AEO Site Rank: How Benchmarks and Peer Comparison Work Your AEO Site Rank tells you how ready your site is. Your AEO rank tells you where you stand. We rank every audited domain globally and within its sector, calculate category averages from best-per-domain scores, and flag sites as Above Average, Average, or Below Average using a +/-5 point threshold. This page explains exactly how rankings are calculated, what sector averages actually measure, and why your rank might matter more than your score.
- AEO Page Rank: How Individual Pages Are Scored for Citation Readiness Your AEO Site Rank tells you whether your website is built for AI visibility. Your AEO Page Rank tells you whether a specific page is ready to be cited. They answer different questions, use different criteria, and produce different numbers. AEO Site Rank evaluates 48 criteria across your entire domain - infrastructure, discovery, content patterns. AEO Page Rank evaluates 17 checks on a single page - content originality, content uniqueness, extractability, entity/data richness, and structural signals. This page explains exactly how AEO Page Rank works, how it differs from AEO Site Rank, and how to use both scores together in the Studio dashboard.
- Product & Offer Schema Someone asks ChatGPT "What are the best live chat tools under $50/month?" Your product has no Product schema. You're not in the comparison. Period.
- Speakable Schema Markup Voice assistants need to pick which paragraph to read aloud from your page. Without Speakable markup, they guess - and they frequently guess wrong, reading your cookie notice.
- Hreflang & Multilanguage Support Your French page exists. But does Perplexity know it's French? Broken hreflang means AI engines cite the wrong language version - or miss your localization entirely.
- ai.txt & TDM Policy robots.txt controls crawling. llms.txt describes your content. But neither answers the question AI companies actually care about: "Are we allowed to use this?"
- Brand Mention Monitoring Your competitors get cited on 40 high-authority pages. You appear on 6. The gap isn't content quality - it's off-page presence. Here's how we find and close it.
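Several of the foundations above come down to a few lines of head markup. Here is a minimal sketch of one page's `<head>` combining hreflang alternates, Product schema, and Speakable markup - all URLs, selectors, and values are illustrative, not a prescription:

```html
<head>
  <!-- Hreflang: tell AI engines which language version to cite -->
  <link rel="alternate" hreflang="en" href="https://example.com/pricing" />
  <link rel="alternate" hreflang="fr" href="https://example.com/fr/pricing" />
  <link rel="alternate" hreflang="x-default" href="https://example.com/pricing" />

  <!-- Product schema: without it, you are absent from "best X under $Y" comparisons -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Live Chat",
    "description": "Live chat widget for small teams.",
    "offers": { "@type": "Offer", "price": "29.00", "priceCurrency": "USD" }
  }
  </script>

  <!-- Speakable: point voice assistants at the answer paragraph, not the cookie notice -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://example.com/pricing",
    "speakable": {
      "@type": "SpeakableSpecification",
      "cssSelector": [".answer-summary"]
    }
  }
  </script>
</head>
```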
Criteria
- llms.txt - The File That Separates 63 from 34 Tidio has a 251-line llms.txt. Crisp has zero. The score gap: +29 points. This single file tells AI assistants exactly what your site does - and without it, they're guessing.
- Schema.org JSON-LD: The Scoreboard AI Actually Reads Tidio runs 4 JSON-LD schema types. Crisp runs zero. That's not a coincidence - it's the difference between a 63 and a 34. Structured data is the machine-readable layer AI trusts most.
- Q&A Content Format: Write Like AI Reads AI assistants are question-answering machines. When your content is already shaped as questions and answers, you're handing AI a pre-formatted citation. Sites that do this right get extracted - sites that don't get skipped.
- Clean HTML: If Crawlers Can't See It, It Doesn't Exist Most AI crawlers don't run JavaScript. If your content loads after page render - behind accordions, SPAs, or API calls - you're invisible. We've seen entire FAQ sections vanish from AI's perspective because of one accordion widget.
- Entity Authority: Why AI Cites Some Sites and Ignores Others AI systems don't cite websites - they cite entities: a verifiable business with an address, named authors, and social proof. Our self-audit (88/100) still loses points here because we lack a physical address. That's how strict this criterion is.
- robots.txt for AI: Rolling Out the Red Carpet (or Slamming the Door) Most sites run default platform robots.txt with zero AI-specific rules. That's not a strategy - it's an accident. Explicit Allow rules for GPTBot, ClaudeBot, and PerplexityBot signal that your content is open for citation.
- FAQ Sections: 87 Questions Turned Our Score Into 88 Our site runs 87 FAQ items across 9 categories with FAQPage schema on every one. That's not excessive - it's how we hit 88/100. Each Q&A pair is a citation opportunity AI can extract in seconds.
- Original Data: The Content AI Can't Find Anywhere Else AI has a trust hierarchy for sources. At the top: proprietary data and first-hand expert analysis. At the bottom: rewritten Wikipedia articles. We've watched AI preferentially cite sites with original benchmarks - even over bigger competitors.
- Internal Linking: The Web AI Uses to Map Your Expertise AI crawlers follow internal links to discover and contextualize content. A page with zero inbound links is a page AI will never find. Hub pages linking to 20+ related articles signal topical authority that search and AI engines alike reward.
- Semantic HTML5: The Difference Between a Page and a Page AI Can Parse A page built with `<div>` everywhere looks the same to AI as a page with no structure at all. Semantic elements - `<main>`, `<article>`, `<section>`, `<time>` - are the markup that tells AI where your content starts, what it means, and how it's organized.
- Schema Coverage Ratio Your homepage has perfect JSON-LD. Your other 200 pages? Zero. Here's how we measure the gap - and why AI engines judge your whole domain by it.
- Content Freshness Signals No date on your page? AI engines treat it like a rumor - undated and deprioritized. Here's how we audit whether your timestamps are actually machine-readable.
- RSS Feed Presence & Quality Sitemaps tell crawlers what exists. RSS feeds tell them what changed. If you don't have one, your new content waits days - or weeks - to be discovered.
- Sitemap Completeness Your sitemap says 500 pages exist. Our crawl finds 700. Those 200 missing URLs? AI crawlers will never know they exist.
- Canonical URL Strategy Same content, three URLs, zero canonical tags. Congratulations - you just split your authority three ways and gave AI crawlers a headache.
- Content Licensing Signals You want AI engines to cite your content. But have you actually told them they're allowed to? Most sites haven't - and AI systems default to conservative behavior.
- Fact Density Measurement AI engines are citation machines - they need specific facts to quote. A page full of general advice with zero data points gives them nothing to work with.
- Definition Pattern Detection "What is AEO?" - 14% of all AI queries start with "What is." If your content doesn't answer with a clean definition sentence, someone else's will.
- Table & List Extractability Your comparison table looks great in the browser. But it's built with divs and CSS Grid, so ChatGPT sees a blob of text. Here's what that costs you.
- Content Publishing Velocity You published a great blog post in January. It's now February and nothing else has appeared. AI engines notice -and they're crawling less often because of it.
- Direct Answer Paragraphs: The First Sentence AI Steals AI engines don't read your article top to bottom. They scan headings, grab the first 1-2 sentences underneath, and move on. If those sentences are throat-clearing preamble instead of a direct answer - you just lost a citation to someone who leads with the point.
- Author & Expert Schema: The Byline AI Actually Checks AI doesn't take your word for it. When you publish content as "Admin" or "Staff Writer," you're telling AI engines that nobody at your company was willing to put their name on it. Person schema, author bios, and credential markup are how you prove a real expert wrote your content.
- Topic Coherence: The 14% Criterion That Gates Your Entire Score Your blog covers AI, recipes, fitness, and office furniture. AI engines see a site that doesn't know what it is - and they're right. Topic Coherence is the single heaviest criterion in the AEO scoring framework, and it can cap your entire score when it's low.
- Content Depth: Why Thin Pages Are Invisible to AI A 300-word page cannot compete with a 3,000-word deep dive for an AI citation. Content Depth measures the substance on your pages - word counts, heading structure, and the ratio of deep pages to thin ones across your site.
- Query-Answer Alignment: Does Your Content Match What People Actually Ask? You wrote a 3,000-word guide on "customer support automation." But people ask AI "how do I reduce support ticket volume?" If your content doesn't match the actual queries, AI can't connect the dots.
- Content Cannibalization: When Your Own Pages Compete Against Each Other You have three pages about "live chat pricing." AI doesn't know which one to cite - so it cites none. Content cannibalization is when your own content competes with itself, splitting the authority signal and confusing AI engines.
- Visible Date Signal: The Freshness Cue AI Checks Before Citing You AI engines are biased toward recency. An article with no visible date is an article AI can't trust to be current. Visible Date Signal measures whether your pages show clear publication and update dates that both humans and machines can read.
- Citation-Ready Writing: The Sentences AI Engines Steal First LiveChat packs citation-ready sentences into 38% of its paragraphs. HelpSquad hits 12%. The AI visibility gap between them is not a coincidence - it is a direct consequence of how many sentences AI can lift and quote without rewriting.
- Answer-First Placement: The 300-Word Window That Decides Everything AI engines give your page about 300 words to prove it has the answer. Sites that open with "In today's rapidly evolving landscape..." waste that window. Sites that open with the answer win the citation. We track this across every audit - and the gap is brutal.
- Evidence Packaging: Why AI Trusts Some Claims and Ignores Others A statistic without a source is a liability. A claim without attribution is invisible. We track evidence packaging across every audit and the pattern is clear - sites that cite their sources get cited by AI. Sites that state facts without attribution get skipped.
- Internationalization Signals: Why AI Engines Cite the Wrong Language Version ISPsystem has 452 pages in 3 languages. Without hreflang, AI engines guess which version to cite. A Spanish user asks about VMmanager and gets an English answer - or worse, Russian. One tag set fixes it.
- How To Structure AEO Content That AI Engines Actually Cite Most web content never gets cited by AI engines - not because the information is wrong, but because the structure is invisible to them. ChatGPT, Perplexity, Gemini, and Google AI Overviews select sources based on how efficiently they can extract answers from a page's HTML. This guide reveals the 6-part article architecture behind the highest-scoring pages in the AEO Content AI Studio database of 11,000+ audited domains - and shows you exactly how to replicate it in your own content.
- Entity Disambiguation: Stop Making AI Guess What You Mean You say "Mercury" on your homepage. The planet? The element? The banking startup? AI doesn't know - and when it can't tell, it doesn't cite. Entity disambiguation is how you draw clear lines around what your business is and isn't.
- Extraction Friction: The Invisible Wall Between Your Content and AI Your content is brilliant. AI can't read it. Sentences averaging 35 words, jargon-packed leads, and hidden content behind toggles create friction that makes AI skip you for a simpler source. Extraction Friction measures how hard AI has to work to pull answers from your pages.
- Image Context for AI: Your Pictures Are Worth Zero Words Without Markup AI can't see your images. It reads the markup around them. A product photo with alt="image1.jpg" tells AI nothing. A figure with a descriptive figcaption and 8-word alt text tells AI exactly what it's looking at - and that context feeds directly into citation decisions.
- Response Efficiency: Why Bloated Pages Make AI Skip You When your HTML response exceeds 250KB, AI crawlers spend more time downloading than reading. Response Efficiency measures the payload size of your pages - the raw bytes AI has to fetch before it can even begin parsing your content. Lighter pages get crawled more often and processed faster.
- Critical Path Efficiency: Stop Blocking AI with Your Own Scripts Every blocking script and stylesheet in your HTML head forces AI crawlers to wait before they can access your content. Critical Path Efficiency measures how many render-blocking resources sit between the crawler and your page content. Zero blocking scripts and minimal blocking stylesheets earns a perfect score.
- Document Weight: How Heavy Is Your Page for AI to Lift? A 100KB HTML page with 600 DOM nodes loads fast for AI crawlers. A 500KB page with 2,000 DOM nodes and 100KB of inline CSS/JS is a burden. Document Weight measures the overall heft of your HTML document - total size, DOM complexity, inline code, and embedded payloads - to determine how efficiently AI can process it.
- Helpful Purpose Alignment: Does Your Page Actually Help or Just Sell? AI engines are trained to surface content that helps users accomplish something. Pages that open with "Welcome to our innovative platform" get skipped. Pages that open with a concrete answer get cited. Helpful Purpose Alignment measures whether your content delivers practical value or just occupies space.
- First-Hand Experience Signals: Prove You Actually Did the Thing AI engines increasingly prioritize content from people who have direct experience with their topic. Generic "top 10 best" listicles written from secondhand research score poorly. Pages with specific actions taken, limitations discovered, and numeric results earned score well. First-Hand Experience Signals measure whether your content shows evidence of real-world involvement.
- Creator Transparency: Show Who Wrote This and Why They Are Qualified Anonymous content is untrusted content. AI engines check for author bylines, bio links, reviewer credits, and Person schema to decide whether a page comes from a credible creator. Creator Transparency measures how clearly your content identifies its authors and their qualifications.
- Methodology Transparency: Show Your Work So AI Trusts Your Claims AI engines distrust content that makes claims without explaining how those claims were derived. "Our product is rated #1" means nothing without methodology. "We tested 12 platforms over 6 weeks using 4 criteria" means everything. Methodology Transparency measures whether your content shows the process behind its conclusions.
- Duplicate Content Blocks: When You Copy Yourself, AI Stops Trusting You Repeating the same paragraph in multiple sections of the same page is a quality signal that AI engines detect and penalize. Duplicate Content Blocks measures within-page content repetition. When more than 5% of your content is duplicated across sections, AI engines reduce citation confidence because the page appears auto-generated or padded.
- Cross-Page Duplication: When Different Pages Say the Same Thing Different URLs with substantially similar content confuse AI engines about which page to cite. Cross-Page Duplication measures content repetition across your site by comparing paragraphs from different pages. When AI encounters the same claims on multiple pages, it either picks one arbitrarily or skips your site entirely.
- Content Cluster Strategy: How Pillar + Child Articles Build Topic Authority AI Trusts A single great article is a spark. A content cluster is a bonfire. AI engines evaluate topic authority across entire sites - not page by page. Understoodcare.com jumped from 37 to 82 partly by building interlinked content clusters with proprietary data in every article.
- Reddit as an AI Discovery Channel: Why Threads Shape AI Answers About Your Brand Perplexity cites Reddit threads directly. ChatGPT pulls Reddit discussions via web search. When someone asks AI about your industry and a Reddit thread mentions your competitor but not you - that absence is the answer. Reddit has become the live opinion layer AI engines trust most.
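Many of the criteria above show up together on a single well-built page. A minimal sketch of what that looks like in practice - semantic sectioning, a machine-readable date, a definition-style direct answer, an extractable table, described imagery, and FAQPage schema. Every name, value, and file path here is illustrative:

```html
<main>
  <article>
    <h1>What is AEO?</h1>
    <!-- Visible date backed by a machine-readable datetime attribute -->
    <p>Updated <time datetime="2025-06-01">June 1, 2025</time></p>

    <!-- Direct answer paragraph: a clean definition sentence AI can lift verbatim -->
    <p>AEO (Answer Engine Optimization) is the practice of structuring content
       so AI engines can extract and cite it.</p>

    <!-- Extractable comparison table: real <thead>/<tbody>, not divs and CSS Grid -->
    <section>
      <h2>AEO vs SEO</h2>
      <table>
        <thead><tr><th>Aspect</th><th>SEO</th><th>AEO</th></tr></thead>
        <tbody><tr><td>Goal</td><td>Ranking</td><td>Citation</td></tr></tbody>
      </table>
    </section>

    <!-- Image context: descriptive alt text plus a figcaption AI can read -->
    <figure>
      <img src="aeo-pillars.png" alt="Bar chart of the five AEO scoring pillars" />
      <figcaption>The five fixed-weight pillars behind the AEO Site Rank.</figcaption>
    </figure>
  </article>
</main>

<!-- FAQPage schema: each Q&A pair is a pre-formatted citation -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is AEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "AEO is the practice of structuring content so AI engines can extract and cite it."
    }
  }]
}
</script>
```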
Intelligence
- Content Depth Score: What AI Engines Actually Want to Read AI-evaluated measurement of whether your content goes deep enough for engines to treat you as a primary source - not just a page that technically exists.
- Citation-Ready Content Patterns: Writing Sentences AI Can Actually Use AI analysis of whether your content contains the specific sentence structures and fact patterns that make it extractable as an AI citation - the difference between being found and being quoted.
- Topic Authority Clustering: Why One Good Page Isn't Enough AI-powered analysis of whether your content forms coherent topic clusters that establish you as the definitive source - because AI engines don't evaluate authority one page at a time.
- Content Uniqueness Analysis: Do You Have Anything AI Doesn't Already Know? AI evaluation of how much of your content provides genuinely novel information versus restating what's already in every AI model's training data.
- Author & Person Schema Depth: From Name String to Verified Expert AI evaluation of whether your author markup gives engines enough detail to verify credentials, establish expertise chains, and connect authors to their body of work - not just confirm a name exists.
- Wikidata & Knowledge Graph Presence: The External Trust Loop Whether your business entity exists in the public knowledge databases AI engines consult to cross-reference your identity - the verification step most businesses skip entirely.
- Social Profile Verification: When sameAs Links Backfire AI-powered verification that your claimed social profiles exist, are active, and contain consistent information - because dead links in your schema actively damage trust.
- AI Hallucination Audit: What AI Engines Are Making Up About You Testing what ChatGPT, Claude, and Perplexity say about your business when asked directly - and cataloging every fabricated fact, outdated claim, and competitor mix-up.
- Live Citation Test: Does AI Actually Mention You? Running real queries against ChatGPT, Claude, and Perplexity to test whether they cite your content - the only metric that directly answers "Am I visible to AI?"
- Cross-Engine Consistency Score: The 10-Point Gap That Changes Everything Measuring how consistently your domain performs across ChatGPT, Claude, and other AI engines - and why a gap above 10 points signals your biggest optimization opportunity.
- Original Data Pipeline: How AI Collects Proprietary Evidence for Every Article The automated system that scrapes 15 live source types, synthesizes structured artifacts, and injects source-grounded claims into article blocks - so every piece you publish contains data AI engines cannot find on a competitor's site.
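The Wikidata and social-profile checks above both hang off one piece of markup: the Organization schema's sameAs array. A minimal sketch - the Wikidata Q-id and profile URLs are placeholders, and every link listed must actually exist and be active, since dead sameAs links damage trust rather than build it:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Inc.",
  "url": "https://example.com",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q0000000",
    "https://www.linkedin.com/company/example",
    "https://github.com/example"
  ]
}
</script>
```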
Engine deep dives
- ChatGPT Conversational Query Matching - Why Keywords Don't Cut It ChatGPT matches user questions through conversational NLP, not keyword density. We've seen sites with perfect SEO get ignored because they sound like brochures - while competitors writing like humans get cited. Here's what ChatGPT actually sees in your content.
- ChatGPT Direct Answer Paragraphs - The Unit of Citation ChatGPT doesn't cite pages. It extracts paragraphs. We've found that pages with 5+ self-contained answer paragraphs get cited at 3x the rate of pages without them. Here's the exact formula ChatGPT's extraction model looks for.
- ChatGPT & Bing Indexation - The Gate You Didn't Know Existed ChatGPT retrieves web content through Bing. Full stop. If Bing hasn't indexed your pages, ChatGPT can't find them - no matter how good your content is. We've seen sites with perfect AEO Site Ranks get zero ChatGPT citations because they never submitted a sitemap to Bing.
- ChatGPT Content Retrievability - Indexed Doesn't Mean Found Being in Bing's index is step one. Actually getting retrieved by ChatGPT is step two - and the gap between them is massive. We've found sites with 90%+ Bing indexation but under 30% retrieval rates. Here's what's going wrong.
- ChatGPT Q&A Distribution - How Many Questions Are You Missing? We mapped the question spaces for six live chat competitors. Tidio (63) covers an estimated 80% of questions ChatGPT users ask. HelpSquad (47) has massive gaps in comparisons, pricing, and technical setup. Every unanswered question is a citation you're handing to competitors.
- ChatGPT Recency Bias - Stale Content Disappears ChatGPT inherits Bing's freshness bias. Content updated within 90 days gets a measurable retrieval boost. We've watched pages with perfect conversational tone drop out of ChatGPT's results because they went stale. Tidio (63) publishes and updates constantly. Crisp (34) had content over a year old gathering dust.
- ChatGPT Comparison Table Extraction - Your Secret Weapon for "vs" Queries ChatGPT actively hunts for HTML tables when users ask comparison questions - and restructures them into its answers. We've found that proper `<thead>`/`<tbody>` markup is the difference between extraction and invisibility. Div-based "tables" are invisible to ChatGPT's extraction model.
- How Claude Scores Your llms.txt (It's Not Pass/Fail) Claude doesn't just check if llms.txt exists - it grades it on a 4-level rubric. Tidio's 251-line llms.txt helped earn a +14 Claude bonus. Crisp's zero governance signals? That's a 34.
- The ClaudeBot Directive: Two Lines That Change Your Score Claude applies a trust penalty when ClaudeBot isn't in your robots.txt - even if you allow every other AI crawler. Sites with explicit ClaudeBot directives scored 8 points higher on Claude across our 20-site cohort.
- Claude's Compound Trust Multiplier for JSON-LD One schema type is baseline. Two gives a nudge. Three crosses Claude's trust threshold. Four-plus? Compound scoring kicks in. This is why Tidio (+14) and LiveChat (+12) outperform HelpSquad (-5) on Claude by margins ChatGPT can't explain.
- Entity Disambiguation: Why Claude Skips You for Competitors Six live chat companies. Similar names. Overlapping descriptions. Claude has to figure out who's who - and if it can't, it won't cite any of them. HelpSquad's weak entity signals cost it 5 points. LiveChat's 15+ Organization properties earned +12.
- Content Licensing: The Permission Signal Claude Checks Before Citing You Claude's more conservative about citation than any other engine. Without explicit licensing signals - ai.txt, CreativeCommons metadata, TDM headers - Claude dials back how freely it quotes you. This is the most Claude-specific lever in AEO.
- Semantic HTML: The Signal Claude Reads That ChatGPT Ignores ChatGPT strips HTML and reads the text. Claude reads the structure. Proper heading hierarchy, ARIA landmarks, semantic sectioning, content-to-boilerplate ratio - these directly influence Claude's quality assessment. LiveChat's 60% content ratio earned +12. HelpSquad's 25% ratio contributed to -5.
- Fact Blocks: The Content Pattern Claude Cites First Claude doesn't cite prose - it cites facts. Named statistics with sources, definition-then-evidence sequences, comparison data blocks. LiveChat averaged 4-6 extractable fact blocks per page and earned +12. HelpSquad averaged 0.5 and got -5.
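The robots.txt directives discussed above - including the two-line ClaudeBot entry Claude checks for - look like this in practice. A minimal sketch: paths and the sitemap URL are illustrative, and blanket Allow rules should be adjusted to whatever your content policy actually is:

```
# Explicit AI-crawler directives (illustrative - tailor Allow/Disallow to your policy)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```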
Startup
- AEO for Startups: Why AI Visibility Matters Before Product-Market Fit Most YC startups launch with zero AI visibility. No llms.txt. No schema. No FAQ. The average score across 2,500+ audited startups is 38. Meanwhile the companies AI actually cites? They did four things in their first week.
- The 5 Quick Wins That Move a Startup from 30 to 60 We have audited 2,500+ startups. The ones stuck at 30 are missing the same five things. The ones at 60 did them all in a single sprint. No budget. No agency. Just five changes any developer can ship in a day.
- What 2,500 YC Startup Audits Reveal About AI Readiness We audited every recent Y Combinator batch - W22 through W26. The data tells a story nobody in the ecosystem is talking about: the vast majority of funded startups are invisible to AI. Here is what the numbers say.
- AEO vs SEO for Startups: What Early-Stage Founders Get Wrong SEO takes 6-12 months. AEO takes an afternoon. Most founders skip both - but if you had to pick one to ship on launch day, AEO gives you faster returns with less ongoing effort. Here is why.
- Your Series A Pitch Deck Needs an AEO Site Rank VCs are using ChatGPT to research markets and evaluate startups. When an investor asks "What are the best [category] startups?" - you are either in the answer or you are not. AI visibility is becoming a due diligence signal.
- How Top YC Startups Get Discovered by AI Engines Stripe scores 82. Notion scores 78. Airbnb scores 71. These companies did not get there by accident. We reverse-engineered what the highest-scoring YC alumni do for AI visibility - and most of it is surprisingly simple.
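One of the quick wins above - shipping an llms.txt - takes minutes. A minimal sketch following the llms.txt convention (an H1 title, a blockquote summary, then sections of annotated links); the company name, URLs, and descriptions are all placeholders:

```
# Example Inc.

> Live chat software for small teams. Pricing, setup guides, and API docs below.

## Docs

- [Pricing](https://example.com/pricing): plans and per-seat costs
- [Setup guide](https://example.com/docs/setup): install the widget in minutes

## FAQ

- [Common questions](https://example.com/faq): answers on billing, integrations, and data handling
```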