AEO Audit Checklist: 16 Things Blocking Your AI Visibility

Most websites are invisible to AI engines. This 16-point audit checklist covers everything from robots.txt to schema markup — so you can fix what's blocking your brand from being cited.

Your Website Is Probably Invisible to AI

Here is a number that should concern every marketing team: Gartner predicts that traditional search engine volume will drop 25% by 2026 as users shift to AI-powered alternatives (Gartner, 2024). That shift is no longer a forecast. It is happening now.

ChatGPT surpassed 200 million weekly active users in late 2024 (OpenAI, 2024). Perplexity processes tens of millions of queries monthly. Google AI Overviews now dominate the top of search results pages, reducing clicks to organic listings. Meanwhile, research from SparkToro found that roughly 60% of Google searches already end without a click to any external site (SparkToro, 2024).

The brands that thrive in this environment are the ones AI engines can find, understand, and trust enough to cite. The brands that struggle are the ones with invisible technical barriers, missing structured data, poorly formatted content, and weak off-site authority signals.

This is where answer engine optimization (AEO) comes in. AEO is the practice of ensuring your brand is structured, accessible, and authoritative enough that AI-powered engines cite you in their generated responses. If you're new to this space, our complete guide to generative engine optimization covers the foundations.

This checklist gives you 16 specific things to audit. Each one is something we see blocking AI visibility for real brands. Fix them, and you remove the barriers between your content and the AI engines your buyers are already using.

How to Use This Checklist

Work through each of the 16 items below. For every item, mark it as Pass or Fail. At the end, tally your score out of 16.

Score | Rating | What It Means
14-16 | Excellent | Your site is well-positioned for AI visibility. Focus on content velocity and authority building.
10-13 | Good | Solid foundation with gaps. Prioritize the failed items — each one is costing you citations.
6-9 | Needs Work | Significant barriers exist. Start with Technical Foundation and Schema fixes before investing in content.
0-5 | Critical | Your brand is likely invisible to AI engines. Treat this as an urgent project.

Category 1: Technical Foundation

AI crawlers — including GPTBot, Google-Extended, Anthropic's ClaudeBot, PerplexityBot, and others — need to access your content before they can cite it. These four items ensure the technical infrastructure is not blocking them.

1. Robots.txt AI Crawler Access

What to check: Open your robots.txt file (yourdomain.com/robots.txt) and search for directives targeting AI-specific crawlers. Look for User-agent: GPTBot, User-agent: Google-Extended, User-agent: ClaudeBot, User-agent: PerplexityBot, User-agent: Bytespider, and User-agent: CCBot. Check whether any of these are followed by Disallow: /.

Why it matters for AI: If your robots.txt blocks AI crawlers, those engines cannot index your content. They will never cite you because they have never read you. A 2024 analysis by Originality.ai found that over 35% of the top 1,000 websites had blocked GPTBot in their robots.txt (Originality.ai, 2024). Many of those sites did this reflexively during the early AI panic without understanding the downstream cost to visibility.

How to fix it: Review every User-agent directive in your robots.txt. Remove blanket Disallow: / rules for AI crawlers you want to be discovered by. At minimum, allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. If you have sensitive sections of your site (admin panels, staging areas), block those specifically rather than blocking the entire domain.

# Example: Allow AI crawlers with specific restrictions
User-agent: GPTBot
Disallow: /admin/
Disallow: /staging/

User-agent: ClaudeBot
Disallow: /admin/

User-agent: PerplexityBot
Allow: /

Score: Pass / Fail


2. XML Sitemap Completeness and Accessibility

What to check: Verify that your XML sitemap exists at a standard location (/sitemap.xml), that it is referenced in your robots.txt file, and that it includes every page you want AI engines to discover. Check that the sitemap returns a 200 status code and contains valid XML. Ensure lastmod dates are accurate and updated when content changes.

Why it matters for AI: AI crawlers use sitemaps to discover content efficiently. An incomplete or missing sitemap means AI engines may never find your most important pages. Unlike traditional search crawlers that follow links aggressively, some AI crawlers rely more heavily on sitemaps for content discovery. Inaccurate lastmod dates can also cause crawlers to skip content they assume has not changed.

How to fix it: Generate a comprehensive sitemap that includes all public-facing pages, blog posts, product pages, and landing pages. Add <lastmod> tags with real dates. Reference the sitemap in your robots.txt with Sitemap: https://yourdomain.com/sitemap.xml. If your site has more than 50,000 URLs, use a sitemap index file. Validate the sitemap using a tool like XML-Sitemaps.com or Google Search Console.
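A minimal sitemap entry follows this shape (the URLs and dates below are placeholders — substitute your own pages and real modification dates):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/blog/aeo-audit-checklist</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
  <url>
    <loc>https://yourdomain.com/pricing</loc>
    <lastmod>2024-12-01</lastmod>
  </url>
</urlset>
```

Only update lastmod when the page content actually changes — inflating it to "today" on every deploy teaches crawlers to ignore it.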

Score: Pass / Fail


3. Page Speed and Core Web Vitals

What to check: Run your key pages through Google PageSpeed Insights or Lighthouse. Check Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS). Target LCP under 2.5 seconds, INP under 200 milliseconds, and CLS under 0.1.

Why it matters for AI: Page speed affects AI visibility in two ways. First, AI crawlers have crawl budgets and timeout thresholds — slow pages may be skipped or only partially rendered. Second, Google's AI Overviews prioritize sources that already rank well in traditional search, and Core Web Vitals are a confirmed ranking factor. A study from Portent found that pages loading in under 1 second had conversion rates 3x higher than pages loading in 5 seconds (Portent, 2022), reflecting broader patterns in how speed affects user and crawler behavior.

How to fix it: Compress images and serve them in WebP or AVIF format. Implement lazy loading for below-the-fold content. Minimize render-blocking JavaScript and CSS. Use a CDN. Enable server-side caching. For JavaScript-heavy sites, ensure server-side rendering (SSR) or static site generation (SSG) so that crawlers see fully rendered HTML.
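For the image fixes, one common pattern is a <picture> element that serves modern formats with a fallback and lazy-loads below-the-fold images (file paths here are illustrative):

```html
<!-- Below-the-fold image: modern formats first, JPEG fallback -->
<picture>
  <source srcset="/images/feature-screenshot.avif" type="image/avif">
  <source srcset="/images/feature-screenshot.webp" type="image/webp">
  <!-- width/height reserve layout space and prevent CLS;
       loading="lazy" defers the fetch until the image nears the viewport -->
  <img src="/images/feature-screenshot.jpg"
       alt="Product dashboard screenshot"
       width="1200" height="630"
       loading="lazy">
</picture>
```

One caveat: do not lazy-load your LCP element (usually the hero image) — that delays the very metric you are trying to improve.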

Score: Pass / Fail


4. HTTPS and Security Configuration

What to check: Confirm your entire site is served over HTTPS with a valid SSL/TLS certificate. Check for mixed content warnings. Ensure HTTP requests redirect to HTTPS with 301 redirects. Verify that your certificate is not expired and covers all subdomains.

Why it matters for AI: HTTPS is a baseline trust signal. AI engines weigh source credibility when deciding what to cite, and an insecure site is a negative trust indicator. Google confirmed HTTPS as a ranking signal in 2014, and that signal carries forward into AI Overviews. Beyond rankings, some AI crawlers may deprioritize or skip insecure pages entirely as a data quality measure.

How to fix it: Obtain an SSL certificate (free options are available through Let's Encrypt). Configure your web server to redirect all HTTP traffic to HTTPS. Fix mixed content issues by updating internal references to use HTTPS. Set up HSTS headers to enforce secure connections.
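If you run nginx, the redirect and HSTS header can be configured roughly as follows (certificate paths assume a Let's Encrypt setup and are placeholders):

```nginx
# Redirect all HTTP traffic to HTTPS with a permanent (301) redirect
server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name yourdomain.com www.yourdomain.com;

    ssl_certificate     /etc/letsencrypt/live/yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomain.com/privkey.pem;

    # HSTS: instruct browsers to use HTTPS for this domain for one year
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}
```

Test the redirect and certificate chain before enabling HSTS — once browsers cache the header, any HTTPS misconfiguration locks users out for the full max-age.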

Score: Pass / Fail


Category 2: Schema and Structured Data

Structured data tells AI engines what your content is about in a machine-readable format. Without it, AI models have to infer meaning from unstructured text — and they frequently get it wrong or skip you entirely.

5. JSON-LD Implementation for Core Page Types

What to check: Inspect the <head> section of your key pages for JSON-LD structured data. At minimum, your homepage should include Organization schema, your blog posts should include Article or BlogPosting schema, and your product pages should include Product schema. Use Google's Rich Results Test or Schema.org's validator to confirm valid markup.

Why it matters for AI: JSON-LD structured data provides AI engines with explicit, unambiguous signals about what your content represents. Research from Milestone found that pages with schema markup earned 40% more clicks in search results (Milestone, 2023). For AI engines specifically, structured data helps models build accurate entity representations of your brand, connect your content to relevant queries, and cite you with correct information.

How to fix it: Implement JSON-LD in the <head> of every page. Use Organization on your homepage and about page. Use Article or BlogPosting on every blog post with author, datePublished, dateModified, and publisher properties. Use Product on product pages with name, description, offers, and review properties. Avoid Microdata and RDFa — JSON-LD is the format Google recommends and the easiest to maintain.
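A BlogPosting block with those properties looks like this (author name, dates, and URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "AEO Audit Checklist: 16 Things Blocking Your AI Visibility",
  "author": {
    "@type": "Person",
    "name": "Jane Doe"
  },
  "datePublished": "2025-01-15",
  "dateModified": "2025-02-01",
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "logo": {
      "@type": "ImageObject",
      "url": "https://yourdomain.com/logo.png"
    }
  }
}
```

Embed it inside a <script type="application/ld+json"> tag in the page's <head>, and validate it with the Rich Results Test before shipping.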

Score: Pass / Fail


6. FAQPage Schema on High-Value Pages

What to check: Identify your top 10-20 pages by traffic or strategic importance. Check whether they include FAQPage schema with properly structured Question and acceptedAnswer pairs. Verify that the FAQ content in the schema matches visible content on the page (Google penalizes hidden FAQ schema).

Why it matters for AI: FAQPage schema is one of the highest-leverage structured data types for AI visibility. AI engines are fundamentally question-answering systems. When your content explicitly structures information as questions and answers, you are formatting it in the exact pattern AI models are designed to consume. Pages with FAQPage schema are more likely to surface in Google's AI Overviews and in direct AI engine responses.

How to fix it: Add 3-5 relevant FAQ pairs to your most important pages. Structure them using FAQPage JSON-LD schema. Ensure the questions reflect actual queries your audience searches for — use tools like AlsoAsked or AnswerThePublic to identify real questions. Place the FAQ content visibly on the page, not hidden behind accordions that never render for crawlers.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is answer engine optimization?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Answer engine optimization (AEO) is the practice of optimizing content so that AI-powered search engines cite your brand in their generated responses."
      }
    }
  ]
}

Score: Pass / Fail


7. Organization Schema with Complete Brand Information

What to check: Verify that your homepage includes Organization schema with name, url, logo, description, foundingDate, sameAs (linking to your social profiles, LinkedIn, Crunchbase), and contactPoint. If your brand has a specific industry, include industry or relevant properties.

Why it matters for AI: AI models build internal entity representations of brands. When ChatGPT or Perplexity answers "What is [your brand]?" the response is constructed from whatever data the model has about your entity. Organization schema feeds that entity graph directly. Without it, AI engines may have an incomplete, inaccurate, or nonexistent representation of your brand — which means they cannot recommend you even when you are relevant to a query.

How to fix it: Create a comprehensive Organization JSON-LD block for your homepage. Include every official social profile URL in the sameAs array. Add a clear, factual description that states what your company does, who it serves, and what category it belongs to. Keep this schema updated whenever company details change.
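A sketch of a complete Organization block (every value below is a placeholder — swap in your real name, profiles, and contact details):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://yourdomain.com",
  "logo": "https://yourdomain.com/logo.png",
  "description": "Example Co builds marketing analytics software for mid-market B2B teams.",
  "foundingDate": "2021",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://x.com/example_co",
    "https://www.crunchbase.com/organization/example-co"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@yourdomain.com"
  }
}
```

The sameAs array is what links your site to the rest of your entity footprint, so keep it exhaustive and current.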

Score: Pass / Fail


8. Product or Service Schema with Reviews

What to check: If you sell products or services, verify that each product/service page includes Product or Service schema with name, description, offers (including price and priceCurrency), and aggregateRating or review if available. Check that review data is genuine and matches visible reviews on the page.

Why it matters for AI: When users ask AI engines for product recommendations or comparisons, the models rely on structured product data to generate accurate responses. Brands with complete Product schema that includes pricing, ratings, and review data are significantly more likely to be included in comparative AI responses. According to a BrightLocal survey, 98% of consumers read online reviews for local businesses (BrightLocal, 2024), and AI engines reflect that same preference for reviewed, validated products.

How to fix it: Add Product or Service schema to every relevant page. Include real review data if available. If you collect reviews through a third-party platform (G2, Capterra, Trustpilot), ensure that aggregated ratings are reflected in your schema. Never fabricate review data — AI engines cross-reference structured data against third-party sources, and inconsistencies damage trust.
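A minimal Product block with pricing and aggregated rating data might look like this (all names and numbers are illustrative — the rating values must come from genuine, visible reviews):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Analytics Suite",
  "description": "An analytics platform for mid-market marketing teams.",
  "offers": {
    "@type": "Offer",
    "price": "99.00",
    "priceCurrency": "USD"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "212"
  }
}
```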

Score: Pass / Fail


Category 3: Content Structure

AI engines do not read content the way humans do. They parse, chunk, and evaluate text for direct answerability. The way you structure your content determines whether an AI engine can extract a citable answer from it.

For a deeper dive into building content that AI engines prefer, see our GEO content playbook.

9. Direct Answer Blocks

What to check: Review your top pages and blog posts. Within the first 100-150 words below each H2 heading, do you provide a clear, concise answer to the question implied by that heading? Look for bold definitions, one-to-two sentence summaries, or "In short" statements that a model could extract as a standalone answer.

Why it matters for AI: AI engines construct responses by extracting and synthesizing concise passages from source material. GEO research from Princeton, Georgia Tech, and IIT Delhi found that including direct, quotable statements increased content visibility in generative engine responses by up to 40% (Aggarwal et al., 2023). If your content buries the answer in the fourth paragraph, an AI engine will often cite a competitor who leads with it.

How to fix it: For every H2 section, write a bold, self-contained answer in the first 1-2 sentences. Follow it with supporting detail. Think of it as the inverted pyramid from journalism — lead with the answer, then provide context. This does not mean dumbing down your content. It means frontloading the key takeaway.

Score: Pass / Fail


10. Comparison Tables

What to check: For any page that involves product comparisons, feature breakdowns, pricing tiers, or option evaluations, check whether you include a structured HTML table. Verify that tables use proper <table>, <thead>, <tbody>, <th>, and <td> elements (not CSS grid or flexbox faking a table layout).

Why it matters for AI: AI engines heavily favor tabular data for comparison queries. When a user asks "What is the best [product category]?" or "How does X compare to Y?", models look for structured tables they can parse and reference. Pages with comparison tables are more likely to be cited in these high-intent queries. Google's AI Overviews also frequently pull from HTML tables to construct comparison cards.

How to fix it: Add comparison tables to any page where you discuss multiple options, features, or tiers. Use semantic HTML table elements. Include clear column headers. Keep cell content concise — one or two data points per cell, not paragraphs. If your CMS strips table HTML, switch to a markdown table format or use a plugin that preserves table structure.
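The semantic structure crawlers expect looks like this (the plan names and prices are placeholder data):

```html
<table>
  <thead>
    <tr>
      <th>Plan</th>
      <th>Price</th>
      <th>Best for</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Starter</td>
      <td>$29/mo</td>
      <td>Solo marketers</td>
    </tr>
    <tr>
      <td>Growth</td>
      <td>$99/mo</td>
      <td>Small teams</td>
    </tr>
  </tbody>
</table>
```

A CSS-grid layout that merely looks like this renders identically to humans but gives parsers nothing — the <th> headers are what let a model map each cell to its meaning.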

Score: Pass / Fail


11. FAQ Sections with Natural-Language Questions

What to check: Audit your top 20 pages. Does each one include an FAQ section with 3-5 questions written in natural language (the way a user would actually ask them)? Are the answers concise (40-60 words) and self-contained?

Why it matters for AI: AI queries are conversational. Users ask "How do I fix my robots.txt for AI crawlers?" not "robots.txt AI crawler configuration." Pages with FAQ sections that mirror natural-language queries create more surface area for AI citation. When the model encounters a question-answer pair that closely matches a user's query, it is far more likely to cite that source. This is especially powerful when paired with FAQPage schema (item 6).

How to fix it: Add an FAQ section to every high-value page. Use real questions from your audience — pull them from customer support tickets, sales calls, People Also Ask boxes, and community forums. Write answers that are complete in 2-3 sentences. Avoid vague answers that require reading the full article for context.

Score: Pass / Fail


12. Data Citations and Source References

What to check: Review your blog posts and resource pages. When you make a factual claim or cite a statistic, do you include a hyperlinked source? Check for unsourced statistics, vague attributions ("studies show"), and missing links.

Why it matters for AI: AI engines evaluate source credibility partly by checking whether content cites its own sources. Well-sourced content signals expertise, trustworthiness, and rigor — all of which map to the E-E-A-T framework that influences both traditional search and AI citation. The Princeton GEO study found that adding citations and statistics to content increased its visibility in generative engine responses by up to 30% (Aggarwal et al., 2023). Content that makes claims without evidence is treated as a lower-quality signal.

How to fix it: Audit every factual claim in your content. Add inline citations with hyperlinks to reputable sources. Prefer primary sources (original research, official reports, peer-reviewed studies) over secondary sources (blog posts summarizing other studies). Use a consistent citation format throughout your content.

Score: Pass / Fail


Category 4: Off-Site Authority

AI models do not only look at your website. They are trained on — and retrieve from — a wide range of internet sources. Your presence across community platforms, knowledge bases, and industry publications directly affects whether AI engines consider your brand authoritative enough to cite.

13. Reddit Presence and Engagement

What to check: Search Reddit for your brand name and your core product category. Are there threads discussing your product? Are you or your team participating authentically in relevant subreddits? Do recommendations of your brand appear in threads answering questions about your category?

Why it matters for AI: Reddit has become one of the most significant data sources for AI engine responses. Google signed a $60 million deal with Reddit in 2024 for AI training data access (Reuters, 2024). Perplexity and other AI engines frequently pull from Reddit threads when answering product recommendation queries. If your brand is absent from Reddit — or worse, only present in negative threads — AI engines will either ignore you or cite you unfavorably.

How to fix it: Identify the 5-10 subreddits where your target audience asks questions related to your category. Participate genuinely — answer questions, share expertise, and contribute to discussions without overt self-promotion (Reddit communities penalize obvious marketing). When your product is genuinely relevant to a question, mention it with context. Encourage satisfied customers to share their experiences on Reddit. Monitor brand mentions using Reddit search or a social listening tool.

Score: Pass / Fail


14. Quora Answers and Expertise Signals

What to check: Search Quora for questions related to your product category, industry, and key terms. Has anyone from your company answered relevant questions? Are those answers detailed, well-sourced, and upvoted? Does your brand appear in answers to category-level questions (e.g., "What is the best tool for X?")?

Why it matters for AI: Quora content appears frequently in AI-generated responses, particularly for "what is" and "how to" queries. AI models treat well-upvoted Quora answers from credentialed authors as meaningful authority signals. Quora profiles with verified credentials (job title, company affiliation) carry additional weight. If your competitors have active Quora presences and you do not, AI engines have more evidence to cite them over you.

How to fix it: Create Quora profiles for key team members with complete credentials. Identify 20-30 high-traffic questions in your category. Write detailed, genuinely helpful answers that include relevant data and, where natural, mention your brand as one option among several. Avoid answers that read as advertisements — Quora users and AI models both penalize overtly promotional content. Update answers periodically to keep them current.

Score: Pass / Fail


15. Wikipedia and Knowledge Base References

What to check: Search Wikipedia for your brand, your founders, and your product category. Does your brand have a Wikipedia page? If not, is your brand cited as a reference on any Wikipedia articles? Check whether your brand appears in industry-specific knowledge bases, directories, or databases (Crunchbase, G2, Product Hunt, industry wikis).

Why it matters for AI: Wikipedia is one of the most heavily weighted sources in AI training data. A 2021 analysis of large web-scraped training corpora found Wikipedia to be among the most heavily represented sources, often duplicated across different data pipeline stages (Dodge et al., 2021). Brands with Wikipedia pages or Wikipedia citations have substantially stronger entity presence in AI models. Beyond Wikipedia, knowledge bases like Crunchbase and G2 also feed AI entity graphs.

How to fix it: If your brand qualifies for a Wikipedia page (notable coverage in independent, reliable sources), consider working with an experienced Wikipedia editor to create one — but never create it yourself, as this violates Wikipedia's conflict-of-interest guidelines. If a page is not yet warranted, focus on getting your brand cited as a reference in existing Wikipedia articles where relevant. Ensure your Crunchbase, G2, and Product Hunt profiles are complete, accurate, and current. List your brand in every reputable industry directory.

Score: Pass / Fail


16. Industry Citations and Thought Leadership

What to check: Search for your brand in industry publications, analyst reports, and respected blogs in your space. Have you been quoted in articles? Has your data been cited? Do you publish original research, benchmarks, or surveys that others reference? Check backlink tools (Ahrefs, Moz) for referring domains from high-authority industry sites.

Why it matters for AI: AI engines build brand authority from a network of citations across the web. When multiple credible sources reference your brand, data, or expertise, AI models learn to associate your brand with authority in that domain. This is the generative equivalent of backlinks — but instead of boosting a ranking position, it increases the probability of being cited in an AI-generated response. Brands with strong citation networks across industry publications consistently outperform competitors with higher traditional domain authority but weaker cross-platform presence.

How to fix it: Invest in original research — proprietary data, surveys, benchmarks, and analyses that others want to cite. Pitch bylined articles and expert commentary to industry publications. Participate in podcasts and webinars where hosts link to your brand in show notes. Build relationships with analysts who cover your category. Create data assets (reports, indexes, calculators) that earn natural citations from other publications.

Score: Pass / Fail


Your AEO Audit Scorecard

Use the table below to tally your results.

# | Audit Item | Category | Result
1 | Robots.txt AI Crawler Access | Technical Foundation | Pass / Fail
2 | XML Sitemap Completeness | Technical Foundation | Pass / Fail
3 | Page Speed and Core Web Vitals | Technical Foundation | Pass / Fail
4 | HTTPS and Security Configuration | Technical Foundation | Pass / Fail
5 | JSON-LD for Core Page Types | Schema & Structured Data | Pass / Fail
6 | FAQPage Schema | Schema & Structured Data | Pass / Fail
7 | Organization Schema | Schema & Structured Data | Pass / Fail
8 | Product/Service Schema with Reviews | Schema & Structured Data | Pass / Fail
9 | Direct Answer Blocks | Content Structure | Pass / Fail
10 | Comparison Tables | Content Structure | Pass / Fail
11 | FAQ Sections | Content Structure | Pass / Fail
12 | Data Citations and Source References | Content Structure | Pass / Fail
13 | Reddit Presence | Off-Site Authority | Pass / Fail
14 | Quora Answers | Off-Site Authority | Pass / Fail
15 | Wikipedia and Knowledge Base References | Off-Site Authority | Pass / Fail
16 | Industry Citations and Thought Leadership | Off-Site Authority | Pass / Fail

Total Score: __ / 16

What to Do After Your Audit

A completed audit gives you a clear map of what is broken. Here is how to prioritize fixes:

If you scored 0-5 (Critical): Start with Category 1. Technical barriers are binary — if AI crawlers cannot access your site, nothing else matters. Fix robots.txt and sitemap issues first. Then move to Schema. These are one-time fixes with permanent impact.

If you scored 6-9 (Needs Work): You likely have the technical basics in place but are missing structured data and content formatting. Focus on implementing JSON-LD schema across your site and restructuring existing content with direct answer blocks and FAQ sections. See our GEO content playbook for content-specific guidance.

If you scored 10-13 (Good): Your gaps are likely in off-site authority. AI models learn about brands from the broader internet, not just your website. Invest in Reddit presence, industry citations, and original research that others want to reference.

If you scored 14-16 (Excellent): Maintain what you have and focus on velocity. Publish more citation-worthy content, expand into new topic clusters, and monitor your AI visibility across engines to catch any regressions.

The Bigger Picture

This audit is a snapshot. AI engines update their models, change their crawling behavior, and shift their citation preferences continuously. What passes today may fail in six months.

The brands winning in AI visibility treat AEO as an ongoing practice, not a one-time project. They monitor their presence across ChatGPT, Perplexity, Gemini, and Claude. They track which queries return their brand and which do not. They iterate on content, structured data, and authority signals continuously.

That is exactly what Voyage is built to do. Voyage is a GEO platform that monitors your AI visibility across every major engine, identifies gaps, generates optimized content, and delivers it directly to your site. Instead of running this audit manually every quarter, Voyage runs it continuously — and fixes what it finds.

If you want to understand the full discipline behind this, start with our guide to generative engine optimization. If you are ready to build a content strategy around AI visibility, our GEO content playbook walks through the process step by step.

The AI search shift is not coming. It is here. The only question is whether your brand is visible in it.