Programmatic SEO with AI: Build 500 Pages Google Won't Flag as Thin

In late 2023, a SaaS client asked me to ship 500 "[Competitor] alternative for [Industry]" pages in three weeks. The competitor list was 12 names long and the industry list was 40+, so the math worked out to 480+ pages. Their pitch was simple: "We've done 30 by hand, they rank. Scale it."

I told them the truth: the 30 they had done by hand were ranking because a human had actually used the competitor, formed an opinion, and written something specific. The moment a template replaces that human step, the page becomes a slot machine. Google calls those pages thin content. Their support inbox calls them a credibility problem.

We shipped 500 pages anyway — but with a system that I wish I'd had a name for back then. The system has three layers: a data layer (a spreadsheet of facts, not blanks), a template layer (HTML, not just text), and a prompt layer (Claude instructions that pull real signals from the data layer and refuse to invent). It also has five guardrails that I'll walk through, because the guardrails are the part that separates "programmatic SEO" from "spam with extra steps."

If you only remember one thing, remember this: thin content is not a word count problem. It's a signal problem. A 1,200-word page is thin if every paragraph could be on any page. A 600-word page is not thin if it tells me something I couldn't get on the competitor's own homepage. Programmatic SEO is the art of giving 500 pages the second kind of density without writing 500 pages by hand.

The math, honestly

I need to set expectations before you build anything. Most "AI programmatic SEO" advice I read online is either depressing or dishonest. The depressing version: "Just use ChatGPT and a template!" (and the resulting pages rank for 30 days, then evaporate). The dishonest version: "I built 1,000 pages and now I get 200,000 visitors/month!" (and the author has 14 referring domains from Forbes, TechCrunch, and a paid placement on a DR 90 listicle).

The honest version looks like this. For a real B2B SaaS site in a moderately competitive vertical, a well-executed programmatic build of 300-500 pages can realistically deliver 5,000-15,000 incremental organic sessions per month within 6-9 months, assuming:

  • The site has at least DR 30-40 baseline authority
  • The template targets long-tail, intent-clear keywords (4+ words average)
  • Each generated page has 3-5 unique, non-template data points
  • Internal linking is built deliberately, not as a leftover
  • The site already has a working crawl/indexing setup and isn't losing to technical debt

That's the realistic band. The 200k/month case studies you've seen are either (a) DR 70+ sites where the marginal page ranks almost by default, (b) sites targeting zero-competition keywords, or (c) lying.

Build the system, set expectations against the band above, and you'll be fine.

The data layer: the part everyone skips

Most "AI + programmatic SEO" tutorials start with the template. Wrong order. The template is the easiest part. The data layer — the actual facts your template is going to render — is where the work is. If your spreadsheet is just a list of competitor names and industry names, you're going to produce thin content at scale.

For a "[Competitor] alternative for [Industry]" build, the data layer should look like this. One row per page, and one column per unique fact the page is going to claim:

| page_slug | competitor | industry | industry_team_size | primary_pain | missing_feature | pricing_tier | integration_count | public_review_snippet | migration_time |
|---|---|---|---|---|---|---|---|---|---|
| mailchimp-alternative-saas | Mailchimp | SaaS | 5-50 | automation limits | event-based triggers | $13-$800/mo | 300+ | "Pricing gets punitive past 25k contacts." | 1-2 weeks |
| mailchimp-alternative-dental | Mailchimp | Dental | 1-10 | HIPAA concerns | BAA availability | varies | 120 | "Dental practices need HIPAA, not deliverability." | 1 week |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |

Notice what's in the table and what isn't. There is no intro_paragraph column. There is no body_text column. There is no AI-generated summary. Every cell is a fact — a number, a name, a quote, a concrete observation. Some of these cells are populated by hand from real research. Some are populated by Claude from sources you provide. Some you scrape from public review sites and verify.

The data layer is what makes a page about "Mailchimp alternative for dental practices" different from a page about "Mailchimp alternative for SaaS companies." Without it, your template is a costume that 480 pages wear.

How do you fill the table? Two passes.

First pass: hand-fill 10-15 rows. This is the most painful part of the project. Read G2 reviews for your competitor. Read their pricing page. Read three Reddit threads from people in the industry you're targeting. Write the cells in your voice, not the template's.

Second pass: once you have 10-15 well-populated rows, give those rows to Claude as few-shot examples and ask it to fill the rest of the table, one row at a time, with citations to the source it's pulling from. Discard any row where Claude can't cite a source. Expect Claude to produce usable, sourced rows for maybe 60-70% of the table; you finish the rest by hand.
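The second pass boils down to a prompt builder that turns the hand-filled rows into few-shot examples and asks for one row at a time. Here is a minimal Python sketch. The function name `build_fill_prompt`, the message layout, and the convention that an empty string marks a cell still to be filled are all assumptions made for illustration. It only assembles the user message; the system prompt and the actual Claude call are left to whatever client you use.

```python
import json


def build_fill_prompt(examples: list, row: dict) -> str:
    """Assemble a few-shot prompt asking the model to fill the empty cells of one row.

    `examples` are hand-filled rows (every cell researched by a human);
    `row` is the target row, with "" for each cell still to be filled
    (an assumed convention, not a fixed format).
    """
    empty = [col for col, value in row.items() if value == ""]
    known = {col: value for col, value in row.items() if value != ""}
    parts = ["Rows researched and filled by hand (match their level of specificity):"]
    parts += [json.dumps(example, ensure_ascii=False) for example in examples]
    parts.append("Already populated for the next row (do not contradict): "
                 + json.dumps(known, ensure_ascii=False))
    parts.append("Fill ONLY these empty columns, citing a source URL for each: "
                 + ", ".join(empty))
    return "\n\n".join(parts)


# Usage: one call per row. The values come from the sample table above.
examples = [{"page_slug": "mailchimp-alternative-saas", "competitor": "Mailchimp",
             "industry": "SaaS", "migration_time": "1-2 weeks"}]
row = {"page_slug": "mailchimp-alternative-dental", "competitor": "Mailchimp",
       "industry": "Dental", "migration_time": ""}
print(build_fill_prompt(examples, row))
```

Building one prompt per row keeps each call small. It also means a bad row can be discarded without touching the others.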

The template layer: HTML, not English

The template should be HTML, not a Markdown outline. This sounds pedantic but it matters for two reasons. First, your template can have conditional blocks ({% if industry == "dental" %}...{% endif %}) that swap out content based on data layer values, which is what gives each page a slightly different shape. Second, your template can include schema, internal links, and structured data that are easy to forget when you're writing prose.
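A conditional block only does its job if every value in the column it branches on actually has a branch. Below is a minimal stdlib Python sketch of a lint for that. It assumes Jinja-style `{% if column == "value" %}` / `{% elif ... %}` syntax, like the template in this section. `unmatched_branch_values` is a hypothetical helper, and it does not understand `{% else %}`, nested conditions, or non-equality tests.

```python
import re

# Matches {% if col == "value" %} and {% elif col == "value" %} (Jinja-style syntax).
BRANCH_RE = re.compile(r'\{%-?\s*(?:if|elif)\s+(\w+)\s*==\s*"([^"]*)"\s*-?%\}')


def unmatched_branch_values(template: str, rows: list, column: str) -> set:
    """Return values of `column` in `rows` that no if/elif branch in `template` handles.

    Rows with these values render none of the branched variants.
    """
    handled = {value for col, value in BRANCH_RE.findall(template) if col == column}
    return {row[column] for row in rows if row.get(column) and row[column] not in handled}


template = '''
{% if primary_pain == "pricing" %}...{% elif primary_pain == "automation limits" %}...{% endif %}
'''
rows = [
    {"page_slug": "mailchimp-alternative-saas", "primary_pain": "automation limits"},
    {"page_slug": "mailchimp-alternative-dental", "primary_pain": "HIPAA concerns"},
]
print(unmatched_branch_values(template, rows, "primary_pain"))  # {'HIPAA concerns'}
```

Run against the two sample rows above and the verdict branches in this section's template, the lint reports `HIPAA concerns`. The dental row matches neither the `"pricing"` nor the `"automation limits"` branch, so that page would render with no verdict at all.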

Here's the shape of a working template for a competitor-alternative page. I'll annotate the parts that are doing the load-bearing work.

```html
<h1>{{competitor}} Alternative for {{industry}} Teams in {{year}}</h1>

{# Quick verdict: only one of these renders, picked by the data layer #}
{% if primary_pain == "pricing" %}
<p>If {{competitor}}'s pricing curve has started hurting, here's the short version: most {{industry}} teams we talk to switch when they cross the {{pricing_tier}} threshold, and the migration takes about {{migration_time}}.</p>
{% elif primary_pain == "automation limits" %}
<p>When {{industry}} teams outgrow {{competitor}}, it's almost always the same ceiling: {{missing_feature}}. We see it constantly.</p>
{% endif %}

{# The "why teams switch" section: pulled from the data layer, not invented #}
<h2>Why {{industry}} Teams Leave {{competitor}}</h2>
<p>{{primary_pain}} is the headline. The supporting cast: integration count sits at {{integration_count}}, which is {{relative_to_category}} the category median. A real {{industry}} workflow typically needs at least 4-6 of those to be production-ready.</p>
<blockquote>{{public_review_snippet}}</blockquote>

{# Comparison table: every competitor-side cell comes from the data layer #}
<h2>{{competitor}} vs. the {{industry}} Alternative</h2>
<table>
  <tr><th>Dimension</th><th>{{competitor}}</th><th>Us</th></tr>
  <tr><td>Pricing entry</td><td>{{competitor_pricing_entry}}</td><td>$0 (free tier)</td></tr>
  <tr><td>Integrations</td><td>{{integration_count}}</td><td>1,200+</td></tr>
  <tr><td>{{industry}} workflows</td><td>{{workflows_supported}}</td><td>{{our_workflows_supported}}</td></tr>
</table>

{# Industry-specific FAQ: populated from the data layer, not made up #}
<h2>FAQs: Switching from {{competitor}} in {{industry}}</h2>
{% for faq in industry_specific_faqs %}
<h3>{{faq.question}}</h3>
<p>{{faq.answer}}</p>
{% endfor %}
```

The important things to notice: the verdict paragraph is one of several variants, picked by the data layer's primary_pain value. The comparison table takes the competitor-side cells from the data layer and pairs them with claims on your side that you, the marketer, decide up front. The FAQ is a list pulled from the data layer, not generated from scratch.

What you do not see: a generic intro paragraph, a "we are the leading provider" section, or a CTA block. Those are where thin content goes to hide.
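Every field the template reads has to come from somewhere: either a data-layer column or a value you set once for the whole build. The template above reads `year`, `relative_to_category`, `competitor_pricing_entry`, `workflows_supported`, `our_workflows_supported`, and `industry_specific_faqs`, and none of these appear in the sample table. A lint that lists such gaps before rendering is cheap. Here is a minimal stdlib Python sketch, assuming Jinja-style `{{ name }}`, `{% for x in name %}`, and `{% if name == ... %}` syntax. `undefined_fields` is a hypothetical helper. Dotted names such as `faq.question` are deliberately skipped as loop-variable attributes.

```python
import re

# {{ name }} placeholders; dotted names such as {{faq.question}} don't match,
# because they are attributes of a loop variable rather than data-layer columns.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([A-Za-z_]\w*)\s*\}\}")
# The iterable in {% for x in name %}, e.g. industry_specific_faqs.
FOR_RE = re.compile(r"\{%-?\s*for\s+\w+\s+in\s+([A-Za-z_]\w*)")
# The column tested in {% if name == ... %} / {% elif name == ... %}.
IF_RE = re.compile(r"\{%-?\s*(?:if|elif)\s+([A-Za-z_]\w*)")


def undefined_fields(template: str, known: set) -> set:
    """Fields the template reads that are neither data-layer columns nor build-wide values."""
    used = set(PLACEHOLDER_RE.findall(template))
    used |= set(FOR_RE.findall(template)) | set(IF_RE.findall(template))
    return used - known


# Columns from the sample data table.
DATA_COLUMNS = {"page_slug", "competitor", "industry", "industry_team_size", "primary_pain",
                "missing_feature", "pricing_tier", "integration_count",
                "public_review_snippet", "migration_time"}
snippet = ("{{integration_count}} is {{relative_to_category}} the median. "
           "{% for faq in industry_specific_faqs %}{{faq.question}}{% endfor %}")
print(sorted(undefined_fields(snippet, DATA_COLUMNS)))
```

Anything the lint reports either becomes a new column that the fill prompt has to source, or a constant you decide once.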

The prompt layer: the part most teams get wrong

Now we bring Claude in. The temptation here is to write a prompt that goes "write a blog post about X alternative for Y." Don't. The prompt is the glue between the data layer and the template layer. It receives one row from the data layer as input, and it produces either a JSON object (populating the missing cells in the row) or a small block of conditional prose (for sections where the template asks for it).

Here's the system prompt I use. Adapt it, don't copy it — the specifics will change per project.

```text
You are filling in one row of a programmatic SEO data table. You will receive:
1. A page slug (e.g. "mailchimp-alternative-dental")
2. A competitor name and the URL of their homepage
3. An industry name and a list of 3-5 representative companies in that industry
4. The columns already populated for this row (use them, don't contradict them)
5. The columns still empty that you need to fill

Your job is to fill ONLY the empty columns. For every fact you add:
- If it's a number (pricing, integration count, etc.), cite a URL.
- If it's a quote, cite the G2/Capterra/Reddit thread URL it came from.
- If you cannot find a citation, write "NO_SOURCE" in the cell instead of a value.
- Never invent. If a fact is not findable, the cell gets "NO_SOURCE", not a guess.

Output format: a JSON object matching the column names, plus a "sources" object
that maps each filled column to the URL you cited. No prose outside the JSON.
```
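The "discard any row without a citation" rule is easier to enforce in code than to trust to the model. Here is a minimal Python sketch. It assumes the reply is a JSON object with the column values plus a `sources` object mapping each filled column to the URL it cited; that shape and the name `accept_filled_row` are assumptions. The sketch only checks that a plausible http(s) URL is present, not that the page behind it supports the fact, so spot-checking stays a human job.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def accept_filled_row(raw: str, empty_columns: list) -> Optional[dict]:
    """Parse the model's JSON reply; return the filled cells, or None to discard the row.

    Assumed reply shape: {"<column>": "<value>", ..., "sources": {"<column>": "<url>"}}.
    """
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return None  # prose outside the JSON, or malformed output
    if not isinstance(reply, dict):
        return None
    sources = reply.get("sources") or {}
    if not isinstance(sources, dict):
        return None
    filled = {}
    for column in empty_columns:
        value = reply.get(column)
        if value in (None, "", "NO_SOURCE"):
            return None  # the model could not source this fact
        url = urlparse(str(sources.get(column, "")))
        if url.scheme not in ("http", "https") or not url.netloc:
            return None  # a fact without a usable citation is treated as invented
        filled[column] = value
    return filled


reply = '{"migration_time": "1 week", "sources": {"migration_time": "https://example.com/migration"}}'
print(accept_filled_row(reply, ["migration_time"]))  # {'migration_time': '1 week'}
print(accept_filled_row('{"migration_time": "NO_SOURCE"}', ["migration_time"]))  # None
```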

Then for each row, you send the prompt plus the row's context. For the conditional prose blocks in the template (the verdict paragraph, the "why teams switch" intro, the FAQ answers), you have a second prompt that takes the now-fully-populated row and writes those blocks.

```text
You are writing 2-3 short prose blocks for a programmatic SEO page. You will receive:
1. The fully populated data row (JSON)
2. The HTML template the prose will be inserted into
3. The block labels to write: ["verdict_paragraph", "why_switch_intro", "faq_answers"]

Hard rules:
- Every claim must be supported by a cell in the data row. If the row says
  integration_count is 120, you can say "120 integrations." If the row doesn't
  say it, you cannot say it.
- Never use phrases like "leading provider," "cutting-edge," or "in today's
  fast-paced world." Those are template-shaped words and they mark the page
  as thin to a careful reader (and increasingly to the algorithm).
- Length: verdict_paragraph 35-55 words, why_switch_intro 50-80 words,
  each faq_answer 30-50 words.
- Output: a JSON object with keys matching the block labels.
```
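Most of these hard rules are mechanical, so they can be checked after the model replies instead of trusted. Here is a minimal Python sketch. It assumes the reply has been parsed into a dict where `faq_answers` is a list of strings; the prompt doesn't pin that shape down. `prose_problems` is a hypothetical helper. The number check is a crude heuristic: it only confirms that every digit run in the prose also appears somewhere in the data row. As a result, `1200` versus `1,200` gets flagged, while a real number attached to the wrong claim slips through.

```python
import re

BANNED_PHRASES = ("leading provider", "cutting-edge", "in today's fast-paced world")
WORD_LIMITS = {"verdict_paragraph": (35, 55), "why_switch_intro": (50, 80)}
FAQ_ANSWER_LIMITS = (30, 50)
NUMBER_RE = re.compile(r"\d[\d,.]*")


def _numbers(text: str) -> set:
    # "120." -> "120", "1,200+" -> "1,200", "$13-$800/mo" -> {"13", "800"}
    return {n.rstrip(".,") for n in NUMBER_RE.findall(text)}


def prose_problems(blocks: dict, row: dict) -> list:
    """List rule violations in prompt-2 output; an empty list means the blocks pass."""
    row_numbers = _numbers(" ".join(str(v) for v in row.values()))
    texts = [(label, blocks.get(label, "")) for label in WORD_LIMITS]
    texts += [(f"faq_answers[{i}]", answer)
              for i, answer in enumerate(blocks.get("faq_answers", []))]
    problems = []
    for label, text in texts:
        low, high = WORD_LIMITS.get(label, FAQ_ANSWER_LIMITS)
        words = len(text.split())
        if not low <= words <= high:
            problems.append(f"{label}: {words} words, expected {low}-{high}")
        for phrase in BANNED_PHRASES:
            if phrase in text.lower():
                problems.append(f"{label}: banned phrase {phrase!r}")
        for number in sorted(_numbers(text) - row_numbers):
            problems.append(f"{label}: number {number} is not in the data row")
    return problems
```

A non-empty list is a signal to regenerate the blocks or send the row to a human. Silently trimming the offending text could leave a sentence that no longer makes sense.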

The two-prompt structure is the load-bearing part. Prompt 1 produces facts (low temperature, cited). Prompt 2 produces prose (slightly higher temperature, but constrained to the facts from prompt 1). The fact generation and the prose generation never happen in the same call, which is what stops the model from inventing a number to make a sentence work.
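The separation is simple to enforce in orchestration code. Here is a minimal Python sketch. It assumes a hypothetical `call_model(system_prompt, user_message, temperature)` function that wraps whatever Claude client you use. The temperature defaults (0.0 for facts, 0.5 for prose) are illustrative values, not ones taken from the build described here.

```python
import json
from typing import Callable

# Hypothetical model client: (system_prompt, user_message, temperature) -> reply text.
ModelCall = Callable[[str, str, float], str]

PROSE_BLOCKS = ["verdict_paragraph", "why_switch_intro", "faq_answers"]


def generate_page_content(row: dict, facts_prompt: str, prose_prompt: str,
                          template_html: str, call_model: ModelCall,
                          facts_temperature: float = 0.0,
                          prose_temperature: float = 0.5) -> dict:
    """Fill facts in one call, then write prose over those facts in a separate call."""
    empty = [col for col, value in row.items() if value == ""]

    # Call 1: facts only, at low temperature. No sentence exists yet to bend a number to fit.
    facts = json.loads(call_model(
        facts_prompt, json.dumps({"row": row, "empty_columns": empty}), facts_temperature))
    unsourced = [col for col in empty if facts.get(col) in (None, "", "NO_SOURCE")]
    if unsourced:
        raise ValueError(f"discard row {row.get('page_slug')}: no source for {unsourced}")
    populated = {**row, **{col: facts[col] for col in empty}}

    # Call 2: prose only, constrained to the now-populated row.
    prose = json.loads(call_model(
        prose_prompt,
        json.dumps({"row": populated, "template": template_html, "blocks": PROSE_BLOCKS}),
        prose_temperature))
    return {"row": populated, "prose": prose}
```

The function raises on any unsourced fact before the prose call. That way, a row with a gap never reaches the call that writes sentences, which is the whole point of splitting the work.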

The five guardrails

Even with the data layer and the prompt layer done well, a few things will go wrong at 500 pages. These five guardrails are what catch them.

1. Unique data per page. Before any page goes live, a script reads the page's rendered HTML, extracts every number, every quote, and every proper noun, and checks it against the data layer. If two pages share more than 60% of their unique data points, the script flags them and a human reviews. This is the single most effective check against the "every page reads the same" failure mode.
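Here is a minimal stdlib Python sketch of the pairwise half of that check, built on loud assumptions. "Data points" are approximated as numbers, double-quoted strings, and capitalized words that don't start a sentence or text node, which is a crude stand-in for proper-noun detection. Anything that already appears in the template's own text is subtracted as boilerplate. "Share more than 60%" is read as the overlap coefficient (shared points divided by the size of the smaller set), because the exact definition isn't spelled out; Jaccard similarity would be a stricter alternative. The sketch does not cover the comparison against the data layer, and the helper names are hypothetical.

```python
import re
from html.parser import HTMLParser
from itertools import combinations

NUMBER_RE = re.compile(r"\d[\d,.]*")
QUOTE_RE = re.compile(r'"([^"]+)"')


class _TextChunks(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data)


def data_points(html: str) -> set:
    """Numbers, quoted strings, and capitalized words that don't start a sentence."""
    parser = _TextChunks()
    parser.feed(html)
    parser.close()
    points = set()
    for chunk in parser.chunks:
        points |= {n.rstrip(".,") for n in NUMBER_RE.findall(chunk)}
        points |= set(QUOTE_RE.findall(chunk))
        sentence_start = True  # each text node (heading, cell, paragraph) starts fresh
        for token in chunk.split():
            word = token.strip(".,;:!?()\"'")
            if word[:1].isupper() and not sentence_start:
                points.add(word)  # crude proper-noun guess
            sentence_start = token.endswith((".", "!", "?"))
    return points


def flag_similar_pages(pages: dict, template_html: str, threshold: float = 0.6) -> list:
    """Return (slug_a, slug_b, overlap) for page pairs whose data points overlap above threshold."""
    template_points = data_points(template_html)  # constants baked into the template aren't data
    points = {slug: data_points(html) - template_points for slug, html in pages.items()}
    flagged = []
    for a, b in combinations(sorted(points), 2):
        smaller = min(len(points[a]), len(points[b]))
        if smaller and len(points[a] & points[b]) / smaller > threshold:
            flagged.append((a, b, round(len(points[a] & points[b]) / smaller, 2)))
    return flagged
```

Every flagged pair goes to a human. The heuristic is noisy enough that it should route pages to a reviewer, not block them automatically.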

2. Intent-matched title and meta. "Mailchimp alternative for SaaS" is a comparison query, not a "what is" query. The page's title, h1, and meta description have to signal "we're going to answer the comparison question" — not "let me explain email marketing to you." For each programmatic pattern, write the title and meta once, in the template, and verify against the top 3 ranking pages for the pattern. If the top 3 all have a "vs" structure and you have a "guide" structure, you're wrong.
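Once you've copied the top-3 titles from the results page by hand, the comparison against your template's title can be partly mechanized. Here is a rough keyword heuristic in Python. The marker lists and the `title_intent` / `intent_mismatch` names are assumptions made for illustration. The heuristic is English-only and no substitute for actually reading the results page.

```python
COMPARISON_MARKERS = (" vs ", " vs. ", "alternative", "compared", "comparison")
GUIDE_MARKERS = ("guide", "what is", "how to", "explained")


def title_intent(title: str) -> str:
    """Crude intent label for a title: 'comparison', 'guide', or 'other'."""
    padded = f" {title.lower()} "
    if any(marker in padded for marker in COMPARISON_MARKERS):
        return "comparison"
    if any(marker in padded for marker in GUIDE_MARKERS):
        return "guide"
    return "other"


def intent_mismatch(our_title: str, top_titles: list) -> bool:
    """True when most top-ranking titles share an intent that our title doesn't."""
    if not top_titles:
        return False
    intents = [title_intent(t) for t in top_titles]
    majority = max(set(intents), key=intents.count)
    return intents.count(majority) * 2 > len(intents) and title_intent(our_title) != majority
```

The check only fires when a clear majority of the top results agree. A split results page is better handled by reading it yourself.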

3. Internal linking that isn't an afterthought. Every programmatic page should link to: (a) the parent category page, (b) 2-3 sibling pages from the same industry (other "alternative for [industry]" pages), and (c) 1-2 pillar pages on the topic. The sibling linking is the easy win — 500 pages of "[Competitor] alternative for [Industry]" give you a natural cross-link graph that takes almost no effort to render. The pillar page is the hard part. You need 3-5 of them, and they need to be real articles, not just bigger templates.
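The sibling links can be computed straight from the data layer. Here is a minimal Python sketch. It assumes each page record carries `page_slug` and `industry`, as in the data table above, and that you supply the parent and pillar URLs; `internal_links` is a hypothetical helper. Rotating through each industry's sorted page list is one illustrative choice. It gives every page the same number of inbound sibling links, instead of piling them onto the alphabetically first few.

```python
from collections import defaultdict


def internal_links(pages: list, parent_url: str, pillar_urls: list,
                   siblings: int = 3, pillars: int = 2) -> dict:
    """Map each page_slug to its parent, sibling, and pillar links."""
    by_industry = defaultdict(list)
    for page in pages:
        by_industry[page["industry"]].append(page["page_slug"])
    for slugs in by_industry.values():
        slugs.sort()

    links = {}
    for page in pages:
        group = by_industry[page["industry"]]
        i = group.index(page["page_slug"])
        # The next pages in the same industry, wrapping around the sorted list.
        peers = group[i + 1:] + group[:i]
        # Rotate pillars by the same position so pillar links spread out too.
        chosen_pillars = [pillar_urls[(i + j) % len(pillar_urls)]
                          for j in range(min(pillars, len(pillar_urls)))]
        links[page["page_slug"]] = {
            "parent": parent_url,
            "siblings": peers[:siblings],
            "pillars": chosen_pillars,
        }
    return links
```

The output is deterministic, so the link graph comes out the same on every rebuild. It only changes when pages are added or removed.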

4. Schema, every time. Each page gets a SoftwareApplication schema (with aggregateRating if you can ethically source one; yes, that's a real consideration), a FAQPage schema (matched exactly to the FAQ block, not embellished), and a BreadcrumbList schema. JSON-LD (JavaScript Object Notation for Linked Data, a structured-data format that search engines read to build rich results) is rendered server-side, not client-side, so Googlebot sees it in the initial HTML response without having to execute JavaScript. The Rich Results Test (Google's structured-data validator) should pass on every page. If you're shipping 500 pages, you check at least 20 of them.
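Here is a minimal Python sketch of two of the three schemas. It assumes the page's FAQ pairs and breadcrumb trail are available at render time, and the helper names are hypothetical. Building FAQPage from the same list that renders the visible FAQ block is what keeps the two matched. A SoftwareApplication object would follow the same pattern. The `</` escape keeps a stray `</script>` inside an answer from closing the tag early.

```python
import json


def faq_page_jsonld(faqs: list) -> dict:
    """FAQPage schema built from exactly the (question, answer) pairs the page renders."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": question,
             "acceptedAnswer": {"@type": "Answer", "text": answer}}
            for question, answer in faqs
        ],
    }


def breadcrumb_jsonld(trail: list) -> dict:
    """BreadcrumbList schema from (name, absolute URL) pairs, root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }


def jsonld_script(data: dict) -> str:
    """Serialize for a server-rendered <script> tag; '</' becomes the equivalent JSON escape '<\\/'."""
    body = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'
```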

5. Quality check on a sample, not a full read. You are not going to read 500 pages. But you should read 25 of them — five from the start of the build, five from the middle, five from the end, five that the data-layer-dedup script flagged, and five that you randomly sampled. If those 25 are good, the build is probably good. If two of the random sample have invented facts, you have a prompt-layer bug and you stop the build.
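The sampling plan is easy to make reproducible, so the same pages get reread after a fix. Here is a minimal Python sketch. `review_sample` is a hypothetical helper, and it assumes you have the slugs in build order plus a list of slugs that the dedup check flagged. It skips duplicates across buckets, so you can end up with fewer than 25 pages when buckets overlap.

```python
import random


def review_sample(build_order: list, flagged: list,
                  per_bucket: int = 5, seed: int = 0) -> list:
    """Return up to 5 * per_bucket distinct slugs to read by hand.

    Buckets: start, middle, and end of the build order, pages the dedup check
    flagged, and a seeded random draw from whatever hasn't been picked yet.
    """
    chosen = []

    def add(candidates):
        taken = 0
        for slug in candidates:
            if taken == per_bucket:
                break
            if slug not in chosen:
                chosen.append(slug)
                taken += 1

    mid = len(build_order) // 2
    add(build_order[:per_bucket])                            # start
    add(build_order[max(0, mid - per_bucket // 2):])         # middle
    add(reversed(build_order))                               # end
    add(flagged)                                             # dedup-flagged
    rest = [s for s in build_order if s not in chosen]
    add(random.Random(seed).sample(rest, min(per_bucket, len(rest))))  # random
    return chosen
```

A fixed seed means that rerunning after a prompt fix returns the same random five. Before-and-after comparisons stay apples to apples.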

The lesson, not the checklist

Most programmatic SEO advice ends with a checklist. Here's the reframe: a programmatic SEO build is not a content project, it's a data project with a content output. If your data layer is thin, no amount of Claude prompting will save the page. If your data layer is rich, you could probably get away with a mediocre template and the pages would still rank, because the facts are doing the work.

The trap I see teams fall into is the reverse: they spend a week on a beautiful template, an afternoon on the data layer, and then wonder why 500 pages feel like one page rewritten 500 times. Move the time. The template is a one-week task. The data layer is a one-month task. The prompt layer is a one-week task, again. Budget accordingly.

And the closing thought, the one I'd want a younger version of me to read: programmatic pages are not a replacement for hand-written content. They are a complement. The 30 hand-written pages your team can produce in a quarter are still your highest-leverage asset. The 500 programmatic pages are the long tail. The mistake is treating them as the same thing, with the same ROI, on the same timeline. They're not. Build the system, set expectations, ship the long tail — and keep writing the hand-written ones in parallel.