Internal Linking at Scale: Claude + Your Sitemap XML

An e-commerce client came to me last winter with 312 blog posts and almost no internal links between them. Most posts were orphan pages — Google had to discover them through the sitemap and backlinks alone. Their organic traffic had been flat for six months. We added roughly 1,400 contextual internal links in two afternoons. By the end of the next quarter, the pages we touched averaged +38% organic impressions and 22 of them jumped at least one position on their primary keyword.

I am not telling you this to brag. I am telling you because the technique took about four hours total, and almost all of it was Claude reading their sitemap. Manual internal linking at that scale is fantasy. You stop after 20 pages because your brain melts.

Here is the workflow I have used three times now with good results. It works for blogs from 50 posts to 5,000. The only thing that changes is the prompt, not the shape of the process.

Step 1 — Export your sitemap, then strip it down to URLs and titles

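Most CMSs generate a sitemap at /sitemap.xml. If you use WordPress, Yoast or Rank Math will give you one, typically as a sitemap index at /sitemap_index.xml. If you are on Webflow, Shopify, or a custom setup, try /sitemap.xml first; if that fails, robots.txt usually lists the location in a Sitemap: line.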

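Open the XML in any text editor. You will see a list of <loc> entries, one per URL. I pull them into a simple two-column CSV: URL, page title. The sitemap itself carries only URLs, so the titles have to come from somewhere else: a CMS export, a crawl, or each page's <title> tag. That is your working file. Anything else in the sitemap (lastmod, priority, changefreq) is noise for this exercise.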
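If you would rather not do the extraction by hand, here is a minimal Python sketch. It assumes the sitemap follows the standard sitemaps.org schema and that you have already saved its contents into a string. The function names are my own, and the title column is left blank because the sitemap does not contain titles.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value inside a <url> entry, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def urls_to_csv(urls: list[str]) -> str:
    """Build a two-column CSV; titles are blank and get filled in later."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "title"])
    for url in urls:
        writer.writerow([url, ""])
    return buffer.getvalue()


# Made-up sample data, only to show the shape of the input.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/blog/post-a</loc>"
    "<lastmod>2025-01-02</lastmod></url>"
    "<url><loc>https://example.com/blog/post-b</loc></url>"
    "</urlset>"
)
print(urls_to_csv(extract_urls(sample)))
```

The lastmod value is ignored on purpose, which matches the "everything else is noise" rule above.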

If your site is large, the sitemap may be split into a sitemap index that points to several child sitemaps. Concatenate them. For 5,000+ URLs I usually work in batches of 200-300 to keep Claude's context clean.
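A sketch of the two mechanical parts of that paragraph: listing the child sitemaps from an index, and cutting a long URL list into batches. It assumes the index uses the standard sitemaps.org <sitemapindex> format. Downloading each child file is left out, and the default batch size of 250 is just the midpoint of my usual range, not a tested optimum.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml: str) -> list[str]:
    """Return the child sitemap URLs listed in a <sitemapindex> document."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def batches(items: list[str], size: int = 250) -> list[list[str]]:
    """Split items into consecutive chunks of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Made-up index, only to show the shape of the input.
index_sample = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>"
    "<sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>"
    "</sitemapindex>"
)
print(child_sitemaps(index_sample))

all_urls = [f"https://example.com/blog/post-{n}" for n in range(620)]
print([len(chunk) for chunk in batches(all_urls)])
```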

Step 2 — Build a "content neighborhood" map for the cluster you care about

This is the step most people skip, and it is the one that separates random linking from useful linking. Pick a corner of the site that has a clear topic boundary. For my e-commerce client that was "sneaker buying guides" — about 80 posts. For a B2B SaaS, it might be a single product pillar with its supporting pages.

Export just that slice of the CSV. Add two more columns by hand or with a quick script: the primary keyword, and a one-sentence statement of the question the post actually answers. A row in your file now looks like this:

URL: /blog/best-running-shoes-marathon
Title: 8 Best Running Shoes for Marathons in 2025
Primary keyword: best marathon running shoes
Answers: Which marathon running shoes should I buy if I am a sub-4-hour runner?

That sentence column is the secret. It is what lets Claude judge whether a link is a stretch or a real connection. Without it, the model defaults to "these words look similar, let's link them."
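The "quick script" can be very small. The sketch below assumes the cluster can be picked out by a URL path prefix, which will not be true for every site, and it only prepares the two extra columns and flags the rows where the sentence is still missing. Column and function names are placeholders of mine.

```python
from urllib.parse import urlparse


def cluster_rows(rows: list[dict], path_prefix: str) -> list[dict]:
    """Keep rows whose URL path starts with path_prefix; add the extra columns."""
    selected = []
    for row in rows:
        if urlparse(row["url"]).path.startswith(path_prefix):
            selected.append({
                "url": row["url"],
                "title": row.get("title", ""),
                "primary_keyword": row.get("primary_keyword", ""),
                "answers": row.get("answers", ""),
            })
    return selected


def missing_answers(rows: list[dict]) -> list[str]:
    """URLs whose one-sentence 'answers' column is still empty."""
    return [row["url"] for row in rows if not row["answers"].strip()]


rows = [
    {
        "url": "/blog/best-running-shoes-marathon",
        "title": "8 Best Running Shoes for Marathons in 2025",
        "primary_keyword": "best marathon running shoes",
        "answers": "Which marathon running shoes should I buy "
                   "if I am a sub-4-hour runner?",
    },
    # The two rows below are invented for the example.
    {"url": "/blog/how-to-lace-running-shoes", "title": "How to Lace Running Shoes"},
    {"url": "/about", "title": "About"},
]

cluster = cluster_rows(rows, "/blog/")
print(missing_answers(cluster))
```

I would not send a cluster to Claude until that last list is empty, since the sentence column is what the relevance judgment rests on.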

Step 3 — Send the file to Claude with this prompt

The prompt I run, slightly trimmed for length:

You are an SEO strategist reviewing a topic cluster on a client's website.
The cluster is about [TOPIC]. The audience is [AUDIENCE DESCRIPTION].

I have attached a CSV with these columns: URL, Title, Primary Keyword,
One-sentence description of what the post actually answers.

For each row, recommend 3-5 OTHER rows in the same file that should
internally link TO this page, and propose:
- The exact anchor text (must read naturally, not stuffed with the keyword)
- A 1-sentence rationale for why the link is relevant
- Where in the source post it would fit best (intro / a specific H2 / conclusion)

Rules:
- Skip obvious navigational links (homepage, about, contact).
- Do not suggest reciprocal pairs: if A links to B, do not also have B
  link back to A (no reciprocal clutter).
- If no good in-cluster link exists, say "no strong candidate" — do not
  force one.
- Output as a markdown table. One row per source page.

I run this in a fresh chat per cluster. Cross-cluster suggestions get noisy fast, and the table format keeps it scannable.
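Since each cluster gets its own chat, it helps to fill the placeholders the same way every time. This is a sketch of one way to do it, not part of the prompt itself. It assumes you paste the CSV inline below the instructions instead of attaching it, and that `instructions` holds the rest of the prompt text above, unchanged.

```python
PROMPT_HEADER = (
    "You are an SEO strategist reviewing a topic cluster on a client's website.\n"
    "The cluster is about {topic}. The audience is {audience}.\n"
)


def build_prompt(topic: str, audience: str, instructions: str, csv_text: str) -> str:
    """Fill the header placeholders, then append the instructions and the CSV."""
    for name, value in (("topic", topic), ("audience", audience)):
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    header = PROMPT_HEADER.format(topic=topic.strip(), audience=audience.strip())
    return f"{header}\n{instructions.strip()}\n\nCSV:\n{csv_text.strip()}\n"


# Placeholder values; the audience description is invented for the example.
prompt = build_prompt(
    topic="sneaker buying guides",
    audience="recreational runners comparing shoes before a purchase",
    instructions="For each row, recommend 3-5 OTHER rows in the same file ...",
    csv_text="url,title,primary_keyword,answers\n/blog/example,Example,kw,Question?",
)
print(prompt)
```

The empty-value check exists because a forgotten [TOPIC] placeholder is an easy mistake to make when you run several clusters in a row.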

Step 4 — Apply, but with a human veto

Claude's output is, in my experience, about 75% usable as-is. The other 25% falls into three failure modes:

  • Topic drift. It links two posts because they share a word, not because the reader would actually click. The "where in the source post" column usually exposes this — if the suggested location is a stretch, kill it.
  • Stuffed anchors. "Best marathon running shoes" is a fine anchor. "Best marathon running shoes for beginners" three times in one paragraph is not. Rewrite anchors that read like a 2012 SEO agency wrote them.
  • Reciprocal garbage. I told it not to do this, and it obeys 80% of the time. The other 20% it will happily suggest a chain of A→B→C→A. I eyeball the table for cycles and break them.
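Eyeballing works for a small table, but the cycle check is easy to script. The sketch below assumes you have turned Claude's table into a list of (source, target) URL pairs; how you parse the table is up to you. It finds reciprocal pairs directly and uses a depth-first search to return one cycle if any exists.

```python
from typing import Optional

Link = tuple[str, str]  # (source URL, target URL)


def reciprocal_pairs(links: list[Link]) -> set[frozenset]:
    """Pairs of pages that link to each other in both directions."""
    edges = set(links)
    return {frozenset((s, t)) for s, t in edges if s != t and (t, s) in edges}


def find_cycle(links: list[Link]) -> Optional[list[str]]:
    """Return one directed cycle (first page repeated at the end), or None."""
    graph: dict[str, list[str]] = {}
    for source, target in links:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    unvisited, in_progress, done = 0, 1, 2
    state = dict.fromkeys(graph, unvisited)
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = in_progress
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == in_progress:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == unvisited:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = done
        return None

    for node in graph:
        if state[node] == unvisited:
            found = visit(node)
            if found:
                return found
    return None


suggested = [("/a", "/b"), ("/b", "/c"), ("/c", "/a"), ("/c", "/d")]
print(reciprocal_pairs(suggested))
print(find_cycle(suggested))
```

The search is recursive, which is fine for a cluster of a few hundred pages. Break one link in the reported cycle and run it again until it returns None.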

I do the application in batches of 10-15 posts per sitting. Open the post, find the suggested location, paste or rewrite the link, move on. About 3-5 minutes per post once you are in flow.

Step 5 — Validate after crawl

Do not trust the model. Validate. After the edits are live, run Screaming Frog or Sitebulb and check three things:

  1. No orphan pages remain in the cluster. Every page should receive at least one in-cluster link, and the internal link graph should look like a mesh, not a star with everything hanging off one hub.
  2. No new redirect chains. A surprising number of "internal links" point at 301'd URLs. Claude does not know your .htaccess. You do.
  3. Anchor text distribution is varied. Pull a sample of 50 of the new links. If 40% or more use the exact same anchor phrase, you over-optimized. Mix it up before you move on.
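Checks 1 and 3 can be run as a few lines of Python over a crawl export. This sketch assumes you have already reduced the export to a list of cluster pages, a list of (source, target) links, and a list of anchor strings. The export's own column names vary by tool, so that conversion is not shown. Check 2 needs live HTTP requests and is better left to the crawler.

```python
from collections import Counter


def orphans(pages: list[str], links: list[tuple[str, str]]) -> list[str]:
    """Pages that receive no link from any other page in the cluster."""
    linked_to = {target for source, target in links if source != target}
    return [page for page in pages if page not in linked_to]


def anchor_shares(anchors: list[str]) -> list[tuple[str, float]]:
    """Share of links per anchor text, most common first (case-insensitive)."""
    counts = Counter(anchor.strip().lower() for anchor in anchors)
    total = sum(counts.values())
    return [(anchor, count / total) for anchor, count in counts.most_common()]


def overused(anchors: list[str], threshold: float = 0.4) -> list[str]:
    """Anchor phrases whose share meets or exceeds the threshold."""
    return [anchor for anchor, share in anchor_shares(anchors) if share >= threshold]


# Invented sample data: three pages, one link, and a 50-link anchor sample.
pages = ["/a", "/b", "/c"]
links = [("/a", "/b")]
anchors = (
    ["best marathon running shoes"] * 20
    + ["marathon shoe picks"] * 15
    + ["these trainers"] * 15
)

print(orphans(pages, links))
print(overused(anchors))
```

The 0.4 default mirrors the 40% rule of thumb in the list above; lower it if you want an earlier warning.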

I usually wait 4-6 weeks before judging impact. Internal linking is a slow lever. If a page jumps a position in two weeks, that is usually Google re-crawling and re-evaluating, not a structural effect.

Where this workflow falls short

Three honest limits I have hit:

  • Pillar pages with no clear cluster. If your site does not have topical silos — if everything is just "blog posts about anything" — Claude will invent clusters that don't exist. Fix the site architecture first, then run the workflow.
  • Very thin content. If the source post is 200 words, there is nowhere to add a link. Claude will suggest a location; you will realize the post needs a real rewrite. That's fine — but it is rewriting, not linking.
  • Time-sensitive pages. News posts and "X happened today" content age out before the link graph settles. I skip these. Internal linking is for evergreen.

A closing thought

Internal links are the lowest-cost, highest-control SEO lever you have. You decide exactly where they go. There is no ranking model, no link prospect, no email outreach. The reason most teams do not do it at scale is the same reason most teams do not clean their own data — it is boring, repetitive, and seems like it will take forever.

It does not have to take forever. A sitemap and an afternoon is usually enough.