Claude Computer Use: I Let It Read the SERP and Write Its Own Brief
Last Tuesday I handed Claude Sonnet 4.5 a target keyword and a screenshot of a blank Chromium tab. Four minutes later it produced a 1,200-word content brief that quoted 8 of the 10 "People Also Ask" boxes the SERP (search engine results page) was showing, mapped the H2s the top 5 ranking articles all shared, and flagged a structural gap none of them covered. I hadn't given it a single URL.
The trick is Claude Computer Use — Anthropic's beta tool that puts the model in front of a virtual display, takes a screenshot, returns an action (click here, type this, scroll, read), and loops. It's marketed for browser automation, but the use case nobody is talking about is the one that quietly replaces half the SEO toolstack: a content brief agent that reads the live SERP like a person would, then writes its own brief.
This is how I built that agent, what I learned running it across 60+ briefs, and where it earns its keep.
Why "real" SERP context beats scraped text
Most programmatic SERP scrapers (Ahrefs, SEMrush, our own custom Playwright jobs) return the same three columns: title, URL, meta description. That's a 1990s view of the SERP. Today's page is a structured document — a knowledge panel here, an AI Overview (Google's AI-generated answer box) there, a People Also Ask accordion, related searches at the bottom, sitelinks, video carousels. Search intent is encoded in the layout, not just the words.
When a human SEO opens a SERP, they skim the structure first. They note which questions Google is choosing to surface. They count ads, they look at who's winning the featured snippet, they ask "why is this page ranking when it's clearly worse than this other one." A text scraper throws all of that away.
Claude Computer Use reads the SERP the way a human does. It scrolls, it expands PAA accordions, it clicks into top-ranking pages, and it remembers what it saw in a scratchpad. The brief it produces is grounded in the actual page a user would see — not a parsed abstraction of it.
The 5-step build
1. Stand up a headless browser Claude can drive
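The computer tool (its versioned type string depends on the model; the loop in step 2 shows the one I use) expects a display, its dimensions in pixels, and a harness that sends it screenshots and executes the actions it returns. The cleanest setup I found is a Docker container: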
```dockerfile
FROM mcr.microsoft.com/playwright:v1.48.0-jammy
# xdotool is what the step 2 dispatcher uses to send clicks and keystrokes
RUN apt-get update && apt-get install -y xvfb x11vnc xdotool
ENV DISPLAY=:99
CMD Xvfb :99 -screen 0 1440x900x24 & \
    x11vnc -display :99 -forever -rfbport 5900 & \
    sleep infinity
```

Spin it up, port-forward 5900 if you want to watch, and you have a 1440×900 virtual desktop Claude can drive. I run this on a $7/mo Hetzner (a budget European cloud provider) box.
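As a rough sketch of the "receive screenshots" half of that contract, the snippet below wraps PNG bytes in the `tool_result` message shape that the Messages API accepts as a reply to a tool call. The function name `build_tool_result`, the example id, and the fake PNG bytes are illustrative assumptions; how you grab the screenshot from display :99 is left to your harness.

```python
import base64


def build_tool_result(tool_use_id: str, png_bytes: bytes) -> dict:
    """Wrap a PNG screenshot as a user message answering one tool_use block."""
    encoded = base64.standard_b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the id of the tool_use block
            "content": [{
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": encoded,
                },
            }],
        }],
    }


# Hypothetical usage: in a real loop the id comes from the model's tool_use block.
message = build_tool_result("toolu_example", b"\x89PNG fake bytes")
```

Keeping this as a pure function makes it easy to unit-test the message shape without a browser or an API key.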
2. Wire the agent loop
The core loop is roughly 60 lines of Python. Pseudocode:
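```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# SYSTEM_PROMPT, parse_action, extract_brief, execute, capture and
# screenshot_to_tool_result are defined elsewhere in the script.


def run_agent(goal: str, max_turns: int = 50):
    history = [{"role": "user", "content": goal}]
    for turn in range(max_turns):
        response = client.beta.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=4096,
            # Sonnet 4.5 pairs with the computer_20250124 tool and this beta
            # flag; computer_20251124 is a later version tied to other models.
            betas=["computer-use-2025-01-24"],
            tools=[{
                "type": "computer_20250124",
                "name": "computer",
                "display_width_px": 1440,
                "display_height_px": 900,
            }],
            messages=history,
            system=SYSTEM_PROMPT,
        )
        # Keep the assistant turn in history so the tool_result that follows
        # has a matching tool_use block.
        history.append({"role": "assistant", "content": response.content})
        action = parse_action(response)  # left_click, type, scroll, etc.
        if action is None:
            return extract_brief(response)
        execute(action)  # dispatch to playwright / xdotool
        screenshot = capture()
        history.append(screenshot_to_tool_result(response, screenshot))
    return None  # turn budget exhausted without a finished brief
```

The trick most people miss: the action primitives are pixel coordinates and key names. A click arrives as a coordinate such as [842, 311]; a scroll arrives as a direction and an amount. You dispatch those against the X display with xdotool (a command-line tool that simulates keyboard and mouse input). It's not as clean as the higher-level tools (web_search, text_editor), but it's reliable once you write the dispatcher.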
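A dispatcher along those lines might look like the sketch below. It maps a handful of computer-tool actions onto xdotool command lines and returns the argv list rather than running it, so it can be tested without a display. The function name `to_xdotool_argv` and the 20 ms typing delay are assumptions, and only a subset of actions is handled; this is not the author's actual dispatcher.

```python
def to_xdotool_argv(action: dict) -> list[str]:
    """Translate one computer-tool action into an xdotool command line."""
    kind = action["action"]
    buttons = {"left_click": "1", "middle_click": "2", "right_click": "3"}
    if kind in buttons:
        x, y = action["coordinate"]
        return ["xdotool", "mousemove", str(x), str(y), "click", buttons[kind]]
    if kind == "type":
        # "--" stops xdotool from reading text that starts with "-" as a flag.
        return ["xdotool", "type", "--delay", "20", "--", action["text"]]
    if kind == "key":
        return ["xdotool", "key", "--", action["text"]]
    if kind == "scroll":
        # X11 exposes the mouse wheel as buttons 4 (up) and 5 (down).
        wheel = {"up": "4", "down": "5"}[action["scroll_direction"]]
        x, y = action["coordinate"]
        return ["xdotool", "mousemove", str(x), str(y),
                "click", "--repeat", str(action["scroll_amount"]), wheel]
    raise ValueError(f"unsupported action: {kind}")


click_argv = to_xdotool_argv({"action": "left_click", "coordinate": [842, 311]})
scroll_argv = to_xdotool_argv({
    "action": "scroll",
    "coordinate": [720, 450],
    "scroll_direction": "down",
    "scroll_amount": 3,
})
```

To execute a command, pass the list to `subprocess.run(argv, check=True)` in a process whose `DISPLAY` points at the virtual desktop.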
3. Give the agent a scratchpad
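Computer Use alone has no durable memory: everything the agent saw lives only in earlier screenshots, which are easy to lose track of (or trim away) over a long run. You need a way to make the agent take persistent notes. Otherwise the brief at the end of 40 turns is just "what I most recently saw." Add the text editor tool (text_editor_20250728 for Claude 4 models, registered under the name str_replace_based_edit_tool) alongside the computer tool. Now the agent can write observations to a file that survives the loop: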
```
BRIEF.md (after turn 6):
- Target keyword: "AI content brief tool"
- 4 ads above the fold, all promoting Surfer/Clearscope/Frase
- Featured snippet: listicle, "8 tools compared"
- PAA Q1: "Is there a free AI content brief tool?"
- PAA Q4: "How is an AI brief different from a regular outline?"
```

This is the single highest-leverage change. Without the scratchpad, the agent hallucinates the SERP. With it, the brief cites what the page actually showed.
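The text editor tool is executed on your side, so the harness has to apply each command to a real file. Below is a minimal sketch of that handler covering only the `create`, `view`, and `str_replace` commands. The function name `handle_editor_command`, the error strings, and the choice to confine writes to a single notes directory are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def handle_editor_command(cmd: dict, root: Path) -> str:
    """Apply one text-editor tool call to a file under `root`; return result text."""
    path = root / Path(cmd["path"]).name  # keep every write inside the notes directory
    if cmd["command"] == "create":
        path.write_text(cmd["file_text"], encoding="utf-8")
        return f"created {path.name}"
    if cmd["command"] == "view":
        return path.read_text(encoding="utf-8")
    if cmd["command"] == "str_replace":
        text = path.read_text(encoding="utf-8")
        if text.count(cmd["old_str"]) != 1:
            return "error: old_str must match exactly once"
        path.write_text(text.replace(cmd["old_str"], cmd["new_str"]), encoding="utf-8")
        return f"updated {path.name}"
    return f"error: unsupported command {cmd['command']}"


# Hypothetical run against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    created = handle_editor_command(
        {"command": "create", "path": "BRIEF.md", "file_text": "- 4 ads above the fold\n"},
        root,
    )
    updated = handle_editor_command(
        {"command": "str_replace", "path": "BRIEF.md", "old_str": "4 ads", "new_str": "3 ads"},
        root,
    )
    notes = handle_editor_command({"command": "view", "path": "BRIEF.md"}, root)
```

The returned string goes back to the model as the text content of a `tool_result`, the same way a screenshot does for the computer tool.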
4. Use a system prompt that constrains the agent to a brief-producing role
This is the prompt I actually run with. It works:
```
You are a senior SEO content strategist. You research search results by browsing
Google in a real browser. Your job is to produce a content brief for a writer.
Rules:
- Always start with a single Google search for the target keyword.
- Read the top 5 organic results. For each, note the URL, word count, H2s,
  and one unique angle.
- Expand at least 6 "People Also Ask" boxes. Quote the question text exactly.
- Note the featured snippet format (list, paragraph, table) and what would beat it.
- After browsing, write a final brief to BRIEF.md with these sections:
  1. Search intent (informational / commercial / transactional)
  2. The 8 questions PAA surfaced, verbatim
  3. The 5 subheadings every top-3 result uses
  4. The angle gap — what no one is covering well
  5. Recommended word count and format
  6. 3 internal link targets
- Do not write the article. Stop after the brief.
- If a CAPTCHA appears, abandon the brief and return "BLOCKED".
```

Two non-obvious moves in there. The "If a CAPTCHA appears" branch saves you from burning $0.50 on a loop that can only run until it hits the turn cap. And "do not write the article" stops the agent from overstepping: it'll happily write a 2,000-word draft if you let it, which defeats the purpose.
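On the harness side, the "BLOCKED" sentinel needs to be told apart from a real brief before anything is sent to a writer. A minimal sketch of that check follows; `BriefOutcome` and `classify_final_text` are hypothetical names, and the rule that only a bare BLOCKED reply counts as the sentinel is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BriefOutcome:
    blocked: bool
    brief: Optional[str]


def classify_final_text(final_text: str) -> BriefOutcome:
    """Treat a bare BLOCKED reply as the CAPTCHA sentinel; anything else is the brief."""
    cleaned = final_text.strip().strip('"')
    if cleaned.upper() == "BLOCKED":
        return BriefOutcome(blocked=True, brief=None)
    return BriefOutcome(blocked=False, brief=final_text.strip())


blocked_outcome = classify_final_text(' "BLOCKED" ')
normal_outcome = classify_final_text("Search intent. Commercial investigation.\n")
```

Matching the whole reply rather than searching for the substring avoids misfiring on a brief that merely mentions the word.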
5. Trigger and capture
End-to-end this is a function call. In production I wrap it in a small Flask app so any team member can drop a keyword into a Slack form and get a brief back in their DMs 4-6 minutes later. We process 8-12 briefs a week that way.
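The "function call" part can be sketched with the standard library alone. The snippet below queues one brief in a background thread and returns a Future; it stands in for the Flask and Slack plumbing, which is not shown. The names `submit_brief` and `executor`, the goal wording, and the single-worker limit (one virtual display, one brief at a time) are assumptions, and the agent is passed in as a callable so the example runs with a stub.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# One worker, on the assumption that a single virtual display handles one brief at a time.
executor = ThreadPoolExecutor(max_workers=1)


def submit_brief(keyword: str, run_agent: Callable[[str], str]) -> Future:
    """Queue one brief in the background and return a Future for its result."""
    keyword = " ".join(keyword.split())  # collapse stray whitespace from form input
    if not keyword:
        raise ValueError("keyword must not be empty")
    goal = f'Produce a content brief for the target keyword: "{keyword}"'
    return executor.submit(run_agent, goal)


# Stand-in agent so the sketch runs without a browser or an API key.
future = submit_brief("  AI content  brief tool ", lambda goal: f"BRIEF for {goal}")
result = future.result(timeout=5)
```

A web handler can call `submit_brief` and return immediately, then deliver `future.result()` wherever the requester expects it.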
What the brief actually looks like
For the keyword "AI content brief tool" the agent produced a brief whose first three sections read:
Search intent. Commercial investigation. The user knows content briefs exist, is comparing tools, and is close to a buying decision. Ads and comparison listicles dominate the page.
PAA questions, verbatim. "What is the best AI tool for content briefs?" / "Is there a free AI content brief tool?" / "How is an AI brief different from a regular outline?" / "Do content writers use briefs?" / "How long should a content brief be?" / "What should a content brief include?" / "Are AI content briefs good for SEO?" / "Can ChatGPT write a content brief?"
Subheadings every top-3 result uses. What is a content brief / Why use one / Key elements / How to write one / Best tools / Free vs paid / Conclusion. The gap: nobody opens with the question "do you actually need one" — that angle is uncontested.
That last sentence is the part I find most useful. I would not have spotted that gap by hand on a Tuesday afternoon. The agent did in 4 minutes.
What to watch out for
- Loop budget. I cap at 50 turns. A clean brief takes 25-35. A run that hits the cap is usually stuck in a loop because Google changed its layout or the agent missed a click.
- Cost. Sonnet 4.5 with screenshots is not cheap. Each screenshot is ~1,300 tokens. A 35-turn brief lands between $0.30 and $0.80 depending on screenshot density. Track it. We burned $14 our first week running it on every keyword we could think of.
- CAPTCHAs and rate limits. About 1-2% of Google queries hit one. The "BLOCKED" branch in the prompt handles it. Don't try to solve CAPTCHAs — you'll get your home IP shadowbanned (flagged for repeated suspicious activity) within an hour.
- Don't scrape your own site. The agent will dutifully read your pages and "include" their weaknesses in the gap analysis. Run briefs against competitors, not yourself.
- Don't trust the word count estimate. It guesses based on a rendered page, which often includes nav, footer, comments. Multiply by 0.7-0.8.
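Two of the points above reduce to arithmetic, sketched here under stated assumptions. `CostTracker` sums the token counts each API response reports in its `usage` field and converts them to dollars; the default per-million-token rates are assumed values that should be checked against current pricing. `adjusted_word_count` applies the 0.7-0.8 correction to a rendered-page word count. Both names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    # Assumed USD rates per million tokens; verify against current pricing.
    input_rate: float = 3.00
    output_rate: float = 15.00
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        """Accumulate the usage reported by one API response."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def dollars(self) -> float:
        return (self.input_tokens * self.input_rate
                + self.output_tokens * self.output_rate) / 1_000_000


def adjusted_word_count(rendered_words: int) -> tuple[int, int]:
    """Apply the 0.7-0.8 correction for nav, footer and comment text."""
    return round(rendered_words * 0.7), round(rendered_words * 0.8)


tracker = CostTracker()
tracker.add(input_tokens=100_000, output_tokens=2_000)  # illustrative numbers only
word_range = adjusted_word_count(2000)
```

In the agent loop, the call would be `tracker.add(response.usage.input_tokens, response.usage.output_tokens)` after each turn, with a hard stop once `tracker.dollars` passes whatever ceiling you set.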
Where this actually earns its keep
For greenfield content — topics where the SERP is fresh and intent is ambiguous — this is the most useful 4 minutes in my week. For an established cluster where I already know the SERP cold, it's overkill. The win is for keywords I haven't looked at in a while, or topics my writers are arguing about.
The pattern generalizes. Anything that requires "look at a live page and reason about its structure" — pricing page teardowns, competitor ad copy mining, App Store reviews for sentiment — is now a 60-line Python script with a system prompt. I have a folder of six of these. The "AI will replace marketers" framing is wrong, but a marketer with a folder of six Claude agents and a $7/mo server is going to outrun a marketer with $300/mo in SaaS subscriptions. That's the actual story.