
Competitor Ad Monitoring with Meta Ad Library + GPT: My Weekly Sweep Pipeline


Eighteen months ago I would spend every Friday morning manually scrolling through competitor pages in Meta Ad Library. Two hours, eight brands, and a spreadsheet that was outdated by the time the coffee was cold. The week I missed a competitor's pivot from "save time" to "look professional" — a shift that explained a 40% drop in our prospecting CPM (cost per mille, or cost per thousand impressions) — was the week I stopped doing it by hand.

The system that replaced that Friday ritual has now run every Monday for 67 consecutive weeks across a portfolio of seven SaaS and D2C clients. It pulls around 4,000 ads per week, clusters them, and delivers a one-page digest before my first standup. Total human time: about 12 minutes, mostly reviewing flagged anomalies.

Here's the full pipeline.

What Meta Ad Library actually gives you

Before touching GPT, it's worth being precise about what the Library is and isn't. The Meta Ad Library (facebook.com/ads/library) is a public disclosure database covering ads run on Facebook, Instagram, Messenger, and Meta's Audience Network, with a few exceptions. Ads that were never delivered, such as ones still sitting in review, don't appear. It is searchable by advertiser name, by keyword, by country, and (since 2023) by topic. Each ad entry shows the creative, the body copy, when it started running, when (if ever) it stopped, the platforms it ran on, and, for some ad categories, a rough geographic and demographic split.

This is a genuinely different data source from anything else in paid media. You cannot buy this. There is no ad-tracking tool that has it. SpyFu, Semrush, Motion, BigSpy: all of these are reverse-engineering the auction by sampling. The Library is the platform's own record of what is running.

Three things it won't do for you: it doesn't show spend, it doesn't show impressions beyond rough "fewer than 1,000 / 1K–5K / 5K–10K / 10K–50K" brackets, and it doesn't show click data. You can see what's running, not what's winning. That's fine for monitoring — the goal is to see what the auction is being told to test, not what it paid for.

Step 1 — The watchlist (5 minutes, one-time)

The first mistake is treating this as a discovery task. It isn't. You already know who you should be watching: your top 5 direct competitors, the 3 to 5 challenger brands that just raised, and the 1 or 2 incumbents in adjacent categories whose positioning you can borrow. That's 9 to 12 brands, tops.

Build the watchlist as a plain Google Sheet with three columns: the brand name, the brand's official Meta Ad Library URL, and a category label. Mine looks like:

Brand | Library URL | Category
CompetitorA | facebook.com/ads/library/.../?active_status=active&view_all_page_id=... | Direct
ChallengerB | facebook.com/ads/library/.../?active_status=active&view_all_page_id=... | Direct
AdjacentC | facebook.com/ads/library/.../?active_status=active&view_all_page_id=... | Adjacent

The URL trick: change active_status=active to active_status=all to include ads the brand has stopped running, which is where the interesting creative lives. A brand killing an angle is just as informative as one launching a new one, and the Ad Library is the only source that shows the killed ads in a structured way.
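If the watchlist grows past a handful of rows, the parameter swap is easy to script. This is a minimal sketch using only Python's standard library; the function name and the page ID in the example are placeholders I made up, not part of any Meta tooling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_active_status(url: str, status: str = "all") -> str:
    """Return the Library URL with its active_status parameter replaced."""
    parts = urlsplit(url)
    # Keep every other parameter (page ID, country, ...) exactly as it was.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "active_status"
    ]
    query.append(("active_status", status))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL: the page ID is invented for illustration.
example = (
    "https://www.facebook.com/ads/library/"
    "?active_status=active&view_all_page_id=1234567890"
)
print(with_active_status(example))
```

Rebuilding the query string from parsed pairs, rather than doing a plain text replace, avoids touching any other parameter that happens to contain the word "active".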

Step 2 — Pull the data (the API vs scraper tradeoff)

You have two routes to actually grab the ads: the Meta Ad Library API, or a scraper (Apify has a maintained one, and there are several open-source Python libraries).

Approach | Pros | Cons
Meta Ad Library API | Official, stable, free, no risk of rate-limit drama | Requires Meta app review (1-2 weeks), 6,000 calls/hour cap, no image download (you get URLs only)
Scraper (Apify / Python) | No approval, grabs images, faster iteration, scrapes historical data the API won't | ToS gray area, breaks when Meta changes the markup, runs the risk of being rate-limited at the IP level

I use both. The API runs the weekly sweep on the watchlist. The scraper (Apify's meta-ad-library-scraper, $4 per 1,000 ads as of late 2025) runs once a month to backfill images and historical context. Mixing them is the dirty secret nobody talks about — the API gets you stability, the scraper gets you completeness.

The minimum viable pipeline pulls three things per ad: a unique ID, the start date, and the body copy. Skip everything else on the first pass. You can add the image hash, the CTA button text, and the platform list later, but those don't earn their slot in the digest yet.
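For the API route, the three-field pull could look roughly like the sketch below. Treat the details as assumptions to check against Meta's current reference: the Graph API version string, the ads_archive parameter names and the field names are written from memory, and the sample record is invented. The sketch only builds the request URL and flattens one returned record; it makes no network call.

```python
import json
from urllib.parse import urlencode

# Assumption: version and field names may differ in the current API reference.
API_URL = "https://graph.facebook.com/v19.0/ads_archive"
FIELDS = "id,ad_delivery_start_time,ad_creative_bodies"


def build_request_url(page_id: str, country: str, token: str) -> str:
    """Build the query for one watchlist brand (no request is sent here)."""
    params = {
        "search_page_ids": json.dumps([page_id]),
        "ad_reached_countries": json.dumps([country]),
        "ad_active_status": "ALL",
        "fields": FIELDS,
        "access_token": token,
    }
    return f"{API_URL}?{urlencode(params)}"


def to_row(ad: dict) -> dict:
    """Reduce one API record to the three fields the first pass needs."""
    bodies = ad.get("ad_creative_bodies") or []
    return {
        "ad_id": ad["id"],
        "start_date": ad.get("ad_delivery_start_time", ""),
        "body": bodies[0].strip() if bodies else "",
    }


# Invented record, shaped the way I expect the API to return it.
sample = {
    "id": "111",
    "ad_delivery_start_time": "2025-01-06",
    "ad_creative_bodies": ["  Stop wasting time on reports.  "],
}
print(to_row(sample))
```

Keeping the row to three keys makes it cheap to add the image hash, CTA text, or platform list later without reshaping what the downstream passes already consume.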

Step 3 — Dedupe and cluster (the GPT layer)

Raw Meta Ad Library output is a mess. The same ad will appear 5 to 15 times because of duplicate ad sets, slight copy edits, and re-uploads. If you cluster the raw output, you'll get 20 clusters of "Stop wasting time on [X]" and miss the actual diversity.

The fix is a two-pass dedupe, both run with GPT-4o-mini (cheap enough to dump the whole batch through, around $0.15 per weekly run for my 4,000 ad sample).
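The per-run cost is easy to sanity-check with back-of-envelope arithmetic. In this sketch the token counts per ad are my guesses, and the prices ($0.15 per million input tokens, $0.60 per million output tokens) are the GPT-4o-mini list prices as I remember them, so check both against your own data and current pricing.

```python
def run_cost(
    n_ads: int,
    in_tokens_per_ad: float,
    out_tokens_per_ad: float,
    usd_per_m_in: float = 0.15,   # assumed list price, input tokens
    usd_per_m_out: float = 0.60,  # assumed list price, output tokens
) -> float:
    """Estimated USD cost of sending n_ads through the model once."""
    input_cost = n_ads * in_tokens_per_ad / 1_000_000 * usd_per_m_in
    output_cost = n_ads * out_tokens_per_ad / 1_000_000 * usd_per_m_out
    return input_cost + output_cost


# Assumed sizes: about 120 tokens of ID plus body copy per ad going in.
pass1 = run_cost(4000, in_tokens_per_ad=120, out_tokens_per_ad=15)
pass2 = run_cost(1000, in_tokens_per_ad=120, out_tokens_per_ad=2)
print(f"weekly estimate: ${pass1 + pass2:.2f}")
```

Under those assumed token counts the estimate comes out at about thirteen cents, the same range as the figure above. Longer body copy or a wordier output format moves it roughly in proportion.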

Pass 1 — Hard dedupe. Send the batch with this prompt:

You are deduplicating Meta ads. For each ad below, output a canonical
ID that represents the underlying ad (ignore minor copy edits, ignore
re-uploads of the same creative). Group by canonical ID. Return
one row per canonical ID with the ad IDs grouped under it.

This drops the count from 4,000 to about 800–1,200 canonical ads in my pipeline. The cost is predictable; the time is 30 seconds.
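One thing worth wrapping around Pass 1 is a check that the model returned every ad ID exactly once, since a dropped or invented ID silently skews the counts downstream. The sketch below assumes the prompt is extended to ask for JSON (my addition, not part of the prompt above) and takes the model call as a plain function argument, so it is not tied to any particular client library. The fake model and ad records are invented for the example.

```python
import json
from typing import Callable

# The first four sentences are the Pass 1 prompt from this post; the final
# JSON instruction is an added assumption so the reply can be parsed.
DEDUPE_PROMPT = (
    "You are deduplicating Meta ads. For each ad below, output a canonical "
    "ID that represents the underlying ad (ignore minor copy edits, ignore "
    "re-uploads of the same creative). Group by canonical ID. Return "
    "one row per canonical ID with the ad IDs grouped under it. "
    'Reply with JSON only, shaped like {"canonical_id": ["ad_id", ...]}.'
)


def dedupe(ads: list[dict], call_model: Callable[[str], str]) -> dict[str, list[str]]:
    """Ask the model for canonical groups and verify no ad ID was lost."""
    # One ad per line: ID, a tab, then the body with whitespace collapsed.
    listing = "\n".join(
        f"{ad['ad_id']}\t{' '.join(ad['body'].split())}" for ad in ads
    )
    groups = json.loads(call_model(f"{DEDUPE_PROMPT}\n\n{listing}"))
    returned = [ad_id for ids in groups.values() for ad_id in ids]
    expected = {ad["ad_id"] for ad in ads}
    if len(returned) != len(set(returned)) or set(returned) != expected:
        raise ValueError("model output dropped, duplicated, or invented ad IDs")
    return groups


def fake_model(prompt: str) -> str:
    """Stand-in for the real API call, returning a fixed grouping."""
    return json.dumps({"c1": ["a1", "a2"], "c2": ["a3"]})


ads = [
    {"ad_id": "a1", "body": "Stop wasting time on reports"},
    {"ad_id": "a2", "body": "Stop wasting time on  reports!"},
    {"ad_id": "a3", "body": "Trusted by 4,000 finance teams"},
]
print(dedupe(ads, fake_model))
```

Passing the model call in as a function also makes the check testable offline, which helps when the scraper or the API changes shape and you need to know which layer broke.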

Pass 2 — Angle cluster. Send the deduplicated set:

You are a competitive-intelligence analyst. Cluster these ads into
4–8 angle groups. An angle is the psychological hook (e.g. "fear of
falling behind peers", "tangible ROI in 30 days", "social proof via
logo wall"). For each cluster:
- Cluster name (2-5 words)
- Count of ads in cluster
- 1-sentence description
- 2-3 example ad IDs

Then list 3 ads that don't fit any cluster (the long tail).

The "long tail" line is the one that matters. The clusters are the digest's body; the long tail is where the next pivot hides. In week 47 of my own sweep, a long-tail ad turned out to be the first signal of a competitor moving into enterprise — they had launched a "trusted by Fortune 500" creative that didn't fit any of their existing clusters. I would have missed it without the long-tail line.

Step 4 — The weekly digest template

The digest lives in a Notion database, with one entry per week and a one-page summary. The structure I settled on after about ten iterations:

WEEK [N] — [date range]
Top-of-mind: [1 sentence — the single most important shift this week]

Cluster shifts (compared to last 4 weeks):
- Cluster "X": 23% of catalog (was 18%) — explanation
- Cluster "Y": 11% (was 19%) — explanation
- Cluster "Z" is new — explanation
- Cluster "W" is gone — explanation

Long tail (3 ads worth a manual look):
- [ad ID] — [why it stood out]
- [ad ID] — [why it stood out]
- [ad ID] — [why it stood out]

Brands on the move (any of the 12 brands with >20% cluster shift):
- [Brand] shifted from cluster X to Y — implication for our positioning

The "top-of-mind" sentence is the one I actually read. The clusters are there to back it up. The long-tail is there for the serendipitous find. The "brands on the move" line is what I act on — a 20% cluster shift on a single brand is a strong signal of repositioning.
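The cluster-shift and brands-on-the-move lines are simple share arithmetic once every canonical ad carries a cluster label. A minimal sketch, under two assumptions of mine: the baseline is passed in as already-averaged shares for the last 4 weeks, and a ">20% cluster shift" means a move of more than 20 percentage points in any one cluster. All names and numbers are illustrative.

```python
from collections import Counter


def cluster_shares(labels: list[str]) -> dict[str, float]:
    """Share of the catalog held by each cluster label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def cluster_shifts(now: dict[str, float], baseline: dict[str, float]) -> list[str]:
    """Render the 'Cluster shifts' lines, one per cluster seen in either period."""
    lines = []
    for name in sorted(set(now) | set(baseline)):
        current, before = now.get(name, 0.0), baseline.get(name, 0.0)
        if before == 0.0:
            lines.append(f'Cluster "{name}" is new')
        elif current == 0.0:
            lines.append(f'Cluster "{name}" is gone')
        else:
            lines.append(f'Cluster "{name}": {current:.0%} of catalog (was {before:.0%})')
    return lines


def brands_on_the_move(now_by_brand: dict, before_by_brand: dict,
                       threshold: float = 0.20) -> list[str]:
    """Brands whose share in any single cluster moved by more than the threshold."""
    moved = []
    for brand, now in now_by_brand.items():
        before = before_by_brand.get(brand, {})
        biggest = max(
            (abs(now.get(c, 0.0) - before.get(c, 0.0)) for c in set(now) | set(before)),
            default=0.0,
        )
        if biggest > threshold:
            moved.append(brand)
    return moved


for line in cluster_shifts({"X": 0.23, "Z": 0.10}, {"X": 0.18, "W": 0.05}):
    print(line)
```

The explanation after each line still has to be written by a person or a second model call; this only produces the numbers the explanation hangs on.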

The whole digest is 1 page, takes about 12 minutes to skim, and goes to my Slack #competitive-intel channel at 7:30am every Monday. Most weeks the answer to "should I do anything?" is no. That's the point. Spotted manually, a shift is usually about 4 weeks old by the time you notice it; in the weeks you do need to act, this digest cuts that lag to 7 days.

Step 5 — The read window: what to look for

After 67 weeks of running this, I look at three things in the digest, in order:

  1. A new cluster name that didn't exist last week. New angle, new positioning, new offer framing. This is the most actionable signal. Two of my clients' biggest creative refreshes in the past year came directly from a new cluster appearing on a competitor's sweep.

  2. A brand's cluster mix flipping. A brand that was 70% problem-aware and 30% social-proof last month, suddenly 30/70. That's a positioning change. Worth a 30-minute read of the new cluster's ads.

  3. A long-tail ad with unusually high body-copy length. Library ads that run >300 characters of body copy are doing one of two things: either explaining something technical (new product, new feature, new market) or trying to out-write a competitor's ad that the brand has clearly seen. Both are signals that something is moving.
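The first and third checks are mechanical enough to pre-flag before the digest is read. A small sketch, assuming each long-tail ad is a dict with hypothetical "ad_id" and "body" keys, and that cluster names are compared after trimming and case-folding. Because the model words the same angle differently from week to week, expect some false "new" flags from an exact name match.

```python
LONG_COPY_CHARS = 300  # threshold taken from the third check above


def new_clusters(this_week: list[str], last_week: list[str]) -> list[str]:
    """Cluster names present this week with no case-insensitive match last week."""
    known = {name.strip().casefold() for name in last_week}
    return [name for name in this_week if name.strip().casefold() not in known]


def long_copy_ads(long_tail: list[dict], min_chars: int = LONG_COPY_CHARS) -> list[str]:
    """IDs of long-tail ads whose body copy runs past the threshold."""
    return [ad["ad_id"] for ad in long_tail if len(ad["body"]) > min_chars]


# Invented data for illustration.
print(new_clusters(["Time savings", "Enterprise trust"], ["time savings ", "Social proof"]))
print(long_copy_ads([{"ad_id": "a9", "body": "x" * 301}, {"ad_id": "a10", "body": "x" * 300}]))
```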

I don't look at small moves in raw counts. The total number of ads a brand runs tells you almost nothing on its own. A competitor cutting ad volume by 40% is a signal; a competitor going from 200 to 220 ads is noise.

Where this pipeline breaks

A few honest failure modes.

The watchlist goes stale. If you don't audit it quarterly, you'll be watching brands that exited your category 18 months ago. I've done this. The fix is a 30-minute quarterly check: is each brand still a real competitor? If not, replace it. Most portfolios end up cycling 1 or 2 brands per quarter.

GPT clusters the same way every week. A subtle failure mode. If you don't change the cluster prompt occasionally, the model will find the same clusters week after week, and the "new cluster" signal will get lost in the noise. I rotate the example cluster names in the prompt every 4 weeks, which forces the model to re-derive the structure.
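The rotation can be tied to the week number so nobody has to remember to do it. A minimal sketch: the first example set is the one from the Pass 2 prompt above, the other two are hypothetical alternates I made up, and the prompt text is abbreviated to the sentence that carries the examples.

```python
EXAMPLE_SETS = [
    # From the Pass 2 prompt in this post.
    ["fear of falling behind peers", "tangible ROI in 30 days", "social proof via logo wall"],
    # Hypothetical alternates, invented for illustration.
    ["status among colleagues", "risk-free trial", "founder story"],
    ["switching pain from a legacy tool", "price anchoring", "expert endorsement"],
]


def examples_for_week(week: int, period: int = 4) -> list[str]:
    """Pick the example set for this week, changing every `period` weeks."""
    return EXAMPLE_SETS[(week // period) % len(EXAMPLE_SETS)]


def angle_prompt(week: int) -> str:
    """Opening of the Pass 2 prompt with this period's examples filled in."""
    examples = ", ".join(f'"{e}"' for e in examples_for_week(week))
    return (
        "You are a competitive-intelligence analyst. Cluster these ads into "
        f"4-8 angle groups. An angle is the psychological hook (e.g. {examples})."
    )


print(angle_prompt(67))
```

Deriving the set from the week number keeps reruns of an old week reproducible: the same week always gets the same prompt.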

You over-index on volume shifts. A brand going from 100 ads to 200 ads doesn't mean they're spending more. Meta Ad Library counts creative variations, and a brand can multiply that count by spinning up hundreds of near-duplicates via Advantage+ (Meta's suite of automated campaign and creative tools) without spending a dollar more. The volume signal is correlated with intent, not spend. Don't confuse the two in front of a client.

The Library misses a category. Politics, housing, employment, credit, and a few other sensitive categories have stricter rules. If your industry touches any of these, the Library will be a partial view. The fix is to add a Google Ads Transparency Center scan to the monthly deep-dive, which covers YouTube and Display where Meta doesn't.

You start trusting the digest too much. It's a starting point for judgment, not a replacement. I caught a brand silently switching agencies last quarter because the style of their new ad creative didn't match their voice, and that was a reading I made from the actual ads, not from the cluster summary. The digest tells you where to look. Looking is still your job.

The reason this pipeline works isn't the GPT or the API or the Library. It's that competitor ad monitoring, done well, is a habit of small, repeated, cheap observations — not a quarterly project. A weekly digest that costs 12 minutes of attention and surfaces one or two real signals per month is a thousand times more useful than an annual report that takes a week to produce. The tool is just what made the habit possible for me. The habit is the thing.