AI Email Deliverability Audit: Diagnose 10,000 Emails in 10 Minutes

The dashboard screamed before I did. Open rate: 14%, down from 38% the week before. Click rate: a near-honest 0.6%. A client had just pushed 22,000 emails through a brand-new sending domain for a Black Friday campaign, and Gmail had quietly moved them from "Inbox" to "Promotions" to "Spam" over the course of 48 hours. The campaign manager wanted answers. The CEO wanted a refund from the ESP (Email Service Provider). I wanted ten minutes and one terminal window.

What followed is the audit I now run on every new sender account and every domain that's ever taken a sudden delivery hit. It's not exhaustive — a full deliverability deep-dive takes a week and a seed list. But it catches 80% of the issues that actually matter, and it runs on free tools plus one well-constructed prompt. Here's the playbook.

What a 10-Minute Audit Can and Cannot Catch

A quick audit is a triage, not a cure. It will reliably catch:

  • Authentication misconfigurations — missing or broken SPF, DKIM, DMARC
  • Blocklist listings — your sending IP or domain is on a public deny list
  • Obvious content spam triggers — broken HTML, all-image emails, spammy subject patterns
  • Sender reputation red flags — bounce rate spikes, complaint rate above 0.1%
  • DNS hygiene — missing reverse DNS, MX misroutes, dangling CNAMEs

It will not reliably catch:

  • Inbox placement (Inbox vs Promotions vs Spam) — that needs a seed list of real test inboxes
  • Engagement quality at the subscriber level — that needs ESP-side analytics
  • Long-term reputation trends — Postmaster Tools needs 30+ days of history
  • Throttling by recipient domains — only visible through inbox placement testing

If a quick audit comes back clean and you're still landing in spam, the problem is in that second bucket. Save the seed-list tools (GlockApps, InboxAlly) for the second pass. The 10-minute version is for finding the dumb stuff fast.

The 6 Signals to Check, in Order

I've been doing this for 15 years. The first ten were spent running these checks by hand, in the wrong order, missing the thing that mattered. AI just lets me parallelize what I already know to look for. The order is the part that took practice to learn.

1. Authentication: SPF, DKIM, DMARC — the Three Records That Have to Be There

Every sending domain needs three DNS records. Skipping any one of them in 2026 is an invitation to the spam folder: Gmail and Yahoo made SPF, DKIM, and DMARC mandatory for bulk senders back in February 2024, and enforcement has only gotten stricter.

The 60-second check (run all three):

dig TXT +short yourdomain.com | grep spf
dig TXT +short selector._domainkey.yourdomain.com   # your actual selector
dig TXT +short _dmarc.yourdomain.com

What you're looking for:

| Record | What it does | Good answer |
| --- | --- | --- |
| SPF (Sender Policy Framework) | Lists the IPs allowed to send for this domain | v=spf1 include:_spf.your-esp.com -all (the -all is critical; ~all is too soft) |
| DKIM (DomainKeys Identified Mail) | Cryptographic signature proving the message wasn't tampered with | v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOC... (a long public key) |
| DMARC (Domain-based Message Authentication, Reporting & Conformance) | Tells recipients what to do when a message fails both SPF and DKIM alignment | v=DMARC1; p=quarantine; rua=mailto:dmarc@yourdomain.com (aim for quarantine, not none) |

If you only have time for one tool, run the domain through MXToolbox's SuperTool. It batches all three lookups and flags misconfigurations in plain English. The free tier handles 100+ lookups a day. I've caught misaligned DKIM selectors and a forgotten -all in SPF more times than I want to admit using just that single page.

Quick test for the AI to do alongside: paste the three record outputs into ChatGPT and ask: "Are these three records complete, correctly aligned for a domain sending through [your ESP name], and would Gmail, Yahoo, and Outlook accept this configuration?" The model can catch things like a malformed SPF record, an include chain that is likely to exceed the 10-DNS-lookup limit, or a DMARC policy of p=none (which is essentially a memo to the recipient, not an instruction).
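The same sanity checks can be scripted before anything goes to a model. The sketch below is a minimal illustration, not a full validator: the function names `check_spf` and `check_dmarc` are made up for this example, it only inspects the record text you paste in, and it counts lookup-triggering terms at the top level only (nested includes would need real DNS resolution).

```python
import re

# Terms that each trigger a DNS lookup when SPF is evaluated (RFC 7208 caps the total at 10).
PREFIX_TERMS = ("include:", "exists:", "redirect=")
BARE_TERMS = re.compile(r"^(a|mx|ptr)([:/].*)?$")


def check_spf(record: str) -> list[str]:
    """Return plain-English warnings for one SPF TXT record (hypothetical helper)."""
    terms = record.strip().strip('"').split()
    if not terms or terms[0].lower() != "v=spf1":
        return ["record does not start with v=spf1"]
    warnings = []
    lookups = 0
    for term in terms[1:]:
        bare = term.lstrip("+-~?").lower()  # drop the qualifier before matching
        if bare.startswith(PREFIX_TERMS) or BARE_TERMS.match(bare):
            lookups += 1
    if lookups > 10:
        warnings.append(f"{lookups} lookup terms at the top level; the limit is 10")
    last = terms[-1].lower()
    if last == "~all":
        warnings.append("ends in ~all (softfail); this guide's target is -all")
    elif last in ("+all", "all", "?all"):
        warnings.append(f"ends in {last}, which does not reject unlisted senders")
    elif last != "-all" and not any(t.lower().startswith("redirect=") for t in terms):
        warnings.append("no terminal all mechanism")
    return warnings


def check_dmarc(record: str) -> list[str]:
    """Return plain-English warnings for one DMARC TXT record (hypothetical helper)."""
    tags = {}
    for part in record.strip().strip('"').split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        return ["record does not start with v=DMARC1"]
    warnings = []
    policy = tags.get("p", "").lower()
    if policy == "none":
        warnings.append("p=none only monitors; failing mail is still delivered")
    elif policy not in ("quarantine", "reject"):
        warnings.append("missing or invalid p= tag")
    if "rua" not in tags:
        warnings.append("no rua= address, so no aggregate reports will arrive")
    return warnings
```

An empty list means the record passed these few checks, nothing more. DKIM is left out on purpose, since verifying a key means checking it against a signed message rather than reading the record.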

2. Blocklist Status: Are You Already on a Public Deny List?

This is the one that surprises people most. Your open rate drops 30%, you assume it's a content issue, and meanwhile your sending IP has been listed on Spamhaus since Tuesday. The first hour of "fixing the email" gets wasted on subject line tweaks.

The 30-second check:

  • MXToolbox Blacklist Check — paste your sending IP, get checked against 50+ blocklists at once
  • Spamhaus Blocklist Remover — if you're listed, the removal request is on the same page
  • MultiRBL.valli.org — alternative cross-check, sometimes catches listings Spamhaus misses

The IPs to check are the ones your ESP sends from. If you're on a shared IP (most ESPs default to this), the listing might not be your fault — but it still affects your delivery. Shared IPs with a reputation problem are a strong argument for moving to a dedicated IP at the next contract renewal.
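Under the hood, the web checkers query DNS-based blocklists: the IP's octets are reversed, the list's zone is appended, and an answer means "listed." A minimal sketch of that mechanism follows. The function names are illustrative, it handles IPv4 only, and `zen.spamhaus.org` is used purely as an example zone.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the list's zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not IPv4
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the IP is not on this list (or the lookup failed).
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; substitute your ESP's sending IP.
    print(dnsbl_query_name("192.0.2.10", "zen.spamhaus.org"))
```

Treat a `True` from `is_listed` as a prompt to confirm on the list's own lookup page. Some operators answer with special codes when queries arrive through large public resolvers or exceed rate limits, and this sketch does not distinguish those from real listings.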

3. Bounce Code Breakdown: Hard, Soft, and the Ones That Lie

Bounce codes are how mail servers tell you what went wrong. Most teams ignore them. Big mistake. A spike in a specific bounce code often points at the real problem — and the codes aren't standardized across mailbox providers, so you need an interpreter.

The 3-minute check:

Pull the last 10,000 sends' bounce logs from your ESP. Group by bounce code. The breakdown you're looking for:

  • Hard bounces (5.x.x codes) — recipient address is permanently dead. These should be suppressed immediately, not "managed." Anything over 2% hard bounce rate on a send is a problem.
  • Soft bounces (4.x.x codes) — temporary failure (mailbox full, server down, greylisting). A small percentage is normal; a sudden spike means the recipient domain is throttling you.
  • Block codes — the receiving server explicitly rejected you. Common patterns: 550 5.7.1 (reputation), 554 5.7.1 (content), 521 (IP block). The numbers tell you which system flagged you.
  • Spam complaint codes (feedback-loop, abuse-report) — recipient hit "Mark as Spam." This is the most damaging number in the whole report. Above 0.1% and you're in Postmaster Tools danger territory; above 0.3% and Gmail will start bulk-filtering you.

The AI shortcut: paste the top 20 bounce codes (with counts) into ChatGPT with this prompt:

You are a senior email deliverability analyst. Here is a bounce code summary from a 10,000-send email campaign: [paste data]. Categorize each code (hard/soft/block/complaint), explain what it means, and rank the top 3 issues by impact on sender reputation. Recommend a specific fix for each.

The model doesn't know your sender's exact history, but it knows the SMTP code taxonomy cold. In 30 seconds it produces a triage report that would take a junior marketer half a day.
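The grouping step that produces the "top 20 with counts" can be sketched in a few lines. This assumes you can export one raw SMTP response per bounce from your ESP; the function names are hypothetical. Treating 5.7.x as a block follows the enhanced status code convention where the middle digit 7 means security or policy, and it is a simplification of how real providers word their rejections.

```python
import re
from collections import Counter

ENHANCED = re.compile(r"\b([245])\.(\d{1,3})\.(\d{1,3})\b")  # e.g. 5.1.1
REPLY = re.compile(r"^\s*([245])\d{2}\b")                    # e.g. 550


def classify(response: str) -> tuple[str, str]:
    """Return (code, category) for one SMTP response line."""
    match = ENHANCED.search(response)
    if match:
        code, klass, subject = match.group(0), match.group(1), match.group(2)
    else:
        reply = REPLY.match(response)
        if not reply:
            return ("unknown", "unknown")
        # Without an enhanced code, the 4xx/5xx class is all we can tell.
        code, klass, subject = reply.group(0).strip(), reply.group(1), ""
    if klass == "4":
        return (code, "soft")
    if klass == "5":
        return (code, "block" if subject == "7" else "hard")
    return (code, "delivered")  # 2.x.x is a success code


def summarize(responses: list[str], top: int = 20) -> list[tuple[str, str, int]]:
    """Count bounces by (code, category), most frequent first."""
    counts = Counter(classify(r) for r in responses)
    return [(code, category, n) for (code, category), n in counts.most_common(top)]


if __name__ == "__main__":
    sample = ["550 5.1.1 User unknown"] * 3 + ["451 4.7.1 Try again later"] * 2
    for code, category, n in summarize(sample):
        print(f"{code}\t{category}\t{n}")
```

The tab-separated output is what goes into the prompt. Complaint data usually arrives through feedback loops rather than bounce responses, so it needs to be pulled separately.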

4. Content Signals: The Spam Filter's View of Your Email

Spam filters score content. They look at specific patterns: image-to-text ratio, link count, HTML structure, subject line phrasing. Most of these are well-known. The challenge is that what triggers a filter changes as spammers evolve and as filters retrain.

The 5-minute check:

Send the email to Mail-Tester (mail-tester.com) — a free service that scores your email out of 10. Anything 8+ is in good shape; below 6 means at least one major issue.

For a more diagnostic pass, also run the email body through this ChatGPT prompt:

You are an email deliverability analyst. Here is the HTML source of an outbound email: [paste source]. Score it 1–10 on these dimensions: (1) image-to-text ratio, (2) link count and destination variety, (3) HTML cleanliness (no broken tags, no Microsoft Word output, no all-image design), (4) subject line spam triggers (urgency words, ALL CAPS, excessive punctuation, emoji), (5) presence of a plain-text alternative, (6) unsubscribe header present. Flag any score under 7 with a specific fix.

Specific things to look for in your own audit:

  • Single-image emails — entire body is a 600KB PNG with no text fallback. Almost guaranteed to land in spam now.
  • URL shorteners — bit.ly, t.co inside marketing email. Spam filters assume you're hiding the destination.
  • "Click here to view" + image-only design — the worst combination. Looks great in preview, gets filtered on open.
  • Subject lines with three or more spam words — "FREE!!!", "ACT NOW", "Limited time" (yes, "limited time" is now a soft trigger). One is forgivable; three is a pattern.
  • Missing List-Unsubscribe header — Gmail requires it for bulk senders. Most ESPs add it automatically, but worth confirming in the raw header.
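Several items on that list are mechanical enough to check from the raw message source. The following is a rough sketch using only Python's standard library. `audit_message` and `BodyStats` are names invented for this example, the shortener set contains just the two hosts named above, and "image-only" is approximated as "has images and no visible text at all."

```python
from email import message_from_string
from html.parser import HTMLParser
from urllib.parse import urlparse

SHORTENERS = {"bit.ly", "t.co"}  # the two named above; extend as needed


class BodyStats(HTMLParser):
    """Collect image count, link targets, and visible text length from HTML."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.links = []
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            self.images += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def audit_message(raw: str) -> list[str]:
    """Return a list of findings for one raw email (headers plus body)."""
    msg = message_from_string(raw)
    findings = []
    if msg["List-Unsubscribe"] is None:
        findings.append("missing List-Unsubscribe header")
    html, has_plain = "", False
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain":
            has_plain = True
        elif ctype == "text/html":
            payload = part.get_payload(decode=True)
            html = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
    if not has_plain:
        findings.append("no plain-text alternative")
    stats = BodyStats()
    stats.feed(html)
    stats.close()
    if stats.images and stats.text_chars == 0:
        findings.append("image-only body")
    for href in stats.links:
        host = (urlparse(href).hostname or "").lower()
        if host in SHORTENERS:
            findings.append(f"URL shortener link: {host}")
    return findings
```

Feed it the "show original" source of a test send. It does not score subject lines or weigh image-to-text ratios, which is where the prompt above still earns its place.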

5. Sender Reputation: What Gmail and Microsoft Actually Think of You

Gmail runs a domain and IP reputation score for every sender that registers with Postmaster Tools. Microsoft has its own version in SNDS (Smart Network Data Services). Both are free. Both tell you the truth.

The 2-minute check (both, ideally):

  • Gmail Postmaster Tools — register your domain, verify, then check: domain reputation (High/Medium/Low/Bad), IP reputation, spam rate, authentication pass rate, feedback loop complaints. The dashboard lags 24-48 hours, so the numbers you see this morning are from two days ago. That's still the most honest signal you can get.
  • Microsoft SNDS — register your sending IP, get daily data on filtered message rate, complaint rate, and trap hits.
  • Sender Score (senderscore.org) — Validity's free reputation score, 0-100. Below 80 is a warning sign; below 70 means mailbox providers have already throttled you.

If your Gmail domain reputation is "Low" or "Bad," you have a reputation problem. No content tweak will save you. The fix takes weeks of clean sending to a highly engaged segment, suppression of inactive subscribers, and possibly a move to fresh sending IPs. Knowing this in the first 10 minutes of the audit saves you from a week of fruitless subject-line optimization.

6. Engagement Quality: Who's Actually Opening (and What to Do About It)

The first five signals are about you — your config, your content, your reputation. This last one is about them — your subscribers. And honestly, it's the one that decides long-term deliverability.

The 2-minute check:

Pull from your ESP: what percentage of this list has opened or clicked in the last 90 days? If the number is below 20%, the list is dead. Sending to it is the single biggest thing dragging down your reputation, and no amount of authentication work will compensate for sustained low engagement.

The standard remediation:

  1. Sunset flow — anyone who hasn't opened in 90 days gets a 3-email re-engagement series with progressively more aggressive "do you still want these?" framing. Final email is the breakup: "Hit reply if you want to stay; otherwise we'll stop sending." Anyone who doesn't respond gets suppressed.
  2. Engagement-based segmentation — high-openers go to your main promotional list. Medium-openers get less frequent sends. Non-openers get the sunset or nothing at all.
  3. Re-permission for borderline users — 90-180 days inactive is the gray zone. A one-time "we're cleaning the list, want to stay?" re-confirmation campaign rebuilds a tighter, more engaged base.

A 30,000-subscriber list with 8,000 active openers will outperform a 100,000-subscriber list with 5,000 active openers on every metric that matters. The size of the list is vanity. The engagement of the list is money.
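If your ESP can export a last-engaged date per subscriber, the 90-day check and the segmentation are a few lines of date arithmetic. The sketch below is one reading of the rules above, with assumed bucket names: engaged within 90 days is "active," 90 to 180 days is the re-permission "gray_zone," and anything older (or never engaged) goes to "sunset."

```python
from datetime import date
from typing import Optional


def segment(last_engaged: dict[str, Optional[date]], today: date) -> dict[str, list[str]]:
    """Bucket subscribers by days since their last open or click."""
    buckets = {"active": [], "gray_zone": [], "sunset": []}
    for email, last in last_engaged.items():
        days = None if last is None else (today - last).days
        if days is not None and days <= 90:
            buckets["active"].append(email)
        elif days is not None and days <= 180:
            buckets["gray_zone"].append(email)
        else:
            buckets["sunset"].append(email)  # includes subscribers who never engaged
    return buckets


def engaged_share(buckets: dict[str, list[str]]) -> float:
    """Fraction of the list that engaged in the last 90 days."""
    total = sum(len(members) for members in buckets.values())
    return len(buckets["active"]) / total if total else 0.0


if __name__ == "__main__":
    sample = {
        "a@example.com": date(2025, 12, 1),
        "b@example.com": date(2025, 8, 15),
        "c@example.com": None,
    }
    result = segment(sample, today=date(2026, 1, 1))
    print(result, f"{engaged_share(result):.0%}")
```

An `engaged_share` below 0.20 is the "dead list" signal described above.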

The Full 10-Minute Playbook

Here's the actual sequence, optimized to be runnable by one person in one sitting:

| Minute | Action | Tool |
| --- | --- | --- |
| 0:00–1:30 | Pull DNS records, validate SPF/DKIM/DMARC | dig + MXToolbox SuperTool |
| 1:30–2:30 | Check sending IP against 50+ blocklists | MXToolbox Blacklist Check + MultiRBL |
| 2:30–5:00 | Pull last 10K bounce log, group by code, paste top 20 into ChatGPT for triage | ESP dashboard + ChatGPT |
| 5:00–6:30 | Send a test email to Mail-Tester, capture score | mail-tester.com |
| 6:30–8:00 | Run the email body through the content-analysis ChatGPT prompt | ChatGPT |
| 8:00–9:00 | Pull Gmail Postmaster Tools reputation + Microsoft SNDS data | Postmaster Tools + SNDS |
| 9:00–10:00 | Pull 90-day engagement from ESP, identify sunset candidates | ESP dashboard |

If everything comes back clean, you have a 10-minute document showing the client (or yourself) that the config is correct and the issue is elsewhere. If something fails, you have a ranked list of fixes, prioritized by impact on inbox placement.

The escalation order when something is wrong:

  1. Blocklist listing — get removed, then continue
  2. Missing/broken auth records — fix DNS, wait 1-24h for propagation, retest
  3. Hard bounce rate > 2% — scrub the list, re-suppress dead addresses, resend
  4. Spam complaint rate > 0.1% — pause sends, run a list-cleaning campaign, restart with the sunset flow in place
  5. Domain reputation "Low" or "Bad" — the hard case. Start the 30-day reputation rebuild protocol (engaged segment only, no cold sends, monitor daily)
  6. Mail-Tester score < 6 — content rewrite; usually a single offender (one bad URL shortener, one all-image layout)
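The escalation order above is simple enough to encode, which helps when the audit results are collected by different people. This is a hypothetical sketch: the `AuditResult` fields and action strings are invented for illustration, and the thresholds are the ones stated in this playbook, with rates expressed as fractions.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    blocklisted: bool
    auth_ok: bool              # SPF, DKIM, and DMARC all present and valid
    hard_bounce_rate: float    # fraction, e.g. 0.02 for 2%
    complaint_rate: float      # fraction, e.g. 0.001 for 0.1%
    domain_reputation: str     # "High", "Medium", "Low", or "Bad"
    mail_tester_score: float   # 0 to 10


def escalation_plan(result: AuditResult) -> list[str]:
    """Return the fixes that apply, in the escalation order listed above."""
    steps = []
    if result.blocklisted:
        steps.append("request blocklist removal")
    if not result.auth_ok:
        steps.append("fix SPF/DKIM/DMARC records and retest")
    if result.hard_bounce_rate > 0.02:
        steps.append("scrub the list and suppress hard bounces")
    if result.complaint_rate > 0.001:
        steps.append("pause sends and run a list-cleaning campaign")
    if result.domain_reputation in ("Low", "Bad"):
        steps.append("start the reputation rebuild on the engaged segment")
    if result.mail_tester_score < 6:
        steps.append("rewrite the content")
    return steps
```

An empty plan corresponds to the "everything comes back clean" case, where the issue is elsewhere.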

What to Watch For When You Run This Yourself

Three traps first-time auditors fall into:

Trap 1: Confusing inbox placement with delivery rate. A 99% delivery rate sounds great. But 60% of those delivered emails could be in spam folders where they'll never be seen. Delivery rate is "did the receiving server accept the message?" Inbox placement is "did it land where the user would actually see it?" They're different numbers, and only the second one matters for ROI. If your open rate is far below industry benchmarks despite a high delivery rate, inbox placement is the issue.
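To make the gap concrete, here is the arithmetic for the scenario in Trap 1, as a small sketch. The function name and the 10,000-send figure are illustrative, and the spam share is an input you would only know from seed-list testing.

```python
def placement_funnel(sent: int, delivery_rate: float, spam_share: float) -> tuple[int, int, float]:
    """Return (delivered, in_inbox, inbox_share_of_sent) for one campaign."""
    delivered = round(sent * delivery_rate)
    in_spam = round(delivered * spam_share)
    in_inbox = delivered - in_spam
    return delivered, in_inbox, in_inbox / sent


if __name__ == "__main__":
    delivered, in_inbox, share = placement_funnel(10_000, 0.99, 0.60)
    print(f"delivered={delivered} inbox={in_inbox} visible={share:.1%}")
```

With 99% delivery and 60% of delivered mail in spam, 9,900 messages are accepted but only 3,960 land where anyone will see them, which is under 40% of the send.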

Trap 2: Trusting a "p=none" DMARC policy. p=none means "do nothing if authentication fails, just tell me about it in the reports." That's a monitoring setting, not a security setting. Gmail and Yahoo accepted it for the 2024 deadline, but p=quarantine is the floor for real protection. Move from none to quarantine as soon as you've confirmed your auth is clean (run the audit first — if SPF or DKIM is broken, switching to quarantine will start quarantining your own legitimate email).

Trap 3: Fixing the audit's symptoms, not its causes. A "fix the SPF record" recommendation is rarely the right answer on its own. The question is why was the SPF record broken — usually, it's that the team added a new sending service six months ago and never updated the record. The audit is a snapshot; the cause is a process gap. Document the fix in a place the next person on the team will see (Confluence, Notion, a runbook). Otherwise you'll be running the same audit in eight months when someone else adds another sending service.

The Frame That Stays Useful

The opening client — the one whose Black Friday campaign went sideways — got the full audit in 11 minutes. Three findings: their DKIM selector had been silently mis-rotated when they switched ESPs the previous month (so Gmail saw 40% of their mail as unsigned); their sending IP shared a /24 with three other customers, one of whom was on a Spamhaus list; and their 90-day engagement was 9%, which meant even clean authentication was fighting an uphill battle.

We fixed the DKIM in 20 minutes, requested delisting on the IP range, and ran a sunset flow that suppressed 67% of the list over the next two weeks. The next campaign — two weeks later, with the cleaned list and the rebuilt auth — went back to a 31% open rate.

The lesson wasn't that AI replaced my judgment. AI did the parallel work — the DNS lookups, the code categorization, the content scoring — at a speed that let me spend my time on the judgment part: which fixes mattered, in what order, and when to push back on a client who wanted to "just resend to everyone next Tuesday." Ten minutes of tooling plus ten minutes of prioritization usually beats ten hours of either one alone.