Generate 100 paid-social ad variants per hour on a local M-series Mac via Ollama + Llama 3.3 70B, then run a second pass that scores each variant 0-100 on hook, value clarity, and CTA specificity. The full prompt pair, the Python wrapper, and the math that makes $0 marginal cost win over GPT-4o at any team size.
A 6-week, 8,400-email field test of running a privacy-respecting local-LLM email triage layer on Apple Silicon — Apple Mail → AppleScript → Ollama-served Mistral 7B or Llama 3.1 8B — with a 4-bucket rubric, real hardware benchmarks, and a 92% accuracy figure you can reproduce.
A 30-day blind test: 800 meta titles + 800 product descriptions rewritten by both Qwen 2.5 14B (self-hosted on a refurbished workstation) and Claude Sonnet 4, rated by 3 SEO contractors. The result is not a clean win for open weights — it's a split. Where Qwen breaks even, where it collapses, and the actual cost math behind self-hosting for repeat-pattern SEO work.
Self-hosting a 70B model sounds reckless for a marketing team. For 90% of teams it is. But there are 4 specific jobs (bulk ticket classification, private competitive intel, overnight SEO meta-generation, PII-redacted list cleaning) where the math flips and a single A100 + Ollama pays for itself in 4-7 months. Hardware reality, Docker Compose, real throughput, and the 4 prompts.
I ran the same ad-copy brief through self-hosted Mistral Small 24B and GPT-4o, blind-rated by a marketer who'd never seen either output. Here's the full setup — Ollama for laptops, vLLM for a single 4090 server, the prompt template I use, and the per-token cost math that decided which one I kept on the production account.
A practical guide to setting up SmolLM 1.7B on your laptop with Ollama and using it to rewrite marketing content — zero API costs, full privacy, and surprisingly good quality for everyday copy work.