
E-commerce Product Descriptions at Scale with Hypotenuse AI (and Avoid the Amazon Suspension Trap)


It was a Tuesday in October, and a friend in the e-commerce business showed me his "automated" product launch. 600 SKUs (Stock Keeping Units) added in a single weekend. He was proud. Three weeks later, Amazon pulled 240 of them for "duplicate content," and the account health dashboard went from green to red in one email.

The tool he used was the right tool. The workflow was the wrong workflow. And the failure had nothing to do with the AI and everything to do with what he fed it.

This is the post I wish I could have handed him that morning: a working pipeline for generating 500+ unique, brand-consistent product descriptions in a single batch with Hypotenuse AI (an AI content platform built specifically for e-commerce). The data input, the dedup check, and the platform-compliance rules are baked in so you don't end up like he did.

Why Scaling Product Descriptions Is Harder Than It Looks

Every e-commerce owner eventually hits the same wall: the catalog grows faster than the copy can keep up. Ten products, you write by hand. A hundred, you hire a freelancer. A thousand, you need a system. Five thousand, you need AI.

But AI at scale for e-commerce is not the same as AI for a blog. A blog post is judged by the reader. A product description faces three judges at once: the platform's style guide, the platform's duplicate-detection algorithm, and the human customer. Miss any one of them and you get demotion, suppression, or, in the worst case, an account suspension.

That last one is what hit my friend. The descriptions were not bad. They were too similar. Amazon's ASIN (Amazon Standard Identification Number — Amazon's unique product ID) creation system flags near-duplicate content across listings, and a batch with the same template fingerprint is an easy target.

The Pipeline at a Glance

Here is the workflow. Five steps, in order. Skip one and the whole thing falls apart.

  1. Build a structured data input (Google Sheets or CSV) — one row per product
  2. Train a brand voice in Hypotenuse (once per brand, not per product)
  3. Run the batch in Hypotenuse's "Product Descriptions" bulk mode
  4. Cosine-similarity check the output for near-duplicates
  5. Apply platform-specific compliance filters before upload

I'll walk through each one.

Step 1: The Data Input Template (The Centerpiece)

Most failures start here, and most templates I see online are wrong. The instinct is to feed the AI a product name and a one-line description. That is not enough.

The minimum fields your spreadsheet needs:

  • sku: join key for everything downstream. Example: SHOE-RUN-001
  • product_name: subject of the description. Example: "Men's Trail Running Shoe"
  • category: forces the model to use the right vocabulary. Example: Footwear > Running
  • target_audience: determines tone, pain points, and vocabulary. Example: "Trail runners, 30-50, mid-pack pace"
  • key_features: the 3-5 attributes the description MUST include. Example: "Vibram sole, 8mm drop, 280g"
  • banned_claims: things the description must NOT say (platform-specific). Example: "best, #1, guaranteed, medical cure"
  • tone: maps to your brand voice preset. Example: "Confident, technical, no fluff"
  • word_count: hard cap for that SKU (Amazon bullets cap at ~500 characters). Example: 80
  • seo_keywords: comma-separated, optional. Example: "trail running shoe, vibram, lightweight"
  • brand: lets you segment output by brand. Example: "AeroStride"

A row with nine filled columns beats a row with two. The model does the heavy lifting, but only if you give it raw material. I learned this the hard way with a client who had 1,200 outdoor SKUs: when we upgraded from a 3-column sheet to the template above, the share of descriptions that needed human editing dropped from 38% to 11%.
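Since most failures start with the input, a quick validation pass before upload can catch thin rows. This is a minimal sketch that assumes the column names from the template above and a CSV export. The function name `validate_rows` and the sample rows are hypothetical. The 3-5 feature check and the numeric `word_count` check simply encode the template's own guidance.

```python
import csv
import io

# Columns from the template; seo_keywords is optional, so it is not required here
REQUIRED = ["sku", "product_name", "category", "target_audience",
            "key_features", "banned_claims", "tone", "word_count", "brand"]

def validate_rows(rows):
    """Return (sku, problem) tuples for rows that are not ready for a bulk run."""
    problems = []
    for row in rows:
        sku = row.get("sku") or "<missing sku>"
        # Every required field must be present and non-blank
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append((sku, f"empty {col}"))
        # key_features should hold 3-5 comma-separated attributes
        features = [f for f in (row.get("key_features") or "").split(",") if f.strip()]
        if not 3 <= len(features) <= 5:
            problems.append((sku, f"{len(features)} key_features (expected 3-5)"))
        # word_count must be a positive whole number
        wc = (row.get("word_count") or "").strip()
        if not (wc.isdigit() and int(wc) > 0):
            problems.append((sku, f"bad word_count {wc!r}"))
    return problems

if __name__ == "__main__":
    sample = io.StringIO(
        "sku,product_name,category,target_audience,key_features,banned_claims,"
        "tone,word_count,seo_keywords,brand\n"
        'SHOE-RUN-001,Men\'s Trail Running Shoe,Footwear > Running,"Trail runners, 30-50",'
        '"Vibram sole, 8mm drop, 280g","best, #1",Confident,80,,AeroStride\n'
        "SHOE-RUN-002,Women's Road Shoe,,Road runners,Foam midsole,,Confident,eighty,,AeroStride\n"
    )
    for sku, problem in validate_rows(csv.DictReader(sample)):
        print(sku, "->", problem)
```

For a real export, you would open the file with `open("products.csv", newline="")` and pass `csv.DictReader(f)` to the function. The filename `products.csv` is a placeholder.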

Step 2: Train Your Brand Voice (Once)

Hypotenuse has a "Brand Voice" feature where you feed it 5-10 of your best existing descriptions, and it extracts a style fingerprint. Do this once per brand, save it, then select it at the top of the bulk job.

This is the step that gives you consistency. Without it, every batch sounds like it was written by a different freelancer — because internally, it kind of was. With it, your 500 descriptions read like they came from one copywriter who understood your brand.

Pro tip: don't use your homepage hero copy as training data. Use the mid-funnel descriptions — the ones that are doing the actual selling. That's where the real voice is.

Step 3: The Hypotenuse Bulk Run

In Hypotenuse, go to Product Descriptions → Bulk → Upload CSV. Map each spreadsheet column to the corresponding field. Two important settings:

  • Temperature: 0.7. Temperature controls how creative or random the AI's output is; lower is more predictable, higher is more varied. Not 0.3 (too repetitive, the exact reason my friend got suspended), not 1.0 (too creative, starts inventing features). 0.7 is the sweet spot for "consistently different."
  • Per-row override: respect the word_count column. If a row says 80 words, the output should be 80 words, not 150.
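The trade-off behind that temperature setting can be illustrated with the temperature-scaled softmax that sampling-based language models commonly use. This is a generic illustration with made-up scores, not a description of Hypotenuse's internals, which aren't documented here. Dividing the raw scores by a lower temperature concentrates probability on the top choice. A higher temperature spreads it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next words
logits = [2.0, 1.0, 0.5]
for t in (0.3, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.3 the top candidate takes almost all the probability, so rows drift toward the same phrasing. At 1.0 the long tail gets sampled more often.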

Run it. For 500 SKUs, expect 5-15 minutes. The output downloads as a CSV with a new description column. Don't open the file yet — go to Step 4.
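A scripted length check can run alongside Step 4 without anyone reading the file by eye. The check confirms that the per-row `word_count` caps actually held. This minimal sketch assumes the downloaded CSV keeps the original `sku` and `word_count` columns next to the new `description` column. The function name `over_word_cap` and the 10% tolerance are illustrative choices, not Hypotenuse settings.

```python
import pandas as pd

def over_word_cap(df, tolerance=0.10):
    """Return rows whose description exceeds word_count by more than `tolerance`.

    Assumes df holds the sku, word_count, and description columns.
    """
    # Whitespace-separated word count per description (missing text counts as 0)
    words = df["description"].fillna("").map(lambda text: len(str(text).split()))
    # Allow a small overshoot; 10% is an arbitrary illustrative margin
    over = words > df["word_count"].astype(float) * (1 + tolerance)
    result = df.loc[over, ["sku", "word_count"]].copy()
    result["actual_words"] = words[over]
    return result

if __name__ == "__main__":
    out = pd.DataFrame({
        "sku": ["SHOE-RUN-001", "SHOE-RUN-002"],
        "word_count": [5, 5],
        "description": [
            "Grippy Vibram sole, rocky trails.",
            "A very long description that keeps going well past the cap.",
        ],
    })
    print(over_word_cap(out))
```

On a real batch, the call would be `over_word_cap(pd.read_csv("output.csv"))`.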

Step 4: Cosine-Similarity Dedup Detection

This is the step my friend skipped. It's also the step that most directly protects you from an Amazon duplicate-content suspension.

Cosine similarity measures how similar two pieces of text are. With word-weight vectors like the TF-IDF vectors used in the check below, it is a number between 0 (no words in common) and 1 (effectively identical wording). For e-commerce descriptions, the danger zone is anything above 0.85.
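Concretely, each description becomes a vector of word weights, a and b. The score is the cosine of the angle between the two vectors: their dot product divided by the product of their lengths. This is the standard definition, and it is the quantity scikit-learn's `cosine_similarity` computes:

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{k} a_k b_k}{\sqrt{\sum_{k} a_k^2} \, \sqrt{\sum_{k} b_k^2}}
```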

The check (Python with scikit-learn, or in a Google Sheet with a simple cosine formula):

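```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

THRESHOLD = 0.85  # pairs above this are in the danger zone

df = pd.read_csv("output.csv")
# Empty descriptions would load as NaN, which TfidfVectorizer rejects
texts = df["description"].fillna("").astype(str)

# Unigrams + bigrams catch shared phrasing, not just shared words
vectors = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
sim_matrix = cosine_similarity(vectors)

# Compare each pair once (j > i) and flag anything above the threshold
for i in range(len(df)):
    for j in range(i + 1, len(df)):
        if sim_matrix[i, j] > THRESHOLD:
            print(f"DUPE: {df.iloc[i]['sku']} <-> {df.iloc[j]['sku']} = {sim_matrix[i, j]:.2f}")
```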

For each flagged pair, I regenerate the lower-priority one (the one with fewer key features) with a tone override that nudges it further from its neighbor. Two passes usually bring the whole batch under 0.80.
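The dedup loop above produces a list of flagged SKU pairs. From there, the pair-resolution rule (regenerate the member with fewer key features) is easy to automate. This is a minimal sketch. The function name, the input shapes, and the tie-breaking choice (ties go to the second SKU) are assumptions for illustration.

```python
def pick_regeneration_targets(pairs, key_features):
    """For each near-duplicate pair, choose the SKU to regenerate.

    pairs: iterable of (sku_a, sku_b) tuples flagged above the threshold.
    key_features: dict mapping sku -> comma-separated key_features string.
    The SKU with fewer key features is treated as lower priority; ties go to sku_b.
    """
    def n_features(sku):
        return len([f for f in key_features.get(sku, "").split(",") if f.strip()])

    targets = set()  # a SKU flagged in several pairs is regenerated only once
    for a, b in pairs:
        targets.add(a if n_features(a) < n_features(b) else b)
    return sorted(targets)

if __name__ == "__main__":
    flagged = [("SHOE-RUN-001", "SHOE-RUN-002"), ("SHOE-RUN-002", "SHOE-RUN-003")]
    features = {
        "SHOE-RUN-001": "Vibram sole, 8mm drop, 280g",
        "SHOE-RUN-002": "Vibram sole, 8mm drop",
        "SHOE-RUN-003": "Foam midsole, 6mm drop, 250g, wide toe box",
    }
    print(pick_regeneration_targets(flagged, features))  # ['SHOE-RUN-002']
```

After regenerating the returned SKUs, rerun the similarity check on the full batch; that is the second pass.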

If you don't code, Hypotenuse has a "Similarity Checker" add-on. It costs extra. It's worth it.

Step 5: Platform Compliance Filters

This is where most AI-generated descriptions die. Each platform has its own rules, and they don't overlap nicely.

Amazon:

  • Product description: max 2,000 characters, plain text only (no HTML)
  • Bullet points: up to 5, each max 500 characters
  • No promotional language: "best seller," "#1," "sale," "limited time"
  • No URLs, no email addresses, no phone numbers
  • No claims you can't substantiate (medical, FDA, eco-certifications)
  • Attributes in bullets must match the actual product category
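A post-processing check for the mechanical Amazon rules in this list might look like the sketch below. The regular expressions are deliberately rough approximations. For example, the phone pattern only catches North American-style numbers such as 555-123-4567. The function name is hypothetical, and the check does not replace Amazon's own category-specific rules. Unsubstantiated claims and category-mismatched attributes still need a human reviewer.

```python
import re

# Rough approximations of the rules above, not Amazon's own validation
PROMO = re.compile(r"best seller|#1|\bsale\b|limited time", re.IGNORECASE)
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://|www\.", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")  # North American style only

CHECKS = (("HTML", HTML_TAG), ("promotional language", PROMO),
          ("URL", URL), ("email", EMAIL), ("phone number", PHONE))

def amazon_violations(description, bullets):
    """Return human-readable problems; an empty list means the listing passed."""
    problems = []
    if len(description) > 2000:
        problems.append(f"description is {len(description)} chars (max 2,000)")
    if len(bullets) > 5:
        problems.append(f"{len(bullets)} bullets (max 5)")
    for i, text in enumerate([description, *bullets]):
        where = "description" if i == 0 else f"bullet {i}"
        if i > 0 and len(text) > 500:
            problems.append(f"{where} is {len(text)} chars (max 500)")
        for label, pattern in CHECKS:
            if pattern.search(text):
                problems.append(f"{where} contains {label}")
    return problems

if __name__ == "__main__":
    print(amazon_violations("The #1 trail shoe. Call 555-123-4567.", ["Vibram sole"]))
```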

Shopify:

  • Far more permissive. No character limits on the body, HTML allowed
  • But: don't stuff SEO keywords. Google's search spam policies explicitly list keyword stuffing as a violation
  • Avoid duplicate H1s and missing alt text
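One way to put a number on keyword stuffing is keyword density: the share of a description's words taken up by each target phrase. Google publishes no density threshold, so the 3% cutoff below is an arbitrary assumption for illustration. The function names are also hypothetical. Treat a flag as a prompt for a human reread, not a verdict.

```python
import re

def keyword_density(text, keywords):
    """Share of the text's words taken up by each keyword phrase (0.0-1.0)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words) or 1  # avoid dividing by zero on empty text
    density = {}
    for kw in keywords:
        kw_words = kw.lower().split()
        n = len(kw_words)
        # Count every position where the full phrase appears
        hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == kw_words)
        density[kw] = hits * n / total
    return density

def stuffed_keywords(text, keywords, max_density=0.03):
    """Keywords above max_density (an illustrative cutoff, not a Google rule)."""
    return [kw for kw, d in keyword_density(text, keywords).items() if d > max_density]

if __name__ == "__main__":
    seo = [k.strip() for k in "trail running shoe, vibram, lightweight".split(",") if k.strip()]
    body = "Trail running shoe for trail running shoe fans: a lightweight trail running shoe."
    print(stuffed_keywords(body, seo))
```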

Etsy:

  • Description max 10,000 characters, but the first 160 characters are what shows in search snippets
  • No external links in the description (Etsy penalizes for off-platform links)
  • Tags: 13 max, each up to 20 characters, all lowercase; multi-word phrases such as "trail shoes" are allowed
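The Etsy rules can be checked the same way. This minimal sketch assumes tags arrive as a list of strings. It previews the 160-character search snippet, flags link-like text, and enforces the 13-tag limit plus Etsy's 20-character cap per tag. The function name is hypothetical, and the link pattern is a rough approximation.

```python
import re

LINK = re.compile(r"https?://|www\.", re.IGNORECASE)  # rough link detector

def etsy_check(description, tags):
    """Return (search_snippet, problems) for one Etsy listing."""
    problems = []
    if len(description) > 10_000:
        problems.append(f"description is {len(description)} chars (max 10,000)")
    if LINK.search(description):
        problems.append("description contains an external link")
    if len(tags) > 13:
        problems.append(f"{len(tags)} tags (max 13)")
    problems += [f"tag too long: {t!r}" for t in tags if len(t) > 20]
    # The first 160 characters are what shoppers see in search snippets
    snippet = description[:160]
    return snippet, problems

if __name__ == "__main__":
    snippet, problems = etsy_check("Hand-stitched leather trail wallet. " * 10, ["leather wallet"])
    print(snippet)
    print(problems)
```

Printing the snippet for each SKU makes it easy to confirm the opening 160 characters carry the main selling point.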

The simplest way to enforce these rules in a batch is a post-processing script (Python or Google Apps Script) that applies each platform's regex rules and flags any description that violates one. Hypotenuse's "Compliance" preset does some of this, but the Amazon rules in particular change quarterly, so I run a manual check on a 20-description sample before any full upload.
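The batch runner and the 20-description spot check take only a few lines of pandas. This minimal sketch assumes the output CSV has `sku` and `description` columns. The names `run_rules` and `review_sample` are hypothetical helpers, and `check` can be any function that takes a description string and returns a list of problems.

```python
import pandas as pd

def run_rules(df, check):
    """Apply a per-description rule function and return only the failing rows."""
    problems = df["description"].fillna("").map(check)
    failing = problems.map(len) > 0
    failed = df.loc[failing, ["sku"]].copy()
    failed["problems"] = problems[failing]
    return failed

def review_sample(df, n=20, seed=0):
    """Reproducible random sample for the manual read-through before a full upload."""
    return df.sample(n=min(n, len(df)), random_state=seed)[["sku", "description"]]

if __name__ == "__main__":
    batch = pd.DataFrame({
        "sku": ["SHOE-RUN-001", "SHOE-RUN-002"],
        "description": ["Grippy Vibram sole.", "The #1 shoe on the trail."],
    })
    no_rank_claims = lambda text: ["ranking claim"] if "#1" in text else []
    print(run_rules(batch, no_rank_claims))
    print(review_sample(batch))
```

Fixing `seed` makes the sample reproducible, so a teammate reviewing the same batch sees the same rows.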

The Trap, Named Explicitly

The Amazon suspension trap is not a punishment for using AI. It's a punishment for using AI without dedup detection and compliance filtering. Amazon's policy is clear: identical or near-identical content across ASINs is a violation, regardless of who wrote it. The platform can't tell — and doesn't care — whether the duplicate came from a human copywriter or a model.

If you run the five steps above, you have closed the gap that caught my friend. If you skip Step 4, you are gambling with your account health.

The Real Win

500 unique, brand-consistent, platform-compliant descriptions in a single afternoon is not a hypothetical. It's a Tuesday for a properly set up pipeline. The tool is not the bottleneck. The discipline around the tool is.

The marketers I trust with this kind of work aren't the ones with the fanciest AI subscription. They're the ones who treat the input spreadsheet like a product spec and the output like a deliverable to be QA'd (quality-assured), not generated.