Content Quality Audit Claude Skill
This skill audits a website's content quality and produces an actionable report. It crawls pages and measures visible word count, text-to-HTML ratio, and content fingerprints to find thin content, exact-duplicate body text across pages, near-duplicate (templated or boilerplate) pages, pages that are mostly markup with little text, and pages missing an H1. These checks surface pages that are unlikely to rank or add value. Thresholds are calibrated to the site, and flagged pages are signals for human review.
Quick Take
Point the skill at a site root or URL list and tune the thin-word and similarity thresholds to the content type. It crawls pages, fingerprints their text, and returns a prioritized report of thin, duplicate, and near-duplicate pages.
What The Skill Checks
- Thin and low-value pages: visible word count below the threshold and pages that are mostly markup with a low text-to-HTML ratio.
- Duplication: exact-duplicate body text across pages and near-duplicate (templated or boilerplate) pages detected via content fingerprints.
- Structural signals: pages missing an H1. The thin-word and similarity thresholds are calibrated to the content type before the audit runs.
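The per-page checks above can be sketched with the Python standard library. This is a minimal illustration, not the skill's actual script: the names VisibleTextParser and page_metrics are hypothetical, and the default thresholds (300 words, a 0.10 text-to-HTML ratio) are placeholder assumptions to be tuned per content type.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects visible text and notes whether an <h1> exists."""

    # Elements whose text is not rendered as page content.
    SKIPPED = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.has_h1 = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "h1":
            self.has_h1 = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_metrics(html, thin_words=300, min_text_ratio=0.10):
    """Return word count, text-to-HTML ratio, and the flags derived from them.

    thin_words and min_text_ratio are assumed defaults, not values from the skill.
    """
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    word_count = len(text.split())
    # Ratio of visible characters to total characters in the raw HTML.
    ratio = len(text) / len(html) if html else 0.0
    return {
        "word_count": word_count,
        "text_ratio": round(ratio, 3),
        "thin": word_count < thin_words,
        "mostly_markup": ratio < min_text_ratio,
        "missing_h1": not parser.has_h1,
    }
```

Calling page_metrics on a page's raw HTML returns one record per page; lowering thin_words for product listings or raising it for long-form articles is the kind of calibration the thresholds are meant for.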
How The Skill Is Packaged
The skill follows the standard Claude Agent Skill structure: a SKILL.md file with YAML frontmatter and workflow instructions, a references/ folder with the full audit check definitions and report template, and a scripts/ folder with Python scripts that crawl the site, extract visible text and content fingerprints, and then audit the resulting inventory for thinness and duplication. Install it with npx claude-seo-skills install content-quality-audit, or copy the skill folder into your Claude skills directory. Claude invokes it automatically when a request matches its description.
Skill Files
Every file in the skill is embedded below directly from the Claude-SEO-Skills repository, so you can review exactly what the skill instructs Claude to do before installing it.
SKILL.md
The skill definition: frontmatter, inputs to collect, the workflow, and the resources the skill relies on.
Audit Checks Reference (references/audit-checks.md)
Full definitions, severities, and rationale for every check across the high, medium, and low tiers.
Report Template (references/report-template.md)
The output structure the skill follows, including the sections issues are grouped into.
Content Extractor Script (scripts/extract_content.py)
Crawls pages, measures visible word count and text-to-HTML ratio, and computes content fingerprints for similarity comparison.
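One common way to build such fingerprints, shown here as a sketch rather than as what extract_content.py necessarily does, is to pair a hash of the normalized text (for exact matches) with a set of overlapping word shingles (for similarity). The function names and the shingle size k=5 are assumptions made for illustration.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def exact_fingerprint(text):
    """SHA-256 of the normalized text; identical bodies yield identical digests."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingle_fingerprint(text, k=5):
    """Set of overlapping k-word shingles (k=5 is an assumed default)."""
    words = normalize(text).split()
    if not words:
        return set()
    if len(words) < k:
        # Short texts become a single shingle so they can still be compared.
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}
```

Two pages with the same exact_fingerprint have identical normalized text, while pages that share most of their shingles are candidates for templated or boilerplate content.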
Content Auditor Script (scripts/audit_content.py)
Compares fingerprints across the crawl to flag thin, duplicate, and near-duplicate pages, then outputs the structured report data.
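The comparison step might look like the sketch below, assuming each inventory record carries a url, a word_count, a hash, and a set of shingles. That record layout, the audit function, and the default thresholds are all hypothetical. Jaccard similarity is used here as one plausible similarity measure; the skill's own script may use a different one.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets are treated as dissimilar."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def audit(inventory, thin_words=300, similarity=0.8):
    """Flag thin, exact-duplicate, and near-duplicate pages.

    inventory is a list of dicts with assumed keys:
    "url", "word_count", "hash", and "shingles" (a set).
    """
    thin = [p["url"] for p in inventory if p["word_count"] < thin_words]

    # Exact duplicates: pages whose text hashes are identical.
    by_hash = defaultdict(list)
    for page in inventory:
        by_hash[page["hash"]].append(page["url"])
    exact = [urls for urls in by_hash.values() if len(urls) > 1]

    # Near duplicates: different hashes but high shingle overlap.
    near = []
    for a, b in combinations(inventory, 2):
        if a["hash"] == b["hash"]:
            continue  # already reported as an exact duplicate
        score = jaccard(a["shingles"], b["shingles"])
        if score >= similarity:
            near.append((a["url"], b["url"], round(score, 2)))

    return {"thin": thin, "exact_duplicates": exact, "near_duplicates": near}
```

The pairwise loop is quadratic in the number of pages, which is fine for small crawls; for large sites a technique such as MinHash with locality-sensitive hashing would scale better.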