Run a Test Code Quality Checklist with AI
Returns a per-test-file quality checklist with 20-30 items grouped by category (naming / structure / assertions / isolation / performance / maintainability) — each marked PASS/FAIL with one-line evidence from the code.
When to use it
- Pre-merge sanity check on new test files.
- Auditing an existing test file before refactor.
- Standardizing quality across a team.
- Teaching engineers what 'good test code' looks like by showing what to check.
The prompt
XML-tagged — best for Claude 4.x
<role>
You are a test quality coach. You evaluate test files against a structured rubric so reviewers can see EXACTLY where the file falls short.
</role>
<context>
Categories (evaluate every test file against each):
- **Naming**: describe / it / test names; named constants vs magic values; clarity
- **Structure**: Arrange-Act-Assert; setup/teardown; describe blocks
- **Assertions**: strong vs weak; one concept per test; failure messages
- **Isolation**: independence between tests; shared state; test order
- **Performance**: no hard sleeps; parallelizable; reasonable runtime per test
- **Maintainability**: readability; DRY (not over-DRY); future-proof selectors
</context>
<task>
For the test file below, run a checklist with 20-30 items spread across the 6 categories. Each item:
1. Concrete check (e.g., "All `test()` names are imperative sentences describing behavior")
2. PASS / FAIL marker
3. Evidence (line number or snippet) supporting the verdict
4. If FAIL, what would make it pass
Group items by category. Items must be specific to the file's framework (Playwright / Jest / etc.).
</task>
<input>
Test file content: {test_code}
Framework: {framework}
</input>
<constraints>
- 20-30 items total, distributed roughly equally across the 6 categories.
- Each item is YES/NO answerable; no gradient.
- Evidence is required for both PASS and FAIL.
- Distinguish framework idioms (e.g., Jest value matchers such as `toBe` / `toEqual` vs Playwright web-first assertions such as `toBeVisible` / `toHaveText`).
- Don't double-count — one underlying issue shouldn't fail 3 checklist items.
</constraints>
<output_format>
Markdown sections per category (6 H2 headings). Each section contains a table: Check | Status | Evidence | If FAIL, fix by. End with an overall "Score" (count of PASS / total).
</output_format>
Before writing, identify which items are MOST RELEVANT to this file's particular shape; don't pad the checklist with irrelevant items.
Example
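One way to fill the `{test_code}` and `{framework}` placeholders before sending the prompt, sketched in TypeScript. `fillPrompt` and `PromptInputs` are hypothetical names, not part of any library, and the template shown is only an excerpt of the `<input>` block.

```typescript
// Hypothetical helper: fills {name} placeholders in a prompt template.
type PromptInputs = { test_code: string; framework: string };

function fillPrompt(template: string, inputs: PromptInputs): string {
  const values: Record<string, string> = inputs;
  // A replacer function is used so that "$" sequences or braces inside the
  // test code are inserted literally and never re-scanned as placeholders.
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string): string => {
    const value = values[key];
    if (value === undefined) {
      throw new Error(`Unknown placeholder: {${key}}`);
    }
    return value;
  });
}

// Excerpt of the prompt's <input> block (the full prompt would be used in practice).
const inputBlock = [
  "<input>",
  "Test file content: {test_code}",
  "Framework: {framework}",
  "</input>",
].join("\n");

const filled = fillPrompt(inputBlock, {
  test_code: "test('adds {item} to cart', async () => {});",
  framework: "Playwright",
});

console.log(filled);
```

Throwing on an unknown placeholder is a design choice in this sketch: a template that reaches the model with an unfilled `{...}` slot tends to produce a generic checklist rather than an obvious error.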
Common pitfalls
- Checklist gets generic (every check is 'has good naming?') — items should be SPECIFIC to spotted patterns.
- Double-counting — one anti-pattern (e.g., shared state) fails 5 checklist items. Cluster these.
- Evidence missing or vague ('see test file') — require line numbers.
- Score calculation gets fudged when items are 'PARTIAL' — force YES/NO and document partial in evidence.
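Several of these pitfalls can be caught mechanically after the model replies. The sketch below assumes the reply follows the requested format (H2 headings per category, four-column tables ending in `|`) and that a line reference looks like `line 12` or `L12`; both are assumptions, and `parseChecklist` and `validateChecklist` are hypothetical names. It recomputes the score instead of trusting the one the model reports.

```typescript
type ChecklistItem = { category: string; check: string; status: string; evidence: string };

// Parses H2 headings and table rows from the model's Markdown reply.
// Assumption: cells do not contain a literal "|" character.
function parseChecklist(markdown: string): ChecklistItem[] {
  const items: ChecklistItem[] = [];
  let category = "";
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      category = line.slice(3).trim();
      continue;
    }
    if (!line.startsWith("|")) continue;
    const cells = line.split("|").slice(1, -1).map((cell) => cell.trim());
    if (cells.length < 3) continue;
    const isSeparator = cells.every((cell) => /^:?-+:?$/.test(cell));
    const isHeader = cells[1] === "Status";
    if (isSeparator || isHeader) continue;
    items.push({
      category,
      check: cells[0] ?? "",
      status: cells[1] ?? "",
      evidence: cells[2] ?? "",
    });
  }
  return items;
}

// Returns a list of human-readable problems; an empty list means the reply is usable.
function validateChecklist(items: ChecklistItem[]): string[] {
  const problems: string[] = [];
  if (items.length < 20 || items.length > 30) {
    problems.push(`Expected 20-30 items, got ${items.length}`);
  }
  for (const item of items) {
    if (item.status !== "PASS" && item.status !== "FAIL") {
      problems.push(`"${item.check}": status must be PASS or FAIL, got "${item.status}"`);
    }
    if (!/\b(?:L|line\s*)\d+/i.test(item.evidence)) {
      problems.push(`"${item.check}": evidence has no line reference`);
    }
  }
  return problems;
}

// Shortened sample reply (3 rows) to show the checks firing.
const reply = [
  "## Naming",
  "| Check | Status | Evidence | If FAIL, fix by |",
  "|---|---|---|---|",
  "| Test names describe behavior | PASS | line 12: test('rejects expired token') | |",
  "## Isolation",
  "| Check | Status | Evidence | If FAIL, fix by |",
  "|---|---|---|---|",
  "| No shared mutable state | PARTIAL | line 4: module-level let user | Move into beforeEach |",
  "| Tests run in any order | FAIL | see test file | Reset state per test |",
].join("\n");

const items = parseChecklist(reply);
const problems = validateChecklist(items);
const passCount = items.filter((item) => item.status === "PASS").length;
const score = `${passCount}/${items.length}`;

console.log(score);
console.log(problems);
```

On the sample, the validator flags the short item count, the `PARTIAL` status, and the `see test file` evidence. Double-counting is not detected here; that still needs a human or a follow-up prompt.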
Tips
- Run on every NEW test file in PR review; catches issues before they propagate.
- Pair with `review-test-code-anti-patterns` — checklist verifies breadth; anti-patterns drill into specific issues.
- Tailor the checklist to your team's standards; add items for org-specific patterns (e.g., 'follows our test-id naming convention').
- Re-run quarterly on existing high-value test files; quality drifts.
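For the first tip, a PR pipeline needs to know which test files are new. A minimal sketch, assuming the input is the text printed by `git diff --name-status` against the PR base (one `<status><TAB><path>` entry per line) and assuming test files follow the common `.spec.*` / `.test.*` naming; `newTestFiles` is a hypothetical helper name.

```typescript
// Assumption: test files are named like foo.spec.ts, foo.test.tsx, foo.test.mjs, etc.
const TEST_FILE_PATTERN = /\.(?:spec|test)\.[cm]?[jt]sx?$/;

// Returns paths of files that were added (status "A") and look like test files.
function newTestFiles(nameStatusOutput: string): string[] {
  return nameStatusOutput
    .split("\n")
    .map((line) => line.split("\t"))
    .filter((parts) => parts[0] === "A" && parts.length >= 2)
    .map((parts) => parts[1] ?? "")
    .filter((path) => TEST_FILE_PATTERN.test(path));
}

// Sample diff output: two added test files, one modified test file, one added source file.
const sampleDiff = [
  "A\ttests/login.spec.ts",
  "M\ttests/cart.spec.ts",
  "A\tsrc/login.ts",
  "A\tsrc/__tests__/price.test.tsx",
].join("\n");

const added = newTestFiles(sampleDiff);
console.log(added);
```

Each returned path can then be read and passed to the prompt as `{test_code}`. Modified and renamed files are skipped on purpose in this sketch, since the tip targets new files only.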
FAQ
ESLint and similar tools catch simple, syntactic issues. This checklist complements them by catching structural and design issues that need semantic understanding: naming clarity, isolation, assertion quality. Use both.
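A small illustration of an assertion-quality issue that is syntactically fine, so a typical lint setup would likely accept it. The function `applyDiscount` and its bug are invented for this example, and plain boolean checks stand in for framework matchers such as `toBeTruthy` and `toBe`.

```typescript
// Hypothetical function under test, with a deliberate bug:
// it subtracts the percentage as an absolute amount.
function applyDiscount(price: number, percent: number): number {
  return price - percent; // intended: price * (1 - percent / 100)
}

const result = applyDiscount(200, 10); // 190, while the intended value is 180

// Weak check: any non-zero number is truthy, so the bug goes unnoticed.
const weakCheckPasses = Boolean(result);

// Strong check: compares against the exact expected value, so the bug is caught.
const strongCheckPasses = result === 180;

console.log({ weakCheckPasses, strongCheckPasses });
```

Both checks are valid code; only the second one would fail and expose the bug. Deciding that the first is too weak requires knowing what the test is supposed to prove, which is the kind of judgment the checklist asks for.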
Related prompts
Review Test Code for Anti-Patterns
Reads a test file and returns a categorized list of anti-patterns — hard sleeps, shared mutable state, weak assertions (`toBeTruthy` instead of `toEqual`), missing teardown, mixed setup/assertion concerns — each with line numbers, severity, and a suggested fix.
Convert Synchronous Waits to Auto-Waiting
Reads a test using hard waits and returns a rewritten version using Playwright auto-waiting (`expect(locator).toBeVisible()`, `toHaveText()`, `toHaveCount()`) — justifies each replacement by what state the original was waiting for, preserves the test's intent.
Refactor Test Suite for DRY
Scans a set of test files and identifies duplicated setup, fixture state, and assertion patterns — proposes refactors using Playwright fixtures, factory functions, or shared helper modules with concrete code diffs. Warns against premature abstraction (single-use helpers).
Page Object Model Refactoring Reviewer
Reviews a Page Object Model class and returns specific refactoring suggestions — locator priority (role > label > testid > CSS), action vs assertion separation, action granularity (one method per user intent), and constructor cleanliness — with diff-style proposed changes.