
Run a Test Code Quality Checklist with AI

Updated 2026-06-08 · basic · Test Code Review

Returns a per-test-file quality checklist with 20-30 items grouped by category (naming / structure / assertions / isolation / performance / maintainability) — each marked PASS/FAIL with one-line evidence from the code.

When to use it

  • Pre-merge sanity check on new test files.
  • Auditing an existing test file before refactor.
  • Standardizing quality across a team.
  • Teaching engineers what 'good test code' looks like by showing what to check.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are a test quality coach. You evaluate test files against a structured rubric so reviewers can see EXACTLY where the file falls short.
</role>

<context>
Categories to evaluate in every test file:
- **Naming** — describe / it / test names; constants vs magic; clarity
- **Structure** — Arrange-Act-Assert; setup/teardown; describe blocks
- **Assertions** — strong vs weak; one concept per test; failure messages
- **Isolation** — independence between tests; shared state; test order
- **Performance** — no hard sleeps; parallelizable; reasonable runtime per test
- **Maintainability** — readability; DRY (not over-DRY); future-proof selectors
</context>

<task>
For the test file below, run a checklist with 20-30 items spread across the 6 categories. Each item:
1. Concrete check (e.g., "All `test()` names are imperative sentences describing behavior")
2. PASS / FAIL marker
3. Evidence (line number or snippet) supporting the verdict
4. If FAIL, what would make it pass

Group items by category. Items must be specific to the file's framework (Playwright / Jest / etc.).
</task>

<input>
Test file content: {test_code}
Framework: {framework}
</input>

<constraints>
- 20-30 items total, distributed roughly equally across the 6 categories.
- Each item must resolve to PASS or FAIL; no partial credit or gradient.
- Evidence is required for both PASS and FAIL.
- Distinguish framework idioms (e.g., Jest's synchronous `expect(value).toBe(...)` vs Playwright's auto-retrying `await expect(locator).toHaveText(...)`).
- Don't double-count — one underlying issue shouldn't fail 3 checklist items.
</constraints>

<output_format>
Markdown sections per category (6 H2 headings). Each section contains a table: Check | Status | Evidence | If FAIL, fix by. End with an overall "Score" (count of PASS / total).
</output_format>

Before writing, identify which items are MOST RELEVANT to this file's particular shape — don't pad the checklist with irrelevancies.

Example
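
As a minimal sketch of wiring the prompt up, the two placeholders in the `<input>` block can be filled with a small helper. The names `fillPrompt`, `PromptInputs`, and `INPUT_SECTION` are hypothetical and only illustrate the substitution; how the filled prompt is sent to a model is left out.

```typescript
// Hypothetical helper: substitutes {test_code} and {framework} into the prompt text.
interface PromptInputs {
  test_code: string;
  framework: string;
}

// Only the <input> section is reproduced here; in practice pass the full prompt.
const INPUT_SECTION = [
  "<input>",
  "Test file content: {test_code}",
  "Framework: {framework}",
  "</input>",
].join("\n");

function fillPrompt(template: string, inputs: PromptInputs): string {
  // A function replacer inserts values literally, so "$&" or "{framework}"
  // appearing inside the test code is not expanded a second time.
  return template.replace(
    /\{(test_code|framework)\}/g,
    (_match: string, key: string) => inputs[key as keyof PromptInputs],
  );
}
```

Calling `fillPrompt(INPUT_SECTION, { test_code: source, framework: "Playwright" })` returns the `<input>` block with both values in place. Substituting in a single pass matters because test files often contain braces and `$` sequences of their own.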

Common pitfalls

  • Checklist gets generic (every check is 'has good naming?') — items should be SPECIFIC to spotted patterns.
  • Double-counting — one anti-pattern (e.g., shared state) fails 5 checklist items. Cluster these.
  • Evidence missing or vague ('see test file') — require line numbers.
  • Score calculation gets fudged when items are marked 'PARTIAL'; force PASS or FAIL and note any partial compliance in the evidence.
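
Several of these pitfalls can be caught mechanically before a human reads the checklist. The sketch below assumes the model followed the requested output format (Markdown tables with Check | Status | Evidence | If FAIL, fix by columns); `auditChecklist` and `ChecklistAudit` are hypothetical names, and cells that contain a literal `|` are not handled.

```typescript
interface ChecklistAudit {
  pass: number;
  total: number;
  problems: string[];
}

function auditChecklist(markdown: string): ChecklistAudit {
  const audit: ChecklistAudit = { pass: 0, total: 0, problems: [] };
  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("|")) continue; // not a table row
    // Drop the outer pipes, then split the row into trimmed cells.
    const cells = trimmed
      .replace(/^\|/, "")
      .replace(/\|$/, "")
      .split("|")
      .map((cell) => cell.trim());
    if (cells.length < 3) continue;
    const check = cells[0] ?? "";
    const status = cells[1] ?? "";
    const evidence = cells[2] ?? "";
    // Skip the header row and the |---|---| separator row.
    if (check === "Check" || /^:?-+:?$/.test(check)) continue;
    audit.total += 1;
    if (status === "PASS") {
      audit.pass += 1;
    } else if (status !== "FAIL") {
      audit.problems.push(`"${check}": status "${status}" is not PASS or FAIL`);
    }
    if (evidence === "" || /^see test file$/i.test(evidence)) {
      audit.problems.push(`"${check}": evidence missing or vague`);
    }
  }
  if (audit.total < 20 || audit.total > 30) {
    audit.problems.push(`expected 20-30 items, found ${audit.total}`);
  }
  return audit;
}
```

The recomputed `pass` and `total` can be compared against the "Score" line the model reports. Double-counting still needs a human eye, since it depends on whether two failed checks share one root cause.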

Tips

  • Run on every NEW test file in PR review; catches issues before they propagate.
  • Pair with `review-test-code-anti-patterns` — checklist verifies breadth; anti-patterns drill into specific issues.
  • Tailor the checklist to your team's standards; add items for org-specific patterns (e.g., 'follows our test-id naming convention').
  • Re-run quarterly on existing high-value test files; quality drifts.
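
To run the prompt only on new test files in a PR, one option is to filter the output of `git diff --name-status`, which prints a status letter, a tab, and a path per line. This is a sketch under assumptions: `newTestFiles` is a hypothetical helper, and the `.test.` / `.spec.` filename pattern is a common Jest and Playwright convention that may not match your repository.

```typescript
// Assumed naming convention for test files; adjust to your repository.
const TEST_FILE = /\.(test|spec)\.[jt]sx?$/;

function newTestFiles(nameStatus: string): string[] {
  return nameStatus
    .split("\n")
    .map((line) => line.split("\t"))
    // Keep only added files ("A") whose path looks like a test file.
    .filter((parts) => parts.length === 2 && parts[0] === "A" && TEST_FILE.test(parts[1] ?? ""))
    .map((parts) => parts[1] ?? "");
}
```

Modified files (`M`) are skipped on purpose here; include them if you also want the checklist re-run whenever an existing test file changes.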

FAQ

ESLint and similar tools catch simple, syntactic issues. This checklist catches structural and design issues that need semantic understanding — naming clarity, isolation, assertion quality. Use both.
