Should I automate the checklist in CI?

Some items can be automated: a ban on `waitForTimeout`, for example, can be enforced with an ESLint rule. Most cannot (assertion quality, isolation). Use the checklist for human review and lint rules for whatever is machine-checkable.
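As a sketch of the machine-checkable side, the hard-wait ban can be expressed as an ESLint flat-config entry built on the core `no-restricted-syntax` rule. The file globs and the message text are assumptions, and teams already using a Playwright ESLint plugin may prefer its dedicated rule.

```typescript
// eslint.config.ts (or eslint.config.js): a minimal sketch, not a complete config.
// Assumption: test files match the globs below; adjust them to your repo layout.
// Linting .ts files also needs a TypeScript parser, which is omitted here.
const config = [
  {
    files: ["**/*.spec.ts", "**/*.test.ts"],
    rules: {
      // Core ESLint rule; the selector matches any call shaped like x.waitForTimeout(...)
      "no-restricted-syntax": [
        "error",
        {
          selector: "CallExpression[callee.property.name='waitForTimeout']",
          message: "Avoid hard sleeps; wait on a locator or a web-first assertion instead.",
        },
      ],
    },
  },
];

export default config;
```

A rule like this only flags the call itself. Whether the replacement wait targets the right state is still a question for human review.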

What if my team uses a different framework?

Adapt the framework-specific items (`expect`, fixture patterns, etc.). The categories (naming / structure / assertions / isolation / performance / maintainability) are universal.

How do I calibrate to team standards?

Run the checklist on 5-10 representative files; agree as a team which items are most important. Document the agreed checklist in your contributing guide.

Run a Test Code Quality Checklist with AI

Updated 2026-06-08 · basic · Test Code Review

Returns a per-test-file quality checklist with 20-30 items grouped by category (naming / structure / assertions / isolation / performance / maintainability) — each marked PASS/FAIL with one-line evidence from the code.

When to use it

Pre-merge sanity check on new test files.
Auditing an existing test file before refactor.
Standardizing quality across a team.
Teaching engineers what 'good test code' looks like by showing what to check.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are a test quality coach. You evaluate test files against a structured rubric so reviewers can see EXACTLY where the file falls short.
</role>

<context>
Categories the checklist must cover for every test file:
- **Naming** — describe / it / test names; constants vs magic; clarity
- **Structure** — Arrange-Act-Assert; setup/teardown; describe blocks
- **Assertions** — strong vs weak; one concept per test; failure messages
- **Isolation** — independence between tests; shared state; test order
- **Performance** — no hard sleeps; parallelizable; reasonable runtime per test
- **Maintainability** — readability; DRY (not over-DRY); future-proof selectors
</context>

<task>
For the test file below, run a checklist with 20-30 items spread across the 6 categories. Each item:
1. Concrete check (e.g., "All `test()` names are imperative sentences describing behavior")
2. PASS / FAIL marker
3. Evidence (line number or snippet) supporting the verdict
4. If FAIL, what would make it pass

Group items by category. Items must be specific to the file's framework (Playwright / Jest / etc.).
</task>

<input>
Test file content: {test_code}
Framework: {framework}
</input>

<constraints>
- 20-30 items total, distributed roughly equally across the 6 categories.
- Each item must be answerable as PASS or FAIL; no gradient and no PARTIAL.
- Evidence is required for both PASS and FAIL.
- Distinguish framework idioms (e.g., Jest `toBe` vs `toEqual`; Playwright web-first assertions such as `toHaveText`).
- Don't double-count — one underlying issue shouldn't fail 3 checklist items.
</constraints>

<output_format>
Markdown sections per category (6 H2 headings). Each section contains a table: Check | Status | Evidence | If FAIL, fix by. End with an overall "Score" (count of PASS / total).
</output_format>

Before writing, identify which items are MOST RELEVANT to this file's particular shape — don't pad the checklist with irrelevancies.
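If the prompt is sent programmatically, the two placeholders in the `<input>` block have to be filled first. The following is a minimal sketch of that step; `fillPrompt` and `PromptVars` are hypothetical names, not part of any library.

```typescript
// Hypothetical helper: substitutes the {test_code} and {framework} placeholders.
type PromptVars = { test_code: string; framework: string };

function fillPrompt(template: string, vars: PromptVars): string {
  // A replacer function is used so that "$" sequences inside the test code
  // (e.g. "$&" or "$1") are inserted literally instead of being interpreted
  // as replacement patterns.
  return template.replace(
    /\{(test_code|framework)\}/g,
    (_match, key: keyof PromptVars) => vars[key],
  );
}
```

The substitution is a single pass, so a test file that happens to contain the literal text `{framework}` is left untouched after insertion.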

Example

Common pitfalls

Checklist gets generic (every check is 'has good naming?') — items should be SPECIFIC to spotted patterns.
Double-counting — one anti-pattern (e.g., shared state) fails 5 checklist items. Cluster these.
Evidence missing or vague ('see test file') — require line numbers.
Score calculation gets fudged when items are marked 'PARTIAL'. Force a PASS/FAIL verdict and record the partial compliance in the evidence.
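When the checklist output is post-processed, the binary-verdict and evidence rules can be enforced in code. The sketch below assumes the table rows have already been parsed into objects; the type and function names are hypothetical and simply mirror the columns of the prompt's output format.

```typescript
// Hypothetical shapes mirroring the table columns: Check | Status | Evidence | If FAIL, fix by.
type Category =
  | "naming" | "structure" | "assertions"
  | "isolation" | "performance" | "maintainability";

interface ChecklistItem {
  category: Category;
  check: string;
  status: "PASS" | "FAIL"; // deliberately no PARTIAL; partial cases go in evidence
  evidence: string;
  fix?: string;
}

// Returns human-readable problems: missing evidence, or a FAIL with no suggested fix.
function validateItems(items: ChecklistItem[]): string[] {
  const problems: string[] = [];
  for (const item of items) {
    if (item.evidence.trim() === "") {
      problems.push(`"${item.check}": evidence is required for PASS and FAIL`);
    }
    if (item.status === "FAIL" && !item.fix) {
      problems.push(`"${item.check}": FAIL items need a suggested fix`);
    }
  }
  return problems;
}

// Overall score as "passed/total".
function score(items: ChecklistItem[]): string {
  const passed = items.filter((item) => item.status === "PASS").length;
  return `${passed}/${items.length}`;
}
```

Keeping `status` a two-value union means a stray 'PARTIAL' is rejected at the type level instead of quietly changing the score.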

Tips

Run on every NEW test file in PR review; catches issues before they propagate.
Pair with `review-test-code-anti-patterns` — checklist verifies breadth; anti-patterns drill into specific issues.
Tailor the checklist to your team's standards; add items for org-specific patterns (e.g., 'follows our test-id naming convention').
Re-run quarterly on existing high-value test files; quality drifts.

FAQ

ESLint and similar tools catch simple, syntactic issues. This checklist catches structural and design issues that need semantic understanding — naming clarity, isolation, assertion quality. Use both.

Related prompts

Test Code Review · intermediate

Review Test Code for Anti-Patterns

Reads a test file and returns a categorized list of anti-patterns — hard sleeps, shared mutable state, weak assertions (`toBeTruthy` instead of `toEqual`), missing teardown, mixed setup/assertion concerns — each with line numbers, severity, and a suggested fix.

Test Code Review · intermediate

Convert Synchronous Waits to Auto-Waiting

Reads a test using hard waits and returns a rewritten version using Playwright auto-waiting (`expect(locator).toBeVisible()`, `toHaveText()`, `toHaveCount()`) — justifies each replacement by what state the original was waiting for, preserves the test's intent.

Test Code Review · intermediate

Refactor Test Suite for DRY

Scans a set of test files and identifies duplicated setup, fixture state, and assertion patterns — proposes refactors using Playwright fixtures, factory functions, or shared helper modules with concrete code diffs. Warns against premature abstraction (single-use helpers).

Test Code Review · advanced

Page Object Model Refactoring Reviewer

Reviews a Page Object Model class and returns specific refactoring suggestions — locator priority (role > label > testid > CSS), action vs assertion separation, action granularity (one method per user intent), and constructor cleanliness — with diff-style proposed changes.
