When should I retry vs reshard?

Retries handle transient flake (roughly a 5% flake rate); resharding handles capacity. A retry costs about 1.5x the test's runtime for each flaky run, while resharding multiplies fixture cost. At low flake rates, use retries; at high flake rates, fix the flake itself (use refactor-flaky-test).

Playwright workers vs shards?

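Workers are parallel processes on one machine, each launching its own browser. Shards split the suite across different machines. Adding workers shortens wall-clock time on a machine until its CPU and memory are saturated; adding shards adds machines. Browser tests are resource-heavy, so they typically benefit more from sharding than from extra workers.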

Should I shard unit tests too?

Usually no — unit tests are fast enough that shard overhead exceeds benefit. Vitest / Jest run thousands of unit tests in seconds with multiple workers. Reserve sharding for E2E.

Configure Parallel Test Execution with AI

Updated 2026-06-08·intermediate·CI/CD for QA

Returns a parallel-execution config tailored to your framework (Playwright, Jest, or Vitest), CI runner count, average test duration, and flakiness rate. The output includes shard count, worker count per shard, test ordering strategy, and a reasoning paragraph.

When to use it

Test suite has grown and total runtime is hurting PR velocity.
You added CI capacity and want to use it efficiently.
Flake rate is high enough that retries are stealing budget from parallelization.
Migrating from single-runner CI to a sharded setup.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are a CI optimization engineer. You know that parallelization has a knee point — beyond which fixture cost, machine spin-up, and queue overhead dominate. You compute shard count from data, not from intuition.
</role>

<context>
Variables:
- **Total tests**: N
- **Average test duration**: D seconds
- **Worker setup overhead per shard**: O seconds (Playwright ~10-30s; Jest ~5-15s)
- **Available CI runners**: R
- **Flake rate**: F (% of tests that need retry)

Optimal shard count balances: total compute time per shard (N * D / shards + O) vs. wall-clock parallelism (limited by R).

Practical caveats:
- Fixture cost — DB seeding, auth setup — paid PER SHARD, not amortized. If fixture cost > 30% of test time, fewer shards is better.
- Flaky tests sometimes need single-worker execution to avoid order dependencies.
- File-level vs test-level sharding affect different things. Playwright shards at file level by default.
</context>

<task>
For the inputs below:
1. Compute recommended shard count using the formula: shards = min(R, ceil(N * D / target_per_shard_seconds))
2. Recommend workers per shard based on framework (Playwright: 1 worker/shard for browser tests with shared fixtures; Jest: # of CPU cores).
3. Recommend test ordering strategy (file system order, alphabetical, by historical duration descending).
4. Address flaky-test retry vs reshard tradeoff.
5. Recommend file-level vs test-level sharding.
6. Provide reasoning paragraph explaining the choices.
</task>

<input>
Total tests: {total}
Average test duration (seconds): {avg_duration}
Flaky test rate: {flake_rate}
Available CI runners / parallelism budget: {runners}
Framework (Playwright / Jest / Vitest): {framework}
Fixture cost per shard (seconds): {fixture_cost}
</input>

<constraints>
- Recommended shard count must be a number (e.g., "8 shards"), not a range.
- Worker count per shard depends on framework — Playwright typically 1 worker/shard with own browser; Jest can run CPU-bound workers in parallel.
- Test ordering recommendation MUST address flake — alphabetical can mask flakes; by-duration prevents long tails.
- Explicitly note when the uncapped shard count (before applying the min with R) exceeds runners, which would mean queueing, or when fixture cost dominates.
- Output target shard runtime should be < 5 min to keep CI snappy.
</constraints>

<output_format>
Four sections:
1. **Configuration** — table: Setting | Value | Reasoning
2. **Sample CI config** — code block (Playwright config or Jest CI YAML)
3. **Tradeoffs** — bullets on what this optimizes for vs against
4. **Re-evaluation triggers** — when to re-tune (suite grows, flake increases, CI capacity changes)
</output_format>

Before writing, do the math: total compute time = N * D + (shards * O); per-shard wall-clock time = N * D / shards + O. Make sure the shards-to-runners ratio is sensible.

Example
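
As a rough illustration of the math the prompt asks for, the sketch below applies the shard formula from the task section and the two time formulas. The interface names, the function name `planShards`, and the input numbers are hypothetical; the 300-second default comes from the prompt's 5-minute target.

```typescript
// Inputs mirror the prompt's variables; the concrete numbers used below are hypothetical.
interface SuiteInputs {
  totalTests: number;       // N
  avgDurationSec: number;   // D
  setupOverheadSec: number; // O, paid once per shard
  runners: number;          // R
}

interface ShardPlan {
  shards: number;
  perShardSec: number;     // N * D / shards + O
  totalComputeSec: number; // N * D + shards * O
}

// shards = min(R, ceil(N * D / target_per_shard_seconds)), never below 1.
function planShards(input: SuiteInputs, targetPerShardSec = 300): ShardPlan {
  const testSec = input.totalTests * input.avgDurationSec;
  const uncapped = Math.max(1, Math.ceil(testSec / targetPerShardSec));
  const shards = Math.min(input.runners, uncapped);
  return {
    shards,
    perShardSec: testSec / shards + input.setupOverheadSec,
    totalComputeSec: testSec + shards * input.setupOverheadSec,
  };
}

const plan = planShards({
  totalTests: 400,
  avgDurationSec: 6,
  setupOverheadSec: 20,
  runners: 10,
});
console.log(plan);
```

With these inputs the formula gives 8 shards, 320 seconds per shard, and 2560 seconds of total compute. The per-shard figure lands above the 300-second target because the formula budgets test time only; setup overhead is added on top.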

Common pitfalls

The model ignores flake rate and recommends the maximum shard count: flaky tests are then spread thin across shards, and retries don't absorb them.
Workers per shard set to # CPU cores for Playwright: wrong for tests that share fixtures, because concurrent workers collide on the same shared state (seeded data, test accounts) and fail.
Alphabetical ordering recommended: this masks order-dependent flake. Recommend by-duration-desc instead.
Fixture cost ignored in the math: when fixtures exceed 30% of the work, fewer shards often reach a similar total time at lower cost.
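
The by-duration-desc ordering mentioned above can be sketched as a greedy longest-first packing. This is one common heuristic, not something the prompt prescribes; `TimedFile` and `packByDuration` are hypothetical names, and how the resulting file lists are handed to the test runner is left open.

```typescript
interface TimedFile {
  file: string;
  seconds: number; // historical duration of the whole file
}

// Greedy longest-first packing: sort by duration descending, then always
// give the next file to the shard with the least accumulated work.
function packByDuration(files: TimedFile[], shardCount: number): TimedFile[][] {
  const shards: TimedFile[][] = Array.from({ length: shardCount }, () => []);
  const load: number[] = new Array(shardCount).fill(0);
  const sorted = [...files].sort((a, b) => b.seconds - a.seconds);
  for (const f of sorted) {
    const lightest = load.indexOf(Math.min(...load));
    shards[lightest].push(f);
    load[lightest] += f.seconds;
  }
  return shards;
}

// Hypothetical timing data for six spec files.
const packed = packByDuration(
  [
    { file: "a.spec.ts", seconds: 90 },
    { file: "b.spec.ts", seconds: 60 },
    { file: "c.spec.ts", seconds: 45 },
    { file: "d.spec.ts", seconds: 30 },
    { file: "e.spec.ts", seconds: 30 },
    { file: "f.spec.ts", seconds: 15 },
  ],
  2,
);
console.log(packed.map((s) => s.reduce((sum, f) => sum + f.seconds, 0)));
```

Placing the longest files first keeps any single shard from ending up with a long tail; in this sample both shards end at 135 seconds.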

Tips

Profile per-test duration first; the longest 10% often dominate runtime. Optimize those before parallelizing.
Use Playwright `--shard=X/Y` for file-level sharding; merge reports after.
For Jest, use `--shard=X/Y` (Jest 28+) plus `--maxWorkers=<n>` for in-shard parallelism.
Pair with `test-impact-from-git-diff` for fewer tests run per PR (further reduces shard count needed).
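
The Playwright tips above can be sketched as a config plus two CLI steps. The specific values (one worker and one retry on CI, a 4-way split, the `./all-blob-reports` directory) are assumptions for illustration and should be tuned to your own suite.

```typescript
// playwright.config.ts: a sketch; the values are assumptions, not recommendations.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // One worker per shard on CI, following the prompt's guidance for shared fixtures.
  workers: process.env.CI ? 1 : undefined,
  // A single retry on CI to absorb occasional flake.
  retries: process.env.CI ? 1 : 0,
  // The blob reporter writes per-shard output that can be merged afterwards.
  reporter: process.env.CI ? 'blob' : 'html',
});

// Per-shard CI step (index and total would come from the CI matrix):
//   npx playwright test --shard=1/4
// After all shards finish and their blob reports are collected in one directory:
//   npx playwright merge-reports --reporter html ./all-blob-reports
```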

FAQ

Sharding stops paying off when fixture cost (DB setup, browser launch, auth) exceeds 30% of total runtime. Past that point, each added shard adds fixture cost faster than it removes test time. Profile first.
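
The 30% threshold can be checked numerically. The sketch below assumes the share is measured per shard as fixture time divided by fixture time plus that shard's test time, which is one reading of the rule; the function names and sample numbers are hypothetical.

```typescript
// Fraction of one shard's runtime spent on fixtures rather than tests.
function fixtureShare(totalTestSec: number, fixtureSec: number, shards: number): number {
  const testSecPerShard = totalTestSec / shards;
  return fixtureSec / (fixtureSec + testSecPerShard);
}

// Largest shard count (up to maxShards) whose fixture share stays at or
// below the threshold. Returns 1 if even a single shard exceeds it.
function maxShardsUnderThreshold(
  totalTestSec: number,
  fixtureSec: number,
  maxShards: number,
  threshold = 0.3,
): number {
  let best = 1;
  for (let s = 1; s <= maxShards; s++) {
    if (fixtureShare(totalTestSec, fixtureSec, s) <= threshold) best = s;
  }
  return best;
}

// Hypothetical suite: 2400 s of tests, 60 s of fixtures per shard.
console.log(fixtureShare(2400, 60, 8).toFixed(3));
console.log(maxShardsUnderThreshold(2400, 60, 32));
```

For this sample, fixtures take about 17% of each shard at 8 shards, and the share crosses 30% after 17 shards, so adding machines beyond that point mostly buys more fixture setup.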

Related prompts

CI/CD for QA·intermediate

GitHub Actions QA Pipeline Generator

Returns a complete `.github/workflows/qa.yml` with unit → integration → E2E stages, Playwright browser matrix, dependency + browser caching keyed on lockfile, artifact retention with explicit period, and failure notification via webhook / Slack.

CI/CD for QA·intermediate

GitLab CI Test Pipeline Configuration

Returns a `.gitlab-ci.yml` with stages (build, test:unit, test:integration, test:e2e), parallel matrix for E2E browsers, cache keyed on lockfile, rules section for MR vs main differences, and artifact reports with explicit expiration.

CI/CD for QA·advanced

Test Impact Analysis from Git Diff

Reads a git diff and an import graph and returns the minimal test set to run for that diff — direct hits, transitive impact (1-2 levels), and explicit full-suite triggers (config / migration / lock file changes).

Test Automation·advanced

Refactor Flaky Test to Stable

Takes a flaky test and its failure history, identifies which of the canonical root causes (race, hard sleep, shared state, network dependency, ordering, animation) is responsible, and produces a rewritten test that fixes the specific cause — no blanket retries.