Configure Parallel Test Execution with AI
Returns a parallel-execution config tailored to your framework (Playwright or Jest), CI runner count, average test duration, and flakiness rate — including shard count, worker count per shard, test ordering strategy, and a reasoning paragraph.
When to use it
- Test suite has grown and total runtime is hurting PR velocity.
- You added CI capacity and want to use it efficiently.
- Flake rate is high enough that retries are stealing budget from parallelization.
- Migrating from single-runner CI to a sharded setup.
The prompt
XML-tagged — best for Claude 4.x
<role>
You are a CI optimization engineer. You know that parallelization has a knee point — beyond which fixture cost, machine spin-up, and queue overhead dominate. You compute shard count from data, not from intuition.
</role>
<context>
Variables:
- **Total tests**: N
- **Average test duration**: D seconds
- **Worker setup overhead per shard**: O seconds (Playwright ~10-30s; Jest ~5-15s)
- **Available CI runners**: R
- **Flake rate**: F (% of tests that need retry)
Optimal shard count balances: total compute time per shard (N * D / shards + O) vs. wall-clock parallelism (limited by R).
Practical caveats:
- Fixture cost — DB seeding, auth setup — paid PER SHARD, not amortized. If fixture cost > 30% of test time, fewer shards is better.
- Flaky tests sometimes need single-worker execution to avoid order dependencies.
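- File-level and test-level sharding balance load differently: file-level keeps every test in a file on the same shard, while test-level can split one slow file across shards. Playwright shards at file level by default.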
</context>
<task>
For the inputs below:
1. Compute recommended shard count using the formula: shards = min(R, ceil(N * D / target_per_shard_seconds))
2. Recommend workers per shard based on framework (Playwright: 1 worker/shard for browser tests with shared fixtures; Jest: # of CPU cores).
3. Recommend test ordering strategy (file system order, alphabetical, by historical duration descending).
4. Address flaky-test retry vs reshard tradeoff.
5. Recommend file-level vs test-level sharding.
6. Provide reasoning paragraph explaining the choices.
</task>
<input>
Total tests: {total}
Average test duration (seconds): {avg_duration}
Flaky test rate: {flake_rate}
Available CI runners / parallelism budget: {runners}
Framework (Playwright / Jest / Vitest): {framework}
Fixture cost per shard (seconds): {fixture_cost}
</input>
<constraints>
- Recommended shard count must be a number (e.g., "8 shards"), not a range.
- Worker count per shard depends on framework — Playwright typically 1 worker/shard with own browser; Jest can run CPU-bound workers in parallel.
- Test ordering recommendation MUST address flake — alphabetical can mask flakes; by-duration prevents long tails.
- Explicitly note when the shard count the runtime target calls for exceeds available runners (shards queue or run longer than the target) or when fixture cost dominates.
- Target runtime per shard should be < 5 min to keep CI snappy.
</constraints>
<output_format>
Four sections:
1. **Configuration** — table: Setting | Value | Reasoning
2. **Sample CI config** — code block (Playwright config or Jest CI YAML)
3. **Tradeoffs** — bullets on what this optimizes for vs against
4. **Re-evaluation triggers** — when to re-tune (suite grows, flake increases, CI capacity changes)
</output_format>
Before writing, do the math: total compute time = N * D + (shards * O). Make sure shards / runners ratio is sensible.
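To sanity-check the model's answer, the arithmetic in the prompt can be reproduced by hand. The sketch below is one possible reading of the prompt's formulas, not part of the prompt itself. The function name `recommend_shards`, its parameter names, and the 300-second default target (taken from the prompt's 5-minute constraint) are assumptions made for illustration.

```python
import math


def recommend_shards(total_tests, avg_duration_s, runners, overhead_s,
                     target_per_shard_s=300):
    """Apply shards = min(R, ceil(N * D / target_per_shard_seconds))."""
    test_seconds = total_tests * avg_duration_s             # N * D
    wanted = math.ceil(test_seconds / target_per_shard_s)   # shards needed to hit the target
    shards = max(1, min(runners, wanted))                   # capped by available runners
    per_shard_s = test_seconds / shards + overhead_s        # N * D / shards + O
    total_compute_s = test_seconds + shards * overhead_s    # N * D + shards * O
    return {
        "shards": shards,
        "per_shard_s": per_shard_s,
        "total_compute_s": total_compute_s,
        # True when the target asks for more shards than there are runners
        "runner_limited": wanted > runners,
    }


# Hypothetical inputs: 1200 tests, 4 s each, 10 runners, 20 s setup overhead.
plan = recommend_shards(1200, 4, runners=10, overhead_s=20)
print(plan)
```

With these made-up inputs the target calls for 16 shards but only 10 runners exist, so each shard runs about 500 seconds, which is over the 5-minute target. That is the situation the prompt asks the model to call out explicitly.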
Common pitfalls
- Model ignores flake rate and recommends max shards — flaky tests then spread thin and retries don't absorb them.
- Workers per shard set to # CPU cores for Playwright: wrong when browser tests share fixtures (seeded DB, auth state), because concurrent workers then collide on that shared state and fail.
- Alphabetical ordering recommended — masks order-dependent flake. Recommend by-duration-desc instead.
- Fixture cost ignored in math — fewer shards with same total time often better when fixture > 30% of work.
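The case for by-duration-descending ordering can be illustrated with a greedy longest-first assignment. This is a minimal sketch of the general idea, not a description of how Playwright or Jest schedule tests internally. The function name `assign_longest_first`, the file names, and the durations are hypothetical.

```python
import heapq


def assign_longest_first(durations, shards):
    """Place each file, longest first, on the shard with the least work so far."""
    heap = [(0, i) for i in range(shards)]   # (accumulated seconds, shard index)
    heapq.heapify(heap)
    plan = [[] for _ in range(shards)]
    ordered = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, secs in ordered:
        load, idx = heapq.heappop(heap)      # lightest shard right now
        plan[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    loads = [sum(durations[n] for n in files) for files in plan]
    return plan, loads


# Hypothetical historical durations in seconds per spec file.
history = {
    "checkout.spec": 120, "search.spec": 90, "login.spec": 60,
    "cart.spec": 45, "profile.spec": 30, "footer.spec": 15,
}
plan, loads = assign_longest_first(history, shards=2)
print(plan, loads)
```

Starting with the slowest files leaves the short ones to even out the shards at the end, so no shard is left running one long file alone after the others finish.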
Tips
- Profile per-test duration first; the longest 10% often dominate runtime. Optimize those before parallelizing.
- Use Playwright `--shard=X/Y` for file-level sharding; merge reports after.
- For Jest, use `--shard=X/Y` (Jest 28+) plus `--maxWorkers=<n>` for in-shard parallelism.
- Pair with `test-impact-from-git-diff` for fewer tests run per PR (further reduces shard count needed).
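The `--shard=X/Y` tips above can be turned into one command per CI job. The helper below is a small sketch that only builds the command strings. The function name `shard_commands` is hypothetical, and the `npx` prefix assumes the runners are installed as local project dependencies.

```python
def shard_commands(framework, shards, workers):
    """Build one CLI invocation per shard; shard indices are 1-based."""
    if shards < 1 or workers < 1:
        raise ValueError("shards and workers must be at least 1")
    if framework == "playwright":
        template = "npx playwright test --shard={i}/{n} --workers={w}"
    elif framework == "jest":
        # --shard requires Jest 28 or newer
        template = "npx jest --shard={i}/{n} --maxWorkers={w}"
    else:
        raise ValueError(f"unsupported framework: {framework}")
    return [template.format(i=i, n=shards, w=workers)
            for i in range(1, shards + 1)]


for cmd in shard_commands("playwright", shards=3, workers=1):
    print(cmd)
```

Each string would typically become the run step of one matrix job in CI, with the shard index supplied by the matrix.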
FAQ
Use fewer shards when fixture cost (DB setup, browser launch, auth) exceeds 30% of total runtime. Adding shards then multiplies fixture cost faster than it reduces test time. Profile first.
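The 30% rule of thumb can be checked numerically. The sketch below assumes one interpretation: the fixture share is measured per shard, as fixture seconds divided by that shard's total runtime. The function names, the inputs, and that interpretation are assumptions made for illustration.

```python
def fixture_share(total_tests, avg_duration_s, fixture_cost_s, shards):
    """Fraction of one shard's runtime spent on fixtures rather than tests."""
    test_s = total_tests * avg_duration_s / shards   # N * D / shards
    return fixture_cost_s / (fixture_cost_s + test_s)


def max_shards_under_threshold(total_tests, avg_duration_s, fixture_cost_s,
                               runners, threshold=0.30):
    """Largest shard count (up to runners) whose fixture share stays within the threshold."""
    best = 1  # fall back to a single shard if no count meets the threshold
    for shards in range(1, runners + 1):
        share = fixture_share(total_tests, avg_duration_s, fixture_cost_s, shards)
        if share <= threshold:
            best = shards
    return best


# Hypothetical suite: 600 tests, 2 s each, 60 s of fixture setup per shard.
print(round(fixture_share(600, 2, 60, shards=4), 3))
print(round(fixture_share(600, 2, 60, shards=16), 3))
print(max_shards_under_threshold(600, 2, 60, runners=16))
```

In this made-up case the fixture share rises from about 17% at 4 shards to about 44% at 16, and 8 shards is the largest count that stays within the 30% threshold.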
Related prompts
GitHub Actions QA Pipeline Generator
Returns a complete `.github/workflows/qa.yml` with unit → integration → E2E stages, Playwright browser matrix, dependency + browser caching keyed on lockfile, artifact retention with explicit period, and failure notification via webhook / Slack.
GitLab CI Test Pipeline Configuration
Returns a `.gitlab-ci.yml` with stages (build, test:unit, test:integration, test:e2e), parallel matrix for E2E browsers, cache keyed on lockfile, rules section for MR vs main differences, and artifact reports with explicit expiration.
Test Impact Analysis from Git Diff
Reads a git diff and an import graph and returns the minimal test set to run for that diff — direct hits, transitive impact (1-2 levels), and explicit full-suite triggers (config / migration / lock file changes).
Refactor Flaky Test to Stable
Takes a flaky test and its failure history, identifies which of the canonical root causes (race, hard sleep, shared state, network dependency, ordering, animation) is responsible, and produces a rewritten test that fixes the specific cause — no blanket retries.