
Design a Comprehensive Load Test Strategy with AI

Updated 2026-06-08·intermediate·Performance Testing

Returns a load test strategy covering five scenario types (baseline, load, stress, spike, soak) with response-time, throughput, and error-rate thresholds, plus environment requirements, monitoring checkpoints, pass/fail criteria, and an explicit environment-parity statement.

When to use it

  • Designing performance testing for a new service or major release.
  • Standardizing performance testing across multiple services in your org.
  • Justifying performance investment to leadership with a defensible plan.
  • Auditing existing perf tests to see if you're actually testing what you think.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are a performance engineer. You know the difference between load, stress, spike, and soak testing, and you require ENVIRONMENT PARITY, because a load test against a 1-CPU staging environment proves nothing about a 16-CPU production environment.
</role>

<context>
Five canonical scenario types:
- **Baseline** — A single user issuing requests one at a time: establish an unloaded performance floor.
- **Load** — Expected production traffic sustained for 15-30 min: verify the system handles a normal day.
- **Stress** — Gradually increasing load until the breaking point: find the limit.
- **Spike** — Sudden traffic surge (e.g., 5x in 30 seconds): test elasticity.
- **Soak** — Steady load for hours: detect memory leaks and connection-pool exhaustion.

Each scenario has different success criteria. Thresholds are per-scenario, not global.
</context>

<task>
For the system below, produce a load test strategy with:
1. **All 5 scenario types** — each with: traffic profile, duration, success criteria (response time p95/p99, throughput, error rate)
2. **Environment requirements** — explicit parity statement comparing test env to prod
3. **Test data volume** — fixtures, seeded data, third-party stubs
4. **Monitoring** — what's captured during runs, alert thresholds
5. **Success criteria** — overall pass/fail definition for each scenario
</task>

<input>
System description: {system}
Expected production traffic: {traffic}
Acceptable response time / error rate: {sla}
Known constraints (environment limits, partner rate limits): {constraints}
</input>

<constraints>
- All 5 scenarios MUST appear; do not conflate or omit.
- Each scenario has p95 AND p99 thresholds (or justify omitting p99).
- Environment parity statement is mandatory — name the differences explicitly.
- Soak duration is hours (not minutes); spike is seconds-to-minutes.
- Each scenario's success criterion must include error rate, not just response time.
</constraints>

<output_format>
Six sections:
1. **Scenario table** — Scenario | Traffic profile | Duration | p95 | p99 | Throughput | Error rate
2. **Environment parity** — paragraph explicitly comparing test env to prod
3. **Test data volume** — bullets
4. **Monitoring** — what's captured + alert thresholds
5. **Run cadence** — when each scenario runs (per-PR, nightly, pre-release)
6. **Overall pass/fail** — paragraph defining what "performance is healthy" means
</output_format>

Before writing, identify any scenario type that doesn't apply (rare but possible). Keep its row in the table and explain why it doesn't apply instead of filling it in pro forma.
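The per-scenario success criteria the prompt asks for (p95, p99, throughput, error rate) reduce to a simple check once a run's raw samples are collected. Here is a minimal Python sketch, assuming one latency sample per completed request plus a count of failed requests. `Thresholds`, `evaluate_run`, and every numeric threshold are hypothetical, and percentiles use the nearest-rank method (load tools may interpolate differently).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Thresholds:
    """Per-scenario success criteria; the values used below are illustrative."""
    p95_ms: float
    p99_ms: float
    min_rps: float
    max_error_rate: float  # fraction of requests, e.g. 0.01 means 1%


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_run(latencies_ms: List[float], errors: int, duration_s: float,
                 t: Thresholds) -> Tuple[bool, List[str]]:
    """Judge one scenario run: it passes only if every criterion holds."""
    total = len(latencies_ms)  # one latency per completed request, failed or not
    p95 = percentile(latencies_ms, 95)
    p99 = percentile(latencies_ms, 99)
    rps = total / duration_s
    error_rate = errors / total
    failures = []
    if p95 > t.p95_ms:
        failures.append(f"p95 {p95:.0f} ms > {t.p95_ms:.0f} ms")
    if p99 > t.p99_ms:
        failures.append(f"p99 {p99:.0f} ms > {t.p99_ms:.0f} ms")
    if rps < t.min_rps:
        failures.append(f"throughput {rps:.1f} req/s < {t.min_rps:.1f} req/s")
    if error_rate > t.max_error_rate:
        failures.append(f"error rate {error_rate:.2%} > {t.max_error_rate:.2%}")
    return not failures, failures


# Example: 100 requests over 1 s against hypothetical "load" thresholds.
load = Thresholds(p95_ms=300, p99_ms=800, min_rps=50, max_error_rate=0.01)
latencies = [100.0] * 95 + [500.0] * 4 + [1200.0]
print(evaluate_run(latencies, errors=0, duration_s=1.0, t=load))  # (True, [])
print(evaluate_run(latencies, errors=2, duration_s=1.0, t=load))  # (False, ['error rate 2.00% > 1.00%'])
```

The second run has identical latencies but fails on error rate alone, which is why the prompt insists every scenario's criterion includes error rate and not just response time.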


Common pitfalls

  • The model conflates load and stress as 'increasing load'. Force the explicit distinction: load holds expected traffic steady; stress keeps increasing it until something breaks.
  • Soak gets a 30-minute duration — that's not soak. Force hours.
  • The spike test runs as a 5-minute ramp, so the 'sudden' part gets lost. Force a seconds-scale ramp.
  • Environment parity gets glossed over with 'use a representative environment'. Demand specific named differences.
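The pitfalls above are mechanical enough to check once the model's scenario table is parsed into data. A minimal Python sketch, assuming a hypothetical `Scenario` record whose traffic profile is a list of (duration in seconds, target req/s) stages; the one-hour soak floor and 60-second spike ramp are illustrative cut-offs, not standards.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stage shape: ramp linearly to `target` req/s over `duration_s`.
Stage = Tuple[int, float]

MIN_SOAK_S = 3600       # illustrative floor for "hours, not minutes"
MAX_SPIKE_RAMP_S = 60   # illustrative ceiling for a "sudden" surge
REQUIRED = {"baseline", "load", "stress", "spike", "soak"}


@dataclass
class Scenario:
    name: str
    stages: List[Stage]
    p95_ms: Optional[float] = None
    p99_ms: Optional[float] = None
    max_error_rate: Optional[float] = None


def total_duration(stages: List[Stage]) -> int:
    return sum(duration for duration, _ in stages)


def peak(stages: List[Stage]) -> float:
    return max((target for _, target in stages), default=0.0)


def has_sudden_jump(stages: List[Stage], factor: float = 2.0) -> bool:
    """True if some stage multiplies the previous target by `factor` within MAX_SPIKE_RAMP_S."""
    prev = 0.0
    for duration, target in stages:
        if prev > 0 and target >= factor * prev and duration <= MAX_SPIKE_RAMP_S:
            return True
        prev = target
    return False


def lint_strategy(scenarios: List[Scenario], parity_differences: List[str]) -> List[str]:
    """Return pitfall findings; an empty list means none were detected."""
    problems = []
    by_name = {s.name: s for s in scenarios}
    for name in sorted(REQUIRED - set(by_name)):
        problems.append(f"missing scenario: {name}")
    for s in scenarios:
        if s.p95_ms is None or s.p99_ms is None:
            problems.append(f"{s.name}: needs both p95 and p99 thresholds")
        if s.max_error_rate is None:
            problems.append(f"{s.name}: needs an error-rate threshold")
    if "soak" in by_name and total_duration(by_name["soak"].stages) < MIN_SOAK_S:
        problems.append("soak: shorter than one hour")
    if "spike" in by_name and not has_sudden_jump(by_name["spike"].stages):
        problems.append("spike: no surge that at least doubles traffic within seconds")
    if "load" in by_name and "stress" in by_name:
        if peak(by_name["stress"].stages) <= peak(by_name["load"].stages):
            problems.append("stress: never exceeds expected load (conflated with load)")
    if not parity_differences:
        problems.append("parity: no named differences between test and prod")
    return problems


strategy = [
    Scenario("baseline", [(300, 1)], 120, 200, 0.0),
    Scenario("load", [(120, 200), (1800, 200)], 300, 800, 0.01),
    Scenario("stress", [(600, 400), (600, 800), (600, 1200)], 1000, 2000, 0.05),
    Scenario("spike", [(60, 200), (30, 1000), (120, 1000), (30, 200)], 800, 1500, 0.02),
    Scenario("soak", [(300, 150), (4 * 3600, 150)], 300, 800, 0.01),
]
print(lint_strategy(strategy, ["staging: 4 vCPU per node vs 16 vCPU in prod"]))  # []

too_short_soak = strategy[:4] + [Scenario("soak", [(1800, 150)], 300, 800, 0.01)]
print(lint_strategy(too_short_soak, []))
# ['soak: shorter than one hour', 'parity: no named differences between test and prod']
```

The parity check only confirms that differences were named; whether those differences invalidate the results still needs a human reading the statement.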

Tips

  • Run the baseline before EVERY load test; without a floor, you can't tell whether performance regressed.
  • Save the response-time histograms; year-over-year comparisons surface drift that day-to-day comparisons miss.
  • Pair with `k6-script-generator` to produce executable test code for each scenario.
  • Pair with `performance-bottleneck-analysis` to interpret the results when something fails.
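The baseline and histogram tips combine naturally: store each baseline run's latency histogram and compare later runs against it. A minimal Python sketch, assuming a hypothetical export format mapping bucket upper bound (ms) to request count, and an illustrative 10% tolerance on p95; `histogram_percentile` and `regressed` are illustrative helpers, not part of any tool.

```python
from typing import Dict

# Hypothetical export format: {bucket upper bound in ms: request count in that bucket}.
Histogram = Dict[float, int]


def histogram_percentile(buckets: Histogram, pct: float) -> float:
    """Upper bound of the bucket containing the pct-th percentile (a conservative estimate)."""
    total = sum(buckets.values())
    if total == 0:
        raise ValueError("empty histogram")
    needed = pct * total / 100
    seen = 0
    for upper_ms in sorted(buckets):
        seen += buckets[upper_ms]
        if seen >= needed:
            return upper_ms
    return max(buckets)  # only reached if pct > 100; kept as a safe fallback


def regressed(baseline: Histogram, current: Histogram,
              pct: float = 95.0, tolerance: float = 0.10) -> bool:
    """True if the current percentile exceeds the baseline's by more than `tolerance`."""
    before = histogram_percentile(baseline, pct)
    after = histogram_percentile(current, pct)
    return after > before * (1 + tolerance)


last_year = {50: 700, 100: 250, 200: 40, 400: 10}
today = {50: 500, 100: 300, 200: 150, 400: 50}
print(histogram_percentile(last_year, 95), histogram_percentile(today, 95))  # 100 200
print(regressed(last_year, today))  # True
```

Because the estimate is a bucket's upper bound, a regression only registers once the percentile moves into a higher bucket, so bucket boundaries should be fine-grained around your thresholds.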

FAQ

How is a stress test different from a spike test?

Stress finds the breaking point via gradual increase. Spike tests recovery from a sudden surge at a known level. Stress answers 'where do we break?'. Spike answers 'do we survive Black Friday's first 30 seconds?'.
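The stress/spike distinction shows up directly in the traffic profile. A minimal Python sketch, assuming each scenario is described as linear-ramp stages of (duration in seconds, target requests per second), similar in spirit to the `stages` lists k6's ramping executors accept, and a hypothetical expected load of 200 req/s. `rate_at` and `seconds_until` are illustrative helpers, not part of any tool.

```python
from typing import List, Optional, Tuple

# (duration_s, target req/s at the end of the stage); each stage ramps linearly.
Stage = Tuple[float, float]

EXPECTED_RPS = 200.0  # hypothetical normal production traffic


def rate_at(stages: List[Stage], t: float, start_rps: float = 0.0) -> float:
    """Target request rate at time t, interpolating linearly within each stage."""
    elapsed, prev = 0.0, start_rps
    for duration, target in stages:
        if t <= elapsed + duration:
            frac = (t - elapsed) / duration if duration else 1.0
            return prev + (target - prev) * frac
        elapsed += duration
        prev = target
    return prev  # after the last stage, hold the final target


def seconds_until(stages: List[Stage], rps: float, step: float = 1.0) -> Optional[float]:
    """First sampled time at which the profile reaches `rps`, or None if it never does."""
    end = sum(duration for duration, _ in stages)
    t = 0.0
    while t <= end:
        if rate_at(stages, t) >= rps:
            return t
        t += step
    return None


# Stress: step up gradually past expected traffic to find where the system breaks.
stress = [(300, 200), (300, 400), (300, 600), (300, 800), (300, 1000)]
# Spike: warm up to normal traffic, jump 5x in 30 s, hold, then drop back to watch recovery.
spike = [(120, 200), (30, 1000), (180, 1000), (30, 200), (300, 200)]

surge = 5 * EXPECTED_RPS
print(seconds_until(stress, surge))  # 1500.0: reached after 25 minutes of gradual ramping
print(seconds_until(spike, surge))   # 150.0: reached 30 s after the surge starts at t=120
```

Both profiles reach 5x expected traffic, but stress takes 25 minutes to get there and spike takes 30 seconds; that gap in ramp speed is what separates "where do we break?" from "do we survive the surge?".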
