Analyze Performance Bottlenecks with AI
Reads a load test result summary (latency percentiles, throughput, error rate, system metrics) and returns a ranked list of suspected bottleneck layers — network, application, database, dependent service, or infrastructure — each with evidence cited from the metrics and a recommended next investigation step.
When to use it
- A load test failed thresholds and you need direction before deep profiling.
- Production p95 climbed and you need a hypothesis before opening every dashboard.
- Comparing two load test runs to explain a regression.
- Teaching engineers to read performance data: every hypothesis cites the metrics behind it, so the reasoning is visible.
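For the run-comparison case, it can help to paste metric deltas rather than two raw summaries. Below is a minimal Python sketch. It assumes each run has already been reduced to a flat `{metric_name: value}` dict. The helper names `diff_runs` and `format_diff`, the metric names, and the numbers are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative assumption: each run is summarized as {metric_name: value}.
Row = Tuple[str, float, float, Optional[float]]

def diff_runs(baseline: Dict[str, float], candidate: Dict[str, float]) -> List[Row]:
    """Compare metrics present in both runs, largest relative change first."""
    rows: List[Row] = []
    for name in sorted(baseline.keys() & candidate.keys()):
        before, after = baseline[name], candidate[name]
        # Percent change is undefined when the baseline value is 0.
        pct = None if before == 0 else (after - before) / before * 100.0
        rows.append((name, before, after, pct))
    # Largest absolute change first; undefined changes sort last.
    rows.sort(key=lambda r: float("inf") if r[3] is None else -abs(r[3]))
    return rows

def format_diff(rows: List[Row]) -> str:
    """Render rows as plain text suitable for pasting into the {results} input."""
    out = []
    for name, before, after, pct in rows:
        change = "n/a" if pct is None else f"{pct:+.1f}%"
        out.append(f"{name}: {before:g} -> {after:g} ({change})")
    return "\n".join(out)

# Hypothetical numbers, for illustration only.
baseline = {"p95_ms": 180, "p99_ms": 420, "rps": 1200, "error_rate_pct": 0.1, "db_pool_wait_ms": 0}
candidate = {"p95_ms": 310, "p99_ms": 1900, "rps": 950, "error_rate_pct": 0.4, "db_pool_wait_ms": 45}
print(format_diff(diff_runs(baseline, candidate)))
```

Sorting by relative change puts the metrics that moved most at the top. This is only a starting point for the model, not evidence that those metrics are the cause.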
The prompt
XML-tagged — best for Claude 4.x
<role>
You are a performance engineer who has done capacity planning for high-scale systems. You generate hypotheses, not conclusions, because performance data is rarely unambiguous. Every claim cites the metric that supports it.
</role>
<context>
Bottleneck layers (typical order of investigation):
- **Network** — latency variance, packet loss, regional routing
- **Application** — CPU saturation, GC pauses, lock contention
- **Database** — connection pool exhaustion, slow queries, lock waits
- **Dependent service** — third-party latency, retries, rate limits
- **Infrastructure** — instance class limits, disk I/O, memory pressure
Each layer has signature metrics. Don't conclude from one signal; corroborate from multiple.
</context>
<task>
For the test results below:
1. Identify 2-4 candidate bottleneck layers, with one hypothesis per layer.
2. Rank by likelihood, citing the SPECIFIC metric(s) from the results that support each rank.
3. Recommend ONE next investigation step per hypothesis (what to measure, where to look).
4. Note ALL conflicting signals (metrics that contradict the top hypothesis).
5. Flag when the data is insufficient to rank with confidence, and name the additional data that would help.
</task>
<input>
Test result summary (metrics): {results}
System architecture (services, dependencies): {architecture}
What you tested (load profile): {profile}
Recent changes (deploys, config): {changes}
</input>
<constraints>
- Never claim "the bottleneck is X" without 2+ supporting metrics.
- Rank by likelihood, not order of discovery.
- Surfacing conflicting signals is MANDATORY: if everything aligns, the data is probably oversimplified.
- Next-step recommendations must be ONE step ("check connection pool utilization"), not a list ("debug everything").
- Avoid generic recommendations ("profile the application").
</constraints>
<output_format>
Four sections:
1. **Ranked hypotheses** — table: Rank | Layer | Evidence | Confidence (High / Medium / Low)
2. **Next steps** — table: Hypothesis | Next investigation step
3. **Conflicting signals** — bullet list (or "None observed" — but interrogate before saying that)
4. **Data gaps** — what's missing that would sharpen the analysis
</output_format>
Before writing, identify the FIRST point in the load profile where things degraded; the timing is the most informative signal.
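One way to fill the prompt's four placeholders (`{results}`, `{architecture}`, `{profile}`, `{changes}`) programmatically is a single regex pass. This is a sketch and is not part of the prompt itself. `render_prompt` and `PLACEHOLDERS` are hypothetical names. Rejecting empty inputs is a design choice: it forces you to write "none" for recent changes instead of leaving the field blank.

```python
import re

# The four placeholders used in the <input> block of the prompt.
PLACEHOLDERS = ("results", "architecture", "profile", "changes")
_PATTERN = re.compile(r"\{(" + "|".join(PLACEHOLDERS) + r")\}")

def render_prompt(template: str, values: dict) -> str:
    """Substitute all four placeholders in a single pass.

    Empty inputs are rejected so that "no recent changes" must be stated explicitly.
    """
    missing = [k for k in PLACEHOLDERS if not str(values.get(k, "")).strip()]
    if missing:
        raise ValueError("missing input(s): " + ", ".join(missing))
    # One re.sub pass: text inside a substituted value is never scanned again.
    return _PATTERN.sub(lambda m: str(values[m.group(1)]).strip(), template)

# Only the <input> block is shown here; in practice, pass the whole prompt text.
input_block = (
    "Test result summary (metrics): {results}\n"
    "System architecture (services, dependencies): {architecture}\n"
    "What you tested (load profile): {profile}\n"
    "Recent changes (deploys, config): {changes}"
)
# Hypothetical values, for illustration only.
values = {
    "results": "p95 310 ms, p99 1900 ms, 950 rps, 0.4% errors",
    "architecture": "API -> Postgres; API -> payments service",
    "profile": "ramp 0 to 1000 rps over 10 min, hold 20 min",
    "changes": "none",
}
print(render_prompt(input_block, values))
```

The single pass matters when a pasted value happens to contain placeholder-like text, such as a JSON export that includes `{changes}`. Chained `str.replace` calls would substitute that text a second time.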
Common pitfalls
- Model jumps to a single conclusion ('it's the database') without acknowledging other possible causes.
- Confidence levels get omitted and every hypothesis is treated as equally likely. Re-prompt for a ranking with confidence levels.
- Conflicting signals get skipped because the model leans toward a clean, helpful-sounding answer. Force the section.
- Recommendations get generic ('profile the application'). Demand specific, actionable next steps.
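Some of the pitfalls above can be caught mechanically before you read the response closely. Below is a rough Python sketch. It assumes the model followed the requested output format: a pipe-delimited `Rank | Layer | Evidence | Confidence` table and a section titled "Conflicting signals". `check_response` is a hypothetical helper. String matching like this can only flag candidates for a re-prompt; it cannot judge whether the cited evidence is sound.

```python
VALID_CONFIDENCE = {"high", "medium", "low"}
# Phrases the prompt itself names as too generic.
GENERIC_PHRASES = ("profile the application", "debug everything")

def check_response(text: str) -> list:
    """Return warnings for common pitfalls (an empty list means none were found)."""
    warnings = []
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Assumed table shape: Rank | Layer | Evidence | Confidence, with an integer rank.
        if len(cells) == 4 and cells[0].strip("*").isdigit():
            rows.append(cells)

    if len(rows) < 2:
        warnings.append("fewer than 2 ranked hypotheses (single-conclusion risk)")
    levels = [row[3].strip("*").strip().lower() for row in rows]
    for row, level in zip(rows, levels):
        if level not in VALID_CONFIDENCE:
            warnings.append(f"rank {row[0]}: confidence missing or invalid ({row[3]!r})")
    if len(rows) >= 2 and len(set(levels)) == 1:
        warnings.append("every hypothesis has the same confidence level")

    lowered = text.lower()
    if "conflicting signals" not in lowered:
        warnings.append("no conflicting-signals section")
    for phrase in GENERIC_PHRASES:
        if phrase in lowered:
            warnings.append(f"generic recommendation: {phrase!r}")
    return warnings
```

Any warning is a cue to re-prompt with the specific fix, for example "add a confidence level to rank 1".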
Tips
- Feed the full timeline (when did each metric change?) — the order of degradation is the most informative signal.
- Include 'recent changes' even if you don't think they're related; the model often surfaces correlations a human glossed over.
- Pair with `bug-repro-from-logs` when you have application/HAR logs alongside the metrics.
- Re-run as data comes in from the next-step investigation — the analysis sharpens with each iteration.
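To build that timeline from raw time series, you can compute when each metric first moved away from its baseline and then sort by that onset time. Below is a minimal sketch with stated assumptions:

- The first few samples represent a healthy baseline.
- "Degraded" means more than 25% worse than that baseline.
- Throughput-style metrics are listed in `lower_is_worse`.

The function name, thresholds, and sample data are all hypothetical.

```python
from statistics import mean

def degradation_onsets(series, baseline_samples=3, tolerance=0.25, lower_is_worse=frozenset()):
    """Return (time, metric) for the first degraded sample of each metric, earliest first.

    series: metric name -> list of (t_seconds, value), sorted by time.
    Baseline: mean of the first `baseline_samples` points (assumed healthy).
    Degraded: worse than baseline by more than `tolerance` (0.25 = 25%).
    With a baseline of 0 on a higher-is-worse metric, any positive value counts.
    """
    onsets = []
    for name, points in series.items():
        if len(points) <= baseline_samples:
            continue  # too few samples to establish a baseline
        base = mean(v for _, v in points[:baseline_samples])
        for t, v in points[baseline_samples:]:
            if name in lower_is_worse:  # e.g. throughput
                degraded = v < base * (1 - tolerance)
            else:  # e.g. latency, error rate, pool wait
                degraded = v > base * (1 + tolerance)
            if degraded:
                onsets.append((t, name))
                break
    return sorted(onsets)

# Hypothetical samples taken every 60 s during a ramp.
series = {
    "p95_ms": [(0, 180), (60, 185), (120, 178), (180, 190), (240, 260), (300, 410)],
    "db_pool_wait_ms": [(0, 2), (60, 2), (120, 3), (180, 15), (240, 40), (300, 80)],
    "rps": [(0, 1000), (60, 1000), (120, 1000), (180, 1000), (240, 980), (300, 700)],
    "cpu_pct": [(0, 40), (60, 42), (120, 41), (180, 43), (240, 45), (300, 47)],
}
for t, name in degradation_onsets(series, lower_is_worse={"rps"}):
    print(f"t={t}s  {name} degraded")
```

In this made-up run, pool wait degrades first, p95 a minute later, and throughput last, while CPU never crosses the threshold. That ordering is worth stating explicitly in the results you paste in.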
FAQ
Signature of a GC-pause bottleneck: app CPU spikes (rather than staying steadily high), p99 is much worse than p95 (long-tail latency from stop-the-world pauses), and there's no DB or network correlation. Check GC pause logs to confirm.
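The pattern above can be turned into a quick screening check: CPU spikes, a long p99 tail, and no DB or network correlation. This is a heuristic sketch. `gc_pause_signature`, the 3x p99/p95 ratio, and the 1.5x peak-to-average CPU ratio are illustrative assumptions, not established thresholds. The check also does not test for DB or network correlation, which still has to be ruled out separately before you go to the GC pause logs.

```python
from statistics import mean

def gc_pause_signature(p95_ms, p99_ms, cpu_samples, tail_ratio=3.0, spike_ratio=1.5):
    """Heuristic screen for the GC-pause pattern (thresholds are assumptions).

    cpu_spiky: peak CPU is well above average CPU (spikes, not steady high load).
    long_tail: p99 is at least `tail_ratio` times p95.
    DB/network correlation is NOT checked here.
    """
    if not cpu_samples:
        raise ValueError("need at least one CPU sample")
    avg = mean(cpu_samples)
    cpu_spiky = avg > 0 and max(cpu_samples) >= spike_ratio * avg
    long_tail = p99_ms >= tail_ratio * p95_ms
    return {"cpu_spiky": cpu_spiky, "long_tail": long_tail, "suspect_gc": cpu_spiky and long_tail}

# Hypothetical inputs: spiky CPU with a long tail vs. steadily high CPU.
print(gc_pause_signature(200, 900, [35, 38, 90, 36, 37, 88, 35]))
print(gc_pause_signature(200, 900, [85, 88, 86, 90, 87]))
```

A `suspect_gc` result only nominates a hypothesis. Confirmation still comes from the GC pause logs.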
Related prompts
Design a Load Test Strategy
Returns a load test strategy covering 5 scenario types (baseline / load / stress / spike / soak) with thresholds for response time, throughput, and error rate, plus environment requirements, monitoring checkpoints, pass/fail criteria, and an explicit environment-parity statement.
Generate k6 Test Script from Endpoint
Reads an endpoint description and returns a ready-to-run k6 script with `options.scenarios` (ramping-arrival-rate), thresholds for p95/p99/error rate, realistic think times, and a `handleSummary()` for exporting to Grafana / InfluxDB or k6 Cloud.
Reproduce a Bug from Logs
Reads application logs, HAR files, or browser console excerpts and reconstructs a step-by-step reproduction recipe with timestamps, the failing request, suspected preconditions, and a confidence flag per inferred step.
Root Cause Analysis (5 Whys + Fishbone)
Given a defect description, returns a literal 5 Whys chain, a fishbone diagram (text representation) categorizing contributing factors into People / Process / Technology / Environment, and a list of preventive measures with named owners — never generic recommendations.