Run a Root Cause Analysis with AI (5 Whys + Fishbone)
Given a defect description, returns a literal 5 Whys chain, a fishbone diagram (text representation) categorizing contributing factors into People / Process / Technology / Environment, and a list of preventive measures with named owners — never generic recommendations.
When to use it
- A P0 or P1 just happened and you owe a postmortem.
- The same class of bug keeps recurring and you want to understand the systemic cause.
- You're running a blameless postmortem and need a structured artifact.
- You're coaching a team through RCA discipline (most go 2 Whys and stop).
The prompt
XML-tagged — best for Claude 4.x
<role>
You are an incident response facilitator. You enforce LITERAL 5 Whys — not 3, not 7 — and a fishbone with exactly four categories: People, Process, Technology, Environment. You never write a preventive measure without naming an owner.
</role>
<context>
RCA tooling: 5 Whys + Fishbone (Ishikawa). 5 Whys answers "why" five times to dig past surface causes to the contributing system. Fishbone groups contributing factors into 4 categories so analysis isn't lopsided to "Technology". Preventive measures map to systemic changes, not "be more careful".
</context>
<task>
For the defect below, produce:
1. **5 Whys chain** — exactly 5 layers. Each "why" answers the previous level, drilling toward systemic causes.
2. **Fishbone (text)** — 4 categories (People, Process, Technology, Environment), each with 2-4 contributing factors specific to this defect.
3. **Preventive measures** — 3-5 measures, each with a named owner role (not a person, but a function: "Engineering Lead", "QA", "Platform Team"). Reject "be more careful" or other non-actionable items.
4. **One key insight** — the single observation about the system the team should not forget after this incident.
</task>
<input>
Defect description: {defect}
Impact: {impact}
Timeline (if known): {timeline}
</input>
<constraints>
- 5 Whys is literal: 5 levels, no fewer. If you can't legitimately reach 5, say so and stop at the highest defensible level.
- Fishbone uses exactly: People / Process / Technology / Environment. No substituting "Tools" for "Technology" or adding a 5th category.
- Preventive measures must have an owner role and be actionable (testable, scheduleable).
- Avoid "human error" as a root cause — find the system condition that allowed the human error.
</constraints>
<output_format>
Four sections:
1. **5 Whys chain** — numbered list 1-5
2. **Fishbone** — 4 categories as H3 headings with bullet lists
3. **Preventive measures** — table: Measure | Owner | Target completion (placeholder)
4. **Key insight** — one paragraph
</output_format>
Before writing, distinguish proximate cause (what triggered it now) from systemic cause (what allowed it).
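The `{defect}`, `{impact}` and `{timeline}` placeholders in the `<input>` block can be filled programmatically before the prompt is sent. The following is a minimal sketch, not part of the original prompt: the helper name `fill_input`, the "unknown" fallback for a missing timeline, and the sample incident text are all assumptions made for illustration.

```python
# Hypothetical helper: fills the placeholders of the <input> block.
# Only the block itself comes from the prompt; everything else is illustrative.

INPUT_BLOCK = """<input>
Defect description: {defect}
Impact: {impact}
Timeline (if known): {timeline}
</input>"""


def fill_input(defect: str, impact: str, timeline: str = "") -> str:
    """Return the <input> block with all three placeholders substituted."""
    if not defect.strip():
        raise ValueError("defect description is required")
    if not impact.strip():
        raise ValueError("impact is required")
    # The prompt marks the timeline as optional, so fall back to a marker
    # (assumed wording) instead of leaving the line empty.
    return INPUT_BLOCK.format(
        defect=defect.strip(),
        impact=impact.strip(),
        timeline=timeline.strip() or "unknown",
    )


# Made-up incident, used only to show the call.
filled = fill_input(
    defect="Checkout API returned HTTP 500 after a config deploy",
    impact="Customers could not pay for about 40 minutes",
)
print(filled)
```

Rejecting an empty defect or impact up front keeps a half-filled prompt from reaching the model, which would otherwise invite a generic answer.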
Common pitfalls
- Model stops at 3 Whys and settles on a surface cause such as 'it was a bug'; force the literal 5.
- Fishbone gets only Technology populated; the other categories get sparse 'N/A' entries. Demand 2-4 factors per category.
- Preventive measures default to vague items such as 'Add more tests' or 'Review PRs more carefully'. Require a named owner role and an action that can be tested and scheduled.
- 'Human error' shows up as root cause — that's almost never the right level; push to the system condition that allowed it.
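The pitfalls above can be caught mechanically before a human reads the draft. The sketch below is one possible checker, not a reference implementation. It assumes the model followed the requested output format closely: section titles are markdown headings, the chain sits under a heading containing "5 Whys", fishbone categories are H3 headings with `-` or `*` bullets, and the measures are a pipe table under a heading containing "Preventive". The function names are hypothetical, and a section that is missing entirely is only reported for the Whys chain and the fishbone.

```python
import re

CATEGORIES = ["People", "Process", "Technology", "Environment"]
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown):
    """Split markdown into (level, title, body_lines) tuples, one per heading."""
    sections = []
    for line in markdown.splitlines():
        match = HEADING.match(line.strip())
        if match:
            sections.append((len(match.group(1)), match.group(2).strip(), []))
        elif sections:
            sections[-1][2].append(line.strip())
    return sections


def check_rca(markdown):
    """Return a list of problems; an empty list means the draft passed."""
    problems = []
    sections = split_sections(markdown)

    # Pitfall 1: the chain must have exactly five numbered levels.
    whys = [line for _, title, body in sections if "5 whys" in title.lower()
            for line in body if re.match(r"^\d+[.)]\s+", line)]
    if len(whys) != 5:
        problems.append(f"expected 5 whys, found {len(whys)}")

    # Pitfall 4: "human error" should not be the final (root) level.
    if whys and "human error" in whys[-1].lower():
        problems.append("root cause is 'human error'")

    # Pitfall 2: exactly four H3 categories, each with 2-4 bullet factors.
    h3 = {title: [line for line in body if line.startswith(("- ", "* "))]
          for level, title, body in sections if level == 3}
    if sorted(h3) != sorted(CATEGORIES):
        problems.append(f"fishbone categories are {sorted(h3)}")
    for name, bullets in h3.items():
        if not 2 <= len(bullets) <= 4:
            problems.append(f"{name}: {len(bullets)} factors (want 2-4)")

    # Pitfall 3: every table row needs a non-empty Owner cell (2nd column).
    for _, title, body in sections:
        if "preventive" not in title.lower():
            continue
        rows = [line for line in body if line.startswith("|")][2:]  # skip header + separator
        for row in rows:
            cells = [cell.strip() for cell in row.strip("|").split("|")]
            if len(cells) < 2 or not cells[1]:
                problems.append(f"measure without owner: {cells[0]!r}")
    return problems


# Deliberately weak draft (made up) that trips several checks.
draft = """## 5 Whys chain
1. The deploy failed.
2. The config was wrong.
3. Human error.
### Technology
- Config was not validated
"""
for problem in check_rca(draft):
    print(problem)
```

A check like this only verifies shape. Whether each "why" really answers the previous one, or whether a measure is actionable, still needs a reviewer.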
Tips
- Include the timeline — proximate cause vs systemic cause is much clearer with timestamps.
- Run RCA within 5 working days of the incident; recall fades quickly.
- Pair with `bug-severity-priority` to set the right urgency for the preventive measures.
- Diff this RCA's preventive measures against previous postmortems — repeats signal you're not following through.
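The last tip, diffing new preventive measures against earlier postmortems, can be roughed out with fuzzy string matching. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The function names, the 0.8 threshold, and the sample measures are assumptions chosen for illustration and would need tuning on real postmortem data.

```python
from difflib import SequenceMatcher


def normalize(measure: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not hide a repeat."""
    return " ".join(measure.lower().split())


def find_repeats(new_measures, past_measures, threshold=0.8):
    """Return (new, past, score) for each new measure that closely matches a past one."""
    repeats = []
    for new in new_measures:
        best_score, best_past = 0.0, None
        for past in past_measures:
            score = SequenceMatcher(None, normalize(new), normalize(past)).ratio()
            if score > best_score:
                best_score, best_past = score, past
        # Assumed cut-off: scores at or above the threshold count as a repeat.
        if best_past is not None and best_score >= threshold:
            repeats.append((new, best_past, round(best_score, 2)))
    return repeats


# Made-up measures from a new RCA and from earlier postmortems.
new = ["Add config validation to CI", "Document the rollback runbook"]
past = ["add config validation to  CI", "Rotate on-call weekly"]

for new_item, past_item, score in find_repeats(new, past):
    print(f"repeat ({score}): {new_item!r} was already proposed as {past_item!r}")
```

Character-level similarity misses repeats that are reworded heavily, so a flagged list is a starting point for the follow-through conversation, not proof that nothing else recurred.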
FAQ
Stop before the fifth 'why' only when you reach a cause that is outside your team's locus of control (e.g., 'because of an OS vendor bug'). At that point, name the dependency and stop, but every 'why' before that should be inside your control.
Related prompts
Write a Detailed Bug Report
Takes a free-form issue description (Slack message, email, support ticket) and returns a structured bug report following the AQA Pro Bug Report Template — clear `[Component] Verb-noun` title, environment, separate severity and priority, numbered atomic repro steps, expected vs actual, and suggested investigation areas.
Bug Triage: Severity and Priority Assigner
Reads a bug description and assigns SEVERITY (impact on system, 1-4) and PRIORITY (urgency to fix, 1-4) on independent scales, each with a written justification, plus a recommended SLA target. Refuses to collapse the two dimensions into one score.
Duplicate Bug Detector
Given a new bug description and N existing bug summaries, returns a ranked list of duplicate candidates with similarity scores (0-100) based on ROOT-CAUSE likelihood rather than surface text — with one-line evidence per candidate.
Analyze Performance Bottlenecks from Results
Reads a load test result summary (latency percentiles, throughput, error rate, system metrics) and returns a ranked list of suspected bottleneck layers — network, application, database, dependent service, or infrastructure — each with evidence cited from the metrics and a recommended next investigation step.