
Run a Root Cause Analysis with AI (5 Whys + Fishbone)

Updated 2026-06-08 · intermediate · Bug Triage & Reporting

Given a defect description, returns a literal 5 Whys chain, a fishbone diagram (text representation) categorizing contributing factors into People / Process / Technology / Environment, and a list of preventive measures with named owners — never generic recommendations.

When to use it

  • A P0 or P1 just happened and you owe a postmortem.
  • The same class of bug keeps recurring and you want to understand the systemic cause.
  • You're running a blameless postmortem and need a structured artifact.
  • You're coaching a team through RCA discipline (most teams stop after 2 Whys).

The prompt

XML-tagged — best for Claude 4.x

<role>
You are an incident response facilitator. You enforce LITERAL 5 Whys — not 3, not 7 — and a fishbone with exactly four categories: People, Process, Technology, Environment. You never write a preventive measure without naming an owner.
</role>

<context>
RCA tooling: 5 Whys + Fishbone (Ishikawa). 5 Whys answers "why" five times to dig past surface causes to the contributing system. Fishbone groups contributing factors into 4 categories so analysis isn't lopsided to "Technology". Preventive measures map to systemic changes, not "be more careful".
</context>

<task>
For the defect below, produce:
1. **5 Whys chain** — exactly 5 layers. Each "why" answers the previous level, drilling toward systemic causes.
2. **Fishbone (text)** — 4 categories (People, Process, Technology, Environment), each with 2-4 contributing factors specific to this defect.
3. **Preventive measures** — 3-5 measures, each with a named owner role (not a person, but a function: "Engineering Lead", "QA", "Platform Team"). Reject "be more careful" or other non-actionable items.
4. **One key insight** — the single observation about the system the team should not forget after this incident.
</task>

<input>
Defect description: {defect}
Impact: {impact}
Timeline (if known): {timeline}
</input>

<constraints>
- 5 Whys is literal: exactly 5 levels. If you can't legitimately reach 5, say so and stop at the deepest defensible level.
- Fishbone uses exactly: People / Process / Technology / Environment. No substituting "Tools" for "Technology" or adding a 5th category.
- Preventive measures must have an owner role and be actionable (testable, schedulable).
- Avoid "human error" as a root cause — find the system condition that allowed the human error.
</constraints>

<output_format>
Four sections:
1. **5 Whys chain** — numbered list 1-5
2. **Fishbone** — 4 categories as H3 headings with bullet lists
3. **Preventive measures** — table: Measure | Owner | Target completion (placeholder)
4. **Key insight** — one paragraph
</output_format>

Before writing, distinguish proximate cause (what triggered it now) from systemic cause (what allowed it).

Example
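
A minimal sketch of filling the prompt's three placeholders (`{defect}`, `{impact}`, `{timeline}`), assuming Python. The helper name `fill_input` is hypothetical, the incident details are invented for illustration, and only the `<input>` block of the prompt is reproduced here.

```python
# Abbreviated stand-in for the full prompt; only the <input> block is shown.
PROMPT_INPUT = """<input>
Defect description: {defect}
Impact: {impact}
Timeline (if known): {timeline}
</input>"""


def fill_input(defect: str, impact: str, timeline: str = "") -> str:
    """Substitute the three placeholders; an empty timeline becomes 'unknown'."""
    if not defect.strip() or not impact.strip():
        raise ValueError("defect and impact are required")
    return PROMPT_INPUT.format(
        defect=defect.strip(),
        impact=impact.strip(),
        timeline=timeline.strip() or "unknown",
    )


# Hypothetical incident, invented for illustration only.
filled = fill_input(
    defect="Checkout returned HTTP 500 after a config deploy",
    impact="Orders failed for 40 minutes",
    timeline="14:02 deploy; 14:05 first alert; 14:42 rollback",
)
print(filled)
```

Writing "unknown" when no timeline is supplied keeps the `Timeline (if known)` line explicit rather than blank.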

Common pitfalls

  • The model stops at 3 Whys (ending at 'because of a bug'). Force the literal 5.
  • Fishbone gets only Technology populated; the other categories get sparse 'N/A' entries. Demand 2-4 factors per category.
  • Preventive measures default to vague items such as 'Add more tests' or 'Review PRs more carefully'. Require a named owner role and an actionable, testable step.
  • 'Human error' shows up as root cause — that's almost never the right level; push to the system condition that allowed it.
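
The pitfalls above can be checked mechanically before a human reviews the output. The following is a rough sketch, not a definitive validator. It assumes the model's reply is Markdown in which only the Whys use `1.` numbering, the fishbone categories are `###` headings with `-` or `*` bullets, and the measures table has the owner in its second column. `check_rca` and the `VAGUE` phrase list are hypothetical names chosen for illustration.

```python
import re

CATEGORIES = {"People", "Process", "Technology", "Environment"}
VAGUE = ("be more careful", "add more tests", "review prs more carefully")


def check_rca(markdown: str) -> list[str]:
    """Return the problems found; an empty list means every check passed."""
    lines = [ln.strip() for ln in markdown.splitlines()]
    problems = []

    # Pitfall: chain stops early. Assumes only the Whys use "1." numbering.
    whys = [ln for ln in lines if re.match(r"\d+\.\s+\S", ln)]
    if len(whys) != 5:
        problems.append(f"expected 5 whys, found {len(whys)}")
    # Pitfall: "human error" offered as a cause.
    if any("human error" in ln.lower() for ln in whys):
        problems.append("'human error' appears in the whys chain")

    # Pitfall: lopsided fishbone. Count non-"N/A" bullets under each H3.
    factors: dict[str, int] = {}
    current = None
    for ln in lines:
        if ln.startswith("#"):
            current = ln[4:].strip() if ln.startswith("### ") else None
            if current is not None:
                factors[current] = 0
        elif current is not None and ln.startswith(("- ", "* ")):
            if ln[2:].strip().lower() != "n/a":
                factors[current] += 1
    if set(factors) != CATEGORIES:
        problems.append(f"fishbone categories were {sorted(factors)}")
    for name, count in factors.items():
        if not 2 <= count <= 4:
            problems.append(f"{name}: {count} factors, expected 2-4")

    # Pitfall: vague or ownerless measures. Table: Measure | Owner | Target.
    rows = [ln.strip("|").split("|") for ln in lines if ln.startswith("|")]
    for cells in rows[1:]:  # rows[0] is the header
        cells = [c.strip() for c in cells]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|---|---|
        if len(cells) < 2 or not cells[1]:
            problems.append(f"measure without owner: {cells[0]!r}")
        if any(v in cells[0].lower() for v in VAGUE):
            problems.append(f"vague measure: {cells[0]!r}")
    return problems


# A one-line reply fails several checks at once.
print(check_rca("1. Because of a bug\n"))
```

A non-empty result is a cue to re-run the prompt or push back on the specific section, not a judgment on the analysis itself.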

Tips

  • Include the timeline — proximate cause vs systemic cause is much clearer with timestamps.
  • Run the RCA within 5 working days of the incident; recall fades quickly.
  • Pair with `bug-severity-priority` to set the right urgency for the preventive measures.
  • Diff this RCA's preventive measures against previous postmortems — repeats signal you're not following through.
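
The last tip, comparing measures across postmortems, can be approximated with fuzzy string matching. This sketch uses Python's standard `difflib`. The function names and the 0.8 similarity threshold are assumptions for illustration and would need tuning against real postmortem text.

```python
from difflib import SequenceMatcher


def normalize(measure: str) -> str:
    """Lowercase and collapse whitespace so trivial edits don't hide a repeat."""
    return " ".join(measure.lower().split())


def repeated_measures(current, previous, threshold=0.8):
    """Return (current, previous, ratio) triples whose similarity >= threshold."""
    repeats = []
    for cur in current:
        for prev in previous:
            ratio = SequenceMatcher(None, normalize(cur), normalize(prev)).ratio()
            if ratio >= threshold:
                repeats.append((cur, prev, round(ratio, 2)))
    return repeats


# Hypothetical measures from this RCA and an earlier postmortem.
this_rca = ["Add canary stage to deploy pipeline", "Alert on config schema drift"]
earlier = ["add canary  stage to deploy pipeline", "Rotate TLS certificates quarterly"]
for cur, prev, ratio in repeated_measures(this_rca, earlier):
    print(f"{ratio}: {cur!r} repeats {prev!r}")
```

Character-level similarity misses paraphrased repeats, so treat an empty result as "no obvious duplicates" rather than proof of follow-through.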

FAQ

Stop before the fifth 'why' only when you reach one that's outside your team's locus of control (e.g., 'because of an OS vendor bug'). At that point, name the dependency and stop, but every 'why' before that should be inside your control.
