
Detect Duplicate Bugs with AI

Updated 2026-06-08·intermediate·Bug Triage & Reporting

Given a new bug description and N existing bug summaries, this prompt returns a ranked list of duplicate candidates, each with a similarity score (0-100) based on ROOT-CAUSE likelihood rather than surface text, plus one line of evidence per candidate.

When to use it

  • Triaging a high-volume bug intake (50+ bugs/week) where duplicates waste cycles.
  • Closing a long-stale bug and checking whether it has resurfaced under a different title.
  • Looking for patterns across customer complaints — same root cause, different reports.
  • Auditing a bug tracker for likely duplicates that should be merged.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are a bug triage specialist. You judge duplicates by ROOT CAUSE likelihood, not surface text. Two bugs that mention "checkout" may be unrelated. Two bugs with totally different titles may be the same race condition.
</role>

<context>
Similarity is judged on: (a) likely root cause overlap, (b) shared component / code path, (c) shared trigger conditions. Surface text overlap (same words) is a weak signal — it can mislead in both directions.
</context>

<task>
For the new bug and the candidate list:
1. Score each candidate 0-100 for likelihood of being a duplicate.
2. Provide ONE LINE of evidence per candidate explaining the score.
3. Recommend an action per candidate: MERGE (clear duplicate, > 80), LIKELY DUPLICATE (review, 50-80), UNRELATED (< 50).
4. Identify any pattern across multiple candidates suggesting a broader common cause.
</task>

<input>
New bug description: {new_bug}
Candidate bugs (with brief summaries): {candidates}
Component context (optional, helps disambiguate): {context}
</input>

<constraints>
- Score on ROOT CAUSE likelihood, not text overlap.
- Evidence cites SHARED conditions (component, trigger, error code), not shared keywords.
- A high score requires more than overlapping keywords: it must cite a causal mechanism.
- If candidate descriptions are sparse or carry limited information, lower the scores and note the ambiguity.
</constraints>

<output_format>
Two sections:
1. **Ranked table** — Candidate | Score (0-100) | Evidence | Action
2. **Pattern note** — 1-2 sentences if multiple high-score candidates suggest a broader root cause; otherwise "No pattern observed"
</output_format>

Before writing, identify the new bug's likely root cause (proximate + systemic). Use that to evaluate similarity instead of word overlap.
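If you call the prompt from a script, the {new_bug}, {candidates} and {context} placeholders need filling. The sketch below is one way to do it, assuming candidates arrive as (bug_id, summary) pairs; render_prompt and format_candidates are hypothetical helpers, and only the <input> block of the prompt is reproduced in the template. A single regex pass is used so that braces inside bug text (JSON payloads, stack traces, even a literal "{context}") are never re-substituted, which chained str.replace or str.format calls can get wrong.

```python
import re

# Only the <input> block is shown; paste the full prompt here in practice.
PROMPT_TEMPLATE = """<input>
New bug description: {new_bug}
Candidate bugs (with brief summaries): {candidates}
Component context (optional, helps disambiguate): {context}
</input>"""

# Matches exactly the three placeholders used by the prompt.
PLACEHOLDER = re.compile(r"\{(new_bug|candidates|context)\}")


def format_candidates(candidates):
    """Render (bug_id, summary) pairs as one bullet line per candidate."""
    return "\n" + "\n".join(f"- {bug_id}: {summary}" for bug_id, summary in candidates)


def render_prompt(new_bug, candidates, context=""):
    """Fill the template in one pass; substituted text is never re-scanned."""
    values = {
        "new_bug": new_bug,
        "candidates": format_candidates(candidates),
        "context": context or "(none provided)",
    }
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], PROMPT_TEMPLATE)
```

For example, render_prompt("Checkout total ignores discount with two tabs open", [("BUG-101", "Cart total stale after concurrent update")], "Cart service") produces the filled <input> block; the bug IDs and summaries here are made up for illustration.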


Common pitfalls

  • The model scores on word overlap: 'discount' appearing in both bugs earns a high score regardless of mechanism. Re-prompt to enforce mechanism-based scoring.
  • Sparse candidate descriptions can produce high scores from limited evidence. Treat scores as lower-confidence when descriptions are short.
  • The pattern note gets skipped unless it is enforced, yet it is often where the most valuable insight hides (multiple candidates sharing a root cause).
  • The score thresholds (> 80, 50-80, < 50) should match your team's merge tolerance; adjust them if needed.
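If you post-process scores in code, keeping the thresholds in one function makes the boundaries explicit: with the defaults, 80 and 50 both fall in LIKELY DUPLICATE. This is a minimal sketch; action_for and its merge_above / review_from parameters are hypothetical names, and if you change them you would also update step 3 of the prompt's <task> so the model and your code agree.

```python
def action_for(score: int, merge_above: int = 80, review_from: int = 50) -> str:
    """Map a 0-100 duplicate score to the prompt's action labels.

    Defaults mirror the prompt: MERGE (> 80), LIKELY DUPLICATE (50-80),
    UNRELATED (< 50).
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score > merge_above:
        return "MERGE"
    if score >= review_from:
        return "LIKELY DUPLICATE"
    return "UNRELATED"
```

A team with a lower tolerance for wrong merges might call action_for(score, merge_above=90), which sends an 85 to review instead of MERGE.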

Tips

  • Run this on a regular cadence (e.g., weekly), comparing newly filed bugs against open ones, to keep the tracker clean.
  • Pre-filter the candidates with a similarity search first (e.g., Jira's "similar issues" feature). This limits the prompt size and improves accuracy.
  • Don't auto-merge on score alone; treat the output as a triage suggestion and have a human confirm each merge.
  • When the pattern note surfaces a broader root cause, file a META bug to track the systemic fix.
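To feed the model's answer into a review queue, you can parse the ranked table into records that stay unconfirmed until a person signs off, and check the pattern note to decide whether a META bug is warranted. The sketch below assumes the model returns a markdown table with the columns from the output format (Candidate | Score | Evidence | Action) and writes "No pattern observed" when there is none; Suggestion, parse_ranked_table and needs_meta_bug are hypothetical names. Evidence text containing a "|" character would break this simple row regex.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    candidate: str
    score: int
    evidence: str
    action: str
    confirmed_by_human: bool = False  # never set automatically


# One table row; header and separator rows fail the numeric score group.
ROW = re.compile(
    r"^\|\s*(?P<candidate>[^|]+?)\s*\|\s*(?P<score>\d{1,3})\s*\|"
    r"\s*(?P<evidence>[^|]*?)\s*\|\s*(?P<action>MERGE|LIKELY DUPLICATE|UNRELATED)\s*\|$"
)


def parse_ranked_table(output: str) -> List[Suggestion]:
    """Extract table rows as unconfirmed suggestions, highest score first."""
    rows = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(Suggestion(m["candidate"], int(m["score"]), m["evidence"], m["action"]))
    return sorted(rows, key=lambda s: s.score, reverse=True)


def needs_meta_bug(output: str) -> Optional[bool]:
    """None: pattern note missing (re-prompt); True: broader root cause reported."""
    lower = output.lower()
    if "pattern note" not in lower:
        return None
    return "no pattern observed" not in lower
```

Returning None for a missing pattern note, rather than False, makes it easy to catch the "pattern note gets skipped" pitfall and re-prompt instead of silently treating it as "no pattern".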

FAQ

Use your tracker's similarity search APIs to pre-filter the candidate set, then feed the top 5-10 candidates into this prompt for ranking. Most bug trackers have search APIs, but they lack the causal-mechanism scoring this prompt provides.
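If your tracker has no similarity API, a cheap lexical pre-filter can stand in for it. The sketch below is a hypothetical substitute, not any tracker's actual algorithm: it ranks open bugs by token Jaccard overlap with the new bug and keeps the top k. Text overlap is exactly the weak signal the prompt warns against, so it is used here only as a high-recall first pass; a generous k leaves the causal judgment to the prompt.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a bug summary."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared tokens divided by all distinct tokens (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prefilter(new_bug: str, open_bugs: Dict[str, str], k: int = 10) -> List[str]:
    """Return up to k bug IDs ranked by token overlap with the new bug."""
    new_tokens = tokens(new_bug)
    ranked = sorted(
        open_bugs.items(),
        key=lambda item: jaccard(new_tokens, tokens(item[1])),
        reverse=True,  # sort stays stable, so ties keep their input order
    )
    return [bug_id for bug_id, _ in ranked[:k]]
```

The returned IDs and their summaries can then be passed as the candidate list for the prompt.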
