What's a safe auto-merge threshold?

Conservative: never auto-merge; always require human confirmation. Moderate: auto-merge only when the score is above 95. Aggressive: auto-merge above 85 and notify the reporter. A false merge is costly (the customer's report and its data get buried in an unrelated ticket), so err on the conservative side.
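A minimal Python sketch of the three policies above, assuming the score is the 0-100 value the prompt returns. The `Decision` names and the `POLICIES` table are hypothetical; only the thresholds come from this answer.

```python
from enum import Enum


class Decision(Enum):
    HUMAN_REVIEW = "human_review"
    AUTO_MERGE = "auto_merge"
    AUTO_MERGE_AND_NOTIFY = "auto_merge_and_notify"


# Hypothetical policy table: the score must be strictly above the value to
# auto-merge; None means never auto-merge.
POLICIES = {"conservative": None, "moderate": 95, "aggressive": 85}


def decide(score: int, policy: str = "conservative") -> Decision:
    """Decide what to do with one candidate given its 0-100 duplicate score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be 0-100, got {score}")
    threshold = POLICIES[policy]
    if threshold is None or score <= threshold:
        return Decision.HUMAN_REVIEW
    if policy == "aggressive":
        return Decision.AUTO_MERGE_AND_NOTIFY  # merge, then tell the reporter
    return Decision.AUTO_MERGE
```

With the default conservative policy, even a score of 100 goes to human review.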

What's the false-positive cost vs false-negative cost?

False positive: two distinct bugs get merged by mistake, and the customer's report gets lost in another ticket. False negative: a duplicate is missed, and an engineer wastes time triaging the same issue twice. Most teams should prefer false negatives to false positives, so keep the threshold high.
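One way to turn this cost comparison into a number, assuming you can estimate both costs in the same unit (for example, engineer-hours) and that the score behaves like a calibrated probability p, which the prompt does not guarantee. Merging is cheaper in expectation when (1 - p) * C_FP < p * C_FN, that is, when p > C_FP / (C_FP + C_FN). The function name is hypothetical.

```python
def break_even_score(cost_fp: float, cost_fn: float) -> float:
    """Score (0-100) above which merging has lower expected cost than not merging.

    Assumes the score acts like a calibrated probability of duplication.
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    # Merge iff (1 - p) * cost_fp < p * cost_fn  <=>  p > cost_fp / (cost_fp + cost_fn)
    return 100 * cost_fp / (cost_fp + cost_fn)
```

With hypothetical costs where a false merge is nine times as expensive as a repeated triage, `break_even_score(9, 1)` returns 90.0. The more a false merge costs relative to a missed duplicate, the higher the threshold climbs.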

Can this find duplicates across years of historical bugs?

Yes, but feed it 10-20 candidates per run. With larger batches, the model's ranking loses precision.
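A sketch of batching a large history, assuming a hypothetical `score_batch` callable that runs the prompt on one batch and returns a candidate-to-score mapping. Sorting scores from separate runs together assumes they are comparable across runs, which is worth spot-checking.

```python
from collections.abc import Callable, Iterator, Sequence


def batches(items: Sequence[str], size: int = 15) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items (10-20 is the suggested range)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def rank_history(
    new_bug: str,
    history: Sequence[str],
    score_batch: Callable[[str, list[str]], dict[str, int]],
    size: int = 15,
) -> list[tuple[str, int]]:
    """Score every batch, then return all candidates sorted by score, highest first."""
    scores: dict[str, int] = {}
    for batch in batches(history, size):
        scores.update(score_batch(new_bug, batch))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```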

Detect Duplicate Bugs with AI

How do I integrate this with JIRA / Linear automatically?

Use the tracker's search or similar-issue features to pre-filter the candidate set, then feed the top 5-10 candidates into this prompt for ranking. Most bug trackers offer search APIs but lack the causal-mechanism scoring this prompt provides.
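A minimal sketch of the pre-filter step. Token-set Jaccard overlap stands in here for your tracker's search; it is deliberately cheap and lexical, because its only job is to shortlist candidates that the prompt then scores on mechanism. The function names are hypothetical, and no JIRA or Linear API calls are shown.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a bug title or summary."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; 0.0 if either text has no tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def prefilter(new_bug: str, existing: dict[str, str], k: int = 10) -> list[str]:
    """IDs of the k existing bugs with the highest lexical overlap, best first."""
    ranked = sorted(existing, key=lambda bug_id: jaccard(new_bug, existing[bug_id]), reverse=True)
    return ranked[:k]
```

The shortlisted IDs and their summaries become the `{candidates}` input of the prompt.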

Updated 2026-06-08 · intermediate · Bug Triage & Reporting

Given a new bug description and N existing bug summaries, returns a ranked list of duplicate candidates with similarity scores (0-100) based on ROOT-CAUSE likelihood rather than surface text — with one-line evidence per candidate.

When to use it

Triaging a high-volume bug intake (50+ bugs/week) where duplicates waste cycles.
Closing a long-stale bug and checking whether it has resurfaced under a different title.
Looking for patterns across customer complaints — same root cause, different reports.
Auditing a bug tracker for likely duplicates that should be merged.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are a bug triage specialist. You judge duplicates by ROOT CAUSE likelihood, not surface text. Two bugs that mention "checkout" may be unrelated. Two bugs with totally different titles may be the same race condition.
</role>

<context>
Similarity is judged on: (a) likely root cause overlap, (b) shared component / code path, (c) shared trigger conditions. Surface text overlap (same words) is a weak signal — it can mislead in both directions.
</context>

<task>
For the new bug and the candidate list:
1. Score each candidate 0-100 for likelihood of being a duplicate.
2. Provide ONE LINE of evidence per candidate explaining the score.
3. Recommend an action per candidate: MERGE (clear duplicate, > 80), LIKELY DUPLICATE (review, 50-80), UNRELATED (< 50).
4. Identify any pattern across multiple candidates suggesting a broader common cause.
</task>

<input>
New bug description: {new_bug}
Candidate bugs (with brief summaries): {candidates}
Component context (optional, helps disambiguate): {context}
</input>

<constraints>
- Score on ROOT CAUSE likelihood, not text overlap.
- Evidence must cite SHARED conditions (component, trigger, error code), not shared keywords.
- A high score requires more than overlapping keywords; it must cite a causal mechanism.
- If candidate descriptions are sparse or information is limited, lower the scores and note the ambiguity.
</constraints>

<output_format>
Two sections:
1. **Ranked table** — Candidate | Score (0-100) | Evidence | Action
2. **Pattern note** — 1-2 sentences if multiple high-score candidates suggest a broader root cause; otherwise "No pattern observed"
</output_format>

Before writing, identify the new bug's likely root cause (proximate + systemic). Use that to evaluate similarity instead of word overlap.
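Because the output format fixes the table columns, the reply can be parsed for downstream automation. This sketch assumes the model emits a pipe-delimited (Markdown-style) table with a plain integer score and that the evidence text contains no `|` characters; `parse_ranked_table` and `Row` are hypothetical names.

```python
from dataclasses import dataclass

ACTIONS = {"MERGE", "LIKELY DUPLICATE", "UNRELATED"}


@dataclass
class Row:
    candidate: str
    score: int
    evidence: str
    action: str


def parse_ranked_table(text: str) -> list[Row]:
    """Extract data rows from a 4-column pipe table; header and separator rows are skipped."""
    rows: list[Row] = []
    for line in text.splitlines():
        # "| a | b | c | d |" -> ["a", "b", "c", "d"]
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # not a table row (e.g. the pattern note)
        candidate, score, evidence, action = cells
        if not score.isdigit():
            continue  # header ("Score (0-100)") or separator ("---")
        value = int(score)
        action = action.strip("*").upper()  # tolerate **MERGE**
        if not 0 <= value <= 100 or action not in ACTIONS:
            raise ValueError(f"unexpected row: {line!r}")
        rows.append(Row(candidate, value, evidence, action))
    return rows
```

Rows that fail validation raise instead of being dropped, so a malformed reply surfaces in review rather than silently shrinking the candidate list.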


Common pitfalls

The model scores on word overlap: two bugs that both mention 'discount' get a high score regardless of mechanism. Re-prompt to enforce mechanism-based scoring.
Sparse candidate descriptions can yield high scores from limited evidence; lower your confidence in any score when the descriptions are short.
The pattern note gets skipped when it is not enforced, yet that is often where the most valuable insight hides (multiple candidates sharing a root cause).
Score thresholds (> 80, 50-80, < 50) should match your team's auto-merge tolerance; adjust if needed.
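A sketch of making the thresholds configurable and flagging rows where the model's action disagrees with its own score. The defaults mirror the prompt's bands (> 80, 50-80, < 50); the function names and the `(candidate, score, action)` tuple shape are assumptions.

```python
def expected_action(score: int, merge_above: int = 80, review_from: int = 50) -> str:
    """Map a 0-100 score to the prompt's action labels."""
    if score > merge_above:
        return "MERGE"
    if score >= review_from:
        return "LIKELY DUPLICATE"
    return "UNRELATED"


def mismatches(
    rows: list[tuple[str, int, str]], merge_above: int = 80, review_from: int = 50
) -> list[str]:
    """Candidate IDs whose model-assigned action disagrees with the configured bands."""
    return [
        candidate
        for candidate, score, action in rows
        if expected_action(score, merge_above, review_from) != action
    ]
```

Raising `merge_above` tightens the MERGE band, which is how you would move toward a more conservative auto-merge tolerance.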

Tips

Run this on a regular cadence (e.g., weekly), checking newly filed bugs against open ones, to keep the tracker clean.
Feed in candidates from a similarity search first (e.g., JIRA's "similar issues" feature); this limits prompt size and improves accuracy.
Don't auto-merge on score alone; treat the result as a triage suggestion that a human confirms.
When the pattern note surfaces a broader root cause, file a META bug to track the systemic fix.


Related prompts

Bug Triage & Reporting · basic

Write a Detailed Bug Report

Takes a free-form issue description (Slack message, email, support ticket) and returns a structured bug report following the AQA Pro Bug Report Template — clear `[Component] Verb-noun` title, environment, separate severity and priority, numbered atomic repro steps, expected vs actual, and suggested investigation areas.


Bug Triage & Reporting · intermediate

Root Cause Analysis (5 Whys + Fishbone)

Given a defect description, returns a literal 5 Whys chain, a fishbone diagram (text representation) categorizing contributing factors into People / Process / Technology / Environment, and a list of preventive measures with named owners — never generic recommendations.


Bug Triage & Reporting · basic

Bug Triage: Severity and Priority Assigner

Reads a bug description and assigns SEVERITY (impact on system, 1-4) and PRIORITY (urgency to fix, 1-4) on independent scales, each with a written justification, plus a recommended SLA target. Refuses to collapse the two dimensions into one score.


Bug Triage & Reporting · advanced

Reproduce a Bug from Logs

Reads application logs, HAR files, or browser console excerpts and reconstructs a step-by-step reproduction recipe with timestamps, the failing request, suspected preconditions, and a confidence flag per inferred step.
