Human review checkpoints — how many?

Best practice: ONE checkpoint, placed after the most consequential decision (severity). More checkpoints add friction; none at all lets mistakes propagate. Start with one and add another only if you see consistent mistakes downstream.
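One way to wire that single checkpoint, sketched under assumptions: each step is a plain Python function that takes the running state dict and returns an updated copy, the lambdas below are toy stand-ins for the real LLM calls, and the field names (`sev`, `team`, and so on) are hypothetical. The chain stops right after severity, a human can approve or correct the state, and the remaining steps resume from there.

```python
from typing import Callable

Step = Callable[[dict], dict]  # each step takes the running state and returns an updated copy


def run_until_checkpoint(state: dict, steps: list[tuple[str, Step]], checkpoint: str) -> tuple[dict, int]:
    """Run steps in order and stop right after the checkpoint step.

    Returns the accumulated state and the index of the next step to run.
    """
    for i, (name, step) in enumerate(steps):
        state = step(state)
        if name == checkpoint:
            return state, i + 1
    return state, len(steps)


def resume(state: dict, steps: list[tuple[str, Step]], start: int) -> dict:
    """Run the remaining steps once a human has approved (or corrected) the state."""
    for _, step in steps[start:]:
        state = step(state)
    return state


# Toy steps standing in for the five LLM calls.
steps = [
    ("intake", lambda s: {**s, "component": "checkout"}),
    ("classify", lambda s: {**s, "class": "functional"}),
    ("severity", lambda s: {**s, "sev": "S2"}),
    ("owner", lambda s: {**s, "team": "payments"}),
    ("duplicates", lambda s: {**s, "duplicates": []}),
]

paused, next_step = run_until_checkpoint({"raw": "Pay button does nothing"}, steps, "severity")
paused["sev"] = "S1"  # the reviewer overrides the model's severity
final = resume(paused, steps, next_step)
```

Adding a second checkpoint later is just another split point; the cost is one more queue a human has to drain.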

Hallucination mitigation?

Use structured output schemas at every step to force the model into a narrow output space. Combined with rubric constraints inside the prompts, this noticeably reduces hallucination but does not eliminate it, so log everything and review the logs weekly for systematic mistakes.
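A minimal sketch of per-step output validation, assuming a hand-rolled "field: (type, allowed values)" schema rather than any particular JSON Schema library; `SEVERITY_SCHEMA` and its field names are illustrative, not part of the prompt's spec. Null or missing required fields count as failures, matching the escalate-on-null rule in the prompt.

```python
import json

# Hypothetical schema for the Severity step: field name -> (expected type, allowed values or None).
SEVERITY_SCHEMA = {
    "sev": (str, {"S1", "S2", "S3", "S4"}),
    "reasoning": (str, None),
}


def validate(output: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output fits the schema."""
    problems = []
    for field, (expected_type, allowed) in schema.items():
        value = output.get(field)
        if value is None:
            problems.append(f"missing or null required field: {field}")
        elif not isinstance(value, expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}, got {type(value).__name__}")
        elif allowed is not None and value not in allowed:
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")
    return problems


good = json.loads('{"sev": "S2", "reasoning": "Checkout blocked for some users"}')
bad = json.loads('{"sev": "S5", "reasoning": null}')
print(validate(good, SEVERITY_SCHEMA))  # -> []
print(validate(bad, SEVERITY_SCHEMA))   # two problems: out-of-range SEV and null reasoning
```

Any non-empty result is the signal to escalate instead of passing the output to the next step.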

What if Step 3 conflicts with Step 2?

Build conflict detection between steps (e.g., a bug classified as security but assigned severity S4). Each step's error handling should catch such a conflict and escalate to a human.
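A sketch of such a cross-step check, assuming the Classify step emits a `class` field and the Severity step a `sev` field (both names hypothetical). The rule table holds only the example above; real rules would come from your own severity rubric.

```python
# Hypothetical cross-step rules: bug class -> severities that contradict it.
CONFLICT_RULES: dict[str, set[str]] = {
    "security": {"S4"},  # a security bug rated at the lowest severity is suspicious
}


def detect_conflicts(classification: dict, severity: dict) -> list[str]:
    """Compare the Classify output with the Severity output and describe any contradiction."""
    bug_class = classification.get("class")
    sev = severity.get("sev")
    if sev in CONFLICT_RULES.get(bug_class, set()):
        return [f"{bug_class} bug assigned {sev}"]
    return []


def gate(classification: dict, severity: dict) -> dict:
    """Proceed only when the two steps agree; otherwise route the bug to a human."""
    conflicts = detect_conflicts(classification, severity)
    if conflicts:
        return {"action": "escalate_to_human", "reasons": conflicts}
    return {"action": "proceed"}


print(gate({"class": "security", "confidence": 0.9}, {"sev": "S4", "reasoning": "cosmetic"}))
# -> {'action': 'escalate_to_human', 'reasons': ['security bug assigned S4']}
```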

Build an AI Bug Triage Workflow (Multi-Step)

Updated 2026-06-08 · advanced · AI Agent Workflows

Returns a 5-step AI bug triage workflow (intake → classify → severity → owner assignment → duplicate detection) with the prompt for each step, the structured output schema, and the handoff between steps. Final step emits a structured triage decision.
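What that final structured decision might look like, sketched as Python dataclasses. Every field name here is an assumption chosen to mirror the five steps, and the values are illustrative only.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DuplicateCandidate:
    issue_id: str
    score: int  # similarity score; a 0-100 scale is assumed here


@dataclass
class TriageDecision:
    component: str           # from Intake
    summary: str             # from Intake
    bug_class: str           # from Classify
    class_confidence: float  # from Classify
    severity: str            # from Severity (S1-S4)
    severity_reasoning: str
    team: str                # from Owner assignment
    team_rationale: str
    duplicates: list[DuplicateCandidate] = field(default_factory=list)  # from Duplicate detection

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, so duplicates become plain dicts.
        return json.dumps(asdict(self))


decision = TriageDecision(
    component="checkout", summary="Pay button unresponsive on Safari",
    bug_class="functional", class_confidence=0.86,
    severity="S2", severity_reasoning="Blocks checkout for one browser",
    team="payments", team_rationale="Owns the checkout component",
    duplicates=[DuplicateCandidate("BUG-101", 72)],
)
print(decision.to_json())
```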

When to use it

Automating high-volume bug triage (50+ bugs/week).
Building a bot for JIRA / Linear that triages on-creation.
Designing an internal AI workflow integrated with chat tools (e.g., a Slack bot that reports incoming bugs).
Documenting an AI workflow for review before implementation.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are an AI workflow designer. You know that single-shot triage prompts hallucinate; chained workflows with structured output schemas are more reliable. Each step's output is the next step's input.
</role>

<context>
A well-designed AI triage workflow has:
- **Discrete steps** — each with a specific responsibility
- **Schema between steps** — a JSON structure passed from one step to the next; never free-form text
- **Human checkpoints** — explicit places where a human can intercept before downstream automation
- **Fail-safe** — if any step fails to produce valid output, escalate to human, don't proceed
</context>

<task>
For the team configuration below, design a 5-step triage workflow:
1. **Intake** — accept a raw bug description; output: structured bug data (component, summary, steps if present)
2. **Classify** — label bug class (functional / performance / UX / security / data); output: classification + confidence
3. **Severity** — assign SEV (S1-S4) from the intake data and classification, using the severity rubric; output: SEV + reasoning
4. **Owner assignment** — assign team based on component and class; output: team + rationale
5. **Duplicate detection** — search candidate list; output: top 3 duplicates with scores

For each step, provide:
- The prompt
- Output JSON schema
- How the output feeds the next step
- Human checkpoint markers (after which step humans review)
- Error handling
</task>

<input>
Team / owners configuration: {teams}
Severity rubric (S1-S4 definitions): {severity_rubric}
Bug components inventory: {components}
Duplicate detection candidate source: {dup_source}
</input>

<constraints>
- 5 distinct steps; no fewer.
- Each step has explicit output schema (JSON).
- Output of step N is input of step N+1.
- Final step emits a SINGLE structured decision object.
- Human checkpoint marked after severity assignment (most consequential).
- Error handling: if any step returns null or undefined for a required field, escalate to a human.
</constraints>

<output_format>
Seven sections:
1. **Workflow diagram** — text flow (Step 1 → 2 → 3 → ...)
2-6. **Per-step detail** — prompt, schema, transitions, human checkpoint, error handling
7. **Integration sketch** — pseudocode for running the workflow as a function chain
</output_format>

Before writing, identify the human checkpoint location and any step requiring particular caution.
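The prompt asks the model for pseudocode in its integration sketch; for comparison, here is a runnable Python sketch of the same function chain. Assumptions: the LLM is injected as a plain `call_llm(prompt) -> str` function (no specific SDK), the prompt templates are abbreviated placeholders for the real per-step prompts, and step outputs are merged into one running state dict. Unparseable JSON or a null required field stops the chain and escalates, per the fail-safe rule. For brevity it runs straight through without pausing at the human checkpoint.

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # injected client: prompt text in, raw model text out

# Hypothetical step table: (name, abbreviated prompt template, required output fields).
STEPS = [
    ("intake", "Extract component and summary as JSON from: {state}", ["component", "summary"]),
    ("classify", "Classify this bug as JSON {{class, confidence}}: {state}", ["class", "confidence"]),
    ("severity", "Assign SEV S1-S4 as JSON {{sev, reasoning}}: {state}", ["sev", "reasoning"]),
    ("owner", "Assign the owning team as JSON {{team, rationale}}: {state}", ["team", "rationale"]),
    ("duplicates", "List the top 3 duplicates as JSON {{duplicates}}: {state}", ["duplicates"]),
]


class Escalate(Exception):
    """Raised when a step's output is unusable; the bug goes to a human."""


def run_step(name: str, template: str, required: list[str], state: dict, call_llm: LLM) -> dict:
    raw = call_llm(template.format(state=json.dumps(state)))
    try:
        out = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise Escalate(f"{name}: invalid JSON ({exc.msg})") from exc
    if not isinstance(out, dict):
        raise Escalate(f"{name}: expected a JSON object")
    missing = [f for f in required if out.get(f) is None]
    if missing:
        raise Escalate(f"{name}: null or missing required fields {missing}")
    # Step N's output joins the state that becomes step N+1's input.
    return {**state, **out}


def run_workflow(raw_bug: str, call_llm: LLM) -> dict:
    state: dict = {"raw": raw_bug}
    try:
        for name, template, required in STEPS:
            state = run_step(name, template, required, state, call_llm)
    except Escalate as reason:
        # Fail-safe: stop the chain and keep the partial state for the reviewer.
        return {"status": "escalated", "reason": str(reason), "partial": state}
    return {"status": "triaged", "decision": state}


# Usage with a canned fake LLM that answers every step with the same JSON object.
CANNED = json.dumps({
    "component": "checkout", "summary": "Pay button unresponsive",
    "class": "functional", "confidence": 0.9,
    "sev": "S2", "reasoning": "Blocks checkout for some users",
    "team": "payments", "rationale": "Owns checkout",
    "duplicates": [],
})
result = run_workflow("Pay button does nothing", lambda prompt: CANNED)
```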

Example

Common pitfalls

Model produces one monolithic prompt instead of distinct steps, which defeats the benefit of chaining.
Schemas missing between steps; downstream code can't parse output reliably.
Human checkpoint omitted — automation runs unsupervised through the most consequential decision.
Error handling glossed over: what happens when a step fails to produce valid JSON?
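For the last pitfall, one defensive pattern is to parse leniently, re-ask once, and otherwise return `None` so the caller escalates. This is a sketch: the single retry, the wording of the repair instruction, and tolerating a Markdown code fence around the JSON are all assumptions, and `call_llm` is a hypothetical injected function.

```python
import json
from typing import Callable, Optional

FENCE = "`" * 3  # a Markdown code fence, built without writing it literally


def parse_json_object(raw: str) -> Optional[dict]:
    """Parse a model reply as a JSON object, tolerating a surrounding code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def call_with_retry(prompt: str, call_llm: Callable[[str], str], retries: int = 1) -> Optional[dict]:
    """Ask once, re-ask up to `retries` times on unparseable output, then return None."""
    for attempt in range(retries + 1):
        suffix = "" if attempt == 0 else "\nReply with ONLY a valid JSON object, no prose."
        parsed = parse_json_object(call_llm(prompt + suffix))
        if parsed is not None:
            return parsed
    return None  # caller escalates to a human
```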

Tips

Roll out one step at a time in production; validate each step before adding the next.
Log every step's input + output to a database for later audit and improvement.
Pair with `duplicate-bug-detector` to deepen Step 5; this prompt only sketches duplicate detection and lacks that prompt's root-cause reasoning.
Add a 'kill switch' environment variable to disable the workflow during LLM or bug-tracker outages.
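A sketch of the logging and kill-switch tips using only the standard library (`sqlite3`, `os`). The environment variable name `TRIAGE_WORKFLOW_DISABLED` and the `step_log` table layout are hypothetical; swap in your own database and naming.

```python
import json
import os
import sqlite3
import time

KILL_SWITCH_VAR = "TRIAGE_WORKFLOW_DISABLED"  # hypothetical env var name


def workflow_enabled() -> bool:
    """The workflow is off when the variable is '1', 'true', or 'yes' (any case)."""
    return os.environ.get(KILL_SWITCH_VAR, "").strip().lower() not in {"1", "true", "yes"}


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit log; ':memory:' is for demos, pass a file path in production."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS step_log ("
        " bug_id TEXT, step TEXT, input_json TEXT, output_json TEXT, logged_at REAL)"
    )
    return conn


def log_step(conn: sqlite3.Connection, bug_id: str, step: str, step_input: dict, step_output: dict) -> None:
    """Record one step's input and output for later audit."""
    conn.execute(
        "INSERT INTO step_log VALUES (?, ?, ?, ?, ?)",
        (bug_id, step, json.dumps(step_input), json.dumps(step_output), time.time()),
    )
    conn.commit()


if workflow_enabled():
    log = open_log()
    log_step(log, "BUG-1", "severity", {"class": "functional"}, {"sev": "S3", "reasoning": "Minor UI glitch"})
```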

FAQ

How do I integrate this with JIRA or Linear?

Both expose APIs for creating, updating, and linking issues. Run the workflow as a webhook handler subscribed to issue-creation events. After Step 5, write the structured decision back to the tracker as fields plus linked-issue relationships.
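A minimal sketch of that webhook handler. The event shape (`type`, `issue.id`, `issue.description`), the `run_workflow` result format, and the `TrackerClient` methods are all hypothetical; they stand in for your tracker's real webhook payload and API client, which differ between JIRA and Linear.

```python
from typing import Callable, Protocol


class TrackerClient(Protocol):
    """Hypothetical wrapper around your tracker's API (JIRA, Linear, ...)."""

    def update_fields(self, issue_id: str, fields: dict) -> None: ...

    def link_issue(self, issue_id: str, other_id: str, relation: str) -> None: ...


def handle_webhook(event: dict, run_workflow: Callable[[str], dict], tracker: TrackerClient) -> str:
    # Only react to creation events; the event shape here is an assumption.
    if event.get("type") != "issue.created":
        return "ignored"
    issue = event["issue"]
    result = run_workflow(issue["description"])
    if result.get("status") != "triaged":
        # Fail-safe: flag the issue for a human instead of writing partial decisions.
        tracker.update_fields(issue["id"], {"triage_status": "needs_human"})
        return "escalated"
    decision = result["decision"]
    tracker.update_fields(issue["id"], {
        "severity": decision["sev"],
        "team": decision["team"],
        "bug_class": decision["class"],
    })
    # Duplicate candidates become linked-issue relationships.
    for dup in decision.get("duplicates", []):
        tracker.link_issue(issue["id"], dup["issue_id"], "possible duplicate")
    return "triaged"
```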

Related prompts

Bug Triage & Reporting · basic

Write a Detailed Bug Report

Takes a free-form issue description (Slack message, email, support ticket) and returns a structured bug report following the AQA Pro Bug Report Template — clear `[Component] Verb-noun` title, environment, separate severity and priority, numbered atomic repro steps, expected vs actual, and suggested investigation areas.


Bug Triage & Reporting · basic

Bug Triage: Severity and Priority Assigner

Reads a bug description and assigns SEVERITY (impact on system, 1-4) and PRIORITY (urgency to fix, 1-4) on independent scales, each with a written justification, plus a recommended SLA target. Refuses to collapse the two dimensions into one score.


Bug Triage & Reporting · intermediate

Duplicate Bug Detector

Given a new bug description and N existing bug summaries, returns a ranked list of duplicate candidates with similarity scores (0-100) based on ROOT-CAUSE likelihood rather than surface text — with one-line evidence per candidate.


AI Agent Workflows · intermediate

Generate CLAUDE.md for a QA Project

Reads a description of your QA project (framework, language, conventions, CI setup) and returns a ready-to-commit `CLAUDE.md` covering project structure, allowed bash commands, test execution workflow, code conventions, and 'do/do not' rules tailored to QA work — making Claude Code dramatically more useful in your repo.
