How do I suppress nits?

Categorize every finding as blocking, suggestion, or nit. The bot posts only blocking findings and suggestions to the PR. Nits go to a separate audit log instead, and reviewers can opt in to see them.
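
The routing described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Finding` dataclass, the `Severity` enum, and the `show_nits` flag are hypothetical names, and the sketch assumes nits are always written to the audit log even when a reviewer opts in to see them on the PR.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKING = "blocking"
    SUGGESTION = "suggestion"
    NIT = "nit"


@dataclass(frozen=True)
class Finding:
    """One piece of bot feedback (hypothetical shape, for illustration only)."""
    path: str
    line: int
    severity: Severity
    message: str


def route_findings(findings, show_nits=False):
    """Split findings into (pr_comments, audit_log).

    Blocking findings and suggestions always go to the PR.
    Nits always go to the audit log, and reach the PR only on opt-in.
    """
    pr_comments, audit_log = [], []
    for finding in findings:
        if finding.severity is Severity.NIT:
            audit_log.append(finding)
            if show_nits:
                pr_comments.append(finding)
        else:
            pr_comments.append(finding)
    return pr_comments, audit_log
```

Keeping the nit log even on opt-in means the audit data stays complete regardless of who chose to see what.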

Which LLM should power the bot?

Mid-tier models (Claude 4.x Haiku, GPT-5 mini) offer the best capability-to-cost ratio for the bot because they apply rubrics well. Reserve top-tier models for the harder cases where the bot is uncertain.
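
A minimal sketch of that two-tier routing, assuming the bot reports a self-assessed confidence score between 0 and 1 for each finding. The score, the 0.7 threshold, and the placeholder model names are all assumptions for illustration, not real model identifiers.

```python
def pick_model(confidence, threshold=0.7,
               default_model="mid-tier-model",
               escalation_model="top-tier-model"):
    """Return which model tier should handle a finding.

    confidence: the mid-tier model's self-reported confidence in [0, 1]
                (hypothetical signal; how it is produced is up to you).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Confident findings stay on the cheaper tier; uncertain ones escalate.
    return default_model if confidence >= threshold else escalation_model
```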

Is the bot a replacement for human review?

No. The bot handles predictable feedback (anti-patterns, conventions). Human reviewers handle design, edge cases, business logic, and the things the bot can't see (e.g., the right way to test a new feature).

Build an AI Pull Request Review Bot for QA

Updated 2026-06-08 · advanced · AI Agent Workflows

Returns a prompt + decision framework for an AI PR-review bot tuned for QA — checks for test anti-patterns, missing tests for new branches/conditions, naming convention violations, and assertion quality, with explicit rules for when to comment vs skip and how to phrase comments without false confidence.

When to use it

Building an internal PR review bot to reduce reviewer burden.
Standardizing test-quality feedback across all PRs.
Reducing reviewer fatigue — the bot handles the predictable feedback so humans focus on design.
Documenting the AI review process for stakeholders before implementation.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are an AI workflow architect with a strong sense of when to ENGAGE vs when to SHUT UP. A noisy PR bot loses trust within 3 reviews; a useful bot earns trust by adding signal without noise.
</role>

<context>
A well-designed PR review bot has:
- **A focused scope** — what it checks; what it explicitly DOESN'T check
- **A decision framework** — when to comment, when to skip, when to request changes vs leave a suggestion
- **A tone guide** — phrase feedback as observations, not commands; never assert what you don't know
- **An escalation rule** — what to do if it can't decide

For QA-focused review, the most valuable comments cover:
- Missing tests for new conditions / branches
- Test anti-patterns introduced (waitForTimeout, weak assertions)
- Naming convention violations
- Coverage gaps (a function added without any test)

LESS valuable (avoid):
- Style nitpicks (leave these to the linter)
- Repeated guidance the team has already heard
- Subjective design opinions
- Comments about non-test changes
</context>

<task>
Design an AI PR review bot for QA with:
1. **System prompt** — the bot's core instruction
2. **Scope** — what to check, what to skip
3. **Decision framework** — flowchart: when to comment vs skip vs request changes
4. **Tone guide** — phrasing rules with examples
5. **Escalation rule** — what to do when uncertain
6. **Per-PR workflow** — input (diff + changed files + tests), processing steps, output (PR comments or pass)
</task>

<input>
Stack: {stack}
Team conventions to enforce: {conventions}
Tone (formal / casual / mixed): {tone}
What to NEVER comment on (e.g., subjective design choices): {never_comment}
Integration target (GitHub / GitLab): {integration}
</input>

<constraints>
- Tone: observations, not commands. NEVER assert "this is wrong" without evidence.
- Comments cite line numbers in the diff.
- The bot must EXPLICITLY NOT comment on style issues the linter already handles.
- Escalate (don't comment) when uncertain.
- Provide example comments per category (good and bad).
- The decision framework must be explicit (flowchart-like).
</constraints>

<output_format>
Six sections:
1. **System prompt** — the LLM-facing prompt
2. **Scope (check / skip table)**
3. **Decision framework** — when to comment vs skip vs request changes
4. **Tone guide** — phrasing rules + good/bad examples
5. **Escalation rule** — when to abstain
6. **Per-PR workflow** — pseudocode showing inputs, processing, outputs
</output_format>

Before writing, identify the 3 most common "noise" patterns from PR bots (nitpicks, repetition, false confidence) and address each.

Example
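
As an illustration of the decision framework the prompt asks for, here is a minimal Python sketch. It is not output from the prompt: the function name `decide`, its boolean inputs, and the 0.7 confidence threshold are hypothetical and would need to be derived from your own scope and escalation rules.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"
    ESCALATE = "escalate"
    COMMENT = "comment"
    REQUEST_CHANGES = "request_changes"


def decide(in_diff, is_test_related, linter_covers_it,
           has_evidence, confidence, blocking, min_confidence=0.7):
    """Map one candidate finding to an action, flowchart-style.

    1. Out of scope (outside the diff, not about tests, or a linter
       concern): stay silent.
    2. In scope but unsupported or uncertain: escalate instead of commenting.
    3. Otherwise: request changes for blocking issues, comment for the rest.
    """
    if not in_diff or not is_test_related or linter_covers_it:
        return Action.SKIP
    if not has_evidence or confidence < min_confidence:
        return Action.ESCALATE
    return Action.REQUEST_CHANGES if blocking else Action.COMMENT
```

Putting the scope checks first keeps "silence is OK" as the default path: a finding has to pass every gate before the bot says anything.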

Common pitfalls

Bot comments on every PR even when nothing is wrong — erodes trust quickly. Build in 'silence is OK' explicitly.
Suggestions phrased as commands ('change this') — make sure tone is observation-based.
Bot comments on lines outside the diff (existing code) — restrict to changed lines.
Multiple critiques bundled into one comment. Keep to one issue per comment for traceability.
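
Restricting comments to changed lines can be enforced mechanically rather than left to the prompt. The sketch below assumes a single-file unified diff patch (hunks only, no `---`/`+++` file headers after the first hunk) and a hypothetical comment shape of `{"line": int, "body": str}`; both function names are illustrative.

```python
import re

# Matches hunk headers such as "@@ -10,4 +12,6 @@" and captures the
# starting line number on the new-file side.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(patch: str) -> set[int]:
    """Return new-file line numbers of lines added in a single-file patch."""
    added = set()
    new_line = None  # current line number on the new-file side
    for raw in patch.splitlines():
        match = HUNK_HEADER.match(raw)
        if match:
            new_line = int(match.group(1))
            continue
        if new_line is None:
            continue  # anything before the first hunk header
        if raw.startswith("+"):
            added.add(new_line)
            new_line += 1
        elif raw.startswith("-") or raw.startswith("\\"):
            continue  # removed line or "\ No newline at end of file"
        else:
            new_line += 1  # unchanged context line
    return added


def keep_comments_on_changed_lines(comments, patch):
    """Drop any draft comment that targets a line the PR did not add."""
    changed = added_lines(patch)
    return [c for c in comments if c["line"] in changed]
```

Filtering after generation means a model that wanders into existing code is silently corrected instead of posting noise.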

Tips

Pilot with the team's most senior QA engineer; calibrate based on their feedback for the first 2 weeks.
Log every bot comment + author response (accepted / disagreed / ignored) — refine the prompt based on data.
Pair with `claude-md-for-qa` — the same conventions feed both the bot and human reviewers.
Add a 'turn off' command for individual PRs (e.g., "@qa-bot off" in a PR comment); some PRs are unconventional by intent.
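
Two of these tips, the opt-out command and the response log, lend themselves to small helpers. The following is a sketch under assumptions: the function names are hypothetical, the opt-out check is a case-insensitive substring match, and each logged outcome is one of the three labels named above.

```python
from collections import Counter

OPT_OUT_COMMAND = "@qa-bot off"
OUTCOMES = ("accepted", "disagreed", "ignored")


def bot_disabled(pr_comment_bodies):
    """True if any PR comment contains the opt-out command."""
    return any(OPT_OUT_COMMAND in body.lower() for body in pr_comment_bodies)


def outcome_rates(outcomes):
    """Return the share of bot comments per author response.

    outcomes: iterable of "accepted" / "disagreed" / "ignored" strings,
              one per logged bot comment.
    """
    counts = Counter(outcomes)
    total = sum(counts[name] for name in OUTCOMES)
    if total == 0:
        return {}
    return {name: counts[name] / total for name in OUTCOMES}
```

A falling accepted share, or a rising ignored share, is a signal to tighten the prompt's scope before the team starts tuning the bot out.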

FAQ

How do I integrate the bot with GitHub?

Use Octokit / GitHub Apps. The bot needs "Pull requests: write" permission. Inline comments use `POST /repos/{owner}/{repo}/pulls/{pull_number}/comments` with path + line number.
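
For reference, the request for one inline comment can be assembled as below. This sketch only builds the URL and JSON body and does not send anything: authentication headers and the HTTP client are left out, and the helper name is hypothetical. Besides the path and line number, the endpoint also expects the `commit_id` the comment applies to, and `side` set to `RIGHT` targets the new version of the file.

```python
import json


def build_inline_comment_request(owner, repo, pull_number,
                                 commit_id, path, line, body):
    """Return (url, json_body) for a single inline PR review comment."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/pulls/{pull_number}/comments"
    )
    payload = {
        "body": body,            # the comment text, one issue per comment
        "commit_id": commit_id,  # SHA of the commit being commented on
        "path": path,            # file path relative to the repo root
        "line": line,            # line number in the new version of the file
        "side": "RIGHT",         # comment on the added/new side of the diff
    }
    return url, json.dumps(payload)
```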

Related prompts

AI Agent Workflows · intermediate

Generate CLAUDE.md for a QA Project

Reads a description of your QA project (framework, language, conventions, CI setup) and returns a ready-to-commit `CLAUDE.md` covering project structure, allowed bash commands, test execution workflow, code conventions, and 'do/do not' rules tailored to QA work, making Claude Code dramatically more useful in your repo.

Test Code Review · intermediate

Review Test Code for Anti-Patterns

Reads a test file and returns a categorized list of anti-patterns (hard sleeps, shared mutable state, weak assertions such as `toBeTruthy` instead of `toEqual`, missing teardown, mixed setup/assertion concerns), each with line numbers, severity, and a suggested fix.

Test Code Review · basic

Test Code Quality Checklist

Returns a per-test-file quality checklist with 20-30 items grouped by category (naming / structure / assertions / isolation / performance / maintainability), each marked PASS/FAIL with one-line evidence from the code.

AI Agent Workflows · advanced

Build an AI Bug Triage Workflow

Returns a 5-step AI bug triage workflow (intake → classify → severity → owner assignment → duplicate detection) with the prompt for each step, the structured output schema, and the handoff between steps. Final step emits a structured triage decision.