How do I account for AI-assisted test authoring?

Reduce design and automation estimates by 20-40% if your team genuinely uses AI for these activities. Don't reduce execution or reporting — those are still human-driven. Calibrate against real data after a quarter.
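
As a rough illustration of that adjustment, here is a minimal Python sketch. It assumes per-story estimates are held in a hypothetical `BucketEstimate` record with the four activity buckets. Only design and automation are scaled, while execution and reporting pass through unchanged. The `apply_ai_discount` name and the 25% figure in the usage note are illustrative, not a recommendation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BucketEstimate:
    """Hypothetical per-story estimate, one field per activity bucket (hours or points)."""
    design: float
    execution: float
    automation: float
    reporting: float

    @property
    def total(self) -> float:
        return self.design + self.execution + self.automation + self.reporting


def apply_ai_discount(est: BucketEstimate, discount: float) -> BucketEstimate:
    """Scale down only design and automation by `discount` (the text suggests 0.20-0.40).

    Execution and reporting are still human-driven, so they are left untouched.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be a fraction between 0 and 1")
    factor = 1.0 - discount
    return replace(est, design=est.design * factor, automation=est.automation * factor)
```

With a baseline of 4 design, 6 execution, 8 automation and 1 reporting hours, a 0.25 discount yields 3 + 6 + 6 + 1 = 16 hours instead of 19. Calibrating after a quarter then amounts to revisiting the `discount` value.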

What about regression effort?

Build regression into a per-sprint fixed budget (e.g., 16 hours per sprint for the QA team) rather than estimating per story. Per-story regression estimates compound to unrealistic numbers and don't reflect the actual work pattern.
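
A minimal sketch of the difference, assuming QA hours are tracked per story and that one regression pass typically covers the whole sprint (an assumption, not a rule). The function names are hypothetical, and the 16-hour default is simply the example figure above.

```python
from typing import Dict, Sequence

# Example figure from the text; replace with your team's agreed per-sprint budget.
REGRESSION_BUDGET_HOURS = 16.0


def sprint_qa_plan(story_qa_hours: Sequence[float],
                   regression_budget: float = REGRESSION_BUDGET_HOURS) -> Dict[str, float]:
    """Story-level QA work plus ONE fixed regression line item for the sprint."""
    story_work = float(sum(story_qa_hours))
    return {
        "story_work": story_work,
        "regression": regression_budget,
        "total": story_work + regression_budget,
    }


def per_story_regression(story_count: int, hours_per_story: float) -> float:
    """The pattern to avoid: regression padded into every story.

    It grows linearly with story count even though a single regression
    pass usually covers the whole sprint.
    """
    return story_count * hours_per_story
```

For four stories needing 5, 8, 3 and 6 QA hours, `sprint_qa_plan` gives 22 + 16 = 38 hours. Padding 3 regression hours into each of 12 stories would instead add 36 hours for what is, in practice, one pass.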

Why 50% as the sanity threshold?

It's an empirical rule of thumb. When QA effort exceeds half of dev effort, the most common cause is that the story is too big: break it down. The second most common cause is that the story is high-risk, in which case a separate QA workstream (security review, performance work, etc.) may be warranted.
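
The check itself is a single comparison. A minimal sketch, assuming totals and dev estimates are expressed in the same unit; the `high_ratio_flag` name is hypothetical. Exactly 50% is not flagged, because the threshold is "exceeds".

```python
from typing import Optional

HIGH_RATIO_THRESHOLD = 0.5  # QA effort above half of dev effort triggers re-discussion


def high_ratio_flag(qa_total: float, dev_estimate: Optional[float]) -> str:
    """Return 'HIGH RATIO' when QA effort exceeds 50% of dev effort, else ''.

    With no dev estimate (or a non-positive one) there is nothing to compare
    against, so no flag is raised.
    """
    if dev_estimate is None or dev_estimate <= 0:
        return ""
    return "HIGH RATIO" if qa_total > HIGH_RATIO_THRESHOLD * dev_estimate else ""
```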

Estimate Testing Effort from User Stories with AI

Updated 2026-06-08 · intermediate · Test Strategy

Reads a backlog of user stories and returns testing effort estimates broken down by activity (test design, execution, automation, reporting) with a confidence level (high/medium/low) and a sanity-check flag when total testing effort exceeds 50% of the dev estimate.

When to use it

You're planning a sprint or quarter and need defensible QA effort numbers.
You're estimating capacity to negotiate scope with product.
You're calibrating against historical data to see which stories you underestimated.
You're a fractional QA lead estimating across multiple teams.

The prompt

XML-tagged — best for Claude 4.x

<role>
You are a QA planning lead. Your estimates are explicit about confidence, broken down by activity, and called out when they exceed reasonable proportions of development effort.
</role>

<context>
Estimating "QA effort" as a single number is useless. Break it into 4 buckets: test design, test execution (manual), automation authoring, reporting/communication. Each story gets a confidence level. Sanity check: if total QA effort exceeds 50% of dev estimate, flag it for re-discussion, since the story is likely too big or too risky.
</context>

<task>
For each user story:
1. Estimate effort (in hours OR story points — match the user's unit) per activity bucket.
2. Sum the activities into a total QA estimate.
3. Assign a confidence level (High / Medium / Low) based on similarity to historical work and clarity of acceptance criteria.
4. Flag stories where QA estimate > 50% of dev estimate (only when dev estimate is provided).
5. Identify the largest source of uncertainty per story.
</task>

<input>
Story list (with brief description, dev estimate if available): {stories}
Unit (hours or story points): {unit}
Team context (automation maturity, historical velocity): {context}
</input>

<constraints>
- Four activity buckets, all four populated per story even if some are 0.
- Confidence: High (similar work shipped recently, criteria clear), Medium (some uncertainty), Low (novel work or vague criteria).
- Flag marker when QA estimate > 50% of dev estimate: `(!)` or the text `HIGH RATIO`.
- Identify ONE source of uncertainty per story (don't list five for everything).
- Total at the bottom: sum across all stories.
</constraints>

<output_format>
Two sections:
1. **Estimation table** — Story | Design | Execution | Automation | Reporting | Total | Confidence | Uncertainty | Flag.
2. **Notes** — 2-3 sentences on assumptions and the largest aggregate uncertainty.
</output_format>

Before writing, identify which stories are most ambiguous so confidence levels are honest.
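
If you run the prompt from a script, the three placeholders (`{stories}`, `{unit}`, `{context}`) can be filled in one pass. This is a minimal sketch, assuming the prompt text is stored in a string and the backlog is a list of dicts. The `Story` keys (`title`, `description`, `dev_estimate`) and the helper names are hypothetical, so adapt them to your tracker's export.

```python
import re
from typing import List, Optional, TypedDict


class Story(TypedDict):
    """Hypothetical backlog row; adapt the keys to your tracker's export."""
    title: str
    description: str
    dev_estimate: Optional[float]


def format_stories(stories: List[Story], unit: str) -> str:
    """Render one line per story, noting the dev estimate when it is known."""
    lines = []
    for s in stories:
        if s["dev_estimate"] is not None:
            dev = f"dev estimate {s['dev_estimate']:g} {unit}"
        else:
            dev = "no dev estimate"
        lines.append(f"- {s['title']}: {s['description']} ({dev})")
    return "\n".join(lines)


def fill_prompt(template: str, stories: List[Story], unit: str, context: str) -> str:
    """Substitute {stories}, {unit} and {context} in a single pass."""
    if unit not in ("hours", "story points"):
        raise ValueError("unit must be 'hours' or 'story points'")
    values = {
        "stories": format_stories(stories, unit),
        "unit": unit,
        "context": context,
    }
    # One regex pass with a function replacement: braces that appear inside
    # story text are never re-substituted, and backslashes are not interpreted.
    return re.sub(r"\{(stories|unit|context)\}", lambda m: values[m.group(1)], template)
```

Listing stories without a dev estimate explicitly ("no dev estimate") makes it clear to the model why no HIGH RATIO flag is expected for them.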

Common pitfalls

Model rounds to whole days when asked for hours, losing precision.
Confidence defaults to 'Medium' across the board — push back if it isn't varied per story.
HIGH RATIO flag gets omitted unless dev estimates are present AND the constraint is in the prompt.
Uncertainty defaults to 'requirements unclear' for every story, which is too generic to act on; require a specific source of uncertainty.
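
Most of these pitfalls can be caught mechanically once the estimation table is parsed. The following is a minimal Python sketch, assuming you have already parsed each table row into a hypothetical `Row` record. The list of generic phrases and the 8-hour working day used by the rounding heuristic are assumptions to tune for your team.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: phrases treated as too generic to count as a real uncertainty.
GENERIC_UNCERTAINTY = {"requirements unclear", "unclear requirements", "tbd"}
HOURS_PER_DAY = 8  # assumption: an 8-hour working day


@dataclass
class Row:
    """One parsed row of the model's estimation table (hypothetical shape)."""
    story: str
    design: float
    execution: float
    automation: float
    reporting: float
    total: float
    confidence: str                      # "High" | "Medium" | "Low"
    uncertainty: str
    flag: str                            # "" when not flagged, else "(!)" or "HIGH RATIO"
    dev_estimate: Optional[float] = None


def review(rows: List[Row], unit: str = "hours") -> List[str]:
    """Return human-readable problems matching the common pitfalls."""
    problems: List[str] = []
    for r in rows:
        buckets = (r.design, r.execution, r.automation, r.reporting)
        # The total must equal the sum of the four buckets.
        if abs(sum(buckets) - r.total) > 1e-6:
            problems.append(f"{r.story}: total {r.total} != bucket sum {sum(buckets)}")
        # A generic uncertainty is not actionable.
        if r.uncertainty.strip().lower() in GENERIC_UNCERTAINTY:
            problems.append(f"{r.story}: generic uncertainty '{r.uncertainty}'")
        # The flag is only expected when a positive dev estimate exists.
        needs_flag = (r.dev_estimate is not None and r.dev_estimate > 0
                      and r.total > 0.5 * r.dev_estimate)
        if needs_flag and not r.flag:
            problems.append(f"{r.story}: missing HIGH RATIO flag")
    # Identical confidence across several stories usually means it was not assessed.
    if len(rows) > 1 and len({r.confidence for r in rows}) == 1:
        problems.append(f"confidence is '{rows[0].confidence}' for every story")
    # Heuristic: every non-zero bucket being a whole number of days suggests rounding.
    nonzero = [v for r in rows for v in (r.design, r.execution, r.automation, r.reporting) if v]
    if unit == "hours" and nonzero and all(v % HOURS_PER_DAY == 0 for v in nonzero):
        problems.append("every non-zero bucket is a whole number of days; check for rounding")
    return problems
```

An empty list does not prove the estimates are good; it only means none of the known failure patterns showed up.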

Tips

Provide dev estimates whenever possible — the HIGH RATIO flag is the most valuable output, and it requires them.
Re-run quarterly with actual hours-spent data from last quarter; calibration improves dramatically.
Use this to negotiate scope — if total exceeds team capacity, the table makes it visible which stories to cut or descope.
Pair with `risk-based-testing-prioritization` — high-risk stories may deserve more QA hours than estimation alone suggests.
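
For the quarterly re-run, one simple form of calibration is a per-bucket ratio of actual to estimated effort. A minimal sketch, assuming last quarter's stories are recorded as hypothetical `{"estimate": {...}, "actual": {...}}` dicts keyed by bucket. It uses a ratio of sums rather than a mean of per-story ratios, so that large stories carry more weight. That is a design choice, not the only reasonable one.

```python
from collections import defaultdict
from typing import Dict, List

BUCKETS = ("design", "execution", "automation", "reporting")


def calibration_factors(history: List[Dict[str, Dict[str, float]]]) -> Dict[str, float]:
    """Per-bucket ratio of actual to estimated effort across last quarter's stories.

    A factor above 1.0 means that bucket was underestimated.
    """
    estimated: Dict[str, float] = defaultdict(float)
    actual: Dict[str, float] = defaultdict(float)
    for item in history:
        for b in BUCKETS:
            estimated[b] += item["estimate"][b]
            actual[b] += item["actual"][b]
    # Buckets that were never estimated keep a neutral factor of 1.0.
    return {b: (actual[b] / estimated[b] if estimated[b] else 1.0) for b in BUCKETS}


def calibrate(estimate: Dict[str, float], factors: Dict[str, float]) -> Dict[str, float]:
    """Scale a fresh per-bucket estimate by last quarter's factors."""
    return {b: estimate[b] * factors[b] for b in BUCKETS}
```

Feeding the factors back into the prompt's team context (for example, "automation has run 1.5x over estimate") may be more useful than silently rescaling the model's output.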

FAQ

Should I estimate in hours or story points?

Whichever your team uses for dev estimates; mixing units defeats the comparison. Hours are more direct for QA (sprint commitments, capacity planning). Points are more natural in teams that have stable point velocity.

Related prompts

Test Strategy · intermediate

Create Test Plan from PRD

Reads a Product Requirements Document and returns a release-specific Test Plan with scope, milestones, RACI per major activity, deliverables, entry/exit criteria, risk assessment, and a defect management workflow. Uses date placeholders the user fills rather than fabricating.

Test Strategy · intermediate

Risk-Based Testing Prioritization

Reads a feature list and outputs a prioritization matrix with four weighted dimensions (business impact, technical complexity, usage frequency, defect history) scored 1-5, total weighted score, recommended test execution order, and tie-break rule.

Test Strategy · intermediate

Generate Test Strategy Document

Returns a full Test Strategy document with 8 mandatory sections — scope, test levels, test types, entry/exit criteria, risk matrix, environment, resources, deprioritized test types — tailored to a product description and stated constraints.

CI/CD for QA · intermediate

GitHub Actions QA Pipeline Generator

Returns a complete `.github/workflows/qa.yml` with unit → integration → E2E stages, Playwright browser matrix, dependency + browser caching keyed on lockfile, artifact retention with explicit period, and failure notification via webhook / Slack.
