Estimate Testing Effort from User Stories with AI
Reads a backlog of user stories and returns testing effort estimates broken down by activity (test design, execution, automation, reporting), each with a confidence level (high/medium/low) and a sanity-check flag when total testing effort exceeds 50% of the dev estimate.
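The total and the sanity-check flag are plain arithmetic, so they are easy to verify outside the model. A minimal Python sketch, assuming all numbers for a story share one unit; the `StoryEstimate` class, its field names, and the sample story are hypothetical and not part of the prompt:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoryEstimate:
    """Hypothetical container for one story's QA estimate (hours or points)."""
    story: str
    design: float
    execution: float
    automation: float
    reporting: float
    dev_estimate: Optional[float] = None  # None when no dev estimate was given

    @property
    def total(self) -> float:
        # Total QA effort is the sum of the four activity buckets.
        return self.design + self.execution + self.automation + self.reporting

    @property
    def high_ratio(self) -> bool:
        # Flag only when a dev estimate exists (a missing or zero estimate
        # is treated as "not provided") and QA exceeds 50% of it.
        if not self.dev_estimate:
            return False
        return self.total > 0.5 * self.dev_estimate


checkout = StoryEstimate("Checkout retry", 2, 4, 3, 1, dev_estimate=16)
print(checkout.total, checkout.high_ratio)  # 10 True
```

Comparing these recomputed values with the model's table is a cheap way to catch arithmetic slips or a missing flag.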
When to use it
- You're planning a sprint or quarter and need defensible QA effort numbers.
- You're estimating capacity to negotiate scope with product.
- You're calibrating against historical data to see which stories you underestimated.
- You're a fractional QA lead estimating across multiple teams.
The prompt
XML-tagged — best for Claude 4.x
<role>
You are a QA planning lead. Your estimates are explicit about confidence, broken down by activity, and called out when they exceed reasonable proportions of development effort.
</role>
<context>
Estimating "QA effort" as a single number is useless. Break it into four buckets: test design, test execution (manual), automation authoring, and reporting/communication. Each story gets a confidence level. Sanity check: if total QA effort exceeds 50% of the dev estimate, flag the story for re-discussion, because it is likely too big or too risky.
</context>
<task>
For each user story:
1. Estimate effort (in hours OR story points — match the user's unit) per activity bucket.
2. Sum the activities into a total QA estimate.
3. Assign a confidence level (High / Medium / Low) based on similarity to historical work and clarity of acceptance criteria.
4. Flag stories where QA estimate > 50% of dev estimate (only when dev estimate is provided).
5. Identify the largest source of uncertainty per story.
</task>
<input>
Story list (with brief description, dev estimate if available): {stories}
Unit (hours or story points): {unit}
Team context (automation maturity, historical velocity): {context}
</input>
<constraints>
- Four activity buckets, all four populated per story even if some are 0.
- Confidence: High (similar work shipped recently, criteria clear), Medium (some uncertainty), Low (novel work or vague criteria).
- Flag marker when QA estimate > 50% of dev estimate: `(!)` or the text `HIGH RATIO`.
- Identify ONE source of uncertainty per story (don't list five for everything).
- Total at the bottom: sum across all stories.
</constraints>
<output_format>
Two sections:
1. **Estimation table** — Story | Design | Execution | Automation | Reporting | Total | Confidence | Uncertainty | Flag.
2. **Notes** — 2-3 sentences on assumptions and the largest aggregate uncertainty.
</output_format>
Before writing, identify which stories are most ambiguous so confidence levels are honest.
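The prompt has three placeholders: `{stories}`, `{unit}`, and `{context}`. One way to fill them programmatically is sketched below in Python. The `build_prompt` helper, the shortened template, and the sample story are illustrative assumptions, not part of the original prompt:

```python
def build_prompt(template: str, stories: list[str], unit: str, context: str) -> str:
    """Fill the {stories}, {unit} and {context} placeholders in the prompt text."""
    if unit not in ("hours", "story points"):
        raise ValueError("unit must be 'hours' or 'story points'")
    story_block = "\n".join(f"- {s}" for s in stories)
    # str.replace is used instead of str.format so that any other braces
    # in the template are left untouched.
    return (
        template.replace("{stories}", story_block)
        .replace("{unit}", unit)
        .replace("{context}", context)
    )


# Shortened stand-in for the <input> block of the prompt above.
template = (
    "Story list (with brief description, dev estimate if available): {stories}\n"
    "Unit (hours or story points): {unit}\n"
    "Team context (automation maturity, historical velocity): {context}"
)
prompt = build_prompt(
    template,
    stories=["Password reset via email (dev estimate: 12h)"],
    unit="hours",
    context="Playwright suite in place; about 60 QA hours per sprint",
)
print(prompt)
```

Rejecting any unit other than hours or story points up front keeps QA and dev estimates in the same unit, which the 50% comparison depends on.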
Common pitfalls
- Model rounds to whole days when asked for hours, losing precision.
- Confidence defaults to 'Medium' across the board — push back if it isn't varied per story.
- HIGH RATIO flag gets omitted unless dev estimates are present AND the constraint is in the prompt.
- Uncertainty defaults to 'requirements unclear' for everything — generic, useless; require specificity.
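Most of these pitfalls can be detected mechanically once the estimation table has been parsed into rows. The following Python sketch assumes each row is a dict with hypothetical keys (`story`, the four bucket names, `confidence`, `uncertainty`, `dev_estimate`, `flag`) and assumes an 8-hour working day for the rounding check; adapt both to your own data:

```python
GENERIC_UNCERTAINTY = {"requirements unclear", "unclear requirements"}


def review_estimates(rows: list[dict], unit: str) -> list[str]:
    """Return warnings for the pitfalls listed above; an empty list means none found."""
    warnings = []
    buckets = ("design", "execution", "automation", "reporting")

    # Pitfall: hours silently rounded to whole days (8-hour day assumed).
    if unit == "hours":
        values = [r[b] for r in rows for b in buckets if r[b] > 0]
        if values and all(v % 8 == 0 for v in values):
            warnings.append("all hour values are multiples of 8 (rounded to days?)")

    # Pitfall: the same confidence level on every story.
    if len(rows) > 1 and len({r["confidence"] for r in rows}) == 1:
        warnings.append("confidence is identical for every story")

    for r in rows:
        total = sum(r[b] for b in buckets)
        dev = r.get("dev_estimate")
        # Pitfall: HIGH RATIO flag missing although QA > 50% of dev.
        if dev and total > 0.5 * dev and not r.get("flag"):
            warnings.append(f"{r['story']}: HIGH RATIO flag missing")
        # Pitfall: generic uncertainty text.
        if r["uncertainty"].strip().lower() in GENERIC_UNCERTAINTY:
            warnings.append(f"{r['story']}: uncertainty is generic")
    return warnings


# Illustrative rows, not real model output.
rows = [
    {"story": "Login", "design": 8, "execution": 8, "automation": 0, "reporting": 0,
     "confidence": "Medium", "uncertainty": "Requirements unclear",
     "dev_estimate": 24, "flag": ""},
    {"story": "Export", "design": 8, "execution": 16, "automation": 8, "reporting": 0,
     "confidence": "Medium", "uncertainty": "CSV size limits untested",
     "dev_estimate": 40, "flag": ""},
]
for warning in review_estimates(rows, unit="hours"):
    print(warning)
```

Any warning is a cue to push back on the model with a follow-up message rather than to patch the table by hand.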
Tips
- Provide dev estimates whenever possible — the HIGH RATIO flag is the most valuable output, and it requires them.
- Re-run quarterly with actual hours-spent data from last quarter so that new estimates are calibrated against real effort.
- Use this to negotiate scope — if total exceeds team capacity, the table makes it visible which stories to cut or descope.
- Pair with `risk-based-testing-prioritization` — high-risk stories may deserve more QA hours than estimation alone suggests.
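The quarterly calibration tip can be made concrete by comparing estimated and actual effort per activity bucket. A minimal Python sketch, assuming you record both numbers per story; the `calibration_factors` helper, the record layout, and the sample figures are hypothetical:

```python
def calibration_factors(history: list[dict]) -> dict[str, float]:
    """Ratio of actual to estimated effort per bucket; above 1.0 means underestimated."""
    factors = {}
    for bucket in ("design", "execution", "automation", "reporting"):
        estimated = sum(h["estimated"][bucket] for h in history)
        actual = sum(h["actual"][bucket] for h in history)
        # Skip buckets with no estimated effort to avoid dividing by zero.
        if estimated > 0:
            factors[bucket] = round(actual / estimated, 2)
    return factors


# Illustrative records for two completed stories.
history = [
    {"estimated": {"design": 2, "execution": 4, "automation": 4, "reporting": 1},
     "actual": {"design": 2, "execution": 6, "automation": 8, "reporting": 1}},
    {"estimated": {"design": 3, "execution": 4, "automation": 4, "reporting": 1},
     "actual": {"design": 3, "execution": 6, "automation": 4, "reporting": 1}},
]
print(calibration_factors(history))
# {'design': 1.0, 'execution': 1.5, 'automation': 1.5, 'reporting': 1.0}
```

Factors like these could be pasted into the `{context}` input so the next run starts from the team's observed bias instead of a guess.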
FAQ
Should you estimate in hours or story points?
Use whichever unit your team uses for dev estimates. Mixing units defeats the comparison. Hours are more direct for QA (sprint commitments, capacity planning). Points are more natural in teams with a stable point velocity.
Related prompts
Create Test Plan from PRD
Reads a Product Requirements Document and returns a release-specific Test Plan with scope, milestones, RACI per major activity, deliverables, entry/exit criteria, risk assessment, and a defect management workflow. Uses date placeholders the user fills rather than fabricating.
Risk-Based Testing Prioritization
Reads a feature list and outputs a prioritization matrix with four weighted dimensions (business impact, technical complexity, usage frequency, defect history) scored 1-5, total weighted score, recommended test execution order, and tie-break rule.
Generate Test Strategy Document
Returns a full Test Strategy document with 8 mandatory sections — scope, test levels, test types, entry/exit criteria, risk matrix, environment, resources, deprioritized test types — tailored to a product description and stated constraints.
GitHub Actions QA Pipeline Generator
Returns a complete `.github/workflows/qa.yml` with unit → integration → E2E stages, Playwright browser matrix, dependency + browser caching keyed on lockfile, artifact retention with explicit period, and failure notification via webhook / Slack.