Top 10 Claude Code Skills for QA Engineers
Claude Code Skills are markdown files (SKILL.md) that teach the AI how to work inside a specific discipline: debugging, TDD, browser testing, code review. Claude loads a skill's full instructions only when they are relevant, so installing all 10 costs almost nothing in context (just each skill's name and description) until you actually need one.
This list is for QA engineers who run Claude Code (or Cursor / Codex / Windsurf — the SKILL.md format is portable) as their main AI coding agent. Each skill below: what it does, why QA cares specifically, install command, an example invocation.
Don't install all 10. Start with these 3 and you cover 80% of the QA workflow:
- 1. Superpowers (TDD + debugging + brainstorming) — Discipline backbone — adds rigor to every prompt.
- 2. Playwright Skill — Browser automation Claude can drive autonomously.
- 3. code-review (Anthropic) — Senior-reviewer on every PR. Catches what you'd miss.
/plugin install superpowers@claude-plugins-official
/plugin marketplace add lackeyjb/playwright-skill
/plugin install playwright-skill@playwright-skill
/plugin install code-review@claude-plugins-official
Not sure where you fit? Find your row, install those skills first. Add more later as projects grow.
| Your role | Main goal | Install |
|---|---|---|
| Manual QA learning automation | First Playwright tests, safely | Playwright Skill + code-review + brainstorming |
| SDET / Test Automation Engineer | Ship more tests per sprint, stop flakiness | Superpowers (TDD + debugging) + qa-skills + code-review |
| QA Lead / Manager | Team workflow + clean reports | code-review + stop-slop + context-engineering |
| Founder doing own QA | Coverage without hiring | Superpowers + Playwright Skill + /simplify |
| QA on long-running CI agent | Autonomous nightly runs | context-engineering + qa-skills + prompt-architect |
A Skill is a directory built around one required file, SKILL.md, which contains YAML frontmatter (name + description) and markdown instructions Claude follows when the description matches your task. Anthropic opened the format publicly in October 2025; by mid-2026 there are 650+ community skills indexed.
- Prompts = what you type in one turn.
- Slash commands = a saved prompt you re-run with /name.
- Skills = instructions Claude auto-loads when relevant (no need to remember to invoke them).
- MCP servers = tools Claude can call (browse, query DB, post to Slack).
You stack them: a skill for the discipline (TDD), an MCP for the I/O (Playwright MCP for browser), a slash command for the entry point (/regression-suite).
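To make the SKILL.md layout concrete, here is a minimal sketch of reading the frontmatter a skill ships with. The sample skill (`flaky-test-triage`), the `parseSkill` helper, and the simplified one-line `key: value` parsing are all assumptions for illustration. Real frontmatter is YAML, and deciding whether a description matches your task is done by the model, not by code like this.

```typescript
// Hypothetical SKILL.md content: YAML frontmatter (name + description)
// followed by markdown instructions.
const sampleSkill = [
  "---",
  "name: flaky-test-triage",
  "description: Use when a test fails intermittently in CI.",
  "---",
  "# Flaky test triage",
  "1. Reproduce the failure before proposing a fix.",
].join("\n");

interface ParsedSkill {
  name: string;
  description: string;
  body: string;
}

// Simplified parser: handles only single-line `key: value` pairs, not full YAML.
function parseSkill(source: string): ParsedSkill {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(source);
  if (!match) throw new Error("SKILL.md must start with a --- frontmatter block");
  const fields: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  if (!fields.name || !fields.description) {
    throw new Error("frontmatter needs both name and description");
  }
  return { name: fields.name, description: fields.description, body: match[2] };
}

const skill = parseSkill(sampleSkill);
console.log(`${skill.name}: ${skill.description}`);
```

The description is the part worth writing carefully: it is what Claude sees when deciding whether to load the rest of the file.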
Two paths. Plugin marketplace for popular bundles, direct from a repo for everything else.
# Browse marketplace
claude plugin search
# Install by name
claude plugin install superpowers@obra/superpowers
claude plugin install playwright-skill@lackeyjb
# Clone a single skill from any GitHub repo
mkdir -p ~/.claude/skills/stop-slop
curl -L https://raw.githubusercontent.com/hardikpandya/stop-slop/main/SKILL.md \
  -o ~/.claude/skills/stop-slop/SKILL.md
Confirm with claude skill list. To check what loaded in the current session: /skills.
The TDD skill (part of Superpowers) forces Claude to write a failing test before any implementation code, then make it pass, then refactor. It locks the assistant into the red-green-refactor loop instead of generating code first and ad-hoc tests after.
QA cares about correctness, not just 'it compiles'. TDD makes Claude think in test cases — boundary values, error paths, edge data — before touching prod code.
- →~3 hours/week saved per engineer (2026 surveys)
- →Coverage stays >80% organically — tests are written first, not after
- →Refactors don't break behavior — green tests guard every move
- →Fewer 'works on my machine' bugs reach review
- →Faster onboarding — new engineers read tests as living spec
- 1. Brief the goal. Tell Claude the feature in one sentence + the public interface (function signature, API route, UI behavior).
- 2. Let Claude write the first failing test. It will pick boundary values and error cases. Approve or correct the list before code is written.
- 3. Watch RED → GREEN. Claude writes minimal implementation to pass the test. Don't let it 'fix more than the test asks for'.
- 4. Refactor with confidence. With a green test as safety net, ask Claude to extract helpers, rename, dedupe. Re-run tests after each change.
- 5. Add next test, repeat. Cycle through boundary cases, negative paths, integration scenarios.
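One cycle of the loop above, sketched for a hypothetical date-range filter on bookings. The checks at the bottom are the part written first; they fail until `filterByDateRange` exists. The `Booking` shape, the function name, and the decision to throw when start is after end are assumptions for illustration, not the API of any real bookings service.

```typescript
// Hypothetical booking record; dates are ISO strings (YYYY-MM-DD),
// which compare correctly as plain strings.
interface Booking {
  id: string;
  date: string;
}

// GREEN step: the minimal implementation that satisfies the checks below.
function filterByDateRange(bookings: Booking[], start: string, end: string): Booking[] {
  if (start > end) {
    throw new RangeError(`start (${start}) is after end (${end})`);
  }
  return bookings.filter((b) => b.date >= start && b.date <= end);
}

// RED step (written first): boundary cases agreed before any implementation.
const bookings: Booking[] = [
  { id: "a", date: "2026-01-10" },
  { id: "b", date: "2026-01-15" },
  { id: "c", date: "2030-06-01" }, // future date
];

function check(condition: boolean, label: string): void {
  if (!condition) throw new Error(`FAILED: ${label}`);
}

// A range that contains no bookings returns an empty list.
check(filterByDateRange(bookings, "2026-02-01", "2026-02-28").length === 0, "empty range");

// Both ends of the range are inclusive.
check(
  filterByDateRange(bookings, "2026-01-10", "2026-01-15").map((b) => b.id).join() === "a,b",
  "inclusive boundaries",
);

// Future dates are ordinary data, not an error.
check(filterByDateRange(bookings, "2030-01-01", "2030-12-31")[0].id === "c", "future dates");

// start > end is rejected rather than silently returning nothing.
let rejected = false;
try {
  filterByDateRange(bookings, "2026-03-01", "2026-01-01");
} catch (err) {
  rejected = err instanceof RangeError;
}
check(rejected, "start > end throws");
```

Whether start > end should throw or return an empty list is exactly the kind of decision to settle in step 2, before the implementation exists.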
/plugin install superpowers@claude-plugins-official
"Use TDD to add date-range filtering to GET /bookings. Tests first (boundary: empty range, start > end, future dates), then minimal implementation, then refactor."
"Use TDD on the cart-discount feature. First: write a Playwright test asserting the discount appears after promo code entry. Run it (must fail). Then add the implementation to the React component. Re-run. Refactor the test for reusability."
The systematic-debugging skill (part of Superpowers) imposes a strict 4-step protocol: reproduce → minimize → isolate → fix. It blocks Claude from guessing fixes before it can reliably reproduce the bug, and forces the assistant to log assumptions and verify them.
Most QA-reported bugs come back with 'cannot reproduce'. This skill turns Claude into the rare investigator who asks 'what's the smallest failing case?' first.
- →~2.5 hours/week saved (2026 telemetry)
- →Repro rate stays >95% — every 'fix' is verified against the minimal failing case
- →Fewer regressions — root cause is found, not symptoms patched
- →Bug reports become readable evidence chains, not 'tried X, didn't work'
- →Junior QAs learn investigation by watching Claude work the protocol
- 1. Reproduce. Capture exact steps, env, data. Claude refuses to propose a fix until it can re-trigger the bug deterministically.
- 2. Minimize. Strip the repro to the smallest input that still fails. Remove unrelated code paths, mock external services.
- 3. Isolate. Bisect: which commit, which function, which line introduces the bad behavior? Claude logs every hypothesis it tests.
- 4. Fix + verify. Apply the smallest possible change. Re-run the original repro, the minimized case, and adjacent test suites.
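Steps 3 and 4 can be sketched in a few lines. The first function is a plain binary search over an ordered commit list, assuming the bug is monotonic (once a commit is bad, every later commit is bad too). The second estimates how many consecutive green runs you would want before trusting a fix for an intermittent failure, assuming runs are independent. The function names, the commit IDs, and the 5% threshold are all illustrative.

```typescript
// Isolate: returns the first item for which isBad is true, or undefined.
// Items must be ordered oldest to newest.
function findFirstBad<T>(items: readonly T[], isBad: (item: T) => boolean): T | undefined {
  let lo = 0;
  let hi = items.length; // the first bad index lies in [lo, hi]
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isBad(items[mid])) {
      hi = mid; // mid is bad: the first bad commit is mid or earlier
    } else {
      lo = mid + 1; // mid is good: the first bad commit is later
    }
  }
  return lo < items.length ? items[lo] : undefined;
}

// Verify: if the bug were still present with failure rate p, the chance of
// n passes in a row is (1 - p)^n. Find the smallest n that pushes it below alpha.
function passesNeeded(failureRate: number, alpha: number): number {
  if (failureRate <= 0 || failureRate >= 1) throw new RangeError("failureRate must be in (0, 1)");
  if (alpha <= 0 || alpha >= 1) throw new RangeError("alpha must be in (0, 1)");
  return Math.ceil(Math.log(alpha) / Math.log(1 - failureRate));
}

// Hypothetical history: the regression landed in "c4".
const commits = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"];
const tested: string[] = [];
const culprit = findFirstBad(commits, (sha) => {
  tested.push(sha); // log every hypothesis that gets tested
  return sha >= "c4"; // stand-in for "run the minimized repro on this commit"
});

console.log(culprit, tested);         // c4 [ 'c5', 'c3', 'c4' ]
console.log(passesNeeded(0.2, 0.05)); // 14
```

For a test that fails about 1 run in 5, a single green run after the fix is weak evidence; under these assumptions it takes 14 in a row to get below a 5% chance of a lucky streak.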
/plugin install superpowers@claude-plugins-official
"Use systematic-debugging on test 'cart-checkout-flow' — it fails 1 in 5 times in CI but passes locally. Reproduce on a clean profile before proposing any fix."
"Use systematic-debugging on yesterday's payment incident. Use the Sentry trace + Playwright MCP to reproduce the user journey. Don't propose a fix until you can re-trigger on staging."
The brainstorming skill (part of Superpowers) triggers before any creative work: features, test strategies, refactors. Claude asks clarifying questions about intent, constraints, and edge cases instead of jumping to a single solution.
Before writing a test plan, Claude needs to know risk tolerance, scope boundaries, regression vs new-feature coverage. This skill stops it from generating generic 'happy path × 3' plans.
- →Fewer 'wrong solution shipped' moments — alignment happens in chat, not in PR review
- →Test plans cover the actual risk surface, not the obvious paths
- →Less re-work — 1-2 questions up front prevent 30-min rewrites
- →Clearer specs — chat output is the spec, can be pasted into Jira
- 1. State the goal vaguely. Resist the urge to over-specify. Brainstorming's job is to surface what you forgot.
- 2. Answer Claude's questions honestly. Risk tolerance, scope, what's already covered, what's explicitly OUT. The questions ARE the value.
- 3. Review the tradeoffs. Claude presents 2-3 approaches with explicit tradeoffs. Pick one, or ask for a 4th.
- 4. Hand off to execution skill. Now invoke TDD, debugging, or qa-skills; they execute on the brief brainstorming produced.
/plugin install superpowers@claude-plugins-official
"Let's brainstorm the regression scope for v4.2. Walk me through what could break and what's already covered."
"Brainstorm: we currently use Cypress, considering Playwright. Help me build a decision matrix: team skills, parallelism, CI cost, debugging UX, mobile coverage. What questions am I not asking?"
Playwright Skill is a model-invoked browser skill: Claude autonomously writes and runs Playwright code for validation, smoke tests, and exploratory checks. No MCP server is required because the skill ships its own runner.
Different from Playwright MCP (live browser control) — this skill is about generating durable test code. When you want 'just verify the login flow works on staging', it writes the locator, runs it, captures results, reports.
- →Resilient locators by default (role-based, not nth-child)
- →Trace files captured on every failure — debug in 30s, not 30min
- →Parallelizable spec structure — fits into existing test runner
- →Smoke checks ready in minutes, not days of scaffolding
- →Captures auth state correctly (storageState, not raw cookies)
- 1. Tell Claude what to verify. URL + the user journey in plain English. Add creds if needed (use env vars, not hardcoded).
- 2. Let it scaffold. Skill creates playwright.config.ts, sets up storageState, picks browsers. Review before letting it run.
- 3. Watch the first run. Trace + screenshots on failure. Claude reads the trace and proposes locator fixes.
- 4. Iterate to resilient version. Ask Claude to replace any nth-child / xpath locators with role / aria queries. Re-run.
- 5. Commit + add to CI. Skill generates a sample GitHub Actions workflow. Drop in, done.
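Step 4 can be partly automated. A minimal sketch that flags positional and XPath selectors, so you know which ones to ask Claude to rewrite as role-based queries. The pattern list and the `findBrittleSelectors` name are assumptions for illustration, not something the skill ships.

```typescript
interface BrittleHit {
  selector: string;
  reason: string;
}

// Heuristics only: these patterns tend to break when markup is reordered.
const BRITTLE_PATTERNS: ReadonlyArray<{ reason: string; pattern: RegExp }> = [
  { reason: "positional (:nth-child / :nth-of-type)", pattern: /:nth-(child|of-type)\(/ },
  { reason: "XPath", pattern: /^(xpath=|\/\/)/ },
];

function findBrittleSelectors(selectors: readonly string[]): BrittleHit[] {
  const hits: BrittleHit[] = [];
  for (const selector of selectors) {
    for (const { reason, pattern } of BRITTLE_PATTERNS) {
      if (pattern.test(selector)) hits.push({ selector, reason });
    }
  }
  return hits;
}

// Hypothetical selectors collected from a generated spec.
const hits = findBrittleSelectors([
  "ul.cart > li:nth-child(3) button",
  "//div[@id='total']/span",
  "role=button[name='Checkout']",
]);

for (const hit of hits) {
  console.log(`${hit.reason}: ${hit.selector}`);
}
```

Only the first two selectors are flagged; the role-based one passes.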
/plugin marketplace add lackeyjb/playwright-skill
/plugin install playwright-skill@playwright-skill
# then in shell:
cd ~/.claude/plugins/marketplaces/playwright-skill/skills/playwright-skill && npm run setup
"Use the playwright skill to validate the checkout flow on https://staging.acme.com. Email: $TEST_EMAIL. Report which step fails first and capture the trace."
"Use the playwright skill: scaffold a page object model for the admin area (5 pages: list/create/edit/delete/detail). One spec per page. Mobile + desktop projects. Auth via storageState fixture."
qa-skills is a bundle of 6 specialized agents glued into a single E2E test-generation pipeline: scout (URL discovery), scribe (selector capture), playwright-author (test code), multi-user-flow author, mobile auditor, reviewer. It produces complete page objects + spec files, not snippets.
Single-prompt 'write me a Playwright test' produces fragile pasta. This pipeline mirrors how a senior SDET actually works: discovery first, then locators, then a structured spec, then review.
- →Complete POM + spec in ~15 min (vs. days of manual scaffolding)
- →Multi-user flows handled correctly (admin + customer, parallel sessions)
- →Mobile audit included by default — viewport, touch events, responsive checks
- →Reviewer agent catches brittleness before commit
- →Output fits real Playwright project structure — no copy-paste rewrites
- 1. Point at a URL or feature area. Give scope: `/admin/users` or 'checkout flow'. Don't try to cover the whole app in one run; it'll exhaust tokens.
- 2. Scout runs first. Discovers reachable URLs, captures structure. Review the map and kill any irrelevant routes.
- 3. Scribe captures selectors. Per-page locators with resilience scoring. Approve or veto.
- 4. Authors generate specs. Playwright-author writes happy path; multi-user-flow author covers cross-role; mobile auditor flags viewport issues.
- 5. Reviewer agent checks output. Catches duplicated waits, hardcoded data, missing test isolation. Fix in place.
/plugin marketplace add neonwatty/qa-skills
/plugin install qa-skills@neonwatty-qa
# Prerequisite (one-time):
npm install -g @playwright/cli@latest && playwright-cli install
"Use qa-skills pipeline to generate E2E tests for the 'admin user management' area of https://app.acme.com/admin. Output: page objects + spec, mobile-aware."
"Use qa-skills: generate tests for the 'team invitation' flow. Admin invites, invitee accepts via email link. Cover: 2 concurrent sessions, email deliverability mock, mobile-aware. Reviewer must pass before final output."
The code-review skill runs after implementation and spawns Claude as a senior reviewer checking: logic errors, security issues, performance traps, style consistency, missing tests. It returns structured findings instead of a vague 'looks good'.
For QA who write automation, this is your shield against shipped automation rot: brittle selectors, unhandled flakiness, hardcoded waits, leaked credentials.
- →~2 hours/week saved on review cycles
- →Catches leaked secrets / hardcoded creds before commit (real incident-stopper)
- →Brittle locators flagged before they cause CI red runs
- →Test isolation issues surfaced (shared state, ordering deps)
- →Reviewer output structured: severity + line + suggested fix
- 1. Generate / change code first. Claude writes the feature, test, or refactor. Don't pre-clean; let the reviewer find real problems.
- 2. Invoke review explicitly. Name the focus areas: 'security', 'selectors', 'test isolation'. Without focus it's generic.
- 3. Read the structured findings. Severity (critical / major / minor), file:line, what's wrong, suggested fix. Triage from top.
- 4. Apply or defer per item. Don't blindly accept all suggestions; some are style nits. Ask Claude to apply the criticals.
- 5. Re-review after fixes. Second pass should be clean. If new issues appear, that's the fix introducing problems.
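To show the shape of a structured finding (severity, file:line, problem, suggested fix), here is a small sketch of two checks a reviewer might run on a spec file. The rules, the `Finding` type, and the sample spec are assumptions for illustration; the skill's real review is done by the model, not by regexes.

```typescript
type Severity = "critical" | "major" | "minor";

interface Finding {
  severity: Severity;
  file: string;
  line: number; // 1-based
  problem: string;
  suggestedFix: string;
}

interface Rule {
  pattern: RegExp;
  severity: Severity;
  problem: string;
  suggestedFix: string;
}

const RULES: readonly Rule[] = [
  {
    pattern: /password\s*[:=]\s*["'][^"']+["']/i,
    severity: "critical",
    problem: "hardcoded credential",
    suggestedFix: "read the value from an environment variable",
  },
  {
    pattern: /waitForTimeout\(\s*\d+\s*\)/,
    severity: "major",
    problem: "hardcoded wait",
    suggestedFix: "wait on a condition instead of a fixed delay",
  },
];

function review(file: string, source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({
          severity: rule.severity,
          file,
          line: index + 1,
          problem: rule.problem,
          suggestedFix: rule.suggestedFix,
        });
      }
    }
  });
  // Triage order: critical first, then major, then minor; ties by line number.
  const rank: Record<Severity, number> = { critical: 0, major: 1, minor: 2 };
  return findings.sort((a, b) => rank[a.severity] - rank[b.severity] || a.line - b.line);
}

// Hypothetical spec source, kept as text so the sketch has no dependencies.
const sampleSpec = [
  "await page.goto('/login');",
  "await page.waitForTimeout(5000);",
  "const password = 'hunter2';",
  "await page.getByLabel('Password').fill(password);",
].join("\n");

const findings = review("tests/e2e/login.spec.ts", sampleSpec);
for (const f of findings) {
  console.log(`[${f.severity}] ${f.file}:${f.line} ${f.problem} -> ${f.suggestedFix}`);
}
```

Sorting by severity is what makes "triage from top" work: the leaked credential on line 3 is listed before the hardcoded wait on line 2.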
/plugin install code-review@claude-plugins-official
# Then on any PR branch:
/code-review
"Use code-reviewer on the files I just created. Focus on selector resilience, test isolation, and credential handling."
"Use code-reviewer on the entire tests/e2e/ directory. Group findings by: secrets/credentials, flaky patterns, missing teardown, performance bottlenecks. Output as a markdown report I can paste into the PR description."
The prompt-engineer skill (installed below as prompt-architect) reviews and rewrites your prompts to be specific, structured, and tool-aware. It adds missing constraints (output format, length limits, refusal conditions), removes ambiguity, and splits prompts into XML sections when needed.
QA prompts for AI test generation routinely produce garbage because they say 'write me tests for the login page' instead of specifying: framework, locator strategy, data setup, what NOT to test.
- →Prompts produce consistent output across runs — fewer 'works once' surprises
- →Token cost down 20-40% — explicit constraints stop AI from over-generating
- →Easier to debug — when output is wrong, you know which constraint was violated
- →Library of reusable prompts — once refined, save as /command or skill
- 1. Bring your draft prompt. The one that produced inconsistent results. The messier the better: that's the learning material.
- 2. Let prompt-engineer diagnose. It surfaces: ambiguous terms, missing format spec, unbounded length, conflicting instructions.
- 3. Review the rewrite. Look at the structure: XML sections, explicit constraints, refusal conditions. Adjust tone if too formal.
- 4. A/B test. Run old vs new prompt on 3-5 sample inputs. Compare consistency, not just one output.
- 5. Save as a /command or skill. Don't lose the refinement. Save to ~/.claude/commands/ or create a SKILL.md for team sharing.
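One rough way to put a number on step 4: score each prompt by how often its runs agree with the most common output. This is a sketch under simple assumptions. Outputs are compared as whitespace-normalized, lowercased strings, and the sample outputs are made up; real outputs usually need a looser comparison.

```typescript
// Share of runs that produced the most common output (1 = fully consistent).
function consistency(outputs: readonly string[]): number {
  if (outputs.length === 0) throw new RangeError("need at least one output");
  const counts = new Map<string, number>();
  for (const raw of outputs) {
    const key = raw.trim().replace(/\s+/g, " ").toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Math.max(...Array.from(counts.values())) / outputs.length;
}

// Made-up results from running each prompt on the same input five times.
const oldPromptRuns = ["3 duplicates", "3 duplicates", "none found", "2 duplicates", "3 duplicates"];
const newPromptRuns = ["3 duplicates", "3 duplicates", "3 duplicates", "3  Duplicates", "3 duplicates"];

console.log(consistency(oldPromptRuns)); // 0.6
console.log(consistency(newPromptRuns)); // 1
```

Repeat this per sample input and compare the averages, so that one lucky run does not decide the A/B test.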
# Easiest:
npx prompt-architect install
# Or browse the marketplace and search:
/plugin
# (search "prompt-architect" or "prompt-engineer" and pick top-rated)
"Use prompt-engineer to improve this prompt I run nightly: 'review last 24h of bug reports and find duplicates'. Make it cost-efficient and idempotent."
"Use prompt-engineer to design a prompt for a GitHub Action that runs on every PR. Goal: detect flaky tests by analyzing the diff + last 100 CI runs. Constraints: must complete in <2 min, output structured comment, refuse if no test files changed."
/simplify runs after a feature or fix: it spawns 3 parallel review agents (code-reuse scanner, quality reviewer, efficiency reviewer), aggregates findings, then applies the agreed fixes. One command, multiple perspectives, real cleanup.
Run after generating a Playwright spec to: collapse duplicated locators into a shared file, extract common waits into helpers, normalize assertion patterns.
- →Duplicated locators / waits / assertions collapsed into helpers automatically
- →Bundle size shrinks 10-30% on typical test suites
- →Pattern consistency across files — every spec uses same fixture style
- →Dead code removed without manual hunting
- →All in one command, no per-file babysitting
- 1. Finish the feature first. Don't run /simplify on incomplete code; review agents need the full picture to spot reuse.
- 2. Run scoped, not global. /simplify tests/e2e/checkout/ is faster + safer than /simplify on the whole repo. Iterate per area.
- 3. Review the proposed changes. Three agents agree → high-confidence change. Two of three → review carefully. One vote → likely false positive.
- 4. Run tests after applying. /simplify changes code structure. Always verify the suite still passes before commit.
- 5. Commit each /simplify run separately. Makes review + rollback easy. Don't bundle simplify with feature work in one commit.
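The agreement rule in step 3 amounts to a small voting table. A sketch, assuming each agent reports the changes it proposes as plain string IDs. The agent labels follow the description above; the change IDs and the `triageProposals` helper are hypothetical.

```typescript
type Verdict = "high-confidence" | "review carefully" | "likely false positive";

// Labels each proposed change by how many of the three agents proposed it.
function triageProposals(proposalsByAgent: Record<string, readonly string[]>): Map<string, Verdict> {
  const votes = new Map<string, number>();
  for (const agent of Object.keys(proposalsByAgent)) {
    // De-duplicate so each agent gets one vote per change.
    for (const change of Array.from(new Set(proposalsByAgent[agent]))) {
      votes.set(change, (votes.get(change) ?? 0) + 1);
    }
  }
  const verdicts = new Map<string, Verdict>();
  votes.forEach((count, change) => {
    const verdict: Verdict =
      count >= 3 ? "high-confidence" : count === 2 ? "review carefully" : "likely false positive";
    verdicts.set(change, verdict);
  });
  return verdicts;
}

const verdicts = triageProposals({
  "code-reuse": ["extract-login-helper", "merge-cart-locators"],
  quality: ["extract-login-helper", "merge-cart-locators", "rename-spec"],
  efficiency: ["extract-login-helper"],
});

verdicts.forEach((verdict, change) => console.log(`${change}: ${verdict}`));
```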
Already available — just type /simplify
# After Claude scaffolds a new test suite
/simplify
# Or scoped:
/simplify tests/e2e/checkout/
# Sequence for safe refactor
/simplify tests/e2e/cart/
/verify tests/e2e/cart/   # confirms suite still passes
git commit -m "refactor(tests): simplify cart specs"
The context-engineering skill teaches Claude to manage 5 context layers (Identity, Capability, Knowledge, Memory, Observation) and to compress proactively when nearing token limits. It mitigates 'lost in the middle' on long QA sessions.
QA sessions are long: 50+ test files, multiple PRs, Jira tickets, Playwright traces. Without context engineering, Claude starts hallucinating selectors from earlier in the chat.
- →4-hour+ sessions stay coherent — no 'I forgot we renamed that' moments
- →30%+ token savings on long agents — compression keeps cost down
- →Accurate cross-file references — Claude actually remembers what it touched
- →Identity stays stable — assistant doesn't drift from QA persona to generic helper
- →Critical for CI bots and scheduled agents that run for hours
- 1. Define identity at session start. 'You are a senior QA reviewing 3 PRs.' Identity = persistent role across turns.
- 2. Pre-load knowledge. Drop in the artifacts: PR diffs, Jira tickets, design docs. Skill stores them as referenceable knowledge.
- 3. Pin memory (your style). Coding style, naming conventions, your team's testing patterns. Skill keeps this stable across compressions.
- 4. Set compression cadence. Every N turns (default 20), skill compresses old observations into a summary, keeps identity/knowledge/memory intact.
- 5. Watch for drift signals. If Claude starts to repeat itself or contradict earlier output, compression is too aggressive. Raise N so it compresses less often.
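A toy model of the cadence in step 4, to show which layers compression touches and which it leaves alone. The `SessionContext` shape, the `summarize` stub, and the turn counter are assumptions for illustration; the skill's actual mechanism is not described here.

```typescript
interface SessionContext {
  identity: string;       // stable role, never compressed
  capability: string[];   // available tools, never compressed
  knowledge: string[];    // pre-loaded artifacts, never compressed
  memory: string[];       // pinned style and conventions, never compressed
  observations: string[]; // per-turn notes, compressed on a cadence
  summaries: string[];    // compressed history
  turn: number;
}

// Stand-in for a model-written summary.
function summarize(observations: readonly string[]): string {
  return `summary of ${observations.length} observations`;
}

// Records one turn; every `cadence` turns, folds observations into a summary.
function recordTurn(ctx: SessionContext, observation: string, cadence = 20): SessionContext {
  const turn = ctx.turn + 1;
  const observations = [...ctx.observations, observation];
  if (turn % cadence !== 0) {
    return { ...ctx, observations, turn };
  }
  return { ...ctx, observations: [], summaries: [...ctx.summaries, summarize(observations)], turn };
}

let ctx: SessionContext = {
  identity: "senior QA reviewing 3 PRs",
  capability: ["browser automation"],
  knowledge: ["PR diffs"],
  memory: ["team naming conventions"],
  observations: [],
  summaries: [],
  turn: 0,
};

for (let i = 1; i <= 45; i++) {
  ctx = recordTurn(ctx, `turn ${i} note`);
}

console.log(ctx.summaries.length, ctx.observations.length); // 2 5
```

After 45 turns at the default cadence, two summaries exist (from turns 20 and 40) and five raw observations remain, while identity, capability, knowledge, and memory are untouched.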
/plugin marketplace add muratcankoylan/Agent-Skills-for-Context-Engineering
/plugin install context-engineering@context-engineering-marketplace
"Use context-engineering. Identity = senior QA reviewing 3 PRs. Knowledge = these 3 PR URLs. Memory = my coding style. Compress observations every 20 turns."
"Use context-engineering for a 2-hour autonomous agent. Identity = nightly flaky-test analyzer. Knowledge = last 100 CI runs + test file index. Memory = our flaky pattern taxonomy. Compress every 30 turns, preserve numeric findings across compression."
stop-slop strips AI tells from prose: filler phrases, throat-clearing openers, binary contrasts (it's not X — it's Y), passive voice. It has 7 core rules + a pre-publish checklist, and 5,000+ GitHub stars.
QA writes a lot of prose: bug reports, test status updates, sprint retros, post-mortems. AI-drafted versions read like ChatGPT vomit and undermine the message.
- →Bug reports actually get read — dev pick-up rate goes up
- →Sprint retros sound human — stakeholders trust the message
- →Post-mortems land impact — no 'sounds like ChatGPT wrote this' deflection
- →Saves 30+ minutes of manual edit per long doc
- →Removes 7 specific AI tells: filler, throat-clearing, passive, binary contrasts, hedge phrases, jargon, repetition
- 1. Draft normally. Let Claude or yourself produce a first draft without worrying about tone. Speed first, polish later.
- 2. Invoke stop-slop on the draft. Paste or reference the file. Tell it what to PRESERVE (numbers, ticket IDs, names) so it doesn't over-edit.
- 3. Review the diff. Skill shows what it cut + why. If it removed something meaningful, push back: 'keep the line about X'.
- 4. Run the 7-point checklist. Final pass: any 'it's not X — it's Y' constructions? Any 'in essence', 'fundamentally', 'comprehensive'? Strip.
- 5. Ship. Output should read like you wrote it after coffee, not after 14 plugins.
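The final checklist pass can be approximated with a few patterns. A sketch only: the phrase list is taken from step 4, the binary-contrast regex is a rough heuristic, and none of this is the skill's own rule set.

```typescript
interface Tell {
  kind: string;
  match: string;
}

const FILLER = ["in essence", "fundamentally", "comprehensive"];

// "it's not X <dash> it's Y": \u2014 is an em dash, \u2013 an en dash.
const BINARY_CONTRAST = /\b(?:it's|it is) not\b[^.?!]*?[\u2014\u2013,;-]\s*(?:it's|it is)\b/gi;

function findTells(text: string): Tell[] {
  const tells: Tell[] = [];
  for (const phrase of FILLER) {
    const pattern = new RegExp(`\\b${phrase}\\b`, "gi");
    for (const m of text.match(pattern) ?? []) {
      tells.push({ kind: "filler", match: m });
    }
  }
  for (const m of text.match(BINARY_CONTRAST) ?? []) {
    tells.push({ kind: "binary contrast", match: m });
  }
  return tells;
}

// Made-up draft sentence with three tells in it.
const draft =
  "Fundamentally, it's not a flaky test \u2014 it's a race condition. We ran a comprehensive audit.";

const tells = findTells(draft);
for (const t of tells) {
  console.log(`${t.kind}: "${t.match}"`);
}
```

A check like this only finds candidates. Deciding what to cut, and what to preserve, is still the review in step 3.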
# stop-slop is a standalone SKILL.md (no marketplace entry yet)
git clone https://github.com/hardikpandya/stop-slop.git
mkdir -p ~/.claude/skills/stop-slop
cp -r stop-slop/* ~/.claude/skills/stop-slop/
"Use stop-slop on the retro draft. Keep the structure, cut the AI tells, preserve numbers and tickets."
"Use Atlassian MCP to fetch incident INC-4421 + linked tickets. Draft a 5-paragraph post-mortem (timeline, root cause, contributing factors, what worked, action items). Then run stop-slop. Preserve all timestamps and metric numbers. Output ready to paste in Confluence."
The real productivity unlock is in the combos. Each row below is a tested chain — paste the prompt, let Claude run the sequence.
Triage → Fix → Verify
When: A flaky test or production bug lands.
1. brainstorming → 2. systematic-debugging → 3. TDD → 4. code-reviewer → 5. /simplify
New E2E suite
When: Scaffolding tests for a fresh area of the app.
1. brainstorming → 2. qa-skills (pipeline) → 3. code-reviewer → 4. /simplify → 5. playwright-skill (verify run)
Sprint retro from Jira
When: Weekly / per-sprint, semi-automated.
1. Atlassian MCP (fetch sprint) → 2. context-engineering (set role + knowledge) → 3. draft → 4. stop-slop
Community surveys put time saved from the top combos at ~5 h/week per engineer (TDD + debugging + code-review + /simplify, used consistently).¹
Skills install is free. Tokens cost ~$30-60/eng/month. Net positive in week 1 for most teams.
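A back-of-the-envelope version of that claim, using the figures above (about 5 h/week saved, $30-60 per engineer per month in tokens). The $50 hourly cost is a placeholder assumption; substitute your own.

```typescript
interface RoiInput {
  hoursSavedPerWeek: number;
  hourlyCostUsd: number; // assumption: loaded cost of one engineer hour
  tokenCostPerMonthUsd: number;
}

// Hours an engineer must save in a month for the token spend to pay for itself.
function breakEvenHoursPerMonth(input: RoiInput): number {
  return input.tokenCostPerMonthUsd / input.hourlyCostUsd;
}

// Value of the time saved in one week minus a full month of token spend.
function firstWeekNetUsd(input: RoiInput): number {
  return input.hoursSavedPerWeek * input.hourlyCostUsd - input.tokenCostPerMonthUsd;
}

// Upper end of the token range, with the placeholder hourly cost.
const upperTokenCost: RoiInput = { hoursSavedPerWeek: 5, hourlyCostUsd: 50, tokenCostPerMonthUsd: 60 };

console.log(breakEvenHoursPerMonth(upperTokenCost)); // 1.2
console.log(firstWeekNetUsd(upperTokenCost));        // 190
```

Even with the whole month's tokens charged against the first week, the result is positive at that rate. The conclusion is only as good as the survey figure for hours saved, so rerun it with your own measurement.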
Skills vs MCP servers vs slash commands — what's the difference?
Skills (SKILL.md) teach Claude how to think about a task (e.g. 'always reproduce before fixing'). MCP servers give Claude new tools to act on the world (browse, read DBs, manage PRs). Slash commands are saved prompts you trigger manually. Best results come from stacking all three: a skill for the discipline, an MCP for the I/O, a slash command for the entry point.
Do I need to install all 10? My context window is finite.
No. Claude only loads a skill's full instructions when its description matches the task, so installed skills don't all sit in context. Installing all 10 is therefore cheap, but you don't have to: the runtime is selective either way. The exceptions are /simplify (always available, free) and superpowers (often always-on for process discipline).
Will these skills work in Cursor / Codex / Windsurf too?
Yes, if the skill ships in the SKILL.md format (open standard). Most do. Anthropic-bundled slash commands (/simplify, /batch) are Claude Code only.
Can I write my own QA skill?
Yes — a skill is a directory with a YAML+markdown SKILL.md file under ~500 lines. Use the skill-creator skill to bootstrap it. Common custom QA skills: company-specific test strategy, bug-report-template (your team's exact format), release-notes-format.
¹ Install counts and time-saved figures are aggregated from public 2026 community surveys (Agensi, Composio, Developers Digest) and marketplace metadata. They fluctuate weekly; treat as relative signals, not exact promises. Verify the current install command in each linked repo before relying on it for production tooling — marketplace names and skill versions evolve.