Generate a Synthetic Monitoring Scenario with AI
Reads a critical user journey and returns a Playwright-based synthetic monitoring script with business-step checkpoints, failure-screenshot capture, an alerting threshold tied to a stated SLO/SLI, and a recommended run frequency.
When to use it
- Setting up uptime monitoring for a customer-facing critical path.
- Migrating from a click-record-replay tool (e.g., Pingdom) to code-based synthetic monitoring.
- Tying synthetic checks to SLOs so alert noise has business meaning.
- Adding a smoke test that runs post-deploy in production safely.
The prompt
XML-tagged — best for Claude 4.x
<role>
You are an SRE who writes synthetic monitoring as code. You distinguish synthetic monitoring (probes that fire periodically against production) from E2E tests (checks that verify behavior pre-release). You map every alert to an SLO so noise has business meaning.
</role>
<context>
Synthetic monitoring runs as code in production. Each run is fast (< 30s), captures evidence on failure (screenshot, console, network), and pages a human only when a stated SLO is breached. Tools: Datadog Synthetic, Checkly, Grafana Synthetic Monitoring, custom Playwright + cron.
</context>
<task>
For the journey below, produce:
1. A Playwright synthetic monitoring script (TypeScript) that walks the journey
2. Business-step CHECKPOINTS: each business milestone (logged in, item added, payment shown) is an assertion, not an implementation detail
3. Failure capture: on failure, take a screenshot, save a trace, and log console output
4. SLO mapping: alert ONLY when the failure rate crosses a threshold tied to a stated SLO (e.g., 99.9% over a 30-day month means a 43.2-minute downtime budget)
5. Recommended frequency: based on the SLO budget and detection-time goal
</task>
<input>
Critical user journey description: {journey}
SLO target (e.g., 99.9% uptime, 95% successful purchases): {slo}
What's "down" mean for this journey (any step fails, only checkout fails, etc.): {down_definition}
</input>
<constraints>
- Checkpoints are BUSINESS milestones, not "step 1 / step 2".
- SLO mapping is explicit (target % → budget in minutes/month → alerting rule).
- Frequency tied to detection-time goal (e.g., a 5-min frequency with a page-after-2-consecutive-failures rule gives a worst-case detection time of about 10 min).
- Failure capture is automatic, not optional.
- No `page.waitForTimeout` — use auto-waiting.
- Production-safe: no destructive operations (no real charges, no real account deletion).
</constraints>
<output_format>
Four sections:
1. **Playwright script** — TypeScript code block
2. **SLO mapping** — table: SLO target | Budget | Alert threshold | Page-after-N-failures rule
3. **Frequency recommendation** — paragraph with reasoning
4. **Production-safety notes** — bullets on what NOT to do (real charges, etc.)
</output_format>
Before writing, identify the 3-5 business milestones in the journey that an alert SHOULD fire on if they fail.
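Outside the prompt itself, it may help to see the checkpoint idea in isolation. The following is a minimal, dependency-free sketch of how business-milestone checkpoints and automatic failure capture could fit together. It is not the output the model will produce, and the names `Checkpoint`, `JourneyResult`, and `runJourney` are hypothetical, introduced only for illustration.

```typescript
// Hypothetical types and names; a sketch of the pattern, not a library API.
type Checkpoint = {
  milestone: string;        // business milestone, e.g. "logged in"
  run: () => Promise<void>; // should throw if the milestone is not reached
};

type JourneyResult =
  | { ok: true; durationMs: number }
  | { ok: false; failedMilestone: string; error: string; durationMs: number };

// Runs checkpoints in order and stops at the first failed milestone.
async function runJourney(
  checkpoints: Checkpoint[],
  captureEvidence: (milestone: string) => Promise<void>,
): Promise<JourneyResult> {
  const startedAt = Date.now();
  for (const checkpoint of checkpoints) {
    try {
      await checkpoint.run();
    } catch (err) {
      // Evidence capture always runs on failure; a capture error is
      // swallowed so it cannot mask the original failure.
      try {
        await captureEvidence(checkpoint.milestone);
      } catch {
        /* keep the original error */
      }
      return {
        ok: false,
        failedMilestone: checkpoint.milestone,
        error: err instanceof Error ? err.message : String(err),
        durationMs: Date.now() - startedAt,
      };
    }
  }
  return { ok: true, durationMs: Date.now() - startedAt };
}
```

In a Playwright-based monitor, each `run` would typically wrap auto-waiting assertions (for example `await expect(page.getByRole(...)).toBeVisible()`), and `captureEvidence` could call `page.screenshot({ path })`. Reporting the failed milestone by its business name is what lets the alert say "payment shown failed" rather than "step 3 failed".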
Common pitfalls
- Model produces an E2E test, not a synthetic monitor — E2E asserts UI details that don't matter for SLO. Force business-milestone checkpoints.
- SLO mapping gets glossed over with 'alert on failures'. Demand the budget math.
- Frequency defaults to '1 min' without considering cost or SLO match — re-prompt for justified frequency.
- Production safety gets forgotten — without a synthetic promo code and account, the monitor will create real orders.
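When checking the model's budget math and frequency reasoning, a few lines of arithmetic are enough. The sketch below assumes a 30-day window, a page-after-N-consecutive-failures rule, and ignores run duration and alert delivery delay. All function names are hypothetical.

```typescript
const MINUTES_PER_DAY = 24 * 60;

// Downtime budget implied by an availability SLO over a window.
// 99.9% over 30 days works out to roughly 43.2 minutes.
function errorBudgetMinutes(sloPercent: number, windowDays = 30): number {
  if (sloPercent <= 0 || sloPercent >= 100) {
    throw new RangeError("sloPercent must be strictly between 0 and 100");
  }
  return windowDays * MINUTES_PER_DAY * (1 - sloPercent / 100);
}

// Worst case: the outage begins right after a passing run, so the first
// failing run is up to one interval away, and paging waits for N failures.
function worstCaseDetectionMinutes(
  intervalMinutes: number,
  consecutiveFailuresToPage: number,
): number {
  return intervalMinutes * consecutiveFailuresToPage;
}

// Number of executions per window, a rough proxy for monitoring cost.
function runsPerWindow(
  intervalMinutes: number,
  windowDays = 30,
  locations = 1,
): number {
  return ((windowDays * MINUTES_PER_DAY) / intervalMinutes) * locations;
}

// Share of the budget that a single worst-case detection delay consumes.
function detectionShareOfBudget(
  sloPercent: number,
  intervalMinutes: number,
  consecutiveFailuresToPage: number,
): number {
  return (
    worstCaseDetectionMinutes(intervalMinutes, consecutiveFailuresToPage) /
    errorBudgetMinutes(sloPercent)
  );
}
```

For a 99.9% target, a 5-minute interval with paging after 2 consecutive failures spends up to 10 of the roughly 43.2 budget minutes before anyone is paged. A 1-minute interval shortens that delay but multiplies the run count by five, which is the trade-off a justified frequency should state.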
Tips
- Tag synthetic traffic with a header so it's excluded from RUM/analytics dashboards.
- Run from multiple geos — single-geo monitors miss regional CDN / network issues.
- Pair with real-user monitoring (RUM) — synthetic catches uptime; RUM catches performance variance from real client conditions.
- Re-evaluate SLO and frequency quarterly; they drift as traffic patterns change.
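For a custom Playwright + cron setup, the header-tagging tip and automatic failure capture can both live in the Playwright config. This is a sketch under assumptions: the header name `X-Synthetic-Monitor` is hypothetical, and your analytics or RUM pipeline would need its own rule to filter on it.

```typescript
// playwright.config.ts (configuration fragment, not a complete monitor)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Keep each run within the 30-second budget mentioned in the prompt context.
  timeout: 30_000,
  use: {
    // Hypothetical header name; sent with every request the page makes,
    // so confirm that third-party origins tolerate the extra header.
    extraHTTPHeaders: { 'X-Synthetic-Monitor': 'true' },
    // Evidence is captured automatically whenever a run fails.
    screenshot: 'only-on-failure',
    trace: 'retain-on-failure',
  },
});
```

Putting capture in the config rather than in each script means a newly added journey gets screenshots and traces without anyone remembering to wire them up.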
FAQ
Use both synthetic monitoring and RUM. Synthetic monitoring provides a predictable uptime signal at known intervals; RUM provides truth from real users. Synthetic catches outages; RUM catches degradation. Most teams should have synthetic monitoring for critical paths plus RUM for everything else.
Related prompts
Design a Load Test Strategy
Returns a load test strategy covering 5 scenario types (baseline / load / stress / spike / soak) with thresholds for response time, throughput, and error rate, environment requirements, monitoring checkpoints, pass/fail criteria, and an explicit environment-parity statement.
Generate Playwright Page Object Model
Give the model a page description plus a list of UI elements and it returns a complete Page Object Model in TypeScript using Playwright's auto-waiting locators (getByRole / getByTestId), typed action and assertion methods, and a page-level fixture.
Analyze Performance Bottlenecks from Results
Reads a load test result summary (latency percentiles, throughput, error rate, system metrics) and returns a ranked list of suspected bottleneck layers — network, application, database, dependent service, or infrastructure — each with evidence cited from the metrics and a recommended next investigation step.
Create API Test Suite from OpenAPI Spec
Reads an OpenAPI 3.x specification and returns an API test suite that validates response schemas per documented status code, covers authentication, pagination, filtering, and the standard error responses (400, 401, 403, 404, 429, 500). The output is a framework-agnostic plan plus a Playwright APIRequestContext skeleton.