Simulation alerts notify your team when test results fall outside the quality boundaries you’ve set, turning simulation runs into an automated safety net that flags regressions the moment they appear.

How Alerts Work

After a simulation run completes and conversations are scored against your Custom Metrics, Bluejay counts how many results have crossed a configured threshold within a rolling time window. The alert only fires when enough violations accumulate — so a single edge-case failure doesn’t trigger noise, but a real regression does.
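The rolling-window counting described above can be sketched as follows. This is an illustrative approximation, not Bluejay's internal logic; the function name and timestamp handling are assumptions.

```python
from datetime import datetime, timedelta

def should_fire(violation_times, occurrences, window_minutes, now):
    """Return True when enough violations fall inside the rolling window.

    violation_times: timestamps of scored results that crossed the threshold.
    Illustrative sketch only -- not Bluejay's actual implementation.
    """
    window_start = now - timedelta(minutes=window_minutes)
    recent = [t for t in violation_times if t >= window_start]
    return len(recent) >= occurrences

now = datetime(2024, 1, 1, 12, 0)
# Two violations inside the last 30 minutes, one that has rolled out.
times = [now - timedelta(minutes=m) for m in (5, 10, 40)]
print(should_fire(times, occurrences=3, window_minutes=30, now=now))  # False
```

Only the two violations within the window count, so with a required occurrence count of 3 the alert stays quiet.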

Configuration

| Field | Description | Example |
| --- | --- | --- |
| Metric | The Custom Metric or built-in metric to monitor | Goal Completion Rate |
| Condition | Whether to alert when the score is above or below the boundary | Below |
| Threshold | The numeric boundary that counts as a violation | 85% |
| Occurrences | How many violations must occur before the alert fires | 3 |
| Time Window | The rolling interval over which violations are counted | 30 minutes |
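Expressed as data, the example alert from the table above might look like this. The field names and the `is_violation` helper are hypothetical, chosen to mirror the table; Bluejay's UI is the source of truth for configuration.

```python
# Hypothetical representation of the alert described in the table above.
alert = {
    "metric": "Goal Completion Rate",
    "condition": "below",   # fire when the score falls below the threshold
    "threshold": 0.85,      # 85%
    "occurrences": 3,       # violations required before the alert fires
    "time_window_minutes": 30,
}

def is_violation(score, alert):
    """Check whether a single scored result counts as a violation."""
    if alert["condition"] == "below":
        return score < alert["threshold"]
    return score > alert["threshold"]

print(is_violation(0.80, alert))  # True: 80% is below the 85% boundary
```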

Example: Catching Latency Regressions

Consider an alert configured to fire when average agent latency exceeds 3 seconds at least 5 times within a 30-minute window:
  • A run where 2 out of 10 conversations exceed 3 seconds does not trigger — only 2 of the required 5 occurrences.
  • A second run within the same window adds 4 more violations, reaching 6 total — the alert fires and your team gets notified.
  • If 45 minutes pass with only 3 violations, the earlier ones roll out of the window — no alert.
For zero-tolerance metrics like hallucination detection, set occurrences to 1 so the alert fires on the first violation in testing.
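The timeline above can be replayed with a small sketch. The eviction and reset-after-firing behavior here are illustrative assumptions, not a description of Bluejay's internals.

```python
from collections import deque

OCCURRENCES = 5   # violations required before firing
WINDOW_MIN = 30   # rolling window, in minutes

def replay(events):
    """events: (minute, violation_count) pairs in chronological order.
    Returns the minutes at which the alert fires."""
    window = deque()  # minute of each violation still inside the window
    fired = []
    for minute, violations in events:
        window.extend([minute] * violations)
        while window and minute - window[0] > WINDOW_MIN:
            window.popleft()  # older violations roll out of the window
        if len(window) >= OCCURRENCES:
            fired.append(minute)
            window.clear()  # reset after firing (one possible policy)
    return fired

# First run: 2 violations at minute 0; second run: 4 more at minute 20.
print(replay([(0, 2), (20, 4)]))   # [20] -- 2 + 4 = 6 violations, alert fires
# Only 3 violations, with the first two rolling out after 45 minutes.
print(replay([(0, 2), (45, 1)]))   # [] -- no alert
```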

Routing to Slack

Connect Bluejay to your Slack workspace through the Slack integration and route alerts to specific channels:
  • Engineering — latency regressions, broken tool call flows, model performance drops
  • QA — hallucination detections, low pass rates, redundancy spikes
  • Release management — quality gate failures that block a deployment

Common Configurations

| Scenario | Occurrences | Time Window |
| --- | --- | --- |
| Hallucination in testing | 1 | 60 min |
| Goal completion regression | 3 | 30 min |
| Latency regression | 5 | 30 min |
| Compliance failure | 1 | 60 min |
| Tool call accuracy drop | 3 | 30 min |

Common Use Cases

  • Regression detection — alert when scores drop after a prompt or model change so you can investigate before shipping
  • CI/CD quality gates — pair alerts with GitHub Actions for both human notification and automated PR blocking
  • Feature validation — set tight thresholds on feature-specific metrics when launching new agent capabilities
  • Compliance checks — use occurrences of 1 for regulatory metrics where any single failure must be caught
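For the CI/CD quality-gate case, a CI step can fail the build when a run's pass rate falls below a threshold. Everything here is a hypothetical sketch (the results shape, the 0.85 threshold), not an official Bluejay integration:

```python
THRESHOLD = 0.85  # illustrative quality gate; tune per project

def gate(results):
    """Return True if the run's pass rate meets THRESHOLD.

    A CI step can exit nonzero on False to block the merge,
    complementing the human-facing Slack alert.
    """
    passed = sum(1 for r in results if r["passed"])
    rate = passed / len(results)
    print(f"pass rate: {rate:.0%}")
    return rate >= THRESHOLD

# A run with 9 of 10 conversations passing clears an 85% gate.
print(gate([{"passed": True}] * 9 + [{"passed": False}]))  # True
```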

Best Practices

  • Set strict thresholds — simulations run controlled scenarios, so thresholds can be tighter than production
  • Use alerts alongside CI checks — alerts notify humans; CI checks block merges. Use both for defense in depth
  • Create feature-specific alerts — dedicated alerts for new capabilities prevent regressions from hiding behind overall pass rates
  • Raise the bar over time — tighten thresholds as your agent improves so alerts continue to catch meaningful regressions

Next Steps


Slack Integration

Connect Bluejay to Slack for real-time alert delivery.

Custom Metrics

Define the metrics that power your alert thresholds.

Simulation Dashboards

Visualize simulation trends alongside your alerts.

GitHub Actions

Automate simulation runs and quality gates in CI.