Observability gives you visibility into live customer interactions so you can understand quality, reliability, and operational health in the real world. It complements testing by showing what actually happens after deployment.

What You’ll Learn

  • How to send production conversations to Bluejay for evaluation
  • What gets captured and scored automatically
  • How to use results for dashboards, alerts, and continuous improvement

How Observability Works

Use Observability to evaluate production calls, inspect transcripts and traces, review metrics, and identify where your agent is drifting from the experience you intended to ship. You can send conversations to Bluejay via the evaluate API, webhooks, or native integrations with providers like Retell, Vapi, Bland, and ElevenLabs. Bluejay accepts audio recordings, transcripts, or both. Every conversation is evaluated against your Custom Metrics and the built-in hallucination and redundancy detectors.
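
Because Bluejay accepts audio recordings, transcripts, or both, a backend that calls the evaluate API needs to check that at least one of the two is present before sending. The sketch below shows one way to assemble such a request body. It is an illustration only: the function name `build_evaluation_payload` and the field names `transcript`, `recording_url`, `role`, `text`, and `metadata` are hypothetical placeholders, not Bluejay's actual schema, and the block deliberately stops short of making an HTTP call.

```python
from typing import Dict, List, Optional


def build_evaluation_payload(
    transcript: Optional[List[Dict[str, str]]] = None,
    recording_url: Optional[str] = None,
    metadata: Optional[Dict[str, str]] = None,
) -> Dict:
    """Assemble a request body for one production conversation.

    Every field name here is an assumed placeholder; check the evaluate
    API reference for the real request schema before using this.
    """
    # The source states that audio, a transcript, or both are accepted,
    # so reject only the case where neither is supplied.
    if not transcript and not recording_url:
        raise ValueError("provide a transcript, a recording, or both")

    # Copy metadata so later changes by the caller do not alter the payload.
    payload: Dict = {"metadata": dict(metadata or {})}

    if transcript:
        # Keep only the speaker role and the trimmed text of each turn.
        payload["transcript"] = [
            {"role": turn["role"], "text": turn["text"].strip()}
            for turn in transcript
        ]
    if recording_url:
        payload["recording_url"] = recording_url
    return payload
```

Calling `build_evaluation_payload(transcript=[{"role": "agent", "text": "Hello"}])` yields a transcript-only payload, while passing neither argument raises `ValueError` before anything leaves your backend.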

What Gets Captured

  • Hallucination detection — identifies when agents provide incorrect or fabricated information
  • Redundancy analysis — measures unnecessary repetition in agent responses
  • Custom Metric scores — every metric you’ve defined is evaluated and scored
  • Token and latency data — operational signals for understanding performance
  • Full transcripts — stored and searchable for detailed investigation
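
The captured signals above can feed simple alerting rules. The following sketch assumes an evaluation result has already been fetched and parsed into a dictionary. The result shape (`hallucination_detected`, `metric_scores`) and the helper name `find_alerts` are assumptions made for illustration, not the format Bluejay returns.

```python
from typing import Dict, List


def find_alerts(result: Dict, min_scores: Dict[str, float]) -> List[str]:
    """Return alert messages for one evaluated conversation.

    `result` is a hypothetical shape, for example:
        {"hallucination_detected": True,
         "metric_scores": {"greeting": 0.4}}
    `min_scores` maps a Custom Metric name to its lowest acceptable score.
    """
    alerts: List[str] = []

    # Flag any conversation where the hallucination detector fired.
    if result.get("hallucination_detected"):
        alerts.append("hallucination detected")

    scores = result.get("metric_scores", {})
    # Iterate in sorted order so the alert list is deterministic.
    for name, minimum in sorted(min_scores.items()):
        score = scores.get(name)
        if score is None:
            alerts.append(f"metric '{name}' missing from result")
        elif score < minimum:
            alerts.append(f"metric '{name}' scored {score}, below {minimum}")
    return alerts
```

A missing metric is reported rather than silently skipped, so a renamed or deleted Custom Metric shows up as an alert instead of a gap in coverage.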

Ingestion Methods

Method                    Best For
Evaluate API              Direct integration from your backend
Webhook ingestion         Platforms that support outbound webhooks
Retell integration        Retell-powered agents
Vapi integration          Vapi-powered agents
Bland integration         Bland-powered agents
ElevenLabs integration    ElevenLabs Conversational AI

Next Steps

API Integration Tutorial

Step-by-step guide to connecting your pipeline.

Tool Calls

Include tool call data and metadata in evaluations.

Traces

Send OpenTelemetry traces to Bluejay.

Webhooks

Receive real-time notifications for evaluation events.