What You’ll Learn
- How to send production conversations to Bluejay for evaluation
- What gets captured and scored automatically
- How to use results for dashboards, alerts, and continuous improvement
How Observability Works
Use Observability to evaluate production calls, inspect transcripts and traces, review metrics, and identify where your agent is drifting from the experience you intended to ship. You can send conversations to Bluejay via the evaluate API, webhooks, or native integrations with providers like Retell, Vapi, Bland, and ElevenLabs. Bluejay accepts audio recordings, transcripts, or both. Every conversation is evaluated against your Custom Metrics and the built-in hallucination and redundancy detectors.What Gets Captured
- Hallucination detection — identifies when agents provide incorrect or fabricated information
- Redundancy analysis — measures unnecessary repetition in agent responses
- Custom Metric scores — every metric you’ve defined is evaluated and scored
- Token and latency data — operational signals for understanding performance
- Full transcripts — stored and searchable for detailed investigation
Ingestion Methods
| Method | Best For |
|---|---|
| Evaluate API | Direct integration from your backend |
| Webhook ingestion | Platforms that support outbound webhooks |
| Retell integration | Retell-powered agents |
| Vapi integration | Vapi-powered agents |
| Bland integration | Bland-powered agents |
| ElevenLabs integration | ElevenLabs Conversational AI |
Next Steps
API Integration Tutorial
Step-by-step guide to connecting your pipeline.
Tool Calls
Include tool call data and metadata in evaluations.
Traces
Send OpenTelemetry traces to Bluejay.
Webhooks
Receive real-time notifications for evaluation events.