DTMF tool for Digital Humans
Digital humans can now be configured to send DTMF tones during a call via a new allow_dtmf_tool field, aligned with how allow_end_call_tool and allow_silence_tool are stored and returned. Create and bulk-create default to allow_dtmf_tool: true when omitted; updates treat the field as optional (omit to leave unchanged). OpenAPI schemas DigitalHumanRequestData, DigitalHumanResponseData, and UpdateDigitalHumanRequest include the new property.
Docs: Create digital human · Update digital human
Runs per Digital Human
Simulations now support executing multiple runs for each selected digital human in a single batch. Set runs_per_digital_human on the simulation (persisted in experiments.settings) to control the default, or pass an explicit per-run override when queuing a run. Values must be positive integers; when unset, each digital human runs once. The Bluejay dashboard exposes the new control on the simulation settings page and in the “Create simulation” and “Create new run” dialogs.
Docs: Simulations overview
Digital Human intelligence upgrade
We upgraded the reasoning quality of digital humans during both simulations and observability replays. Expect more coherent multi-turn behavior, better adherence to persona and objectives, and steadier tool-use decisions across longer conversations. No configuration changes are required; the upgrade applies automatically to all digital humans.
Docs: Digital humans overview
Scripted silence in workflows
When a digital human is running in workflow mode, the silence tool now evaluates only at end-of-turn instead of firing from timeout-based checks mid-turn. This removes a class of false silence triggers on scripted workflow steps and keeps silence decisions aligned with the workflow’s turn boundaries. Non-workflow digital humans are unchanged.
Docs: Workflows overview
Workflow latency fix
Fixed a latency regression in workflow-mode conversations where session-handler state could delay turn transitions. Workflow runs now advance between agent and user turns without the extra wait, improving perceived responsiveness on branching paths.
Docs: Workflows overview
Digital Human: silence tool fields
Digital human create, read, update, delete, and bulk APIs now expose allow_silence_tool and silence_tool_instructions, aligned with how allow_end_call_tool and hangup_instructions are stored and returned. Create and bulk-create default to allow_silence_tool: false and silence_tool_instructions: "default" when omitted; updates treat both fields as optional (omit to leave unchanged). OpenAPI schemas DigitalHumanRequestData, DigitalHumanResponseData, and UpdateDigitalHumanRequest include the new properties.
Docs: Create digital human · Update digital human
Redesigned Workflows
Workflows are a structured way to test your agent along a conversation path.
- Agent and user turns: you define what the agent should say or do (so simulations can check it) and what the digital human says on each step.
- Branching: options nodes capture different things the caller might do next (speech, DTMF, silence).
- Coverage: each distinct path through the graph becomes its own digital human, so every branch gets exercised.
- Docs: cookbook and API reference refreshed; older workflow endpoints are under Deprecated.
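The coverage rule in the list above (one digital human per distinct path) amounts to enumerating every path from the first node to a terminal node. The following is a minimal sketch of that idea, assuming the workflow is an acyclic graph held as a plain adjacency mapping. The node names and the `enumerate_paths` helper are hypothetical illustrations, not part of the Bluejay API.

```python
from typing import Dict, List


def enumerate_paths(graph: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Return every path from `start` to a node with no outgoing edges.

    Assumes the graph is acyclic; a cycle would make this loop forever.
    """
    paths: List[List[str]] = []
    stack: List[List[str]] = [[start]]
    while stack:
        path = stack.pop()
        children = graph.get(path[-1], [])
        if not children:
            # Terminal node: this path is one complete conversation.
            paths.append(path)
            continue
        # Push in reverse so paths are produced in declaration order.
        for child in reversed(children):
            stack.append(path + [child])
    return paths


# Hypothetical workflow: one options node with speech, DTMF, and silence branches.
workflow = {
    "greeting": ["options"],
    "options": ["caller_speaks", "caller_presses_1", "caller_silent"],
    "caller_speaks": ["confirm"],
    "caller_presses_1": ["confirm"],
    "caller_silent": [],
    "confirm": [],
}

paths = enumerate_paths(workflow, "greeting")
for path in paths:
    print(" -> ".join(path))
```

In this example the three branches of the options node yield three paths, so three digital humans would be needed to exercise every branch.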
Revamped documentation
We completely overhauled the Bluejay docs with improved navigation, expanded guides, and new content across the board. Highlights include:
- Restructured navigation with dedicated tabs for Documentation, API Reference, Bluejay University, and Changelog
- Bluejay University — a new learning track with guided lessons on simulations, observability, metrics, and the API
- Expanded integration guides covering all supported providers and simulation transports
- Cookbook recipes for common workflows like GitHub Actions CI, API-driven evaluations, and webhook setup
SMS simulation support
Bluejay now supports SMS-based simulations, enabling you to test text-based agent flows end-to-end. Configure SMS simulations the same way you configure voice simulations: define digital humans, set up custom metrics, and run batch evaluations against your SMS agent.
SMS simulations support all existing integrations, including telephony providers and HTTP webhooks.
Threshold alarms
Set threshold-based alarms on any custom metric to get alerted when agent performance degrades. Define upper or lower bounds, choose your notification channel (Slack, email, or webhook), and Bluejay will trigger alerts automatically when production metrics cross your thresholds.
Alarms work across both observability and simulation metrics, so you can catch regressions in production and in testing.
Faster call log evaluation
We made significant performance improvements to the observability evaluation pipeline:
- 3x faster evaluation for call logs with custom metrics
- Parallel metric execution — multiple custom metrics now evaluate concurrently instead of sequentially
- Reduced API latency for the /evaluate and /re-evaluate endpoints by approximately 40%
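To illustrate the difference between sequential and concurrent metric evaluation mentioned in the list above, here is a small sketch using Python's standard `concurrent.futures` module. The two metric functions and the sample transcript are invented stand-ins, and this shows only the general pattern, not how Bluejay's pipeline is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Metric = Callable[[str], float]


def evaluate_concurrently(transcript: str, metrics: Dict[str, Metric]) -> Dict[str, float]:
    """Run every metric on the same transcript at once and collect the scores."""
    with ThreadPoolExecutor(max_workers=max(1, len(metrics))) as pool:
        # Submit all metrics first so they can overlap, then gather results.
        futures = {name: pool.submit(fn, transcript) for name, fn in metrics.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical metrics; real custom metrics would typically call an evaluator.
def word_count(transcript: str) -> float:
    return float(len(transcript.split()))


def question_marks(transcript: str) -> float:
    return float(transcript.count("?"))


scores = evaluate_concurrently(
    "Hello. How can I help? Anything else?",
    {"word_count": word_count, "question_marks": question_marks},
)
print(scores)
```

Threads suit this pattern when each metric spends most of its time waiting on I/O, such as a network call to an evaluator; CPU-bound metrics would need processes instead.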
Miro integration for simulation visualization
You can now connect Bluejay to Miro to automatically generate visual conversation flow diagrams from your simulation results. Each simulation run produces a Miro board showing the conversation tree, branching paths, and metric outcomes.
Connect your Miro workspace from the Integrations page in your Bluejay dashboard.
Workflow scheduling improvements
Workflows now support cron-based scheduling with finer granularity. You can schedule simulation runs and observability evaluations to execute at specific intervals: hourly, daily, or on a custom cron expression.
Additional improvements include:
- Retry logic for failed workflow steps
- Execution history with detailed step-level logs
- Webhook notifications on workflow completion or failure
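As a rough illustration of the scheduling and retry behavior listed above, the sketch below shows a few standard five-field cron expressions and a generic retry wrapper. Whether Bluejay expects exactly this cron format is an assumption, and the changelog does not state the retry policy; `run_with_retries`, its attempt count, and its delay are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Standard five-field cron: minute, hour, day of month, month, day of week.
# Assumption: the scheduler accepts this common format.
SCHEDULES = {
    "hourly": "0 * * * *",         # at minute 0 of every hour
    "daily": "0 0 * * *",          # at 00:00 every day
    "weekdays_9am": "0 9 * * 1-5", # custom: 09:00, Monday through Friday
}


def run_with_retries(step: Callable[[], T], max_attempts: int = 3,
                     delay_seconds: float = 0.0) -> T:
    """Call `step` until it succeeds or `max_attempts` is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    assert last_error is not None
    raise last_error


# Hypothetical step that fails twice before succeeding.
calls = {"count": 0}


def flaky_step() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky_step, max_attempts=3)
print(result, calls["count"])
```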
Community-based simulation runs
You can now run simulations against an entire community of digital humans in a single batch. Previously, simulations ran against individual digital humans or manually selected groups. Community-based runs let you test your agent against a diverse, pre-configured population in one click.
Combine communities with custom metrics to get aggregate performance scores across demographic segments, persona types, or behavioral profiles.
Custom metric formulas
Custom metrics now support formula-based definitions in addition to LLM-as-a-Judge metrics. Define metrics using arithmetic expressions over existing metric scores, enabling composite metrics like weighted averages or pass/fail thresholds without writing evaluation prompts.
Formula metrics evaluate instantly and do not consume LLM credits.
Pipecat integration
Bluejay now integrates natively with Pipecat for running simulations against Pipecat-powered voice agents. Connect your Pipecat pipeline endpoint and Bluejay will handle session orchestration, audio transport, and evaluation.
Knowledge base versioning API
The new Knowledge Base API lets you manage versioned knowledge base snapshots for your agents. Create versions, apply labels, and roll back to previous versions, all through the API. Knowledge base versions integrate with simulations so you can A/B test agent behavior across different knowledge configurations.
Dashboard redesign
We redesigned the Bluejay dashboard with a focus on surfacing actionable insights. The new layout includes:
- At-a-glance health scores for each agent across simulation and production metrics
- Trend sparklines showing metric performance over time
- Alert badges highlighting agents that need attention
- Quick-launch actions for running simulations and viewing recent call logs
ElevenLabs observability integration
Bluejay now supports direct observability integration with ElevenLabs Conversational AI. Connect your ElevenLabs account to automatically ingest call logs, evaluate them with custom metrics, and surface quality issues in your dashboard.
Digital human generation improvements
The digital human generation engine has been upgraded with better persona diversity and more realistic conversational styles:
- Expanded trait library with 40+ new customer traits including emotional tone, technical proficiency, and communication preferences
- Scenario-aware generation — digital humans now adapt their behavior based on the simulation scenario context
- Bulk generation — generate up to 100 digital humans in a single API call
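Because the list above caps bulk generation at 100 digital humans per API call, a larger population has to be split across several calls. A minimal sketch of that planning step follows; `plan_batches` is a hypothetical helper, and the actual request format for the generation endpoint is not shown here.

```python
from typing import List

MAX_PER_CALL = 100  # per-call limit stated in the changelog entry


def plan_batches(total: int, max_per_call: int = MAX_PER_CALL) -> List[int]:
    """Split `total` digital humans into per-call batch sizes."""
    if total < 0:
        raise ValueError("total must not be negative")
    if max_per_call < 1:
        raise ValueError("max_per_call must be at least 1")
    full_batches, remainder = divmod(total, max_per_call)
    # Full batches first, then one smaller batch for whatever is left over.
    return [max_per_call] * full_batches + ([remainder] if remainder else [])


batches = plan_batches(250)
print(batches)
```

Each entry in the returned list would correspond to one generation request of that size.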
Slack alerting integration
Connect Bluejay to Slack to receive real-time alerts when production metrics drop below thresholds or simulation runs complete. Configure per-channel routing so the right team gets the right alerts.
WebSocket simulation support
Bluejay now supports WebSocket-based simulations for testing real-time, bidirectional agent communication. Configure your WebSocket endpoint, define the message protocol, and run simulations with full transcript capture and metric evaluation.
Prompt versioning and labels
The Prompt API now supports versioning and labeling. Create multiple versions of a prompt, tag them with labels like production or staging, and reference them by label in your agent configuration. Roll back to any previous version instantly.
Webhook-based log ingestion
You can now send call logs to Bluejay via webhook for evaluation. Configure a webhook endpoint in your dashboard, point your agent platform at it, and Bluejay will automatically ingest, evaluate, and store the results.
This is the fastest way to get observability running if your platform isn’t covered by a native integration.
LiveKit simulation integration
Run simulations against LiveKit-powered voice agents. Bluejay connects to your LiveKit room, manages participant sessions, and captures full audio transcripts for evaluation.
Metrics Lab
Introducing Metrics Lab, an interactive environment for prototyping and testing custom metrics before deploying them. Write evaluation prompts, test them against sample transcripts, and iterate on scoring criteria without affecting production data.
Folder-based agent organization
Agents can now be organized into folders for better workspace management. Create folders, move agents between them, and filter your agent list by folder. Folders are available in both the dashboard and the API.
Retell observability integration
Bluejay now integrates directly with Retell for production call monitoring. Connect your Retell account to automatically pull call logs, run evaluations, and track agent performance over time.
Vapi observability integration
Bluejay now integrates with Vapi for production call monitoring. Connect your Vapi account to automatically ingest call logs, run custom metric evaluations, and track agent quality over time.
Bland observability integration
You can now connect Bluejay to Bland for production observability. Call logs from your Bland-powered agents are automatically ingested, evaluated against your custom metrics, and surfaced in the dashboard.
Communities and workflows API endpoints
New API endpoint groups for managing communities and workflows:
- Communities: create, update, add members, list, and delete communities programmatically
- Workflows: define, schedule, and manage automation workflows through the API
SIP simulation integration
Bluejay now supports SIP-based simulations. Connect your SIP trunk and run simulations directly over the SIP protocol, enabling testing for enterprise telephony deployments and contact center agents.
Telephony simulation support
Run simulations over PSTN by connecting your telephony provider to Bluejay. Dial into your agent’s phone number, capture the full conversation, and evaluate it with custom metrics, all without changing your agent’s infrastructure.
Observability and evaluation API endpoints
New API endpoint groups for observability and evaluation workflows:
- Observability: evaluate and re-evaluate call logs, manage call log lifecycle
- Custom Metrics: create, bulk-create, update, list, and delete custom metrics via API
Digital humans and simulation runs API endpoints
New API endpoint groups for simulation orchestration:
- Digital Humans: create, generate, update, list, and manage digital human personas
- Simulation Runs: queue voice and SMS runs, retrieve results, and manage active conversations
Agents and simulations API endpoints
The first set of public API endpoints is now available:
- Agents: create, update, list, move, and delete agents
- Simulations: create, configure, list, and manage simulations programmatically