Changelog - Bluejay Docs

April 23rd, 2026

New · Digital Humans

DTMF tool for Digital Humans

Digital humans can now be configured to send DTMF tones during a call via a new allow_dtmf_tool field, aligned with how allow_end_call_tool and allow_silence_tool are stored and returned. Create and bulk-create default to allow_dtmf_tool: true when omitted; updates treat the field as optional (omit to leave unchanged). OpenAPI schemas DigitalHumanRequestData, DigitalHumanResponseData, and UpdateDigitalHumanRequest include the new property.

Create digital human · Update digital human
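As a rough illustration of the default and omit-to-keep semantics described above, the Python sketch below builds request bodies on the client side. Only the allow_dtmf_tool field name comes from this entry; the helper names and the "name" field are hypothetical, and no endpoint or client library is assumed.

```python
from typing import Any, Dict, Optional


def build_create_body(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Return a create body that mirrors the documented default (allow_dtmf_tool: true)."""
    body = dict(fields)
    # Spelling out the default is optional: the API applies it when the field is omitted.
    body.setdefault("allow_dtmf_tool", True)
    return body


def build_update_body(allow_dtmf_tool: Optional[bool] = None, **other: Any) -> Dict[str, Any]:
    """Return an update body that leaves out allow_dtmf_tool unless the caller set it."""
    body = dict(other)
    if allow_dtmf_tool is not None:
        # Only an explicitly chosen value is sent; omission means "leave unchanged".
        body["allow_dtmf_tool"] = allow_dtmf_tool
    return body
```

For example, build_update_body(name="Ava") produces a body without the flag, so an update that only renames a digital human would not touch its DTMF setting.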

April 23rd, 2026

New · Simulations

Runs per Digital Human

Simulations now support executing multiple runs for each selected digital human in a single batch. Set runs_per_digital_human on the simulation (persisted in experiments.settings) to control the default, or pass an explicit per-run override when queuing a run. Values must be positive integers; when unset, each digital human runs once. The Bluejay dashboard exposes the new control on the simulation settings page and the “Create simulation” and “Create new run” dialogs.

Simulations overview
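The resolution rules above can be sketched in Python as follows. This is a minimal model, not Bluejay's implementation: the function names are hypothetical, and it assumes the per-run override takes precedence over the simulation-level default.

```python
from typing import List, Optional, Tuple


def resolve_runs(simulation_default: Optional[int], run_override: Optional[int] = None) -> int:
    """Pick the effective runs_per_digital_human value for one queued run."""
    # Assumption: an explicit override wins over the simulation setting.
    for value in (run_override, simulation_default):
        if value is not None:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value < 1:
                raise ValueError("runs_per_digital_human must be a positive integer")
            return value
    # When nothing is set, each digital human runs once.
    return 1


def expand_batch(digital_human_ids: List[str], runs: int) -> List[Tuple[str, int]]:
    """List every (digital human, run number) pair the batch would execute."""
    return [(dh_id, n) for dh_id in digital_human_ids for n in range(1, runs + 1)]
```

With two digital humans and an effective value of 3, expand_batch yields six entries, which is the batch size to expect.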

April 23rd, 2026

Improvement · Digital Humans

Digital Human intelligence upgrade

We upgraded the reasoning quality of digital humans during both simulations and observability replays. Expect more coherent multi-turn behavior, better adherence to persona and objectives, and steadier tool-use decisions across longer conversations. No configuration changes are required; the upgrade applies automatically to all digital humans.

Digital humans overview

April 23rd, 2026

Improvement · Workflows

Scripted silence in workflows

When a digital human is running in workflow mode, the silence tool now evaluates only at end-of-turn instead of firing from timeout-based checks mid-turn. This removes a class of false silence triggers on scripted workflow steps and keeps silence decisions aligned with the workflow’s turn boundaries. Non-workflow digital humans are unchanged.

Workflows overview

April 23rd, 2026

Performance · Workflows

Workflow latency fix

Fixed a latency regression in workflow-mode conversations where session-handler state could delay turn transitions. Workflow runs now advance between agent and user turns without the extra wait, improving perceived responsiveness on branching paths.

Workflows overview

April 14th, 2026

Improvement · API

Digital Human: silence tool fields

Digital human create, read, update, delete, and bulk APIs now expose allow_silence_tool and silence_tool_instructions, aligned with how allow_end_call_tool and hangup_instructions are stored and returned. Create and bulk-create default to allow_silence_tool: false and silence_tool_instructions: "default" when omitted; updates treat fields as optional (omit to leave unchanged). OpenAPI schemas DigitalHumanRequestData, DigitalHumanResponseData, and UpdateDigitalHumanRequest include the new properties.

Create digital human · Update digital human

April 3rd, 2026

Improvement · API

Redesigned Workflows

Workflows are a structured way to test your agent along a conversation path.

Agent and user turns — you define what the agent should say or do (so simulations can check it) and what the digital human says on each step.
Branching — options nodes capture different things the caller might do next (speech, DTMF, silence).
Coverage — each distinct path through the graph becomes its own digital human, so every branch gets exercised.
Docs — cookbook and API reference refreshed; older workflow endpoints are under Deprecated.

Create workflow · Cookbook

March 22nd, 2026

Improvement · Docs

Revamped documentation

We completely overhauled the Bluejay docs with improved navigation, expanded guides, and new content across the board. Highlights include:

Restructured navigation with dedicated tabs for Documentation, API Reference, Bluejay University, and Changelog
Bluejay University — a new learning track with guided lessons on simulations, observability, metrics, and the API
Expanded integration guides covering all supported providers and simulation transports
Cookbook recipes for common workflows like GitHub Actions CI, API-driven evaluations, and webhook setup

Explore the docs

March 20th, 2026

New · Simulations

SMS simulation support

Bluejay now supports SMS-based simulations, enabling you to test text-based agent flows end-to-end. Configure SMS simulations the same way you configure voice simulations: define digital humans, set up custom metrics, and run batch evaluations against your SMS agent.

SMS simulations support all existing integrations including telephony providers and HTTP webhooks.

Read the docs

March 15th, 2026

New · Alerts

Threshold alarms

Set threshold-based alarms on any custom metric to get alerted when agent performance degrades. Define upper or lower bounds, choose your notification channel (Slack, email, or webhook), and Bluejay will trigger alerts automatically when production metrics cross your thresholds.

Alarms work across both observability and simulation metrics, so you can catch regressions in production and in testing.

Read the docs
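To make the upper and lower bound behavior concrete, here is a small Python sketch of a threshold check. The ThresholdAlarm class and its field names are hypothetical, and it assumes a value exactly on a bound does not count as crossing it; Bluejay's actual alarm configuration may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThresholdAlarm:
    """Hypothetical alarm definition: a metric name plus optional bounds."""
    metric: str
    lower: Optional[float] = None
    upper: Optional[float] = None

    def is_breached(self, value: float) -> bool:
        """Return True when the value falls below the lower or above the upper bound."""
        if self.lower is not None and value < self.lower:
            return True
        if self.upper is not None and value > self.upper:
            return True
        return False


# A lower bound catches degradation; an upper bound catches metrics such as latency.
resolution_alarm = ThresholdAlarm(metric="resolution_rate", lower=0.8)
latency_alarm = ThresholdAlarm(metric="latency_seconds", upper=2.0)
```

In this model a resolution rate of 0.75 would trigger resolution_alarm, while a latency of 1.5 seconds would leave latency_alarm silent.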

March 14th, 2026

Performance · Observability

Faster call log evaluation

We made significant performance improvements to the observability evaluation pipeline:

3x faster evaluation for call logs with custom metrics
Parallel metric execution — multiple custom metrics now evaluate concurrently instead of sequentially
Reduced API latency for the /evaluate and /re-evaluate endpoints by approximately 40%

These improvements apply automatically to all existing observability configurations.

Read the docs
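The difference between sequential and concurrent metric evaluation can be illustrated with Python's standard library. This is a generic sketch of the technique, not Bluejay's pipeline: the function name and the toy metrics are hypothetical, and real LLM-based metrics would be network-bound calls rather than local functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def evaluate_concurrently(
    transcript: str,
    metrics: Dict[str, Callable[[str], float]],
    max_workers: int = 4,
) -> Dict[str, float]:
    """Run every metric function against the transcript in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all metrics up front so they can overlap instead of running in turn.
        futures = {name: pool.submit(fn, transcript) for name, fn in metrics.items()}
        # result() blocks until that metric finishes and re-raises any exception it hit.
        return {name: future.result() for name, future in futures.items()}
```

When each metric spends most of its time waiting on I/O, total wall-clock time approaches that of the slowest metric rather than the sum of all of them.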

March 10th, 2026

New · Integration

Miro integration for simulation visualization

You can now connect Bluejay to Miro to automatically generate visual conversation flow diagrams from your simulation results. Each simulation run produces a Miro board showing the conversation tree, branching paths, and metric outcomes.

Connect your Miro workspace from the Integrations page in your Bluejay dashboard.

Read the docs

March 3rd, 2026

Improvement · Workflows

Workflow scheduling improvements

Workflows now support cron-based scheduling with finer granularity. You can schedule simulation runs and observability evaluations to execute at specific intervals: hourly, daily, or on a custom cron expression.

Additional improvements include:

Retry logic for failed workflow steps
Execution history with detailed step-level logs
Webhook notifications on workflow completion or failure

Read the docs

February 24th, 2026

New · Simulations

Community-based simulation runs

You can now run simulations against an entire community of digital humans in a single batch. Previously, simulations ran against individual digital humans or manually selected groups. Community-based runs let you test your agent against a diverse, pre-configured population in one click.

Combine communities with custom metrics to get aggregate performance scores across demographic segments, persona types, or behavioral profiles.

Read the docs

February 17th, 2026

Improvement · Metrics

Custom metric formulas

Custom metrics now support formula-based definitions in addition to LLM-as-a-Judge evaluations. Define metrics using arithmetic expressions over existing metric scores, enabling composite metrics like weighted averages or pass/fail thresholds without writing evaluation prompts.

Formula metrics evaluate instantly and do not consume LLM credits.

Read the docs
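The kind of arithmetic a formula metric expresses can be sketched in Python. The metric names, weights, and helper functions below are hypothetical, and this does not reproduce Bluejay's formula syntax; it only shows a weighted average followed by a pass/fail threshold.

```python
from typing import Dict


def weighted_average(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine existing metric scores into one composite score."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Each score contributes in proportion to its weight.
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


def passes(score: float, threshold: float) -> bool:
    """Turn a composite score into a pass/fail result."""
    return score >= threshold


# Accuracy counts twice as much as empathy in this illustrative composite.
composite = weighted_average(
    {"accuracy": 0.9, "empathy": 0.6},
    {"accuracy": 2.0, "empathy": 1.0},
)
```

Here the composite works out to (0.9 × 2 + 0.6 × 1) / 3 = 0.8, which would pass a 0.75 threshold and fail a 0.85 one.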

February 10th, 2026

New · Integration

Pipecat integration

Bluejay now integrates natively with Pipecat for running simulations against Pipecat-powered voice agents. Connect your Pipecat pipeline endpoint and Bluejay will handle session orchestration, audio transport, and evaluation.

Read the docs

February 3rd, 2026

New · API

Knowledge base versioning API

The new Knowledge Base API lets you manage versioned knowledge base snapshots for your agents. Create versions, apply labels, and roll back to previous versions, all through the API. Knowledge base versions integrate with simulations so you can A/B test agent behavior across different knowledge configurations.

Read the docs

January 27th, 2026

Improvement · Dashboard

Dashboard redesign

We redesigned the Bluejay dashboard with a focus on surfacing actionable insights. The new layout includes:

At-a-glance health scores for each agent across simulation and production metrics
Trend sparklines showing metric performance over time
Alert badges highlighting agents that need attention
Quick-launch actions for running simulations and viewing recent call logs

The redesign is live for all users.

Read the docs

January 20th, 2026

New · Integration

ElevenLabs observability integration

Bluejay now supports direct observability integration with ElevenLabs Conversational AI. Connect your ElevenLabs account to automatically ingest call logs, evaluate them with custom metrics, and surface quality issues in your dashboard.

Read the docs

January 13th, 2026

Improvement · Simulations

Digital human generation improvements

The digital human generation engine has been upgraded with better persona diversity and more realistic conversational styles:

Expanded trait library with 40+ new customer traits including emotional tone, technical proficiency, and communication preferences
Scenario-aware generation — digital humans now adapt their behavior based on the simulation scenario context
Bulk generation — generate up to 100 digital humans in a single API call

Read the docs

January 6th, 2026

New · Integration

Slack alerting integration

Connect Bluejay to Slack to receive real-time alerts when production metrics drop below thresholds or simulation runs complete. Configure per-channel routing so the right team gets the right alerts.

Read the docs

December 16th, 2025

New · Simulations

WebSocket simulation support

Bluejay now supports WebSocket-based simulations for testing real-time, bidirectional agent communication. Configure your WebSocket endpoint, define the message protocol, and run simulations with full transcript capture and metric evaluation.

Read the docs

December 9th, 2025

Improvement · API

Prompt versioning and labels

The Prompt API now supports versioning and labeling. Create multiple versions of a prompt, tag them with labels like production or staging, and reference them by label in your agent configuration. Roll back to any previous version instantly.

Read the docs
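One way to picture how versions and labels relate is the in-memory Python model below. It is only a conceptual sketch: the PromptRegistry class and its methods are hypothetical and are not the Prompt API. It assumes versions are immutable and numbered from 1, and that a rollback amounts to pointing a label at an earlier version.

```python
from typing import Dict, List


class PromptRegistry:
    """Hypothetical model: append-only versions plus movable labels."""

    def __init__(self) -> None:
        self._versions: List[str] = []
        self._labels: Dict[str, int] = {}

    def add_version(self, text: str) -> int:
        """Store a new version and return its 1-based version number."""
        self._versions.append(text)
        return len(self._versions)

    def set_label(self, label: str, version: int) -> None:
        """Point a label such as "production" at an existing version."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"unknown version {version}")
        self._labels[label] = version

    def resolve(self, label: str) -> str:
        """Return the prompt text the label currently points to."""
        return self._versions[self._labels[label] - 1]


registry = PromptRegistry()
v1 = registry.add_version("You are a helpful support agent.")
v2 = registry.add_version("You are a concise, helpful support agent.")
registry.set_label("production", v2)
# Rolling back is just moving the label; version 2 itself is kept.
registry.set_label("production", v1)
```

Because agents reference the label rather than a version number in this model, moving the label changes what they load without editing their configuration.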

December 2nd, 2025

New · Observability

Webhook-based log ingestion

You can now send call logs to Bluejay via webhook for evaluation. Configure a webhook endpoint in your dashboard, point your agent platform at it, and Bluejay will automatically ingest, evaluate, and store the results.

This is the fastest way to get observability running if your platform isn’t covered by a native integration.

Read the docs

November 25th, 2025

New · Integration

LiveKit simulation integration

Run simulations against LiveKit-powered voice agents. Bluejay connects to your LiveKit room, manages participant sessions, and captures full audio transcripts for evaluation.

Read the docs

November 18th, 2025

New · Metrics

Metrics Lab

Introducing Metrics Lab, an interactive environment for prototyping and testing custom metrics before deploying them. Write evaluation prompts, test them against sample transcripts, and iterate on scoring criteria without affecting production data.

Read the docs

November 11th, 2025

Improvement · Dashboard

Folder-based agent organization

Agents can now be organized into folders for better workspace management. Create folders, move agents between them, and filter your agent list by folder. Folders are available in both the dashboard and the API.

Read the docs

November 4th, 2025

New · Integration

Retell observability integration

Bluejay now integrates directly with Retell for production call monitoring. Connect your Retell account to automatically pull call logs, run evaluations, and track agent performance over time.

Read the docs

October 21st, 2025

New · Integration

Vapi observability integration

Bluejay now integrates with Vapi for production call monitoring. Connect your Vapi account to automatically ingest call logs, run custom metric evaluations, and track agent quality over time.

Read the docs

October 7th, 2025

New · Integration

Bland observability integration

You can now connect Bluejay to Bland for production observability. Call logs from your Bland-powered agents are automatically ingested, evaluated against your custom metrics, and surfaced in the dashboard.

Read the docs

September 29th, 2025

New · API

Communities and workflows API endpoints

New API endpoint groups for managing communities and workflows:

Communities — create, update, add members, list, and delete communities programmatically
Workflows — define, schedule, and manage automation workflows through the API

Read the docs

September 22nd, 2025

New · Integration

SIP simulation integration

Bluejay now supports SIP-based simulations. Connect your SIP trunk and run simulations directly over the SIP protocol, enabling testing for enterprise telephony deployments and contact center agents.

Read the docs

September 15th, 2025

New · Integration

Telephony simulation support

Run simulations over PSTN by connecting your telephony provider to Bluejay. Dial into your agent’s phone number, capture the full conversation, and evaluate it with custom metrics, all without changing your agent’s infrastructure.

Read the docs

September 1st, 2025

New · API

Observability and evaluation API endpoints

New API endpoint groups for observability and evaluation workflows:

Observability — evaluate and re-evaluate call logs, manage call log lifecycle
Custom Metrics — create, bulk-create, update, list, and delete custom metrics via API

Read the docs

August 18th, 2025

New · API

Digital humans and simulation runs API endpoints

New API endpoint groups for simulation orchestration:

Digital Humans — create, generate, update, list, and manage digital human personas
Simulation Runs — queue voice and SMS runs, retrieve results, and manage active conversations

Read the docs

August 4th, 2025

New · API

Agents and simulations API endpoints

The first set of public API endpoints is now available:

Agents — create, update, list, move, and delete agents
Simulations — create, configure, list, and manage simulations programmatically

These endpoints form the foundation of the Bluejay API and enable full automation of your testing pipeline.

Read the docs

July 21st, 2025

New · Simulations

Digital human personas

Introducing digital humans: synthetic customer personas that power Bluejay simulations. Define demographic profiles, personality traits, communication styles, and scenario-specific behaviors to create realistic test conversations at scale.

Read the docs

July 7th, 2025

New · Simulations

Simulation engine

The Bluejay simulation engine is live. Run synthetic conversations against your voice agents to validate behavior before production. Define scenarios, assign digital humans, and evaluate performance with custom metrics.

Read the docs

June 23rd, 2025

New · Observability

Observability pipeline

Bluejay’s observability pipeline is now available. Ingest production call logs, evaluate them against custom metrics, and surface quality trends in the dashboard. Supports both API-based and webhook-based log ingestion.

Read the docs

June 9th, 2025

New · Metrics

Custom metrics engine

Define custom evaluation criteria tailored to your use case. Bluejay’s custom metrics engine supports LLM-as-a-Judge evaluations with configurable scoring rubrics, pass/fail thresholds, and dynamic variables that adapt to conversation context.

Read the docs

May 19th, 2025

New · Dashboard

Agent management and dashboard

The Bluejay dashboard is live. Create and manage your conversational AI agents, view performance summaries, and navigate your workspace from a central hub.

Read the docs

April 28th, 2025

New · Platform

Evaluation framework

Bluejay’s evaluation framework is ready. Score agent conversations using structured rubrics, capture per-turn and per-call metrics, and generate evaluation reports. This framework underpins both simulation testing and production observability.

April 7th, 2025

New · Platform

Core platform infrastructure

The foundational Bluejay platform is up and running — authentication, workspace management, and the base API layer. This milestone sets the stage for all product features to follow.

