49 terms

A

3 terms
Agent: The voice or chat AI system you are testing or monitoring in Bluejay. An agent holds configuration such as goals, a knowledge base, system prompt, and connection details that anchor simulations, observability, and metrics. See more: Agents Overview | Agent Configuration | API Reference

Agent lifecycle: The iterative loop — Build, Test, Monitor, Improve — that Bluejay enables for conversational AI teams. Each stage feeds back into the next to drive continuous agent quality. See more: Introduction

Alert: A notification triggered when a monitored custom metric crosses a configured threshold. Alerts can be delivered via Slack or email and appear as badges on dashboards. See more: Alerts Overview | Slack Integration

B

1 term
Binary frame: A WebSocket frame carrying raw bytes rather than UTF-8 text. In the CHIRP protocol, binary frames transport PCM audio samples with no envelope or headers. See more: WebSocket Integration

C

7 terms
Call log: A stored record of a production conversation along with its evaluation scores, latency metrics, and metadata. Call logs can be re-evaluated when custom metrics change. See more: Observability | Retrieve Call Log API

CHIRP: Conversational Handoff for Inter-agent Realtime Protocol — the message envelope used for WebSocket-based simulations. CHIRP defines message types including text, audio, status, and control. See more: WebSocket Integration

Community: A reusable group of Digital Humans organized around a shared testing purpose. Communities enable consistent benchmarking across agents and simulation runs, and a Digital Human can belong to multiple communities. See more: Communities Overview | Communities Deep Dive | API Reference

Connection type: The transport channel used to connect an agent to Bluejay for simulations. Supported types include telephony, SIP, LiveKit, WebSocket, and HTTP. See more: Agents Overview | Simulation Types

Custom metric: A user-defined evaluation signal used to measure what matters for a specific use case. Custom metrics can be LLM-judged (scored via a natural-language prompt) or formula-based (computed from other metrics), and they work identically across simulations and production observability. See more: Custom Metrics Overview | API Reference

Customer persona: A reusable profile that combines customer traits into a template for Digital Human generation. Personas can be created via the API and attached to agents. See more: Customer Traits | API Reference

Customer traits: Structured behavioral and contextual attributes — such as tone, language, urgency, and simulated PII — that shape how a Digital Human behaves during a simulation. See more: Customer Traits Overview | Digital Humans

D

4 terms
Dashboard: A consolidated view of agent health scores, trend sparklines, alert badges, and quality signals across simulation runs and live production traffic. See more: Dashboards Overview

Diarization: The process of segmenting an audio recording by speaker, attributing each utterance to the correct participant (e.g. AGENT or CUSTOMER). Accurate diarization is a prerequisite for generating well-structured transcripts and reliable evaluation scores.

DTMF: Dual-Tone Multi-Frequency signaling — the tones produced when a caller presses keys on a phone keypad (often called touch-tone). Used in IVR menus and voice agents to capture numeric input and symbols such as * and #. In Bluejay workflow graphs, a turn can model DTMF input so simulations cover keypad paths. See more: Workflows cookbook

Digital Human: A synthetic customer used to simulate realistic conversations during evaluation. Each Digital Human has a persona, scenario script, customer traits, and success criteria. They can be created manually or auto-generated. See more: Digital Humans Overview | Digital Humans Deep Dive | API Reference
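The DTMF entry above describes each key as a pair of tones. As a quick illustration (the frequency layout is the standard ITU-T Q.23 keypad, not anything Bluejay-specific):

```python
# Standard DTMF keypad: each key combines one low-group (row) and one
# high-group (column) frequency, per the ITU-T Q.23 layout. Shown only
# to illustrate the glossary entry.
ROWS = [697, 770, 852, 941]        # low-group frequencies, Hz
COLS = [1209, 1336, 1477, 1633]    # high-group frequencies, Hz
KEYPAD = ["123A", "456B", "789C", "*0#D"]

def dtmf_freqs(key: str) -> tuple[int, int]:
    """Return the (low, high) tone pair for a keypad key."""
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c != -1:
            return ROWS[r], COLS[c]
    raise ValueError(f"not a DTMF key: {key!r}")

print(dtmf_freqs("5"))   # (770, 1336)
print(dtmf_freqs("#"))   # (941, 1477)
```

A voice agent decodes these tone pairs back into digits, which is why a simulated turn can exercise keypad paths without a physical phone.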

E

1 term
Evaluate: The API endpoint and process for submitting a conversation transcript or recording for scoring against custom metrics. Evaluation results feed into call logs, dashboards, and webhook notifications. See more: Evaluate API | Observability Cookbook | API Integration Tutorial

F

2 terms
Folder: An organizational container for grouping agents. Folders also scope webhook subscriptions so notifications can be targeted to specific agent groups. See more: Webhooks | API Reference

Formula-based metric: A custom metric defined as an arithmetic combination of other metric scores, enabling composite quality signals without additional LLM evaluation. See more: Custom Metrics | Metrics Lab
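A formula-based metric is just arithmetic over scores that already exist. A minimal sketch, with hypothetical metric names and weights (the real formula syntax lives in the Custom Metrics docs):

```python
# Hypothetical composite metric computed from other metric scores,
# mirroring what a formula-based metric does without an extra LLM call.
# Metric names and weights here are illustrative, not Bluejay syntax.
scores = {"helpfulness": 0.9, "accuracy": 0.8, "latency_penalty": 0.1}

def composite_quality(s: dict) -> float:
    # Equal-weight average of two LLM-judged scores, minus a penalty.
    return round(0.5 * s["helpfulness"] + 0.5 * s["accuracy"] - s["latency_penalty"], 4)

print(composite_quality(scores))   # 0.75
```

Because the inputs are already-computed scores, a composite like this can be recomputed instantly during re-evaluation, with no new model calls.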

H

1 term
Hallucination: When an agent provides incorrect or fabricated information that contradicts its knowledge base. Bluejay detects hallucinations automatically during evaluation and includes reasoning for each finding. See more: Observability | Observability Overview

K

1 term
Knowledge base: Structured reference content — FAQs, policies, product information — attached to an agent. The knowledge base serves as the ground truth for hallucination detection and grounded evaluations. Supports versioning with labels. See more: Agent Configuration | API Reference

L

4 terms
Label: A tag (e.g. “production”, “staging”) applied to prompt or knowledge base versions for environment management and version pinning across simulation runs. See more: API Reference

Latency: The time elapsed between the end of a user’s utterance and the start of the agent’s spoken response. Latency is a key voice AI quality signal — high latency degrades perceived naturalness. Bluejay captures and surfaces latency metrics on call logs and dashboards.

LiveKit: A WebRTC-based integration for room-based voice simulations. Bluejay creates a LiveKit room and token, then a Digital Human joins the room alongside the agent for real-time voice testing. See more: LiveKit Integration | Simulation Types

LLM-judged metric: A custom metric mode where a natural-language prompt is used to score conversations via an LLM. The prompt describes what to evaluate and how to score, and the LLM returns a structured result. See more: Custom Metrics

M

2 terms
Metrics Lab: An environment where teams draft, human-annotate, test, and refine custom metrics before promoting them to production use. Metrics Lab supports side-by-side comparison of scoring approaches. See more: Metrics Lab Overview | Custom Metrics

Mono: Single-channel audio. WebSocket simulations use mono because voice is a single-source signal. All CHIRP audio frames are mono at 16 kHz. See more: WebSocket Integration
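The mono and 16 kHz figures above, combined with 16-bit PCM samples (the CHIRP audio format), pin down the stream's bandwidth:

```python
# Bandwidth of a CHIRP audio stream: mono, 16 kHz, signed 16-bit PCM.
SAMPLE_RATE_HZ = 16_000    # samples per second
BYTES_PER_SAMPLE = 2       # signed 16-bit PCM
CHANNELS = 1               # mono

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
print(bytes_per_second)               # 32000 bytes/s
print(bytes_per_second * 8 // 1000)   # 256 kbit/s
```

Stereo or a higher sample rate would scale this linearly, which is why mono at 16 kHz is a practical choice for speech.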

O

1 term
Observability: Bluejay’s live-monitoring layer for evaluating real customer conversations in production. Observability ingests transcripts via API, webhooks, or native integrations and scores them against custom metrics, detecting hallucinations, redundancy, and latency issues. See more: Observability Overview | Observability Deep Dive | Monitor

P

3 terms
PCM16: Pulse-Code Modulation, signed 16-bit little-endian. A raw audio encoding format where each sample is a 16-bit signed integer stored in little-endian byte order. This is the audio format used by the CHIRP WebSocket protocol. See more: WebSocket Integration

Prompt version: A versioned system prompt with commit messages and labels, enabling prompt experimentation and A/B testing across simulation runs. Prompt versions can be pinned to environments via labels. See more: Quickstart | API Reference

PSTN: Public Switched Telephone Network — traditional phone-number routing. In Bluejay, PSTN-based telephony simulations are contrasted with direct SIP connectivity, which enables richer tool-call tracking. See more: SIP Integration | Telephony Integration
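Packing and unpacking the signed 16-bit little-endian samples described above is a one-liner with Python's struct module, where "<h" means little-endian signed 16-bit:

```python
import struct

# Encode a few PCM16 samples into raw little-endian bytes and back.
# "<" = little-endian byte order, "h" = signed 16-bit integer.
samples = [0, 1000, -1000, 32767, -32768]   # spans the signed 16-bit range

raw = struct.pack(f"<{len(samples)}h", *samples)
print(len(raw))             # 10 bytes: 2 bytes per sample

decoded = list(struct.unpack(f"<{len(raw) // 2}h", raw))
print(decoded == samples)   # True: the round trip is lossless
```

A binary frame's payload is exactly this kind of byte string, with no envelope around it, so decoding it is just one unpack call per frame.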

R

4 terms
Red teaming: A Digital Human generation mode that creates adversarial or stress-test scenarios to probe agent edge cases, safety boundaries, and vulnerabilities. See more: Quickstart

Redundancy: Unnecessary repetition detected in agent responses during evaluation. Bluejay flags redundancy automatically and includes reasoning to help improve agent prompts. See more: Observability | Observability Overview

Re-evaluation: Running a fresh evaluation on an existing call log, typically after updating custom metrics or adding new ones. Re-evaluation rescores the original transcript without requiring a new conversation. See more: Re-evaluate API

Regression testing: Re-running simulations after agent changes (prompt updates, model swaps, configuration tweaks) to detect quality degradation before shipping to production. See more: Simulation Types | Test Overview

S

8 terms
Sample rate: The number of audio samples captured per second, measured in Hz. CHIRP uses 16,000 Hz (16 kHz), the industry standard for speech processing. Higher sample rates capture more frequency detail but increase bandwidth. See more: WebSocket Integration

Scenario script: A natural-language description defining a Digital Human’s goal, constraints, and expected behavior within a simulation. Scenario scripts guide how the synthetic customer interacts with the agent. See more: Advanced Testing Strategies | Digital Humans Overview

Schedule: A cron-style recurring trigger for automated workflow execution. Schedules enable teams to run simulations or evaluations on a regular cadence without manual intervention. See more: Workflows Overview | API Reference

Simulation: A controlled pre-production evaluation container that groups Digital Humans (and optionally Communities) to test agent behavior against realistic scenarios. Simulations are re-runnable and scored with custom metrics. See more: Simulations Overview | Simulations Deep Dive | API Reference

Simulation result: The per-Digital-Human outcome of a simulation run, containing evaluation scores, the conversation transcript, and metadata. Results can be enriched post-call with tool calls and custom metadata via the API. See more: API Reference | Tool Calls & Metadata

Simulation run: A specific execution of a simulation that produces transcripts, results, and diagnostics for each Digital Human. Each run is independently scored and can be compared against previous runs for regression detection. See more: Simulation Runs | Queue Simulation Run API

SIP: Session Initiation Protocol — a telephony protocol used to connect voice systems directly, bypassing PSTN. SIP integration enables custom headers like X-Simulation-Result-Id for tool-call enrichment. See more: SIP Integration | Tool Calls & Metadata

Success criteria: Conditions that define a successful outcome for a Digital Human scenario, used to evaluate whether the agent achieved the intended goal during a simulation. See more: Digital Humans Overview

T

5 terms
Telephony: Phone-number-based simulation connectivity supporting inbound and outbound call testing via PSTN providers. Outbound telephony has Bluejay call the agent; inbound has the agent call a Bluejay number. See more: Telephony Integration | Simulation Types

Test: The Bluejay product area focused on pre-production validation through simulations, Digital Human scenarios, custom metrics, and regression detection. See more: Test Overview | Simulations Overview

Tool calls: Logged agent API or function invocations during conversations, used to enrich evaluations with business-level context. Tool calls can be sent via the evaluate endpoint (observability) or patched onto simulation results post-call. See more: Tool Calls & Metadata (Simulations) | Tool Calls & Metadata (Observability) | Observability Cookbook

Trace: An execution record conforming to the OpenTelemetry standard that captures the internal flow of an agent during a conversation. Traces can be linked to evaluations via a trace_id for unified debugging. See more: Traces | Evaluate API

Transcript: The text record of conversation utterances with speaker roles (e.g. AGENT, CUSTOMER), used as the primary input for evaluation. Transcripts can be submitted directly or derived from a recording URL. See more: Observability | Evaluate API

W

2 terms
Webhook: A callback URL that receives Bluejay events (simulation results, evaluation completions, outbound simulation starts) as HTTP POST requests. Webhooks can be scoped to specific agents or folders and are verified via an HMAC signature header. See more: Webhooks | Events Webhook | Evaluate Webhook

Workflow: A graph of connected nodes that models evaluation pipelines, branching logic, or downstream automation. Workflows support cron scheduling, retries, and webhook triggers, and can be built visually or via the API. See more: Workflows Overview | Workflow Cookbook | API Reference
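HMAC verification of a webhook, as mentioned in the webhook entry above, typically means recomputing the MAC over the raw request body and comparing it to the signature header. A sketch under assumptions: the actual header name and signing scheme are defined in the Webhooks docs; here I assume a hex-encoded HMAC-SHA256 of the body.

```python
import hashlib
import hmac

# Sketch of HMAC webhook verification. The signing scheme assumed here
# (hex HMAC-SHA256 over the raw body) is illustrative; check the
# Webhooks docs for the exact header name and format.
def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels.
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"                     # hypothetical webhook secret
body = b'{"event": "simulation.completed"}'   # hypothetical event payload
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()

print(verify_signature(secret, body, signature))         # True
print(verify_signature(secret, b"tampered", signature))  # False
```

Always verify against the raw bytes as received; re-serializing parsed JSON can reorder keys and change whitespace, which breaks the MAC.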