49 terms
A
3 terms
Agent
The voice or chat AI system you are testing or monitoring in Bluejay. An agent holds configuration such as goals, a knowledge base, a system prompt, and connection details that anchor simulations, observability, and metrics.
See more: Agents Overview | Agent Configuration | API Reference
Agent Development Lifecycle (ADLC)
The iterative loop (Build, Test, Monitor, Improve) that Bluejay enables for conversational AI teams. Each stage feeds into the next to drive continuous improvement in agent quality.
See more: Introduction
Alert
A notification triggered when a monitored custom metric crosses a configured threshold. Alerts can be delivered via Slack or email and appear as badges on dashboards.
See more: Alerts Overview | Slack Integration
B
1 term
Binary Frame
A WebSocket frame carrying raw bytes rather than UTF-8 text. In the CHIRP protocol, binary frames transport PCM audio samples with no envelope or headers.
See more: WebSocket Integration
C
7 terms
Call Log
A stored record of a production conversation along with its evaluation scores, latency metrics, and metadata. Call logs can be re-evaluated when custom metrics change.
See more: Observability | Retrieve Call Log API
CHIRP
Conversational Handoff for Inter-agent Realtime Protocol: the message envelope used for WebSocket-based simulations. CHIRP defines message types including text, audio, status, and control.
See more: WebSocket Integration
Community
A reusable group of Digital Humans organized around a shared testing purpose. Communities enable consistent benchmarking across agents and simulation runs, and a Digital Human can belong to multiple communities.
See more: Communities Overview | Communities Deep Dive | API Reference
Connection Type
The transport channel used to connect an agent to Bluejay for simulations. Supported types include telephony, SIP, LiveKit, WebSocket, and HTTP.
See more: Agents Overview | Simulation Types
Custom Metric
A user-defined evaluation signal used to measure what matters for a specific use case. Custom metrics can be LLM-judged (scored via a natural-language prompt) or formula-based (computed from other metrics), and they work identically across simulations and production observability.
See more: Custom Metrics Overview | API Reference
Customer Persona
A reusable profile that combines customer traits into a template for Digital Human generation. Personas can be created via the API and attached to agents.
See more: Customer Traits | API Reference
Customer Traits
Structured behavioral and contextual attributes (such as tone, language, urgency, and simulated PII) that shape how a Digital Human behaves during a simulation.
See more: Customer Traits Overview | Digital Humans
D
4 terms
Dashboard
A consolidated view of agent health scores, trend sparklines, alert badges, and quality signals across simulation runs and live production traffic.
See more: Dashboards Overview
Diarization
The process of segmenting an audio recording by speaker, attributing each utterance to the correct participant (e.g. AGENT or CUSTOMER). Accurate diarization is a prerequisite for generating well-structured transcripts and reliable evaluation scores.
DTMF
Dual-Tone Multi-Frequency signaling: the tones produced when a caller presses keys on a phone keypad (often called touch-tone). Used in IVR menus and voice agents to capture numeric input and symbols such as * and #. In Bluejay workflow graphs, a turn can model DTMF input so simulations cover keypad paths.
See more: Workflows cookbook
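To make the "dual-tone" idea concrete, here is a minimal Python sketch that maps each keypad key to its standard row and column frequencies and synthesizes a tone as the sum of two sine waves. The function names, the 8 kHz sample rate, and the 100 ms duration are illustrative assumptions; none of this is part of Bluejay's API.

```python
import math

# Standard DTMF layout: each key combines one row tone and one column tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (row_hz, column_hz) pair for a single keypad key."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for row_index, row in enumerate(KEYPAD):
        col_index = row.find(key)
        if col_index != -1:
            return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.1, sample_rate: int = 8000) -> list[float]:
    """Synthesize one key press as floats in [-1.0, 1.0] (hypothetical helper)."""
    low_hz, high_hz = dtmf_frequencies(key)
    count = round(duration_s * sample_rate)
    return [
        0.5 * (math.sin(2 * math.pi * low_hz * i / sample_rate)
               + math.sin(2 * math.pi * high_hz * i / sample_rate))
        for i in range(count)
    ]
```

For example, `dtmf_frequencies("5")` yields the 770 Hz row tone and the 1336 Hz column tone.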
Digital Human
A synthetic customer used to simulate realistic conversations during evaluation. Each Digital Human has a persona, scenario script, customer traits, and success criteria. They can be created manually or auto-generated.
See more: Digital Humans Overview | Digital Humans Deep Dive | API Reference
E
1 term
Evaluate
The API endpoint and process for submitting a conversation transcript or recording for scoring against custom metrics. Evaluation results feed into call logs, dashboards, and webhook notifications.
See more: Evaluate API | Observability Cookbook | API Integration Tutorial
F
2 terms
Folder
An organizational container for grouping agents. Folders also scope webhook subscriptions so notifications can be targeted to specific agent groups.
See more: Webhooks | API Reference
Formula Metric
A custom metric defined as an arithmetic combination of other metric scores, enabling composite quality signals without additional LLM evaluation.
See more: Custom Metrics | Metrics Lab
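As a rough illustration of an "arithmetic combination of other metric scores", the Python sketch below computes a weighted average. The metric names, weights, and the `composite_score` function are hypothetical and do not reflect Bluejay's actual formula syntax.

```python
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the metric scores named in `weights`."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


# Hypothetical per-conversation scores produced by other metrics.
scores = {"accuracy": 0.9, "empathy": 0.6, "resolution": 1.0}
# Accuracy counts twice as much as the other two signals.
weights = {"accuracy": 2.0, "empathy": 1.0, "resolution": 1.0}

overall = composite_score(scores, weights)  # approximately 0.85
```

Because the composite is plain arithmetic over existing scores, it needs no extra LLM call.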
H
1 term
Hallucination
When an agent provides incorrect or fabricated information that contradicts its knowledge base. Bluejay detects hallucinations automatically during evaluation and includes reasoning for each finding.
See more: Observability | Observability Overview
K
1 term
Knowledge Base
Structured reference content (FAQs, policies, product information) attached to an agent. The knowledge base serves as the ground truth for hallucination detection and grounded evaluations. Knowledge bases support versioning with labels.
See more: Agent Configuration | API Reference
L
4 terms
Label
A tag (e.g. “production”, “staging”) applied to prompt or knowledge base versions for environment management and version pinning across simulation runs.
See more: API Reference
Latency
The time elapsed between the end of a user’s utterance and the start of the agent’s spoken response. Latency is a key voice AI quality signal — high latency degrades perceived naturalness. Bluejay captures and surfaces latency metrics on call logs and dashboards.
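The definition above amounts to a subtraction per turn. The Python sketch below shows that arithmetic on made-up timestamps; the `Turn` fields and helper name are assumptions for illustration and are not Bluejay's call-log schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Turn:
    user_end_s: float     # when the user's utterance ended (seconds into the call)
    agent_start_s: float  # when the agent's spoken response began


def turn_latencies(turns: list[Turn]) -> list[float]:
    """Latency per turn: agent response start minus user utterance end."""
    return [turn.agent_start_s - turn.user_end_s for turn in turns]


# Made-up timestamps for a three-turn call.
turns = [Turn(2.0, 2.5), Turn(7.25, 8.5), Turn(12.0, 12.75)]
latencies = turn_latencies(turns)

average_latency = mean(latencies)  # average across turns
worst_latency = max(latencies)     # slowest single response
```

Reporting both the average and the worst turn is a design choice: a single slow response can be noticeable to a caller even when the average looks healthy.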
LiveKit
A WebRTC-based integration for room-based voice simulations. Bluejay creates a LiveKit room and token, then a Digital Human joins the room alongside the agent for real-time voice testing.
See more: LiveKit Integration | Simulation Types
LLM-as-a-Judge
A custom metric mode where a natural-language prompt is used to score conversations via an LLM. The prompt describes what to evaluate and how to score, and the LLM returns a structured result.
See more: Custom Metrics
M
2 terms
Metrics Lab
An environment where teams draft, human-annotate, test, and refine custom metrics before promoting them to production use. Metrics Lab supports side-by-side comparison of scoring approaches.
See more: Metrics Lab Overview | Custom Metrics
Mono
Single-channel audio. WebSocket simulations use mono because voice is a single-source signal. All CHIRP audio frames are mono at 16 kHz.
See more: WebSocket Integration
O
1 term
Observability
Bluejay’s live-monitoring layer for evaluating real customer conversations in production. Observability ingests transcripts via API, webhooks, or native integrations and scores them against custom metrics, detecting hallucinations, redundancy, and latency issues.
See more: Observability Overview | Observability Deep Dive | Monitor
P
3 terms
pcm_s16le
Pulse-Code Modulation, signed 16-bit little-endian. A raw audio encoding format where each sample is a 16-bit signed integer stored in little-endian byte order. This is the audio format used by the CHIRP WebSocket protocol.
See more: WebSocket Integration
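The byte layout can be sketched with Python's standard `struct` module, where the format `<h` means a little-endian signed 16-bit integer. The helper names and the float-to-integer scaling below are illustrative assumptions, not code from Bluejay.

```python
import struct


def encode_pcm_s16le(samples: list[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to raw pcm_s16le bytes."""
    # Scale to the 16-bit range and clamp so out-of-range input cannot overflow.
    ints = [max(-32768, min(32767, round(sample * 32767))) for sample in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def decode_pcm_s16le(data: bytes) -> list[int]:
    """Convert raw pcm_s16le bytes back to signed 16-bit integers."""
    count = len(data) // 2  # two bytes per sample; a trailing odd byte is ignored
    return list(struct.unpack(f"<{count}h", data[: count * 2]))


raw = encode_pcm_s16le([0.0, 1.0, -1.0])
# 32767 is 0x7FFF, stored low byte first as FF 7F.
restored = decode_pcm_s16le(raw)
```

Each sample occupies exactly two bytes, so the payload length is always twice the sample count.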
Prompt Version
A versioned system prompt with commit messages and labels, enabling prompt experimentation and A/B testing across simulation runs. Prompt versions can be pinned to environments via labels.
See more: Quickstart | API Reference
PSTN
Public Switched Telephone Network: traditional phone-number routing. In Bluejay, PSTN-based telephony simulations are contrasted with direct SIP connectivity, which enables richer tool-call tracking.
See more: SIP Integration | Telephony Integration
R
4 terms
Red Teaming
A Digital Human generation mode that creates adversarial or stress-test scenarios to probe agent edge cases, safety boundaries, and vulnerabilities.
See more: Quickstart
Redundancy
Unnecessary repetition detected in agent responses during evaluation. Bluejay flags redundancy automatically and includes reasoning to help improve agent prompts.
See more: Observability | Observability Overview
Re-evaluate
Running a fresh evaluation on an existing call log, typically after updating custom metrics or adding new ones. Re-evaluation rescores the original transcript without requiring a new conversation.
See more: Re-evaluate API
Regression Testing
Re-running simulations after agent changes (prompt updates, model swaps, configuration tweaks) to detect quality degradation before shipping to production.
See more: Simulation Types | Test Overview
S
8 terms
Sample Rate
The number of audio samples captured per second, measured in Hz. CHIRP uses 16,000 Hz (16 kHz), a rate widely used for speech processing. Higher sample rates capture more frequency detail but increase bandwidth.
See more: WebSocket Integration
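The bandwidth trade-off is simple arithmetic. The Python sketch below combines the 16 kHz rate with the 16-bit mono format described elsewhere in this glossary; the 20 ms frame duration is an assumption chosen for illustration, not a documented CHIRP value.

```python
SAMPLE_RATE_HZ = 16_000  # CHIRP sample rate
BYTES_PER_SAMPLE = 2     # pcm_s16le: 16 bits per sample
CHANNELS = 1             # mono


def bytes_per_second() -> int:
    """Raw audio bandwidth before any WebSocket framing overhead."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def frame_bytes(duration_ms: int) -> int:
    """Payload size of one audio frame of the given duration."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


rate = bytes_per_second()  # 32000 bytes per second
frame = frame_bytes(20)    # 640 bytes for an assumed 20 ms frame
```

Doubling the sample rate would double both figures, which is the bandwidth cost the definition refers to.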
Scenario Script
A natural-language description defining a Digital Human’s goal, constraints, and expected behavior within a simulation. Scenario scripts guide how the synthetic customer interacts with the agent.
See more: Advanced Testing Strategies | Digital Humans Overview
Schedule
A cron-style recurring trigger for automated workflow execution. Schedules enable teams to run simulations or evaluations on a regular cadence without manual intervention.
See more: Workflows Overview | API Reference
Simulation
A controlled pre-production evaluation container that groups Digital Humans (and optionally Communities) to test agent behavior against realistic scenarios. Simulations are re-runnable and scored with custom metrics.
See more: Simulations Overview | Simulations Deep Dive | API Reference
Simulation Result
The per-Digital-Human outcome of a simulation run, containing evaluation scores, the conversation transcript, and metadata. Results can be enriched post-call with tool calls and custom metadata via the API.
See more: API Reference | Tool Calls & Metadata
Simulation Run
A specific execution of a simulation that produces transcripts, results, and diagnostics for each Digital Human. Each run is independently scored and can be compared against previous runs for regression detection.
See more: Simulation Runs | Queue Simulation Run API
SIP
Session Initiation Protocol: a telephony protocol used to connect voice systems directly, bypassing PSTN. SIP integration enables custom headers like X-Simulation-Result-Id for tool-call enrichment.
See more: SIP Integration | Tool Calls & Metadata
Success Criteria
Conditions that define a successful outcome for a Digital Human scenario, used to evaluate whether the agent achieved the intended goal during a simulation.
See more: Digital Humans Overview
T
5 terms
Telephony
Phone-number-based simulation connectivity supporting inbound and outbound call testing via PSTN providers. In outbound telephony, Bluejay calls the agent; in inbound telephony, the agent calls a Bluejay number.
See more: Telephony Integration | Simulation Types
Test
The Bluejay product area focused on pre-production validation through simulations, Digital Human scenarios, custom metrics, and regression detection.
See more: Test Overview | Simulations Overview
Tool Calls
Logged API or function invocations that an agent makes during conversations, used to enrich evaluations with business-level context. Tool calls can be sent via the evaluate endpoint (observability) or patched onto simulation results post-call.
See more: Tool Calls & Metadata (Simulations) | Tool Calls & Metadata (Observability) | Observability Cookbook
Trace
An execution record conforming to the OpenTelemetry standard that captures the internal flow of an agent during a conversation. Traces can be linked to evaluations via a trace_id for unified debugging.
See more: Traces | Evaluate API
Transcript
The text record of conversation utterances with speaker roles (e.g. AGENT, CUSTOMER), used as the primary input for evaluation. Transcripts can be submitted directly or derived from a recording URL.
See more: Observability | Evaluate API
W
2 terms
Webhook
A callback URL that receives Bluejay events (simulation results, evaluation completions, outbound simulation starts) as HTTP POST requests. Webhooks can be scoped to specific agents or folders and are verified via an HMAC signature header.
See more: Webhooks | Events Webhook | Evaluate Webhook
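HMAC verification generally means recomputing a keyed hash over the raw request body and comparing it with the value sent in the signature header. The Python sketch below uses only the standard library and assumes HMAC-SHA256 with a hex-encoded signature; the hash algorithm, the encoding, and the header name are not stated in this glossary, so check the Webhooks documentation for the actual scheme.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received_signature)


# Hypothetical secret and payload, for illustration only.
secret = b"example-webhook-secret"
body = b'{"event": "evaluation.completed"}'
signature = sign_payload(secret, body)

accepted = is_valid_signature(secret, body, signature)
rejected = is_valid_signature(secret, body + b" ", signature)
```

Verify against the exact bytes received, before any JSON parsing or re-serialization, since even a whitespace change produces a different signature. `hmac.compare_digest` is used instead of `==` to avoid leaking timing information.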
Workflow
A graph of connected nodes that models evaluation pipelines, branching logic, or downstream automation. Workflows support cron scheduling, retries, and webhook triggers, and can be built visually or via the API.
See more: Workflows Overview | Workflow Cookbook | API Reference