Quickstart

Get from account creation to running Simulations, evaluating production calls, and optimizing your Agent’s quality in under 15 minutes.

Prerequisites

A Bluejay account (book a demo if you don’t have one)
For API steps: an API key from the API Keys page

Basic

Everything you need to run your first Simulation and connect production monitoring.

Add an Agent

An Agent represents the conversational AI system you want to test and monitor. Navigate to the Agents page in the Bluejay dashboard and click Add Agent. Fill in the Agent name, system prompt, knowledge base, and goals.

Learn more in the Agents overview or jump to the Add Agent API reference.

Create a Simulation

A Simulation groups Digital Humans together so you can run them in parallel against your Agent; think of it as a test suite. Open your Agent and click Create Simulation. Give it a name and optionally configure settings like max call duration or sequential calling.

Learn more in the Simulations overview or explore Simulation Types.

Create or Generate Digital Humans

Digital Humans are the synthetic callers that interact with your Agent during Simulations. Inside your Simulation, click Generate Digital Humans. Choose from goal adherence, red teaming, load testing, or Customer Persona modes. You can also manually create individual Digital Humans for specific scenarios.

Learn more in the Digital Humans overview.

Run the Simulation

Click the Run button on your Simulation. Bluejay’s Digital Humans will call your Agent, and results are evaluated automatically. You can watch conversations happen in real time and review results as they complete.

Learn more about Simulation Runs.

Add Custom Metrics

Custom Metrics define the exact quality signals you care about: compliance checks, empathy scoring, task completion, or anything specific to your domain. Navigate to the Custom Metrics section of your Agent, click Create Metric, and define the name, description, response type, and scoring guidance.

Learn more in the Custom Metrics overview.

Hook Up Observability

Observability evaluates your production conversations against the same Custom Metrics you use in Simulations. Go to the Observability tab for your Agent. You can connect a native integration (Retell, Vapi, Bland, ElevenLabs) or configure webhook-based ingestion directly from the Dashboard.

Learn more in the Observability overview or follow the API integration tutorial.

Advanced

Level up with Alerts, Dashboards, optimization, automated testing, and Prompt Versioning.

Create Alerts

Alerts notify your team when a Custom Metric crosses a threshold so issues get caught before they compound. Navigate to the Alerts configuration for your Agent. Set the Custom Metric, threshold, and delivery channel (Slack or email).

Learn more in the Alerts overview.

Create Custom Dashboards

Dashboards aggregate Simulation and production data into a single view. Open the Dashboards section to see at-a-glance health scores, trend sparklines, and Alert badges across all your Agents. Customize views to focus on the Custom Metrics that matter most.

Learn more in the Dashboards overview.

Optimize Metrics via Metrics Lab

Metrics Lab lets you prototype, test, and refine Custom Metrics before deploying them. Open Metrics Lab from the sidebar. Import or draft Custom Metrics, annotate sample conversations, and run side-by-side comparisons to find the most reliable scoring approach. Promote validated Custom Metrics to production with one click.

Learn more in the Metrics Lab overview.

Test Agents via Workflows

Workflows let you chain Simulation Runs, evaluations, and notifications into automated pipelines. Navigate to Workflows and create a new Workflow. Use the visual builder to add steps (Simulation Runs, evaluations, and notification nodes) and connect them into a pipeline. Set a schedule to run automatically.

Learn more in the Workflows overview.

Version Prompts via Prompt Versioning

Prompt Versioning tracks changes to your Agent’s system prompt over time. Open the Prompts section of your Agent. Create a new version with an updated prompt, add a commit message, and optionally tag it with labels like production or staging. Compare performance across Simulation Runs and roll back when needed.

See the Create Prompt Version API reference.

Basic

Everything you need to run your first Simulation and connect production monitoring, this time through the API. The Python examples below share the following setup.

import requests

API_KEY = "your-api-key"
BASE_URL = "https://api.getbluejay.ai/v1"

Add an Agent

An Agent represents the conversational AI system you want to test and monitor. You’ll attach Simulations, Custom Metrics, and Observability pipelines to it.

agent = requests.post(
    f"{BASE_URL}/add-agent",
    headers={"X-API-Key": API_KEY},
    json={
        "name": "Support Agent",
        "system_prompt": "You are a helpful customer support agent for Acme Corp.",
        "knowledge_base": "Acme Corp FAQ and product documentation.",
        "goals": [
            "Resolve customer issues accurately",
            "Maintain a professional and empathetic tone",
            "Escalate when unable to resolve"
        ],
        "type": "INBOUND",
        "mode": "VOICE"
    }
).json()

agent_id = agent["agent_id"]
print(f"Agent created: {agent_id}")

Save the agent_id from the response — you’ll use it in every subsequent step.
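The examples in this guide index straight into the parsed JSON, which surfaces as a bare KeyError if the API returns an error body instead. A minimal sketch of a guard follows, assuming that an error response is a JSON object that simply lacks the expected key. The helper name require_field is hypothetical and not part of the Bluejay API, and the sample response is a stand-in rather than real output.

```python
def require_field(payload, field):
    """Return payload[field], or raise a readable error if it is missing."""
    if not isinstance(payload, dict) or field not in payload:
        raise ValueError(f"Expected '{field}' in response, got: {payload!r}")
    return payload[field]


# Stand-in response body for illustration only (not real API output).
sample_response = {"agent_id": "agent-123"}
agent_id_checked = require_field(sample_response, "agent_id")
print(f"Agent created: {agent_id_checked}")
```

The same guard can wrap any later response, for example require_field(simulation, "simulation_id").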

Learn more in the Agents overview. See the full Add Agent API reference for all available fields like connection_type, phone_number, and external_agent_id.

Create a Simulation

A Simulation groups Digital Humans together so you can run them in parallel against your Agent — think of it as a test suite.

simulation = requests.post(
    f"{BASE_URL}/create-simulation",
    headers={"X-API-Key": API_KEY},
    json={
        "agent_id": str(agent_id),
        "name": "Support Agent — Regression Suite",
        "max_call_duration": 5,
        "max_call_duration_units": "minutes"
    }
).json()

simulation_id = simulation["simulation_id"]
print(f"Simulation created: {simulation_id}")

Learn more in the Simulations overview and explore Simulation Types. See the full Create Simulation API reference for options like sequential_calling, max_concurrent, and selected_custom_metrics.

Create or Generate Digital Humans

Digital Humans are the synthetic callers that interact with your Agent during Simulations. You can generate them automatically from your Agent’s goals or create them manually for specific scenarios.

Generate automatically from goals:

digital_humans = requests.post(
    f"{BASE_URL}/generate-digital-humans",
    headers={"X-API-Key": API_KEY},
    json={
        "agent_id": str(agent_id),
        "simulation_id": str(simulation_id),
        "goal_adherence": [
            {"goal": "Resolve billing dispute", "num_calls": 3},
            {"goal": "Process a return request", "num_calls": 2}
        ],
        "red_teaming": 2
    }
).json()

print(f"Generated {len(digital_humans['digital_humans'])} Digital Humans")

Or create one manually:

dh = requests.post(
    f"{BASE_URL}/create-digital-human",
    headers={"X-API-Key": API_KEY},
    json={
        "simulation_id": str(simulation_id),
        "name": "Frustrated Customer",
        "scenario": "Customer received the wrong item and wants a full refund.",
        "success_criteria": "Agent processes the return and confirms refund timeline."
    }
).json()

Learn more in the Digital Humans overview. See Generate Digital Humans and Create Digital Human API references for all options.

Run the Simulation

Queue a Simulation Run to start the conversations. Bluejay’s Digital Humans will call your Agent, and results are evaluated automatically.

run = requests.post(
    f"{BASE_URL}/queue-simulation-run",
    headers={"X-API-Key": API_KEY},
    json={
        "simulation_id": str(simulation_id)
    }
).json()

print(f"Simulation run queued: {run['simulation_run_id']}")
print(f"Result IDs: {run['simulation_result_ids']}")

Use the Get Simulation Runs endpoint to poll for completion, or set up a webhook to get notified when runs finish.
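Polling could be sketched as a small generic loop. This guide does not show the path or response shape of the Get Simulation Runs endpoint, so the sketch takes the request as a callable and uses a stand-in in its place. The "status" field and its "running"/"completed" values are assumptions for illustration; check the API reference for the real ones.

```python
import time


def poll_until(fetch, is_done, interval_seconds=5.0, timeout_seconds=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(result) is true or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError("Simulation run did not finish before the timeout")
        sleep(interval_seconds)


# Stand-in for the real request: reports "running" twice, then "completed".
# The "status" field and its values are assumed, not taken from the API reference.
_fake_responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed"},
])

final = poll_until(
    fetch=lambda: next(_fake_responses),
    is_done=lambda r: r.get("status") == "completed",
    interval_seconds=0,  # no waiting in this demo
)
print(final)
```

In real use, fetch would wrap a requests call to the Get Simulation Runs endpoint with the same X-API-Key header used elsewhere in this guide. A webhook avoids the repeated requests entirely, so polling is mainly useful for scripts and CI jobs.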

Learn more about Simulation Runs. See the full Queue Simulation Run API reference for options like phone_number, digital_human_ids, and prompt_id.

Add Custom Metrics

Custom Metrics define the exact quality signals you care about — compliance checks, empathy scoring, task completion, or anything specific to your domain.

metric = requests.post(
    f"{BASE_URL}/create-custom-metric",
    headers={"X-API-Key": API_KEY},
    json={
        "agent_ids": [agent_id],
        "name": "Identity Verification",
        "description": "Did the agent verify the customer's identity before sharing account details?",
        "response_type": "pass_fail",
        "scoring_guidance": "Pass if the agent asks for at least two identifying pieces of information before disclosing any account-specific data. Fail otherwise."
    }
).json()

print(f"Metric created: {metric['id']}")

Learn more in the Custom Metrics overview. See the Create Custom Metric API reference for all response types (pass_fail, yes_no, qualitative, quantitative, json, enum).

Hook Up Observability

Observability evaluates your production conversations against the same Custom Metrics you use in Simulations. Send calls to Bluejay via the API, webhooks, or native integrations.

evaluation = requests.post(
    f"{BASE_URL}/evaluate",
    headers={"X-API-Key": API_KEY},
    json={
        "agent_id": str(agent_id),
        "transcript": [
            {"speaker": "CUSTOMER", "utterance": "Hi, I need help with my recent order."},
            {"speaker": "AGENT", "utterance": "Of course! Could you provide your order number?"},
            {"speaker": "CUSTOMER", "utterance": "It's 12345."},
            {"speaker": "AGENT", "utterance": "I see order 12345. It looks like it shipped yesterday. You should receive it by Friday."}
        ],
        "participants": [
            {"role": "AGENT", "spoke_first": False, "name": "Support Agent"},
            {"role": "CUSTOMER", "spoke_first": True, "name": "Customer"}
        ],
        "call_direction": "INBOUND"
    }
).json()

print(f"Evaluation queued: {evaluation['call_id']}")

You can also send audio recordings via recording_url or recording_base64 instead of transcripts. Learn more in the Observability overview or follow the API integration tutorial. See the full Evaluate API reference.
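For audio, the request body might be assembled as in the sketch below. The field names recording_url and recording_base64 come from this guide; everything else is an assumption, including the helper name build_recording_payload, the use of standard Base64 encoding, and the idea that agent_id alone is enough alongside the recording. Other fields from the transcript example, such as participants or call_direction, may still be required.

```python
import base64


def build_recording_payload(agent_id, recording_url=None, recording_bytes=None):
    """Build an /evaluate request body that carries audio instead of a transcript."""
    if (recording_url is None) == (recording_bytes is None):
        raise ValueError("Provide exactly one of recording_url or recording_bytes")
    payload = {"agent_id": str(agent_id)}
    if recording_url is not None:
        payload["recording_url"] = recording_url
    else:
        # Base64-encode the raw audio bytes into an ASCII string for the JSON body.
        payload["recording_base64"] = base64.b64encode(recording_bytes).decode("ascii")
    return payload


# Four placeholder bytes stand in for a real audio file.
payload = build_recording_payload("agent-123", recording_bytes=b"RIFF")
print(payload["recording_base64"])
```

The resulting dictionary would be passed as the json argument of the same requests.post call shown above.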

Ingestion Method	Best For
Evaluate API	Direct backend integration
Webhook ingestion	Platforms with outbound webhooks
Retell	Retell-powered agents
Vapi	Vapi-powered agents
Bland	Bland-powered agents
ElevenLabs	ElevenLabs Conversational AI

Advanced

Level up with Alerts, Dashboards, optimization, automated testing, and Prompt Versioning.

Create Alerts

Alerts notify your team when a Custom Metric crosses a threshold so issues get caught before they compound. Alerts are configured through the Bluejay Dashboard — set threshold-based triggers on any Custom Metric and route notifications to Slack channels or email.

Connect Bluejay to Slack via the Slack integration to receive alert notifications in your team channels.

Learn more in the Alerts overview.

Create Custom Dashboards

Dashboards aggregate Simulation and production data into a single view: health scores, trend sparklines, and Alert badges for every Agent. Dashboards are configured through the Bluejay UI and pull from both Simulation results and Observability evaluations, so once you’ve completed the basic steps, your Dashboards will populate automatically.

Learn more in the Dashboards overview.

Optimize Metrics via Metrics Lab

Metrics Lab lets you prototype, test, and refine Custom Metrics before deploying them. It’s an interactive UI experience: import or draft Custom Metrics, annotate sample conversations, run side-by-side comparisons, and promote validated Custom Metrics to production. Once refined, you can manage Custom Metrics programmatically via the Custom Metrics API.

Learn more in the Metrics Lab overview.

Test Agents via Workflows

Workflows let you chain Simulation Runs, evaluations, and notifications into automated pipelines. Configure pipelines in the Bluejay dashboard, then attach a cron cadence with the Create Schedule API for nightly regression suites or weekly quality checks.

For conversation-path workflows (React Flow graphs used with simulations and digital humans), see the Workflows overview, the Workflows cookbook, and Create workflow (POST /v1/workflow).
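To make the cron cadence concrete, here are two expressions in the standard five-field cron syntax (minute, hour, day of month, month, day of week). Whether the Create Schedule API accepts exactly this syntax, and which time zone it applies, is an assumption; the request body for that endpoint is not shown in this guide, so only the expressions are sketched. The times and the is_five_field_cron helper are illustrative.

```python
# Standard five-field cron: minute hour day-of-month month day-of-week.
SCHEDULES = {
    "nightly_regression": "0 2 * * *",    # every day at 02:00
    "weekly_quality_check": "0 9 * * 1",  # every Monday at 09:00
}


def is_five_field_cron(expr):
    """Cheap sanity check: a standard cron expression has five fields."""
    return len(expr.split()) == 5


for name, expr in SCHEDULES.items():
    print(name, expr, is_five_field_cron(expr))
```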

Version Prompts via Prompt Versioning

Prompt Versioning tracks changes to your Agent’s system prompt over time. Create labeled versions, compare performance across Simulation Runs, and roll back when needed.

prompt_version = requests.post(
    f"{BASE_URL}/agents/{agent_id}/prompts/versions",
    headers={"X-API-Key": API_KEY},
    json={
        "prompt_text": "You are a helpful customer support agent for Acme Corp. Always verify the customer's identity before sharing account details.",
        "commit_message": "Added identity verification requirement",
        "labels": ["staging"]
    }
).json()

print(f"Prompt version {prompt_version['version']} created (id: {prompt_version['id']})")

Pass a prompt_id when queuing a Simulation Run to test a specific prompt version against your existing Digital Humans.
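A sketch of the request body for such a run follows. It assumes the prompt_id is the id returned by the Create Prompt Version call and that it can be sent as a string like the other IDs in this guide; neither detail is confirmed here. The helper name build_run_payload and the sample IDs are hypothetical.

```python
def build_run_payload(simulation_id, prompt_id=None):
    """Build a queue-simulation-run body, optionally pinning a prompt version."""
    payload = {"simulation_id": str(simulation_id)}
    if prompt_id is not None:
        # Assumed to be the "id" field from the Create Prompt Version response.
        payload["prompt_id"] = str(prompt_id)
    return payload


print(build_run_payload("sim-1", prompt_id="prompt-9"))
```

The result would be sent as the json argument to the queue-simulation-run request shown earlier. Leaving prompt_id out produces the same body as that earlier example.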

See Create Prompt Version for the full reference.

Next Steps

Key Concepts

Deep dive into Agents, Digital Humans, Custom Metrics, and more.

Simulation Types

Explore goal adherence, red teaming, load testing, and replay strategies.

Observability Guide

Step-by-step guide to connecting your production pipeline.

API Reference

Full endpoint documentation for every Bluejay API.
