Include tool call data and metadata when evaluating production calls
When you evaluate production calls through Bluejay's Observability pipeline, you must include the tool calls your agent made during the conversation. This gives your Custom Metrics access to business-level context: not just what the agent said, but what it actually did.
Tool calls and metadata are passed directly in the request body of the /v1/evaluate endpoint. There is no separate enrichment step; everything is submitted together when you send a call for evaluation.
The /v1/evaluate endpoint accepts a tool_calls array alongside your transcript and recording data. Each tool call entry describes a single invocation your agent made during the conversation.
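For example, a single tool_calls entry might look like the sketch below. The name, description, parameters, and start_offset_ms fields reflect the best practices later in this section; treat the exact values and shape as illustrative, not a definitive schema.

```json
{
  "tool_calls": [
    {
      "name": "check_order_status",
      "description": "Looks up the current status of a customer order by order ID",
      "parameters": {"order_id": "ORD-10293"},
      "start_offset_ms": 42000
    }
  ]
}
```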
The top-level metadata field is a free-form key-value object that stores additional context alongside the evaluation. It also powers Dynamic Variables in Custom Metrics: any {{placeholder}} in your metric descriptions will be substituted with the matching key from metadata.
Keys in metadata that don't match a placeholder in any metric are simply stored as call context; they won't cause errors and are always accessible in the call trace.
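As an illustration, suppose a metric description reads "Did the agent acknowledge the customer's {{customer_tier}} status?". Submitting the metadata below would substitute gold into that description at evaluation time, while the other keys match no placeholder and are simply stored as call context. The key names here are made up for this example:

```json
{
  "metadata": {
    "customer_tier": "gold",
    "resolution_status": "resolved",
    "call_duration_seconds": 312
  }
}
```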
You can also pass structured events that occurred during the call using the events array. Events are distinct from tool calls: they represent higher-level occurrences such as escalations, hold periods, or sentiment shifts.
{ "events": [ { "title": "Customer Escalation", "start_offset_ms": 15000, "end_offset_ms": 18000, "description": "Customer requested to speak with a manager", "tags": ["escalation", "manager_request"], "metadata": {"escalation_reason": "unresolved_complaint"} } ]}
- Include start_offset_ms: timing data lets Bluejay correlate tool calls with specific moments in the conversation, giving metrics richer context.
- Use descriptive names: tool call names should clearly indicate the action taken (e.g., check_order_status, not api_call_1).
- Add descriptions: the description field helps Custom Metrics understand what the tool does, improving evaluation accuracy.
- Send parameters: input parameters let you build metrics that check whether the agent used the correct inputs.
- Combine with metadata: use metadata for call-level context (duration, resolution status, customer tier) and tool_calls for action-level detail.
- Send everything in one request: unlike simulations, observability tool calls are submitted in the same /v1/evaluate call as the transcript and recording (see the combined sketch after this list).
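Putting it all together, a complete /v1/evaluate request body might look like the sketch below. The tool_calls, events, and metadata sections follow the examples above; the transcript and recording_url field names are assumptions for illustration, since the endpoint accepts transcript and recording data but the exact field names aren't shown in this section.

```json
{
  "transcript": "Agent: Thanks for calling. How can I help? Customer: I'd like an update on my order...",
  "recording_url": "https://example.com/calls/abc123.mp3",
  "tool_calls": [
    {
      "name": "check_order_status",
      "description": "Looks up the current status of a customer order by order ID",
      "parameters": {"order_id": "ORD-10293"},
      "start_offset_ms": 42000
    }
  ],
  "events": [
    {
      "title": "Customer Escalation",
      "start_offset_ms": 15000,
      "end_offset_ms": 18000,
      "description": "Customer requested to speak with a manager",
      "tags": ["escalation", "manager_request"],
      "metadata": {"escalation_reason": "unresolved_complaint"}
    }
  ],
  "metadata": {
    "customer_tier": "gold",
    "resolution_status": "resolved"
  }
}
```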