What You’ll Learn
- What Custom Metrics are and why they matter
- How to define evaluation criteria for your specific use case
- How Custom Metrics integrate with simulations and observability
How Custom Metrics Work
You create Custom Metrics to score conversations on domain-specific behavior such as compliance, resolution quality, empathy, or escalation accuracy. Those metrics can then be reused across simulations and production evaluations. Custom Metrics support two evaluation modes: LLM-as-a-Judge metrics, which use a natural-language prompt to score conversations, and formula-based metrics, which compute composite scores from other metrics. You can prototype and refine metrics in Metrics Lab before deploying them.
Key Capabilities
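The two evaluation modes can be pictured as two kinds of metric definitions. A minimal sketch, using hypothetical field names (`type`, `prompt`, `formula`) for illustration rather than the product's actual API schema:

```python
# Illustrative Custom Metric definitions. Field names are assumptions
# made for this sketch, not the product's actual schema.

# LLM-as-a-Judge: a natural-language prompt scores each conversation.
identity_check = {
    "name": "identity_verification",
    "type": "llm_judge",
    "prompt": (
        "Score 1 if the agent verified the customer's identity "
        "before sharing any account details; otherwise score 0."
    ),
}

# Formula-based: an arithmetic expression over other metric scores.
call_quality = {
    "name": "call_quality",
    "type": "formula",
    "formula": "0.5 * resolution + 0.2 * tone + 0.3 * compliance",
}
```

The judge metric is evaluated per conversation by a model; the formula metric is computed from scores that other metrics have already produced.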
- LLM-as-a-Judge — write a natural-language prompt that scores conversations on any criteria you define
- Formula metrics — combine existing metric scores using arithmetic expressions for composite indicators
- Cross-workflow reuse — the same metric works in both simulation and observability evaluations
- Metrics Lab integration — test scoring logic against sample transcripts before going live
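To make the formula-metric capability concrete, here is a sketch of how an arithmetic expression could be evaluated over existing metric scores. This is not the product's evaluator; it is a minimal, safe expression walker built on Python's `ast` module, restricted to the four basic operators:

```python
import ast
import operator

# Arithmetic operators permitted in a formula expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate_formula(formula: str, scores: dict) -> float:
    """Evaluate an arithmetic formula over existing metric scores.

    Metric names in the formula are looked up in `scores`; anything
    other than numbers, names, and basic arithmetic is rejected.
    """
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            return float(scores[node.id])  # component metric score
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")
    return walk(ast.parse(formula, mode="eval"))

# Example: composite score from three component metric scores.
scores = {"resolution": 0.9, "tone": 0.8, "compliance": 1.0}
composite = evaluate_formula(
    "0.5 * resolution + 0.2 * tone + 0.3 * compliance", scores
)
```

Restricting evaluation to a whitelist of AST node types is what keeps a user-supplied formula from executing arbitrary code.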
Common Use Cases
- Score whether an agent correctly verified a customer’s identity before sharing account details
- Track empathy and de-escalation quality across production calls
- Create a composite “call quality” score that weights resolution, tone, and compliance together
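The identity-verification use case above can be prototyped before any LLM judge is wired in. The sketch below is a deterministic stand-in (crude keyword checks, not the actual LLM-as-a-Judge scoring) that returns 1 only when a verification step precedes any account detail in the transcript; the transcript shape and keywords are assumptions for illustration:

```python
def verified_before_details(transcript: list[dict]) -> int:
    """Return 1 if identity verification precedes any account detail,
    0 otherwise. Deterministic stand-in for an LLM-as-a-Judge metric."""
    verified = False
    for turn in transcript:
        text = turn["text"].lower()
        if turn["speaker"] != "agent":
            continue
        if "date of birth" in text:
            verified = True  # crude proxy for an identity check
        elif "balance" in text and not verified:
            return 0  # shared account details before verifying
    return 1

# Sample transcript: verification happens before the balance is shared.
sample = [
    {"speaker": "agent", "text": "Can I get your date of birth?"},
    {"speaker": "customer", "text": "January 5th, 1990."},
    {"speaker": "agent", "text": "Thanks. Your balance is $42.10."},
]
score = verified_before_details(sample)
```

A stand-in like this is useful for pinning down what "pass" and "fail" transcripts look like before refining the judge prompt against the same samples.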
Next Steps
Metric Types
Learn about the six response types and when to use each one.
Dynamic Variables
Inject call-specific context into your metrics at evaluation time.
Create Custom Metric API
Define a new Custom Metric programmatically.
Metrics Lab
Prototype and test metrics before deployment.
Evaluate Endpoint
Submit calls for evaluation and pass metadata for dynamic variable substitution.