What Are Evaluations?
Evaluations are rules that automatically score the output of your prompts. Instead of manually reviewing every response, you define criteria once and Traceport evaluates every test run.
Evaluation Types
Content Quality
Score the response’s relevance, coherence, and completeness relative to the user’s request. Catches off‑topic or low‑quality responses.
Safety & Compliance
Check for harmful content, PII leakage, or policy violations. Essential for customer-facing applications.
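Traceport's safety checks are configured in the app, but as a rough illustration of what a PII scan involves, here is a minimal regex-based detector. The patterns and function name are illustrative assumptions, not Traceport's API, and real PII detection needs far broader coverage.

```python
import re

# Illustrative patterns only -- real PII detection needs far more coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def find_pii(response: str) -> dict:
    """Return any PII-like matches found in a model response."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(response)
        if matches:
            hits[label] = matches
    return hits

print(find_pii("Contact me at jane@example.com or 555-867-5309"))
```

A response with any hit would fail the safety rule; an empty dict means the check passed.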
Format Validation
Verify that the response follows a required format — JSON schema, specific structure, or required fields.
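To make the idea concrete, here is a minimal sketch of a format check for JSON output with required fields. The function and its return shape are assumptions for illustration, not Traceport internals.

```python
import json

def validate_format(response: str, required_fields: list) -> tuple:
    """Check that a response is valid JSON and contains every required field."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as e:
        return False, f"not valid JSON: {e.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    missing = [f for f in required_fields if f not in data]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    return True, "ok"

ok, reason = validate_format('{"summary": "...", "score": 7}', ["summary", "score"])
print(ok, reason)  # True ok
```

A schema-based check works the same way, just with a JSON Schema validator in place of the field loop.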
Custom Criteria
Define your own scoring logic using natural language descriptions. Traceport uses an evaluator model to grade responses against your criteria.
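A custom rule is essentially a natural-language rubric handed to an evaluator model. A minimal sketch of how such a grading prompt might be assembled, where the prompt wording and score scale are assumptions rather than Traceport's actual internals:

```python
def build_grading_prompt(criteria: str, user_input: str, response: str) -> str:
    """Assemble an LLM-as-judge prompt that grades a response against criteria."""
    return (
        "You are a strict evaluator. Grade the response against the criteria.\n"
        f"Criteria: {criteria}\n"
        f"User input: {user_input}\n"
        f"Response: {response}\n"
        "Reply with a single integer score from 1 (fails) to 5 (fully meets)."
    )

prompt = build_grading_prompt(
    criteria="Answers in a friendly tone and never promises refunds.",
    user_input="I want my money back!",
    response="I understand the frustration -- let me see what options we have.",
)
print(prompt)
```

The evaluator model's integer reply then becomes the score recorded for that rule.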
Evaluations + Datasets
The most powerful workflow combines Evaluations with Datasets:
- Create a Dataset with diverse test inputs
- Define Evaluation Rules for quality, safety, and format
- Run the batch — Traceport evaluates every response against every rule
- Review the scorecard — identify which inputs produce failing outputs
This workflow is ideal for prompt optimization cycles: make a change, run the dataset, and compare evaluation scores before and after.
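The batch-and-scorecard step above can be sketched in plain Python. Everything here is illustrative: the rules are stand-in callables and the scorecard shape is an assumption, not Traceport's data model.

```python
def run_scorecard(responses: dict, rules: dict) -> dict:
    """Evaluate every response against every rule, mimicking a batch run."""
    return {
        input_id: {rule_name: rule(resp) for rule_name, rule in rules.items()}
        for input_id, resp in responses.items()
    }

# Stand-in rules; in practice these would be Traceport evaluation rules.
rules = {
    "non_empty": lambda r: len(r.strip()) > 0,
    "no_apology_spam": lambda r: r.lower().count("sorry") <= 1,
    "under_500_chars": lambda r: len(r) < 500,
}
responses = {
    "case-1": "Here is the summary you asked for.",
    "case-2": "Sorry, sorry, I cannot help.",
}
scorecard = run_scorecard(responses, rules)

# Surface the inputs that produce failing outputs.
failing = [cid for cid, res in scorecard.items() if not all(res.values())]
print(failing)  # ['case-2']
```

Rerunning the same dataset after a prompt change and diffing the two scorecards is the before/after comparison described above.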
Continuous Evaluation
As your prompts evolve through new versions, evaluations serve as a quality gate:
- Run evaluations before publishing a new version
- Compare scores across versions to detect regressions
- Use evaluation pass rates as confidence signals for deployment
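A regression gate of this kind reduces to comparing pass rates between versions. A minimal sketch, assuming a tolerance threshold you would choose yourself (the 2% default here is an arbitrary example, not a Traceport setting):

```python
def pass_rate(results: list) -> float:
    """Fraction of evaluation runs that passed."""
    return sum(results) / len(results)

def gate(old_results: list, new_results: list, max_drop: float = 0.02) -> bool:
    """Allow deployment only if the new pass rate drops by at most max_drop."""
    return pass_rate(new_results) >= pass_rate(old_results) - max_drop

v1 = [True] * 95 + [False] * 5    # 95% pass rate on the current version
v2 = [True] * 90 + [False] * 10   # 90% pass rate on the candidate
print(gate(v1, v2))  # False -- a 5-point regression exceeds the 2% tolerance
```

Wiring a check like this into CI turns the evaluation pass rate into the deployment confidence signal described above.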

