AI Evaluation Framework Builder
Build evaluation loops for AI systems with benchmark sets, rubric design, judge calibration, and human-review anchors.
Codex · Claude Code · Kimi Code · orchestrator-mcp
Best use case
Use AI Evaluation Framework Builder when you need a repeatable evaluation loop for an AI system: a curated benchmark set, an explicit scoring rubric, a calibrated LLM judge, and human-review anchors that catch judge drift. It fits best when the work is driven by the ai evaluation and benchmark suite trigger signals; a minimal sketch of such a loop follows.
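Concretely, the loop wires a benchmark set through a weighted rubric and a judge, then aggregates scores. The plain-Python sketch below is illustrative only: the rubric, the benchmark records, and the keyword-match judge stub are assumptions, not the pack's API.

from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    weight: float  # relative importance; normalized during aggregation

RUBRIC = [Criterion("correctness", 0.5), Criterion("grounding", 0.3), Criterion("style", 0.2)]
BENCHMARK = [("What is 2 + 2?", "4"), ("Name the capital of France.", "Paris")]  # (prompt, reference)

def judge(answer: str, reference: str, criterion: Criterion) -> float:
    # Stub judge scoring 0-1 per criterion; a real loop would call a
    # calibrated LLM judge here (see the validation hooks below).
    return 1.0 if reference.lower() in answer.lower() else 0.0

def evaluate(system_under_test) -> float:
    total_weight = sum(c.weight for c in RUBRIC)
    per_case = []
    for prompt, reference in BENCHMARK:
        answer = system_under_test(prompt)
        weighted = sum(c.weight * judge(answer, reference, c) for c in RUBRIC)
        per_case.append(weighted / total_weight)
    return sum(per_case) / len(per_case)

print(evaluate(lambda p: "4" if "2 + 2" in p else "Paris"))  # 1.0 for this toy system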
Trigger signals
ai evaluation · benchmark suite · llm judge
Validation hooks
evaluation-coverage-checker · judge-calibration-validator · benchmark-completeness-test
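A hook name like judge-calibration-validator implies checking LLM-judge verdicts against human-review anchors. One way such a check could work, sketched with made-up labels: compute raw agreement and Cohen's kappa on a shared sample, then flag the judge for re-calibration when kappa drops.

human = [1, 1, 0, 1, 0, 0, 1, 0]      # human-review anchor labels (1 = pass)
llm_judge = [1, 1, 0, 0, 0, 1, 1, 0]  # LLM-judge verdicts on the same items

n = len(human)
agreement = sum(h == j for h, j in zip(human, llm_judge)) / n
# Chance agreement for binary labels, as Cohen's kappa defines it
p_pass = (sum(human) / n) * (sum(llm_judge) / n)
p_fail = (1 - sum(human) / n) * (1 - sum(llm_judge) / n)
kappa = (agreement - (p_pass + p_fail)) / (1 - (p_pass + p_fail))
print(f"agreement={agreement:.2f} kappa={kappa:.2f}")  # e.g. re-calibrate if kappa < 0.6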
Install surface
Copy the exact command path you need.
Inspect
pip install "orchestrator-mcp[dashboard]"
orchestrator-mcp skills show ai-evaluation-framework-builder
Use
orchestrator-mcp skills export ai-evaluation-framework-builder --to ./skillforge-packs
# copy the exported pack into your preferred agent environment
Export
cp -R skills/ai-evaluation-framework-builder ./your-agent-skills/ai-evaluation-framework-builder
# or open skills/ai-evaluation-framework-builder/SKILL.md in a markdown-first client
File patterns
**/*.py · **/*.ts · **/*.json · **/evals/**
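The **/evals/** pattern suggests benchmark cases live as standalone files the tooling can scan. A hypothetical loader matching that glob; the case shape shown in the comment is an assumption, not a format the pack documents.

import glob, json

def load_cases(root: str = "evals"):
    # Assumed case shape: {"prompt": "...", "reference": "...", "tags": [...]}
    cases = []
    for path in glob.glob(f"{root}/**/*.json", recursive=True):
        with open(path) as f:
            cases.append(json.load(f))
    return cases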
Model preferences
deepseek-ai/deepseek-v3.2 · moonshotai/kimi-k2.5 · deepseek-r1:32b
Related skills
Adjacent packs to compose next.
Design robust communication protocols for agent systems with message schemas, serialization, and delivery guarantees.
Codex · Claude Code
Manage complete agent lifecycles from initialization through graceful shutdown with health monitoring, scaling, and resource optimization.
Codex · Claude Code
Design short-term, long-term, and episodic memory layers for agents without turning retrieval into an unbounded context leak.
Codex · Claude Code