
AI/ML · Global library

AI Evaluation Framework Builder

Build evaluation loops for AI systems with benchmark sets, rubric design, judge calibration, and human-review anchors.

Codex · Claude Code · Kimi Code · orchestrator-mcp

Best use case

Use AI Evaluation Framework Builder when you need to stand up evaluation loops for AI systems: benchmark sets, rubric design, judge calibration, and human-review anchors. It fits best when the work is driven by AI evaluation or benchmark-suite tasks.
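As a rough illustration of the judge-calibration piece, a loop like this compares LLM-judge rubric scores against human-review anchors and reports agreement. This is a minimal sketch under assumed names (`judge_agreement`, a 1-5 rubric scale); it is not the pack's actual implementation.

```python
# Hypothetical judge-calibration check: measure how often an LLM judge's
# rubric score lands within a tolerance of the human anchor score.
# All names and the 1-5 scale are illustrative assumptions.

def judge_agreement(judge_scores, human_scores, tolerance=1):
    """Fraction of items where the judge is within `tolerance` of the human anchor."""
    assert len(judge_scores) == len(human_scores), "score lists must align"
    hits = sum(
        1 for j, h in zip(judge_scores, human_scores) if abs(j - h) <= tolerance
    )
    return hits / len(judge_scores)

# Example: 1-5 rubric scores on five benchmark items.
judge = [4, 3, 5, 2, 4]
human = [4, 4, 5, 1, 2]
print(judge_agreement(judge, human))  # 0.8: four of five within tolerance 1
```

A calibration validator would typically gate on a threshold (e.g. fail the loop if agreement drops below 0.8) before trusting judge scores at scale.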

Trigger signals

ai evaluation · benchmark suite · llm judge

Validation hooks

evaluation-coverage-checker · judge-calibration-validator · benchmark-completeness-test
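In the spirit of an evaluation-coverage check, a simple validator can verify that every capability tag appears in at least a minimum number of benchmark cases. This is a generic sketch with assumed data shapes (`cases` as dicts with a `tags` list), not the pack's actual hook.

```python
# Illustrative coverage check: flag capability tags that fall below a
# minimum number of benchmark cases. Data shapes here are assumptions.
from collections import Counter

def coverage_gaps(cases, required_tags, min_items=3):
    """Return required tags with fewer than `min_items` benchmark cases."""
    counts = Counter(tag for case in cases for tag in case["tags"])
    return sorted(t for t in required_tags if counts[t] < min_items)

cases = [
    {"id": "c1", "tags": ["reasoning", "safety"]},
    {"id": "c2", "tags": ["reasoning"]},
    {"id": "c3", "tags": ["reasoning", "retrieval"]},
]
print(coverage_gaps(cases, ["reasoning", "safety", "retrieval"]))
# ['retrieval', 'safety'] — both below the 3-case minimum
```

A benchmark-completeness test would run the same idea over the full required-capability list and fail CI when any gap is non-empty.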

Install surface

Copy the exact commands you need.

Inspect

pip install "orchestrator-mcp[dashboard]"
orchestrator-mcp skills show ai-evaluation-framework-builder

Use

orchestrator-mcp skills export ai-evaluation-framework-builder --to ./skillforge-packs
# copy the exported pack into your preferred agent environment

Export

cp -R skills/ai-evaluation-framework-builder ./your-agent-skills/ai-evaluation-framework-builder
# or open skills/ai-evaluation-framework-builder/SKILL.md in a markdown-first client

File patterns

**/*.py · **/*.ts · **/*.json · **/evals/**

Model preferences

deepseek-ai/deepseek-v3.2 · moonshotai/kimi-k2.5 · deepseek-r1:32b

Related skills

Adjacent packs to compose next.

AI/ML · Global library

Agent Lifecycle Manager


Manage complete agent lifecycles from initialization through graceful shutdown with health monitoring, scaling, and resource optimization

Codex · Claude Code
AI/ML · Global library

Agent Memory Designer


Design short-term, long-term, and episodic memory layers for agents without turning retrieval into an unbounded context leak.

Codex · Claude Code