
AI/ML · Global library

AI Evaluation Framework Builder

Build evaluation loops for AI systems with benchmark sets, rubric design, judge calibration, and human-review anchors.

Codex · Claude Code · Kimi Code · orchestrator-mcp

Best use case

Use AI Evaluation Framework Builder when you need to stand up evaluation loops for AI systems: benchmark sets, rubric design, judge calibration, and human-review anchors. It fits best when the work is driven by AI evaluation or benchmark-suite tasks.
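As a rough illustration of the judge-calibration piece, a loop like this compares LLM-judge rubric scores against human-review anchors and reports agreement. This is a minimal sketch under assumed names (`judge_agreement`, a 1-5 rubric scale); it is not the pack's actual implementation.

```python
# Hypothetical judge-calibration check: measure how often an LLM judge's
# rubric score lands within a tolerance of the human anchor score.
# All names and the 1-5 scale are illustrative assumptions.

def judge_agreement(judge_scores, human_scores, tolerance=1):
    """Fraction of items where the judge is within `tolerance` of the human anchor."""
    assert len(judge_scores) == len(human_scores), "score lists must align"
    hits = sum(
        1 for j, h in zip(judge_scores, human_scores) if abs(j - h) <= tolerance
    )
    return hits / len(judge_scores)

# Example: 1-5 rubric scores on five benchmark items.
judge = [4, 3, 5, 2, 4]
human = [4, 4, 5, 1, 2]
print(judge_agreement(judge, human))  # 0.8: four of five within tolerance 1
```

A calibration validator would typically gate on a threshold (e.g. fail the loop if agreement drops below 0.8) before trusting judge scores at scale.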

Trigger signals

ai evaluation · benchmark suite · llm judge

Validation hooks

evaluation-coverage-checker · judge-calibration-validator · benchmark-completeness-test
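In the spirit of an evaluation-coverage check, a simple validator can verify that every capability tag appears in at least a minimum number of benchmark cases. This is a generic sketch with assumed data shapes (`cases` as dicts with a `tags` list), not the pack's actual hook.

```python
# Illustrative coverage check: flag capability tags that fall below a
# minimum number of benchmark cases. Data shapes here are assumptions.
from collections import Counter

def coverage_gaps(cases, required_tags, min_items=3):
    """Return required tags with fewer than `min_items` benchmark cases."""
    counts = Counter(tag for case in cases for tag in case["tags"])
    return sorted(t for t in required_tags if counts[t] < min_items)

cases = [
    {"id": "c1", "tags": ["reasoning", "safety"]},
    {"id": "c2", "tags": ["reasoning"]},
    {"id": "c3", "tags": ["reasoning", "retrieval"]},
]
print(coverage_gaps(cases, ["reasoning", "safety", "retrieval"]))
# ['retrieval', 'safety'] — both below the 3-case minimum
```

A benchmark-completeness test would run the same idea over the full required-capability list and fail CI when any gap is non-empty.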

Install surface

Copy the exact commands you need.

Inspect

pip install "orchestrator-mcp[dashboard]"
orchestrator-mcp skills show ai-evaluation-framework-builder

Use

orchestrator-mcp skills export ai-evaluation-framework-builder --to ./skillforge-packs
# copy the exported pack into your preferred agent environment

Export

cp -R skills/ai-evaluation-framework-builder ./your-agent-skills/ai-evaluation-framework-builder
# or open skills/ai-evaluation-framework-builder/SKILL.md in a markdown-first client

File patterns

**/*.py · **/*.ts · **/*.json · **/evals/**

Model preferences

deepseek-ai/deepseek-v3.2 · moonshotai/kimi-k2.5 · deepseek-r1:32b

Related skills

Adjacent packs to compose next.

AI/ML · Global library

Agent Lifecycle Manager


Manage complete agent lifecycles from initialization through graceful shutdown with health monitoring, scaling, and resource optimization

Codex · Claude Code
AI/ML · Global library

Agent Memory Designer


Design short-term, long-term, and episodic memory layers for agents without turning retrieval into an unbounded context leak.

Codex · Claude Code