
AI/ML · Global library

Inference Optimization Engineer

Optimize model serving with batching, quantization, streaming, and deployment-aware latency budgets that preserve quality.

Codex · Claude Code · Kimi Code · orchestrator-mcp

Best use case

Use Inference Optimization Engineer when a model-serving path has to meet a deployment latency budget without degrading output quality, especially when the main levers are quantization and batching.
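Batching and latency budgets pull against each other: larger batches raise throughput, but every request waits for its batch to close. The sketch below is a minimal illustration of one common rule, dynamic batching with a size cap and a maximum wait. It is not this skill's implementation. `Request`, `form_batches`, `max_batch_size`, and `max_wait_ms` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    # Hypothetical request record: an id plus its arrival time in milliseconds.
    request_id: str
    arrival_ms: float


def form_batches(requests: List[Request], max_batch_size: int,
                 max_wait_ms: float) -> List[List[Request]]:
    """Offline simulation of dynamic batching.

    A batch closes when it already holds max_batch_size requests, or when
    the next request arrives more than max_wait_ms after the oldest request
    in the current batch (so no request queues longer than max_wait_ms).
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        if current and (
            len(current) >= max_batch_size
            or req.arrival_ms - current[0].arrival_ms > max_wait_ms
        ):
            batches.append(current)  # close the full or expired batch
            current = []
        current.append(req)
    if current:
        batches.append(current)  # flush the trailing partial batch
    return batches


if __name__ == "__main__":
    reqs = [Request(f"r{i}", t) for i, t in enumerate([0, 1, 2, 3, 20, 21])]
    batches = form_batches(reqs, max_batch_size=3, max_wait_ms=5)
    print([len(b) for b in batches])  # [3, 1, 2]
```

In a live server, the wait deadline would be enforced by a timer rather than by the next arrival. This offline version only shows the grouping rule, which makes the trade-off easy to test: lowering `max_wait_ms` shrinks batches and tail latency together.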

Trigger signals

quantization · batching · inference latency
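For the quantization signal, the quality question is how much precision a lower-bit format loses. The following is a minimal sketch of symmetric per-tensor int8 quantization, one common scheme among several. It is an assumption for illustration, not necessarily what this skill applies. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8: q = round(x / scale), scale = max|x| / 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    # Defensive clip to the symmetric range [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [qi * scale for qi in q]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    """Largest element-wise round-trip error, a crude accuracy-impact signal."""
    return max((abs(a - b) for a, b in zip(original, restored)), default=0.0)


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.01, 1.0]
    q, scale = quantize_int8(weights)
    print(q, scale, max_abs_error(weights, dequantize(q, scale)))
```

Because no value is clipped in this scheme, each element's round-trip error is at most `scale / 2`. A single large outlier inflates `scale` and therefore the error for every other value, which is why per-channel scales are often preferred for real weights.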

Validation hooks

inference-latency-checker · throughput-validator · accuracy-impact-test
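The interfaces of these hooks are not documented here. As a hypothetical illustration of the kind of gate a latency checker might apply, the sketch below compares a tail percentile of measured latencies against a budget. `percentile` and `check_latency_budget` are invented names, and the nearest-rank percentile definition and the p95 default are assumptions.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # Multiply before dividing to keep the rank computation exact for common pct values.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency_budget(latencies_ms: List[float], budget_ms: float,
                         pct: float = 95.0) -> Dict[str, object]:
    """Pass only if the chosen tail percentile stays within the latency budget."""
    observed = percentile(latencies_ms, pct)
    return {
        "percentile": pct,
        "observed_ms": observed,
        "budget_ms": budget_ms,
        "passed": observed <= budget_ms,
    }


if __name__ == "__main__":
    samples = [float(ms) for ms in range(1, 101)]  # stand-in measurements
    print(check_latency_budget(samples, budget_ms=95))
```

Gating on a tail percentile rather than the mean matters for batching changes, since a larger wait window can leave average latency flat while pushing p95 over budget.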

Install surface

Copy the exact command path you need.

Inspect

pip install "orchestrator-mcp[dashboard]"
orchestrator-mcp skills show inference-optimization-engineer

Use

orchestrator-mcp skills export inference-optimization-engineer --to ./skillforge-packs
# copy the exported pack into your preferred agent environment

Export

cp -R skills/inference-optimization-engineer ./your-agent-skills/inference-optimization-engineer
# or open skills/inference-optimization-engineer/SKILL.md in a markdown-first client

File patterns

**/*.py · **/*.cpp · **/*.onnx · **/*.gguf · **/inference/**

Model preferences

deepseek-ai/deepseek-v3.2 · gemini-2.5-pro · qwen2.5-coder:32b

Related skills

Adjacent packs to compose next.

AI/ML · Global library

Agent Lifecycle Manager

Manage complete agent lifecycles from initialization through graceful shutdown, with health monitoring, scaling, and resource optimization.

Codex · Claude Code

AI/ML · Global library

Agent Memory Designer

Design short-term, long-term, and episodic memory layers for agents without turning retrieval into an unbounded context leak.

Codex · Claude Code