RAG System Architect
Design retrieval-augmented generation systems with chunking, ranking, citation, and context-budget discipline that hold up in production.
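As a rough illustration of what that discipline looks like in practice, here is a minimal, self-contained sketch (plain Python, no specific framework assumed): documents are split into overlapping chunks, ranked against the query with a toy lexical scorer standing in for an embedding or cross-encoder ranker, packed into a fixed context budget approximated by word counts, and tagged with source IDs so answers can cite their chunks. All names here (Chunk, assemble_context, the sample file names) are illustrative, not part of any particular library.

```python
# Minimal sketch of the chunk -> rank -> budget -> cite loop described above.
# The scorer is toy lexical overlap and "tokens" are whitespace words; a real
# system would swap in an embedding/cross-encoder ranker and a real tokenizer.
from dataclasses import dataclass


@dataclass
class Chunk:
    source_id: str  # document the chunk came from, kept for citation
    text: str


def chunk_document(source_id: str, text: str, max_words: int = 120, overlap: int = 20) -> list[Chunk]:
    """Split a document into overlapping fixed-size word windows."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        window = words[start:start + max_words]
        chunks.append(Chunk(source_id, " ".join(window)))
        start += max_words - overlap
    return chunks


def score(query: str, chunk: Chunk) -> float:
    """Toy relevance score: fraction of query terms that appear in the chunk."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.text.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)


def assemble_context(query: str, chunks: list[Chunk], token_budget: int = 400) -> tuple[str, list[str]]:
    """Rank chunks, pack the best ones into a fixed budget, and keep citations."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk.text.split())
        if used + cost > token_budget:
            break  # stop once the next-best chunk would blow the budget
        picked.append(chunk)
        used += cost
    context = "\n\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(picked))
    citations = [f"[{i + 1}] {c.source_id}" for i, c in enumerate(picked)]
    return context, citations


if __name__ == "__main__":
    docs = {
        "handbook.md": "Context budgets cap how much retrieved text reaches the model per call.",
        "runbook.md": "Citations map each answer span back to the source chunk it was drawn from.",
    }
    chunks = [c for sid, text in docs.items() for c in chunk_document(sid, text)]
    context, citations = assemble_context("how do context budgets and citations work", chunks)
    print(context)
    print("\n".join(citations))
```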
Collection
Prompt systems, retrieval, evaluation, model operations, and agentic AI infrastructure.
41 skills in this lane
Design retrieval-augmented generation systems with chunking, ranking, citation, and context-budget discipline that hold up in production.
Allocate work across local, cloud, and premium models so teams maximize capability coverage per dollar and per latency budget.
Reshape sprawling repositories and briefs into stable context lanes, memory checkpoints, and retrieval boundaries for long-horizon agent work.
Tune provider routing policy for quality, cost ceilings, and fallback behavior across multiple model subscriptions.
Build eval cases that expose fabricated citations, brittle reasoning chains, and ungrounded tool usage before they hit real workflows.
Tune timeout, retry, and concurrency budgets across multi-model routes so orchestration stays fast without silent quality collapse.
Improve chunking, metadata, and ranking design so agent answers stay grounded under larger repositories and longer tasks.
Design robust communication protocols for agent systems with message schemas, serialization, and delivery guarantees
Manage complete agent lifecycles from initialization through graceful shutdown with health monitoring, scaling, and resource optimization
Design short-term, long-term, and episodic memory layers for agents without turning retrieval into an unbounded context leak.
Optimize large-scale agent swarms for emergent problem-solving with dynamic task allocation and collective intelligence patterns
Build evaluation loops for AI systems with benchmark sets, rubric design, judge calibration, and human-review anchors.
Design comprehensive oversight systems for AI agents with monitoring, intervention, and escalation protocols
Design and execute comprehensive safety evaluations for AI systems with red-teaming, adversarial testing, and safety metric frameworks
Implement constitutional AI principles with self-critique, revision loops, and principled response generation
Optimize context window usage for RAG systems with intelligent chunking, relevance ranking, and dynamic context assembly
Build embedding pipelines with retrieval-aware chunking, vector index strategy, and similarity quality that can be measured.
Design effective few-shot prompts with example selection, formatting, and optimization for consistent high-quality outputs
Create fine-tuning workflows with dataset preparation, evaluation baselines, and rollback-ready deployment checkpoints.
Design robust function calling systems with schema validation, error handling, and multi-step tool orchestration
Design and implement GraphRAG systems that leverage knowledge graphs for enhanced retrieval and multi-hop reasoning
Build tree-structured agent hierarchies for complex decision-making with clear authority chains and delegation patterns
Design and implement hybrid search systems combining dense, sparse, and keyword retrieval for optimal relevance
Optimize model serving with batching, quantization, streaming, and deployment-aware latency budgets that preserve quality.
Build knowledge graphs from unstructured data with entity extraction, relationship identification, and graph construction
Design multi-layer caching strategies for LLM inference with semantic, prompt, and response cache optimization
Compose sophisticated LLM chains with conditional routing, parallel execution, and state management
Integrate hosted and local LLM providers with fallback, rate limiting, and spend-aware routing that remains debuggable in production.
Design intelligent load balancing for LLM inference with request routing, session affinity, and dynamic capacity management
Design and implement production-grade LLM serving infrastructure with optimal throughput, latency, and cost efficiency
Build comprehensive observability for LLM systems with tracing, metrics, logging, and cost analytics
Design sophisticated rate limiting for LLM APIs with token-based quotas, tiered limits, and burst handling
Build comprehensive testing frameworks for LLM applications with unit tests, integration tests, and evaluation metrics
Put model versioning, experiment tracking, drift detection, and rollback policy around production AI systems.
Design and orchestrate complex multi-agent systems where specialized agents collaborate to solve problems beyond single-agent capabilities
Integrate text, vision, audio, and document intelligence into one application surface with graceful modality-aware fallbacks.
Design system prompts, prompt contracts, and eval-backed example sets that improve LLM reliability without hiding failure modes.
Build comprehensive evaluation frameworks for RAG systems with retrieval metrics, generation metrics, and end-to-end assessment
Design robust reward functions and evaluation frameworks that prevent reward hacking and specification gaming
Design robust structured output systems with JSON schemas, validation, and parsing for reliable data extraction
Test and validate AI system alignment with organizational and societal values through systematic evaluation frameworks