ai-agents
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill ai-agents
AI Agents Development – Production Skill Hub
Modern Best Practices (January 2026): deterministic control flow, bounded tools, auditable state, MCP-based tool integration, handoff-first orchestration, multi-layer guardrails, OpenTelemetry tracing, and human-in-the-loop controls (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
This skill provides production-ready operational patterns for designing, building, evaluating, and deploying AI agents. It centralizes procedures, checklists, decision rules, and templates used across RAG agents, tool-using agents, OS agents, and multi-agent systems.
No theory. No narrative. Only operational steps and templates.
When to Use This Skill
Codex should activate this skill whenever the user asks for:
- Designing an agent (LLM-based, tool-based, OS-based, or multi-agent).
- Scoping capability maturity and rollout risk for new agent behaviors.
- Creating action loops, plans, workflows, or delegation logic.
- Writing tool definitions, MCP tools, schemas, or validation logic.
- Generating RAG pipelines, retrieval modules, or context injection.
- Building memory systems (session, long-term, episodic, task).
- Creating evaluation harnesses, observability plans, or safety gates.
- Preparing CI/CD, rollout, deployment, or production operational specs.
- Producing any template in /references/ or /assets/.
- Implementing MCP servers or integrating Model Context Protocol.
- Setting up agent handoffs and orchestration patterns.
- Configuring multi-layer guardrails and safety controls.
- Evaluating whether to build an agent (build vs not decision).
- Calculating agent ROI, token costs, or cost/benefit analysis.
- Assessing hallucination risk and mitigation strategies.
- Deciding when to kill an agent project (kill triggers).
- For prompt scaffolds, retrieval tuning, or security depth, see Scope Boundaries below.
Scope Boundaries (Use These Skills for Depth)
- Prompt scaffolds & structured outputs → ai-prompt-engineering
- RAG retrieval & chunking → ai-rag
- Search tuning (BM25/HNSW/hybrid) → ai-rag
- Security/guardrails → ai-mlops
- Inference optimization → ai-llm-inference
Default Workflow (Production)
- Pick an architecture with the Decision Tree (below); default to workflow/FSM/DAG for production.
- Draft an agent spec with assets/core/agent-template-standard.md (or assets/core/agent-template-quick.md).
- Specify tools and handoffs with JSON Schema using assets/tools/tool-definition.md and references/api-contracts-for-agents.md.
- Add retrieval only when needed; start with assets/rag/rag-basic.md and scale via assets/rag/rag-advanced.md + references/rag-patterns.md.
- Add eval + telemetry early via references/evaluation-and-observability.md.
- Run the go/no-go gate with assets/checklists/agent-safety-checklist.md.
- Plan deploy/rollback and safety controls via references/deployment-ci-cd-and-safety.md.
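The tool-specification step above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `search_tickets` tool and a hand-rolled validator that covers only `required`, `type`, and `additionalProperties`; the field names are illustrative and do not reproduce assets/tools/tool-definition.md, and a real project would likely use a full JSON Schema library.

```python
from typing import Any

# Hypothetical tool definition; the name and fields are illustrative only.
SEARCH_TICKETS_TOOL: dict[str, Any] = {
    "name": "search_tickets",
    "description": "Search support tickets by free-text query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}

_JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict}


def validate_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of validation errors (empty means the call may proceed)."""
    errors: list[str] = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {name}")
            continue
        type_name = props[name]["type"]
        expected = _JSON_TYPES[type_name]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            errors.append(f"wrong type for {name}: expected {type_name}")
    return errors
```

Rejecting a call before it reaches the tool keeps malformed model output from causing side effects; the error list can be fed back to the model for one bounded repair attempt.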
Quick Reference
| Agent Type | Core Control Flow | Interfaces | MCP/A2A | When to Use |
|---|---|---|---|---|
| Workflow Agent (FSM/DAG) | Explicit state transitions | State store, tool allowlist | MCP | Deterministic, auditable flows |
| Tool-Using Agent | Route → call tool → observe | Tool schemas, retries/timeouts | MCP | External actions (APIs, DB, files) |
| RAG Agent | Retrieve → answer → cite | Retriever, citations, ACLs | MCP | Knowledge-grounded responses |
| Planner/Executor | Plan → execute steps with caps | Planner prompts, step budget | MCP (+A2A) | Multi-step problems with bounded autonomy |
| Multi-Agent (Orchestrated) | Delegate → merge → validate | Handoff contracts, eval gates | A2A | Specialization with explicit handoffs |
| OS Agent | Observe UI → act → verify | Sandbox, UI grounding | MCP | Desktop/browser control under strict guardrails |
| Code/SWE Agent | Branch → edit → test → PR | Repo access, CI gates | MCP | Coding tasks with review/merge controls |
Framework Selection (2026)
| Framework | Architecture | Best For | Ease |
|---|---|---|---|
| LangGraph | Graph-based, stateful | Enterprise, compliance, auditability | Medium |
| OpenAI Agents SDK | Tool-centric, lightweight | Fast prototyping, OpenAI ecosystem | Easy |
| Google ADK | Code-first, multi-language | Gemini/Vertex AI, polyglot teams | Medium |
| Pydantic AI | Type-safe, graph FSM | Production Python, type safety | Medium |
| CrewAI | Role-based crews | Team workflows, content generation | Easiest |
| AutoGen | Conversational | Code generation, research | Medium |
| AWS Bedrock Agents | Managed infrastructure | Enterprise AWS, knowledge bases | Easy |
See references/modern-best-practices.md for detailed framework comparison and selection guide.
Decision Tree: Choosing Agent Architecture
What does the agent need to do?
├─ Answer questions from knowledge base?
│  ├─ Simple lookup? → RAG Agent (LangChain/LlamaIndex + vector DB)
│  └─ Complex multi-step? → Agentic RAG (iterative retrieval + reasoning)
│
├─ Perform external actions (APIs, tools, functions)?
│  ├─ 1-3 tools, linear flow? → Tool-Using Agent (LangGraph + MCP)
│  └─ Complex workflows, branching? → Planning Agent (ReAct/Plan-Execute)
│
├─ Write/modify code autonomously?
│  ├─ Single file edits? → Tool-Using Agent with code tools
│  └─ Multi-file, issue resolution? → Code/SWE Agent (HyperAgent pattern)
│
├─ Delegate tasks to specialists?
│  ├─ Fixed workflow? → Multi-Agent Sequential (A → B → C)
│  ├─ Manager-Worker? → Multi-Agent Hierarchical (Manager + Workers)
│  └─ Dynamic routing? → Multi-Agent Group Chat (collaborative)
│
├─ Control desktop/browser?
│  └─ OS Agent (Anthropic Computer Use + MCP for system access)
│
└─ Hybrid (combination of above)?
   └─ Planning Agent that coordinates:
      - Tool-using for actions (MCP)
      - RAG for knowledge (MCP)
      - Multi-agent for delegation (A2A)
      - Code agents for implementation
Protocol Selection:
- Use MCP for: Tool access, data retrieval, single-agent integration
- Use A2A for: Agent-to-agent handoffs, multi-agent coordination, task delegation
Core Concepts (Vendor-Agnostic)
Control Flow Options
- Reactive: direct tool routing per user request (fast, brittle if unbounded).
- Workflow (FSM/DAG): explicit states and transitions (default for deterministic production).
- Planner/Executor: plan with strict budgets, then execute step-by-step (use when branching is unavoidable).
- Orchestrated multi-agent: separate roles with validated handoffs (use when specialization is required).
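The workflow (FSM/DAG) option above can be sketched as an explicit transition table. A minimal illustration; the state names and the `Run` class are hypothetical and not taken from any template in this skill.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(str, Enum):
    INTAKE = "intake"
    RETRIEVE = "retrieve"
    DRAFT = "draft"
    REVIEW = "review"
    DONE = "done"
    FAILED = "failed"


# Explicit transition table: any move not listed here is rejected.
TRANSITIONS: dict[State, set[State]] = {
    State.INTAKE: {State.RETRIEVE, State.FAILED},
    State.RETRIEVE: {State.DRAFT, State.FAILED},
    State.DRAFT: {State.REVIEW, State.FAILED},
    State.REVIEW: {State.DONE, State.DRAFT, State.FAILED},
    State.DONE: set(),
    State.FAILED: set(),
}


@dataclass
class Run:
    state: State = State.INTAKE
    history: list[str] = field(default_factory=list)  # serializable audit trail

    def advance(self, target: State) -> None:
        """Move to `target` only if the transition table allows it."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition: {self.state.value} -> {target.value}"
            )
        self.history.append(f"{self.state.value}->{target.value}")
        self.state = target
```

Because the model can only request a transition and never invent one, every run stays inside the table and `history` can be replayed for audit.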
Memory Types (Tradeoffs)
- Short-term (session): cheap, ephemeral; best for conversational continuity.
- Episodic (task): scoped to a case/ticket; supports audit and replay.
- Long-term (profile/knowledge): high risk; requires consent, retention limits, and provenance.
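The long-term memory requirements above (consent, retention limits, provenance) can be sketched as follows. A minimal in-memory illustration; `MemoryRecord`, `LongTermMemory`, and their fields are hypothetical names, and a production store would persist records and log deletions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    key: str
    value: str
    source: str          # provenance: where the fact came from
    consent: bool        # whether the user agreed to long-term storage
    created_at: datetime
    ttl: timedelta       # retention limit


class LongTermMemory:
    def __init__(self) -> None:
        self._records: dict[str, MemoryRecord] = {}

    def write(self, record: MemoryRecord) -> None:
        if not record.consent:
            raise PermissionError("consent required for long-term storage")
        self._records[record.key] = record

    def read(self, key: str, now: datetime) -> Optional[MemoryRecord]:
        """Return the record, or None if it is missing or past its retention limit."""
        record = self._records.get(key)
        if record is None:
            return None
        if now - record.created_at > record.ttl:
            del self._records[key]  # expired: drop it on read
            return None
        return record
```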
Failure Handling (Production Defaults)
- Classify errors: retriable vs fatal vs needs-human.
- Bound retries: max attempts, backoff, jitter; avoid retry storms.
- Fallbacks: degraded mode, smaller model, cached answers, or safe refusal.
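The failure-handling defaults above can be sketched as a small retry wrapper. A minimal illustration; the three exception classes and `call_with_retries` are hypothetical names, and the delay values are placeholders.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetriableError(Exception):
    """Transient failure (timeout, rate limit): safe to retry."""


class FatalError(Exception):
    """Permanent failure (bad input, auth): retrying will not help."""


class NeedsHumanError(Exception):
    """Ambiguous or high-risk outcome: escalate instead of retrying."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry only RetriableError, with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetriableError:
            if attempt == max_attempts:
                raise  # budget exhausted: hand over to the fallback path
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # jitter spreads out retry storms
    raise AssertionError("unreachable")
```

`FatalError` and `NeedsHumanError` are deliberately not caught, so they propagate on the first attempt to whatever fallback or escalation path the caller defines.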
Do / Avoid
Do
- Do keep state explicit and serializable (replayable runs).
- Do enforce tool allowlists, scopes, and idempotency for side effects.
- Do log traces/metrics for model calls and tool calls (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
Avoid
- Avoid runaway autonomy (unbounded loops or step counts).
- Avoid hidden state (implicit memory that cannot be audited).
- Avoid untrusted tool outputs without validation/sanitization.
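The allowlist and idempotency rules above can be sketched as a gateway that sits between the model and its tools. A minimal illustration; `ToolGateway` and `ToolNotAllowed` are hypothetical names, and the in-memory result cache stands in for a durable store.

```python
from typing import Any, Callable


class ToolNotAllowed(Exception):
    """Raised when the model requests a tool outside the allowlist."""


class ToolGateway:
    """Dispatches tool calls through an allowlist and replays results by idempotency key."""

    def __init__(self, tools: dict[str, Callable[..., Any]], allowlist: set[str]) -> None:
        self._tools = tools
        self._allowlist = allowlist
        self._results: dict[str, Any] = {}  # idempotency key -> stored result

    def call(self, name: str, idempotency_key: str, **kwargs: Any) -> Any:
        if name not in self._allowlist:
            raise ToolNotAllowed(name)
        if idempotency_key in self._results:
            # Same key seen before: return the stored result, no second side effect.
            return self._results[idempotency_key]
        result = self._tools[name](**kwargs)
        self._results[idempotency_key] = result
        return result
```

A registered tool that is missing from the allowlist is still refused, so adding a tool to the codebase does not silently widen what an agent may do.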
Navigation: Economics & Decision Framework
Should You Build an Agent?
- Build vs Not Decision Framework – references/build-vs-not-decision.md
  - 10-second test (volume, cost, error tolerance)
  - Red flags and immediate disqualifiers
  - Alternatives to agents (usually better)
  - Full decision tree with stage gates
  - Kill triggers during development and post-launch
  - Pre-build validation checklist
Agent ROI & Token Economics
- Agent Economics – references/agent-economics.md
  - Token pricing by model (January 2026)
  - Cost per task by agent type
  - ROI calculation formula and tiers
  - Hallucination cost framework and mitigation ROI
  - Investment decision matrix
  - Monthly tracking dashboard
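Cost per task and ROI can be estimated with simple arithmetic. The sketch below is an assumption, not the formula from references/agent-economics.md: it uses the common definition ROI = (benefit - cost) / cost, and all prices and volumes are placeholder numbers, not real model pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    # USD per one million tokens; placeholder values, not real pricing.
    input_per_mtok: float
    output_per_mtok: float


def cost_per_task(
    price: ModelPrice, input_tokens: int, output_tokens: int, calls_per_task: int = 1
) -> float:
    """Token cost in USD for one task made of `calls_per_task` similar model calls."""
    per_call = (
        input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    ) / 1_000_000
    return per_call * calls_per_task


def monthly_roi(
    tasks_per_month: int,
    value_per_task: float,
    agent_cost_per_task: float,
    fixed_monthly_cost: float,
) -> float:
    """ROI = (benefit - cost) / cost; an assumed, commonly used definition."""
    cost = tasks_per_month * agent_cost_per_task + fixed_monthly_cost
    if cost <= 0:
        raise ValueError("total cost must be positive")
    benefit = tasks_per_month * value_per_task
    return (benefit - cost) / cost
```

With the placeholder inputs `ModelPrice(2.0, 10.0)`, 4,000 input tokens, 500 output tokens, and 3 calls per task, the token cost is about $0.039 per task; a negative ROI at realistic volume is one candidate kill trigger.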
Navigation: Core Concepts & Patterns
Governance & Maturity
- Agent Maturity & Governance – references/agent-maturity-governance.md
  - Capability maturity levels (L0-L4)
  - Identity & policy enforcement
  - Fleet control and registry management
  - Deprecation rules and kill switches
Modern Best Practices
- Modern Best Practices – references/modern-best-practices.md
  - Model Context Protocol (MCP)
  - Agent-to-Agent Protocol (A2A)
  - Agentic RAG (Dynamic Retrieval)
  - Multi-layer guardrails
  - LangGraph over LangChain
  - OpenTelemetry for agents
Context Management
- Context Engineering – references/context-engineering.md
  - Progressive disclosure
  - Session management
  - Memory provenance
  - Retrieval timing
  - Multimodal context
Core Operational Patterns
- Operational Patterns – references/operational-patterns.md
  - Agent loop pattern (PLAN → ACT → OBSERVE → UPDATE)
  - OS agent action loop
  - RAG pipeline pattern
  - Tool specification
  - Memory system pattern
  - Multi-agent workflow
  - Safety & guardrails
  - Observability
  - Evaluation patterns
  - Deployment & CI/CD
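The agent loop pattern listed above (PLAN, ACT, OBSERVE, UPDATE) can be sketched with a hard step budget. A minimal illustration; `AgentState`, `run_loop`, and the planner/actor callables are hypothetical and do not reproduce references/operational-patterns.md.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class AgentState:
    goal: str
    observations: list[Any] = field(default_factory=list)
    steps_taken: int = 0
    done: bool = False


# A planner returns the next action, or None when it judges the goal met.
Planner = Callable[[AgentState], Optional[str]]
Actor = Callable[[str], Any]


def run_loop(state: AgentState, plan: Planner, act: Actor, max_steps: int = 5) -> AgentState:
    """Run PLAN -> ACT -> OBSERVE -> UPDATE until done or the step budget is spent."""
    while state.steps_taken < max_steps:
        action = plan(state)               # PLAN
        if action is None:
            state.done = True
            break
        result = act(action)               # ACT
        state.observations.append(result)  # OBSERVE
        state.steps_taken += 1             # UPDATE
    return state
```

If the loop returns with `done` still false, the budget ran out; the caller can then fall back or escalate to a human rather than let the agent continue.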
Navigation: Protocol Implementation
- MCP Practical Guide – references/mcp-practical-guide.md
  Building MCP servers, tool integration, and standardized data access
- MCP Server Builder – references/mcp-server-builder.md
  End-to-end checklist for workflow-focused MCP servers (design → build → test)
- A2A Handoff Patterns – references/a2a-handoff-patterns.md
  Agent-to-agent communication, task delegation, and coordination protocols
- Protocol Decision Tree – references/protocol-decision-tree.md
  When to use MCP vs A2A, decision framework, and selection criteria
Navigation: Agent Capabilities
- Agent Operations – references/agent-operations-best-practices.md
  Action loops, planning, observation, and execution patterns
- RAG Patterns – references/rag-patterns.md
  Contextual retrieval, agentic RAG, and hybrid search strategies
- Memory Systems – references/memory-systems.md
  Session, long-term, episodic, and task memory architectures
- Tool Design & Validation – references/tool-design-specs.md
  Tool schemas, validation, error handling, and MCP integration
Skill Packaging & Sharing
- Skill Lifecycle – references/skill-lifecycle.md
  Scaffold, validate, package, and share skills with teams (Slack-ready)
- API Contracts for Agents – references/api-contracts-for-agents.md
  Request/response envelopes, safety gates, streaming/async patterns, error taxonomy
- Multi-Agent Patterns – references/multi-agent-patterns.md
  Manager-worker, sequential, handoff, and group chat orchestration
- OS Agent Capabilities – references/os-agent-capabilities.md
  Desktop automation, UI grounding, and computer use patterns
- Code/SWE Agents – references/code-swe-agents.md
  SE 3.0 paradigm, autonomous coding patterns, SWE-Bench, HyperAgent architecture
Navigation: Production Operations
- Evaluation & Observability – references/evaluation-and-observability.md
  OpenTelemetry GenAI, metrics, LLM-as-judge, and monitoring
- Deployment, CI/CD & Safety – references/deployment-ci-cd-and-safety.md
  Multi-layer guardrails, HITL controls, NIST AI RMF, production checklists
Navigation: Templates (Copy-Paste Ready)
Checklists
- Agent Design & Safety Checklist – assets/checklists/agent-safety-checklist.md
  Go/No-Go safety gate: permissions, HITL triggers, eval gates, observability, rollback
Core Agent Templates
- Standard Agent Template – assets/core/agent-template-standard.md
  Full production spec: memory, tools, RAG, evaluation, observability, safety
- Specialized Agent Template – assets/core/agent-template-specialized.md
  Domain-specific agents with custom capabilities and constraints
- Quick Agent Template – assets/core/agent-template-quick.md
  Minimal viable agent for rapid prototyping
RAG Templates
- Basic RAG – assets/rag/rag-basic.md
  Simple retrieval-augmented generation pipeline
- Advanced RAG – assets/rag/rag-advanced.md
  Contextual retrieval, reranking, and agentic RAG patterns
- Hybrid Retrieval – assets/rag/hybrid-retrieval.md
  Semantic + keyword search with BM25 fusion
Tool Templates
- Tool Definition – assets/tools/tool-definition.md
  MCP-compatible tool schemas with validation and error handling
- Tool Validation Checklist – assets/tools/tool-validation-checklist.md
  Testing, security, and production readiness checks
Multi-Agent Templates
- Manager-Worker Template – assets/multi-agent/manager-worker-template.md
  Orchestration pattern with task delegation and result aggregation
- Evaluator-Router Template – assets/multi-agent/evaluator-router-template.md
  Dynamic routing with quality assessment and domain classification
Service Layer Templates
- FastAPI Agent Service – ../dev-api-design/assets/fastapi/fastapi-complete-api.md
  Auth, pagination, validation, error handling; extend with model lifespan loads, SSE, background tasks
External Sources Metadata
- Curated References – data/sources.json
  Authoritative sources spanning standards, protocols, and production agent frameworks
Shared Utilities (Centralized patterns – extract, don’t duplicate)
- ../software-clean-code-standard/utilities/llm-utilities.md – Token counting, streaming, cost estimation
- ../software-clean-code-standard/utilities/error-handling.md – Effect Result types, correlation IDs
- ../software-clean-code-standard/utilities/resilience-utilities.md – p-retry v6, circuit breaker for API calls
- ../software-clean-code-standard/utilities/logging-utilities.md – pino v9 + OpenTelemetry integration
- ../software-clean-code-standard/utilities/observability-utilities.md – OpenTelemetry SDK, tracing, metrics
- ../software-clean-code-standard/utilities/testing-utilities.md – Test factories, fixtures, mocks
- ../software-clean-code-standard/references/clean-code-standard.md – Canonical clean code rules (CC-*) for citation
Trend Awareness Protocol
IMPORTANT: When users ask recommendation questions about AI agents, you MUST use WebSearch to check current trends before answering.
If WebSearch is unavailable, use data/sources.json + any available web browsing tools, and explicitly state what you verified vs assumed.
Trigger Conditions
- “What’s the best agent framework for [use case]?”
- “What should I use for [multi-agent/tool use/orchestration]?”
- “What’s the latest in AI agents?”
- “Current best practices for [agent architecture/MCP/A2A]?”
- “Is [LangGraph/CrewAI/AutoGen] still relevant in 2026?”
- “[Agent framework A] vs [Agent framework B]?”
- “Best way to build [coding agent/RAG agent/OS agent]?”
- “What MCP servers are available?”
Required Searches
- Search: "AI agent frameworks best practices 2026"
- Search: "[LangGraph/CrewAI/AutoGen/Semantic Kernel] comparison 2026"
- Search: "AI agent trends January 2026"
- Search: "MCP servers available 2026"
What to Report
After searching, provide:
- Current landscape: What agent frameworks are popular NOW
- Emerging trends: New patterns gaining traction (MCP, A2A, agentic coding)
- Deprecated/declining: Frameworks or patterns losing relevance
- Recommendation: Based on fresh data, not just static knowledge
Example Topics (verify with fresh search)
- Agent frameworks (LangGraph, CrewAI, AutoGen, Semantic Kernel, Pydantic AI)
- MCP ecosystem (available servers, new integrations)
- Agentic coding (Codex CLI, Claude Code, Cursor, Windsurf, Cline)
- Multi-agent patterns (hierarchical, collaborative, competitive)
- Tool use protocols (MCP, function calling)
- Agent evaluation (SWE-Bench, AgentBench, GAIA)
- OS/computer use agents (computer-use APIs, browser automation)
Related Skills
This skill integrates with complementary skills:
Core Dependencies
- ../ai-llm/ – LLM patterns, prompt engineering, and model selection for agents
- ../ai-rag/ – Deep RAG implementation: chunking, embedding, reranking
- ../ai-prompt-engineering/ – System prompt design, few-shot patterns, reasoning strategies
Production & Operations
- ../qa-observability/ – OpenTelemetry, metrics, distributed tracing
- ../software-security-appsec/ – OWASP Top 10, input validation, secure tool design
- ../ops-devops-platform/ – CI/CD pipelines, deployment strategies, infrastructure
Supporting Patterns
- ../dev-api-design/ – REST/GraphQL design for agent APIs and tool interfaces
- ../ai-mlops/ – Model deployment, monitoring, drift detection
- ../qa-debugging/ – Agent debugging, error analysis, root cause investigation
Usage pattern: Start here for agent architecture, then reference specialized skills for deep implementation details.
Usage Notes
- Modern Standards: Default to MCP for tools, agentic RAG for retrieval, handoff-first for multi-agent
- Lightweight SKILL.md: Use this file for quick reference and navigation
- Drill-down resources: Reference detailed resources for implementation guidance
- Copy-paste templates: Use templates when the user asks for structured artifacts
- External sources: Reference data/sources.json for authoritative documentation links
- No theory: Never include theoretical explanations; only operational steps
Key Modern Migrations
Traditional → Modern:
- Custom APIs → Model Context Protocol (MCP)
- Static RAG → Agentic RAG with contextual retrieval
- Ad-hoc handoffs → Versioned handoff APIs with JSON Schema
- Single guardrail → Multi-layer defense (5+ layers)
- LangChain agents → LangGraph stateful workflows
- Custom observability → OpenTelemetry GenAI standards
- Model-centric → Context engineering-centric
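The versioned-handoff migration above can be sketched as an envelope that the receiving agent validates before acting. A minimal illustration; the `Handoff` fields, the version string, and the `encode`/`decode` helpers are hypothetical and do not reproduce references/a2a-handoff-patterns.md or the A2A wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any

SUPPORTED_VERSIONS = {"1.0"}  # hypothetical version identifiers
_FIELDS = {"schema_version", "from_agent", "to_agent", "task", "context"}


@dataclass(frozen=True)
class Handoff:
    schema_version: str
    from_agent: str
    to_agent: str
    task: str
    context: dict[str, Any]


def encode(handoff: Handoff) -> str:
    """Serialize the handoff to JSON with stable key order (useful for audit diffs)."""
    return json.dumps(asdict(handoff), sort_keys=True)


def decode(payload: str) -> Handoff:
    """Parse and validate a handoff; reject unknown versions and field mismatches."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("handoff payload must be a JSON object")
    version = data.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported handoff version: {version!r}")
    if set(data) != _FIELDS:
        raise ValueError(f"field mismatch: {sorted(set(data) ^ _FIELDS)}")
    return Handoff(**data)
```

Failing closed on an unknown version means a sender can be upgraded first without a stale receiver silently misreading new fields.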
AI-Native SDLC Template
- Use assets/agent-template-ainative-sdlc.md for the Delegate → Review → Own runbook (guardrails + outputs checklist).