ai-agent-prd
npx skills add https://github.com/okwinds/miscellany --skill ai-agent-prd
AI Agent PRD Guide
Overview
Write PRDs for AI Agent products that define not just what the agent does, but how it thinks, decides, and acts.
Relationship with Other Skills
This skill extends prd-writing-guide for AI Agent products specifically. You should:
- Apply prd-writing-guide's Seven Lenses to each agent capability
- Follow prd-writing-guide's Writing Style Guide for requirement clarity
- Use prd-writing-guide's Developer Test as your quality bar
Handoff: The Agent PRD this skill produces feeds into prd-to-engineering-spec for technical design. That skill includes an Agent-specific validation branch for converting agent capabilities into engineering specs.
Traditional PRD: Input → Deterministic Logic → Output
Agent PRD: Goal → Perceive → Think → Decide → Act → Learn
                     ↑                                │
                     └─────────── Feedback ───────────┘
You're not defining a function. You're defining a cognitive architecture.
Quality Test
Can your engineering team answer these without asking you?
- What is the agent’s purpose and identity?
- What capabilities (skills/tools) does it have?
- How does it decide what to do?
- What can it NOT do? (boundaries)
- When should humans intervene?
- How do we know if it’s working well?
Quick Start
- Generate a document skeleton:
  bash scripts/generate_agent_prd_skeleton.sh ./docs/agent-prd "Customer Support Agent"
- Fill in using templates from references
- Validate completeness with checklist
Note: The skeleton generator writes a set of .md files into your output directory. Use a new/empty folder to avoid accidental overwrites.
Workflow
Phase 1: Agent Identity ───────────► Who is the agent? What's its purpose?
    ↓
Phase 2: Capability Architecture ──► Skills, Tools, Memory, RAG, Workflows
    ↓
Phase 3: Behavior & System Prompt ─► How does it think? What's its DNA?
    ↓
Phase 4: Conversation Design ──────► Golden conversations, example behaviors
    ↓
Phase 5: Safety & Guardrails ──────► What can't it do? Human oversight?
    ↓
Phase 6: Evaluation Framework ─────► How do we measure success?
    ↓
Phase 7: Operational Model ────────► Cost, scaling, iteration
Phase 1: Agent Identity
Goal: Define who the agent is and its relationship with users.
Key Elements
| Element | Questions to Answer |
|---|---|
| Persona | Name, role, personality, expertise domain |
| Mission | Why does this agent exist? |
| Boundaries | What it IS vs what it is NOT |
| User Relationship | Copilot, Autopilot, Peer, Expert, or Executor? |
User-Agent Relationship Models
| Model | Description | Example |
|---|---|---|
| Copilot | Human leads, agent assists | Code completion |
| Autopilot | Agent leads, human monitors | Customer support |
| Peer | Equal collaboration | Brainstorming |
| Expert | Agent advises, human decides | Medical advisor |
| Executor | Human commands, agent executes | Task automation |
Phase 2: Capability Architecture
Goal: Define the building blocks that enable agent capabilities.
Capability Stack
┌─────────────────────────────────────────────────────────────────┐
│     SKILLS                TOOLS               WORKFLOWS         │
│     (What it              (External           (Multi-step       │
│      can do)               actions)            processes)       │
│        └─────────────────────┼─────────────────────┘            │
│                              │                                  │
│              AGENT CORE (Reasoning, Planning)                   │
│                              │                                  │
│        ┌─────────────────────┼─────────────────────┐            │
│     MEMORY                  RAG                CONTEXT          │
│ (State/History)         (Knowledge)          (Awareness)        │
└─────────────────────────────────────────────────────────────────┘
2.1 Skills
Reusable capability modules. See skills-specification.md.
Per skill, document:
- Purpose & trigger conditions
- Input/output specification
- Process logic
- Examples & boundaries
2.2 Tools
External actions the agent can invoke. See tools-specification.md.
Per tool, document:
- Interface definition (JSON schema)
- Execution details (endpoint, auth, timeout)
- Response handling
- Safety requirements (confirmation, audit)
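To make the four items above concrete, a tool entry could be captured as structured data and checked for completeness. This is only a minimal sketch: the tool name `lookup_order`, the endpoint, and every field name are hypothetical and are not taken from tools-specification.md.

```python
# Hypothetical tool specification; all names and values are illustrative.
LOOKUP_ORDER_TOOL = {
    "name": "lookup_order",
    "interface": {  # JSON Schema describing the tool's arguments
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
    "execution": {
        "endpoint": "https://example.invalid/orders",  # placeholder URL
        "auth": "service-token",
        "timeout_seconds": 5,
    },
    "response_handling": {"on_empty": "tell the user no order was found"},
    "safety": {"requires_confirmation": False, "audit_log": True},
}

# The four documentation areas listed for each tool.
REQUIRED_SECTIONS = ("interface", "execution", "response_handling", "safety")


def missing_sections(tool_spec):
    """Return the documentation sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not tool_spec.get(name)]
```

A check like `missing_sections(spec)` could run in review to flag tools whose safety or response-handling sections were left blank.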
2.3 Memory
Stateful, context-aware behavior. See memory-patterns.md.
| Type | Scope | Example |
|---|---|---|
| Working | Current request | Context window |
| Session | Current session | Conversation history |
| Long-term | Cross-session | User preferences |
2.4 Knowledge (RAG)
Knowledge grounding via retrieval. See memory-patterns.md for architecture patterns.
Per knowledge source, document:
| Attribute | Specify |
|---|---|
| Source | What data source? (docs, DB, API, web) |
| Format | Document types, data structure |
| Volume | How much data? Growth rate? |
| Freshness | Update frequency? Acceptable staleness? |
| Authority | Is this source authoritative? What happens when sources conflict? |
Retrieval configuration:
- Chunking strategy (semantic, fixed-size, hybrid) and chunk size rationale
- Embedding model and dimension
- Retrieval method (dense, sparse, hybrid) and top-k range
- Re-ranking strategy (if any)
- Quality threshold (minimum similarity score for inclusion)
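The interaction between top-k and the quality threshold is easy to misread, so a small sketch may help. It assumes similarity scores have already been computed by some retriever; the default values and the names `RetrievalConfig` and `select_chunks` are illustrative, not recommendations from this guide.

```python
from dataclasses import dataclass


@dataclass
class RetrievalConfig:
    top_k: int = 3                # assumed value for illustration
    min_similarity: float = 0.75  # assumed quality threshold


def select_chunks(scored_chunks, config):
    """Keep chunks at or above the threshold, best first, at most top_k."""
    eligible = [
        (text, score)
        for text, score in scored_chunks
        if score >= config.min_similarity
    ]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return eligible[: config.top_k]


candidates = [("a", 0.9), ("b", 0.5), ("c", 0.8), ("d", 0.76), ("e", 0.95)]
selected = select_chunks(candidates, RetrievalConfig())
```

With these inputs, `selected` holds chunks `e`, `a`, and `c`. An empty result means nothing cleared the threshold, which is one signal the PRD can treat as a knowledge gap.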
Knowledge gap handling:
- How does the agent detect it doesn’t know something?
- Response when knowledge is insufficient (admit? search? escalate?)
- Citation requirements (when must it cite? format? inline or footnote?)
Knowledge conflict resolution:
- When multiple sources disagree, which takes priority?
- Should the agent present conflicting views or choose one?
2.5 Workflows
Multi-step orchestrated processes. Document:
- Trigger and steps with success criteria
- Human checkpoints
- Timeout and cancellation handling
Phase 3: Behavior & System Prompt
Goal: Define how the agent thinks, decides, and communicates, then encode it into a System Prompt specification.
Reasoning Strategies
| Strategy | Description | Use When |
|---|---|---|
| ReAct | Think → Act → Observe → Repeat | Most tasks |
| Plan-then-Execute | Full plan upfront → Execute | Complex multi-step |
| Tree of Thought | Explore multiple paths | Exploration needed |
| Reflexion | Self-critique and improve | Quality-critical |
See agent-patterns.md for detailed patterns.
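The ReAct row in the table can be read as a bounded loop. The sketch below uses a scripted stand-in for the model call so that it runs on its own; `scripted_policy`, the step dictionary shape, and the `max_steps` budget are assumptions made for illustration, not the patterns from agent-patterns.md.

```python
def react_loop(policy, tools, goal, max_steps=5):
    """Run Think -> Act -> Observe until the policy returns a final answer."""
    observations = []
    for _ in range(max_steps):
        step = policy(goal, observations)               # Think
        if step["action"] == "finish":
            return step["answer"], observations
        result = tools[step["action"]](step["input"])   # Act
        observations.append(result)                     # Observe
    # Step budget exhausted without an answer; the caller should escalate.
    return None, observations


def scripted_policy(goal, observations):
    """Stand-in for an LLM call: search once, then answer."""
    if not observations:
        return {"action": "search_docs", "input": goal}
    return {"action": "finish", "answer": f"Based on: {observations[-1]}"}


tools = {"search_docs": lambda query: f"doc snippet about {query}"}
answer, trace = react_loop(scripted_policy, tools, "refund policy")
```

The step budget is the part worth specifying in the PRD: it decides when a looping agent stops and hands off.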
Decision Framework
Define priority order for agent decisions:
- Safety first
- User intent
- Efficiency
- Quality
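One possible reading of a priority order is a lexicographic comparison: a lower-priority dimension only matters when all higher ones tie. The sketch below assumes each candidate action has already been scored per dimension (how those scores are produced is out of scope), and the candidate names are hypothetical.

```python
# Dimensions in the priority order listed above.
PRIORITY = ("safety", "user_intent", "efficiency", "quality")


def pick_action(candidates):
    """Pick the candidate that wins on the highest-priority dimension,
    falling back to the next dimension only to break ties."""
    return max(
        candidates,
        key=lambda c: tuple(c["scores"][dim] for dim in PRIORITY),
    )


candidates = [
    {"name": "fast_but_risky",
     "scores": {"safety": 0, "user_intent": 1, "efficiency": 1, "quality": 1}},
    {"name": "ask_to_confirm",
     "scores": {"safety": 1, "user_intent": 1, "efficiency": 0, "quality": 1}},
]
chosen = pick_action(candidates)
```

Here `ask_to_confirm` wins despite being slower, because safety is compared before efficiency.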
Conversation Design
| Aspect | Define |
|---|---|
| Voice & Tone | Persona, formality, verbosity |
| Response Patterns | By scenario (simple, complex, error, out-of-scope) |
| Multi-turn | Context retention, topic switching, reference resolution |
System Prompt Specification (Core Deliverable)
The System Prompt is the agent’s DNA. The PRD must produce a System Prompt Design Spec (not the final prompt text, but its design intent). See system-prompt-design.md.
Required sections in the System Prompt Spec:
| Section | Content | Example |
|---|---|---|
| Identity Declaration | Who the agent is, role, personality | “You are Aria, a senior financial advisor…” |
| Capability Declaration | What tools/skills are available, when to use each | “You have access to: search_docs, calculate…” |
| Behavioral Instructions | How to reason, when to ask vs act, output style | “Always explain your reasoning before acting…” |
| Constraint Boundaries | What the agent must never do | “Never provide medical diagnoses…” |
| Output Format Rules | Response structure, length, formatting | “Use bullet points for lists of 3+…” |
| Escalation Rules | When and how to hand off to humans | “If user mentions legal action, transfer to…” |
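Because the spec has a fixed set of required sections, a draft can be checked mechanically before review. The following is a rough sketch that treats the spec as a mapping from section name to design intent; the function name and the outline layout are assumptions, not the format defined in system-prompt-design.md.

```python
# Required sections, in the order given in the table above.
SECTION_ORDER = [
    "Identity Declaration",
    "Capability Declaration",
    "Behavioral Instructions",
    "Constraint Boundaries",
    "Output Format Rules",
    "Escalation Rules",
]


def render_prompt_outline(spec):
    """Render the spec as an ordered outline, refusing incomplete specs."""
    missing = [name for name in SECTION_ORDER if name not in spec]
    if missing:
        raise ValueError("System Prompt Spec is missing: " + ", ".join(missing))
    return "\n\n".join(f"# {name}\n{spec[name]}" for name in SECTION_ORDER)
```

The output is an outline of design intent, not the final prompt text; failing loudly on a missing section keeps gaps from slipping into engineering handoff.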
Phase 4: Example Conversations (Golden Conversations)
Goal: Define concrete conversation examples that serve as both behavioral spec and acceptance criteria.
See conversation-design.md for detailed methodology.
Why Golden Conversations Matter
For Agent products, example conversations are the most precise behavioral specification. They are:
- Acceptance criteria (does the agent behave like this example?)
- Training signals (few-shot examples in the system prompt)
- Evaluation dataset (automated quality testing)
- Stakeholder alignment tool (shows exactly what “good” looks like)
Coverage Requirements
Design golden conversations for each of these scenario types:
| Scenario Type | Count | Purpose |
|---|---|---|
| Happy path | 2-3 per use case | Shows ideal agent behavior |
| Edge cases | 1-2 per use case | Shows boundary handling |
| Safety boundaries | 3-5 total | Shows refusal/escalation |
| Multi-turn complex | 2-3 total | Shows context management |
| Context switching | 1-2 total | Shows topic change handling |
| Error recovery | 2-3 total | Shows tool failure handling |
| Out-of-scope | 2-3 total | Shows graceful boundary enforcement |
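The counts in the table can be turned into a simple coverage check using the lower bound of each range. This is a sketch under assumptions: the type labels (for example `context-switch` and `out-of-scope`) and the `(type, use_case)` tuple shape are invented here for illustration.

```python
from collections import Counter

# Lower bounds taken from the coverage table above.
PER_USE_CASE_MIN = {"happy-path": 2, "edge-case": 1}
TOTAL_MIN = {
    "safety": 3,
    "multi-turn": 2,
    "context-switch": 1,
    "error": 2,
    "out-of-scope": 2,
}


def coverage_gaps(conversations, use_cases):
    """conversations: list of (type, use_case_or_None) tuples.
    Returns a list of human-readable gaps; empty means minimums are met."""
    per_case = Counter(conversations)
    totals = Counter(kind for kind, _ in conversations)
    gaps = []
    for use_case in use_cases:
        for kind, minimum in PER_USE_CASE_MIN.items():
            have = per_case[(kind, use_case)]
            if have < minimum:
                gaps.append(f"{use_case}: need {minimum} {kind}, have {have}")
    for kind, minimum in TOTAL_MIN.items():
        if totals[kind] < minimum:
            gaps.append(f"total: need {minimum} {kind}, have {totals[kind]}")
    return gaps
```

Running this against the conversation set for each release gives a quick answer to whether the golden set still meets the stated minimums.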
Conversation Annotation Format
Each golden conversation should include:
## Conversation: [Scenario Name]
**Type:** [happy-path | edge-case | safety | multi-turn | context-switch | error | out-of-scope]
**Tests:** [Which capabilities/rules this validates]
### Dialogue
User: [input]
Agent: [expected response]
// Annotation: [Why this response is correct. What rules apply.]
User: [follow-up]
Agent: [expected response]
// Annotation: [Key behavior being demonstrated]
### Unacceptable Alternatives
- Agent should NOT: [describe bad behavior]
- Agent should NOT: [describe bad behavior]
### Evaluation Criteria
- [ ] [Checkable criterion 1]
- [ ] [Checkable criterion 2]
Phase 5: Safety & Guardrails
Goal: Define boundaries, controls, and human oversight.
See safety-checklist.md for comprehensive checklist.
5.1 Capability Boundaries
| Category | Document |
|---|---|
| CAN DO | Authorized actions with conditions |
| CANNOT DO | Prohibited actions with response |
| MUST ASK | Actions requiring confirmation |
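The three categories map naturally onto a small authorization check. In this sketch the action names are hypothetical, and treating unlisted actions as refused (default-deny) is an assumption of the example rather than a rule stated in this guide.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"      # CAN DO
    CONFIRM = "confirm"  # MUST ASK
    REFUSE = "refuse"    # CANNOT DO


# Hypothetical action names for a support agent.
CAN_DO = {"check_order_status"}
MUST_ASK = {"issue_refund"}
CANNOT_DO = {"change_account_owner"}


def authorize(action):
    """Map an action to a decision; prohibited actions are checked first."""
    if action in CANNOT_DO:
        return Decision.REFUSE
    if action in MUST_ASK:
        return Decision.CONFIRM
    if action in CAN_DO:
        return Decision.ALLOW
    # Assumption: anything not listed is refused (default-deny).
    return Decision.REFUSE
```

Whether unlisted actions default to refuse or to confirm is itself a decision the PRD should state explicitly.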
5.2 Human-in-the-Loop
Define when humans must intervene:
- Approval triggers and workflow
- Escalation paths
- Override capabilities
5.3 Guardrails
Input Guardrails:
- Prompt injection protection
- Harmful request detection
- Input validation
Output Guardrails:
- Harmful content filtering
- PII leakage prevention
- Hallucination detection
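As a narrow illustration of an output guardrail, the sketch below redacts email addresses before a response is returned. It covers only one PII type with a deliberately simple pattern; the placeholder token and function name are assumptions, and a production filter would need far broader coverage.

```python
import re

# Simple email pattern; intentionally not exhaustive.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)
```

For example, `redact_emails("Contact jane.doe@example.com today")` yields `"Contact [REDACTED_EMAIL] today"`.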
5.4 Error Handling
| Error Type | Document |
|---|---|
| Tool failure | Detection, message, recovery |
| Knowledge gap | Detection, message, fallback |
| Reasoning failure | Detection, restart/escalate |
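For the tool-failure row, detection, message, and recovery can be sketched as a retry wrapper with a user-facing fallback. The retry count, the result dictionary shape, and the fallback wording are all assumptions chosen for illustration.

```python
def call_with_recovery(tool, arg, retries=2,
                       fallback_message="I couldn't complete that right now."):
    """Call a tool, retrying on any exception, then fall back to a message."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            result = tool(arg)
            return {"ok": True, "result": result, "attempts": attempt + 1}
        except Exception as exc:  # detection: any raised error counts as failure
            last_error = exc
    return {
        "ok": False,
        "message": fallback_message,  # what the user sees
        "error": str(last_error),     # what goes to logs
        "attempts": retries + 1,
    }
```

Separating the user-facing message from the logged error keeps internal details out of the conversation while preserving them for debugging.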
Phase 6: Evaluation Framework
Goal: Define how to measure agent quality and success.
See evaluation-rubrics.md for detailed rubrics.
Core Metrics
| Dimension | Metrics |
|---|---|
| Task Success | Completion rate, first-turn resolution |
| Quality | Accuracy, relevance, completeness |
| Safety | Harmful response rate, boundary violations |
| Efficiency | Latency, token usage, cost |
| User Experience | CSAT, NPS, escalation rate |
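Several of these metrics reduce to simple ratios over interaction logs. The sketch below assumes a hypothetical log record with `completed`, `escalated`, and `latency_ms` fields; real logging schemas will differ.

```python
def summarize(interactions):
    """Compute a few core metrics from a list of interaction records."""
    total = len(interactions)
    if total == 0:
        raise ValueError("no interactions to summarize")
    return {
        "completion_rate": sum(i["completed"] for i in interactions) / total,
        "escalation_rate": sum(i["escalated"] for i in interactions) / total,
        "avg_latency_ms": sum(i["latency_ms"] for i in interactions) / total,
    }


# Hypothetical log records.
logs = [
    {"completed": True, "escalated": False, "latency_ms": 100},
    {"completed": True, "escalated": False, "latency_ms": 200},
    {"completed": False, "escalated": True, "latency_ms": 300},
    {"completed": True, "escalated": False, "latency_ms": 400},
]
report = summarize(logs)
```

For these four records the completion rate is 0.75, the escalation rate is 0.25, and the average latency is 250.0 ms.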
Evaluation Methods
| Method | Purpose | Frequency |
|---|---|---|
| Automated Testing | Regression, benchmarks | Every change |
| Human Evaluation | Quality assessment | Weekly |
| LLM-as-Judge | Scalable quality scoring | Continuous |
| Red Team Testing | Adversarial testing | Quarterly |
| A/B Testing | Compare variants | As needed |
Phase 7: Operational Model
7.1 Cost Model
| Component | Document |
|---|---|
| Per-request costs | LLM tokens, embeddings, tool calls |
| Projected costs | By scale (launch, 6 months, 1 year) |
| Cost controls | Budgets, alerts, throttling |
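The per-request and projected rows amount to straightforward arithmetic. The prices below are placeholders in arbitrary cost units, not real vendor pricing, and the function and key names are invented for this sketch.

```python
# Placeholder prices in arbitrary cost units; substitute real figures.
PRICES = {"input_per_1k": 0.5, "output_per_1k": 1.5, "per_tool_call": 0.25}


def request_cost(input_tokens, output_tokens, tool_calls, prices=PRICES):
    """Cost of one request: token costs plus a flat fee per tool call."""
    return (
        input_tokens / 1000 * prices["input_per_1k"]
        + output_tokens / 1000 * prices["output_per_1k"]
        + tool_calls * prices["per_tool_call"]
    )


def over_budget(cost_per_request, requests_per_month, monthly_budget):
    """Return True when the projected monthly spend exceeds the budget."""
    return cost_per_request * requests_per_month > monthly_budget


# 2,000 input tokens, 1,000 output tokens, 2 tool calls:
# 1.0 + 1.5 + 0.5 = 3.0 cost units
cost = request_cost(2000, 1000, 2)
```

At 1,000 requests per month this projects to 3,000 cost units, so `over_budget(cost, 1000, 2500)` would trigger an alert or throttle.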
7.2 Scaling & Iteration
- Scaling strategy (horizontal, rate limiting, caching)
- Feedback collection mechanisms
- Continuous improvement cycle
- Version management
Output Structure
agent-prd/
├── AGENT_PRD.md             # Main document
├── IDENTITY.md              # Agent persona & boundaries
├── USE_CASES.md             # Users and use cases
├── SKILLS.md                # Skills specification
├── TOOLS.md                 # Tools specification
├── MEMORY.md                # Memory architecture
├── KNOWLEDGE.md             # RAG configuration
├── WORKFLOWS.md             # Workflow definitions
├── BEHAVIOR.md              # Reasoning & conversation
├── SYSTEM_PROMPT_SPEC.md    # System prompt design specification ★
├── CONVERSATIONS.md         # Golden conversations ★
├── SAFETY.md                # Guardrails
├── EVALUATION.md            # Metrics & testing
├── EXAMPLES.md              # Additional example interactions
└── CHECKLIST.md             # Completion checklist
Resources
Scripts:
- scripts/generate_agent_prd_skeleton.sh – Generate PRD structure
Core References:
- references/agent-prd-template.md – Complete PRD template
- references/skills-specification.md – Skill definition guide
- references/tools-specification.md – Tool definition guide
- references/memory-patterns.md – Memory architecture patterns
- references/agent-patterns.md – Reasoning & architecture patterns
- references/conversation-design.md – Golden conversation methodology ★
- references/worked-example.md – End-to-end worked example (HelpBot agent) ★
Safety & Evaluation:
- references/safety-checklist.md – Safety requirements
- references/evaluation-rubrics.md – Evaluation frameworks
Advanced Topics:
- references/multi-agent-design.md – Multi-agent system design
- references/system-prompt-design.md – System prompt engineering
- references/multimodal-design.md – Multi-modal agent design
- references/observability-operations.md – Monitoring & operations
- references/protocols-standards.md – MCP, protocols, standards
- references/domain-specific-design.md – Domain-specific guidance
Extensibility & Future-Proofing
This skill is designed to evolve with Agent technology:
| Current | Future-Ready |
|---|---|
| Text I/O | Multimodal (vision, audio, video) |
| Single Agent | Multi-Agent orchestration |
| Custom tools | Protocol standards (MCP, Agent Protocol) |
| Basic metrics | Full observability stack |
| Generic | Domain-specific extensions |
Adding new capabilities:
- Add reference file in references/
- Update SKILL.md Resources section
- Extend PRD template if needed
Summary: Agent PRD Principles
┌──────────────────────────────────────────────────────────────────┐
│ 1. DEFINE IDENTITY       - Who is this agent? Not just features. │
│ 2. SPECIFY CAPABILITIES  - Skills, Tools, Memory, Knowledge.     │
│ 3. DESIGN THE PROMPT     - System Prompt is the agent's DNA.     │
│ 4. SHOW, DON'T TELL      - Golden conversations are the spec.    │
│ 5. BOUND THE BEHAVIOR    - What it CAN'T do matters equally.     │
│ 6. EVALUATE CONTINUOUSLY - Define metrics before building.       │
│ 7. HUMANS IN THE LOOP    - Know when to escalate, always.        │
└──────────────────────────────────────────────────────────────────┘
The goal is to architect cognition: define how an intelligent system should think, decide, and act within safe boundaries.