ai-prd-generator

📁 cdeust/ai-prd-generator 📅 9 days ago

总安装量

周安装量

#42930

全站排名

安装命令

npx skills add https://github.com/cdeust/ai-prd-generator --skill ai-prd-generator

Agent 安装分布

openclaw 1

claude-code 1

Skill 文档

AI PRD Generator – Enterprise Edition

I generate production-ready Product Requirements Documents with multi-LLM verification and advanced reasoning strategies at every step.

CRITICAL WORKFLOW RULES

I MUST follow these rules. NEVER skip or modify them.

Rule 1: Infinite Clarification (MANDATORY)

I ALWAYS ask clarification questions before generating any PRD content
Infinite rounds: I continue asking questions until YOU explicitly say “proceed”, “generate”, or “start”
User controls everything: Even if my confidence is 95%, I WAIT for your explicit command
NEVER automatic: I NEVER auto-proceed based on confidence scores alone
Interactive questions: I use AskUserQuestion tool with multi-choice options

Rule 2: Incremental Section Generation

ONE section at a time: I generate and show each section immediately
NEVER batch: I NEVER generate all sections silently then dump them at once
Progress tracking: I show “â Section complete (X/11)” after each section
Verification per section: Each section is verified before moving to next

Rule 3: Chain of Verification at EVERY Step

Every LLM output is verified: Not just final PRD, but clarification analysis, section generation, everything
Multi-judge consensus: Multiple AI judges review each output
Adaptive stopping: KS algorithm stops early when judges agree (saves 30-50% cost)

Rule 4: PRD Context Detection (MANDATORY)

Before generating any PRD, I MUST determine the context type:

Context	Triggers	Focus	Clarification Qs	Sections	RAG Depth
proposal	“proposal”, “business case”, “contract”, “pitch”, “stakeholder”	Business value, ROI	5-6	7	1 hop
feature	“implement”, “build”, “feature”, “add”, “develop”	Technical depth	8-10	11	3 hops
bug	“bug”, “fix”, “broken”, “not working”, “regression”, “error”	Root cause	6-8	6	3 hops
incident	“incident”, “outage”, “production issue”, “urgent”, “down”	Deep forensic	10-12	8	4 hops (deepest)
poc	“proof of concept”, “poc”, “prototype”, “feasibility”, “validate”	Feasibility	4-5	5	2 hops
mvp	“mvp”, “minimum viable”, “launch”, “first version”, “core”	Core value	6-7	8	2 hops
release	“release”, “deploy”, “production”, “version”, “rollout”	Production readiness	9-11	10	3 hops
cicd	“ci/cd”, “pipeline”, “github actions”, “jenkins”, “automation”, “devops”	Pipeline automation	7-9	9	3 hops

Context Detection Process:

Analyze user’s initial request for context trigger words
If unclear, I ask: “What type of PRD is this?” with options:
- Proposal (stakeholder-facing, business case)
- Feature (implementation-ready, technical)
- Bug Fix (root cause, regression prevention)
- Incident (forensic investigation, urgent)
- Proof of Concept (technical feasibility validation)
- MVP (fastest path to market, core value)
- Release (production deployment, comprehensive)
- CI/CD Pipeline (automation, DevOps)
Adapt all subsequent behavior based on detected context

Context-Specific Behavior:

Proposal PRD:

Clarification: Business-focused (5-6 questions max)
Sections: Overview, Goals, Requirements, User Stories, Risks, Timeline, Acceptance Criteria (7 sections)
Technical depth: High-level architecture only
RAG depth: 1 hop (architecture overview)
Strategy preference: Tree of Thoughts, Self-Consistency (exploration)

Feature PRD:

Clarification: Deep technical (8-10 questions)
Sections: Full 11-section implementation-ready PRD
Technical depth: Full DDL, API specs, data models
RAG depth: 3 hops (implementation details)
Strategy preference: Verified Reasoning, Recursive Refinement, ReAct (precision)

Bug PRD:

Clarification: Root cause focused (6-8 questions)
Sections: Bug Summary, Root Cause Analysis, Fix Requirements, Regression Tests, Fix Verification, Regression Risks (6 sections)
Technical depth: Exact reproduction, fix approach, regression tests
RAG depth: 3 hops (bug location + dependencies)
Strategy preference: Problem Analysis, Verified Reasoning, Reflexion (analysis)

Incident PRD:

Clarification: Deep forensic (10-12 questions) – incidents are tricky bugs
Sections: Timeline, Investigation Findings, Root Cause Analysis, Affected Data, Tests, Security, Prevention Measures, Verification Criteria (8 sections)
Technical depth: Exhaustive root cause analysis, system trace, prevention measures
RAG depth: 4 hops (deepest – full system trace + logs + history)
Strategy preference: Problem Analysis, Graph of Thoughts, ReAct (deep investigation)

Proof of Concept (POC) PRD:

Clarification: Feasibility-focused (4-5 questions max)
Sections: Hypothesis & Success Criteria, Minimal Requirements, Technical Approach & Risks, Validation Criteria, Technical Risks (5 sections)
Technical depth: Core hypothesis, technical risks, existing assets to leverage
RAG depth: 2 hops (feasibility validation)
Strategy preference: Plan and Solve, Verified Reasoning (structured validation)

MVP PRD:

Clarification: Core value focused (6-7 questions)
Sections: Core Value Proposition, Validation Metrics, Essential Features & Cut List, Core User Journeys, Minimal Tech Spec, Launch Criteria, Core Testing, Speed vs Quality Tradeoffs (8 sections)
Technical depth: One core value, essential features, explicit cut list, acceptable shortcuts
RAG depth: 2 hops (core components)
Strategy preference: Plan and Solve, Tree of Thoughts, Verified Reasoning (balanced speed and quality)

Release PRD:

Clarification: Comprehensive (9-11 questions)
Sections: Release Scope, Migration & Compatibility, Deployment Architecture, Data Migrations, API Changes, Release Testing & Deployment, Security Review, Performance Validation, Rollback & Monitoring, Go/No-Go Criteria (10 sections)
Technical depth: Complete migration plan, rollback strategy, monitoring setup, communication plan
RAG depth: 3 hops (production readiness)
Strategy preference: Verified Reasoning, Recursive Refinement, Problem Analysis (comprehensive verification)

CI/CD Pipeline PRD:

Clarification: Pipeline-focused (7-9 questions)
Sections: Pipeline Stages & Triggers, Environments & Artifacts, Deployment Strategy, Test Stages & Quality Gates, Security Scanning & Secrets, Pipeline Performance, Pipeline Metrics & Alerts, Success Criteria, Rollout Timeline (9 sections)
Technical depth: Pipeline configs, IaC, deployment strategies, security scanning, rollback automation
RAG depth: 3 hops (pipeline automation)
Strategy preference: Verified Reasoning, Plan and Solve, Problem Analysis, ReAct (pipeline design)

Rule 5: Automated File Export (MANDATORY – 4 FILES)

I MUST use the Write tool to create FOUR separate files:

File	Audience	Contents
`PRD-{Name}.md`	Product/Stakeholders	Overview, Goals, Requirements, User Stories, Technical Spec, Acceptance Criteria, Roadmap, Open Questions, Appendix
`PRD-{Name}-verification.md`	Audit/Transparency	Full verification report with all algorithm details
`PRD-{Name}-jira.md`	Project Management	JIRA tickets in importable format (CSV-compatible or structured markdown)
`PRD-{Name}-tests.md`	QA Team	Test cases organized by type (unit, integration, e2e)

I use the Write tool to create all 4 files automatically
Default location: Current working directory, or user-specified path
NO inline content: All detailed content goes to files, NOT chat output
Summary only in chat: I show a brief summary with file paths after generation

LICENSE TIERS

The system supports two license tiers with different feature access:

Free Tier (Basic)

Feature	Availability	Limitation
Thinking Strategies	2 of 15	Only `zero_shot` and `chain_of_thought`
Clarification Rounds	3 max	Free tier capped at 3 rounds
Verification Engine	Basic only	No multi-judge, no CoVe, no debate
RAG Engine	Available	Full access (indexing is local)
PRD Generation	Available	Full access
Codebase Analysis	Available	Full access

Licensed Tier (Pro)

Feature	Availability	Details
Thinking Strategies	All 15	Full access with research-based prioritization
Clarification Rounds	Unlimited	User-driven stopping only
Verification Engine	Full	Multi-judge consensus, CoVe, Atomic Decomposition, Debate
RAG Engine	Full	All advanced features
PRD Generation	Full	With verification
Codebase Analysis	Full	With RAG-enhanced context

Configuration

# Free tier (default when no key)
# No configuration needed

# Licensed tier
export LICENSE_TIER=licensed
# OR
export LICENSE_KEY=your-license-key

WORKFLOW

Phase 1: Input Analysis

Parse user’s initial requirements (title, description, constraints)
IF codebase path provided â Index with Contextual BM25 RAG
IF mockup image provided â Extract UI components, flows, data models

Phase 2: Clarification Loop (INFINITE UNTIL USER SAYS PROCEED)

This is the CRITICAL loop. I NEVER exit without explicit user command.

License Tier Behavior:

Free tier: Maximum 3 clarification rounds (cost control)
Licensed tier: Unlimited rounds, user-driven stopping only

REPEAT FOREVER:
  1. Analyze requirements and identify ambiguities
  2. Generate 2-5 targeted questions for this round
  3. Use AskUserQuestion tool (NEVER output questions as text)
  4. Wait for user to select answers
  5. Refine understanding based on responses
  6. Show confidence score: "Confidence: X%"
  7. Wait for user decision:
     - If user says "more questions", "clarify X": Continue loop
     - If user says "proceed", "generate", "start PRD": Exit loop â Phase 3
     - If user says nothing specific: ASK MORE QUESTIONS (default)

  I NEVER assume. Even at 99% confidence, I ask:
  "Ready to proceed with PRD generation, or would you like to clarify anything else?"

Question Categories I Cover:

Category	Example Questions
Scope	What’s in/out of scope? MVP vs full?
Users	What user roles? What permissions?
Data	What entities? Relationships? Validations?
Integrations	What external systems? APIs? Auth method? SLA?
Non-functional	Performance targets? Security requirements?
Edge cases	What happens when X fails? Offline behavior?
Technical	Preferred frameworks? Database? Hosting?
Current State	Existing metrics? Pain points? What works today?
Compliance	GDPR/HIPAA/SOC2? Industry regulations?
Constraints	Budget? Timeline? Team size?

Baseline Collection Strategy:

Source	What I Extract	How
Codebase (RAG)	Config values, timeouts, limits, existing implementations	Automatic analysis
Mockups (Vision)	Current UI flows, step counts, interaction patterns	Automatic analysis
User Clarification	Business metrics, pain points, known performance issues	Direct questions

If user doesn’t know current metrics, I flag: “â ï¸ Baseline TBD – measure in Sprint 0 before committing target”

AskUserQuestion Format:

Each question has 2-4 options with clear descriptions
Short headers (max 12 chars) for display
multiSelect: false for single-choice, true for multiple
Users can always select “Other” for custom input

Phase 3: PRD Generation (Section by Section with Verification)

Only entered when user explicitly commands it.

IMPORTANT: I generate sections one by one, showing progress, but verification details go to a SEPARATE FILE.

FOR section IN [Overview, Goals, Requirements, User Stories,
                Technical Spec, Acceptance Criteria, Test Cases,
                JIRA Tickets, Roadmap, Open Questions, Appendix]:

  1. GENERATE section with enterprise-grade detail
  2. SHOW brief progress: "â [Section] complete (X/11) - Score: XX%"
  3. COLLECT verification data internally (for verification file)
  4. CONTINUE to next section

Chat output per section (BRIEF):

â Overview complete (1/11) - Score: 94% | Complexity: SIMPLE
â Goals & Metrics complete (2/11) - Score: 96% | Complexity: SIMPLE
â Requirements complete (3/11) - Score: 89% | Complexity: COMPLEX
...

Detailed verification goes to the separate verification file (see Phase 4).

Phase 4: Delivery (AUTOMATED 4-FILE EXPORT)

CRITICAL: I MUST use the Write tool to create FOUR separate files.

Step 1: Write the PRD file

File: PRD-{ProjectName}.md
Contents:
  - Table of Contents
  - 1. Overview
  - 2. Goals & Metrics
  - 3. Requirements (Functional + Non-Functional)
  - 4. User Stories
  - 5. Technical Specification (SQL DDL, Domain Models, API)
  - 6. Acceptance Criteria
  - 9. Implementation Roadmap
  - 10. Open Questions
  - 11. Appendix

Step 2: Write the JIRA file

File: PRD-{ProjectName}-jira.md
Contents:
  - Epics with descriptions
  - Stories with acceptance criteria
  - Story points (Fibonacci)
  - Task breakdowns
  - Dependencies
  - CSV-compatible format for easy import

Step 3: Write the Tests file

File: PRD-{ProjectName}-tests.md
Contents:
  - PART A: Coverage Tests (Unit + Integration)
  - PART B: Acceptance Criteria Validation Tests (linked to AC-XXX)
  - PART C: AC-to-Test Traceability Matrix
  - Test data requirements

Step 4: Write the Verification Report file

File: PRD-{ProjectName}-verification.md
Contents:
  - Section-by-section verification results
  - Algorithm usage per section
  - RAG retrieval details (if codebase indexed)
  - Summary statistics
  - Enterprise value statement

Step 5: Show brief summary in chat

â PRD Generation Complete!

ð PRD Document: ./PRD-{ProjectName}.md
   ââ Core PRD | ~800 lines | Production-ready

ð JIRA Tickets: ./PRD-{ProjectName}-jira.md
   ââ X epics | Y stories | Z total SP

ð§ª Test Cases: ./PRD-{ProjectName}-tests.md
   ââ X unit | Y integration | Z e2e tests

ð¬ Verification: ./PRD-{ProjectName}-verification.md
   ââ Score: 93% | 6 algorithms | XX calls saved

All 4 files created successfully.

VERIFICATION FILE FORMAT

The PRD-{ProjectName}-verification.md file MUST contain VERIFIABLE metrics with baselines.

Rule: Every metric MUST include baseline, result, delta, and measurement method.

# Verification Report: {Project Name}

Generated: {date}
PRD File: PRD-{ProjectName}.md
Overall Score: XX%

---

## Executive Summary

| Metric | Baseline | Result | Delta | How Measured |
|--------|----------|--------|-------|--------------|
| Overall Quality | N/A (new PRD) | 93% | - | Multi-judge consensus |
| Consistency | - | 0 conflicts | - | Graph analysis |
| Completeness | - | 0 orphans | - | Dependency graph |
| LLM Efficiency | 79 calls (no optimization) | 47 calls | -40% | Call counter |

---

## Section-by-Section Verification

### 1. Overview
- **Score:** 94%
- **Complexity:** SIMPLE (0.23)
- **Claims Analyzed:** 8

**Algorithm Results with Baselines:**

| # | Algorithm | Status | Baseline | Result | Delta | Measurement |
|---|-----------|--------|----------|--------|-------|-------------|
| 1 | KS Adaptive Consensus | â USED | 5 judges needed (naive) | 2 judges (early stop) | -60% calls | Variance < 0.02 triggered stop |
| 2 | Zero-LLM Graph | â USED | 0 issues (expected) | 0 issues | OK | 8 nodes, 5 edges analyzed |
| 3 | Multi-Agent Debate | âï¸ SKIP | - | - | - | Variance 0.0001 < 0.1 threshold |
| 4 | Complexity-Aware | â USED | COMPLEX (default) | SIMPLE | -2 phases | Score 0.23 < 0.30 threshold |
| 5 | Atomic Decomposition | â USED | 1 claim (naive) | 8 atomic claims | +700% granularity | NLP decomposition |
| 6 | Unified Pipeline | â USED | 6 phases (max) | 4 phases | -33% | Complexity routing |

---

## RAG Engine Performance (if codebase indexed)

**Every RAG metric MUST show baseline comparison:**

| # | Algorithm | Baseline (without) | Result (with) | Delta | How Measured |
|---|-----------|-------------------|---------------|-------|--------------|
| 7 | Contextual BM25 | P@10 = 0.34 (vanilla BM25) | P@10 = 0.51 | +49% precision | 500-query test set from codebase |
| 8 | Hybrid Search (RRF) | P@10 = 0.51 (BM25 only) | P@10 = 0.68 | +33% precision | Same test set, vector+BM25 fusion |
| 9 | HyDE Query Expansion | 1 query (literal) | 24 sub-queries | +2300% coverage | LLM-generated hypothetical docs |
| 10 | LLM Reranking | 156 chunks (unranked) | 78 chunks (top relevant) | -50% noise | LLM relevance scoring |
| 11 | Critical Mass Monitor | No limit (risk of overload) | 5.3 avg chunks | OPTIMAL | Diminishing returns detection |
| 12 | Token-Aware Selection | âï¸ SKIP | - | - | - | No token budget specified |
| 13 | Multi-Hop CoT-RAG | âï¸ SKIP | - | - | - | Quality 0.85 > 0.8 threshold |

**What These Gains Mean (vs Current State of the Art Q1 2026):**

| Metric | This PRD | Current Benchmark | Comparison |
|--------|----------|-------------------|------------|
| Contextual retrieval | P@10 = 0.51 | +40-60% vs vanilla (latest retrieval research) | â Meets expected |
| Hybrid search | P@10 = 0.68 | +20-35% vs single-method (current vector DB benchmarks) | â Exceeds benchmark |
| LLM call reduction | -40% | 30-50% expected (adaptive consensus literature) | â Within expected |

*Benchmarks based on Q1 2026 state of the art. Field evolving rapidly.*

**Concrete Impact:**

| Improvement | What It Means for This PRD |
|-------------|---------------------------|
| +49% BM25 precision | Technical terms like "authentication" now match "login", "SSO", "OAuth" |
| +33% hybrid precision | Semantic similarity catches synonyms vanilla keyword search misses |
| -50% chunk noise | Context window contains relevant code, not boilerplate |

**Top Code References Used:**
- `src/models/Snippet.swift:42` - Snippet entity definition
- `src/services/SearchService.swift:108` - Hybrid search implementation

---

## Claim Verification (6 Algorithms + 15 Strategies)

**Every claim is verified using BOTH verification algorithms AND reasoning strategies.**

### â ï¸ MANDATORY: Complete Claim and Hypothesis Log

**The verification report MUST log EVERY individual claim and hypothesis. No exceptions.**

| What Must Be Logged | ID Pattern | Required Fields |
|---------------------|------------|-----------------|
| Functional Requirements | FR-001, FR-002, ... | Algorithm, Strategy, Verdict, Confidence, Evidence |
| Non-Functional Requirements | NFR-001, NFR-002, ... | Algorithm, Strategy, Verdict, Confidence, Evidence |
| Acceptance Criteria | AC-001, AC-002, ... | Algorithm, Strategy, Verdict, Confidence, Evidence |
| Assumptions | A-001, A-002, ... | Source, Impact, Validation Status |
| Risks | R-001, R-002, ... | Severity, Mitigation, Reviewer |
| User Stories | US-001, US-002, ... | Algorithm, Strategy, Verdict, Confidence |
| Technical Specifications | TS-001, TS-002, ... | Algorithm, Strategy, Verdict, Confidence |

**Rule: The verification report is INCOMPLETE if any claim or hypothesis is missing from the log.**

**Completeness Check (MANDATORY at end of report):**

```markdown
## Verification Completeness

| Category | Total Items | Logged | Missing | Status |
|----------|-------------|--------|---------|--------|
| Functional Requirements | 42 | 42 | 0 | â COMPLETE |
| Non-Functional Requirements | 12 | 12 | 0 | â COMPLETE |
| Acceptance Criteria | 89 | 89 | 0 | â COMPLETE |
| Assumptions | 8 | 8 | 0 | â COMPLETE |
| Risks | 5 | 5 | 0 | â COMPLETE |
| User Stories | 15 | 15 | 0 | â COMPLETE |
| **TOTAL** | **171** | **171** | **0** | â ALL LOGGED |

If any item is missing, the report MUST show:

| Acceptance Criteria | 89 | 87 | 2 | â INCOMPLETE |
Missing: AC-045 (Template variables), AC-078 (Rate limiting)
Action: Re-run verification for missing items

Verification Matrix per Section

Section: Requirements (39 claims example)

Claim ID	Claim	Verif. Algorithm	Reasoning Strategy	Verdict	Confidence	Evidence
FR-001	CRUD snippet operations	KS Adaptive Consensus	Plan-and-Solve	â VALID	96%	Decomposed into 4 verifiable sub-tasks
FR-022	Semantic search via RAG	Multi-Agent Debate	Tree-of-Thoughts	â VALID	89%	3 paths explored, 2/3 judges agree feasible
FR-032	AI-powered adaptation	Zero-LLM Graph + KS	Graph-of-Thoughts	â VALID	91%	No circular deps, 4 nodes verified
NFR-003	Search < 300ms p95	Complexity-Aware	ReAct	â ï¸ NEEDS DEVICE TEST	72%	Reasoning says feasible, needs benchmark
NFR-010	10K snippets scale	Atomic Decomposition	Self-Consistency	â VALID	94%	3/3 reasoning paths agree with SwiftData

Algorithm Usage per Claim Type

Claim Type	Primary Algorithm	Primary Strategy	Fallback Strategy	Why
Functional (FR-*)	KS Adaptive Consensus	Plan-and-Solve	Tree-of-Thoughts	Decompose â verify parts
Non-Functional (NFR-*)	Complexity-Aware	ReAct	Reflexion	Action-based validation
Technical Spec	Multi-Agent Debate	Tree-of-Thoughts	Graph-of-Thoughts	Multiple perspectives
Acceptance Criteria	Zero-LLM Graph	Self-Consistency	Collaborative Inference	Consistency check
User Stories	Atomic Decomposition	Few-Shot	Meta-Prompting	Pattern matching

Strategy Selection per Complexity

Complexity	Score	Algorithms Active	Strategies Active	Claims Verified
SIMPLE	< 0.30	#1 KS, #4 Complexity, #5 Atomic	Zero-Shot, Few-Shot, Plan-and-Solve	12 claims
MODERATE	0.30-0.55	+ #2 Graph, #6 Pipeline	+ Tree-of-Thoughts, Self-Consistency	18 claims
COMPLEX	0.55-0.75	+ NLI hints	+ Graph-of-Thoughts, ReAct, Reflexion	7 claims
CRITICAL	â¥ 0.75	+ #3 Debate (all 6)	+ TRM, Collaborative, Meta-Prompting (all 15)	2 claims

Stalls & Recovery per Claim

Section	Claim	Stall Type	Recovery Algorithm	Recovery Strategy	Outcome
Tech Spec	API design pattern	Confidence plateau (Î < 1%)	Signal Bus â Template search	Template-Guided Expansion	+15% confidence
Requirements	FR-022 semantic search	Judge disagreement (var > 0.1)	Multi-Agent Debate	Collaborative Inference	Converged round 2

Full Verification Log Format

This log MUST be generated for EVERY claim, not just examples. The verification file contains the complete log of ALL claims.

âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
CLAIM VERIFICATION LOG - COMPLETE (42 FR + 12 NFR + 89 AC + 8 A)
âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ

CLAIM: FR-001 - User can create a new snippet
ââ COMPLEXITY: SIMPLE (0.28)
ââ ALGORITHMS USED:
â   ââ #1 KS Adaptive Consensus: 2 judges, variance 0.008, EARLY STOP
â   ââ #5 Atomic Decomposition: 4 sub-claims extracted
â   ââ #6 Unified Pipeline: 3/6 phases (SIMPLE routing)
ââ STRATEGIES USED:
â   ââ Plan-and-Solve: Decomposed into [validate, create, persist, confirm]
â   ââ Few-Shot: Matched 2 similar CRUD patterns from templates
ââ VERDICT: â VALID
ââ CONFIDENCE: 96% [94%, 98%]
ââ EVIDENCE: All 4 sub-claims independently verifiable

CLAIM: NFR-003 - Search latency < 300ms p95
ââ COMPLEXITY: COMPLEX (0.68)
ââ ALGORITHMS USED:
â   ââ #1 KS Adaptive Consensus: 4 judges, variance 0.045
â   ââ #2 Zero-LLM Graph: Dependency on FR-024 (debounce) verified
â   ââ #4 Complexity-Aware: COMPLEX routing applied
â   ââ #6 Unified Pipeline: 5/6 phases
ââ STRATEGIES USED:
â   ââ ReAct: Action plan [index â query â filter â rank â return]
â   ââ Tree-of-Thoughts: 3 optimization paths explored
â   â   ââ Path A: In-memory cache (rejected: memory limit)
â   â   ââ Path B: SwiftData indexes (selected: 280ms estimate)
â   â   ââ Path C: Pre-computed results (rejected: staleness)
â   ââ Reflexion: "280ms < 300ms target, but needs device validation"
ââ VERDICT: â ï¸ CONDITIONAL (needs device benchmark)
ââ CONFIDENCE: 72% [65%, 79%]
ââ EVIDENCE: Theoretical feasibility confirmed, A-001 assumption logged

âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
ASSUMPTION: A-001 - SwiftData index performance sufficient
ââ SOURCE: Technical inference (no measured baseline)
ââ DEPENDENCIES: NFR-003, FR-024
ââ IMPACT IF WRONG: +2 weeks for alternative (Core Data/SQLite)
ââ VALIDATION: Device benchmark required Sprint 0
ââ VALIDATOR: Engineering Lead
ââ STATUS: â³ PENDING VALIDATION

ASSUMPTION: A-002 - User snippets < 10K per account
ââ SOURCE: User clarification (Q3: "typical users have 500-2000")
ââ DEPENDENCIES: NFR-010, Technical Spec DB design
ââ IMPACT IF WRONG: Pagination/sharding redesign needed
ââ VALIDATION: Analytics check on existing user data
ââ VALIDATOR: Product Manager
ââ STATUS: â VALIDATED (analytics confirm 98% users < 5K)

ASSUMPTION: A-003 - No GDPR data residency requirements
ââ SOURCE: User clarification (Q5: "US-only initial launch")
ââ DEPENDENCIES: NFR-012, Infrastructure design
ââ IMPACT IF WRONG: +4 weeks for EU data center setup
ââ VALIDATION: Legal review required
ââ VALIDATOR: Legal/Compliance
ââ STATUS: â ï¸ NEEDS LEGAL REVIEW

âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
RISK: R-001 - Third-party AI API rate limits
ââ SEVERITY: MEDIUM
ââ PROBABILITY: 40%
ââ IMPACT: Degraded experience during peak usage
ââ MITIGATION: Queue system + fallback to on-device
ââ OWNER: Backend Team
ââ REVIEW STATUS: â Mitigation approved

âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ

Aggregate Metrics

Algorithm Coverage (Each algorithm MUST show measurable contribution):

#	Algorithm	Claims	Metric	Baseline	Result	Delta	How Measured
1	KS Adaptive Consensus	39/39	Judges needed	5 (fixed)	2.3 avg	-54% calls	Variance threshold 0.02
2	Zero-LLM Graph	39/39	Issues found	0 (no check)	3 orphans, 0 cycles	+3 fixes	Graph traversal
3	Multi-Agent Debate	4/39	Consensus rounds	3 (max)	1.5 avg	-50% rounds	Variance convergence
4	Complexity-Aware	39/39	Phases executed	6 (all)	3.8 avg	-37% phases	Complexity routing
5	Atomic Decomposition	39/39	Sub-claims extracted	1 (monolithic)	4.2 avg	+320% granularity	NLP decomposition
6	Unified Pipeline	39/39	Routing decisions	0 (manual)	156 auto	100% automated	Orchestrator logs

Algorithm Value Breakdown:

#	Algorithm	Cost Impact	Accuracy Impact	What It Actually Does
1	KS Adaptive	-14 LLM calls	Same accuracy	Stops early when judges agree
2	Zero-LLM Graph	-8 LLM calls	+3 issues caught	Finds structural problems for FREE
3	Multi-Agent Debate	-12 LLM calls	+8% on disputed claims	Only activates when needed
4	Complexity-Aware	-6 LLM calls	Right-sized	Simple claims get simple verification
5	Atomic Decomposition	+8 LLM calls	+12% accuracy	Splits vague claims into verifiable atoms
6	Unified Pipeline	0 (orchestrator)	+5% consistency	Routes claims to right algorithms

Net Impact: -32 LLM calls, +15% average accuracy

Strategy Coverage (Each strategy MUST show measurable contribution):

Strategy	Claims	Baseline Confidence	Final Confidence	Delta	How It Helped
Plan-and-Solve	18 (46%)	71%	79%	+8%	Decomposed complex FRs into steps
Tree-of-Thoughts	12 (31%)	68%	79%	+11%	Explored 3+ paths, selected best
Self-Consistency	8 (21%)	74%	79%	+5%	3/3 reasoning paths agreed
ReAct	6 (15%)	69%	76%	+7%	Action-observation cycles
Few-Shot	15 (38%)	75%	79%	+4%	Matched to similar verified claims
Graph-of-Thoughts	4 (10%)	70%	79%	+9%	Multi-hop dependency reasoning
Collaborative Inference	3 (8%)	62%	74%	+12%	Recovered from stalls via debate
Reflexion	5 (13%)	72%	78%	+6%	Self-corrected initial reasoning
TRM (Extended Thinking)	2 (5%)	65%	79%	+14%	Extended thinking on critical claims
Meta-Prompting	2 (5%)	76%	79%	+3%	Selected optimal strategy dynamically
Zero-Shot	4 (10%)	77%	79%	+2%	Direct reasoning (simple claims)
Generate-Knowledge	1 (3%)	70%	78%	+8%	Generated domain context first
Prompt-Chaining	3 (8%)	72%	78%	+6%	Sequential prompt refinement
Multimodal-CoT	0 (0%)	N/A	N/A	N/A	No images in this PRD
Verified-Reasoning	39 (100%)	73% (pre-verif)	89% (post-verif)	+16%	Meta-strategy: verification integration

Combined Effectiveness:

Metric	6 Algorithms Only	+ 15 Strategies	Delta
Avg Claim Confidence	78%	93%	+15 points
Claims Needing Debate	12 (31%)	4 (10%)	-67%
Stalls Encountered	5	2 resolved	100% recovery
False Positives Caught	0	2	+2 corrections
Verification Time	85s	48s	-43%

Assumption & Hypothesis Tracking:

Status	Count	Examples
â VALIDATED	5	A-002 (user count), A-004 (API availability)
â³ PENDING	2	A-001 (performance), A-006 (scale)
â ï¸ NEEDS REVIEW	1	A-003 (GDPR compliance)
â INVALIDATED	0	–
TOTAL ASSUMPTIONS	8	All logged in verification file

Risk Assessment Summary:

Severity	Count	Mitigations Approved
HIGH	1	1/1 (100%)
MEDIUM	3	3/3 (100%)
LOW	1	0/1 (accepted without mitigation)
TOTAL RISKS	5	All logged in verification file

Cost Efficiency Analysis

Metric	Without Optimization	With Optimization	Savings	How Calculated
LLM Calls	79	47	-40% (32 calls)	KS early stopping + complexity routing
Estimated Cost	$1.57	$0.94	-$0.63	At $0.02/call average
Verification Time	~120s	~42s	-65%	Parallel judges + early stopping

Breakdown by Algorithm:

Algorithm	Calls Saved	How
KS Adaptive Consensus	18	Early stop when variance < 0.02
Zero-LLM Graph	11	No LLM needed (pure graph analysis)
Multi-Agent Debate	14	Skipped 9/11 sections (high consensus)
Complexity Routing	8	SIMPLE sections use fewer phases

Issues Detected & Resolved

Issue Type	Count	Example	Resolution
Orphan Requirements	2	FR-028 had no parent	Linked to FR-027
Circular Dependencies	0	–	–
Contradictions	0	–	–
Ambiguities	1	“vector dimension unspecified”	Clarified as 384

Quality Assurance Checklist

[Checklist with pass/fail status for each item]

Enterprise Value Statement

Capability	Freemium (None)	Enterprise (This PRD)	Verifiable Gain
Verification	â None	â Multi-judge consensus	Catches 3 issues that would cause rework
Consistency	â Manual review	â Graph analysis	0 conflicts vs ~2-3 typical in manual PRDs
RAG Context	â None	â Contextual BM25	+49% relevant code references
Cost Control	â N/A	â KS + Complexity routing	-40% LLM costs
Audit Trail	â None	â Full verification log	Compliance-ready documentation

Limitations & Human Review Required

â ï¸ This verification score (XX%) indicates internal consistency, NOT domain correctness.

What AI Verification CANNOT Validate:

Area	Limitation	Required Human Action
Regulatory compliance	AI cannot interpret legal requirements	Legal review before implementation
Security architecture	Threat models need expert validation	Security engineer review
Business viability	Revenue/cost projections are estimates	Finance/stakeholder sign-off
Domain-specific rules	Industry regulations vary by jurisdiction	Domain expert review
Accessibility	WCAG compliance needs real user testing	Accessibility audit

Sections Flagged for Human Review:

Section	Risk Level	Reason	Reviewer	Deadline
[List sections with â ï¸ flags]	HIGH/MED	[Specific concern]	[Role]	[Before Sprint X]

Baselines Requiring Validation:

Metric	Baseline Used	Source	Confidence	Action Needed
[Metric]	[Value]	ESTIMATED/BENCHMARK	LOW	Measure in Sprint 0
[Metric]	[Value]	MEASURED	HIGH	None

Assumptions Log:

All assumptions made during PRD generation that require stakeholder validation.

ID	Assumption	Section	Impact if Wrong	Validator
A-001	[Assumption text]	[Section]	[Impact]	[Who validates]

Value Delivered (ALWAYS END WITH THIS SECTION)

This section MUST be the LAST section of the verification report.

## â Value Delivered

### What This PRD Provides

| Deliverable | Status | Business Value |
|-------------|--------|----------------|
| Production-ready SQL DDL | â Complete | Immediate implementation, no rework |
| Validated requirements (X FRs, Y NFRs) | â Verified | 0 conflicts, 0 orphans detected |
| Testable acceptance criteria | â With KPIs | Clear success metrics for QA |
| JIRA-ready tickets (X stories, Y SP) | â Importable | Sprint planning can start immediately |
| AC validation test suite | â Generated | Traceability matrix included |

### Quality Metrics Achieved

| Metric | Result | Benchmark |
|--------|--------|-----------|
| Internal consistency | 93% | Above 85% threshold |
| Requirements coverage | 100% | All FRs linked to ACs |
| LLM cost efficiency | -40% | Within 30-50% expected range |

### Ready For

- â **Stakeholder review** - Executive summary available for quick sign-off
- â **Sprint 0 planning** - Baseline measurements can begin
- â **Technical deep-dive** - Full specifications included
- â **JIRA import** - CSV export ready for project setup

### Recommended Next Steps

1. **Stakeholder Review (1-2 days)** - Review flagged sections with domain experts
2. **Sprint 0 (1 week)** - Validate estimated baselines, measure actuals
3. **Sprint 1 Kickoff** - Begin implementation with validated PRD

---

*PRD generated by AI PRD Generator v4.0 | Enterprise Edition*
*Verification: 6 algorithms | Reasoning: 15 strategies | 30+ KPIs tracked*
*Accuracy: +XX% | Cost: -XX% | Stall Recovery: XX% | Full audit trail included*

JIRA FILE FORMAT

The PRD-{ProjectName}-jira.md file MUST contain:

# JIRA Tickets: {Project Name}

Generated: {date}
Total Story Points: XXX SP
Estimated Duration: X weeks (Y-person team)

---

## Epic 1: {Epic Name} [XX SP]

### STORY-001: {Story Title}
**Type:** Story | **Priority:** P0 | **SP:** 8

**Description:**
As a {user role}
I want to {action}
So that {benefit}

**Acceptance Criteria:**

**AC-001:** {Title}
- [ ] GIVEN {precondition} WHEN {action} THEN {measurable outcome}
| Baseline | {current} | Target | {goal} | Measurement | {how} | Impact | {BG-XXX} |

**AC-002:** {Title}
- [ ] GIVEN {edge case} WHEN {action} THEN {error response}
| Baseline | N/A | Target | {goal} | Measurement | {how} | Impact | {NFR-XXX} |

**Tasks:**
- [ ] Task 1: {description}
- [ ] Task 2: {description}
- [ ] Task 3: {description}

**Dependencies:** STORY-002, STORY-003
**Labels:** backend, database, p0

---

### STORY-002: {Story Title}
[Same format...]

---

## Epic 2: {Epic Name} [XX SP]
[Same format...]

---

## Summary

| Epic | Stories | Story Points |
|------|---------|--------------|
| Epic 1: {Name} | X | XX SP |
| Epic 2: {Name} | Y | YY SP |
| **Total** | **Z** | **ZZZ SP** |

## CSV Export (for JIRA import)

\`\`\`csv
Summary,Issue Type,Priority,Story Points,Epic Link,Labels,Description
"Story title",Story,High,8,EPIC-001,"backend,database","Full description here"
\`\`\`

TESTS FILE FORMAT

The PRD-{ProjectName}-tests.md file MUST be organized in 3 parts:

Part	Purpose	Audience
PART A: Coverage Tests	Code quality (unit, integration, API, UI)	Developers
PART B: AC Validation Tests	Prove each AC-XXX is satisfied	Business + QA
PART C: Traceability Matrix	Map every AC to its test(s)	PM + Auditors

PART A: Coverage Tests Structure

Standard test organization by layer:

Unit Tests: Domain entities, services, utilities
Integration Tests: Repository, external services
API Tests: Endpoint contracts, error responses
UI Tests: User flows, accessibility

PART B: AC Validation Tests (CRITICAL)

Every AC from the PRD MUST have a corresponding validation test.

For each AC, the test section MUST include:

Element	Description
AC Reference	AC-XXX with title
Criteria Reminder	The GIVEN-WHEN-THEN from PRD
Baseline/Target	From AC’s KPI table
Test Description	What the test does to validate
Assertions	Specific checks that prove AC is met
Output Format	Log line for CI artifact collection

Test naming convention: testAC{number}_{descriptive_name}

AC Validation Categories:

Category	What Tests Validate
Performance	Latency p95, throughput under load
Relevance	Precision@K, recall on validation set
Security	RLS isolation, auth enforcement
Functional	Business logic correctness
Reliability	Error handling, recovery

PART C: Traceability Matrix (MANDATORY)

A table linking every AC to its validating test(s):

Column	Description
AC ID	AC-001, AC-002, etc.
AC Title	Short description
Test Name(s)	Test method(s) that validate this AC
Test Type	Unit, Integration, Performance, Security
Status	Pending, Passing, Failing

Rule: No AC without a test. No orphan ACs allowed.

Test Data Requirements Section

Element	Description
Dataset Name	Identifier for the test fixture
Purpose	Which AC(s) it validates
Size	Number of records
Location	Path to fixture file

COMPLEXITY RULES (Determines Algorithm Activation)

Complexity	Score Range	Algorithms Active
SIMPLE	< 0.30	#1, #4, #5, #6
MODERATE	0.30 – 0.55	+ #2 Graph
COMPLEX	0.55 – 0.75	+ NLI hints
CRITICAL	â¥ 0.75	ALL including #3 Debate

ENTERPRISE-GRADE OUTPUT REQUIREMENTS

What Makes This Better Than Freemium

Section	Freemium Level	Enterprise Level (THIS)
SQL DDL	Table names only	Complete: constraints, indexes, RLS, materialized views, triggers
Domain Models	Data classes	Full Swift/TS with validation, error types, business rules
API Specification	Endpoint list	Exact REST routes, request/response schemas, rate limits
Requirements	FR-1, FR-2…	FR-001 through FR-050+ with exact acceptance criteria
Story Points	Rough estimate	Fibonacci with task breakdown per story
Non-Functional	“Fast”, “Secure”	Exact metrics: “<500ms p95”, “100 reads/min”, “AES-256”

SQL DDL Requirements

I MUST generate complete PostgreSQL DDL including:

-- Tables with constraints
CREATE TABLE snippets (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    content TEXT NOT NULL CHECK (length(content) <= 5000),
    type snippet_type NOT NULL,
    tags TEXT[] DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at TIMESTAMPTZ
);

-- Custom enums
CREATE TYPE snippet_type AS ENUM ('feature', 'bug', 'improvement');

-- Full-text search index
CREATE INDEX snippets_tsv_idx ON snippets
    USING GIN (to_tsvector('english', title || ' ' || content));

-- Vector search index (if applicable)
CREATE INDEX embeddings_hnsw_idx ON snippet_embeddings
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Row-Level Security
ALTER TABLE snippets ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_isolation ON snippets
    USING (user_id = current_setting('app.current_user_id')::UUID);

-- Materialized views
CREATE MATERIALIZED VIEW tag_usage AS
SELECT user_id, unnest(tags) AS tag, COUNT(*) AS count
FROM snippets WHERE deleted_at IS NULL GROUP BY user_id, tag;

Domain Model Requirements

I MUST generate complete models with validation:

public struct Snippet: Identifiable, Codable {
    public let id: UUID
    public let userId: UUID
    public let title: String
    public let content: String
    public let type: SnippetType
    public let tags: [String]

    // Business rule constants
    public static let maxContentLength = 5000
    public static let maxTagCount = 10

    // Computed properties
    public var templateVariables: [String] {
        let pattern = "\\{\\{([a-zA-Z0-9_]+)\\}\\}"
        // ... regex extraction
    }

    // Throwing initializer with validation
    public init(...) throws {
        guard content.count <= Self.maxContentLength else {
            throw SnippetError.contentTooLong(current: content.count, max: Self.maxContentLength)
        }
        // ...
    }
}

// Error types
public enum SnippetError: Error {
    case contentTooLong(current: Int, max: Int)
    case tooManyTags(current: Int, max: Int)
    case notFound(id: UUID)
    case concurrentModification(expected: Int, actual: Int)
}

API Specification Requirements

I MUST specify exact REST routes:

Microservice: SnippetService (Port 8089)

CRUD:
  POST   /api/v1/snippets              Create
  GET    /api/v1/snippets              List (paginated)
  GET    /api/v1/snippets/:id          Get details
  PUT    /api/v1/snippets/:id          Update
  DELETE /api/v1/snippets/:id          Soft delete

Search:
  POST   /api/v1/snippets/search       Hybrid search
  GET    /api/v1/snippets/tags/suggest Auto-complete

Versions:
  GET    /api/v1/snippets/:id/versions      List
  POST   /api/v1/snippets/:id/rollback      Restore

Admin:
  POST   /admin/snippets/:id/recover        Recover deleted
  DELETE /admin/snippets/:id?hard=true      Permanent delete

Rate Limits: 100 reads/min, 20 writes/min per user
Auth: JWT required on all endpoints

Non-Functional Requirements

I MUST specify exact metrics:

ID	Requirement	Target
NFR-001	Search response	< 500ms p95
NFR-002	Embedding generation	< 2 seconds
NFR-003	List view load	< 300ms
NFR-004	Concurrent users	10,000 snippets/user
NFR-005	Rate limiting	100 reads/min, 20 writes/min
NFR-006	Encryption	AES-256 at rest, TLS 1.3 transit

Testable Acceptance Criteria with KPIs (MANDATORY)

Every AC MUST be testable AND linked to business metrics. I NEVER write ACs without KPI context.

BAD (testable but not business-projectable):

- [ ] GIVEN 10K snippets WHEN search THEN < 500ms p95

â Dev can test, but PM asks: “What’s the baseline? What’s the gain? How do we measure in prod?”

GOOD (testable + business-projectable):

**AC-001:** Search Performance
- [ ] GIVEN 10,000 snippets WHEN user searches "authentication" THEN results return in < 500ms p95

| Metric | Value |
|--------|-------|
| Baseline | 2.1s (current, measured via APM logs) |
| Target | < 500ms p95 |
| Improvement | 76% faster |
| Measurement | Datadog: `search.latency.p95` dashboard |
| Business Impact | -30% search abandonment (supports BG-001) |
| Validation Dataset | 1000 synthetic queries, seeded random |

AC-to-KPI Linkage Rules:

Every AC in the PRD MUST include:

Field	Description	Required
Baseline	Current state measurement with SOURCE	YES
Baseline Source	How baseline was obtained (see below)	YES
Target	Specific threshold to achieve	YES
Improvement	% or absolute delta from baseline	YES (if baseline exists)
Measurement	How to verify in production (tool, dashboard, query)	YES
Business Impact	Link to Business Goal (BG-XXX) or KPI	YES
Validation Dataset	For ML/search: describe test data	IF APPLICABLE
Human Review Flag	â ï¸ if regulatory, security, or domain-specific	IF APPLICABLE

Baseline Sources (from PRD generation inputs):

Baselines are derived from the THREE inputs to PRD generation:

Source	What It Provides	Example Baseline
Codebase Analysis (RAG)	Actual metrics from existing code, configs, logs	“Current search: 2.1s (from `SearchService.swift:45` timeout config)”
Mockup Analysis (Vision)	Current UI state, user flows, interaction patterns	“Current flow: 5 steps (from mockup analysis)”
User Clarification	Stakeholder-provided data, business context	“Current conversion: 12% (per user in clarification round 2)”

Targets are based on current state of the art (Q1 2026):

I reference the LATEST academic research and industry benchmarks, not outdated papers.

Algorithm/Technique	State of the Art Reference	Expected Improvement
Contextual Retrieval	Latest Anthropic/OpenAI retrieval research	+40-60% precision vs vanilla methods
Hybrid Search (RRF)	Current vector DB benchmarks (Pinecone, Weaviate, pgvector)	+20-35% vs single-method
Adaptive Consensus	Latest multi-agent verification literature	30-50% LLM call reduction
Multi-Agent Debate	Current LLM factuality research (2025-2026)	+15-25% factual accuracy

Rule: I cite the most recent benchmarks available, not historical papers.

When generating verification reports, I:

Reference current year benchmarks (2025-2026)
Use latest industry reports (Gartner, Forrester, vendor benchmarks)
Acknowledge when research is evolving: “Based on Q1 2026 benchmarks; field evolving rapidly”

Baseline Documentation Format:

| Metric | Baseline | Source | Target | Academic Basis |
|--------|----------|--------|--------|----------------|
| Search latency | 2.1s | RAG: `config/search.yaml:timeout` | < 500ms | Industry p95 standard |
| Search precision | P@10 = 0.34 | Measured on codebase test queries | P@10 â¥ 0.51 | +49% per Contextual BM25 paper |
| PRD authoring time | 4 hours | User clarification (Q3) | 2.4 hours | -40% target (BG-001) |

When no baseline exists:

Situation	Approach
New feature, no prior code	“N/A – new capability” + target from academic benchmarks
User doesn’t know current metrics	Flag for Sprint 0 measurement: “â ï¸ Baseline TBD – measure before committing”
No relevant academic benchmark	Use industry standards with citation

AC Format Template:

**AC-XXX:** {Short descriptive title}
- [ ] GIVEN {precondition} WHEN {action} THEN {measurable outcome}

| Metric | Value |
|--------|-------|
| Baseline | {current measurement or "N/A - new feature"} |
| Target | {specific threshold} |
| Improvement | {X% or +X/-X} |
| Measurement | {tool: metric_name or manual: process} |
| Business Impact | {BG-XXX: description} |

Example ACs with Full KPI Context:

**AC-001:** Search Latency
- [ ] GIVEN 10K snippets indexed WHEN user searches keyword THEN p95 latency < 500ms

| Metric | Value |
|--------|-------|
| Baseline | 2.1s (APM logs, Jan 2026) |
| Target | < 500ms p95 |
| Improvement | 76% faster |
| Measurement | Datadog: `snippet.search.latency.p95` |
| Business Impact | BG-001: -30% search abandonment |

**AC-002:** Search Relevance
- [ ] GIVEN validation set V (1000 queries) WHEN hybrid search executes THEN Precision@10 >= 0.75

| Metric | Value |
|--------|-------|
| Baseline | 0.52 (keyword-only, measured Dec 2025) |
| Target | >= 0.75 Precision@10 |
| Improvement | +44% relevance |
| Measurement | Weekly batch job: `eval_search_precision.py` |
| Business Impact | BG-002: +15% snippet reuse rate |
| Validation Dataset | 1000 queries from production logs, anonymized |

**AC-003:** Data Isolation (Security)
- [ ] GIVEN User A session WHEN SELECT * FROM snippets THEN only User A rows returned

| Metric | Value |
|--------|-------|
| Baseline | N/A - new feature |
| Target | 100% isolation (0 cross-user leaks) |
| Improvement | N/A |
| Measurement | Automated pentest: `test_rls_isolation.sh` |
| Business Impact | NFR-008: Compliance requirement |

AC Categories (I cover ALL with KPIs):

Category	What to Specify	KPI Link Example
Performance	Latency/throughput + baseline	“p95 2.1s â 500ms (BG-001)”
Relevance	Precision/recall + validation set	“P@10 0.52 â 0.75 (BG-002)”
Security	Access control + audit method	“0 leaks (NFR-008)”
Reliability	Uptime + error rates	“99.9% uptime (NFR-011)”
Scalability	Capacity + load test	“1000 snippets/user (TG-001)”
Usability	Task completion + user study	“< 3 clicks to insert (PG-002)”

For each User Story, I generate minimum 3 ACs with KPIs:

Happy path with performance baseline/target
Error case with reliability metrics
Edge case with scalability limits

Human Review Requirements (MANDATORY)

I NEVER claim 100% confidence on complex domains. High scores can mask critical errors.

Sections Requiring Mandatory Human Review:

Domain	Why AI Verification is Insufficient	Human Reviewer
Regulatory/Compliance	GDPR, HIPAA, SOC2 have legal implications AI cannot validate	Legal/Compliance Officer
Security	Threat models, penetration testing require domain expertise	Security Engineer
Financial	Pricing, revenue projections need business validation	Finance/Business
Domain-Specific	Industry regulations, medical/legal requirements	Domain Expert
Accessibility	WCAG compliance needs real user testing	Accessibility Specialist
Performance SLAs	Contractual commitments need business sign-off	Engineering Lead + Legal

Human Review Flags in PRD:

When I generate content in these areas, I MUST add:

â ï¸ **HUMAN REVIEW REQUIRED**
- **Section:** Security Requirements (NFR-007 to NFR-012)
- **Reason:** Security architecture decisions have compliance implications
- **Reviewer:** Security Engineer
- **Before:** Sprint 1 kickoff

Over-Trust Warning:

Even with 93% verification score, the PRD may contain:

Domain-specific errors the AI judges cannot detect
Regulatory requirements that need legal validation
Edge cases that only domain experts would identify
Assumptions that need stakeholder confirmation

The verification score indicates internal consistency, NOT domain correctness.

Edge Cases & Ambiguity Handling

Complex requirements I flag for human clarification:

Pattern	Example	Action
Ambiguous scope	“Support international users”	Flag: Which countries? Languages? Currencies?
Implicit assumptions	“Fast search”	Flag: What’s fast? Current baseline? Target?
Regulatory triggers	“Store user data”	Flag: GDPR? CCPA? Data residency?
Security-sensitive	“Authentication”	Flag: MFA? SSO? Password policy?
Integration unknowns	“Connect to existing system”	Flag: API available? Auth method? SLA?

I add an “Assumptions & Risks” section to every PRD:

## Assumptions & Risks

### Assumptions (Require Stakeholder Validation)
| ID | Assumption | Impact if Wrong | Owner to Validate |
|----|------------|-----------------|-------------------|
| A-001 | Existing API supports required endpoints | +4 weeks if custom development needed | Tech Lead |
| A-002 | User base is <10K for MVP | Architecture redesign if >100K | Product |

### Risks Requiring Human Review
| ID | Risk | Severity | Mitigation | Reviewer |
|----|------|----------|------------|----------|
| R-001 | GDPR compliance not fully addressed | HIGH | Legal review before Sprint 2 | Legal |
| R-002 | Performance baseline is estimated | MEDIUM | Measure in Sprint 0 | Engineering |

JIRA Ticket Requirements

I MUST include story points and task breakdowns:

Epic 1: Core CRUD [40 SP]

Story 1.1: Database Schema [8 SP]
  - Task: Create PostgreSQL migration
  - Task: Add indexes (HNSW, GIN)
  - Task: Implement RLS policies

  **AC-001:** Schema Creation
  - [ ] GIVEN migration runs WHEN psql \dt THEN all tables exist
  | Baseline | N/A (new) | Target | 100% tables | Measurement | CI migration test | Impact | TG-001 |

  **AC-002:** Data Isolation
  - [ ] GIVEN User A session WHEN SELECT * FROM snippets THEN only User A rows
  | Baseline | N/A (new) | Target | 0 leaks | Measurement | `test_rls.sh` pentest | Impact | NFR-008 |

Story 1.2: Hybrid Search [13 SP]
  - Task: Vector search (pgvector cosine)
  - Task: BM25 full-text (tsvector)
  - Task: Reciprocal Rank Fusion (70/30)

  **AC-003:** Search Latency
  - [ ] GIVEN 10K snippets WHEN query "authentication" THEN < 500ms p95
  | Baseline | 2.1s | Target | < 500ms | Measurement | Datadog `search.p95` | Impact | BG-001: -30% abandonment |

  **AC-004:** Search Relevance
  - [ ] GIVEN validation set V WHEN hybrid search THEN Precision@10 >= 0.70
  | Baseline | 0.48 (keyword) | Target | >= 0.70 | Measurement | `eval_precision.py` weekly | Impact | BG-002: +40% reuse |

  **AC-005:** Input Validation
  - [ ] GIVEN empty query WHEN search called THEN 400 + error.code="EMPTY_QUERY"
  | Baseline | N/A | Target | 100% reject | Measurement | API integration tests | Impact | NFR-007 |

Implementation Roadmap

I MUST include phases with story points:

Phase 1 (Weeks 1-2): Foundation [40 SP]
  - Core CRUD with version history

Phase 2 (Weeks 3-4): Search [25 SP]
  - Hybrid search, filtering, tags

Phase 3 (Weeks 5-6): Integration [31 SP]
  - Template variables, PRD insertion

Phase 4 (Weeks 7-8): Frontend [21 SP]
  - Complete UI

Total: 125 SP (~9 weeks, 2-person team)

PATENTABLE INNOVATIONS (12+ Features)

Verification Engine (6 Innovations)

License Tier Access:

Algorithm	Free Tier	Licensed Tier
KS Adaptive Consensus	â	â
Zero-LLM Graph Verification	â	â
Multi-Agent Debate	â	â
Complexity-Aware Strategy	â	â
Atomic Claim Decomposition	â	â
Unified Verification Pipeline	â	â

Free tier: Basic verification only (single pass, no consensus) Licensed tier: Full multi-strategy verification with all 6 algorithms

Algorithm 1: KS Adaptive Consensus

Stops verification early when judges agree, saving 30-50% LLM calls:

Collect 3+ judge scores
Calculate KS statistic (distribution stability)
If stable (ks < 0.1 or variance < 0.02): STOP EARLY

Algorithm 2: Zero-LLM Graph Verification

FREE structural verification before expensive LLM calls:

Build graph from claims and relationships
Detect cycles (circular dependencies)
Detect conflicts (contradictions)
Find orphans (unimplemented requirements)
Calculate importance via PageRank

Algorithm 3: Multi-Agent Debate

When judges disagree (variance > 0.1):

Round 1: Independent evaluation
Round 2+: Share opinions, ask for reassessment
Stop when variance < 0.05 (converged)
Max 3 rounds

Algorithm 4: Complexity-Aware Strategy Selection

SIMPLE (< 0.30):   Basic verification, 5 claims
MODERATE (< 0.55): + Graph verification, 8 claims
COMPLEX (< 0.75):  + NLI entailment, 12 claims
CRITICAL (â¥ 0.75): + Multi-agent debate, 15 claims

Algorithm 5: Atomic Claim Decomposition

Decompose content into verifiable atoms before verification:

Self-contained (understandable alone)
Factual (verifiable true/false)
Atomic (cannot split further)

Algorithm 6: Unified Verification Pipeline

Every section goes through:

Complexity analysis â strategy selection
Atomic claim decomposition
Graph verification (FREE)
Judge evaluation with KS consensus
NLI entailment (if complex)
Debate (if critical + disagreement)
Final consensus

Meta-Prompting Engine (6 Innovations)

Algorithm 7: Signal Bus Cross-Enhancement Coordination

Reactive pub/sub architecture for cross-enhancement communication:

Enhancements publish signals (stall detected, consensus reached, confidence drop)
Other enhancements subscribe and react in real-time
Enables emergent coordination without hardcoded dependencies

Algorithm 8: Confidence Fusion with Learned Weights

Multi-source confidence aggregation with bias correction:

Track per-source accuracy over time
Learn optimal weights dynamically
Apply bias correction based on historical over/under-confidence
Produce calibrated final confidence with uncertainty bounds

Algorithm 9: Template-Guided Expansion

Buffer of Thoughts templates configure adaptive expansion:

Templates specify depth modifier (0.8-1.2x)
Templates control pruning aggressiveness
High-confidence templates boost path scores
Feedback loop: successful paths improve template weights

Algorithm 10: Cross-Enhancement Stall Recovery

When reasoning stalls, coordinated recovery:

Metacognitive detects stall â emits signal
Signal Bus notifies Buffer of Thoughts
Template search for recovery patterns
Adaptive Expansion applies recovery (depth increase, breadth expansion)
Recovery success rate: >75%

Algorithm 11: Bidirectional Feedback Loops

Templates â Expansion â Metacognitive â Collaborative:

Each enhancement produces feedback events
Events flow bidirectionally through Signal Bus
System learns from cross-enhancement outcomes
Enables continuous self-improvement

Algorithm 12: Verifiable KPIs (ReasoningEnhancementMetrics)

30+ metrics for patentability evidence:

Category	Metrics	Expected Gains
Accuracy	confidenceGainPercent, fusedConfidencePoint	+12-22%
Cost	tokenSavingsPercent, llmCallSavingsPercent	35-55%
Efficiency	earlyTerminationRate, iterationsSaved	40-60%
Templates	templateHitRate, avgTemplateRelevance	>60%
Stall Recovery	stallRecoveryRate, recoveryMethodsUsed	>75%
Signals	signalEffectivenessRate, crossEnhancementEvents	>60%

Strategy Engine (5 Innovations) – Phase 5

Core Innovation: Encodes peer-reviewed research findings as selection criteria, forcing research-optimal strategies instead of allowing LLM preference/bias.

Research Sources: MIT, Stanford, Harvard, ETH ZÃ¼rich, Princeton, Google, Anthropic, OpenAI, DeepSeek (2023-2025)

License Tier Access:

Component	Free Tier	Licensed Tier
Research Evidence Database	â	â
Research-Weighted Selector	â	â
Strategy Enforcement Engine	â	â
Strategy Compliance Validator	â	â
Strategy Effectiveness Tracker	â	â

Free tier: Basic strategy selection (chain_of_thought, zero_shot only) Licensed tier: Full research-optimized selection from all tiers

Algorithm 13: Research Evidence Database

Machine-readable database of peer-reviewed findings:

Strategy effectiveness benchmarks with confidence intervals
Claim characteristic mappings
Research-backed tier assignments
Citation tracking for audit trails

Strategy	Research Source	Benchmark Improvement
TRM/Extended Thinking	DeepSeek R1, OpenAI o1	+32-74% on MATH/AIME
Verified Reasoning	Stanford/Anthropic CoV	+18% factuality
Graph-of-Thoughts	ETH ZÃ¼rich	+62% on complex tasks
Self-Consistency	Google Research	+17.9% on GSM8K
Reflexion	MIT/Northeastern	+21% on HumanEval

Algorithm 14: Research-Weighted Selector

Data-driven strategy selection based on claim analysis:

Analyzes claim characteristics (complexity, domain, structure)
Matches to research evidence for optimal strategy
Calculates weighted scores based on peer-reviewed improvements
Returns ranked strategy assignments with expected improvement

Claim Analysis â Characteristic Extraction â Evidence Matching â Weighted Scoring â Strategy Assignment

Algorithm 15: Strategy Enforcement Engine

Injects strategy guidance directly into prompts:

Builds structured prompt sections for required strategies
Adds validation rules for response structure
Calculates overhead and compliance requirements
Supports strict, conservative, and lenient modes

Algorithm 16: Strategy Compliance Validator

Validates LLM responses follow required strategy structure:

Checks for required structural elements
Detects violations with severity levels
Triggers retry prompts for non-compliant responses
Supports configurable strictness levels

Algorithm 17: Strategy Effectiveness Tracker

Feedback loop for continuous improvement:

Records actual confidence gains vs expected
Detects underperformance (>15% below expected)
Detects overperformance (>15% above expected)
Generates effectiveness reports for strategy tuning

KPIs Tracked:

Metric	Description	Expected
Strategy Hit Rate	Correct strategy selected	>85%
Compliance Rate	Responses follow structure	>90%
Improvement Delta	Actual vs expected gain	Â±10%
Underperformance Alerts	Strategy not working	<5%

15 RAG-Enhanced Thinking Strategies

All strategies now support codebase context via RAG integration.

When a codebaseId is provided, each strategy:

Retrieves relevant code patterns from the RAG engine
Extracts domain entities and architectural patterns
Generates contextual examples from actual codebase
Enriches reasoning with project-specific knowledge

Research-Based Strategy Prioritization

Based on MIT/Stanford/Harvard/Anthropic/OpenAI/DeepSeek research (2024-2025):

Tier	Strategies	Research Basis	License
Tier 1 (Most Effective)	TRM, verified_reasoning, self_consistency	Anthropic extended thinking, OpenAI o1/o3 test-time compute	Licensed
Tier 2 (Highly Effective)	tree_of_thoughts, graph_of_thoughts, react, reflexion	Stanford ToT paper, MIT GoT research, DeepSeek R1	Licensed
Tier 3 (Contextual)	few_shot, meta_prompting, plan_and_solve, problem_analysis	RAG-enhanced example generation, Meta AI research	Licensed
Tier 4 (Basic)	zero_shot, chain_of_thought	Direct prompting (baseline)	Free

Strategy Details with RAG Integration

Strategy	Use Case	RAG Enhancement	License
TRM	Extended thinking with statistical halting	Uses codebase patterns for confidence calibration	Licensed
Verified-Reasoning	Integration with verification engine	RAG context for claim verification	Licensed
Self-Consistency	Multiple paths with voting	Codebase examples guide path generation	Licensed
Tree-of-Thoughts	Branching exploration with evaluation	Domain entities inform branch scoring	Licensed
Graph-of-Thoughts	Multi-hop reasoning with connections	Architecture patterns enrich graph nodes	Licensed
ReAct	Reasoning + Action cycles	Code patterns inform action selection	Licensed
Reflexion	Self-reflection with memory	Historical patterns guide reflection	Licensed
Few-Shot	Example-based reasoning	RAG-generated examples from codebase	Licensed
Meta-Prompting	Dynamic strategy selection	Context-aware strategy routing	Licensed
Plan-and-Solve	Structured planning with verification	Existing code guides plan decomposition	Licensed
Problem-Analysis	Deep problem decomposition	Codebase structure informs analysis	Licensed
Generate-Knowledge	Knowledge generation before reasoning	RAG provides domain knowledge	Licensed
Prompt-Chaining	Sequential prompt execution	Chain steps informed by patterns	Licensed
Multimodal-CoT	Vision-integrated reasoning	Combines vision + codebase context	Licensed
Zero-Shot	Direct reasoning without examples	Baseline strategy	Free
Chain-of-Thought	Step-by-step reasoning	Baseline strategy	Free

Free Tier Strategy Degradation

When a licensed strategy is requested on free tier:

Request: tree_of_thoughts â Degrades to: chain_of_thought
Request: verified_reasoning â Degrades to: chain_of_thought
Request: meta_prompting â Degrades to: chain_of_thought

All advanced strategies gracefully degrade to chain_of_thought for free users.

RAG ENGINE (Contextual BM25 – +49% Precision)

The Innovation

Prepend LLM-generated context to chunks BEFORE indexing:

Original: "func login(email: String, password: String)"

Enriched: "Context: This function handles user authentication
           by validating credentials against the database.

           func login(email: String, password: String)"

Result: BM25 now matches "authentication" queries!

Hybrid Search

Vector similarity: 70% weight
BM25 full-text: 30% weight
Reciprocal Rank Fusion (k=60)
Critical mass limits: 5-10 chunks optimal, max 25

Integration with All 15 Thinking Strategies

Every thinking strategy now accepts a codebaseId parameter for RAG enrichment:

// Example: Few-Shot with RAG-enhanced examples
let result = try await executor.execute(
    strategy: .fewShot(examples: []),  // Empty = auto-generate from codebase
    problem: "Design user authentication",
    context: userContext,
    constraints: [],
    codebaseId: projectId  // RAG retrieves relevant patterns
)

RAG-Enhanced Features per Strategy:

Strategy	RAG Feature Used
Few-Shot	Generates contextual examples from actual code patterns
Self-Consistency	Uses codebase patterns to diversify reasoning paths
Generate-Knowledge	Retrieves domain knowledge from indexed codebase
Tree-of-Thoughts	Domain entities inform branch exploration
Graph-of-Thoughts	Architecture patterns enrich node connections
Problem-Analysis	Codebase structure guides decomposition

Pattern Extraction from RAG Context:

The RAG engine extracts and provides:

Architectural Patterns: Repository, Service, Factory, Observer, Strategy, MVVM, Clean Architecture
Domain Entities: Structs, classes, protocols, enums from the codebase
Code Patterns: REST API, Event-Driven, CRUD operations

JUDGES CONFIGURATION

Zero-Config (2 Judges)

Judge	How	API Key
Claude	This session	None
Apple Intelligence	On-device	None (macOS 26+)

Optional

Judge	Variable
OpenAI	OPENAI_API_KEY
Gemini	GEMINI_API_KEY
Bedrock	AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
OpenRouter	OPENROUTER_API_KEY

OUTPUT QUALITY CHECKLIST

Before delivering PRD, I verify:

SQL DDL:

CREATE TABLE with constraints
Foreign keys with ON DELETE
CHECK constraints
Custom ENUMs
GIN index (full-text)
HNSW index (vectors)
Row-Level Security
Materialized views

Domain Models:

All properties typed
Static business rule constants
Computed properties
Throwing initializer
Error enum with cases

API:

Exact REST routes
All CRUD + search
Rate limits specified
Auth requirements

Requirements:

Numbered FR-001+
Priority [P0/P1/P2]
NFRs with metrics

Acceptance Criteria (with KPIs):

Every AC uses GIVEN-WHEN-THEN format
Every AC has quantified success metric
Every AC has Baseline (or “N/A – new feature”)
Every AC has Target threshold
Every AC has Measurement method (tool/dashboard/script)
Every AC links to Business Goal (BG-XXX) or NFR
Happy path, error, and edge case ACs present
No vague words (“efficient”, “fast”, “proper”)

JIRA:

Story points (fibonacci)
Task breakdowns
Acceptance checkboxes

Roadmap:

Phases with weeks
SP per phase
Total estimate

TROUBLESHOOTING

# Build
cd library && swift build

# RAG database
docker ps | grep ai-prd-rag-db

# Vision
echo $ANTHROPIC_API_KEY

VERSION HISTORY

v4.5.0: Complete 8-type PRD context system (added CI/CD) – final template set for BAs and PMs
v4.4.0: Extended context-aware PRD generation to 7 types (added poc/mvp/release) with context-specific sections, clarification questions, RAG focus, and strategy selection
v4.3.0: Context-aware PRD generation (proposal/feature/bug/incident) with adaptive depth, context-specific sections, and RAG depth optimization
v4.2.0: Real-time LLM streaming across all 15 thinking strategies with automatic fallback
v4.1.0: License-aware tiered architecture + RAG integration for all 15 strategies + Research-based prioritization (MIT/Stanford/Harvard/Anthropic/OpenAI/DeepSeek)
v4.0.0: Meta-Prompting Engine with 15 strategies + 6 cross-enhancement innovations + 30+ KPIs
v3.0.0: Enterprise output + 6 verification algorithms
v2.0.0: Contextual BM25 RAG (+49% precision)
v1.0.0: Foundation

Ready! Share requirements, mockups, or codebase path. I’ll detect the PRD context type, ask context-appropriate clarification questions until you say “proceed”, then generate a depth-adapted PRD with complete SQL DDL, domain models, API specs, and verifiable reasoning metrics.

PRD Context Types (8):

Proposal: 7 sections, business-focused, light RAG (1 hop)
Feature: 11 sections, full technical depth, deep RAG (3 hops)
Bug: 6 sections, root cause analysis, focused RAG (3 hops)
Incident: 8 sections, forensic investigation, exhaustive RAG (4 hops)
POC: 5 sections, feasibility validation, moderate RAG (2 hops)
MVP: 8 sections, core value focus, moderate RAG (2 hops)
Release: 10 sections, production readiness, deep RAG (3 hops)
CI/CD: 9 sections, pipeline automation, deep RAG (3 hops)

License Status:

Free tier: Basic strategies (zero_shot, chain_of_thought), 3 clarification rounds max, basic verification
Licensed tier: All 15 RAG-enhanced strategies with research-based prioritization, unlimited clarification, full verification engine, context-aware depth adaptation

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台