ai-rag
Total installs: 29
Weekly installs: 29
Site-wide rank: #7094
Install command:
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill ai-rag
Install distribution by agent
- claude-code: 19
- cursor: 18
- gemini-cli: 17
- opencode: 16
- antigravity: 15
- codex: 15
Skill documentation
RAG & Search Engineering – Complete Reference
Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.
This skill covers:
- RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
- Search: BM25, vector search, hybrid fusion, ranking pipelines
- Evaluation: recall@k, nDCG, MRR, groundedness metrics
Modern Best Practices (Jan 2026):
- Separate retrieval quality from answer quality; evaluate both (RAG: https://arxiv.org/abs/2005.11401); see the metrics sketch below.
- Default to hybrid retrieval (sparse + dense) with reranking when precision matters (DPR: https://arxiv.org/abs/2004.04906).
- Use a failure taxonomy to debug systematically (Seven Failure Points in RAG: https://arxiv.org/abs/2401.05856).
- Treat freshness/invalidation as first-class; staleness is a correctness bug, not a UX issue.
- Add grounding gates: answerability checks, citation coverage checks, and refusal-on-missing-context defaults.
- Threat-model RAG: retrieved text is untrusted input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.
Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.
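The "evaluate both planes" practice needs retrieval-plane metrics computed against a labeled test set. Below is a minimal sketch of recall@k and MRR, assuming each test query carries a gold set of relevant document IDs; the function names are illustrative and not tied to any evaluation framework.

```python
# Minimal sketch of retrieval-plane metrics; assumes each test query has a gold
# set of relevant doc IDs and a ranked list of retrieved IDs. Function names
# are illustrative, not tied to any evaluation framework.

def recall_at_k(relevant: set[str], retrieved: list[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved[:k])) / len(relevant)

def mrr(relevant: set[str], retrieved: list[str]) -> float:
    """Reciprocal rank of the first relevant doc; 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# One query from a regression test set.
gold = {"doc-12", "doc-98"}
ranked = ["doc-41", "doc-12", "doc-07", "doc-98", "doc-33"]
print(recall_at_k(gold, ranked, k=3))  # 0.5 -> only doc-12 is in the top 3
print(mrr(gold, ranked))               # 0.5 -> first relevant hit at rank 2
```

nDCG follows the same pattern but discounts hits by rank and supports graded relevance; groundedness metrics live on the generation plane and need the packed context, not just the ranked list.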
Quick Reference
| Task | Tool/Framework | Command/Pattern | When to Use |
|---|---|---|---|
| Decide RAG vs alternatives | Decision framework | RAG if: freshness + citations + large/changing corpus; else: fine-tuning or caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |
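The hybrid-retrieval row above names Reciprocal Rank Fusion (RRF) as the default fusion step. Here is a minimal sketch, assuming each retriever returns a ranked list of document IDs; k = 60 is the commonly used smoothing constant, not a tuned value.

```python
# Minimal sketch of Reciprocal Rank Fusion (RRF) over sparse and dense result
# lists. The ranked-list input format and k=60 smoothing constant are assumptions;
# plug in your own BM25 and vector retrievers.

from collections import defaultdict

def rrf_fuse(result_lists: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
    """score(d) = sum over lists of 1 / (k + rank_of_d_in_that_list)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)[:top_n]

# Fuse BM25 and vector-search candidates before reranking.
bm25_hits = ["d3", "d1", "d7", "d9"]
dense_hits = ["d1", "d4", "d3", "d2"]
print(rrf_fuse([bm25_hits, dense_hits], top_n=5))  # d1 and d3 (present in both) rank first
```

Because RRF only uses ranks, it avoids normalizing BM25 scores against cosine similarities, which is why it is a common default before a cross-encoder reranker.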
Decision Tree: RAG Architecture Selection
Building RAG system: [Architecture Path]
├─ Document type?
│  ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
│  ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
│  └─ Simple content? → Fixed-size token chunking with overlap (baseline)
│
├─ Retrieval accuracy low?
│  ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
│  ├─ Noisy results? → Add reranker + better metadata filters
│  └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
│
├─ Dataset size?
│  ├─ <100k chunks? → Flat index (exact search)
│  ├─ 100k-10M? → HNSW (low latency)
│  └─ >10M? → IVF/ScaNN/DiskANN (scalable)
│
└─ Production quality?
   └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
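A minimal sketch of the "Simple content" baseline path in the tree above: fixed-size chunking with overlap. The whitespace split is a stand-in tokenizer and the 400/50 sizes are placeholders; use the tokenizer that matches your embedding model.

```python
# Minimal sketch of fixed-size chunking with overlap (the baseline path above).
# The whitespace split is a placeholder tokenizer; swap in your real one.

def chunk_fixed(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into ~chunk_size-token chunks, each sharing `overlap` tokens with the previous one."""
    tokens = text.split()  # placeholder tokenizer
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
    return chunks

# A 1000-token document yields windows starting at 0, 350, and 700.
doc = " ".join(f"tok{i}" for i in range(1000))
print([len(c.split()) for c in chunk_fixed(doc)])  # [400, 400, 300]
```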
Core Concepts (Vendor-Agnostic)
- Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify (see the skeleton sketch after this list).
- Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs generation fidelity (did we use it correctly?).
- Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
- Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
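A minimal, runnable sketch of the stage shape above (retrieve, pack context, generate, verify) with an explicit trace_id per request. Every component here is an in-memory stub standing in for your index, reranker, and LLM client; only the structure is the point.

```python
# Minimal runnable skeleton of a deterministic RAG pipeline. All components are
# in-memory stubs (assumptions); replace retrieve/generate with real index,
# reranker, and LLM calls while keeping the same staged, traceable shape.

from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    text: str
    score: float = 0.0

CORPUS = [
    Chunk("c1", "HNSW trades memory for low-latency approximate search."),
    Chunk("c2", "RRF fuses sparse and dense rankings without score normalization."),
    Chunk("c3", "Chunk overlap preserves context across boundaries."),
]

def retrieve(query: str, top_k: int) -> list[Chunk]:
    # Stub scorer: term overlap. Replace with hybrid sparse + dense retrieval and a reranker.
    terms = set(query.lower().split())
    scored = [Chunk(c.chunk_id, c.text, float(len(terms & set(c.text.lower().split())))) for c in CORPUS]
    return sorted(scored, key=lambda c: c.score, reverse=True)[:top_k]

def pack_context(chunks: list[Chunk], max_chars: int = 500) -> str:
    # Bounded context; stable chunk IDs double as citation targets.
    packed, used = [], 0
    for c in chunks:
        if used + len(c.text) > max_chars:
            break
        packed.append(f"[{c.chunk_id}] {c.text}")
        used += len(c.text)
    return "\n".join(packed)

def answer(query: str, trace_id: str) -> dict:
    chunks = retrieve(query, top_k=2)                          # retrieve (rerank would slot in here)
    context = pack_context(chunks)                             # pack bounded context
    draft = f"(stub answer grounded in {len(chunks)} chunks)"  # generate with an LLM in a real system
    return {"trace_id": trace_id, "answer": draft, "context": context,
            "citations": [c.chunk_id for c in chunks]}         # verify/citation gate goes here

print(answer("how does rrf fuse sparse and dense rankings", trace_id="t-001"))
```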
Implementation Practices (Tooling Examples)
- Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs (see the sketch after this list).
- Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
- Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
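A minimal sketch of the retrieval API contract from the first bullet, expressed as typed request/response objects. Only query, filters, top_k, trace_id, and evidence IDs come from the bullet; the remaining fields and names are assumptions, not any vendor's schema.

```python
# Minimal sketch of a retrieval API contract as typed objects. Field names beyond
# query/filters/top_k/trace_id/evidence IDs are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    top_k: int = 10
    filters: dict[str, str] = field(default_factory=dict)  # e.g. {"tenant": "acme", "doc_type": "policy"}
    trace_id: str = ""

@dataclass(frozen=True)
class Evidence:
    chunk_id: str   # stable ID, reusable as a citation target
    doc_id: str
    score: float
    snippet: str

@dataclass(frozen=True)
class RetrievalResponse:
    request: RetrievalRequest
    evidence: list[Evidence]
    latency_ms: float

# Tests and the production pipeline can both construct requests deterministically.
req = RetrievalRequest(query="data retention policy", top_k=5,
                       filters={"tenant": "acme"}, trace_id="t-123")
print(req)
```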
Do / Avoid
Do
- Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
- Do enforce document-level ACLs at retrieval time (not only at generation time).
- Do include citations with stable IDs and verify citation coverage in tests (see the test sketch below).
Avoid
- Avoid shipping RAG without a test set and regression gate.
- Avoid “stuff everything” context packing; it increases cost and can reduce accuracy.
- Avoid mixing corpora without metadata and tenant isolation.
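A minimal sketch of the citation-coverage check referenced in the Do list, written as plain test functions. The answer and evidence shapes (lists and sets of chunk IDs) are assumptions; adapt them to your own response schema.

```python
# Minimal sketch of a citation-coverage regression check. The answer/evidence
# shapes (plain lists and sets of chunk IDs) are assumptions; adapt to your schema.

def citation_coverage(cited_ids: list[str], evidence_ids: set[str]) -> float:
    """Fraction of citations that resolve to evidence actually retrieved for this answer."""
    if not cited_ids:
        return 0.0
    return sum(1 for cid in cited_ids if cid in evidence_ids) / len(cited_ids)

def test_answer_cites_only_retrieved_evidence():
    assert citation_coverage(["c2", "c3"], {"c1", "c2", "c3"}) == 1.0  # every citation resolves

def test_fabricated_or_missing_citations_fail():
    assert citation_coverage(["c9"], {"c1", "c2"}) == 0.0  # fabricated ID
    assert citation_coverage([], {"c1"}) == 0.0            # uncited answer: gate as refusal upstream
```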
When to Use This Skill
Use this skill when the user asks:
- “Help me design a RAG pipeline.”
- “How should I chunk this document?”
- “Optimize retrieval for my use case.”
- “My RAG system is hallucinating – fix it.”
- “Choose the right vector database / index type.”
- “Create a RAG evaluation framework.”
- “Debug why retrieval gives irrelevant results.”
Tool/Model Recommendation Protocol
When users ask for vendor/model/framework recommendations, validate claims against current primary sources.
Triggers
- “What’s the best vector database for [use case]?”
- “What should I use for [chunking/embedding/reranking]?”
- “What’s the latest in RAG development?”
- “Current best practices for [retrieval/grounding/evaluation]?”
- “Is [Pinecone/Qdrant/Chroma] still relevant in 2026?”
- “[Vector DB A] vs [Vector DB B]?”
- “Best embedding model for [use case]?”
- “What RAG framework should I use?”
Required Checks
- Read data/sources.json and start from sources with "add_as_web_search": true (see the sketch after this list).
- Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
- If browsing isn’t available, state assumptions and give a verification checklist.
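A minimal sketch of the first check: load data/sources.json and keep entries flagged "add_as_web_search": true. The file layout assumed here (a top-level list of objects with "name"/"url" fields) is an assumption beyond that one flag.

```python
# Minimal sketch: filter data/sources.json to the entries that should seed web search.
# The top-level-list layout and "name"/"url" fields are assumptions beyond the flag.

import json
from pathlib import Path

def web_search_sources(path: str = "data/sources.json") -> list[dict]:
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    return [e for e in entries if e.get("add_as_web_search") is True]

# for src in web_search_sources():
#     print(src.get("name"), src.get("url"))  # verify 1-2 primary docs per recommendation
```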
What to Report
After checking, provide:
- Current landscape: what vector DBs/embeddings are popular NOW (not 6 months ago)
- Emerging trends: techniques gaining traction (late interaction, agentic RAG, graph RAG)
- Deprecated/declining: approaches or tools losing relevance
- Recommendation: based on fresh data, not just static knowledge
Example Topics (verify with current sources)
- Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
- Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
- Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
- RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
- Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
- Evaluation (RAGAS, TruLens, DeepEval, BEIR)
Related Skills
For adjacent topics, reference these skills:
- ai-llm – Prompting, fine-tuning, instruction datasets
- ai-agents – Agentic RAG workflows and tool routing
- ai-llm-inference – Serving performance, quantization, batching
- ai-mlops – Deployment, monitoring, security, privacy, and governance
- ai-prompt-engineering – Prompt patterns for RAG generation phase
Templates
- System Design (Start Here)
- Chunking & Ingestion
- Embedding & Indexing
- Retrieval & Reranking
- Context Packaging & Grounding
- Evaluation
- Search Configuration
- Query Rewriting
Navigation
Resources
- references/advanced-rag-patterns.md
- references/agentic-rag-patterns.md
- references/bm25-tuning.md
- references/chunking-patterns.md
- references/chunking-strategies.md
- references/rag-evaluation-guide.md
- references/rag-troubleshooting.md
- references/contextual-retrieval-guide.md
- references/distributed-search-slos.md
- references/grounding-checklists.md
- references/hybrid-fusion-patterns.md
- references/index-selection-guide.md
- references/multilingual-domain-patterns.md
- references/pipeline-architecture.md
- references/query-rewriting-patterns.md
- references/ranking-pipeline-guide.md
- references/retrieval-patterns.md
- references/search-debugging.md
- references/search-evaluation-guide.md
- references/user-feedback-learning.md
- references/vector-search-patterns.md
Templates
- assets/context/template-context-packing.md
- assets/context/template-grounding.md
- assets/design/rag-system-design.md
- assets/chunking/template-basic-chunking.md
- assets/chunking/template-code-chunking.md
- assets/chunking/template-long-doc-chunking.md
- assets/retrieval/template-retrieval-pipeline.md
- assets/retrieval/template-hybrid-search.md
- assets/retrieval/template-reranking.md
- assets/eval/template-rag-eval.md
- assets/eval/template-rag-testset.jsonl
- assets/eval/template-search-eval.md
- assets/eval/template-search-testset.jsonl
- assets/indexing/template-index-config.md
- assets/indexing/template-metadata-schema.md
- assets/query/template-query-rewrite.md
- assets/ranking/template-ranking-pipeline.md
- assets/ranking/template-reranker.md
- assets/search/template-bm25-config.md
- assets/search/template-hnsw-config.md
- assets/search/template-ivf-config.md
- assets/search/template-hybrid-config.md
Data
- data/sources.json – Curated external references
Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.