ai-rag

📁 vasilyu1983/ai-agents-public 📅 Jan 23, 2026
Total installs: 29
Weekly installs: 29
Site rank: #7094

Install command
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill ai-rag

Agent install distribution

claude-code 19
cursor 18
gemini-cli 17
opencode 16
antigravity 15
codex 15

Skill documentation

RAG & Search Engineering — Complete Reference

Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.

This skill covers:

  • RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
  • Search: BM25, vector search, hybrid fusion, ranking pipelines
  • Evaluation: recall@k, nDCG, MRR, groundedness metrics

Modern Best Practices (Jan 2026):

Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.

Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.

Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|---|---|---|---|
| Decide RAG vs alternatives | Decision framework | Use RAG when you need freshness, citations, or a corpus too large to fine-tune on; otherwise prefer fine-tuning or caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |
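
The Retrieval row above suggests fusing sparse and dense rankings with Reciprocal Rank Fusion (RRF). Below is a minimal, vendor-agnostic sketch of the fusion step; the doc IDs and the conventional k=60 constant are illustrative, not values prescribed by this skill.

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch: merge a sparse (BM25) ranking
# and a dense (vector) ranking into one list. Doc IDs and k=60 are illustrative.
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs with RRF: score = sum of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    bm25_hits = ["doc_7", "doc_2", "doc_9"]      # sparse retriever order
    vector_hits = ["doc_2", "doc_5", "doc_7"]    # dense retriever order
    print(rrf_fuse([bm25_hits, vector_hits]))    # doc_2 and doc_7 rise to the top
```

RRF needs only ranks, not comparable scores, which is why it is a common default for mixing BM25 and vector results.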

Decision Tree: RAG Architecture Selection

Building RAG system: [Architecture Path]
    ├─ Document type?
    │   ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
    │   ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
    │   └─ Simple content? → Fixed-size token chunking with overlap (baseline)
    │
    ├─ Retrieval accuracy low?
    │   ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
    │   ├─ Noisy results? → Add reranker + better metadata filters
    │   └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
    │
    ├─ Dataset size?
    │   ├─ <100k chunks? → Flat index (exact search)
    │   ├─ 100k-10M? → HNSW (low latency)
    │   └─ >10M? → IVF/ScaNN/DiskANN (scalable)
    │
    └─ Production quality?
        └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
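
For the "Simple content" branch above, a baseline fixed-size chunker with overlap might look like the sketch below; whitespace tokens stand in for a real tokenizer, and the 512/64 sizes are illustrative defaults rather than values mandated by this skill.

```python
# Baseline fixed-size chunking with overlap, as in the "Simple content" branch
# of the decision tree. Whitespace tokens approximate a real tokenizer.


def chunk_fixed(text: str, chunk_tokens: int = 512, overlap_tokens: int = 64) -> list[str]:
    """Split text into overlapping windows of roughly chunk_tokens tokens."""
    if overlap_tokens >= chunk_tokens:
        raise ValueError("overlap must be smaller than chunk size")
    tokens = text.split()  # swap for a real tokenizer in production
    step = chunk_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + chunk_tokens]
        chunks.append(" ".join(window))
        if start + chunk_tokens >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    sample = "word " * 1200
    parts = chunk_fixed(sample, chunk_tokens=512, overlap_tokens=64)
    print(len(parts), "chunks")  # adjacent chunks share ~64 tokens of overlap
```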

Core Concepts (Vendor-Agnostic)

  • Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.
  • Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs. generation fidelity (did we use it correctly?); retrieval-plane metrics are sketched after this list.
  • Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
  • Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
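
To make the retrieval-relevance plane measurable, the recall@k, nDCG, and MRR metrics named earlier can be computed over ranked document IDs. A minimal sketch assuming binary relevance labels; the sample data is illustrative.

```python
# Retrieval-plane metrics over ranked doc IDs: recall@k, MRR, and nDCG@k.
# Binary relevance labels are assumed for brevity.
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr(ranked: list[str], relevant: set[str]) -> float:
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


if __name__ == "__main__":
    ranked = ["d3", "d1", "d9", "d4"]
    relevant = {"d1", "d4"}
    print(recall_at_k(ranked, relevant, k=3))  # 0.5
    print(mrr(ranked, relevant))               # 0.5 (first hit at rank 2)
    print(ndcg_at_k(ranked, relevant, k=4))
```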

Implementation Practices (Tooling Examples)

  • Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs (a minimal sketch follows this list).
  • Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
  • Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
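
The retrieval API contract from the first bullet could be expressed as plain dataclasses, as in the sketch below; the field names and the placeholder backend are illustrative assumptions, not a required schema.

```python
# Minimal retrieval API contract: query, filters, top_k, trace_id in;
# evidence IDs (plus scores) out. Field names are illustrative.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    top_k: int = 10
    filters: dict[str, str] = field(default_factory=dict)  # e.g. {"tenant": "acme"}
    trace_id: str = ""                                      # propagate to every stage


@dataclass(frozen=True)
class Evidence:
    evidence_id: str   # stable ID reused later for citations
    score: float
    source_uri: str


@dataclass(frozen=True)
class RetrievalResponse:
    trace_id: str
    evidence: list[Evidence]


def retrieve(request: RetrievalRequest) -> RetrievalResponse:
    """Placeholder backend: a real implementation would query the hybrid index."""
    hits = [Evidence(evidence_id="doc_42#p3", score=0.81, source_uri="kb://doc_42")]
    return RetrievalResponse(trace_id=request.trace_id, evidence=hits[: request.top_k])
```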

Do / Avoid

Do

  • Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
  • Do enforce document-level ACLs at retrieval time (not only at generation time).
  • Do include citations with stable IDs and verify citation coverage in tests (see the coverage check after this list).
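
The coverage check referenced above could look like the following sketch; the bracketed [doc_id] citation format and the sample answer are assumptions for illustration.

```python
# Regression-style check: every sentence of the generated answer must cite at
# least one retrieved evidence ID in the form [doc_42#p3].
import re


def citation_coverage(answer: str, allowed_ids: set[str]) -> float:
    """Fraction of sentences that carry at least one known evidence citation."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    cited = sum(
        1 for sentence in sentences
        if set(re.findall(r"\[([^\]]+)\]", sentence)) & allowed_ids
    )
    return cited / len(sentences)


def test_citation_coverage_gate():
    answer = "Refunds take 5 days [doc_42#p3]. Contact support for exceptions [doc_7#p1]."
    assert citation_coverage(answer, {"doc_42#p3", "doc_7#p1"}) == 1.0


if __name__ == "__main__":
    test_citation_coverage_gate()
```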

Avoid

  • Avoid shipping RAG without a test set and regression gate.
  • Avoid “stuff everything” context packing; it increases cost and can reduce accuracy (a budget-bounded packing sketch follows this list).
  • Avoid mixing corpora without metadata and tenant isolation.
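
As a counterpart to the “stuff everything” warning, a budget-bounded packer might greedily take the highest-scoring evidence until a token budget is reached; whitespace token counting and the budget value below are illustrative stand-ins for a real tokenizer and limit.

```python
# Budget-bounded context packing: take the highest-scoring passages until the
# token budget would be exceeded, instead of stuffing everything retrieved.


def pack_context(passages: list[tuple[float, str]], budget_tokens: int = 2000) -> str:
    """Greedily pack passages by score until the token budget is exhausted."""
    packed: list[str] = []
    used = 0
    for _, text in sorted(passages, key=lambda p: p[0], reverse=True):
        cost = len(text.split())  # swap for a real tokenizer
        if used + cost > budget_tokens:
            continue  # skip rather than silently truncating evidence mid-passage
        packed.append(text)
        used += cost
    return "\n\n".join(packed)


if __name__ == "__main__":
    candidates = [
        (0.9, "refund policy " * 300),
        (0.6, "shipping FAQ " * 300),
        (0.4, "legacy terms " * 300),
    ]
    print(len(pack_context(candidates, budget_tokens=1300).split()))  # 1200, not all 1800
```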

When to Use This Skill

Use this skill when the user asks:

  • “Help me design a RAG pipeline.”
  • “How should I chunk this document?”
  • “Optimize retrieval for my use case.”
  • “My RAG system is hallucinating — fix it.”
  • “Choose the right vector database / index type.”
  • “Create a RAG evaluation framework.”
  • “Debug why retrieval gives irrelevant results.”

Tool/Model Recommendation Protocol

When users ask for vendor/model/framework recommendations, validate claims against current primary sources.

Triggers

  • “What’s the best vector database for [use case]?”
  • “What should I use for [chunking/embedding/reranking]?”
  • “What’s the latest in RAG development?”
  • “Current best practices for [retrieval/grounding/evaluation]?”
  • “Is [Pinecone/Qdrant/Chroma] still relevant in 2026?”
  • “[Vector DB A] vs [Vector DB B]?”
  • “Best embedding model for [use case]?”
  • “What RAG framework should I use?”

Required Checks

  1. Read data/sources.json and start from sources with "add_as_web_search": true (a loading sketch follows these checks).
  2. Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
  3. If browsing isn’t available, state assumptions and give a verification checklist.
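
A hedged sketch of check 1: it assumes data/sources.json is a JSON array of objects carrying the "add_as_web_search" flag, so adjust the path and shape to the skill's actual data layout.

```python
# Load data/sources.json and keep entries flagged "add_as_web_search": true.
# Assumes a JSON array of objects; the "name"/"url" keys are illustrative.
import json
from pathlib import Path


def load_web_search_sources(path: str = "data/sources.json") -> list[dict]:
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    return [entry for entry in entries if entry.get("add_as_web_search") is True]


if __name__ == "__main__":
    for source in load_web_search_sources():
        print(source.get("name", "<unnamed>"), source.get("url", ""))
```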

What to Report

After checking, provide:

  • Current landscape: What vector DBs/embeddings are popular NOW (not 6 months ago)
  • Emerging trends: Techniques gaining traction (late interaction, agentic RAG, graph RAG)
  • Deprecated/declining: Approaches or tools losing relevance
  • Recommendation: Based on fresh data, not just static knowledge

Example Topics (verify with current sources)

  • Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
  • Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
  • Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
  • RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
  • Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
  • Evaluation (RAGAS, TruLens, DeepEval, BEIR)

Related Skills

For adjacent topics, reference these skills:

  • ai-llm – Prompting, fine-tuning, instruction datasets
  • ai-agents – Agentic RAG workflows and tool routing
  • ai-llm-inference – Serving performance, quantization, batching
  • ai-mlops – Deployment, monitoring, security, privacy, and governance
  • ai-prompt-engineering – Prompt patterns for RAG generation phase

Templates

  • System Design (Start Here)
  • Chunking & Ingestion
  • Embedding & Indexing
  • Retrieval & Reranking
  • Context Packaging & Grounding
  • Evaluation
  • Search Configuration
  • Query Rewriting

Navigation

  • Resources
  • Templates
  • Data

Use this skill whenever the user needs retrieval-augmented system design or debugging; prompt work and deployment are covered by the related skills listed above.