ai-rag

📁 vasilyu1983/ai-agents-public 📅 Jan 23, 2026
Total installs: 29
Weekly installs: 29
Site rank: #7094

Install command
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill ai-rag

Agent install distribution

claude-code 19
cursor 18
gemini-cli 17
opencode 16
antigravity 15
codex 15

Skill documentation

RAG & Search Engineering — Complete Reference

Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.

This skill covers:

  • RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
  • Search: BM25, vector search, hybrid fusion, ranking pipelines
  • Evaluation: recall@k, nDCG, MRR, groundedness metrics

Modern Best Practices (Jan 2026):

Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.

Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.

Quick Reference

| Task | Tool/Framework | Command/Pattern | When to Use |
|---|---|---|---|
| Decide RAG vs alternatives | Decision framework | Use RAG when you need freshness, citations, or a corpus too large to fine-tune on; otherwise prefer fine-tuning or caching | Avoid unnecessary retrieval latency/complexity |
| Chunking & parsing | Chunker + parser | Start simple; add structure-aware chunking per doc type | Ingestion for docs, code, tables, PDFs |
| Retrieval | Sparse + dense (hybrid) | Fusion (e.g., RRF) + metadata filters + top-k tuning | Mixed query styles; high recall requirements |
| Precision boost | Reranker | Cross-encoder/LLM rerank of top-k candidates | When top-k contains near-misses/noise |
| Grounding | Output contract + citations | Quote/ID citations; answerability gate; refuse on missing evidence | Compliance, trust, and auditability |
| Evaluation | Offline + online eval | Retrieval metrics + answer metrics + regression tests | Prevent silent regressions and staleness failures |
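
The Retrieval row above suggests fusing sparse and dense rankings with Reciprocal Rank Fusion (RRF). Below is a minimal, vendor-agnostic sketch of the fusion step; the doc IDs and the conventional k=60 constant are illustrative, not values prescribed by this skill.

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch: merge a sparse (BM25) ranking
# and a dense (vector) ranking into one list. Doc IDs and k=60 are illustrative.
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs with RRF: score = sum of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    bm25_hits = ["doc_7", "doc_2", "doc_9"]      # sparse retriever order
    vector_hits = ["doc_2", "doc_5", "doc_7"]    # dense retriever order
    print(rrf_fuse([bm25_hits, vector_hits]))    # doc_2 and doc_7 rise to the top
```

RRF needs only ranks, not comparable scores, which is why it is a common default for mixing BM25 and vector results.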

Decision Tree: RAG Architecture Selection

Building RAG system: [Architecture Path]
    ├─ Document type?
    │   ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
    │   ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
    │   └─ Simple content? → Fixed-size token chunking with overlap (baseline)
    │
    ├─ Retrieval accuracy low?
    │   ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
    │   ├─ Noisy results? → Add reranker + better metadata filters
    │   └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
    │
    ├─ Dataset size?
    │   ├─ <100k chunks? → Flat index (exact search)
    │   ├─ 100k-10M? → HNSW (low latency)
    │   └─ >10M? → IVF/ScaNN/DiskANN (scalable)
    │
    └─ Production quality?
        └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
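
For the "Simple content" branch above, a baseline fixed-size chunker with overlap might look like the sketch below; whitespace tokens stand in for a real tokenizer, and the 512/64 sizes are illustrative defaults rather than values mandated by this skill.

```python
# Baseline fixed-size chunking with overlap, as in the "Simple content" branch
# of the decision tree. Whitespace tokens approximate a real tokenizer.


def chunk_fixed(text: str, chunk_tokens: int = 512, overlap_tokens: int = 64) -> list[str]:
    """Split text into overlapping windows of roughly chunk_tokens tokens."""
    if overlap_tokens >= chunk_tokens:
        raise ValueError("overlap must be smaller than chunk size")
    tokens = text.split()  # swap for a real tokenizer in production
    step = chunk_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + chunk_tokens]
        chunks.append(" ".join(window))
        if start + chunk_tokens >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    sample = "word " * 1200
    parts = chunk_fixed(sample, chunk_tokens=512, overlap_tokens=64)
    print(len(parts), "chunks")  # adjacent chunks share ~64 tokens of overlap
```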

Core Concepts (Vendor-Agnostic)

  • Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.
  • Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs. generation fidelity (did we use it correctly?); retrieval-plane metrics are sketched after this list.
  • Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
  • Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
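
To make the retrieval-relevance plane measurable, the recall@k, nDCG, and MRR metrics named earlier can be computed over ranked document IDs. A minimal sketch assuming binary relevance labels; the sample data is illustrative.

```python
# Retrieval-plane metrics over ranked doc IDs: recall@k, MRR, and nDCG@k.
# Binary relevance labels are assumed for brevity.
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr(ranked: list[str], relevant: set[str]) -> float:
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


if __name__ == "__main__":
    ranked = ["d3", "d1", "d9", "d4"]
    relevant = {"d1", "d4"}
    print(recall_at_k(ranked, relevant, k=3))  # 0.5
    print(mrr(ranked, relevant))               # 0.5 (first hit at rank 2)
    print(ndcg_at_k(ranked, relevant, k=4))
```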

Implementation Practices (Tooling Examples)

  • Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs (a minimal sketch follows this list).
  • Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
  • Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
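
The retrieval API contract from the first bullet could be expressed as plain dataclasses, as in the sketch below; the field names and the placeholder backend are illustrative assumptions, not a required schema.

```python
# Minimal retrieval API contract: query, filters, top_k, trace_id in;
# evidence IDs (plus scores) out. Field names are illustrative.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    top_k: int = 10
    filters: dict[str, str] = field(default_factory=dict)  # e.g. {"tenant": "acme"}
    trace_id: str = ""                                      # propagate to every stage


@dataclass(frozen=True)
class Evidence:
    evidence_id: str   # stable ID reused later for citations
    score: float
    source_uri: str


@dataclass(frozen=True)
class RetrievalResponse:
    trace_id: str
    evidence: list[Evidence]


def retrieve(request: RetrievalRequest) -> RetrievalResponse:
    """Placeholder backend: a real implementation would query the hybrid index."""
    hits = [Evidence(evidence_id="doc_42#p3", score=0.81, source_uri="kb://doc_42")]
    return RetrievalResponse(trace_id=request.trace_id, evidence=hits[: request.top_k])
```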

Do / Avoid

Do

  • Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
  • Do enforce document-level ACLs at retrieval time (not only at generation time).
  • Do include citations with stable IDs and verify citation coverage in tests (see the coverage check after this list).
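
The coverage check referenced above could look like the following sketch; the bracketed [doc_id] citation format and the sample answer are assumptions for illustration.

```python
# Regression-style check: every sentence of the generated answer must cite at
# least one retrieved evidence ID in the form [doc_42#p3].
import re


def citation_coverage(answer: str, allowed_ids: set[str]) -> float:
    """Fraction of sentences that carry at least one known evidence citation."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    cited = sum(
        1 for sentence in sentences
        if set(re.findall(r"\[([^\]]+)\]", sentence)) & allowed_ids
    )
    return cited / len(sentences)


def test_citation_coverage_gate():
    answer = "Refunds take 5 days [doc_42#p3]. Contact support for exceptions [doc_7#p1]."
    assert citation_coverage(answer, {"doc_42#p3", "doc_7#p1"}) == 1.0


if __name__ == "__main__":
    test_citation_coverage_gate()
```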

Avoid

  • Avoid shipping RAG without a test set and regression gate.
  • Avoid “stuff everything” context packing; it increases cost and can reduce accuracy (a budget-bounded packing sketch follows this list).
  • Avoid mixing corpora without metadata and tenant isolation.
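
As a counterpart to the “stuff everything” warning, a budget-bounded packer might greedily take the highest-scoring evidence until a token budget is reached; whitespace token counting and the budget value below are illustrative stand-ins for a real tokenizer and limit.

```python
# Budget-bounded context packing: take the highest-scoring passages until the
# token budget would be exceeded, instead of stuffing everything retrieved.


def pack_context(passages: list[tuple[float, str]], budget_tokens: int = 2000) -> str:
    """Greedily pack passages by score until the token budget is exhausted."""
    packed: list[str] = []
    used = 0
    for _, text in sorted(passages, key=lambda p: p[0], reverse=True):
        cost = len(text.split())  # swap for a real tokenizer
        if used + cost > budget_tokens:
            continue  # skip rather than silently truncating evidence mid-passage
        packed.append(text)
        used += cost
    return "\n\n".join(packed)


if __name__ == "__main__":
    candidates = [
        (0.9, "refund policy " * 300),
        (0.6, "shipping FAQ " * 300),
        (0.4, "legacy terms " * 300),
    ]
    print(len(pack_context(candidates, budget_tokens=1300).split()))  # 1200, not all 1800
```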

When to Use This Skill

Use this skill when the user asks:

  • “Help me design a RAG pipeline.”
  • “How should I chunk this document?”
  • “Optimize retrieval for my use case.”
  • “My RAG system is hallucinating — fix it.”
  • “Choose the right vector database / index type.”
  • “Create a RAG evaluation framework.”
  • “Debug why retrieval gives irrelevant results.”

Tool/Model Recommendation Protocol

When users ask for vendor/model/framework recommendations, validate claims against current primary sources.

Triggers

  • “What’s the best vector database for [use case]?”
  • “What should I use for [chunking/embedding/reranking]?”
  • “What’s the latest in RAG development?”
  • “Current best practices for [retrieval/grounding/evaluation]?”
  • “Is [Pinecone/Qdrant/Chroma] still relevant in 2026?”
  • “[Vector DB A] vs [Vector DB B]?”
  • “Best embedding model for [use case]?”
  • “What RAG framework should I use?”

Required Checks

  1. Read data/sources.json and start from sources with "add_as_web_search": true (a loading sketch follows these checks).
  2. Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
  3. If browsing isn’t available, state assumptions and give a verification checklist.
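
A hedged sketch of check 1: it assumes data/sources.json is a JSON array of objects carrying the "add_as_web_search" flag, so adjust the path and shape to the skill's actual data layout.

```python
# Load data/sources.json and keep entries flagged "add_as_web_search": true.
# Assumes a JSON array of objects; the "name"/"url" keys are illustrative.
import json
from pathlib import Path


def load_web_search_sources(path: str = "data/sources.json") -> list[dict]:
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    return [entry for entry in entries if entry.get("add_as_web_search") is True]


if __name__ == "__main__":
    for source in load_web_search_sources():
        print(source.get("name", "<unnamed>"), source.get("url", ""))
```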

What to Report

After checking, provide:

  • Current landscape: What vector DBs/embeddings are popular NOW (not 6 months ago)
  • Emerging trends: Techniques gaining traction (late interaction, agentic RAG, graph RAG)
  • Deprecated/declining: Approaches or tools losing relevance
  • Recommendation: Based on fresh data, not just static knowledge

Example Topics (verify with current sources)

  • Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
  • Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
  • Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
  • RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
  • Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
  • Evaluation (RAGAS, TruLens, DeepEval, BEIR)

Related Skills

For adjacent topics, reference these skills:

  • ai-llm – Prompting, fine-tuning, instruction datasets
  • ai-agents – Agentic RAG workflows and tool routing
  • ai-llm-inference – Serving performance, quantization, batching
  • ai-mlops – Deployment, monitoring, security, privacy, and governance
  • ai-prompt-engineering – Prompt patterns for RAG generation phase

Templates

  • System Design (Start Here)
  • Chunking & Ingestion
  • Embedding & Indexing
  • Retrieval & Reranking
  • Context Packaging & Grounding
  • Evaluation
  • Search Configuration
  • Query Rewriting

Navigation

  • Resources
  • Templates
  • Data

Use this skill whenever the user needs retrieval-augmented system design or debugging; prompt work and deployment are covered by the related skills listed above.