rag-patterns
Install command
npx skills add https://github.com/daniel-da-silva-alves/antigravity-agent-toolkit --skill rag-patterns
Skill documentation
RAG Patterns & Best Practices
This skill provides a comprehensive guide to implementing robust RAG systems, from basic prototypes to advanced enterprise solutions.
RAG Architectures
1. Naive RAG (Prototype)
- Flow: User Query -> Embed -> Vector Search -> Retrieve Top-K -> LLM Generation.
- Use Case: Simple Q&A on small documents.
- Pros: Easy to implement.
- Cons: Low precision, hallucinations, no context awareness.
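The naive flow above can be sketched end to end with a toy in-memory index. This is only an illustration under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, `retrieve_top_k` does a brute-force scan instead of using a vector database, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Embed the query, score every document, return the k best matches."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Gift cards cannot be refunded.",
]
question = "How many days does shipping take?"
top = retrieve_top_k(question, docs, k=1)
# The prompt below is what would be sent to the LLM in the generation step.
prompt = f"Context:\n{top[0]}\n\nQuestion: {question}"
```

Even this small sketch shows the weakness listed under Cons: retrieval depends entirely on surface similarity between the query and each chunk.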
2. Advanced RAG (Optimization Focus)
- Pre-Retrieval: Query Expansion (Hypothetical Document Embeddings – HyDE), Query Decomposition, Query Routing.
- Retrieval: Hybrid Search (Dense Vector + Sparse Keyword/BM25).
- Post-Retrieval: Reranking (Cohere Rerank, Cross-Encoders) to re-sort results by relevance, followed by Context Compression.
- Use Case: Production Q&A, Knowledge bases.
3. Modular RAG (Component-Based)
- Concept: Treat retrieval, generation, and processing as separate modules that can be swapped or chained.
- Modules:
  - Search Module: SerpAPI, Bing Search, Internal Docs.
  - Memory Module: Conversation history.
  - Fusion Module: Combine multiple retrieval sources (Reciprocal Rank Fusion).
- Use Case: Complex systems requiring multiple data sources.
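A Fusion Module based on Reciprocal Rank Fusion might look like the sketch below. The function name `reciprocal_rank_fusion`, the document IDs, and the two input rankings are illustrative assumptions; `k=60` is a commonly used default, not a value taken from this skill.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense results
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF uses only ranks, it can merge sources whose raw scores are on incompatible scales.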
4. GraphRAG (Structured Knowledge)
- Concept: Combine Vector Search with Knowledge Graphs (Neo4j, ArangoDB).
- Flow:
  - Extract entities and relationships from docs to build a Graph.
  - Query: Search both vector store (unstructured) and Graph (structured relationships).
  - Traverse graph to find multi-hop answers.
- Use Case: Complex domains (Medical, Legal, Finance) where relationships matter more than semantic similarity.
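The multi-hop traversal step can be illustrated with a plain in-memory graph. This is a simplified sketch, not a Neo4j or ArangoDB integration: the triples are made-up examples of what an entity-extraction step might produce, and `build_graph` and `find_path` are hypothetical helper names.

```python
from collections import deque

# Hypothetical (subject, relation, object) triples extracted from documents.
TRIPLES = [
    ("Drug A", "treats", "Condition X"),
    ("Condition X", "is_risk_factor_for", "Condition Y"),
    ("Drug B", "interacts_with", "Drug A"),
]


def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def find_path(graph, start, goal, max_hops=3):
    """Breadth-first search; returns the list of hops to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue  # hop budget spent on this branch
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


graph = build_graph(TRIPLES)
path = find_path(graph, "Drug B", "Condition Y")
```

The returned hops can be rendered as text and placed in the prompt next to the vector-search results, so the LLM sees the relationship chain explicitly.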
5. Agentic RAG (Autonomous)
- Concept: An Agent uses “Tools” to retrieve information. It can formulate its own search queries, critique its own results, and iterate.
- Tools: Vector Store Tool, Web Search Tool, SQL Database Tool.
- Flow: User Query -> Agent Plan -> Call Tool -> Analyze Result -> (Loop) -> Answer.
- Use Case: Complex reasoning tasks requiring multi-step investigation.
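The plan, call tool, analyze, loop cycle can be sketched without any agent framework. Everything here is an assumption made for illustration: `run_agent` is a hypothetical driver, the planner is a scripted function standing in for an LLM, and the single tool returns canned text.

```python
from typing import Callable

Tool = Callable[[str], str]
# A planner returns ("answer", final_text) or (tool_name, tool_input).
Planner = Callable[[str, list[str]], tuple[str, str]]


def run_agent(query: str, tools: dict[str, Tool], plan: Planner,
              max_steps: int = 3) -> str:
    """Loop: ask the planner for an action, run the tool, record the result."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = plan(query, observations)
        if action == "answer":
            return arg
        observations.append(tools[action](arg))
    return "No answer found within the step budget."


def vector_store_tool(search_query: str) -> str:
    """Stub tool; a real one would query a vector store."""
    return "Policy doc: refunds accepted within 30 days."


def scripted_plan(query: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM planner: search once, then answer."""
    if not observations:
        return ("vector_store", query)
    return ("answer", f"Based on retrieved context: {observations[-1]}")


answer = run_agent(
    "What is the refund window?",
    {"vector_store": vector_store_tool},
    scripted_plan,
)
```

The `max_steps` cap matters in practice: without it, a planner that never decides to answer would loop indefinitely.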
Embedding & Chunking Strategies
Chunking
- Fixed-size: Simple character/token count (e.g., 512 tokens). Cons: Can cut sentences and ideas mid-thought, breaking semantic meaning.
- Recursive: Split by separators (Markdown headers, paragraphs) to keep context. Recommended.
- Semantic: Group text by semantic similarity (using embedding distance).
- Agentic Chunking: Use an LLM to propositionally chunk text.
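A recursive splitter can be approximated in a few lines. The sketch below is a simplified illustration, not the implementation used by any particular library: `recursive_split`, its separator order, and the character-based size limit are all assumptions, and a token-based limit would need a real tokenizer.

```python
def recursive_split(
    text: str,
    max_chars: int = 200,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split on the coarsest separator first, recursing into finer
    separators only for pieces that are still longer than max_chars."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard fixed-size cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small neighbouring pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Intro paragraph.\n\n"
    "Sentence one is here. Sentence two is here. Sentence three is here.\n\n"
    "Short tail."
)
chunks = recursive_split(sample, max_chars=40)
```

Paragraph boundaries survive where they fit, and only the oversized middle paragraph is broken down at sentence level. Note that this sketch drops the separator at each cut, so a sentence split on ". " loses its trailing period.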
Embedding Models
- OpenAI: text-embedding-3-small (fast, cheap), text-embedding-3-large (high accuracy).
- Open Source: bge-m3, e5-mistral-7b-instruct. Excellent for multilingual or specific domains.
Retrieval Strategies
- Dense Retrieval: Semantic search using embeddings. Good for concept matching.
- Sparse Retrieval: Keyword search (BM25/Splade). Good for exact term matching (e.g., part numbers, names).
- Hybrid Search: Combine Dense + Sparse scores (alpha parameter). Gold Standard.
- Reranking: Always rerank top 50-100 results to top 5-10 for the LLM. Drastically improves accuracy.
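One way to combine dense and sparse scores with an alpha parameter is shown below. It is a sketch under assumptions: the helper names `min_max` and `hybrid_scores` and the example scores are made up, min-max normalization is only one of several reasonable choices, and the convention that `alpha=1.0` means dense-only varies between libraries.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.5
) -> list[tuple[str, float]]:
    """Weighted sum: alpha * dense + (1 - alpha) * sparse.

    Documents missing from one retriever contribute 0 for that side.
    """
    d, s = min_max(dense), min_max(sparse)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in set(d) | set(s)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


dense = {"a": 0.9, "b": 0.7, "c": 0.5}   # hypothetical cosine similarities
sparse = {"b": 12.0, "d": 4.0}           # hypothetical BM25 scores
ranked = hybrid_scores(dense, sparse, alpha=0.5)
candidates_for_reranker = [doc for doc, _ in ranked[:100]]
```

The fused list is the natural input for the reranking step: a large candidate set goes to the reranker, and only the best few results go on to the LLM.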
Context Assembly (Feeding the LLM)
The retrieved chunks are useless if poorly formatted. Context Assembly is how you build the prompt context from raw chunks.
The Pattern: XML-Delimited Documents
def assemble_context(retrieved_chunks: list[dict]) -> str:
    """Format retrieved chunks into a structured context block.

    Uses XML tags for clear document boundaries and includes
    source metadata for citation support. Expects chunks sorted by
    relevance (best first), each with "source", "score", and "text" keys.
    """
    # Lost-in-the-Middle defense: strongest chunks go at the START and END
    if len(retrieved_chunks) > 4:
        reordered = (
            retrieved_chunks[:2] +    # Top 2 at start
            retrieved_chunks[4:] +    # Lower relevance in middle
            retrieved_chunks[2:4]     # Ranks 3-4 at end
        )
    else:
        reordered = retrieved_chunks

    context_parts = []
    for i, chunk in enumerate(reordered, 1):
        context_parts.append(
            f'<document id="{i}" source="{chunk["source"]}" relevance="{chunk["score"]:.2f}">\n'
            f'{chunk["text"]}\n'
            f'</document>'
        )
    return "\n\n".join(context_parts)


# Usage:
# context = assemble_context(reranked_results)
# prompt = f"Based on the following documents:\n{context}\n\nQuestion: {user_query}"
Rules
- Always use delimiters (XML tags, not just newlines): models understand boundaries.
- Include source metadata: enables citation in the answer.
- Reorder for Lost-in-the-Middle: put best docs at start and end.
- Limit total tokens: keep context under 60% of the model’s window.
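The token-limit rule could be enforced with a small budget helper applied before context assembly. This is a sketch under assumptions: `fit_to_budget` is a hypothetical name, chunks are assumed to be sorted best first, and the default whitespace word count is only a rough proxy for tokens that should be swapped for the model's real tokenizer.

```python
from typing import Callable, Optional


def fit_to_budget(
    chunks: list[dict],
    context_window: int,
    budget_ratio: float = 0.6,
    count_tokens: Optional[Callable[[str], int]] = None,
) -> list[dict]:
    """Keep the highest-ranked chunks whose combined size fits the budget."""
    # Assumption: word count approximates token count when no tokenizer is given.
    count = count_tokens or (lambda text: len(text.split()))
    budget = int(context_window * budget_ratio)
    kept, used = [], 0
    for chunk in chunks:
        cost = count(chunk["text"])
        if used + cost > budget:
            break  # stop at the first chunk that would overflow
        kept.append(chunk)
        used += cost
    return kept


ranked_chunks = [
    {"text": "one two three four", "source": "a.md", "score": 0.9},
    {"text": "five six seven", "source": "b.md", "score": 0.8},
    {"text": "eight nine", "source": "c.md", "score": 0.7},
]
kept = fit_to_budget(ranked_chunks, context_window=10)  # budget = 6 "tokens"
```

Stopping at the first overflow, rather than skipping ahead to smaller chunks, keeps the selection strictly in relevance order. Either policy is defensible.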
Context Compression
When retrieved chunks are too verbose, compress them before sending to the LLM. Saves tokens and improves focus.
Strategy 1: Extractive Compression (Fast, Cheap)
Keep only the most relevant sentences from each chunk.
# Import path as used in LangChain 0.x releases; newer versions may relocate it.
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI

compressor = LLMChainExtractor.from_llm(ChatOpenAI(model="gpt-4o-mini", temperature=0))

# Compresses each chunk to only the sentences relevant to the query.
# `retrieved_docs` is assumed to be a list of LangChain Document objects.
compressed_docs = compressor.compress_documents(
    documents=retrieved_docs,
    query="What is the return policy?"
)
# Example: a 500-token chunk might shrink to an 80-token extract (only relevant sentences)
Strategy 2: Abstractive Compression (Higher quality)
Summarize chunks into a dense context block.
COMPRESSION_PROMPT = """Summarize the following document in 2-3 sentences,
keeping ONLY information relevant to this question: {query}
Document:
{document}
Relevant summary:"""
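Applying the prompt is a matter of formatting it once per chunk and joining the summaries. In the sketch below, `compress_chunks` is a hypothetical helper, `llm` is any callable that maps a prompt string to a completion string, and `stub_llm` is a stand-in that just echoes the first sentence so the example runs without a model. The template is repeated so that the snippet is self-contained.

```python
from typing import Callable

COMPRESSION_PROMPT = """Summarize the following document in 2-3 sentences,
keeping ONLY information relevant to this question: {query}
Document:
{document}
Relevant summary:"""


def compress_chunks(chunks: list[str], query: str,
                    llm: Callable[[str], str]) -> str:
    """Summarize each chunk against the query and join the non-empty results."""
    summaries = []
    for chunk in chunks:
        prompt = COMPRESSION_PROMPT.format(query=query, document=chunk)
        summary = llm(prompt).strip()
        if summary:  # drop chunks the model found irrelevant
            summaries.append(summary)
    return "\n".join(summaries)


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: returns the document's first sentence."""
    document = prompt.split("Document:\n", 1)[1]
    document = document.split("\nRelevant summary:", 1)[0].strip()
    return document.split(". ")[0].rstrip(".") + "."


compressed = compress_chunks(
    [
        "Returns are accepted within 30 days. Items must be unused.",
        "Refunds go to the original payment method.",
    ],
    "What is the return policy?",
    stub_llm,
)
```

With a real model, this costs one LLM call per chunk, which is the trade-off against the extractive strategy.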
When to Compress
- Yes: Chunks > 500 tokens, or > 5 chunks retrieved.
- No: Chunks are already concise (< 200 tokens), or factual precision is critical (compression may lose details).
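These rules of thumb can be written as a small predicate. The function name `should_compress` and the precedence between overlapping rules are assumptions: here precision-critical queries always skip compression, any chunk over 500 tokens triggers it, and a large number of chunks triggers it only when they are not all already concise.

```python
def should_compress(chunk_tokens: list[int],
                    precision_critical: bool = False) -> bool:
    """Decide whether to compress, given per-chunk token counts."""
    if precision_critical or not chunk_tokens:
        return False  # compression may drop details that matter
    if any(n > 500 for n in chunk_tokens):
        return True   # at least one verbose chunk
    if all(n < 200 for n in chunk_tokens):
        return False  # already concise
    return len(chunk_tokens) > 5  # many mid-sized chunks


decisions = {
    "one long chunk": should_compress([600, 100]),
    "short chunks": should_compress([100, 150]),
    "six mid-sized chunks": should_compress([300] * 6),
}
```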
Common Pitfalls
- “Lost in the Middle”: LLMs tend to forget information in the middle of a long context window. Put the most relevant context at the beginning and end.
- Hallucination: The model answers confidently but incorrectly. Fix: Use “stick to the context” instructions and implement citations.
- Retrieval Failures: Relevant doc not retrieved. Fix: Improve chunking, use hybrid search, or query expansion.
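The hallucination fix can be made concrete with a grounded prompt plus a cheap citation check on the answer. Both helpers are illustrative assumptions: the `[doc:<id>]` citation format, the refusal sentence, and the names `build_grounded_prompt` and `citations_valid` are not prescribed by this skill. The IDs are assumed to match the `id` attributes of the XML-delimited documents.

```python
import re


def build_grounded_prompt(context: str, question: str) -> str:
    """Wrap the context with stick-to-the-context and citation instructions."""
    return (
        "Answer using ONLY the documents below. "
        "Cite the supporting document after each claim as [doc:<id>]. "
        "If the documents do not contain the answer, reply exactly: "
        '"I don\'t know based on the provided documents."\n\n'
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def citations_valid(answer: str, known_ids: set[int]) -> bool:
    """True if the answer cites at least one document and only known ones."""
    cited = {int(m) for m in re.findall(r"\[doc:(\d+)\]", answer)}
    return bool(cited) and cited <= known_ids


context = '<document id="1" source="policy.md" relevance="0.92">\nReturns: 30 days.\n</document>'
prompt = build_grounded_prompt(context, "What is the return window?")
ok = citations_valid("Returns are accepted for 30 days [doc:1].", {1})
bad = citations_valid("Returns are accepted for 90 days [doc:7].", {1})
```

The check catches only missing or invented citations. Whether a cited document actually supports the claim still needs evaluation.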
Related Skills
| Need | Skill |
|---|---|
| Vector DB selection & schema | vector-databases |
| Data parsing (PDF, Web, APIs) | data-ingestion |
| Prompt design for generation step | prompt-engineering |
| Evaluation metrics & golden sets | rag-evaluation |
| Reduce cost with caching | semantic-cache |
| Security (RBAC, prompt injection) | ai-security |
| Tracing & monitoring | ai-observability |