using-vector-databases

Install:
npx skills add https://github.com/ancoleman/ai-design-components --skill using-vector-databases
Vector Databases for AI Applications
When to Use This Skill
Use this skill when implementing:
- RAG (Retrieval-Augmented Generation) systems for AI chatbots
- Semantic search capabilities (meaning-based, not just keyword)
- Recommendation systems based on similarity
- Multi-modal AI (unified search across text, images, audio)
- Document similarity and deduplication
- Question answering over private knowledge bases
Quick Decision Framework
1. Vector Database Selection
```
START: Choosing a Vector Database

EXISTING INFRASTRUCTURE?
├─ Using PostgreSQL already?
│    └─ pgvector (<10M vectors, tight budget)
│       See: references/pgvector.md
│
└─ No existing vector database?
   │
   └─ OPERATIONAL PREFERENCE?
      │
      ├─ Zero-ops managed only
      │    └─ Pinecone (fully managed, excellent DX)
      │       See: references/pinecone.md
      │
      └─ Flexible (self-hosted or managed)
         │
         ├─ SCALE: <100M vectors + complex filtering
         │    └─ Qdrant (RECOMMENDED)
         │       • Best metadata filtering
         │       • Built-in hybrid search (BM25 + Vector)
         │       • Self-host: Docker/K8s
         │       • Managed: Qdrant Cloud
         │       See: references/qdrant.md
         │
         ├─ SCALE: >100M vectors + GPU acceleration
         │    └─ Milvus / Zilliz Cloud
         │       See: references/milvus.md
         │
         ├─ Embedded / No server
         │    └─ LanceDB (serverless, edge deployment)
         │
         └─ Local prototyping
              └─ Chroma (simple API, in-memory)
```
2. Embedding Model Selection
```
REQUIREMENTS?
├─ Best quality (cost no object)
│    └─ Voyage AI voyage-3 (1024d)
│       • 9.74% better than OpenAI on MTEB
│       • ~$0.12/1M tokens
│       See: references/embedding-strategies.md
│
├─ Enterprise reliability
│    └─ OpenAI text-embedding-3-large (3072d)
│       • Industry standard
│       • ~$0.13/1M tokens
│       • Matryoshka shortening: reduce to 256/512/1024d
│
├─ Cost-optimized
│    └─ OpenAI text-embedding-3-small (1536d)
│       • ~$0.02/1M tokens (6x cheaper)
│       • 90-95% of large model performance
│
├─ Multilingual (100+ languages)
│    └─ Cohere embed-v3 (1024d)
│       • ~$0.10/1M tokens
│
└─ Self-hosted / Privacy-critical
   ├─ English: nomic-embed-text-v1.5 (768d, Apache 2.0)
   ├─ Multilingual: BAAI/bge-m3 (1024d, MIT)
   └─ Long docs: jina-embeddings-v2 (768d, 8K context)
```
Core Concepts
Document Chunking Strategy
Recommended defaults for most RAG systems:
- Chunk size: 512 tokens (not characters)
- Overlap: 50 tokens (10% overlap)
Why these numbers?
- 512 tokens balances context vs. precision
- Too small (128-256): Fragments concepts, loses context
- Too large (1024-2048): Dilutes relevance, wastes LLM tokens
- 50-token overlap ensures sentences aren't split mid-context
See references/chunking-patterns.md for advanced strategies by content type.
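The defaults above can be sketched as a sliding window over tokens. This is a minimal illustration using whitespace "tokens" as a stand-in; a real pipeline would use the embedding model's tokenizer (e.g. tiktoken) so chunk sizes match the model's token counts.

```python
# Minimal sliding-window chunker. Whitespace-split "tokens" stand in for a
# real tokenizer; swap in model-specific tokenization in practice.
def chunk_tokens(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    tokens = text.split()
    step = chunk_size - overlap  # advance 462 tokens per chunk at the defaults
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# Each chunk shares `overlap` tokens with its predecessor.
doc = " ".join(f"tok{i}" for i in range(1200))
chunks = chunk_tokens(doc)
```

A 1200-token document yields 3 chunks at these defaults; the last 50 tokens of each chunk reappear at the start of the next.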
Hybrid Search (Vector + Keyword)
```
Hybrid Search = Vector Similarity + BM25 Keyword Matching

User Query: "OAuth refresh token implementation"
                    │
            ┌───────┴───────┐
            │               │
     Vector Search    Keyword Search
      (Semantic)         (BM25)
            │               │
      Top 20 docs     Top 20 docs
            │               │
            └───────┬───────┘
                    │
        Reciprocal Rank Fusion
          (Merge + Re-rank)
                    │
          Final Top 5 Results
```
Why hybrid matters:
- Vector captures semantic meaning ("OAuth refresh" ≈ "token renewal")
- Keyword ensures exact matches ("refresh_token" literal)
- Combined provides best retrieval quality
See references/hybrid-search.md for implementation details.
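The fusion step in the diagram above can be sketched in a few lines. Reciprocal Rank Fusion merges ranked lists using only ranks, so vector and BM25 scores never need to be on the same scale; k=60 is the constant from the original RRF paper, and the document IDs here are illustrative.

```python
# Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document,
# so documents ranked highly in BOTH lists rise to the top.
def rrf(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

vector_hits = ["d3", "d1", "d7", "d2"]   # semantic top hits
keyword_hits = ["d1", "d9", "d3", "d4"]  # BM25 top hits
fused = rrf([vector_hits, keyword_hits])
# d1 and d3 rank first: they appear near the top of both lists.
```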
Getting Started
Python + Qdrant Example
```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchValue
)

# 1. Initialize client
client = QdrantClient("localhost", port=6333)

# 2. Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

# 3. Insert documents with embeddings
points = [
    PointStruct(
        id=idx,
        vector=embedding,  # From OpenAI/Voyage/etc.
        payload={
            "text": chunk_text,
            "source": "docs/api.md",
            "section": "Authentication",
        },
    )
    for idx, (embedding, chunk_text) in enumerate(chunks)
]
client.upsert(collection_name="documents", points=points)

# 4. Search with metadata filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5,
    query_filter=Filter(
        must=[FieldCondition(key="section", match=MatchValue(value="Authentication"))]
    ),
)
```
For complete examples, see examples/qdrant-python/.
TypeScript + Qdrant Example
```typescript
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://localhost:6333' });

// Create collection
await client.createCollection('documents', {
  vectors: { size: 1024, distance: 'Cosine' }
});

// Insert documents
await client.upsert('documents', {
  points: chunks.map((chunk, idx) => ({
    id: idx,
    vector: chunk.embedding,
    payload: {
      text: chunk.text,
      source: chunk.source
    }
  }))
});

// Search
const results = await client.search('documents', {
  vector: queryEmbedding,
  limit: 5,
  filter: {
    must: [
      { key: 'source', match: { value: 'docs/api.md' } }
    ]
  }
});
```
For complete examples, see examples/typescript-rag/.
RAG Pipeline Architecture
Complete Pipeline Components
```
1. INGESTION
   ├─ Document Loading (PDF, web, code, Office)
   ├─ Text Extraction & Cleaning
   ├─ Chunking (semantic, recursive, code-aware)
   └─ Embedding Generation (batch, rate-limited)

2. INDEXING
   ├─ Vector Store Insertion (batch upsert)
   ├─ Index Configuration (HNSW, distance metric)
   └─ Keyword Index (BM25 for hybrid search)

3. RETRIEVAL (Query Time)
   ├─ Query Processing (expansion, embedding)
   ├─ Hybrid Search (vector + keyword)
   ├─ Filtering & Post-Processing (metadata, MMR)
   └─ Re-Ranking (cross-encoder, LLM-based)

4. GENERATION
   ├─ Context Construction (format chunks, citations)
   ├─ Prompt Engineering (system + context + query)
   ├─ LLM Inference (streaming, temperature tuning)
   └─ Response Post-Processing (citations, validation)

5. EVALUATION (Production Critical)
   ├─ Retrieval Metrics (precision, recall, relevancy)
   ├─ Generation Metrics (faithfulness, correctness)
   └─ System Metrics (latency, cost, satisfaction)
```
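The generation stage's context-construction step can be sketched as follows. This is a hypothetical helper, not part of this skill's examples: chunk numbering lets the model emit `[n]` citations that map back to sources, and the prompt wording here is purely illustrative.

```python
# Hypothetical context construction: number retrieved chunks so the model
# can cite them, and prepend source paths for traceability.
def build_prompt(query: str, chunks: list[dict]) -> str:
    context = "\n\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

prompt = build_prompt(
    "How do I refresh OAuth tokens?",
    [{"source": "docs/api.md", "text": "POST /token with grant_type=refresh_token."}],
)
```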
Essential Metadata for Production RAG
Critical for filtering and relevance:
```python
metadata = {
    # SOURCE TRACKING
    "source": "docs/api-reference.md",
    "source_type": "documentation",  # code, docs, logs, chat
    "last_updated": "2025-12-01T12:00:00Z",

    # HIERARCHICAL CONTEXT
    "section": "Authentication",
    "subsection": "OAuth 2.1",
    "heading_hierarchy": ["API Reference", "Authentication", "OAuth 2.1"],

    # CONTENT CLASSIFICATION
    "content_type": "code_example",  # prose, code, table, list
    "programming_language": "python",

    # FILTERING DIMENSIONS
    "product_version": "v2.0",
    "audience": "enterprise",  # free, pro, enterprise

    # RETRIEVAL HINTS
    "chunk_index": 3,
    "total_chunks": 12,
    "has_code": True,
}
```
Why metadata matters:
- Enables filtering BEFORE vector search (reduces search space)
- Improves relevance through targeted retrieval
- Supports multi-tenant systems (filter by user/org)
- Enables versioned documentation (filter by product version)
Evaluation with RAGAS
Use scripts/evaluate_rag.py for automated evaluation:
```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,       # Answer grounded in context
    answer_relevancy,   # Answer addresses query
    context_recall,     # Retrieved docs cover ground truth
    context_precision,  # Retrieved docs are relevant
)

# Test dataset
test_data = {
    "question": ["How do I refresh OAuth tokens?"],
    "answer": ["Use /token with refresh_token grant..."],
    "contexts": [["OAuth refresh documentation..."]],
    "ground_truth": ["POST to /token with grant_type=refresh_token"],
}

# Evaluate (ragas expects a Hugging Face Dataset, not a plain dict)
results = evaluate(Dataset.from_dict(test_data), metrics=[
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
])

# Production targets:
#   faithfulness:      >0.90 (minimal hallucination)
#   answer_relevancy:  >0.85 (addresses user query)
#   context_recall:    >0.80 (sufficient context retrieved)
#   context_precision: >0.75 (minimal noise)
```
Performance Optimization
Embedding Generation
- Batch processing: 100-500 chunks per batch
- Caching: Cache embeddings by content hash
- Rate limiting: Respect API provider limits (exponential backoff)
Vector Search
- Index type: HNSW (Hierarchical Navigable Small World) for most cases
- Distance metric: Cosine for normalized embeddings
- Pre-filtering: Apply metadata filters before vector search
- Result diversity: Use MMR (Maximal Marginal Relevance) to reduce redundancy
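The MMR step above can be sketched as a greedy loop: at each step, pick the candidate that balances relevance to the query against similarity to results already selected. This is a minimal stdlib illustration with toy 2-d vectors, not a library implementation.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Maximal Marginal Relevance: lambda_=1.0 is pure relevance,
# lower values penalize redundancy with already-selected docs.
def mmr(query: list[float], docs: list[list[float]],
        k: int = 2, lambda_: float = 0.5) -> list[int]:
    selected: list[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_ * relevance - (1 - lambda_) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two duplicate relevant docs plus one distinct doc: after taking the first
# duplicate, MMR prefers the distinct doc over the exact copy.
picked = mmr([1.0, 0.2], [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]], k=2)
```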
Cost Optimization
- Embedding model: Consider text-embedding-3-small for budget constraints
- Dimension reduction: Use Matryoshka shortening (3072d → 1024d)
- Caching: Implement semantic caching for repeated queries
- Batch operations: Group insertions/updates for efficiency
Common Workflows
1. Building a RAG Chatbot
- Vector database: Qdrant (self-hosted or cloud)
- Embeddings: OpenAI text-embedding-3-large
- Chunking: 512 tokens, 50 overlap, semantic splitter
- Search: Hybrid (vector + BM25)
- Integration: Frontend with ai-chat skill
See examples/qdrant-python/ for complete implementation.
2. Semantic Search Engine
- Vector database: Qdrant or Pinecone
- Embeddings: Voyage AI voyage-3 (best quality)
- Chunking: Content-type specific (see chunking-patterns.md)
- Search: Hybrid with re-ranking
- Filtering: Pre-filter by metadata (date, category, etc.)
3. Code Search
- Vector database: Qdrant
- Embeddings: OpenAI text-embedding-3-large
- Chunking: AST-based (function/class boundaries)
- Metadata: Language, file path, imports
- Search: Hybrid with language filtering
See examples/qdrant-python/ for code-specific implementation.
Integration with Other Skills
Frontend Skills
- ai-chat: Vector DB powers RAG pipeline behind chat interface
- search-filter: Replace keyword search with semantic search
- data-viz: Visualize embedding spaces, similarity scores
Backend Skills
- databases-relational: Hybrid approach using pgvector extension
- api-patterns: Expose semantic search via REST/GraphQL
- observability: Monitor embedding quality and retrieval metrics
Multi-Language Support
Python (Primary)
- Client: qdrant-client
- Frameworks: LangChain, LlamaIndex
- See: examples/qdrant-python/
Rust
- Client: qdrant-client (1,549 code snippets in Context7)
- Framework: raw Rust for performance-critical systems
- See: examples/rust-axum-vector/
TypeScript
- Client: @qdrant/js-client-rest
- Frameworks: LangChain.js, integration with Next.js
- See: examples/typescript-rag/
Go
- Client: qdrant-go
- Use case: high-performance microservices
Troubleshooting
Poor Retrieval Quality
- Check chunking strategy (too large/small?)
- Verify metadata filtering (too restrictive?)
- Try hybrid search instead of vector-only
- Implement re-ranking stage
- Evaluate with RAGAS metrics
Slow Performance
- Use HNSW index (not Flat)
- Pre-filter with metadata before vector search
- Reduce vector dimensions (Matryoshka shortening)
- Batch operations (insertions, searches)
- Consider GPU acceleration (Milvus)
High Costs
- Switch to text-embedding-3-small
- Implement semantic caching
- Reduce chunk overlap
- Use self-hosted embeddings (nomic, bge-m3)
- Batch embedding generation
Qdrant Context7 Documentation
Primary resource: /llmstxt/qdrant_tech_llms-full_txt
- Trust score: High
- Code snippets: 10,154
- Quality score: 83.1
Access via Context7:
```
resolve-library-id({ libraryName: "Qdrant" })

get-library-docs({
  context7CompatibleLibraryID: "/llmstxt/qdrant_tech_llms-full_txt",
  topic: "hybrid search collections python",
  mode: "code"
})
```
Additional Resources
Reference Documentation
- references/qdrant.md – Comprehensive Qdrant guide
- references/pgvector.md – PostgreSQL pgvector extension
- references/milvus.md – Milvus/Zilliz for billion-scale
- references/embedding-strategies.md – Embedding model comparison
- references/chunking-patterns.md – Advanced chunking techniques
Code Examples
- examples/qdrant-python/ – FastAPI + Qdrant RAG pipeline
- examples/pgvector-prisma/ – PostgreSQL + Prisma integration
- examples/typescript-rag/ – TypeScript RAG with Hono
Automation Scripts
- scripts/generate_embeddings.py – Batch embedding generation
- scripts/benchmark_similarity.py – Performance benchmarking
- scripts/evaluate_rag.py – RAGAS-based evaluation
Next Steps:
- Choose vector database based on scale and infrastructure
- Select embedding model based on quality vs. cost trade-off
- Implement chunking strategy for the content type
- Set up hybrid search for production quality
- Evaluate with RAGAS metrics
- Optimize for performance and cost