vector-databases
Install command
npx skills add https://github.com/melodic-software/claude-code-plugins --skill vector-databases
Skill Documentation
Vector Databases
When to Use This Skill
Use this skill when:
- Choosing between vector database options
- Designing semantic/similarity search systems
- Optimizing vector search performance
- Understanding ANN algorithm trade-offs
- Scaling vector search infrastructure
- Implementing hybrid search (vectors + filters)
Keywords: vector database, embeddings, vector search, similarity search, ANN, approximate nearest neighbor, HNSW, IVF, FAISS, Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, cosine similarity, semantic search
Vector Database Comparison
Managed Services
| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Pinecone | Fully managed, easy scaling, enterprise | Vendor lock-in, cost at scale | Enterprise production |
| Weaviate Cloud | GraphQL, hybrid search, modules | Complexity | Knowledge graphs |
| Zilliz Cloud | Milvus-based, high performance | Learning curve | High-scale production |
| MongoDB Atlas Vector | Existing MongoDB users | Newer feature | MongoDB shops |
| Elastic Vector | Existing Elastic stack | Resource heavy | Search platforms |
Self-Hosted Options
| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Milvus | Feature-rich, scalable, GPU support | Operational complexity | Large-scale production |
| Qdrant | Rust performance, filtering, easy | Smaller ecosystem | Performance-focused |
| Weaviate | Modules, semantic, hybrid | Memory usage | Knowledge applications |
| Chroma | Simple, Python-native | Limited scale | Development, prototyping |
| pgvector | PostgreSQL extension | Performance limits | Postgres shops |
| FAISS | Library, not DB, fastest | No persistence, no filtering | Research, embedded |
Selection Decision Tree
Need managed, don't want to run operations?
├── Yes → Pinecone (simplest) or Weaviate Cloud
└── No (self-hosted)
    └── Already using PostgreSQL?
        ├── Yes, <1M vectors → pgvector
        └── No
            └── Need maximum performance at scale?
                ├── Yes → Milvus or Qdrant
                └── No
                    └── Prototyping/development?
                        ├── Yes → Chroma
                        └── No → Qdrant (balanced choice)
ANN Algorithms
Algorithm Overview
Exact KNN:
- Searches ALL vectors
- O(n) time complexity
- Perfect accuracy
- Impractical at scale

Approximate NN (ANN):
- Searches a SUBSET of vectors
- O(log n) to O(1) complexity
- Near-perfect accuracy
- Practical at any scale
HNSW (Hierarchical Navigable Small World)
Layer 3:  o───────────────────────o        (sparse, long connections)
Layer 2:  o───────o───────────o───o        (medium density)
Layer 1:  o───o───o───o───o───o───o        (denser)
Layer 0:  o─o─o─o─o─o─o─o─o─o─o─o─o        (all vectors)

Search: start at the top layer, greedily descend
- Fast: O(log n) search time
- High recall: typically >95%
- Memory: extra graph storage
HNSW Parameters:

| Parameter | Description | Trade-off |
|---|---|---|
| `M` | Connections per node | Memory vs. recall |
| `ef_construction` | Build-time search width | Build time vs. recall |
| `ef_search` | Query-time search width | Latency vs. recall |
IVF (Inverted File Index)
Clustering Phase:
┌──────────────────────────────────────────┐
│  Cluster vectors into K centroids        │
│                                          │
│     *         *         *         *      │
│    /|\       /|\       /|\       /|\     │
│  Cluster 1 Cluster 2 Cluster 3 Cluster 4 │
└──────────────────────────────────────────┘
Search Phase:
1. Find nprobe nearest centroids
2. Search only those clusters
3. Much faster than exhaustive
IVF Parameters:

| Parameter | Description | Trade-off |
|---|---|---|
| `nlist` | Number of clusters | Build time vs. search quality |
| `nprobe` | Clusters to search | Latency vs. recall |
IVF-PQ (Product Quantization)
Original vector (128 dims):
[0.1, 0.2, ..., 0.9]  (128 × 4 bytes = 512 bytes)

PQ compressed (8 subvectors, 8-bit codes):
[23, 45, 12, 89, 56, 34, 78, 90]  (8 bytes)

Memory reduction: 64x
Accuracy trade-off: ~5% recall drop
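The 64x figure follows directly from the byte counts; a quick arithmetic check:

```python
def pq_compression_ratio(dims: int, m_subvectors: int, bits: int = 8) -> float:
    """Raw float32 bytes per vector divided by PQ code bytes per vector."""
    raw_bytes = dims * 4                  # float32 storage
    code_bytes = m_subvectors * bits / 8  # one small code per subvector
    return raw_bytes / code_bytes

# 128-dim vector, 8 subvectors with 8-bit codes: 512 bytes -> 8 bytes
assert pq_compression_ratio(128, 8) == 64.0
```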
Algorithm Comparison
| Algorithm | Search Speed | Memory | Build Time | Recall |
|---|---|---|---|---|
| Flat/Brute | Slow (O(n)) | Low | None | 100% |
| IVF | Fast | Low | Medium | 90-95% |
| IVF-PQ | Very fast | Very low | Medium | 85-92% |
| HNSW | Very fast | High | Slow | 95-99% |
| HNSW+PQ | Very fast | Medium | Slow | 90-95% |
When to Use Which
< 100K vectors:
└── Flat index (exact search is fast enough)

100K - 1M vectors:
└── HNSW (best recall/speed trade-off)

1M - 100M vectors:
├── Memory available → HNSW
└── Memory constrained → IVF-PQ or HNSW+PQ

> 100M vectors:
└── Sharded IVF-PQ or distributed HNSW
Distance Metrics
Common Metrics
| Metric | Formula | Range | Best For |
|---|---|---|---|
| Cosine Similarity | A·B / (‖A‖ ‖B‖) | [-1, 1] | Normalized embeddings |
| Dot Product | A·B | (-∞, ∞) | When magnitude matters |
| Euclidean (L2) | √Σ(Aᵢ−Bᵢ)² | [0, ∞) | Absolute distances |
| Manhattan (L1) | Σ\|Aᵢ−Bᵢ\| | [0, ∞) | High-dimensional sparse |
Metric Selection
Embeddings pre-normalized (unit vectors)?
├── Yes → Cosine = Dot Product (use Dot, it's faster)
└── No
    └── Magnitude meaningful?
        ├── Yes → Dot Product
        └── No → Cosine Similarity

Note: most embedding models output normalized vectors,
so dot product is usually the best choice.
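The note above can be verified in a few lines: once vectors are normalized to unit length, the dot product and cosine similarity are the same number, so the cheaper operation wins.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)

cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a_unit = a / np.linalg.norm(a)        # pre-normalize once at index time...
b_unit = b / np.linalg.norm(b)
dot = float(a_unit @ b_unit)          # ...then a plain dot product suffices

assert np.isclose(cosine, dot)
```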
Filtering and Hybrid Search
Pre-filtering vs Post-filtering
Pre-filtering (Filter → Search):
┌──────────────────────────────────────────┐
│ 1. Apply metadata filter                 │
│    (category = "electronics")            │
│    Result: 10K of 1M vectors             │
│                                          │
│ 2. Vector search on 10K vectors          │
│    Much faster, guaranteed filter match  │
└──────────────────────────────────────────┘

Post-filtering (Search → Filter):
┌──────────────────────────────────────────┐
│ 1. Vector search on 1M vectors           │
│    Return top-1000                       │
│                                          │
│ 2. Apply metadata filter                 │
│    May return < K results!               │
└──────────────────────────────────────────┘
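A minimal NumPy sketch of pre-filtering over a toy corpus (the `category` column and sizes are invented for illustration): the metadata filter shrinks the candidate set before any similarity is computed, so every result is guaranteed to match.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100_000, 32)).astype(np.float32)
category = rng.integers(0, 10, size=100_000)          # toy metadata column

def prefiltered_search(query: np.ndarray, want: int, k: int = 10) -> np.ndarray:
    candidate_ids = np.flatnonzero(category == want)  # 1. metadata filter first
    scores = vectors[candidate_ids] @ query           # 2. score only survivors
    return candidate_ids[np.argsort(-scores)[:k]]

hits = prefiltered_search(vectors[0], want=3)
assert all(category[h] == 3 for h in hits)            # filter match guaranteed
```

Post-filtering would instead score all 100K vectors, keep a large top-N, and discard non-matching rows afterward, possibly returning fewer than k results.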
Hybrid Search Architecture
Query: "wireless headphones under $100"
              │
       ┌──────┴──────┐
       ▼             ▼
┌─────────────┐ ┌─────────────┐
│   Vector    │ │   Filter    │
│   Search    │ │   Build     │
│ "wireless   │ │ price       │
│ headphones" │ │ < 100       │
└─────────────┘ └─────────────┘
       │             │
       └──────┬──────┘
              ▼
       ┌─────────────┐
       │   Combine   │
       │   Results   │
       └─────────────┘
Metadata Index Design
| Metadata Type | Index Strategy | Query Example |
|---|---|---|
| Categorical | Bitmap/hash index | category = "books" |
| Numeric range | B-tree | price BETWEEN 10 AND 50 |
| Keyword search | Inverted index | tags CONTAINS "sale" |
| Geospatial | R-tree/geohash | location NEAR (lat, lng) |
Scaling Strategies
Sharding Approaches
Naive Sharding (by ID):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ IDs 0-N │ │IDs N-2N │ │IDs 2N-3N│
└─────────┘ └─────────┘ └─────────┘
Query → Search ALL shards → Merge results

Semantic Sharding (by cluster):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│  Tech   │ │ Health  │ │ Finance │
│  docs   │ │  docs   │ │  docs   │
└─────────┘ └─────────┘ └─────────┘
Query → Route to relevant shard(s) → Faster!
Replication
┌──────────────────────────────────────────┐
│              Load Balancer               │
└──────────────────────────────────────────┘
      │             │             │
      ▼             ▼             ▼
┌─────────┐   ┌─────────┐   ┌─────────┐
│Replica 1│   │Replica 2│   │Replica 3│
│ (Read)  │   │ (Read)  │   │ (Read)  │
└─────────┘   └─────────┘   └─────────┘
      │             │             │
      └─────────────┼─────────────┘
                    │
              ┌─────────┐
              │ Primary │
              │ (Write) │
              └─────────┘
Scaling Decision Matrix
| Scale (vectors) | Architecture | Replication |
|---|---|---|
| < 1M | Single node | Optional |
| 1-10M | Single node, more RAM | For HA |
| 10-100M | Sharded, few nodes | Required |
| 100M-1B | Sharded, many nodes | Required |
| > 1B | Sharded + tiered | Required |
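The naive-sharding query path above can be sketched in a few lines: fan the query out to every shard, take each shard's local top-k, and re-rank the merged candidates globally (shard sizes and the unit-vector scoring are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_shard(n: int, d: int = 32) -> np.ndarray:
    v = rng.normal(size=(n, d)).astype(np.float32)
    return v / np.linalg.norm(v, axis=1, keepdims=True)   # unit vectors

shards = [make_shard(5_000) for _ in range(3)]
offsets = [0, 5_000, 10_000]          # global ID of each shard's first vector

def sharded_search(query: np.ndarray, k: int = 10) -> list[int]:
    merged = []
    for base, shard in zip(offsets, shards):
        scores = shard @ query                      # local search per shard
        for i in np.argsort(-scores)[:k]:
            merged.append((float(scores[i]), base + int(i)))
    merged.sort(reverse=True)                       # global merge of 3*k candidates
    return [gid for _, gid in merged[:k]]

results = sharded_search(shards[1][7])              # lives at global ID 5_007
assert results[0] == 5_007
```

Semantic sharding changes only the fan-out step: the query is routed to the shard(s) whose centroid it is closest to, instead of all of them.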
Performance Optimization
Index Build Optimization
| Optimization | Description | Impact |
|---|---|---|
| Batch insertion | Insert in batches of 1K-10K | 10x faster |
| Parallel build | Multi-threaded index construction | 2-4x faster |
| Incremental index | Add to existing index | Avoids rebuild |
| GPU acceleration | Use GPU for training (IVF) | 10-100x faster |
Query Optimization
| Optimization | Description | Impact |
|---|---|---|
| Warm cache | Keep index in memory | 10x latency reduction |
| Query batching | Batch similar queries | Higher throughput |
| Reduce dimensions | PCA, random projection | 2-4x faster |
| Early termination | Stop when “good enough” | Lower latency |
Memory Optimization
Memory per vector:
┌──────────────────────────────────────────┐
│ 1536 dims × 4 bytes ≈ 6KB per vector     │
│                                          │
│ 1M vectors:                              │
│   Raw: ~6GB                              │
│   + HNSW graph: +2-4GB (M-dependent)     │
│   = 8-10GB total                         │
│                                          │
│ With PQ (64 subquantizers):              │
│   1M vectors: ~64MB                      │
│   = ~100x reduction                      │
└──────────────────────────────────────────┘
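The box's arithmetic, as a small helper (float32 vectors assumed; engine-specific index overhead is excluded here):

```python
def raw_vector_gb(n_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw vector storage only; HNSW graph / index overhead comes on top."""
    return n_vectors * dims * bytes_per_dim / 1e9

raw = raw_vector_gb(1_000_000, 1536)     # the scenario above
assert round(raw, 1) == 6.1              # ~6GB of raw float32 vectors

pq = 1_000_000 * 64 / 1e9                # 64 one-byte PQ codes per vector
assert round(raw / pq) == 96             # roughly the ~100x reduction quoted
```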
Operational Considerations
Backup and Recovery
| Strategy | Description | RPO/RTO |
|---|---|---|
| Snapshots | Periodic full backup | Hours |
| WAL replication | Write-ahead log streaming | Minutes |
| Real-time sync | Synchronous replication | Seconds |
Monitoring Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| Query latency p99 | 99th percentile latency | > 100ms |
| Recall | Search accuracy | < 90% |
| QPS | Queries per second | Capacity dependent |
| Memory usage | Index memory | > 80% |
| Index freshness | Time since last update | Domain dependent |
Index Maintenance
┌──────────────────────────────────────────┐
│ Index Maintenance Tasks                  │
├──────────────────────────────────────────┤
│ - Compaction: merge small segments       │
│ - Reindex: rebuild a degraded index      │
│ - Vacuum: remove deleted vectors         │
│ - Optimize: tune parameters              │
│                                          │
│ Schedule during low-traffic periods      │
└──────────────────────────────────────────┘
Common Patterns
Multi-Tenant Vector Search
Option 1: Namespace/collection per tenant
┌──────────────────────────────────────────┐
│ tenant_1_collection                      │
│ tenant_2_collection                      │
│ tenant_3_collection                      │
└──────────────────────────────────────────┘
Pro: Complete isolation
Con: Many indexes, operational overhead

Option 2: Single collection + tenant filter
┌──────────────────────────────────────────┐
│ shared_collection                        │
│ metadata: { tenant_id: "..." }           │
│ Pre-filter by tenant_id                  │
└──────────────────────────────────────────┘
Pro: Simpler operations
Con: Requires efficient filtering
Real-Time Updates
Write Path:
┌─────────┐    ┌─────────┐    ┌─────────┐
│  Write  │───▶│ Buffer  │───▶│  Merge  │
│ Request │    │(Memory) │    │to Index │
└─────────┘    └─────────┘    └─────────┘
Strategy:
1. Buffer writes in memory
2. Periodically merge to main index
3. Search: main index + buffer
4. Compact periodically
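The four steps above can be sketched as a tiny buffered index (NumPy brute force stands in for the real main index; the class name and merge threshold are invented):

```python
import numpy as np

class BufferedIndex:
    """Buffer writes in memory; merge periodically; search main + buffer."""

    def __init__(self, dims: int, merge_threshold: int = 1_000):
        self.main = np.empty((0, dims), dtype=np.float32)
        self.buffer: list[np.ndarray] = []
        self.merge_threshold = merge_threshold

    def add(self, vec: np.ndarray) -> None:
        self.buffer.append(vec.astype(np.float32))
        if len(self.buffer) >= self.merge_threshold:
            self.merge()                          # periodic compaction

    def merge(self) -> None:
        self.main = np.vstack([self.main, *self.buffer])
        self.buffer.clear()

    def search(self, query: np.ndarray, k: int = 5) -> np.ndarray:
        # New writes are searchable immediately, even before a merge.
        vecs = np.vstack([self.main, *self.buffer]) if self.buffer else self.main
        return np.argsort(-(vecs @ query))[:k]

idx = BufferedIndex(dims=4, merge_threshold=3)
idx.add(np.array([1.0, 0, 0, 0]))
idx.add(np.array([0, 1.0, 0, 0]))
assert idx.search(np.array([0, 1.0, 0, 0], dtype=np.float32), k=1)[0] == 1
```

In a production engine the merge step rebuilds or extends an ANN index rather than concatenating arrays, but the read path (main index plus buffer) is the same.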
Embedding Versioning
Version 1 embeddings ──┐
                       │
Version 2 embeddings ──┼──▶ Parallel indexes during migration
                       │
                       │    ┌─────────────────────┐
                       └──▶ │ Gradual reindexing  │
                            │ Blue-green switch   │
                            └─────────────────────┘
Cost Estimation
Storage Costs
Cost = (vectors × dimensions × bytes per dim × replicas), converted to GB, × $/GB/month

Example:
10M vectors × 1536 dims × 4 bytes × 3 replicas ≈ 184 GB
At $0.10/GB/month ≈ $18.40/month storage

Note: serving from memory costs considerably more than disk storage.
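The worked example, as a reusable helper (float32 vectors and the quoted $0.10/GB/month rate assumed):

```python
def storage_cost(n_vectors: int, dims: int, replicas: int = 3,
                 price_per_gb_month: float = 0.10) -> tuple[float, float]:
    """Returns (GB stored, $/month) for float32 vectors with replication."""
    gb = n_vectors * dims * 4 * replicas / 1e9
    return gb, gb * price_per_gb_month

gb, dollars = storage_cost(10_000_000, 1536)
assert round(gb) == 184                 # matches the example's ~184 GB
assert round(dollars, 2) == 18.43       # the text's $18.40 rounds GB first
```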
Compute Costs
Factors:
- QPS (queries per second)
- Latency requirements
- Index type (HNSW needs more RAM)
- Filtering complexity

Rules of thumb:
- 1M vectors, HNSW, <50ms latency: 16GB RAM
- 10M vectors, HNSW, <50ms latency: 64-128GB RAM
- 100M vectors: distributed system required
Related Skills
- rag-architecture – Using vector databases in RAG systems
- llm-serving-patterns – LLM inference with vector retrieval
- ml-system-design – End-to-end ML pipeline design
- estimation-techniques – Capacity planning for vector systems
Version History
- v1.0.0 (2025-12-26): Initial release – Vector database patterns for systems design
Last Updated
Date: 2025-12-26