vector-databases
Install command
npx skills add https://github.com/melodic-software/claude-code-plugins --skill vector-databases
Skill Documentation
Vector Databases
When to Use This Skill
Use this skill when:
- Choosing between vector database options
- Designing semantic/similarity search systems
- Optimizing vector search performance
- Understanding ANN algorithm trade-offs
- Scaling vector search infrastructure
- Implementing hybrid search (vectors + filters)
Keywords: vector database, embeddings, vector search, similarity search, ANN, approximate nearest neighbor, HNSW, IVF, FAISS, Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, cosine similarity, semantic search
Vector Database Comparison
Managed Services
| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Pinecone | Fully managed, easy scaling, enterprise | Vendor lock-in, cost at scale | Enterprise production |
| Weaviate Cloud | GraphQL, hybrid search, modules | Complexity | Knowledge graphs |
| Zilliz Cloud | Milvus-based, high performance | Learning curve | High-scale production |
| MongoDB Atlas Vector | Existing MongoDB users | Newer feature | MongoDB shops |
| Elastic Vector | Existing Elastic stack | Resource heavy | Search platforms |
Self-Hosted Options
| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Milvus | Feature-rich, scalable, GPU support | Operational complexity | Large-scale production |
| Qdrant | Rust performance, filtering, easy | Smaller ecosystem | Performance-focused |
| Weaviate | Modules, semantic, hybrid | Memory usage | Knowledge applications |
| Chroma | Simple, Python-native | Limited scale | Development, prototyping |
| pgvector | PostgreSQL extension | Performance limits | Postgres shops |
| FAISS | Library, not DB, fastest | No persistence, no filtering | Research, embedded |
Selection Decision Tree
Need managed, don't want to run operations?
├── Yes → Pinecone (simplest) or Weaviate Cloud
└── No (self-hosted)
    └── Already using PostgreSQL?
        ├── Yes, <1M vectors → pgvector
        └── No
            └── Need maximum performance at scale?
                ├── Yes → Milvus or Qdrant
                └── No
                    └── Prototyping/development?
                        ├── Yes → Chroma
                        └── No → Qdrant (balanced choice)
ANN Algorithms
Algorithm Overview
Exact KNN:
- Searches ALL vectors
- O(n) time complexity
- Perfect accuracy
- Impractical at scale

Approximate NN (ANN):
- Searches a SUBSET of vectors
- O(log n) to O(1) complexity
- Near-perfect accuracy
- Practical at any scale
HNSW (Hierarchical Navigable Small World)
Layer 3:  o───────────────────────o        (sparse, long connections)
Layer 2:  o───────o───────────o───o        (medium density)
Layer 1:  o───o───o───o───o───o───o        (denser)
Layer 0:  o─o─o─o─o─o─o─o─o─o─o─o─o        (all vectors)

Search: start at the top layer, greedily descend
- Fast: O(log n) search time
- High recall: typically >95%
- Memory: extra graph storage
HNSW Parameters:

| Parameter | Description | Trade-off |
|---|---|---|
| `M` | Connections per node | Memory vs. recall |
| `ef_construction` | Build-time search width | Build time vs. recall |
| `ef_search` | Query-time search width | Latency vs. recall |
IVF (Inverted File Index)
Clustering Phase:
┌──────────────────────────────────────────┐
│  Cluster vectors into K centroids        │
│                                          │
│     *         *         *         *      │
│    /|\       /|\       /|\       /|\     │
│  Cluster 1 Cluster 2 Cluster 3 Cluster 4 │
└──────────────────────────────────────────┘
Search Phase:
1. Find nprobe nearest centroids
2. Search only those clusters
3. Much faster than exhaustive
IVF Parameters:

| Parameter | Description | Trade-off |
|---|---|---|
| `nlist` | Number of clusters | Build time vs. search quality |
| `nprobe` | Clusters to search | Latency vs. recall |
IVF-PQ (Product Quantization)
Original vector (128 dims):
[0.1, 0.2, ..., 0.9]  (128 × 4 bytes = 512 bytes)

PQ compressed (8 subvectors, 8-bit codes):
[23, 45, 12, 89, 56, 34, 78, 90]  (8 bytes)

Memory reduction: 64x
Accuracy trade-off: ~5% recall drop
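The 64x figure follows directly from the byte counts; a quick arithmetic check:

```python
def pq_compression_ratio(dims: int, m_subvectors: int, bits: int = 8) -> float:
    """Raw float32 bytes per vector divided by PQ code bytes per vector."""
    raw_bytes = dims * 4                  # float32 storage
    code_bytes = m_subvectors * bits / 8  # one small code per subvector
    return raw_bytes / code_bytes

# 128-dim vector, 8 subvectors with 8-bit codes: 512 bytes -> 8 bytes
assert pq_compression_ratio(128, 8) == 64.0
```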
Algorithm Comparison
| Algorithm | Search Speed | Memory | Build Time | Recall |
|---|---|---|---|---|
| Flat/Brute | Slow (O(n)) | Low | None | 100% |
| IVF | Fast | Low | Medium | 90-95% |
| IVF-PQ | Very fast | Very low | Medium | 85-92% |
| HNSW | Very fast | High | Slow | 95-99% |
| HNSW+PQ | Very fast | Medium | Slow | 90-95% |
When to Use Which
< 100K vectors:
└── Flat index (exact search is fast enough)

100K - 1M vectors:
└── HNSW (best recall/speed trade-off)

1M - 100M vectors:
├── Memory available → HNSW
└── Memory constrained → IVF-PQ or HNSW+PQ

> 100M vectors:
└── Sharded IVF-PQ or distributed HNSW
Distance Metrics
Common Metrics
| Metric | Formula | Range | Best For |
|---|---|---|---|
| Cosine Similarity | A·B / (‖A‖ ‖B‖) | [-1, 1] | Normalized embeddings |
| Dot Product | A·B | (-∞, ∞) | When magnitude matters |
| Euclidean (L2) | √Σ(Aᵢ−Bᵢ)² | [0, ∞) | Absolute distances |
| Manhattan (L1) | Σ\|Aᵢ−Bᵢ\| | [0, ∞) | High-dimensional sparse |
Metric Selection
Embeddings pre-normalized (unit vectors)?
├── Yes → Cosine = Dot Product (use Dot, it's faster)
└── No
    └── Magnitude meaningful?
        ├── Yes → Dot Product
        └── No → Cosine Similarity

Note: most embedding models output normalized vectors,
so dot product is usually the best choice.
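The note above can be verified in a few lines: once vectors are normalized to unit length, the dot product and cosine similarity are the same number, so the cheaper operation wins.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)

cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a_unit = a / np.linalg.norm(a)        # pre-normalize once at index time...
b_unit = b / np.linalg.norm(b)
dot = float(a_unit @ b_unit)          # ...then a plain dot product suffices

assert np.isclose(cosine, dot)
```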
Filtering and Hybrid Search
Pre-filtering vs Post-filtering
Pre-filtering (Filter → Search):
┌──────────────────────────────────────────┐
│ 1. Apply metadata filter                 │
│    (category = "electronics")            │
│    Result: 10K of 1M vectors             │
│                                          │
│ 2. Vector search on 10K vectors          │
│    Much faster, guaranteed filter match  │
└──────────────────────────────────────────┘

Post-filtering (Search → Filter):
┌──────────────────────────────────────────┐
│ 1. Vector search on 1M vectors           │
│    Return top-1000                       │
│                                          │
│ 2. Apply metadata filter                 │
│    May return < K results!               │
└──────────────────────────────────────────┘
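A minimal NumPy sketch of pre-filtering over a toy corpus (the `category` column and sizes are invented for illustration): the metadata filter shrinks the candidate set before any similarity is computed, so every result is guaranteed to match.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100_000, 32)).astype(np.float32)
category = rng.integers(0, 10, size=100_000)          # toy metadata column

def prefiltered_search(query: np.ndarray, want: int, k: int = 10) -> np.ndarray:
    candidate_ids = np.flatnonzero(category == want)  # 1. metadata filter first
    scores = vectors[candidate_ids] @ query           # 2. score only survivors
    return candidate_ids[np.argsort(-scores)[:k]]

hits = prefiltered_search(vectors[0], want=3)
assert all(category[h] == 3 for h in hits)            # filter match guaranteed
```

Post-filtering would instead score all 100K vectors, keep a large top-N, and discard non-matching rows afterward, possibly returning fewer than k results.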
Hybrid Search Architecture
Query: "wireless headphones under $100"
              │
       ┌──────┴──────┐
       ▼             ▼
┌─────────────┐ ┌─────────────┐
│   Vector    │ │   Filter    │
│   Search    │ │   Build     │
│ "wireless   │ │ price       │
│ headphones" │ │ < 100       │
└─────────────┘ └─────────────┘
       │             │
       └──────┬──────┘
              ▼
       ┌─────────────┐
       │   Combine   │
       │   Results   │
       └─────────────┘
Metadata Index Design
| Metadata Type | Index Strategy | Query Example |
|---|---|---|
| Categorical | Bitmap/hash index | category = "books" |
| Numeric range | B-tree | price BETWEEN 10 AND 50 |
| Keyword search | Inverted index | tags CONTAINS "sale" |
| Geospatial | R-tree/geohash | location NEAR (lat, lng) |
Scaling Strategies
Sharding Approaches
Naive Sharding (by ID):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ IDs 0-N │ │IDs N-2N │ │IDs 2N-3N│
└─────────┘ └─────────┘ └─────────┘
Query → Search ALL shards → Merge results

Semantic Sharding (by cluster):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│  Tech   │ │ Health  │ │ Finance │
│  docs   │ │  docs   │ │  docs   │
└─────────┘ └─────────┘ └─────────┘
Query → Route to relevant shard(s) → Faster!
Replication
┌──────────────────────────────────────────┐
│              Load Balancer               │
└──────────────────────────────────────────┘
      │             │             │
      ▼             ▼             ▼
┌─────────┐   ┌─────────┐   ┌─────────┐
│Replica 1│   │Replica 2│   │Replica 3│
│ (Read)  │   │ (Read)  │   │ (Read)  │
└─────────┘   └─────────┘   └─────────┘
      │             │             │
      └─────────────┼─────────────┘
                    │
              ┌─────────┐
              │ Primary │
              │ (Write) │
              └─────────┘
Scaling Decision Matrix
| Scale (vectors) | Architecture | Replication |
|---|---|---|
| < 1M | Single node | Optional |
| 1-10M | Single node, more RAM | For HA |
| 10-100M | Sharded, few nodes | Required |
| 100M-1B | Sharded, many nodes | Required |
| > 1B | Sharded + tiered | Required |
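The naive-sharding query path above can be sketched in a few lines: fan the query out to every shard, take each shard's local top-k, and re-rank the merged candidates globally (shard sizes and the unit-vector scoring are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_shard(n: int, d: int = 32) -> np.ndarray:
    v = rng.normal(size=(n, d)).astype(np.float32)
    return v / np.linalg.norm(v, axis=1, keepdims=True)   # unit vectors

shards = [make_shard(5_000) for _ in range(3)]
offsets = [0, 5_000, 10_000]          # global ID of each shard's first vector

def sharded_search(query: np.ndarray, k: int = 10) -> list[int]:
    merged = []
    for base, shard in zip(offsets, shards):
        scores = shard @ query                      # local search per shard
        for i in np.argsort(-scores)[:k]:
            merged.append((float(scores[i]), base + int(i)))
    merged.sort(reverse=True)                       # global merge of 3*k candidates
    return [gid for _, gid in merged[:k]]

results = sharded_search(shards[1][7])              # lives at global ID 5_007
assert results[0] == 5_007
```

Semantic sharding changes only the fan-out step: the query is routed to the shard(s) whose centroid it is closest to, instead of all of them.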
Performance Optimization
Index Build Optimization
| Optimization | Description | Impact |
|---|---|---|
| Batch insertion | Insert in batches of 1K-10K | 10x faster |
| Parallel build | Multi-threaded index construction | 2-4x faster |
| Incremental index | Add to existing index | Avoids rebuild |
| GPU acceleration | Use GPU for training (IVF) | 10-100x faster |
Query Optimization
| Optimization | Description | Impact |
|---|---|---|
| Warm cache | Keep index in memory | 10x latency reduction |
| Query batching | Batch similar queries | Higher throughput |
| Reduce dimensions | PCA, random projection | 2-4x faster |
| Early termination | Stop when “good enough” | Lower latency |
Memory Optimization
Memory per vector:
┌──────────────────────────────────────────┐
│ 1536 dims × 4 bytes ≈ 6KB per vector     │
│                                          │
│ 1M vectors:                              │
│   Raw: ~6GB                              │
│   + HNSW graph: +2-4GB (M-dependent)     │
│   = 8-10GB total                         │
│                                          │
│ With PQ (64 subquantizers):              │
│   1M vectors: ~64MB                      │
│   = ~100x reduction                      │
└──────────────────────────────────────────┘
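The box's arithmetic, as a small helper (float32 vectors assumed; engine-specific index overhead is excluded here):

```python
def raw_vector_gb(n_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw vector storage only; HNSW graph / index overhead comes on top."""
    return n_vectors * dims * bytes_per_dim / 1e9

raw = raw_vector_gb(1_000_000, 1536)     # the scenario above
assert round(raw, 1) == 6.1              # ~6GB of raw float32 vectors

pq = 1_000_000 * 64 / 1e9                # 64 one-byte PQ codes per vector
assert round(raw / pq) == 96             # roughly the ~100x reduction quoted
```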
Operational Considerations
Backup and Recovery
| Strategy | Description | RPO/RTO |
|---|---|---|
| Snapshots | Periodic full backup | Hours |
| WAL replication | Write-ahead log streaming | Minutes |
| Real-time sync | Synchronous replication | Seconds |
Monitoring Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| Query latency p99 | 99th percentile latency | > 100ms |
| Recall | Search accuracy | < 90% |
| QPS | Queries per second | Capacity dependent |
| Memory usage | Index memory | > 80% |
| Index freshness | Time since last update | Domain dependent |
Index Maintenance
┌──────────────────────────────────────────┐
│ Index Maintenance Tasks                  │
├──────────────────────────────────────────┤
│ - Compaction: merge small segments       │
│ - Reindex: rebuild a degraded index      │
│ - Vacuum: remove deleted vectors         │
│ - Optimize: tune parameters              │
│                                          │
│ Schedule during low-traffic periods      │
└──────────────────────────────────────────┘
Common Patterns
Multi-Tenant Vector Search
Option 1: Namespace/collection per tenant
┌──────────────────────────────────────────┐
│ tenant_1_collection                      │
│ tenant_2_collection                      │
│ tenant_3_collection                      │
└──────────────────────────────────────────┘
Pro: Complete isolation
Con: Many indexes, operational overhead

Option 2: Single collection + tenant filter
┌──────────────────────────────────────────┐
│ shared_collection                        │
│ metadata: { tenant_id: "..." }           │
│ Pre-filter by tenant_id                  │
└──────────────────────────────────────────┘
Pro: Simpler operations
Con: Requires efficient filtering
Real-Time Updates
Write Path:
┌─────────┐    ┌─────────┐    ┌─────────┐
│  Write  │───▶│ Buffer  │───▶│  Merge  │
│ Request │    │(Memory) │    │to Index │
└─────────┘    └─────────┘    └─────────┘
Strategy:
1. Buffer writes in memory
2. Periodically merge to main index
3. Search: main index + buffer
4. Compact periodically
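The four steps above can be sketched as a tiny buffered index (NumPy brute force stands in for the real main index; the class name and merge threshold are invented):

```python
import numpy as np

class BufferedIndex:
    """Buffer writes in memory; merge periodically; search main + buffer."""

    def __init__(self, dims: int, merge_threshold: int = 1_000):
        self.main = np.empty((0, dims), dtype=np.float32)
        self.buffer: list[np.ndarray] = []
        self.merge_threshold = merge_threshold

    def add(self, vec: np.ndarray) -> None:
        self.buffer.append(vec.astype(np.float32))
        if len(self.buffer) >= self.merge_threshold:
            self.merge()                          # periodic compaction

    def merge(self) -> None:
        self.main = np.vstack([self.main, *self.buffer])
        self.buffer.clear()

    def search(self, query: np.ndarray, k: int = 5) -> np.ndarray:
        # New writes are searchable immediately, even before a merge.
        vecs = np.vstack([self.main, *self.buffer]) if self.buffer else self.main
        return np.argsort(-(vecs @ query))[:k]

idx = BufferedIndex(dims=4, merge_threshold=3)
idx.add(np.array([1.0, 0, 0, 0]))
idx.add(np.array([0, 1.0, 0, 0]))
assert idx.search(np.array([0, 1.0, 0, 0], dtype=np.float32), k=1)[0] == 1
```

In a production engine the merge step rebuilds or extends an ANN index rather than concatenating arrays, but the read path (main index plus buffer) is the same.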
Embedding Versioning
Version 1 embeddings ──┐
                       │
Version 2 embeddings ──┼──▶ Parallel indexes during migration
                       │
                       │    ┌─────────────────────┐
                       └──▶ │ Gradual reindexing  │
                            │ Blue-green switch   │
                            └─────────────────────┘
Cost Estimation
Storage Costs
Cost = (vectors × dimensions × bytes per dim × replicas), converted to GB, × $/GB/month

Example:
10M vectors × 1536 dims × 4 bytes × 3 replicas ≈ 184 GB
At $0.10/GB/month ≈ $18.40/month storage

Note: serving from memory costs considerably more than disk storage.
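The worked example, as a reusable helper (float32 vectors and the quoted $0.10/GB/month rate assumed):

```python
def storage_cost(n_vectors: int, dims: int, replicas: int = 3,
                 price_per_gb_month: float = 0.10) -> tuple[float, float]:
    """Returns (GB stored, $/month) for float32 vectors with replication."""
    gb = n_vectors * dims * 4 * replicas / 1e9
    return gb, gb * price_per_gb_month

gb, dollars = storage_cost(10_000_000, 1536)
assert round(gb) == 184                 # matches the example's ~184 GB
assert round(dollars, 2) == 18.43       # the text's $18.40 rounds GB first
```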
Compute Costs
Factors:
- QPS (queries per second)
- Latency requirements
- Index type (HNSW needs more RAM)
- Filtering complexity

Rules of thumb:
- 1M vectors, HNSW, <50ms latency: 16GB RAM
- 10M vectors, HNSW, <50ms latency: 64-128GB RAM
- 100M vectors: distributed system required
Related Skills
- rag-architecture – Using vector databases in RAG systems
- llm-serving-patterns – LLM inference with vector retrieval
- ml-system-design – End-to-end ML pipeline design
- estimation-techniques – Capacity planning for vector systems
Version History
- v1.0.0 (2025-12-26): Initial release – Vector database patterns for systems design
Last Updated
Date: 2025-12-26