vector-search
npx skills add https://github.com/cleanexpo/nodejs-starter-v1 --skill vector-search
Vector Search – Embedding Queries & Similarity Search
Codifies the project’s dual vector search systems (Memory Store for agent domain knowledge, RAG Pipeline for document retrieval), the multi-provider embedding abstraction, pgvector indexing, hybrid search scoring, and chunking strategies. All patterns are built on Supabase/PostgreSQL with pgvector.
Description
Codifies pgvector embedding queries, similarity search, hybrid search, and multi-provider embedding generation for NodeJS-Starter-V1’s Supabase/PostgreSQL stack, covering the Memory Store and RAG Pipeline vector infrastructure, indexing strategies, and chunking patterns.
When to Apply
Positive Triggers
- Adding semantic search to new data types
- Creating or modifying embedding generation logic
- Implementing similarity queries or nearest-neighbour lookups
- Configuring chunking strategies for document ingestion
- Tuning search relevance (thresholds, weights, reranking)
- Adding new embedding providers
- User mentions: “vector”, “embedding”, “semantic search”, “similarity”, “RAG”, “pgvector”, “cosine”
Negative Triggers
- Building dashboard UI for search results (use `dashboard-patterns` instead)
- Adding full-text keyword search only (use PostgreSQL `tsvector` directly)
- Instrumenting search latency metrics (use `metrics-collector` instead)
- Logging search queries (use `structured-logging` instead)
Core Directives
The Three Laws of Vector Search
- Provider-agnostic: All embedding generation goes through the `EmbeddingProvider` abstraction. Never call OpenAI/Ollama directly.
- Hybrid by default: Combine vector similarity with keyword matching. Pure vector search misses exact terms; pure keyword search misses semantics.
- Server-side scoring: Similarity computation happens in PostgreSQL via RPC functions. Never download all vectors to Python for client-side comparison.
Existing Project Infrastructure
Two Vector Search Systems
| System | Location | Purpose | Table |
|---|---|---|---|
| Memory Store | `src/memory/store.py` | Agent domain knowledge (patterns, preferences, debugging) | `domain_memories` |
| RAG Pipeline | `src/rag/storage.py` | Document retrieval (uploaded docs, chunked content) | `document_chunks` |
Both share the same EmbeddingProvider abstraction from src/memory/embeddings.py.
Embedding Providers
| Provider | Model | Dimensions | Use Case |
|---|---|---|---|
| OpenAI | `text-embedding-3-small` | 1536 | Production (preferred) |
| Ollama | `nomic-embed-text` | 768 | Local development (free) |
| Simple | Hash-based | 1536 | Testing only (deterministic) |
Selection via `get_embedding_provider()` → checks `OPENAI_API_KEY`, then `ANTHROPIC_API_KEY`, then falls back to `SimpleEmbeddingProvider`.
API Routes
| Route | Method | Search Type |
|---|---|---|
| `/rag/search` | POST | Vector, hybrid, or keyword |
| `/rag/upload` | POST | Document ingestion + embedding |
| `/api/search` | POST | Full-text search (tsvector only) |
Database
| Table | Vector Column | Index Type | Distance Function |
|---|---|---|---|
| `documents` | `embedding VECTOR(1536)` | IVFFlat | `vector_cosine_ops` |
| `domain_memories` | `embedding` | None | Cosine (via RPC) |
| `document_chunks` | `embedding` | None | Cosine (via RPC) |
Embedding Provider Pattern
The EmbeddingProvider abstract base class defines a single method:
```python
from abc import ABC, abstractmethod

class EmbeddingProvider(ABC):
    @abstractmethod
    async def get_embedding(self, text: str) -> list[float]:
        """Generate embedding vector for text."""
        ...
```
Three implementations: OpenAIEmbeddingProvider (calls /v1/embeddings via httpx), OllamaEmbeddingProvider (local /api/embeddings), SimpleEmbeddingProvider (hash-based, testing only).
Adding a New Provider
- Subclass `EmbeddingProvider`
- Implement `get_embedding()` returning a fixed-dimension vector
- Add selection logic in `get_embedding_provider()`
- Match the dimension to the existing index (1536 for OpenAI compatibility, or create a separate index)
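The steps above can be sketched as follows. The class name and hash-based body here are illustrative stand-ins, not project code; a real provider would subclass `EmbeddingProvider` and call its HTTP API.

```python
import hashlib

class StubEmbeddingProvider:
    """Hypothetical provider following the steps above. Derives deterministic
    values from a hash so it runs anywhere without network access."""

    DIMENSIONS = 1536  # must match the existing pgvector index

    async def get_embedding(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Repeat the 32 digest bytes to fill the fixed dimension, scaled to [0, 1]
        return [digest[i % len(digest)] / 255.0 for i in range(self.DIMENSIONS)]
```

The fixed `DIMENSIONS` constant is the important part: whatever the provider returns must match the column's declared dimension, or inserts fail.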
Dimension Consistency Rule
All vectors in a table MUST share the same dimension. If mixing providers with different dimensions (e.g., OpenAI 1536 vs Ollama 768), either:
- Pad/truncate to a standard dimension, OR
- Use separate columns per dimension, OR
- Standardise on one dimension and re-embed when switching providers
The project currently standardises on 1536 dimensions (OpenAI).
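For the pad/truncate option, a minimal adapter might look like this (hypothetical helper, not part of the project):

```python
def fit_dimension(vector: list[float], target_dim: int = 1536) -> list[float]:
    """Truncate or zero-pad a vector to the table's fixed dimension."""
    if len(vector) >= target_dim:
        return vector[:target_dim]
    return vector + [0.0] * (target_dim - len(vector))
```

Note that truncating an embedding discards information, so standardising on one provider and re-embedding is usually the safer choice.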
Search Patterns
Similarity Search (Memory Store)
MemoryStore.find_similar() generates a query embedding and calls the find_similar_memories PostgreSQL RPC:
```python
async def find_similar(
    self,
    query_text: str,
    domain: MemoryDomain | None = None,
    user_id: str | None = None,
    similarity_threshold: float = 0.7,
    limit: int = 10,
) -> list[dict[str, Any]]:
    query_embedding = await self.embedding_provider.get_embedding(query_text)
    result = self.client.rpc("find_similar_memories", {
        "query_embedding": json.dumps(query_embedding),
        "match_threshold": similarity_threshold,
        "match_count": limit,
        "filter_domain": domain.value if domain else None,
        "filter_user_id": user_id,
    }).execute()
    return result.data or []
```
Key parameters: `match_threshold` (0.0–1.0, cosine similarity minimum), `match_count` (max results). Domain and user filters are applied server-side in the RPC function.
Hybrid Search (RAG Pipeline)
RAGStore.hybrid_search() combines vector similarity with keyword matching using configurable weights:
```python
async def hybrid_search(
    self,
    query: str,
    project_id: str,
    vector_weight: float = 0.6,
    keyword_weight: float = 0.4,
    limit: int = 10,
    threshold: float = 0.5,
) -> list[dict[str, Any]]:
    query_embedding = await self.embedding_provider.get_embedding(query)
    result = self.client.rpc("hybrid_search", {
        "query_text": query,
        "query_embedding": query_embedding,
        "project_id_filter": project_id,
        "vector_weight": vector_weight,
        "keyword_weight": keyword_weight,
        "match_threshold": threshold,
        "match_count": limit,
    }).execute()
    return result.data or []
```
Default weights: 60% vector + 40% keyword. Adjust for domain:
- Technical docs: 70/30 (semantics matter more)
- Exact match scenarios (IDs, codes): 30/70 (keywords matter more)
- General content: 60/40 (balanced)
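The weighting scheme reduces to a linear blend of the two scores. A sketch mirroring the weights above (the exact formula lives in the `hybrid_search` SQL function and may differ in detail):

```python
def combined_score(vector_score: float, keyword_score: float,
                   vector_weight: float = 0.6, keyword_weight: float = 0.4) -> float:
    """Linear blend of normalised vector and keyword scores (both in [0, 1])."""
    return vector_weight * vector_score + keyword_weight * keyword_score
```

With the technical-docs profile (70/30), a chunk scoring 0.8 on vectors and 0.5 on keywords blends to 0.71, so a strong semantic match survives a mediocre keyword match.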
Full-Text Search (PostgreSQL tsvector)
The /api/search route uses native PostgreSQL full-text search with ts_rank:
```python
func.ts_rank(
    func.to_tsvector("english", Document.title + " " + Document.content),
    func.plainto_tsquery("english", query_text),
    32,  # RANK_CD normalisation flag
).label("relevance")
```
This is independent of vector search and uses the documents table directly via SQLAlchemy.
Indexing Patterns
IVFFlat Index (Current)
The project uses IVFFlat for approximate nearest-neighbour search:
```sql
CREATE INDEX idx_documents_embedding
ON documents USING ivfflat (embedding vector_cosine_ops);
```
IVFFlat partitions vectors into lists (clusters). Query searches only the nearest cluster(s), trading recall for speed.
Tuning parameters:
- `lists` (build-time): Number of clusters. Rule of thumb: `sqrt(row_count)` for < 1M rows
- `probes` (query-time): Number of clusters to search. Higher = better recall, slower. Default: 1
```sql
-- Set probes for a session (higher = more accurate, slower)
SET ivfflat.probes = 10;
```
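The `sqrt(row_count)` rule of thumb, sketched:

```python
import math

def ivfflat_lists(row_count: int) -> int:
    """Rule-of-thumb IVFFlat `lists` setting for tables under ~1M rows."""
    return max(1, round(math.sqrt(row_count)))
```

For a 250K-row table this suggests `lists = 500`; pairing that with a `probes` value around 10–5% of `lists` is a reasonable starting point before measuring recall.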
HNSW Index (Recommended for Production)
For datasets > 10K rows, prefer HNSW (Hierarchical Navigable Small World):
```sql
CREATE INDEX idx_documents_embedding_hnsw
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
HNSW provides better recall than IVFFlat without manual tuning. Higher m and ef_construction improve quality at the cost of build time and memory.
Distance Functions
| Function | Operator | Index Ops | Use When |
|---|---|---|---|
| Cosine distance (1 − similarity) | `<=>` | `vector_cosine_ops` | Normalised embeddings (most common) |
| L2 distance | `<->` | `vector_l2_ops` | Raw distance comparison |
| Inner product | `<#>` | `vector_ip_ops` | Pre-normalised, performance-critical |
The project uses cosine similarity (vector_cosine_ops) throughout.
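For reference, the cosine similarity that the RPC thresholds operate on; pgvector's `<=>` operator returns the corresponding distance (1 − similarity):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product over the product of norms; 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Because the measure depends only on direction, embeddings of different magnitudes still compare sensibly, which is why it is the default for text embeddings.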
Chunking Strategies
The RAG pipeline supports five chunking strategies via ChunkingStrategy enum:
| Strategy | When to Use | Config |
|---|---|---|
| `FIXED_SIZE` | Uniform chunks, simple content | `chunk_size=512, chunk_overlap=50` |
| `SEMANTIC` | Respects paragraph/section boundaries | Same + boundary detection |
| `RECURSIVE` | Nested structure (Markdown, HTML) | Splits by headers, then paragraphs, then sentences |
| `PARENT_CHILD` | Best recall with context | `parent_chunk_size=2048`, child `chunk_size=512` |
| `CODE_AWARE` | Source code files | Splits by functions/classes |
Default: PARENT_CHILD with 512-token children and 2048-token parents. Search matches children; context retrieval includes the parent chunk.
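To illustrate the overlap mechanics, here is a minimal FIXED_SIZE chunker, assuming whitespace tokens for simplicity (the project's real chunker may tokenise differently):

```python
def chunk_fixed_size(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows; each window re-reads the last
    `chunk_overlap` tokens of the previous one so boundary context survives."""
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```

A 1000-token document with the default config yields three chunks, with the last 50 tokens of each chunk repeated at the start of the next.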
Pipeline Config
```python
PipelineConfig(
    chunking_strategy=ChunkingStrategy.PARENT_CHILD,
    chunk_size=512,
    chunk_overlap=50,
    parent_chunk_size=2048,
    generate_embeddings=True,
    generate_keywords=True,
)
```
Relevance & Scoring
Threshold Guidelines
| Threshold | Meaning | Use Case |
|---|---|---|
| 0.9+ | Near-exact semantic match | Deduplication |
| 0.7–0.9 | Strong relevance | Default search |
| 0.5–0.7 | Moderate relevance | Exploratory search |
| < 0.5 | Weak match | Usually noise |
The Memory Store defaults to similarity_threshold=0.7. The RAG Pipeline defaults to min_score=0.5.
Relevance Decay
MemoryStore.update_relevance() adjusts memory relevance based on feedback:
- Positive feedback: +0.1 per point, capped at 1.0
- Negative feedback: configurable `decay_rate` (default 0.1), floored at 0.0
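A sketch of that update rule, with the cap and floor from above (the project's `update_relevance` may differ in detail):

```python
def apply_feedback(score: float, feedback: int, decay_rate: float = 0.1) -> float:
    """Raise relevance by 0.1 per positive point, lower it by `decay_rate`
    per negative point, then clamp to [0.0, 1.0]."""
    if feedback >= 0:
        score += 0.1 * feedback
    else:
        score -= decay_rate * (-feedback)
    return min(1.0, max(0.0, score))
```

The clamp matters: without it, repeated positive feedback inflates scores past 1.0 and distorts later ranking.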
Stale Memory Pruning
MemoryStore.prune_stale() removes memories below min_relevance=0.3 or older than max_age_days=90 via the prune_stale_memories RPC.
Pydantic Models
Memory System
| Model | Fields | Purpose |
|---|---|---|
| `MemoryEntry` | domain, category, key, value, embedding, relevance_score, access_count | Core memory unit |
| `MemoryQuery` | domain, category, query_text, similarity_threshold, tags, limit, offset | Query specification |
| `MemoryResult` | entries, total_count, query | Paginated result |
| `MemoryDomain` | KNOWLEDGE, PREFERENCE, TESTING, DEBUGGING | Domain enum |
RAG System
| Model | Fields | Purpose |
|---|---|---|
| `DocumentChunk` | source_id, content, embedding, chunk_level, heading_hierarchy, keywords | Chunk record |
| `DocumentSource` | source_type, source_uri, status, metadata | Source tracking |
| `SearchRequest` | query, project_id, search_type, vector_weight, keyword_weight, min_score | Search input |
| `SearchResult` | chunk_id, content, vector_score, keyword_score, combined_score | Result item |
| `SearchResponse` | results, total_count, search_type, execution_time_ms | Search output |
Database Schema
documents Table (Legacy)
```sql
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title VARCHAR(500) NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(1536),
    -- ... other columns
);

CREATE INDEX idx_documents_embedding ON documents USING ivfflat (embedding vector_cosine_ops);
```
domain_memories Table
Stores agent memories with embeddings for semantic retrieval. Accessed via MemoryStore class.
document_chunks Table
Stores RAG pipeline chunks with embeddings. Accessed via RAGStore class. Includes heading_hierarchy, summary, entities, keywords, and classification_tags for enriched retrieval.
RPC Functions
| Function | Purpose |
|---|---|
| `find_similar_memories` | Cosine similarity search on `domain_memories` with domain/user filters |
| `hybrid_search` | Combined vector + keyword search on `document_chunks` |
| `prune_stale_memories` | Delete low-relevance or expired memories |
| `increment_memory_access` | Increment access count on retrieval |
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Client-side similarity computation | Downloads all vectors, O(n) per query, no index usage | PostgreSQL RPC with pgvector index |
| Mixing embedding dimensions in one column | `VECTOR(1536)` rejects 768-dim vectors | Standardise dimension or use separate columns |
| No similarity threshold | Returns noise matches below 0.3 | Always set `match_threshold` (0.5–0.7) |
| Embedding at query time without caching | Re-embeds identical queries | Cache query embeddings for repeated searches |
| IVFFlat with `probes=1` on large datasets | Poor recall (misses relevant results) | Increase `probes` or migrate to HNSW |
| Storing embeddings without indexing | Sequential scan on every query | Create IVFFlat or HNSW index |
| Hardcoding OpenAI API calls | Breaks local development, vendor lock-in | Use `EmbeddingProvider` abstraction |
| Chunking without overlap | Loses context at chunk boundaries | Set `chunk_overlap=50` minimum |
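For the query-embedding caching row, a minimal in-memory cache might look like this (hypothetical helper; the project may cache differently):

```python
class EmbeddingCache:
    """In-memory cache keyed by query text so identical queries embed once."""

    def __init__(self) -> None:
        self._cache: dict[str, list[float]] = {}
        self.hits = 0

    async def get(self, text: str, embed) -> list[float]:
        # `embed` is any async callable with the EmbeddingProvider.get_embedding shape
        if text in self._cache:
            self.hits += 1
            return self._cache[text]
        vector = await embed(text)
        self._cache[text] = vector
        return vector
```

An unbounded dict is fine for a sketch; a production cache would add eviction (LRU or TTL) so repeated unique queries cannot grow memory without limit.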
Checklist for New Vector Search Features
Embedding
- Uses `EmbeddingProvider` abstraction (never direct API calls)
- Dimension matches existing index (1536 default)
- Handles provider unavailability (fallback or graceful error)
Search
- Hybrid search by default (vector + keyword)
- Similarity threshold configured (not unbounded)
- Server-side computation via PostgreSQL RPC
- Results include similarity scores for transparency
Indexing
- pgvector index created on embedding column
- Distance function matches query pattern (cosine for normalised)
- Index type appropriate for dataset size (IVFFlat < 10K, HNSW >= 10K)
Data Quality
- Chunking strategy matches content type
- Chunk overlap prevents boundary information loss
- Stale/expired entries have pruning mechanism
Integration
- Search latency instrumented via `metrics-collector`
- Errors use `error-taxonomy` codes
- Queries logged via `structured-logging`
Response Format
```
[AGENT_ACTIVATED]: Vector Search
[PHASE]: {Design | Implementation | Review}
[STATUS]: {in_progress | complete}

{vector search analysis or implementation guidance}

[NEXT_ACTION]: {what to do next}
```
Integration Points
Council of Logic
- Turing: Verify search is O(log n) via index, not O(n) sequential scan
- Shannon: Embedding dimension and chunk size tuned for information density
Metrics Collector
- `search_query_duration_ms` histogram for search latency
- `search_result_count` gauge for average results per query
- `embedding_generation_duration_ms` histogram for provider latency
Structured Logging
- Debug-level embedding generation logs (model, dimensions, text length)
- Info-level search execution logs (query, domain, result count)
Error Taxonomy
- `DATA_VECTOR_PROVIDER_UNAVAILABLE` (503) → embedding provider down
- `DATA_VECTOR_DIMENSION_MISMATCH` (422) → wrong embedding dimension
- `DATA_VECTOR_THRESHOLD_INVALID` (422) → threshold out of [0, 1] range
Data Validation
- `SearchRequest` validated via Pydantic (query non-empty, threshold in range, limit bounded)
- `PipelineConfig` validates chunk sizes and strategy enum
Dashboard Patterns
- Search results displayed via `DataStrip` for aggregate metrics
- Real-time search activity via Supabase Realtime on the `document_chunks` table
Australian Localisation (en-AU)
- Spelling: neighbour, optimise, normalise, analyse, behaviour, colour
- Date: ISO 8601 in storage; DD/MM/YYYY in UI display
- Timezone: AEST/AEDT → timestamps stored as UTC, converted for display
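The storage-vs-display rule can be sketched with the standard-library `zoneinfo` (the dates here are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

stored = datetime(2025, 1, 15, 3, 30, tzinfo=timezone.utc)   # stored as UTC
displayed = stored.astimezone(ZoneInfo("Australia/Sydney"))  # AEDT in January (UTC+11)
label = displayed.strftime("%d/%m/%Y %H:%M")                 # DD/MM/YYYY for UI display
```

Letting `zoneinfo` resolve AEST vs AEDT avoids hardcoding the offset, which flips twice a year.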