semantic-search

📁 pearlthoughts/codecompass 📅 2 days ago
1
总安装量
1
周安装量
#49488
全站排名
安装命令
npx skills add https://github.com/pearlthoughts/codecompass --skill semantic-search

Agent 安装分布

amp 1
opencode 1
kimi-cli 1
codex 1
github-copilot 1
claude-code 1

Skill 文档

Semantic Search Technique

Purpose

Decision framework and execution guide for using semantic search effectively in CodeCompass.

When to Use Semantic Search

✅ Use Semantic Search When

1. Concept-based Queries

  • “Find code that handles payment processing”
  • “Where do we validate email addresses?”
  • “Show me error handling patterns”

2. Intent-based Queries

  • “How is user authentication implemented?”
  • “What code calculates shipping costs?”
  • “Find business rules for order approval”

3. Cross-language/Cross-file

  • Searching across PHP, TypeScript, config files
  • Pattern discovery across multiple modules
  • Finding similar implementations

4. Fuzzy/Exploratory

  • User doesn’t know exact class/function names
  • Exploring unfamiliar codebase
  • “Code that does something like X”

5. Natural Language

  • “Show me all database migrations”
  • “Find controllers that handle file uploads”
  • “Where are API rate limits defined?”

❌ Use Grep/Glob When

1. Exact Matches

  • “Find class named PaymentController
  • “Where is processPayment function defined?”
  • “Find all imports of UserService

2. Syntax Patterns

  • “Find all functions starting with get
  • “Show me all @Injectable() decorators”
  • “Find TypeScript interfaces”

3. Performance Critical

  • Quick lookups in known files
  • Repeated searches in tight loops
  • When you know exact location

4. Structural Queries

  • “Find all .ts files in src/modules
  • “List all test files”
  • “Show directory structure”

Execution Guide

Step 1: Formulate Effective Query

❌ Bad Queries (too vague):

  • “payment”
  • “code”
  • “function”

✅ Good Queries (specific context):

  • “business logic for processing customer payments and updating order status”
  • “validation rules for user email and password requirements”
  • “error handling patterns for database connection failures”

Why: More context = better semantic matching

Formula:

[Action/Purpose] for [Specific Entity] with [Context/Constraints]

Examples:

  • “Extract business capabilities from Yii2 controllers”
  • “Validation logic for user registration with email verification”
  • “Database migration patterns for schema versioning”

Step 2: Verify Indexing

Before searching, ensure codebase is indexed:

# Check if indexed
curl http://localhost:8081/v1/schema

# Should show collections like:
# - CodeContext
# - AtlasCode

If not indexed:

codecompass batch:index <path-to-codebase>

Step 3: Execute Search

codecompass search:semantic "business logic for payment processing"

Alternative (if using as library):

const results = await searchService.semanticSearch({
  query: "business logic for payment processing",
  limit: 10,
  certainty: 0.7 // Minimum relevance score
});

Step 4: Interpret Results

Check relevance scores:

  • >0.8: Highly relevant (exact match)
  • 0.7-0.8: Good match (related)
  • 0.6-0.7: Moderate match (possibly relevant)
  • <0.6: Weak match (may be noise)

Verify context:

  • Does the returned code actually match intent?
  • Are results from expected modules?
  • Multiple related files found (good signal)
  • Or isolated random matches (refine query)

Step 5: Refine if Needed

Too many results (>50):

  • Add more specific context to query
  • Increase certainty threshold
  • Add domain constraints (“in authentication module”)

Too few results (<3):

  • Broaden query (less specific)
  • Lower certainty threshold
  • Check if area is actually indexed
  • Try related terms/synonyms

Wrong results:

  • Rephrase query with different terminology
  • Add negative constraints
  • Try breaking into multiple specific queries

Behind the Scenes

Architecture

Query Text
    ↓
Ollama Embedding (mxbai-embed-large)
    ↓
1024-dimensional vector
    ↓
Weaviate Vector Search (cosine similarity)
    ↓
Ranked Results

Key Components

From .ai/capabilities.json:

  • Module: search, vectorizer, weaviate
  • Embedding: Ollama mxbai-embed-large (1024 dimensions)
  • Vector DB: Weaviate with HNSW indexing
  • Collections: CodeContext, AtlasCode

Configuration (from .env):

EMBEDDING_SERVICE=ollama
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
OLLAMA_URL=http://localhost:11434
CODECOMPASS_WEAVIATE_URL=http://localhost:8081

Advanced Patterns

Pattern 1: Multi-Query Exploration

For complex questions, break into multiple searches:

# Instead of:
"authentication and authorization and session management"

# Do:
codecompass search:semantic "user authentication login process"
codecompass search:semantic "authorization and access control"
codecompass search:semantic "session management and tokens"

Pattern 2: Iterative Refinement

# 1. Broad search
codecompass search:semantic "payment processing"

# 2. Review results, identify specific module
# 3. Narrow search
codecompass search:semantic "payment gateway integration in PaymentController"

# 4. Pinpoint implementation
codecompass search:semantic "Stripe API call for processing credit cards"

Pattern 3: Cross-Domain Search

Search across different aspects:

# Code implementation
codecompass search:semantic "email validation logic"

# Tests
codecompass search:semantic "test cases for email validation"

# Configuration
codecompass search:semantic "email service configuration"

Common Pitfalls

❌ Pitfall 1: Searching Before Indexing

Symptom: No results or error Solution: Run codecompass batch:index first

❌ Pitfall 2: Too Vague Queries

Symptom: Returns everything or nothing useful Solution: Add specific context and intent

❌ Pitfall 3: Expecting Exact Matches

Symptom: “Why didn’t it find function processPayment?” Reason: Semantic search is for concepts, not exact names Solution: Use grep for exact matches

❌ Pitfall 4: Ignoring Relevance Scores

Symptom: Reading irrelevant results Solution: Filter by score >0.7, ignore weak matches

❌ Pitfall 5: Single Query for Complex Questions

Symptom: Poor results for multi-faceted questions Solution: Break into multiple targeted queries

Decision Tree

┌─────────────────────────────────────┐
│ I need to find code that...         │
└─────────────────────────────────────┘
              ↓
        ┌─────────┐
        │ Know    │ Exact class/function name?
        │ exact   │
        │ name?   │
        └─────────┘
          ↙     ↘
        YES      NO
         ↓        ↓
    Use Grep  ┌─────────┐
              │ Concept │ Searching by meaning/purpose?
              │ search? │
              └─────────┘
                ↙     ↘
              YES      NO
               ↓        ↓
          Semantic  ┌─────────┐
          Search    │ Pattern │ Looking for code pattern?
                    │ match?  │
                    └─────────┘
                      ↙     ↘
                    YES      NO
                     ↓        ↓
                Use Glob  Use both
                          (Glob + Semantic)

Performance Considerations

Speed

  • Grep: Milliseconds (fast, synchronous)
  • Semantic Search: 100-500ms (embedding + vector search)

Tradeoff: Semantic is slower but finds conceptually related code

Token Cost (Embeddings)

  • Each query → 1 embedding generation
  • Ollama local → No API cost
  • But consumes local compute

Scaling

  • Small codebase (<1K files): Either method fine
  • Medium codebase (1K-10K files): Semantic search advantage grows
  • Large codebase (>10K files): Semantic search essential

Integration with Other Tools

With Yii2 Analysis

# 1. Analyze Yii2 project
codecompass analyze:yii2 <path>

# 2. Index results
codecompass batch:index <path>

# 3. Explore with semantic search
codecompass search:semantic "Yii2 controller actions for user management"

With Requirements Extraction

# 1. Extract requirements
codecompass requirements:extract

# 2. Search extracted requirements
codecompass search:semantic "business rules for order validation"

With Weaviate Direct Query

# Alternative: Query Weaviate GraphQL API directly
curl -X POST http://localhost:8081/v1/graphql \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{
      Get {
        CodeContext(
          nearText: { concepts: [\"payment processing\"] }
          limit: 10
        ) {
          content
          filePath
        }
      }
    }"
  }'

Related Skills

  • 0-discover-capabilities.md – How to discover modules
  • analyze-yii2-project.md – Uses semantic search in workflow

Related Modules

From .ai/capabilities.json:

  • search – SearchService, IntegratedSearchService
  • vectorizer – Ollama embedding generation
  • weaviate – Vector database client
  • indexing – File indexing pipeline

Remember: Semantic search finds code by meaning, not by name. Choose the right tool for the job.