memory-lancedb-pro

📁 win4r/memory-lancedb-pro-skill 📅 Today

总安装量

周安装量

#58048

全站排名

安装命令

npx skills add https://github.com/win4r/memory-lancedb-pro-skill --skill memory-lancedb-pro

Agent 安装分布

mcpjam 1

github-copilot 1

junie 1

windsurf 1

zencoder 1

crush 1

Skill 文档

memory-lancedb-pro Plugin Maintenance Guide

Overview

memory-lancedb-pro is an enhanced long-term memory plugin for OpenClaw. It replaces the built-in memory-lancedb plugin with advanced retrieval capabilities, multi-scope memory isolation, and a management CLI.

Repository: https://github.com/win4r/memory-lancedb-pro License: MIT | Language: TypeScript (ESM) | Runtime: Node.js via OpenClaw Gateway

Architecture

âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
â                   index.ts (Entry Point)                â
â  Plugin Registration Â· Config Parsing Â· Lifecycle Hooks â
ââââââââââ¬âââââââââââ¬âââââââââââ¬âââââââââââ¬ââââââââââââââââ
         â          â          â          â
    ââââââ¼ââââ ââââââ¼ââââ âââââ¼âââââ ââââ¼âââââââââââ
    â store  â âembedderâ âretrieverâ â   scopes    â
    â .ts    â â .ts    â â .ts    â â    .ts      â
    ââââââââââ ââââââââââ ââââââââââ âââââââââââââââ
         â                     â
    ââââââ¼ââââ           âââââââ¼âââââââââââ
    âmigrate â           ânoise-filter.ts â
    â .ts    â           âadaptive-       â
    ââââââââââ           âretrieval.ts    â
                         ââââââââââââââââââ
    âââââââââââââââ   ââââââââââââ
    â  tools.ts   â   â  cli.ts  â
    â (Agent API) â   â (CLI)    â
    âââââââââââââââ   ââââââââââââ

File Reference (Quick Navigation)

File	Purpose	Key Exports
`index.ts`	Plugin entry point. Registers with OpenClaw Plugin API, parses config, mounts lifecycle hooks	`memoryLanceDBProPlugin` (default), `shouldCapture`, `detectCategory`
`openclaw.plugin.json`	Plugin metadata + full JSON Schema config with `uiHints`	â
`package.json`	NPM package. Deps: `@lancedb/lancedb`, `openai`, `@sinclair/typebox`	â
`cli.ts`	CLI: `memory-pro list/search/stats/delete/delete-bulk/export/import/reembed/migrate`	`createMemoryCLI`, `registerMemoryCLI`
`src/store.ts`	LanceDB storage layer. Table creation, FTS indexing, CRUD, vector/BM25 search	`MemoryStore`, `MemoryEntry`, `loadLanceDB`
`src/embedder.ts`	Embedding abstraction. OpenAI-compatible API, task-aware, LRU cache	`Embedder`, `createEmbedder`, `getVectorDimensions`
`src/retriever.ts`	Hybrid retrieval engine. Full scoring pipeline	`MemoryRetriever`, `createRetriever`, `DEFAULT_RETRIEVAL_CONFIG`
`src/scopes.ts`	Multi-scope access control	`MemoryScopeManager`, `createScopeManager`
`src/tools.ts`	Agent tool definitions: `memory_recall/store/forget/update/stats/list`	`registerAllMemoryTools`
`src/noise-filter.ts`	Noise filter for low-quality content	`isNoise`, `filterNoise`
`src/adaptive-retrieval.ts`	Skip retrieval for greetings, commands, emoji	`shouldSkipRetrieval`
`src/migrate.ts`	Migration from legacy `memory-lancedb`	`MemoryMigrator`, `createMigrator`
`scripts/jsonl_distill.py`	JSONL session distillation script (Python)	â

Core Subsystem Reference

For detailed deep-dives into each subsystem, read the appropriate reference file:

Retrieval Pipeline (scoring math, RRF fusion, reranking, all scoring stages): See references/retrieval_pipeline.md
Storage & Data Model (LanceDB schema, FTS indexing, CRUD, vector dim): See references/storage_and_schema.md
Embedding System (providers, task-aware API, caching, dimensions): See references/embedding_system.md
Plugin Lifecycle & Config (hooks, registration, config parsing): See references/plugin_lifecycle.md
Scope System (multi-scope isolation, agent access, patterns): See references/scope_system.md
Tools & CLI (agent tools, CLI commands, parameters): See references/tools_and_cli.md
Common Gotchas & Troubleshooting: See references/troubleshooting.md

Development Workflows

Adding a New Embedding Provider

Check if it’s OpenAI-compatible (most are). If so, no code change needed â just config
If the model is not in EMBEDDING_DIMENSIONS map in src/embedder.ts, add it
If the provider needs special request fields beyond task and normalized, extend buildPayload() in src/embedder.ts
Test with embedder.test() method
Document the provider in README.md table

Adding a New Rerank Provider

Add provider name to RerankProvider type in src/retriever.ts
Add case in buildRerankRequest() for request format (headers + body)
Add case in parseRerankResponse() for response parsing
Add to rerankProvider enum in openclaw.plugin.json
Test with actual API calls â reranker has 5s timeout protection

Adding a New Scoring Stage

Create a private apply<StageName>(results: RetrievalResult[]): RetrievalResult[] method in MemoryRetriever
Add corresponding config fields to RetrievalConfig interface
Insert the stage in the pipeline sequence in both hybridRetrieval() and vectorOnlyRetrieval()
Add defaults to DEFAULT_RETRIEVAL_CONFIG
Add JSON Schema fields to openclaw.plugin.json
Pipeline order: Fusion â Rerank â Recency â Importance â LengthNorm â TimeDecay â HardMin â Noise â MMR

Adding a New Agent Tool

Create registerMemory<ToolName>Tool() in src/tools.ts
Define parameters with Type.Object() from @sinclair/typebox
Use stringEnum() from openclaw/plugin-sdk for enum params
Always validate scope access via context.scopeManager
Register in registerAllMemoryTools() â decide if core (always) or management (optional)
Return { content: [{ type: "text", text }], details: {...} }

Adding a New CLI Command

Add command in registerMemoryCLI() in cli.ts
Pattern: memory.command("name <args>").description("...").option("--flag", "...").action(async (args, opts) => { ... })
Support --json flag for machine-readable output
Use process.exit(1) for error cases
CLI is registered via api.registerCli() in index.ts

Modifying Auto-Capture Logic

shouldCapture(text) in index.ts controls what gets auto-captured
MEMORY_TRIGGERS regex array defines trigger patterns (supports EN/CJK)
detectCategory(text) classifies captures as preference/fact/decision/entity/other
Auto-capture runs in agent_end hook, limited to 3 per turn
Duplicate detection threshold: cosine similarity > 0.95

Modifying Auto-Recall Logic

Auto-recall uses before_agent_start hook (OFF by default)
shouldSkipRetrieval() from src/adaptive-retrieval.ts gates retrieval
Injected as <relevant-memories> XML block with UNTRUSTED DATA warning
sanitizeForContext() strips HTML, newlines, limits to 300 chars per memory
Max 3 memories injected per turn

Key Design Decisions

autoRecall defaults to OFF â prevents model from echoing injected memory context
autoCapture defaults to ON â transparent memory accumulation
sessionMemory defaults to OFF â raw session summaries degrade retrieval quality; use JSONL distillation instead
LanceDB dynamic import â loaded asynchronously to avoid blocking; cached in singleton promise
Startup checks are fire-and-forget â gateway binds HTTP port immediately; embedding/retrieval tests run in background with 8s timeout
Daily JSONL backup â 24h interval, keeps last 7 files, runs 1 min after start
BM25 score normalization â raw BM25 scores are unbounded, normalized with sigmoid: 1 / (1 + exp(-score/5))
Update = delete + re-add â LanceDB doesn’t support in-place updates
ID prefix matching â 8+ hex char prefix resolves to full UUID for user convenience
CJK-aware thresholds â shorter minimum lengths for Chinese/Japanese/Korean text (4â6 chars vs 10â15 for English)
Env var resolution â ${VAR} syntax resolved at config parse time; gateway service may not inherit shell env

Testing

Smoke test: node test/cli-smoke.mjs
Manual verification: openclaw plugins doctor, openclaw memory-pro stats
Embedding test: embedder.test() returns { success, dimensions, error? }
Retrieval test: retriever.test() returns { success, mode, hasFtsSupport, error? }

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台