token-saver-context-compression
npx skills add https://github.com/oimiragieo/agent-studio --skill token-saver-context-compression
Token Saver Context Compression
Use this skill to reduce token usage while preserving grounded evidence. This integrates:
- pnpm search:code (hybrid retrieval)
- token-saver Python compression scripts
- MemoryRecord persistence into framework memory
- spawn prompt evidence injection ([mem:*] / [rag:*])
When to Use
- pnpm search:tokens shows a file/directory exceeds 32K tokens
- Context is large or expensive and you need a compressed summary
- You need query-targeted compression before synthesis
- You need hard evidence sufficiency gating before persisting memory
- You're building a prompt and search:code results alone aren't enough context
Iron Law
Do not persist compressed content directly to memory files from a subprocess. Emit MemoryRecord payloads and let framework hooks process sync/indexing.
Workflow
- Retrieve candidate context (pnpm search:code "<query>").
- Compress using token-saver in JSON mode (run_skill_workflow.py --output-format json).
- If evidence is insufficient and the fail gate (--fail-on-insufficient-evidence) is on, stop.
- Map distilled insights into MemoryRecord-ready payloads.
- Persist through MemoryRecord so .claude/hooks/memory/sync-memory-index.cjs runs.
Mapping Rule (Deterministic)
- gotchas.json: text contains gotcha|pitfall|anti-pattern|risk|warning|failure
- issues.md: text contains issue|bug|error|incident|defect|gap
- decisions.md: text contains decision|tradeoff|choose|selected|rationale
- patterns.json: default fallback for all remaining distilled evidence
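The mapping rule can be sketched as a small classifier. This is a minimal illustration, not the skill's actual implementation: the function name `classify` is hypothetical, and it assumes case-insensitive substring matching with rules checked in the order listed above (first match wins), which the rule text does not spell out.

```python
import re

# Keyword groups copied from the mapping rule, checked in listed order.
# Assumptions: case-insensitive substring search, and the first matching
# rule wins when a text hits several keyword groups.
RULES = [
    ("gotchas.json", re.compile(r"gotcha|pitfall|anti-pattern|risk|warning|failure", re.IGNORECASE)),
    ("issues.md", re.compile(r"issue|bug|error|incident|defect|gap", re.IGNORECASE)),
    ("decisions.md", re.compile(r"decision|tradeoff|choose|selected|rationale", re.IGNORECASE)),
]
DEFAULT_TARGET = "patterns.json"


def classify(text: str) -> str:
    """Return the memory target for one distilled insight (hypothetical helper)."""
    for target, pattern in RULES:
        if pattern.search(text):
            return target
    # Nothing matched: fall back to the default target.
    return DEFAULT_TARGET


print(classify("Known pitfall: stale index after rename"))
print(classify("Use hybrid retrieval for recall"))
```

Plain substring matching also fires inside unrelated words ("debug" contains "bug"). If that matters, wrapping each keyword group in word boundaries (`\b`) is one option, at the cost of missing plurals such as "gaps".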
Tooling Commands
Preferred wrapper entrypoint:
node .claude/skills/token-saver-context-compression/scripts/main.cjs --query "<question>" --mode evidence_aware --limit 20 --fail-on-insufficient-evidence
Direct Python engine (advanced):
python .claude/skills/token-saver-context-compression/scripts/run_skill_workflow.py --file <path> --mode evidence_aware --query "<question>" --output-format json --fail-on-insufficient-evidence
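When the wrapper is driven from a script rather than typed by hand, the argument list can be assembled programmatically. The sketch below uses only the path and flags shown in the wrapper command above; the function name `build_wrapper_command` and its parameters are hypothetical.

```python
import shlex

# Path and flags are taken from the wrapper command above; the function
# name and its parameters are hypothetical.
WRAPPER = ".claude/skills/token-saver-context-compression/scripts/main.cjs"


def build_wrapper_command(query: str, mode: str = "evidence_aware",
                          limit: int = 20, fail_on_insufficient: bool = True) -> list:
    """Assemble the argv list for the Node wrapper entrypoint."""
    cmd = ["node", WRAPPER, "--query", query, "--mode", mode, "--limit", str(limit)]
    if fail_on_insufficient:
        # Only append the gate flag when the caller wants a hard stop.
        cmd.append("--fail-on-insufficient-evidence")
    return cmd


cmd = build_wrapper_command("how does memory persistence work", limit=10)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list to `subprocess.run(cmd, capture_output=True, text=True)` avoids shell quoting problems with queries that contain spaces or quotes.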
Output Contract
- Wrapper emits JSON with:
  - search summary
  - compression summary
  - memoryRecords grouped by target (patterns, gotchas, issues, decisions)
  - evidence sufficiency status
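A consumer of the wrapper output might start by counting records per target. This is a sketch under assumptions: only the top-level key names come from the contract above, while the inner shape (memoryRecords as an object keyed by target, each value a list) and the helper name `count_memory_records` are guesses that should be checked against real output.

```python
import json


def count_memory_records(raw_json: str) -> dict:
    """Count memory records per target in the wrapper's JSON output.

    Assumption: memoryRecords is an object keyed by target name whose
    values are lists of records.
    """
    payload = json.loads(raw_json)
    records = payload.get("memoryRecords", {})
    return {target: len(items) for target, items in records.items()}


# Hypothetical sample payload for illustration, not real wrapper output.
sample = json.dumps({
    "search": {},
    "compression": {},
    "memoryRecords": {"patterns": [{}], "gotchas": [], "issues": [{}, {}], "decisions": []},
    "evidence": {},
})
print(count_memory_records(sample))
```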
Workflow References
- Skill workflow: .claude/workflows/token-saver-context-compression-skill-workflow.md
- Companion tool: .claude/tools/token-saver-context-compression/token-saver-context-compression.cjs
- Command surface: .claude/skills/token-saver-context-compression/commands/token-saver-context-compression.md
- Citation format is unchanged:
  - memory entries become [mem:xxxxxxxx]
  - RAG entries remain [rag:xxxxxxxx]
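To check that a compressed summary actually carries citations, the markers can be pulled out with a regular expression. A minimal sketch: the helper name `extract_citations` is hypothetical, and since the source only shows the placeholder form `xxxxxxxx`, the identifier is assumed to be a run of word characters.

```python
import re

# Assumption: the identifier after the prefix is a run of word characters;
# the source only shows the placeholder form [mem:xxxxxxxx] / [rag:xxxxxxxx].
CITATION = re.compile(r"\[(mem|rag):(\w+)\]")


def extract_citations(text: str) -> list:
    """Return (kind, id) pairs for every citation marker found in text."""
    return CITATION.findall(text)


summary = "Sync runs via the memory hook [mem:1a2b3c4d]; see also [rag:deadbeef]."
print(extract_citations(summary))
```

An empty result for a non-trivial summary is a cheap signal that evidence annotations were dropped somewhere in the pipeline.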
Integration with search:tokens
Use pnpm search:tokens to decide when to invoke this skill:
# Check if you need compression
pnpm search:tokens .claude/lib/memory
# Output: 60 files, 500KB, ~128K tokens → OVER CONTEXT
# Then compress with a targeted query
node .claude/skills/token-saver-context-compression/scripts/main.cjs \
--query "how does memory persistence work" --mode evidence_aware --limit 10
The tool reads actual file content from search results (not just file paths), compresses via the Python engine, and extracts memory records classified by type (patterns, gotchas, issues, decisions).
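The decision to compress can be automated by reading the token estimate from the search:tokens summary. This sketch assumes the output line looks like the sample shown above ("~128K tokens") and uses the 32K threshold from the "When to Use" list; both helper names are hypothetical, and the real output format may differ.

```python
import re
from typing import Optional

# Threshold from the "When to Use" list (32K tokens). The pattern assumes
# the summary line matches the sample output ("~128K tokens").
THRESHOLD_TOKENS = 32_000
TOKEN_FIELD = re.compile(r"~(\d+(?:\.\d+)?)K tokens")


def parse_token_estimate(line: str) -> Optional[int]:
    """Extract the approximate token count from one summary line."""
    match = TOKEN_FIELD.search(line)
    return round(float(match.group(1)) * 1000) if match else None


def needs_compression(line: str, threshold: int = THRESHOLD_TOKENS) -> bool:
    """True when the reported estimate exceeds the threshold."""
    tokens = parse_token_estimate(line)
    return tokens is not None and tokens > threshold


print(needs_compression("60 files, 500KB, ~128K tokens"))
```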
Adaptive Compression
Adaptive compression (adjusting compression ratio based on corpus size) is automatic and requires no env var configuration. When the input corpus is small, compression is lighter; when it is large, compression is more aggressive. This is controlled internally by the Python engine based on token counts.
Requirements
- Node.js 18+
- Python 3.10+
Iron Laws
- ALWAYS run hybrid search (pnpm search:code) before compressing to retrieve grounded evidence for the distilled output
- NEVER compress context that still has open uncertainties; resolve ambiguities before compressing
- ALWAYS persist distilled learnings via MemoryRecord immediately after compression
- NEVER discard evidence that contradicts the current working hypothesis during compression
- ALWAYS inject [mem:*] and [rag:*] citations in the compressed output for downstream spawn prompt grounding
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Compressing without prior hybrid search | Output lacks grounded evidence, hallucination risk | Run pnpm search:code first, embed citations |
| Discarding contradicting evidence | Creates false confidence in distilled output | Preserve all conflicting signals in summary |
| No MemoryRecord after compression | Learnings lost on next context reset | Persist key findings immediately via MemoryRecord |
| Compressing too late (past 80K tokens) | Severe accuracy degradation before compression | Trigger compression at 80K tokens, not at limit |
| Skipping [mem:*] / [rag:*] citations | Downstream agents cannot verify claims | Always annotate evidence sources in output |
Memory Protocol (MANDATORY)
Before work:
cat .claude/context/memory/learnings.md
After work:
- Add integration learnings to .claude/context/memory/learnings.md
- Add integration risks to .claude/context/memory/issues.md