context-optimization
npx skills add https://github.com/5dlabs/cto --skill context-optimization
Context Optimization Techniques
Context optimization extends effective capacity through strategic compression, masking, caching, and partitioning. Done well, it can double or triple the amount of context a system can make use of.
When to Activate
- Context limits constrain task complexity
- Optimizing for cost reduction (fewer tokens = lower costs)
- Reducing latency for long conversations
- Building production systems at scale
Core Strategies
Compaction
When approaching context limits, summarize the existing contents and reinitialize the context with that summary.
Priority for compression:
- Tool outputs → replace with summaries
- Old turns → summarize early conversation
- Retrieved docs → summarize if recent versions can be re-retrieved
- Never compress system prompt
Summary preservation by type:
- Tool outputs: Key findings, metrics, conclusions
- Conversations: Key decisions, commitments, context shifts
- Documents: Key facts and claims
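The compaction loop above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `(role, text)` turn structure, the `summarize` callable (a model call you supply), and the `keep_recent` cutoff are all assumptions.

```python
# Sketch of a compaction pass. A context is assumed to be a list of
# (role, text) turns whose first entry is the system prompt;
# `summarize` is a model call supplied by the caller.
def compact(context, summarize, keep_recent=4):
    """Summarize older turns; keep the system prompt and recent turns."""
    system, *turns = context              # never compress the system prompt
    if len(turns) <= keep_recent:
        return context                    # nothing old enough to fold away
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarize("\n".join(text for _, text in old))
    return [system, ("summary", summary)] + recent
```

Reinitializing with `compact(ctx, summarize)` preserves the stable prefix (system prompt) while collapsing early conversation into a single summary turn.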
Observation Masking
Tool outputs can comprise 80%+ of token usage. Replace verbose outputs with compact references once their purpose is served.
Masking Strategy:
| Category | Action |
|---|---|
| Never mask | Current task observations, most recent turn, active reasoning |
| Consider masking | 3+ turns ago, verbose outputs with extractable key points |
| Always mask | Repeated outputs, boilerplate, already summarized |
Example (assuming `store_observation` persists the full text for later retrieval and `extract_key` pulls out the findings worth keeping inline):

```python
if len(observation) > max_length:
    ref_id = store_observation(observation)  # full text stays retrievable
    return f"[Obs:{ref_id} elided. Key: {extract_key(observation)}]"
```
KV-Cache Optimization
Reuse cached computations across requests with identical prefixes.
Cache-friendly ordering:
- System prompt (stable, first)
- Tool definitions (stable)
- Frequently reused elements
- Unique content (last)
Design tips:
- Avoid dynamic content like timestamps
- Use consistent formatting
- Keep structure stable across sessions
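The ordering and design tips combine into one rule: the stable parts of the prompt must be byte-identical across requests, because prefix (KV) caching keys on an exact prefix match. A minimal sketch, with hypothetical argument names:

```python
# Sketch: assemble the prompt so the stable prefix is identical
# across requests, which is what prefix (KV) caching keys on.
def build_prompt(system_prompt: str, tool_defs: list[str], user_message: str) -> str:
    # Stable, reusable portion first: system prompt, then tool definitions.
    stable_prefix = system_prompt + "\n\n" + "\n".join(tool_defs)
    # Unique per-request content goes last; anything volatile placed here
    # (timestamps, request IDs) would not break caching, but anything
    # volatile in the prefix above would invalidate it on every request.
    return stable_prefix + "\n\n" + user_message
```

Two requests built this way share everything up to the user message, so the cache can serve the entire stable prefix.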
Context Partitioning
Split work across sub-agents with isolated contexts. Each operates in clean context focused on its subtask.
Aggregation pattern:
- Validate all partitions completed
- Merge compatible results
- Summarize if still too large
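The aggregation pattern above can be sketched as a single function. `run_subagent` stands in for a model call with a fresh, isolated context; the `None`-means-incomplete convention and the `max_chars` threshold are illustrative assumptions.

```python
# Sketch of context partitioning: each sub-agent sees only its own
# subtask; the aggregation step validates, merges, and (if needed)
# summarizes the combined result.
def partition_and_aggregate(subtasks, run_subagent, summarize=None, max_chars=4000):
    results = [run_subagent(task) for task in subtasks]  # isolated contexts
    if any(r is None for r in results):                  # validate completion
        raise RuntimeError("a partition failed to complete")
    merged = "\n".join(results)                          # merge compatible results
    if summarize and len(merged) > max_chars:            # summarize if still too large
        merged = summarize(merged)
    return merged
```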
Budget Management
Design explicit token budgets:
- System prompt: X tokens
- Tool definitions: Y tokens
- Retrieved docs: Z tokens
- Message history: W tokens
- Reserved buffer: 10-20%

Trigger optimization when:
- Token utilization > 70%
- Response quality degrades
- Costs increase due to long contexts
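An explicit budget plus the 70% trigger can be expressed directly. The budget figures below are hypothetical placeholders, and the counts are in whatever units your tokenizer reports.

```python
# Sketch of an explicit token budget with a 70% utilization trigger.
# Budget figures are hypothetical; use numbers sized to your model.
BUDGET = {"system": 2000, "tools": 3000, "docs": 8000, "history": 15000}
RESERVED = 0.15  # keep a 10-20% buffer out of the usable capacity

def should_optimize(usage: dict) -> bool:
    """True when current usage exceeds 70% of the usable (buffered) budget."""
    usable = sum(BUDGET.values()) * (1 - RESERVED)
    return sum(usage.values()) > 0.70 * usable
```

Checking `should_optimize` on every turn makes the optimization trigger a measurable policy rather than an ad-hoc reaction.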
Decision Framework
| Dominant component | Apply |
|---|---|
| Tool outputs | Observation masking |
| Retrieved documents | Summarization or partitioning |
| Message history | Compaction with summarization |
| Multiple | Combine strategies |
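The decision table can be turned into a small dispatcher. The component names, the 50% dominance threshold, and the returned labels are illustrative assumptions, not part of any fixed API.

```python
# Sketch: pick a strategy from whichever component dominates token usage.
STRATEGY = {
    "tool_outputs": "observation masking",
    "retrieved_docs": "summarization or partitioning",
    "history": "compaction with summarization",
}

def choose_strategy(usage: dict) -> str:
    top = max(usage, key=usage.get)
    # No single dominant component: combine strategies instead.
    if usage[top] < 0.5 * sum(usage.values()):
        return "combine strategies"
    return STRATEGY[top]
```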
Performance Targets
- Compaction: 50-70% reduction, <5% quality degradation
- Masking: 60-80% reduction in masked observations
- Cache optimization: 70%+ hit rate for stable workloads
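Checking results against these targets only needs before/after token counts. A minimal helper (the function name is ours):

```python
# Sketch: measure reduction so results can be checked against the
# target bands above (e.g. 50-70% for compaction).
def reduction(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens removed, e.g. 0.65 for a 65% reduction."""
    return 1 - after_tokens / before_tokens
```

For example, compacting 10,000 tokens down to 3,500 is a 65% reduction, inside the 50-70% compaction target.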
Guidelines
- Measure before optimizing: know the current state
- Apply compaction before masking when possible
- Design for cache stability with consistent prompts
- Partition before context becomes problematic
- Balance token savings against quality preservation