performance-monitor
Total installs: 31
Weekly installs: 31
Site rank: #6619
Install command
npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill performance-monitor
Agent install distribution
- opencode: 22
- claude-code: 21
- gemini-cli: 20
- cursor: 16
- windsurf: 14
Skill Documentation
Performance Monitor
Purpose
Provides expertise in monitoring, benchmarking, and optimizing AI agent performance. Specializes in token usage tracking, latency analysis, cost optimization, and implementing quality evaluation metrics (evals) for AI systems.
When to Use
- Tracking token usage and costs for AI agents
- Measuring and optimizing agent latency
- Implementing evaluation metrics (evals)
- Benchmarking agent quality and accuracy
- Optimizing agent cost efficiency
- Building observability for AI pipelines
- Analyzing agent conversation patterns
- Setting up A/B testing for agents
Quick Start
Invoke this skill when:
- Optimizing AI agent costs and token usage
- Measuring agent latency and performance
- Implementing evaluation frameworks
- Building observability for AI systems
- Benchmarking agent quality
Do NOT invoke when:
- General application performance → use /performance-engineer
- Infrastructure monitoring → use /sre-engineer
- ML model training optimization → use /ml-engineer
- Prompt design → use /prompt-engineer
Decision Framework
Optimization Goal?
├── Cost Reduction
│   ├── Token usage → Prompt optimization
│   └── API calls → Caching, batching
├── Latency
│   ├── Time to first token → Streaming
│   └── Total response time → Model selection
├── Quality
│   ├── Accuracy → Evals with ground truth
│   └── Consistency → Multiple run analysis
└── Reliability
    └── Error rates, retry patterns
Core Workflows
1. Token Usage Tracking
- Instrument API calls to capture usage
- Track input vs output tokens separately
- Aggregate by agent, task, user
- Calculate costs per operation
- Build dashboards for visibility
- Set alerts for anomalous usage
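A minimal sketch of this kind of instrumentation, assuming an OpenAI-style SDK whose responses expose `usage.prompt_tokens` and `usage.completion_tokens`; the agent/task names and per-token prices below are placeholders, not real rates.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Placeholder prices per 1K tokens; substitute your provider's published rates.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

@dataclass
class UsageTracker:
    # Aggregates token counts and cost, keyed by (agent, task).
    totals: dict = field(default_factory=lambda: defaultdict(
        lambda: {"input": 0, "output": 0, "cost": 0.0}))

    def record(self, agent: str, task: str, input_tokens: int, output_tokens: int) -> float:
        cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
             + (output_tokens / 1000) * PRICE_PER_1K["output"]
        bucket = self.totals[(agent, task)]
        bucket["input"] += input_tokens
        bucket["output"] += output_tokens
        bucket["cost"] += cost
        return cost

tracker = UsageTracker()
# After each API call, pass the counts from the response object, e.g.
# response.usage.prompt_tokens / response.usage.completion_tokens in an OpenAI-style SDK.
tracker.record("support-agent", "triage", input_tokens=1200, output_tokens=350)
print(tracker.totals[("support-agent", "triage")])
```

Feeding these aggregated totals into a dashboard or alerting rule covers the last two bullets above.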
2. Eval Framework Setup
- Define evaluation criteria
- Create test dataset with expected outputs
- Implement scoring functions
- Run automated eval pipeline
- Track scores over time
- Use for regression testing
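A minimal sketch of an eval pipeline, assuming the agent is a plain callable (`run_agent`) and using exact-match scoring against a hypothetical test set; real suites usually add semantic or rubric-based scorers and persist scores per run.

```python
from typing import Callable

# Hypothetical eval cases: each pairs an input with its expected output (ground truth).
EVAL_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def exact_match(output: str, expected: str) -> float:
    # Simplest possible scoring function; swap in task-appropriate scorers.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_evals(run_agent: Callable[[str], str]) -> float:
    scores = [exact_match(run_agent(c["input"]), c["expected"]) for c in EVAL_CASES]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Stand-in agent so the script runs end to end; replace with the real agent call.
    fake_agent = lambda prompt: {"2 + 2": "4", "capital of France": "Paris"}.get(prompt, "")
    print(f"accuracy: {run_evals(fake_agent):.2f}")
```

Running this in CI against every prompt or model change gives the regression testing mentioned in the last bullet.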
3. Latency Optimization
- Measure baseline latency
- Identify bottlenecks (model, network, parsing)
- Implement streaming where applicable
- Optimize prompt length
- Consider model size tradeoffs
- Add caching for repeated queries
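A minimal sketch of baseline measurement, again treating the agent as a callable; it reports p50/p95/p99 (per the best practice below) rather than an average, and times each call end to end.

```python
import statistics
import time
from typing import Callable, Iterable

def measure_latency(run_agent: Callable[[str], str], prompts: Iterable[str]) -> dict:
    # End-to-end wall-clock latency per call; with streaming APIs, also record
    # time to first token separately, since that is what users perceive.
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        run_agent(prompt)
        samples.append(time.perf_counter() - start)
    cuts = statistics.quantiles(samples, n=100)  # cuts[k] is the (k + 1)th percentile
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

if __name__ == "__main__":
    fake_agent = lambda prompt: (time.sleep(0.01), "ok")[1]  # stand-in agent
    print(measure_latency(fake_agent, ["same question"] * 50))
```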
Best Practices
- Track tokens separately from API call counts
- Implement evals before optimizing
- Use percentiles (p50, p95, p99) not averages for latency
- Log prompt and response for debugging
- Set cost budgets and alerts
- Version prompts and track performance per version
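A sketch of the versioning and budgeting practices, with hypothetical names: derive a prompt version id from the template, attach it to every logged run, and warn when a placeholder budget is exceeded.

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)

COST_BUDGET_USD = 50.0  # placeholder budget, not a recommendation
PROMPT_TEMPLATE = "You are a support agent. Answer concisely: {question}"
# Stable version id derived from the template text, so any edit changes the id.
PROMPT_VERSION = hashlib.sha256(PROMPT_TEMPLATE.encode()).hexdigest()[:8]

def log_run(question: str, response: str, cost_usd: float, spent_so_far: float) -> None:
    # Logging the version alongside prompt and response lets quality and cost be
    # compared per prompt version and keeps individual runs debuggable.
    logging.info("prompt_version=%s cost=%.4f question=%r response=%r",
                 PROMPT_VERSION, cost_usd, question, response)
    if spent_so_far + cost_usd > COST_BUDGET_USD:
        logging.warning("cost budget exceeded: %.2f > %.2f USD",
                        spent_so_far + cost_usd, COST_BUDGET_USD)

log_run("reset my password", "Use the 'Forgot password' link.", cost_usd=0.004, spent_so_far=49.999)
```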
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| No token tracking | Surprise costs | Instrument all calls |
| Optimizing without evals | Quality regression | Measure before optimizing |
| Average-only latency | Hides tail latency | Use percentiles |
| No prompt versioning | Can’t correlate changes | Version and track |
| Ignoring caching | Repeated costs | Cache stable responses |
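For the last row, a minimal caching sketch, assuming responses for identical (model, prompt) pairs are stable enough to reuse; personalized or time-sensitive requests should bypass the cache. `call_api` stands in for whatever uncached wrapper the pipeline already has.

```python
import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_call(model: str, prompt: str, call_api: Callable[[str, str], str]) -> str:
    # Key on model + prompt so a model change never serves a stale response.
    key = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(model, prompt)
    return _cache[key]
```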