performance-monitor
Total installs: 31
Weekly installs: 31
Site rank: #6619
Install command
npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill performance-monitor
Agent install distribution
- opencode: 22
- claude-code: 21
- gemini-cli: 20
- cursor: 16
- windsurf: 14
Skill Documentation
Performance Monitor
Purpose
Provides expertise in monitoring, benchmarking, and optimizing AI agent performance. Specializes in token usage tracking, latency analysis, cost optimization, and implementing quality evaluation metrics (evals) for AI systems.
When to Use
- Tracking token usage and costs for AI agents
- Measuring and optimizing agent latency
- Implementing evaluation metrics (evals)
- Benchmarking agent quality and accuracy
- Optimizing agent cost efficiency
- Building observability for AI pipelines
- Analyzing agent conversation patterns
- Setting up A/B testing for agents
Quick Start
Invoke this skill when:
- Optimizing AI agent costs and token usage
- Measuring agent latency and performance
- Implementing evaluation frameworks
- Building observability for AI systems
- Benchmarking agent quality
Do NOT invoke when:
- General application performance → use /performance-engineer
- Infrastructure monitoring → use /sre-engineer
- ML model training optimization → use /ml-engineer
- Prompt design → use /prompt-engineer
Decision Framework
Optimization Goal?
├── Cost Reduction
│   ├── Token usage → Prompt optimization
│   └── API calls → Caching, batching
├── Latency
│   ├── Time to first token → Streaming
│   └── Total response time → Model selection
├── Quality
│   ├── Accuracy → Evals with ground truth
│   └── Consistency → Multiple run analysis
└── Reliability
    └── Error rates, retry patterns
Core Workflows
1. Token Usage Tracking
- Instrument API calls to capture usage
- Track input vs output tokens separately
- Aggregate by agent, task, user
- Calculate costs per operation
- Build dashboards for visibility
- Set alerts for anomalous usage
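A minimal sketch of this kind of instrumentation, assuming an OpenAI-style SDK whose responses expose `usage.prompt_tokens` and `usage.completion_tokens`; the agent/task names and per-token prices below are placeholders, not real rates.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Placeholder prices per 1K tokens; substitute your provider's published rates.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

@dataclass
class UsageTracker:
    # Aggregates token counts and cost, keyed by (agent, task).
    totals: dict = field(default_factory=lambda: defaultdict(
        lambda: {"input": 0, "output": 0, "cost": 0.0}))

    def record(self, agent: str, task: str, input_tokens: int, output_tokens: int) -> float:
        cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
             + (output_tokens / 1000) * PRICE_PER_1K["output"]
        bucket = self.totals[(agent, task)]
        bucket["input"] += input_tokens
        bucket["output"] += output_tokens
        bucket["cost"] += cost
        return cost

tracker = UsageTracker()
# After each API call, pass the counts from the response object, e.g.
# response.usage.prompt_tokens / response.usage.completion_tokens in an OpenAI-style SDK.
tracker.record("support-agent", "triage", input_tokens=1200, output_tokens=350)
print(tracker.totals[("support-agent", "triage")])
```

Feeding these aggregated totals into a dashboard or alerting rule covers the last two bullets above.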
2. Eval Framework Setup
- Define evaluation criteria
- Create test dataset with expected outputs
- Implement scoring functions
- Run automated eval pipeline
- Track scores over time
- Use for regression testing
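A minimal sketch of an eval pipeline, assuming the agent is a plain callable (`run_agent`) and using exact-match scoring against a hypothetical test set; real suites usually add semantic or rubric-based scorers and persist scores per run.

```python
from typing import Callable

# Hypothetical eval cases: each pairs an input with its expected output (ground truth).
EVAL_CASES = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def exact_match(output: str, expected: str) -> float:
    # Simplest possible scoring function; swap in task-appropriate scorers.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_evals(run_agent: Callable[[str], str]) -> float:
    scores = [exact_match(run_agent(c["input"]), c["expected"]) for c in EVAL_CASES]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Stand-in agent so the script runs end to end; replace with the real agent call.
    fake_agent = lambda prompt: {"2 + 2": "4", "capital of France": "Paris"}.get(prompt, "")
    print(f"accuracy: {run_evals(fake_agent):.2f}")
```

Running this in CI against every prompt or model change gives the regression testing mentioned in the last bullet.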
3. Latency Optimization
- Measure baseline latency
- Identify bottlenecks (model, network, parsing)
- Implement streaming where applicable
- Optimize prompt length
- Consider model size tradeoffs
- Add caching for repeated queries
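A minimal sketch of baseline measurement, again treating the agent as a callable; it reports p50/p95/p99 (per the best practice below) rather than an average, and times each call end to end.

```python
import statistics
import time
from typing import Callable, Iterable

def measure_latency(run_agent: Callable[[str], str], prompts: Iterable[str]) -> dict:
    # End-to-end wall-clock latency per call; with streaming APIs, also record
    # time to first token separately, since that is what users perceive.
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        run_agent(prompt)
        samples.append(time.perf_counter() - start)
    cuts = statistics.quantiles(samples, n=100)  # cuts[k] is the (k + 1)th percentile
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

if __name__ == "__main__":
    fake_agent = lambda prompt: (time.sleep(0.01), "ok")[1]  # stand-in agent
    print(measure_latency(fake_agent, ["same question"] * 50))
```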
Best Practices
- Track tokens separately from API call counts
- Implement evals before optimizing
- Use percentiles (p50, p95, p99) not averages for latency
- Log prompt and response for debugging
- Set cost budgets and alerts
- Version prompts and track performance per version
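A sketch of the versioning and budgeting practices, with hypothetical names: derive a prompt version id from the template, attach it to every logged run, and warn when a placeholder budget is exceeded.

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)

COST_BUDGET_USD = 50.0  # placeholder budget, not a recommendation
PROMPT_TEMPLATE = "You are a support agent. Answer concisely: {question}"
# Stable version id derived from the template text, so any edit changes the id.
PROMPT_VERSION = hashlib.sha256(PROMPT_TEMPLATE.encode()).hexdigest()[:8]

def log_run(question: str, response: str, cost_usd: float, spent_so_far: float) -> None:
    # Logging the version alongside prompt and response lets quality and cost be
    # compared per prompt version and keeps individual runs debuggable.
    logging.info("prompt_version=%s cost=%.4f question=%r response=%r",
                 PROMPT_VERSION, cost_usd, question, response)
    if spent_so_far + cost_usd > COST_BUDGET_USD:
        logging.warning("cost budget exceeded: %.2f > %.2f USD",
                        spent_so_far + cost_usd, COST_BUDGET_USD)

log_run("reset my password", "Use the 'Forgot password' link.", cost_usd=0.004, spent_so_far=49.999)
```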
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| No token tracking | Surprise costs | Instrument all calls |
| Optimizing without evals | Quality regression | Measure before optimizing |
| Average-only latency | Hides tail latency | Use percentiles |
| No prompt versioning | Can’t correlate changes | Version and track |
| Ignoring caching | Repeated costs | Cache stable responses |
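For the last row, a minimal caching sketch, assuming responses for identical (model, prompt) pairs are stable enough to reuse; personalized or time-sensitive requests should bypass the cache. `call_api` stands in for whatever uncached wrapper the pipeline already has.

```python
import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_call(model: str, prompt: str, call_api: Callable[[str, str], str]) -> str:
    # Key on model + prompt so a model change never serves a stale response.
    key = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(model, prompt)
    return _cache[key]
```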