llm-api-benchmark

Installation
npx skills add https://github.com/ridewind/my-skills --skill llm-api-benchmark

LLM API Benchmark

Automatically detects the current LLM API endpoint from environment variables and benchmarks its performance.

Usage

Simply invoke this skill when you want to benchmark your current LLM API:

"Test API speed"
"Run benchmark"
"测试API速度"
"测试响应时间"

Or use the command directly:

python skills/llm-api-benchmark/scripts/benchmark.py

Supported Providers

The tool auto-detects these providers from environment variables:

Environment Variable          Provider
ANTHROPIC_API_KEY             Anthropic/Claude
OPENAI_API_KEY                OpenAI
AZURE_OPENAI_API_KEY          Azure OpenAI
GOOGLE_GENERATIVE_AI_API_KEY  Google Gemini
AWS_ACCESS_KEY_ID             AWS Bedrock
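
A minimal sketch of how this kind of detection can work (the mapping mirrors the table above; detect_provider is an illustrative name, not necessarily the script's actual API):

import os

# Mapping mirrors the table above; checked in order, first match wins.
PROVIDER_ENV_VARS = [
    ("ANTHROPIC_API_KEY", "Anthropic/Claude"),
    ("OPENAI_API_KEY", "OpenAI"),
    ("AZURE_OPENAI_API_KEY", "Azure OpenAI"),
    ("GOOGLE_GENERATIVE_AI_API_KEY", "Google Gemini"),
    ("AWS_ACCESS_KEY_ID", "AWS Bedrock"),
]

def detect_provider():
    """Return the first provider whose API key is set, or None."""
    for var, provider in PROVIDER_ENV_VARS:
        if os.environ.get(var):
            return provider
    return None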

Options

# List available presets
python benchmark.py --list-presets

# Use preset (recommended for consistent results)
python benchmark.py --preset throughput   # For TPS testing (~300-500 tokens)
python benchmark.py --preset quick        # Fast test (~10 tokens)
python benchmark.py --preset standard     # Medium (~20 tokens)

# Custom iterations
python benchmark.py --iterations 10

# Custom model
python benchmark.py --model gpt-4o

# Custom prompt
python benchmark.py --prompt "Your test prompt"

# Custom output directory
python benchmark.py --output-dir ./my-reports

# Quiet mode (less output)
python benchmark.py --preset throughput -q

Presets

Preset      Description                          Expected Output
quick       Short prompt for fast testing        ~10 tokens
standard    Medium-length prompt                 ~20 tokens
long        Longer output test                   ~100+ tokens
throughput  High token output for TPS testing    ~300-500 tokens
code        Programming-related prompt           ~50 tokens
json        Structured JSON output test          ~30 tokens

Recommended: Use --preset throughput for accurate TPS measurement; its longer output amortizes time-to-first-token and reduces per-token timing noise.

Output

Reports are saved to reports/llm-benchmark-{timestamp}/:

reports/llm-benchmark-20260227-103000/
├── benchmark-report.md    # Human-readable report
└── benchmark-data.json    # Raw data for programmatic access
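
benchmark-data.json can be consumed directly from other scripts. A minimal sketch that loads the most recent report (the directory pattern comes from above; inspect the file for the exact schema rather than assuming field names):

import json
from pathlib import Path

# Pick the newest report directory; names sort chronologically by timestamp.
latest = sorted(Path("reports").glob("llm-benchmark-*"))[-1]
data = json.loads((latest / "benchmark-data.json").read_text())

# Dump the structure to discover the schema before relying on field names.
print(json.dumps(data, indent=2))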

Metrics

The benchmark measures:

  • Response Time: End-to-end latency
  • TTFT (Time To First Token): Time until first token received
  • TPS (Tokens Per Second): Token generation throughput
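
Both streaming metrics fall out of three timestamps around a streaming call. A provider-agnostic sketch (stream_tokens is a hypothetical stand-in for whichever streaming client is in use, not part of this tool):

import time

def timed_stream(stream_tokens, prompt):
    # stream_tokens is a hypothetical callable yielding tokens as they arrive.
    start = time.perf_counter()
    first = None
    count = 0
    for _token in stream_tokens(prompt):
        if first is None:
            first = time.perf_counter()  # first token arrived: fixes TTFT
        count += 1
    end = time.perf_counter()
    if first is None:
        raise RuntimeError("stream produced no tokens")
    return {
        "response_time": end - start,  # end-to-end latency
        "ttft": first - start,         # time to first token
        "tps": count / (end - first) if end > first else float("inf"),
    }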

Statistics include: average, min, max, P50, P95, P99
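
One way to compute those statistics with only the standard library (an illustration; benchmark.py's internals may differ):

import statistics

def summarize(samples):
    # quantiles() with n=100 returns 99 cut points: index k is P(k+1).
    q = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "avg": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
        "p50": statistics.median(samples),
        "p95": q[94],
        "p99": q[98],
    }

# Example: per-iteration response times in seconds
print(summarize([0.81, 0.92, 0.88, 1.10, 0.79, 0.95, 0.90, 1.30, 0.85, 0.87]))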

Example Report

See ../../plans/llm-api-benchmark-plan.md for the expected report format.