1-min-eval
```bash
npx skills add https://github.com/topvibecoder/eval --skill 1-min-eval
```
1-Minute Codebase Evaluation
Fast, parallel evaluation of codebases using Claude CLI with structured metrics.
Features
- ✅ Smart Scanning: Automatically skips `.claude/`, `node_modules/`, `.git/`, and previous `eval_*` results
- ✅ Parallel Evaluation: Runs multiple metrics concurrently for speed
- ✅ Auto Ranking: Submits to TopVibeCoder API and gets your rank
- ✅ Progress Tracking: Saves ranking history to track improvements over time
- ✅ Detailed Reports: Generates comprehensive markdown reports with citations
- ✅ Terminal Bar Chart: Visual score display with Unicode block characters
Quick Start
```bash
# Evaluate current directory (use by default)
.claude/skills/1-min-eval/scripts/run_eval.sh .

# Evaluate with specific metrics
.claude/skills/1-min-eval/scripts/run_eval.sh /path/to/project --metrics impact,technical

# Full evaluation with all metrics (DO NOT use by default)
.claude/skills/1-min-eval/scripts/run_eval.sh /path/to/project --all-metrics
```
How It Works
- Scan: `scan_codebase.py` extracts the repo tree and source code with line numbers
- Evaluate: Runs parallel `claude -p` calls, one for each metric
- Aggregate: Combines the per-metric JSON results into a final report (see the sketch below)
- Visualize: Displays a terminal bar chart with the scores
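The per-metric JSON files can also be inspected directly. As a rough illustration of what the Aggregate step conceptually does (the `score` field name and the simple average below are assumptions for the sketch, not the skill's actual result schema or scoring formula):

```bash
# Run from inside one evaluation's output directory (.evals/<timestamp>_<project>/).
# Illustrative only: the "score" field and the averaging are assumptions,
# not the skill's actual result schema or formula. Requires jq.
jq -s '[.[].score] | add / length' results/*.json
```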
Example Output
After evaluation completes, you’ll see a visual bar chart:
```
==================================================
📊 Evaluation Scores
==================================================
presentation    6.25 | ████████████░░░░░░░░
impact          5.25 | ██████████░░░░░░░░░░
technical       1.75 | ███░░░░░░░░░░░░░░░░░
creativity      0.50 | █░░░░░░░░░░░░░░░░░░░
prompt_design   0.00 | ░░░░░░░░░░░░░░░░░░░░
==================================================
```
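For reference, a bar like this can be drawn with ordinary Unicode block characters. The snippet below is a minimal, hypothetical re-implementation, not the skill's actual code; the block characters, 20-cell width, and truncation behaviour are all assumptions:

```bash
# Minimal sketch of rendering one score bar (hypothetical, not the skill's code).
bar() {  # usage: bar <label> <score-out-of-10>
  local label=$1 score=$2 width=20
  local filled
  # Truncate score/10 * width to a whole number of filled cells.
  filled=$(awk -v s="$score" -v w="$width" 'BEGIN { printf "%d", s / 10 * w }')
  local line="" i
  for ((i = 0; i < width; i++)); do
    if ((i < filled)); then line+="█"; else line+="░"; fi
  done
  printf '%-14s %5s | %s\n' "$label" "$score" "$line"
}

bar presentation 6.25
bar prompt_design 0.00
```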
Available Metrics
| Metric | Description |
|---|---|
| impact | Real-world problem solving, usable experience |
| technical | Architecture, robustness, LLM integration |
| creativity | Originality, novel LLM usage |
| presentation | UX clarity, onboarding, demo quality |
| prompt_design | Prompt structure, staging, constraints |
| security | Secure coding, auth, dependency hygiene |
| completion | Description-to-code alignment |
| monetization | Business potential analysis |
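Any of the metric names above can be passed as a comma-separated list to the `--metrics` flag shown in Quick Start, for example:

```bash
# Evaluate only the security and completion metrics for the current directory
.claude/skills/1-min-eval/scripts/run_eval.sh . --metrics security,completion
```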
Scoring Scale (0.00-10.00)
| Range | Meaning |
|---|---|
| 0.00-2.50 | Barely functional, major gaps |
| 2.51-4.50 | Minimal implementation, weak |
| 4.51-6.50 | Working but basic, clear gaps |
| 6.51-8.50 | Solid implementation, good quality |
| 8.51-10.00 | Excellent, production-ready |
Configuration
| Variable | Default | Description |
|---|---|---|
| EVAL_PARALLEL | 4 | Number of parallel evaluations |
| EVAL_TIMEOUT | 300 | Timeout per metric (seconds) |
| EVAL_MAX_CHARS | 300000 | Max chars to include |
| EVAL_MODEL | claude-sonnet-4-5-20250929 | Model to use for evaluation |
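Assuming `run_eval.sh` reads these from the environment (as the table suggests), they can be set inline for a single run:

```bash
# More parallelism and a longer per-metric timeout for this run only
EVAL_PARALLEL=8 EVAL_TIMEOUT=600 \
  .claude/skills/1-min-eval/scripts/run_eval.sh .
```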
Ranking & Progress Tracking
Results are automatically submitted to the TopVibeCoder ranking API to get:
- Overall rank and percentile
- Per-metric rankings (individual rank for each metric)
- Comparison with nearby apps
- Historical progress tracking
Rankings are saved to `ranking_history.jsonl` in the output directory and to `.evals/history.jsonl` for unified tracking across all evaluations.
Note: The ranking API uses browser-like headers to bypass Cloudflare protection, ensuring reliable submissions. If the API fails, the evaluation continues and results are still saved locally.
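Because both files are JSON Lines (one JSON object per line), the history can be inspected with standard tools; for example, assuming `jq` is installed:

```bash
# Pretty-print the most recent ranking entry
tail -n 1 .evals/history.jsonl | jq .

# Count how many evaluations have been recorded so far
wc -l < .evals/history.jsonl
```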
Output Structure
Results saved to `.evals/<timestamp>_<project>/` (hidden directory):

- `codebase.md` – Scanned source code
- `codebase.json` – Structured metadata
- `prompts/` – Generated evaluation prompts
- `results/` – JSON results per metric
- `logs/` – Execution logs
- `report.md` – Aggregated markdown report with ranking
- `ranking_history.jsonl` – Historical ranking data (one entry per evaluation)
Note: Evaluation results are saved to a hidden .evals/ directory to keep your workspace clean. Add .evals/ to your .gitignore if you don’t want to commit evaluation results.
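A small convenience, assuming the layout above, for jumping straight to the newest report without hunting for the timestamped directory:

```bash
# Print the report from the most recent evaluation run
latest=$(ls -td .evals/*/ | head -n 1)
cat "${latest}report.md"
```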
Manual Usage
You can also run components individually:
```bash
# 1. Scan codebase
python3 .claude/skills/1-min-eval/scripts/scan_codebase.py ./project \
  --output /tmp/code.md --max-chars 300000

# 2. Run single metric evaluation
cat /tmp/code.md | claude -p "Evaluate for IMPACT..." --output-format json

# 3. Aggregate results
python3 .claude/skills/1-min-eval/scripts/aggregate.py \
  --input-dir ./results --output ./report.md
```
Adding Custom Metadata
Create `metadata.json` in the project root:

```json
{
  "name": "My App",
  "description": "An AI-powered tool that...",
  "author": "Your Name"
}
```
Tips
- Large codebases: Use `--max-chars 500000` for more context
- Debugging: Add `--verbose` to see detailed output
- Resume: Results are cached; a re-run skips completed metrics
- Single metric: Use `--metrics impact` for a quick test