1-min-eval

📁 topvibecoder/eval 📅 Jan 28, 2026
Total installs: 4
Weekly installs: 4
Site-wide rank: #49928
Install command
npx skills add https://github.com/topvibecoder/eval --skill 1-min-eval

Installs by agent

cursor 4
gemini-cli 4
antigravity 4
claude-code 4
codex 4
windsurf 4

Skill Documentation

1-Minute Codebase Evaluation

Fast, parallel evaluation of codebases using the Claude CLI, with structured metrics.

Features

  • ✅ Smart Scanning: Automatically skips .claude/, node_modules/, .git/, and previous eval_* results (see the sketch after this list)
  • ✅ Parallel Evaluation: Runs multiple metrics concurrently for speed
  • ✅ Auto Ranking: Submits to TopVibeCoder API and gets your rank
  • ✅ Progress Tracking: Saves ranking history to track improvements over time
  • ✅ Detailed Reports: Generates comprehensive markdown reports with citations
  • ✅ Terminal Bar Chart: Visual score display with Unicode block characters
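
A minimal sketch of the kind of directory filter Smart Scanning describes, assuming a plain os.walk traversal. The skipped names come from the bullet above; the helper itself is illustrative, not the actual scan_codebase.py implementation.

import os

SKIP_DIRS = {".claude", "node_modules", ".git"}

def iter_source_files(root: str):
    """Yield source file paths under root, pruning skipped directories in place."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into skipped
        # trees, including previous eval_* result directories.
        dirnames[:] = [
            d for d in dirnames
            if d not in SKIP_DIRS and not d.startswith("eval_")
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)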

Quick Start

# Evaluate the current directory (use this by default)
.claude/skills/1-min-eval/scripts/run_eval.sh .

# Evaluate with specific metrics
.claude/skills/1-min-eval/scripts/run_eval.sh /path/to/project --metrics impact,technical

# Full evaluation with all metrics (DO NOT use by default)
.claude/skills/1-min-eval/scripts/run_eval.sh /path/to/project --all-metrics

How It Works

  1. Scan: scan_codebase.py extracts repo tree and source code with line numbers
  2. Evaluate: Runs parallel claude -p calls for each metric (sketched below)
  3. Aggregate: Combines JSON results into a final report
  4. Visualize: Displays terminal bar chart with scores
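
A hedged sketch of step 2, assuming each metric is evaluated by piping the scanned codebase into its own claude -p subprocess through a small thread pool. Prompt wording, result parsing, and the concurrency limit are illustrative; only the claude -p ... --output-format json invocation is documented here (see Manual Usage).

import json
import subprocess
from concurrent.futures import ThreadPoolExecutor

PARALLEL = 4  # cf. EVAL_PARALLEL in the Configuration section

def evaluate_metric(metric: str, codebase_md: str) -> dict:
    """Pipe the scanned codebase into one claude -p call and parse its JSON output."""
    proc = subprocess.run(
        ["claude", "-p", f"Evaluate for {metric.upper()}...", "--output-format", "json"],
        input=codebase_md, capture_output=True, text=True, timeout=300, check=True,
    )
    return json.loads(proc.stdout)

def evaluate_all(metrics: list[str], codebase_md: str) -> dict:
    """Run the per-metric evaluations concurrently and collect their results."""
    with ThreadPoolExecutor(max_workers=PARALLEL) as pool:
        futures = {m: pool.submit(evaluate_metric, m, codebase_md) for m in metrics}
        return {m: f.result() for m, f in futures.items()}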

Example Output

After evaluation completes, you’ll see a visual bar chart:

==================================================
📊 Evaluation Scores
==================================================
  presentation    6.25 | ████████████░░░░░░░░
  impact          5.25 | ██████████░░░░░░░░░░
  technical       1.75 | ███░░░░░░░░░░░░░░░░░
  creativity      0.50 | █░░░░░░░░░░░░░░░░░░░
  prompt_design   0.00 | ░░░░░░░░░░░░░░░░░░░░
==================================================
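
For reference, a chart like this can be rendered in a few lines of Python. The scores below are copied from the example output; the function itself is only an illustration, not the skill's actual visualizer.

def print_bar_chart(scores: dict[str, float], width: int = 20, max_score: float = 10.0):
    """Render scores as Unicode block bars, highest first."""
    print("=" * 50)
    print("📊 Evaluation Scores")
    print("=" * 50)
    for metric, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        filled = int(score / max_score * width)  # filled blocks out of `width`
        print(f"  {metric:<15} {score:5.2f} | " + "█" * filled + "░" * (width - filled))
    print("=" * 50)

print_bar_chart({
    "presentation": 6.25, "impact": 5.25, "technical": 1.75,
    "creativity": 0.50, "prompt_design": 0.00,
})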

Available Metrics

Metric          Description
impact          Real-world problem solving, usable experience
technical       Architecture, robustness, LLM integration
creativity      Originality, novel LLM usage
presentation    UX clarity, onboarding, demo quality
prompt_design   Prompt structure, staging, constraints
security        Secure coding, auth, dependency hygiene
completion      Description-to-code alignment
monetization    Business potential analysis

Scoring Scale (0.00-10.00)

Range           Meaning
0.00-2.50       Barely functional, major gaps
2.51-4.50       Minimal implementation, weak
4.51-6.50       Working but basic, clear gaps
6.51-8.50       Solid implementation, good quality
8.51-10.00      Excellent, production-ready
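
The bands translate directly into a lookup; the boundaries below come from the table, while the helper function is illustrative.

def score_band(score: float) -> str:
    """Map a 0.00-10.00 score to its qualitative band."""
    bands = [
        (2.50, "Barely functional, major gaps"),
        (4.50, "Minimal implementation, weak"),
        (6.50, "Working but basic, clear gaps"),
        (8.50, "Solid implementation, good quality"),
        (10.00, "Excellent, production-ready"),
    ]
    for upper, label in bands:
        if score <= upper:
            return label
    raise ValueError("score must be between 0.00 and 10.00")

print(score_band(6.25))  # "Working but basic, clear gaps"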

Configuration

Variable        Default                       Description
EVAL_PARALLEL   4                             Number of parallel evaluations
EVAL_TIMEOUT    300                           Timeout per metric (seconds)
EVAL_MAX_CHARS  300000                        Maximum characters of source to include
EVAL_MODEL      claude-sonnet-4-5-20250929    Model used for evaluation
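
A small sketch of reading these variables with their defaults, assuming ordinary environment lookups; the dataclass wrapper is illustrative and not part of the skill.

import os
from dataclasses import dataclass

@dataclass
class EvalConfig:
    # Defaults mirror the table above; values are read once at import time.
    parallel: int = int(os.getenv("EVAL_PARALLEL", "4"))
    timeout: int = int(os.getenv("EVAL_TIMEOUT", "300"))
    max_chars: int = int(os.getenv("EVAL_MAX_CHARS", "300000"))
    model: str = os.getenv("EVAL_MODEL", "claude-sonnet-4-5-20250929")

Overrides follow the usual shell convention, e.g. prefixing the run_eval.sh command with EVAL_PARALLEL=2 EVAL_TIMEOUT=600.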

Ranking & Progress Tracking

Results are automatically submitted to the TopVibeCoder ranking API, which returns:

  • Overall rank and percentile
  • Per-metric rankings (individual rank for each metric)
  • Comparison with nearby apps
  • Historical progress tracking

Rankings are saved to ranking_history.jsonl in the output directory and to .evals/history.jsonl for unified tracking across all evaluations.

Note: The ranking API uses browser-like headers to bypass Cloudflare protection, ensuring reliable submissions. If the API fails, the evaluation continues and results are still saved locally.
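
A hedged sketch of what such a submit-then-continue step can look like: POST the aggregated scores with browser-like headers, fall back gracefully if the API is unreachable, and append one history entry either way. The endpoint URL, payload shape, and header values are placeholders, not the documented TopVibeCoder API.

import json
import urllib.request

def submit_ranking(scores: dict, history_path: str) -> dict | None:
    payload = json.dumps({"scores": scores}).encode()
    req = urllib.request.Request(
        "https://topvibecoder.example/api/rank",  # placeholder endpoint
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Browser-like User-Agent, per the note above
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            ranking = json.load(resp)
    except Exception:
        ranking = None  # API failure: evaluation continues, results stay local
    # One JSONL entry per evaluation, regardless of API outcome
    with open(history_path, "a") as fh:
        fh.write(json.dumps({"scores": scores, "ranking": ranking}) + "\n")
    return ranking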

Output Structure

Results saved to .evals/<timestamp>_<project>/ (hidden directory):

  • codebase.md – Scanned source code
  • codebase.json – Structured metadata
  • prompts/ – Generated evaluation prompts
  • results/ – JSON results per metric
  • logs/ – Execution logs
  • report.md – Aggregated markdown report with ranking
  • ranking_history.jsonl – Historical ranking data (one entry per evaluation)

Note: Evaluation results are saved to a hidden .evals/ directory to keep your workspace clean. Add .evals/ to your .gitignore if you don’t want to commit evaluation results.

Manual Usage

You can also run components individually:

# 1. Scan codebase
python3 .claude/skills/1-min-eval/scripts/scan_codebase.py ./project \
    --output /tmp/code.md --max-chars 300000

# 2. Run single metric evaluation
cat /tmp/code.md | claude -p "Evaluate for IMPACT..." --output-format json

# 3. Aggregate results
python3 .claude/skills/1-min-eval/scripts/aggregate.py \
    --input-dir ./results --output ./report.md

Adding Custom Metadata

Create metadata.json in project root:

{
  "name": "My App",
  "description": "An AI-powered tool that...",
  "author": "Your Name"
}

Tips

  1. Large codebases: Use --max-chars 500000 for more context
  2. Debugging: Add --verbose to see detailed output
  3. Resume: Results are cached; re-running skips completed metrics (see the sketch below)
  4. Single metric: Use --metrics impact for quick test
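
A sketch of the resume behavior from tip 3, assuming one JSON result file per metric under results/ (see Output Structure); the file-naming pattern and helper are assumptions.

from pathlib import Path

def pending_metrics(metrics: list[str], results_dir: str) -> list[str]:
    """Return only the metrics that do not yet have a cached result file."""
    return [m for m in metrics if not (Path(results_dir) / f"{m}.json").exists()]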