assess

📁 yonatangross/orchestkit 📅 Jan 25, 2026

Site rank: #9184

Install command

npx skills add https://github.com/yonatangross/orchestkit --skill assess

Installs by agent

claude-code 8

gemini-cli 7

opencode 7

antigravity 7

codex 6

windsurf 6

Skill documentation

Assess

Comprehensive assessment skill for answering “is this good?” with structured evaluation, scoring, and actionable recommendations.

Quick Start

/assess backend/app/services/auth.py
/assess our caching strategy
/assess the current database schema
/assess frontend/src/components/Dashboard

STEP 0: Verify User Intent with AskUserQuestion

BEFORE creating tasks, clarify assessment dimensions:

AskUserQuestion(
  questions=[{
    "question": "What dimensions to assess?",
    "header": "Dimensions",
    "options": [
      {"label": "Full assessment (Recommended)", "description": "All dimensions: quality, maintainability, security, performance"},
      {"label": "Code quality only", "description": "Readability, complexity, best practices"},
      {"label": "Security focus", "description": "Vulnerabilities, attack surface, compliance"},
      {"label": "Quick score", "description": "Just give me a 0-10 score with brief notes"}
    ],
    "multiSelect": false
  }]
)

Adjust the workflow based on the answer:

Full assessment: All 7 phases, parallel agents
Code quality only: Skip the security and performance dimensions
Security focus: Prioritize security-auditor agent
Quick score: Single pass, brief output

STEP 0b: Select Orchestration Mode

Choose Agent Teams (mesh: assessors cross-validate scores) or the Task tool (star: all assessors report to the lead):

ORCHESTKIT_PREFER_TEAMS=1 → Agent Teams mode
Agent Teams unavailable → Task tool mode (default)
Otherwise: Full assessment with 6 dimension agents → recommend Agent Teams; Quick score or single-dimension → Task tool

Aspect	Task Tool	Agent Teams
Score calibration	Lead normalizes independently	Assessors discuss disagreements
Cross-dimension findings	Lead correlates after completion	Security assessor alerts performance assessor of overlap
Cost	~200K tokens	~500K tokens
Best for	Quick scores, single dimension	Full multi-dimensional assessment

Fallback: If Agent Teams encounters issues, fall back to the Task tool for the remaining assessment.
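The mode-selection rules above can be sketched in Python. This is only an illustration: `select_mode`, its parameters, and the return values are hypothetical names, and checking availability before the environment variable is an assumption about how the first two rules interact.

```python
import os


def select_mode(assessment, teams_available, env=None):
    """Return "agent_teams" or "task_tool" for a given assessment type.

    `assessment` is a hypothetical label: "full", "quick", or a single
    dimension name. `env` defaults to the process environment.
    """
    env = os.environ if env is None else env

    # Agent Teams unavailable: the Task tool is the default and only option.
    # (Assumption: this check wins even when the opt-in variable is set.)
    if not teams_available:
        return "task_tool"

    # Explicit opt-in through the environment variable named in the rules.
    if env.get("ORCHESTKIT_PREFER_TEAMS") == "1":
        return "agent_teams"

    # Otherwise: recommend Agent Teams only for a full assessment.
    return "agent_teams" if assessment == "full" else "task_tool"
```

For example, `select_mode("quick", True, {})` returns `"task_tool"`, while the same call with `{"ORCHESTKIT_PREFER_TEAMS": "1"}` returns `"agent_teams"`.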

Task Management (CC 2.1.16)

# Create main assessment task
TaskCreate(
  subject="Assess: {target}",
  description="Comprehensive evaluation with quality scores and recommendations",
  activeForm="Assessing {target}"
)

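# Create one subtask per phase of the 7-phase process.
# Each entry is (subject, activeForm): the imperative title and its present-continuous form.
for subject, active_form in [
    ("Understand target", "Understanding target"),
    ("Rate quality", "Rating quality"),
    ("List pros/cons", "Listing pros/cons"),
    ("Compare alternatives", "Comparing alternatives"),
    ("Generate suggestions", "Generating suggestions"),
    ("Estimate effort", "Estimating effort"),
    ("Compile report", "Compiling report"),
]:
    TaskCreate(subject=subject, activeForm=active_form)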

What This Skill Answers

Question	How It’s Answered
“Is this good?”	Quality score 0-10 with reasoning
“What are the trade-offs?”	Structured pros/cons list
“Should we change this?”	Improvement suggestions with effort
“What are the alternatives?”	Comparison with scores
“Where should we focus?”	Prioritized recommendations

Workflow Overview

Phase	Activities	Output
1. Target Understanding	Read code/design, identify scope	Context summary
2. Quality Rating	6-dimension scoring (0-10)	Scores with reasoning
3. Pros/Cons Analysis	Strengths and weaknesses	Balanced evaluation
4. Alternative Comparison	Score alternatives	Comparison matrix
5. Improvement Suggestions	Actionable recommendations	Prioritized list
6. Effort Estimation	Time and complexity estimates	Effort breakdown
7. Assessment Report	Compile findings	Final report

Phase 1: Target Understanding

Identify what’s being assessed (code, design, approach, decision, pattern) and gather context:

# PARALLEL - Gather context
Read(file_path="$ARGUMENTS")  # If file path
Grep(pattern="$ARGUMENTS", output_mode="files_with_matches")
mcp__memory__search_nodes(query="$ARGUMENTS")  # Past decisions

Phase 2: Quality Rating (6 Dimensions)

Rate each dimension 0-10, then combine the ratings into a weighted composite score. See Scoring Rubric for details.

Dimension	Weight	What It Measures
Correctness	0.20	Does it work correctly?
Maintainability	0.20	Easy to understand/modify?
Performance	0.15	Efficient, no bottlenecks?
Security	0.15	Follows best practices?
Scalability	0.15	Handles growth?
Testability	0.15	Easy to test?

Composite Score: Weighted average of all dimensions.
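As a minimal sketch of the composite calculation, assuming the weights from the table above and a plain dict of per-dimension scores. The names `WEIGHTS` and `composite_score` are illustrative and not part of the skill; rounding to one decimal is an assumption based on the `[N.N]` format used in the report template.

```python
# Weights copied from the dimension table; they sum to 1.0.
WEIGHTS = {
    "correctness": 0.20,
    "maintainability": 0.20,
    "performance": 0.15,
    "security": 0.15,
    "scalability": 0.15,
    "testability": 0.15,
}


def composite_score(scores):
    """Weighted average of the six 0-10 dimension scores, rounded to 1 decimal."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} score must be between 0 and 10")

    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    # Dividing by the weight total keeps the result a true weighted average
    # even if the weights are later adjusted and no longer sum to exactly 1.
    return round(weighted_sum / sum(WEIGHTS.values()), 1)


example = {
    "correctness": 8, "maintainability": 7, "performance": 6,
    "security": 9, "scalability": 6, "testability": 7,
}
print(composite_score(example))  # 0.20*(8+7) + 0.15*(6+9+6+7) = 7.2
```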

Launch 6 parallel agents (one per dimension) with run_in_background=True.

Phase 2: Agent Teams Alternative

In Agent Teams mode, form an assessment team where dimension assessors cross-validate scores and discuss disagreements:

TeamCreate(team_name="assess-{target-slug}", description="Assess {target}")

Task(subagent_type="code-quality-reviewer", name="correctness-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess CORRECTNESS (0-10) and MAINTAINABILITY (0-10) for: {target}
     When you find issues that affect security, message security-assessor.
     When you find issues that affect performance, message perf-assessor.
     Share your scores with all teammates for calibration; if scores diverge
     significantly (>2 points), discuss the disagreement.""")

Task(subagent_type="security-auditor", name="security-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess SECURITY (0-10) for: {target}
     When correctness-assessor flags security-relevant patterns, investigate deeper.
     When you find performance-impacting security measures, message perf-assessor.
     Share your score and flag any cross-dimension trade-offs.""")

Task(subagent_type="performance-engineer", name="perf-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess PERFORMANCE (0-10) and SCALABILITY (0-10) for: {target}
     When security-assessor flags performance trade-offs, evaluate the impact.
     When you find testability issues (hard-to-benchmark code), message test-assessor.
     Share your scores with reasoning for the composite calculation.""")

Task(subagent_type="test-generator", name="test-assessor",
     team_name="assess-{target-slug}",
     prompt="""Assess TESTABILITY (0-10) for: {target}
     Evaluate test coverage, test quality, and ease of testing.
     When other assessors flag dimension-specific concerns, verify test coverage
     for those areas. Share your score and any coverage gaps found.""")

Team teardown after report compilation:

SendMessage(type="shutdown_request", recipient="correctness-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="security-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="perf-assessor", content="Assessment complete")
SendMessage(type="shutdown_request", recipient="test-assessor", content="Assessment complete")
TeamDelete()

Fallback: If team formation fails, use the standard Phase 2 Task spawns above (six background agents, one per dimension).

Phase 3: Pros/Cons Analysis

## Pros (Strengths)
| # | Strength | Impact | Evidence |
|---|----------|--------|----------|
| 1 | [strength] | High/Med/Low | [example] |

## Cons (Weaknesses)
| # | Weakness | Severity | Evidence |
|---|----------|----------|----------|
| 1 | [weakness] | High/Med/Low | [example] |

**Net Assessment:** [Strengths outweigh / Balanced / Weaknesses dominate]
**Recommended action:** [Keep as-is / Improve / Reconsider / Rewrite]

Phase 4: Alternative Comparison

See Alternative Analysis for full comparison template.

Criteria	Current	Alternative A	Alternative B
Composite	[N.N]	[N.N]	[N.N]
Migration Effort	N/A	[1-5]	[1-5]

Phase 5: Improvement Suggestions

See Improvement Prioritization for effort/impact guidelines.

Suggestion	Effort (1-5)	Impact (1-5)	Priority (I/E)
[action]	[N]	[N]	[ratio]

Quick Wins = Effort <= 2 AND Impact >= 4. Always highlight these first.
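The priority ratio and the quick-win rule can be sketched as follows. The `Suggestion` class, `prioritize` function, and sample suggestions are hypothetical; sorting the remaining items by descending impact/effort ratio is an assumption about how "prioritized" is meant.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    action: str
    effort: int  # 1 (trivial) to 5 (major)
    impact: int  # 1 (negligible) to 5 (large)

    @property
    def priority(self):
        # Priority (I/E): impact divided by effort.
        return self.impact / self.effort

    @property
    def is_quick_win(self):
        # Quick Wins = Effort <= 2 AND Impact >= 4.
        return self.effort <= 2 and self.impact >= 4


def prioritize(suggestions):
    """Quick wins first, then everything else by descending priority ratio."""
    return sorted(suggestions, key=lambda s: (not s.is_quick_win, -s.priority))


# Made-up suggestions, for illustration only.
items = [
    Suggestion("Rewrite cache layer", effort=5, impact=4),
    Suggestion("Rename helper", effort=1, impact=2),
    Suggestion("Add input validation", effort=2, impact=5),
]
for s in prioritize(items):
    print(f"{s.action}: priority={s.priority:.1f}, quick_win={s.is_quick_win}")
```

Here "Add input validation" is listed first because it is the only quick win, even though "Rename helper" also has a high ratio (2.0).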

Phase 6: Effort Estimation

Timeframe	Tasks	Total
Quick wins (< 1hr)	[list]	X min
Short-term (< 1 day)	[list]	X hrs
Medium-term (1-3 days)	[list]	X days

Phase 7: Assessment Report

See Scoring Rubric for full report template.

# Assessment Report: $ARGUMENTS

**Overall Score: [N.N]/10** (Grade: [A+/A/B/C/D/F])

**Verdict:** [EXCELLENT | GOOD | ADEQUATE | NEEDS WORK | CRITICAL]

## Answer: Is This Good?
**[YES / MOSTLY / SOMEWHAT / NO]**
[Reasoning]

Grade Interpretation

Score	Grade	Verdict
9.0-10.0	A+	EXCELLENT
8.0-8.9	A	GOOD
7.0-7.9	B	GOOD
6.0-6.9	C	ADEQUATE
5.0-5.9	D	NEEDS WORK
0.0-4.9	F	CRITICAL
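One way to encode the grade table is as a list of lower bounds checked from highest to lowest. This is a sketch: `GRADE_BANDS` and `grade` are illustrative names, and rounding the score to one decimal before the lookup is an assumption made so that values such as 8.95 do not fall between the 8.0-8.9 and 9.0-10.0 bands.

```python
# (lower bound, grade, verdict), ordered from highest band to lowest.
GRADE_BANDS = [
    (9.0, "A+", "EXCELLENT"),
    (8.0, "A", "GOOD"),
    (7.0, "B", "GOOD"),
    (6.0, "C", "ADEQUATE"),
    (5.0, "D", "NEEDS WORK"),
    (0.0, "F", "CRITICAL"),
]


def grade(score):
    """Map a 0-10 composite score to its (grade, verdict) pair."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be between 0 and 10")
    score = round(score, 1)  # assumption: scores are reported as N.N
    for lower_bound, letter, verdict in GRADE_BANDS:
        if score >= lower_bound:
            return letter, verdict
    raise AssertionError("unreachable: the last band starts at 0.0")


print(grade(7.2))  # ('B', 'GOOD')
```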

Key Decisions

Decision	Choice	Rationale
6 dimensions	Comprehensive coverage	Covers all quality aspects without overwhelming the reader
0-10 scale	Industry standard	Easy to understand and compare
Parallel assessment	6 agents	Fast, thorough evaluation
Effort/Impact scoring	1-5 scale	Simple prioritization math

Related Skills

assess-complexity – Task complexity assessment
verify – Post-implementation verification
code-review-playbook – Code review patterns
quality-gates – Quality gate patterns

Version: 1.0.0 (January 2026)
