better-skill-review
npx skills add https://github.com/psylch/better-skills --skill better-skill-review
Skill Review
Language
Match user’s language: Respond in the same language the user uses.
Overview
Review an agent skill through three layers: automated linting (hard rules), contextual finding evaluation (agent judges with context), and structured semantic review (deep analysis against best practices). The linter catches mechanical issues; the agent catches design issues.
Dialogue Flow
Progress:
- Step 1: Identify the skill
- Step 2: Automated linting (hard rules)
- Step 3: Profile extraction
- Step 4: Contextual findings review (agent judges)
- Step 5: Semantic review (deep analysis)
- Step 6: Present findings
- Step 7: Interactive improvement
Step 1: Identify the Skill
Accept the skill location as a directory path containing SKILL.md. Auto-detect if the current working directory contains one.
Step 2: Automated Linting
Run the validator for hard-rule checks:
python3 {SKILL_DIR}/scripts/validate.py run --path <skill-path>
The output contains two arrays:
- checks: Hard-rule verdicts (pass/warn/fail) – mechanical, unambiguous. These determine the linter grade.
- findings: Soft detections (data + context_hint) – these need your judgment. No verdict yet.
Record the linter grade. Failures must be fixed; warnings are convention issues.
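Consuming the validator's JSON programmatically might look like the following sketch. The field names (checks, findings, verdict, context_hint) follow the description above, but the exact schema is an assumption:

```python
import json

# Hypothetical validator output, shaped as Step 2 describes:
# "checks" carry verdicts, "findings" carry data plus a context_hint.
raw = """
{
  "checks": [
    {"id": "frontmatter_present", "verdict": "pass"},
    {"id": "description_length", "verdict": "warn"},
    {"id": "scripts_executable", "verdict": "fail", "fix": "chmod +x scripts/*.sh"}
  ],
  "findings": [
    {"id": "todo_markers", "context_hint": "2 hits in templates/report.tmpl"}
  ]
}
"""

report = json.loads(raw)

# Tally hard-rule verdicts; these alone drive the linter grade.
counts = {"pass": 0, "warn": 0, "fail": 0}
for check in report["checks"]:
    counts[check["verdict"]] += 1

# Findings carry no verdict yet: they wait for agent judgment in Step 4.
pending = [f["id"] for f in report["findings"]]

print(counts)   # {'pass': 1, 'warn': 1, 'fail': 1}
print(pending)  # ['todo_markers']
```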
Step 3: Profile Extraction
Run the analyzer for structured facts:
bash {SKILL_DIR}/scripts/analyze.sh analyze <skill-path>
Note the skill level (l0/l0plus/l1) and feature flags – you’ll need these for Steps 4 and 5.
Step 4: Contextual Findings Review
For each finding from Step 2, read the context_hint and examine the actual locations. Judge whether each is a real issue:
| Finding | Is a problem | Not a problem |
|---|---|---|
| todo_markers | In SKILL.md body, script logic, or description | In .tmpl template files (intentional scaffolding) |
| hardcoded_paths | In scripts or SKILL.md prose | In references/ docs as illustrative examples |
| pii_patterns | Real personal emails in scripts/configs | Example emails in docs (user@example.com pattern) |
| script_conventions | L0+/L1 skill missing expected patterns | L0 pure-prompt skill with no scripts – skip entirely |
Promote findings you judge as real issues to warnings. Dismiss the rest with a brief note.
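The dismiss-or-promote rules in the table can be sketched as a small decision function. The location strings and rule details here are illustrative, not the validator's actual schema:

```python
def judge_finding(finding_id: str, location: str, detail: str = "") -> str:
    """Return 'promote' or 'dismiss' per the Step 4 table (illustrative rules)."""
    if finding_id == "todo_markers":
        # TODOs inside .tmpl template files are intentional scaffolding.
        return "dismiss" if location.endswith(".tmpl") else "promote"
    if finding_id == "hardcoded_paths":
        # Paths quoted in references/ docs are illustrative examples.
        return "dismiss" if location.startswith("references/") else "promote"
    if finding_id == "pii_patterns":
        # Placeholder addresses in docs are fine; real ones are not.
        return "dismiss" if "example.com" in detail else "promote"
    return "promote"  # unknown finding types deserve a closer look

print(judge_finding("todo_markers", "templates/report.tmpl"))  # dismiss
print(judge_finding("hardcoded_paths", "scripts/deploy.sh"))   # promote
```

In practice the agent still reads the actual locations; the point is that each rule keys on where the detection occurred, not merely that it occurred.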
Step 5: Semantic Review
This is where the real value is delivered. Read the skill’s SKILL.md fully, then evaluate each dimension below. For each, give a score (0-3) and specific feedback.
Scoring: 3 = excellent, 2 = adequate, 1 = needs improvement, 0 = missing/broken
5.1 Description Quality (/3)
Read the frontmatter description field.
- Length ≥ 100 chars? (50-100 acceptable, <50 needs expansion)
- Contains trigger phrases? (“when the user says…”, “Use when…”)
- Third-person voice? (not “you/your” – the description is consumed by AI)
- Specific about what the skill does, not vague (“helps with APIs”)
→ Reference: references/improvement_patterns.md § Description Quality
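The first three checks are mechanical enough to sketch in code; the fourth (specificity) still needs human judgment, so this sample scorer only covers the mechanical ones, and the trigger-phrase list is an assumed sample:

```python
import re

TRIGGERS = re.compile(r"use when|when the user", re.IGNORECASE)
SECOND_PERSON = re.compile(r"\byou\b|\byour\b", re.IGNORECASE)

def score_description(desc: str) -> int:
    """Score 0-3 against the mechanical parts of the 5.1 checklist (sketch)."""
    points = 0
    if len(desc) >= 100:                # length check
        points += 1
    if TRIGGERS.search(desc):           # trigger phrases present
        points += 1
    if not SECOND_PERSON.search(desc):  # third-person voice
        points += 1
    return points

good = ("Reviews an agent skill in three layers. Use when the user asks to "
        "audit, lint, or improve a SKILL.md before publishing it anywhere.")
print(score_description(good))               # 3
print(score_description("helps with APIs"))  # 1 (too short, no trigger phrase)
```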
5.2 Workflow Design (/3)
Read the workflow/process sections.
- Clear numbered steps with defined inputs/outputs?
- Decision points explicit? (if X then Y, else Z)
- Interactive steps specify AskUserQuestion where needed?
- SKILL.md total lines < 200? (if over, detailed content should be in references/)
→ Reference: references/improvement_patterns.md § Workflow Clarity
5.3 Runtime Robustness (/3)
Only for L0+/L1 skills with scripts. Score 3 automatically for L0 pure-prompt skills.
- Preflight covers all dependencies, credentials, services?
- Preflight failures have a Check → Fix table with specific remediation per item?
- Setup separated from business logic? (first-time init vs. daily use)
- Degradation strategy defined for optional dependencies?
- Troubleshooting table present? (Symptom | Resolution)
→ Reference: references/best_practices.md § Preflight Standard, § Degradation Patterns
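A preflight in this spirit might look like the sketch below. The dependency names, the API_TOKEN variable, and the JSON shape are all hypothetical; the point is the Check → Fix pairing and the recoverable exit code:

```python
import json
import os
import shutil
import sys

# Hypothetical Check → Fix table: each entry pairs a named probe with the
# specific remediation to surface when that probe fails.
CHECKS = [
    ("git installed", lambda: shutil.which("git") is not None,
     "Install git from https://git-scm.com/downloads"),
    ("API_TOKEN set", lambda: bool(os.environ.get("API_TOKEN")),
     "export API_TOKEN=<token> before running the skill"),
]

def preflight() -> int:
    """Return 0 on success, 1 (recoverable) when any check fails."""
    failures = [{"check": name, "fix": fix}
                for name, probe, fix in CHECKS if not probe()]
    if failures:
        # stderr JSON with error/hint/recoverable, matching the conventions
        # this document describes for script output.
        print(json.dumps({"error": "preflight failed", "hint": failures,
                          "recoverable": True}), file=sys.stderr)
        return 1
    print(json.dumps({"ok": True, "hint": "preflight passed; safe to proceed"}))
    return 0
```

Keeping the probes and remediations in one table is what makes the "specific remediation per item" requirement cheap to satisfy.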
5.4 Script Quality (/3)
Only for skills with scripts. Score 3 automatically for L0 skills without scripts.
- stdout JSON with hint field for all output?
- stderr JSON error handling with error, hint, recoverable fields?
- Exit codes: 0=success, 1=recoverable, 2=fatal?
- Token awareness: --limit or bounded output to avoid context explosion?
→ Reference: references/best_practices.md § Script Output Convention, § Token Awareness
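The stdout convention and --limit bounding might be implemented as follows. The payload fields beyond hint (items, truncated) are assumptions for illustration:

```python
import argparse
import json
import sys

def bounded_result(items, limit):
    """Build the stdout JSON payload: bounded items plus a hint field."""
    shown = items[:limit]
    if len(items) > limit:
        hint = f"showing {len(shown)} of {len(items)} items; raise --limit for more"
    else:
        hint = "all items shown"
    return {"items": shown, "truncated": len(items) > limit, "hint": hint}

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--limit", type=int, default=50,
                        help="bound output to avoid context explosion")
    args = parser.parse_args(argv)
    # stdout carries exactly one JSON object; exit code 0 signals success.
    json.dump(bounded_result([f"item-{i}" for i in range(120)], args.limit),
              sys.stdout)
    return 0
```

The hint field tells the consuming agent what to do next with the output, which is what makes bounded results safe to truncate.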
5.5 UX Practices (/3)
Check the Applicability Matrix first – only evaluate practices that apply to this skill. A missing practice with no applicable condition is the expected state, not a problem.
| Practice | Applies when | Skip when |
|---|---|---|
| Language Matching | Published publicly or multilingual audience | Personal/single-language skill |
| Progress Checklist | 4+ sequential workflow steps | Simple 1-3 step or non-linear |
| Completion Report | Produces artifacts (files, API mutations) | Purely informational (research, audit) |
| Input Adaptation | Accepts file/content input from user | Dialogue-driven, no file input |
| Cross-skill Dependencies | References another skill by name | Self-contained |
| User Preferences | Recurring per-user config across sessions | Fresh parameters each invocation |
For each applicable practice that’s missing, suggest adding it with a concrete example.
→ Reference: references/best_practices.md § UX Practices, § Applicability Matrix
Step 6: Present Findings
Format the report:
[Skill Review] <skill-name>
─── Linter ───
Grade: <letter> (<pass>/<total> passed, <warn> warnings, <fail> failures)
Failures:
  ✗ <check_id>: <message> → Fix: <fix>
Warnings:
  ⚠ <check_id>: <message>
Findings (agent-reviewed):
  • <finding_id>: dismissed → <reason>
  • <finding_id>: promoted to warning → <reason>
─── Semantic Review ───
5.1 Description Quality: <score>/3 <one-line assessment>
5.2 Workflow Design: <score>/3 <one-line assessment>
5.3 Runtime Robustness: <score>/3 <one-line assessment>
5.4 Script Quality: <score>/3 <one-line assessment>
5.5 UX Practices: <score>/3 <one-line assessment>
─────────
Semantic Score: <total>/15
─── Improvement Suggestions ───
For each dimension scoring < 3, provide:
1. What to change and why
2. Which file to edit
3. A concrete before/after example or specific instruction
4. Priority: High (functionality/UX) / Medium (convention) / Low (polish)
If the linter grade is A and the semantic score ≥ 12: congratulate and suggest publishing with better-skill-publish.
Step 7: Interactive Improvement
Ask the user which issues and suggestions to address:
- Fix all → Apply all suggested changes
- Pick and choose → Let the user select specific items
- None → Just use the analysis as a reference
For each selected item, make the edit directly, then confirm. After all changes, optionally re-run the linter to show updated results.
Linter Check Categories
| Category | Checks (hard rules) |
|---|---|
| structure | SKILL.md exists, frontmatter present, required fields, directory layout |
| naming | Kebab-case, length, no consecutive hyphens, matches directory |
| content | Description length, body length, heading structure |
| paths | Referenced files exist, scripts have execute permission |
| security | No secrets (API key patterns), no template placeholders |
Linter Grading
Grade is based on hard-rule checks only (not findings, not semantic review):
- A – All checks pass, zero warnings
- B – All checks pass, some warnings
- C – 1–2 failures
- D – 3+ failures
- F – SKILL.md missing or no valid frontmatter
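The grading ladder above can be expressed as a small function (a sketch that mirrors the rules, not the validator's actual implementation):

```python
def linter_grade(failures: int, warnings: int, skill_md_valid: bool) -> str:
    """Letter grade from hard-rule counts, following the ladder above."""
    if not skill_md_valid:   # SKILL.md missing or no valid frontmatter
        return "F"
    if failures >= 3:
        return "D"
    if failures >= 1:        # 1-2 failures
        return "C"
    return "B" if warnings else "A"

print(linter_grade(0, 0, True))   # A
print(linter_grade(0, 2, True))   # B
print(linter_grade(2, 0, True))   # C
print(linter_grade(5, 0, True))   # D
print(linter_grade(0, 0, False))  # F
```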
References
For the rationale behind each validation check, read references/validation_rules.md.
For the full knowledge base of improvement patterns with examples, read references/improvement_patterns.md.
For skill design conventions and quick reference, read references/best_practices.md.