create-skill
npx skills add https://github.com/cpave3/skills --skill create-skill
Agent 安装分布
Skill 文档
Create Skill
Create new skills and iteratively improve them through testing and evaluation.
About Skills
Skills are modular, self-contained packages that extend Claude’s capabilities with specialized knowledge, workflows, and tools. They transform Claude from a general-purpose agent into a specialized one equipped with procedural knowledge no model can fully possess.
Skills provide: specialized workflows, tool integrations, domain expertise, and bundled resources (scripts, references, assets).
Core Principles
Concise is Key
The context window is a public good shared with system prompt, conversation history, other skills, and user requests. Claude is already very smart â only add context it doesn’t already have. Challenge each piece: “Does Claude really need this?” and “Does this justify its token cost?” Prefer concise examples over verbose explanations.
Set Appropriate Degrees of Freedom
Match specificity to the task’s fragility and variability:
- High freedom (text instructions): Multiple valid approaches, context-dependent decisions
- Medium freedom (pseudocode/parameterized scripts): Preferred pattern exists, some variation acceptable
- Low freedom (specific scripts, few parameters): Fragile operations, consistency critical, exact sequence required
Think of Claude exploring a path: a narrow bridge needs guardrails (low freedom), an open field allows many routes (high freedom).
Explain the Why
Explain why things are important rather than using heavy-handed MUSTs. LLMs have good theory of mind â when given reasoning, they go beyond rote instructions. If you find yourself writing ALWAYS or NEVER in all caps, reframe and explain the reasoning instead.
Anatomy of a Skill
skill-name/
âââ SKILL.md (required)
â âââ YAML frontmatter (name, description â required)
â âââ Markdown instructions
âââ Bundled Resources (optional)
âââ scripts/ - Executable code for deterministic/repetitive tasks
âââ references/ - Docs loaded into context as needed
âââ assets/ - Files used in output (templates, icons, fonts)
SKILL.md (required)
- Frontmatter (YAML):
nameanddescriptionfields only. The description is the primary triggering mechanism â Claude reads it to decide when to use the skill. All “when to use” info goes here, not in the body. - Body (Markdown): Instructions and guidance. Only loaded after the skill triggers.
Scripts (scripts/)
Executable code for tasks requiring deterministic reliability or that get rewritten repeatedly.
- When to include: Same code rewritten repeatedly, deterministic reliability needed, errors need explicit handling
- Benefits: Token efficient, deterministic, may execute without loading into context
- Test added scripts by actually running them to verify correctness
References (references/)
Documentation loaded into context as needed.
- When to include: Database schemas, API docs, domain knowledge, company policies, detailed workflow guides
- Avoid duplication: Information lives in either SKILL.md or references, not both. Keep SKILL.md lean; move detailed reference material to separate files.
- Large files (>300 lines): Include a table of contents. If >10k words, include grep patterns in SKILL.md.
Assets (assets/)
Files used in output, not loaded into context.
- When to include: Templates, images, icons, boilerplate code, fonts, sample documents
- Benefits: Separates output resources from documentation
What NOT to Include
Do NOT create extraneous files: README.md, INSTALLATION_GUIDE.md, QUICK_REFERENCE.md, CHANGELOG.md, etc. The skill should only contain what an AI agent needs to do the job.
Progressive Disclosure
Skills use a three-level loading system:
- Metadata (name + description) â Always in context (~100 words)
- SKILL.md body â When skill triggers (<500 lines ideal)
- Bundled resources â As needed (unlimited; scripts can execute without loading)
Keep SKILL.md under 500 lines. When approaching this limit, split content into separate files with clear references describing when to read them.
Pattern 1: High-level guide with references
## Quick start
[Minimal working example]
## Advanced features
- **Form filling**: See [FORMS.md](FORMS.md)
- **API reference**: See [REFERENCE.md](REFERENCE.md)
Pattern 2: Domain-specific organization
bigquery-skill/
âââ SKILL.md (overview + navigation)
âââ references/
âââ finance.md
âââ sales.md
âââ product.md
Claude reads only the relevant reference for the user’s query.
Pattern 3: Conditional details â Show basics, link to advanced content only when needed.
Guidelines:
- Keep references one level deep from SKILL.md
- For files >100 lines, include a table of contents
Skill Creation Process
Figure out where the user is in this process and help them progress. Maybe they want to create from scratch, or maybe they already have a draft and want to iterate.
Step 1: Capture Intent
Start by understanding what the skill should do. The conversation may already contain a workflow to capture (e.g., “turn this into a skill”) â extract what you can from history first.
Key questions (don’t overwhelm â start with the most important):
- What should this skill enable Claude to do?
- When should it trigger? (specific phrases, contexts, file types)
- What’s the expected output format?
- What specific use cases should it handle?
- Does it need executable scripts or just instructions?
- Any reference materials to include?
Check available MCPs for research. Come prepared with context to reduce burden on the user.
Step 2: Plan Reusable Resources
Analyze each concrete example by considering how to execute it from scratch, then identifying what scripts, references, and assets would help when executing repeatedly.
Examples:
- PDF rotation â
scripts/rotate_pdf.py(same code rewritten each time) - Frontend webapp â
assets/hello-world/template (same boilerplate each time) - BigQuery queries â
references/schema.md(re-discovering schemas each time)
Create a list of reusable resources: scripts, references, and assets.
Step 3: Create and Implement
Create the directory structure. Only create subdirectories the skill actually needs â most skills need just SKILL.md and perhaps one resource directory.
Implement the planned resources. This may require user input (e.g., brand assets, documentation). Test added scripts by running them.
Step 4: Write SKILL.md
The skill is for another Claude instance. Include information beneficial and non-obvious to Claude â procedural knowledge, domain details, reusable assets. Use imperative/infinitive form.
Writing the Description
The description is the only thing the agent sees when deciding which skill to load. It must provide enough info to know:
- What capability this skill provides
- When/why to trigger it (keywords, contexts, file types)
Format:
- Max 1024 characters
- Third person voice
- First sentence: what it does
- Second sentence: “Use when [specific triggers]”
- Make descriptions slightly “pushy” â Claude tends to under-trigger. Instead of just describing functionality, explicitly list contexts that should trigger the skill, even non-obvious ones.
Good: “Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when user mentions PDFs, forms, or document extraction.”
Bad: “Helps with documents.”
Writing the Body
Output format patterns:
## Report structure
ALWAYS use this exact template:
# [Title]
## Executive summary
## Key findings
Example patterns:
## Commit message format
**Example 1:**
Input: Added user authentication with JWT tokens
Output: feat(auth): implement JWT-based authentication
Reference any bundled resources and describe clearly when to read them.
Step 5: Review
After drafting, verify:
- Description includes specific triggers (“Use when…”)
- Description is slightly pushy (lists non-obvious trigger contexts)
- SKILL.md under 500 lines
- No time-sensitive info (dates, versions that will go stale)
- Consistent terminology throughout
- Concrete examples included
- References one level deep, clearly linked from SKILL.md
- No extraneous documentation files
- Scripts tested and working
Present the draft to the user:
- Does this cover your use cases?
- Anything missing or unclear?
- Should any section be more/less detailed?
Step 6: Test and Evaluate
After the skill draft is ready, create 2-3 realistic test prompts and run them. See references/eval-workflow.md for the complete testing and evaluation workflow, including:
- Spawning with-skill and baseline runs
- Drafting quantitative assertions
- Grading, benchmarking, and launching the eval viewer
- Reading user feedback
Save test cases to evals/evals.json. See references/schemas.md for the JSON schema.
If the user prefers to skip formal evals (“just vibe with me”), that’s fine â adapt to their preference.
Step 7: Iterate and Improve
This is the heart of the loop. Apply feedback from user evaluation:
-
Generalize from feedback â The skill will be used across many prompts, not just test examples. Avoid fiddly, overfitting changes. If something is stubbornly wrong, try different metaphors or patterns rather than adding rigid constraints.
-
Keep the skill lean â Remove what isn’t pulling its weight. Read transcripts, not just outputs â if the skill makes the model waste time unproductively, trim those parts.
-
Look for repeated work â If all test runs independently wrote similar helper scripts, bundle that script in
scripts/. Save every future invocation from reinventing the wheel. -
Apply, rerun, review, repeat â After improving, rerun all test cases into a new iteration directory. Keep going until the user is happy, feedback is empty, or no meaningful progress.
See references/eval-workflow.md for detailed iteration mechanics.
Description Optimization
After creating or improving a skill, offer to optimize the description for better triggering accuracy. This uses an automated loop that tests different descriptions against eval queries.
See the “Description Optimization” section in references/eval-workflow.md for the full workflow including trigger eval generation, review, and the optimization loop.
Reference Files
Agents (read when spawning specialized subagents)
- agents/grader.md â How to evaluate assertions against outputs
- agents/comparator.md â How to do blind A/B comparison between outputs
- agents/analyzer.md â How to analyze benchmark results and why one version beat another
References
- references/schemas.md â JSON schemas for evals.json, grading.json, timing.json, benchmark.json, comparison.json, analysis.json
- references/eval-workflow.md â Detailed eval/testing/benchmarking workflow, iteration mechanics, description optimization, and environment-specific instructions
Scripts
scripts/run_loop.pyâ Automated description optimization loopscripts/run_eval.pyâ Execute evaluation queriesscripts/aggregate_benchmark.pyâ Calculate benchmark statisticsscripts/improve_description.pyâ Optimize skill descriptionsscripts/generate_report.pyâ Create HTML reportsscripts/package_skill.pyâ Package skills for distributionscripts/quick_validate.pyâ Quick validation utility
Assets
assets/eval_review.htmlâ HTML template for eval query revieweval-viewer/generate_review.pyâ Generate the eval results viewereval-viewer/viewer.htmlâ HTML viewer for eval results