taxonomy-builder
17
总安装量
15
周安装量
#20545
全站排名
安装命令
npx skills add https://github.com/willoscar/research-units-pipeline-skills --skill taxonomy-builder
Agent 安装分布
claude-code
10
gemini-cli
9
codex
8
cursor
7
antigravity
6
Skill 文档
Taxonomy Builder
Turn a core paper set into a 2+ level, mappable taxonomy that will drive the outline and paper-to-section mapping.
This is structure, not writing: avoid prose paragraphs and avoid âgeneric placeholderâ buckets.
Role cards (prompt-level guidance)
-
Taxonomy Architect
- Mission: create a taxonomy that reads like a surveyâs core chapters (few, thick buckets).
- Do: choose 3â4 top-level chapters by reader questions and decision-relevant axes.
- Avoid: keyword-only clusters, âMisc/Otherâ, and too many top-level buckets that would bloat the final ToC.
-
Mapping Sponsor
- Mission: keep the taxonomy mappable to real papers.
- Do: ensure each leaf can plausibly map to multiple papers (ideally â¥3); keep node names discriminative.
- Avoid: overlapping buckets whose boundaries are not explainable.
-
Scope Guardian
- Mission: encode what counts as in-scope at the taxonomy level.
- Do: bake boundary cues into descriptions (what belongs here, what does not).
- Avoid: relying on later prose to resolve scope drift.
When to use
- You have a
papers/core_set.csvand need a stable structure for a survey/snapshot. - You want categories that are meaningful to readers (not just keyword clusters).
When not to use
- You already have an approved taxonomy that maps well to your target narrative (donât churn it).
Inputs
papers/core_set.csv(required)- Optional:
papers/papers_dedup.jsonl(to peek at abstracts/metadata) - Optional:
DECISIONS.md(scope constraints)
Output
outline/taxonomy.yml
Workflow (heuristic)
Uses: papers/papers_dedup.jsonl, DECISIONS.md.
- Skim the core set and cluster by reader-relevant axes, not by surface keywords.
- For LLM agents, common axes: control loop/architecture, tool use, planning & reasoning, memory/RAG, multi-agent coordination, evaluation/benchmarks, safety/security, applications.
- Choose top-level nodes that feel like âchapters in a surveyâ, and keep a paper-like section budget:
- If you want a paper-like PDF with ~6â8 H2 sections total (see
ref/agent-surveys/STYLE_REPORT.md), remember the pipeline also adds fixed H2 sections (Introduction / Related Work / Discussion / Conclusion). - In that case, aim for ~3â4 taxonomy-driven chapters (top-level nodes), not 8â12 tiny buckets.
- Deep surveys can go wider (e.g., 5â6 taxonomy chapters), but expect thinner writing unless you also expand evidence and writing budgets.
- If you want a paper-like PDF with ~6â8 H2 sections total (see
- For each top-level node, create 2â6 subtopics with clear inclusion cues (what belongs here, what doesnât).
- Write a short description for every node:
- define what the bucket covers
- name 2â5 representative paper IDs (or recognizable lines of work) that belong here
- Sanity check:
- leaves arenât too tiny (ideally â¥3 papers per leaf)
- names are mutually exclusive enough (some overlap is OK, confusion is not)
Quality checklist
-
outline/taxonomy.ymlhas â¥2 levels. - Every node has a
descriptionwith concrete meaning (not âPapers and ideas centered on â¦â boilerplate). - Leaf nodes look mappable (not overly broad like âMisc/Otherâ).
- Top-level nodes feel like chapters (avoid too many tiny buckets if you target a paper-like 6â8 H2 structure).
Common failure modes (and fixes)
- Generic buckets (âOverview/Benchmarks/Open Problemsâ) â rename to content-based subtopics.
- Keyword clustering â reframe as design/evaluation questions a reader would ask.
- Too much overlap â tighten inclusion cues; split a bucket by mechanism vs evaluation vs safety.
- Too many top-level buckets â merge into fewer, thicker chapters; push fine-grained points into subsection bullets/axes instead of new H2 sections.
Helper script (optional)
Quick Start
python .codex/skills/taxonomy-builder/scripts/run.py --helppython .codex/skills/taxonomy-builder/scripts/run.py --workspace <workspace_dir>
All Options
--top-k <n>: number of candidate terms to consider--min-freq <n>: minimum frequency threshold
Examples
- Generate a baseline taxonomy (then optionally refine):
python .codex/skills/taxonomy-builder/scripts/run.py --workspace <ws> --top-k 100 --min-freq 2
Notes
- The script generates a baseline 2-level taxonomy (topic-aware) and never overwrites non-placeholder work.
- In
pipeline.py --strictit will be blocked only if placeholder markers (TODO/TBD/FIXME/(placeholder)) remain.
Refinement marker (recommended; completion signal)
When you are satisfied with the taxonomy (and after C2 approval if applicable), create:
outline/taxonomy.refined.ok
This is an explicit “I reviewed/refined this” signal:
- makes it harder for a scaffold-y taxonomy to silently pass in strict runs
- documents that buckets were edited into reader-meaningful, mappable nodes
Troubleshooting
Common Issues
Issue: Quality gate blocks taxonomy_scaffold
Symptom:
output/QUALITY_GATE.mdreports taxonomy containsTODO/placeholder text.
Causes:
- Helper script generated a scaffold, but taxonomy was not rewritten.
Solutions:
- Rewrite every node name + description to be domain-meaningful.
- Ensure â¥2 levels via
children. - Remove generic buckets like âOverview/Benchmarks/Open Problemsâ.
Issue: Taxonomy has no depth (children missing)
Symptom:
- Quality gate reports âneeds â¥2 levelsâ.
Causes:
- Only top-level nodes were created.
Solutions:
- Add 2â6 child nodes per top-level node, each with clear inclusion cues.
Recovery Checklist
-
outline/taxonomy.ymlis valid YAML list. - At least one node has a non-empty
childrenlist. - No
TODO/(placeholder)remains.