devtu-optimize-skills

📁 mims-harvard/tooluniverse 📅 9 days ago
Total installs: 95
Weekly installs: 11
Site-wide rank: #4513

Install command:

npx skills add https://github.com/mims-harvard/tooluniverse --skill devtu-optimize-skills

Agent install distribution:

codex 11
claude-code 10
opencode 9
amp 6
gemini-cli 6
kimi-cli 5

Skill Documentation

Optimizing ToolUniverse Skills

Best practices for creating high-quality ToolUniverse research skills that produce detailed, evidence-graded reports with proper source attribution.

When to Use This Skill

Apply when:

  • Creating new ToolUniverse research skills
  • Reviewing/improving existing skills
  • User complains about missing details, noisy results, or unclear reports
  • Skill produces process-heavy instead of content-heavy output
  • Tools are failing silently or returning empty results

Core Optimization Principles

1. Tool Interface Verification (Pre-flight Check)

Problem: Tool APIs change parameter names over time, or skills are written with incorrect parameter assumptions. This causes silent failures – tools return empty results without errors.

Solution: Verify tool parameters before calling unfamiliar tools:

# Always check tool params to prevent silent failures
tool_info = tu.tools.get_tool_info(tool_name="Reactome_map_uniprot_to_pathways")
# Reveals: takes `id` not `uniprot_id`

Maintain a known corrections table in skills that use many tools:

| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| Reactome_map_uniprot_to_pathways | uniprot_id | id |
| ensembl_get_xrefs | gene_id | id |
| GTEx_get_median_gene_expression | gencode_id only | gencode_id + operation="median" |
| OpenTargets_* | ensemblID | ensemblId (camelCase) |

Rule: Before calling any tool for the first time in a skill, confirm params via get_tool_info() once per tool family, or maintain a vetted param map in the skill.

Why this matters: Retry logic won’t help if you’re calling a tool with wrong parameter names – it will consistently return empty. This is different from API flakiness.
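
One way to apply this rule is a small, vetted parameter map consulted before every call. A minimal sketch, assuming `tu` exposes tools as attributes of `tu.tools` (as in the snippets in this document); the map entries mirror the corrections table above, and `call_tool` is a hypothetical helper, not a ToolUniverse API:

# Hypothetical helper: consult a vetted param map before calling any tool.
# Entries mirror the corrections table above; extend it as new tools are verified.
PARAM_CORRECTIONS = {
    "Reactome_map_uniprot_to_pathways": {"uniprot_id": "id"},
    "ensembl_get_xrefs": {"gene_id": "id"},
}

def call_tool(tu, tool_name, **params):
    # Rename known-wrong parameter names so the tool doesn't silently return empty
    fixes = PARAM_CORRECTIONS.get(tool_name, {})
    corrected = {fixes.get(name, name): value for name, value in params.items()}
    return getattr(tu.tools, tool_name)(**corrected)

A skill that uses many tools can route every call through such a map instead of restating each correction inline.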

2. Foundation Data Layer (Path 0)

Problem: Skills query specialized tools for each section independently, missing data that a comprehensive aggregator already has. Results are inconsistent when specialized tools fail.

Solution: Identify if your domain has a comprehensive aggregator and query it FIRST before specialized tools.

Examples by domain:

| Domain | Foundation Source | What It Provides |
|--------|-------------------|------------------|
| Drug targets | Open Targets | Diseases, tractability, safety, drugs, GO, publications, mouse models |
| Chemicals | PubChem | Properties, bioactivity, patents, literature |
| Diseases | Open Targets / OMIM | Genes, drugs, phenotypes, literature |
| Genes | MyGene / Ensembl | Annotations, cross-refs, GO, pathways |

Pattern:

## Workflow
Phase 0: Foundation Data (aggregator query)
Phase 1: Disambiguation (ID resolution, collision detection)
Phase 2: Specialized Queries (fill gaps from Phase 0)
Phase 3: Report Synthesis

Why this works: The aggregator provides reliable baseline data across multiple sections. Specialized tools then add depth or fill gaps, rather than being the sole source.
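
A minimal sketch of Phase 0 followed by gap-filling, assuming a drug-target skill; the OpenTargets and OMIM tool names and result fields below are illustrative placeholders, not verified ToolUniverse identifiers:

# Illustrative only: query the aggregator first, let specialized tools fill gaps.
def build_foundation(tu, ensembl_id):
    profile = {}
    # Phase 0: one comprehensive aggregator query (tool name is a placeholder)
    foundation = tu.tools.OpenTargets_get_target_profile(ensemblId=ensembl_id) or {}
    profile["diseases"] = foundation.get("associatedDiseases", [])
    profile["tractability"] = foundation.get("tractability", [])
    # Phase 2: specialized tools add depth or fill sections the aggregator left empty
    if not profile["diseases"]:
        profile["diseases"] = tu.tools.OMIM_search_by_gene(gene=ensembl_id) or []
    return profile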

3. Versioned Identifier Handling

Problem: Some APIs require versioned identifiers (e.g., GTEx needs ENSG00000123456.12), while others reject them. Skills fail silently when using the wrong format.

Solution: During ID resolution, capture BOTH versioned and unversioned forms:

ids = {
    'ensembl': 'ENSG00000123456',           # Unversioned (most APIs)
    'ensembl_versioned': 'ENSG00000123456.12'  # Versioned (GTEx, some others)
}

# Get version from Ensembl lookup
gene_info = tu.tools.ensembl_lookup_gene(id=ensembl_id, species="human")
if gene_info and gene_info.get('version'):
    ids['ensembl_versioned'] = f"{ensembl_id}.{gene_info['version']}"

Fallback strategy:

  1. Try unversioned first (more portable)
  2. If empty, try versioned
  3. Document which format worked

Common versioned ID APIs: GTEx, GENCODE, some Ensembl endpoints
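
The fallback order above, sketched for GTEx; the tool and parameter names follow the corrections table in principle 1 but remain assumptions to confirm with get_tool_info():

def expression_with_id_fallback(tu, ids):
    # Try the unversioned ID first (more portable), then the versioned form,
    # and record which format actually worked so the report can document it.
    for key in ("ensembl", "ensembl_versioned"):
        result = tu.tools.GTEx_get_median_gene_expression(
            gencode_id=ids[key], operation="median"
        )
        if result:
            return result, key
    return None, None  # both formats failed: record in the Data Gaps section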

4. Disambiguation Before Research

Problem: Skills that jump straight to literature search often miss target details or retrieve irrelevant papers due to naming collisions.

Solution: Add a disambiguation phase before any literature search:

## Phase 1: Target Disambiguation (Default ON)

### 1.1 Resolve Official Identifiers
- UniProt accession (canonical protein)
- Ensembl gene ID + version (for expression data)
- NCBI Gene ID (for literature)
- ChEMBL target ID (for drug data)

### 1.2 Gather Synonyms and Aliases
- All known gene symbols
- Protein name variants
- Historical names

### 1.3 Detect Naming Collisions
- Search "[SYMBOL]"[Title] - review top 20 results
- If >20% off-topic → identify collision terms
- Build negative filter: NOT [collision1] NOT [collision2]

### 1.4 Get Baseline Profile (from annotation DBs, not literature)
- Protein domains (InterPro)
- Subcellular location (HPA)
- Tissue expression (GTEx)
- GO terms and pathways

Why this works: Annotation databases provide reliable baseline data even when literature is sparse or noisy.
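
A sketch of the collision check in 1.3, assuming a title-search tool (named `PubMed_search_titles` here purely for illustration) that returns dicts with a `title` field:

def detect_collisions(tu, symbol, on_topic_terms, sample_size=20):
    # Review the top title hits; if >20% look off-topic, the symbol likely collides
    # with an unrelated usage and the skill should build a negative filter.
    hits = tu.tools.PubMed_search_titles(query=f'"{symbol}"[Title]', limit=sample_size) or []
    off_topic = [
        hit for hit in hits
        if not any(term.lower() in hit.get("title", "").lower() for term in on_topic_terms)
    ]
    return off_topic if len(off_topic) > 0.2 * max(len(hits), 1) else []

The off-topic titles are then reviewed to choose the collision terms used in the NOT filters.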

5. Report-Only Output (Hide Search Process)

Problem: Users don’t want to see “searched 8 databases, found 1,247 papers, deduplicated to 892…”

Solution: Output structure:

| File | Content | When |
|------|---------|------|
| [topic]_report.md | Narrative findings only | Always (default) |
| [topic]_bibliography.json | Full deduplicated papers | Always |
| methods_appendix.md | Search methodology | Only if requested |

In the report:

  • ✅ DO: “The literature reveals three main therapeutic approaches…”
  • ❌ DON’T: “I searched PubMed, OpenAlex, and EuropePMC, finding 342 papers…”

6. Evidence Grading

Problem: A review article mention is treated the same as a mechanistic study with direct evidence.

Solution: Apply evidence tiers to every claim:

| Tier | Symbol | Criteria |
|------|--------|----------|
| T1 | ★★★ | Mechanistic study with direct evidence |
| T2 | ★★☆ | Functional study (knockdown, overexpression) |
| T3 | ★☆☆ | Association (screen hit, GWAS, correlation) |
| T4 | ☆☆☆ | Mention (review, text-mined, peripheral) |

In report:

ATP6V1A drives lysosomal acidification [★★★: PMID:12345678] and has been 
implicated in cancer progression [★☆☆: PMID:23456789, TCGA expression data].

Required locations for evidence grades:

  1. Executive Summary – key disease claims
  2. Disease Associations – every disease link
  3. Key Papers table – evidence tier column
  4. Recommendations – reference evidence quality

Per-section summary:

### Theme: Lysosomal Function (47 papers)
**Evidence Quality**: Strong (32 mechanistic, 11 functional, 4 association)
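
The per-section summary can be generated mechanically from graded bibliography entries; the paper dicts here reuse the `evidence_tier` field from the JSON structure in principle 12, and the Strong/Moderate/Limited thresholds are illustrative assumptions:

from collections import Counter

TIER_NAMES = {"T1": "mechanistic", "T2": "functional", "T3": "association", "T4": "mention"}

def theme_quality_line(theme, papers):
    # Count papers per tier and render the "(32 mechanistic, 11 functional, ...)" style line
    counts = Counter(paper["evidence_tier"] for paper in papers)
    parts = [f"{counts[tier]} {name}" for tier, name in TIER_NAMES.items() if counts[tier]]
    strength = ("Strong" if counts["T1"] >= 10
                else "Moderate" if counts["T1"] + counts["T2"] >= 5
                else "Limited")
    return (f"### Theme: {theme} ({len(papers)} papers)\n"
            f"**Evidence Quality**: {strength} ({', '.join(parts)})")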

7. Quantified Completeness (Not Just Categorical)

Problem: “Include PPIs” is aspirational; reports pass the checklist but are data-thin.

Solution: Define numeric minimums for each section:

| Section | Minimum Data | If Not Met |
|---------|--------------|------------|
| PPIs | ≥20 interactors | Explain why fewer + which tools failed |
| Expression | Top 10 tissues with values | Note "limited data" with specific gaps |
| Disease | Top 10 associations with scores | Note if fewer available |
| Variants | All 4 constraint scores (pLI, LOEUF, missense Z, pRec) | Note which unavailable |
| Druggability | All modalities assessed | "No drugs/probes" is valid data |
| Literature | Total + 5-year trend + 3-5 key papers | Note if sparse (<50 papers) |

Why this matters: Quantified minimums make completeness auditing objective and mechanical, not subjective.
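
A minimal audit sketch, assuming the collected data lives in a dict keyed by section; the thresholds mirror the table above, and the output rows feed directly into the Data Gaps section (principle 9):

# Minimums mirror the table above; the report dict shape is an assumption for illustration.
SECTION_MINIMUMS = {"ppis": 20, "expression_tissues": 10, "disease_associations": 10}

def audit_completeness(report):
    # Flag every section below its numeric minimum so the gap is explained, never skipped.
    gaps = []
    for section, minimum in SECTION_MINIMUMS.items():
        actual = len(report.get(section, []))
        if actual < minimum:
            gaps.append({"section": section, "expected": f">={minimum}", "actual": actual})
    return gaps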

8. Mandatory Completeness Checklist

Problem: Reports have inconsistent sections; some topics get skipped entirely.

Solution: Define mandatory sections that MUST exist, even if populated with “Limited evidence” or “Unknown”:

## Completeness Checklist (ALL Required)

### Identity & Context
- [ ] Official identifiers resolved (all 6 types)
- [ ] Synonyms/aliases documented
- [ ] Naming collisions handled (or "none detected")

### Biology
- [ ] Protein architecture (or "N/A for non-protein")
- [ ] Subcellular localization
- [ ] Expression profile (≥10 tissues with values)
- [ ] Pathway involvement (≥10 pathways)

### Mechanism
- [ ] Core function with evidence grades
- [ ] Model organism data (or "none found")
- [ ] Key assays described

### Disease & Clinical
- [ ] Genetic variants (SNVs and CNVs separated)
- [ ] Constraint scores (all 4, with interpretations)
- [ ] Disease links with evidence grades (≥10 or "limited")

### Druggability
- [ ] Tractability for all modalities
- [ ] Known drugs (or "none")
- [ ] Chemical probes (or "none available")
- [ ] Clinical pipeline (or "none")

### Synthesis (CRITICAL)
- [ ] Research themes (≥3 papers each, or "limited")
- [ ] Open questions/gaps
- [ ] Biological model synthesized
- [ ] Testable hypotheses (≥3)

9. Aggregated Data Gaps Section

Problem: “No data” notes scattered across 14 sections; users can’t quickly see what’s missing.

Solution: Add a dedicated Data Gaps & Limitations section that consolidates all gaps:

## 15. Data Gaps & Limitations

| Section | Expected Data | Actual | Reason | Alternative Source |
|---------|---------------|--------|--------|-------------------|
| 6. PPIs | ≥20 interactors | 8 | Novel target, limited studies | Literature review needed |
| 7. Expression | GTEx TPM | None | Versioned ID not recognized | See HPA data |
| 9. Probes | Chemical probes | None | No validated probes exist | Consider tool compound dev |

**Recommendations for Data Gaps**:
1. For PPIs: Query BioGRID with broader parameters; check yeast-2-hybrid studies
2. For Expression: Query GEO directly for tissue-specific datasets

Why this matters: Users can quickly assess data quality and know where to look for more information.

10. Query Strategy Optimization

Problem: Simple keyword searches retrieve too much noise or miss relevant papers.

Solution: Three-step collision-aware query strategy:

## Query Strategy

### Step 1: High-Precision Seeds
Build a mechanistic core set (15-30 papers):
- "[GENE_SYMBOL]"[Title] AND mechanism
- "[FULL_PROTEIN_NAME]"[Title]
- "UniProt:ACCESSION"

### Step 2: Citation Network Expansion
From seeds, expand via citations:
- Forward: PubMed_get_cited_by, EuropePMC_get_citations
- Related: PubMed_get_related
- Backward: EuropePMC_get_references

### Step 3: Collision-Filtered Broad
Apply negative filters for known collisions:
- "TRAG" AND immune NOT plasmid NOT conjugation
- "JAK" AND kinase NOT "just another"

Citation-first for sparse targets: When keyword search returns <30 papers, prioritize citation expansion from the few good seeds.
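
A sketch of the three steps end to end; PubMed_get_cited_by appears in the fallback-chain table below, but the search tool name and all signatures here are assumptions for illustration:

def gather_literature(tu, symbol, protein_name, collision_terms):
    # Step 1: high-precision seeds (mechanistic core set)
    seeds = tu.tools.PubMed_search_titles(query=f'"{symbol}"[Title] AND mechanism') or []
    seeds += tu.tools.PubMed_search_titles(query=f'"{protein_name}"[Title]') or []

    # Step 2: citation network expansion from the seeds
    expanded = []
    for paper in seeds:
        expanded += tu.tools.PubMed_get_cited_by(pmid=paper["pmid"]) or []

    # Step 3: collision-filtered broad search
    negatives = " ".join(f"NOT {term}" for term in collision_terms)
    broad = tu.tools.PubMed_search_titles(query=f'"{symbol}" {negatives}'.strip()) or []
    return seeds, expanded, broad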

11. Tool Failure Handling

Problem: NCBI elink and other APIs can be flaky; skills fail silently.

Solution: Automatic retry with fallback chains:

## Failure Handling

### Retry Protocol
Attempt 1 → fails → wait 2s → Attempt 2 → fails → wait 5s → Fallback

### Fallback Chains
| Primary | Fallback 1 | Fallback 2 |
|---------|------------|------------|
| PubMed_get_cited_by | EuropePMC_get_citations | OpenAlex citations |
| PubMed_get_related | SemanticScholar | Keyword search |
| GTEx_* | HPA_* | Note as unavailable |
| Unpaywall | EuropePMC OA flag | OpenAlex is_oa |
| ChEMBL_get_target_activities | GtoPdb_get_target_ligands | OpenTargets drugs |
| intact_get_interactions | STRING_get_protein_interactions | OpenTargets interactions |

### Document Failures
In report: "Expression data unavailable (GTEx API timeout after 3 attempts)"

Rule: NEVER silently skip failed tools. Always document in the Data Gaps section.
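
A rough wiring of the retry protocol and fallback chains; timings follow the protocol above, the chain shows one example row from the table, and passing the same parameters to a fallback tool is an assumption that often needs the per-tool corrections from principle 1:

import time

FALLBACK_CHAINS = {
    "PubMed_get_cited_by": ["EuropePMC_get_citations"],  # one example row from the table
}

def call_with_fallback(tu, tool_name, **params):
    # Attempt 1 → wait 2s → Attempt 2 → wait 5s → next tool in the chain
    for candidate in [tool_name] + FALLBACK_CHAINS.get(tool_name, []):
        for wait in (0, 2):
            time.sleep(wait)
            try:
                result = getattr(tu.tools, candidate)(**params)
            except Exception:
                result = None
            if result:
                return result, candidate
        time.sleep(5)
    return None, None  # caller documents the failure in the Data Gaps section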

12. Scalable Output Structure

Problem: Reports with 500+ papers become unreadable; users can’t find what they need.

Solution: Separate narrative from data:

Narrative report (~20-50 pages max):

  • Executive summary
  • Key findings by theme
  • Top 20-50 papers highlighted
  • Conclusions and hypotheses

Bibliography files (unlimited):

  • [topic]_bibliography.json – Full structured data
  • [topic]_bibliography.csv – Tabular for filtering

JSON structure:

{
  "pmid": "12345678",
  "doi": "10.1038/xxx",
  "title": "...",
  "evidence_tier": "T1",
  "themes": ["lysosomal_function", "autophagy"],
  "is_core_seed": true,
  "oa_status": "gold"
}
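
A sketch of writing the split outputs, assuming each paper is a dict in the JSON shape above; file names follow the output table in principle 5:

import csv
import json

def write_outputs(topic, narrative_md, papers):
    # Narrative report stays readable; the full bibliography goes to JSON/CSV.
    with open(f"{topic}_report.md", "w") as f:
        f.write(narrative_md)
    with open(f"{topic}_bibliography.json", "w") as f:
        json.dump(papers, f, indent=2)
    with open(f"{topic}_bibliography.csv", "w", newline="") as f:
        fields = ["pmid", "doi", "title", "evidence_tier", "themes", "oa_status"]
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for paper in papers:
            row = {key: paper.get(key, "") for key in fields}
            row["themes"] = ";".join(paper.get("themes", []))
            writer.writerow(row)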

13. Synthesis Sections

Problem: Reports describe what was found but don’t synthesize into actionable insights.

Solution: Require synthesis sections:

## Required Synthesis Sections

### Biological Model (3-5 paragraphs)
Integrate all evidence into a coherent model:
- What does the target do?
- How does it connect to disease?
- What's the key uncertainty?

### Testable Hypotheses (≥3)
| # | Hypothesis | Perturbation | Readout | Expected |
|---|------------|--------------|---------|----------|
| 1 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] |

### Suggested Experiments
Brief description of how to test each hypothesis.

Skill Review Checklist

When reviewing a ToolUniverse skill, check:

Tool Contract

  • Tool parameters verified via get_tool_info() or documented corrections
  • Versioned vs unversioned ID handling specified
  • Foundation data source identified (if available for domain)

Report Quality

  • Report focuses on content, not search process
  • Methodology in separate appendix (optional)
  • Evidence grades applied to claims (T1-T4)
  • Source attribution on every fact
  • Sections exist even if “limited evidence”

Query Strategy

  • Disambiguation phase before search
  • Collision detection for ambiguous names
  • High-precision seeds before broad search
  • Citation expansion option for sparse topics
  • Negative filters documented

Tool Usage

  • Annotation tools used (not just literature)
  • Fallback chains defined
  • Failure handling with retry
  • OA handling (full or best-effort)

Completeness

  • Quantified minimums defined per section
  • Completeness checklist with checkboxes
  • Data Gaps section aggregates all missing data
  • “Negative results” explicitly documented (“no probes” not blank)

Output Structure

  • Main report is narrative-focused
  • Bibliography in separate JSON/CSV
  • Synthesis sections required

User Experience

  • Progress updates are brief
  • No raw tool outputs shown
  • Final report is the deliverable

Common Anti-Patterns to Fix

1. “Search Log” Reports

Bad: “Round 1: Searched PubMed (234 papers), OpenAlex (456 papers)…”
Fix: Keep methodology internal; report findings only

2. Missing Disambiguation

Bad: Search “JAK” and get kinase + “just another kinase” papers mixed
Fix: Add collision detection; build negative filters

3. No Evidence Grading

Bad: “Multiple studies show…” (which studies? what quality?)
Fix: Apply T1-T4 grades; label each claim

4. Empty Sections Omitted

Bad: Skip “Pathogen Involvement” because nothing found
Fix: Include section with “None identified in literature search”

5. No Synthesis

Bad: Long list of papers organized by theme
Fix: Add biological model + testable hypotheses

6. Monolithic Bibliography

Bad: 200 papers embedded in report narrative
Fix: Top 20-50 in report; full list in JSON/CSV

7. Silent Failures

Bad: “Expression data: [blank]” (tool failed, user doesn’t know)
Fix: “Expression data unavailable (API timeout); see HPA directly”

8. Wrong Tool Parameters (NEW)

Bad: Reactome_map_uniprot_to_pathways(uniprot_id=...) returns empty
Fix: Verify params via get_tool_info(); use correct param id

9. Missing Versioned IDs (NEW)

Bad: GTEx returns empty for ENSG00000123456
Fix: Try versioned ID ENSG00000123456.12; document which worked

10. No Foundation Layer (NEW)

Bad: Query 15 specialized tools independently, miss data when some fail
Fix: Query comprehensive aggregator (e.g., Open Targets) first

11. Scattered “No Data” Notes (NEW)

Bad: “No data” in 5 different sections; user doesn’t know overall gaps
Fix: Aggregate all gaps in dedicated Data Gaps section with recommendations

12. Aspirational Completeness (NEW)

Bad: “Include PPIs” ✓ (but only 3 interactors listed)
Fix: “≥20 PPIs OR explanation why fewer”


Template: Optimized Skill Structure

---
name: [domain]-research
description: [What it does]. Creates detailed report with evidence grading 
and mandatory completeness. [When to use triggers].
---

# [Domain] Research Strategy

## When to Use
[Trigger scenarios]

## Workflow
Phase -1: Tool Verification → Phase 0: Foundation Data → Phase 1: Disambiguate → Phase 2: Search → Phase 3: Report

## Phase -1: Tool Verification
[Parameter corrections table for tools used in this skill]

## Phase 0: Foundation Data
[Comprehensive aggregator query - e.g., Open Targets for targets]

## Phase 1: Disambiguation (Default ON)
[ID resolution (versioned + unversioned), collision detection, baseline profile]

## Phase 2: Specialized Queries (Internal)
[Query strategy with collision filters, citation expansion, tool fallbacks]

## Phase 3: Report Synthesis
[Progressive writing, evidence grading, mandatory sections]

## Output Files
- `[topic]_report.md` (narrative, always)
- `[topic]_bibliography.json` (data, always)
- `methods_appendix.md` (only if requested)

## Quantified Minimums
[Specific numbers per section - e.g., ≥20 PPIs, top 10 tissues]

## Completeness Checklist
[ALL required sections with checkboxes]

## Data Gaps Section
[Template for aggregating missing data with recommendations]

## Evidence Grading
[T1-T4 definitions with required locations]

## Tool Reference
[Tools by category with fallback chains and parameter notes]

Quick Fixes for Common Complaints

| User Complaint | Root Cause | Fix |
|----------------|------------|-----|
| “Report is too short” | Missing annotation data | Add Phase 1 disambiguation + Phase 0 foundation |
| “Too much noise” | No collision filtering | Add negative query filters |
| “Can’t tell what’s important” | No evidence grading | Add T1-T4 tiers |
| “Missing sections” | No completeness checklist | Add mandatory sections with minimums |
| “Too long/unreadable” | Monolithic output | Separate narrative from JSON |
| “Just a list of papers” | No synthesis | Add biological model + hypotheses |
| “Shows search process” | Wrong output focus | Report-only; methodology in appendix |
| “Tool failed, no data” | No fallback handling | Add retry + fallback chains |
| “Empty results, no error” | Wrong tool parameters | Add Phase -1 param verification |
| “GTEx returns nothing” | Versioned ID needed | Try ENSG*.version format |
| “Data seems incomplete” | No foundation layer | Add Phase 0 with aggregator |
| “Can’t tell what’s missing” | Scattered gaps | Add Data Gaps section |

Summary

Seven pillars of optimized ToolUniverse skills:

  1. Verify tool contracts – Check params via get_tool_info(); maintain corrections table
  2. Foundation first – Query comprehensive aggregators before specialized tools
  3. Disambiguate carefully – Resolve IDs (versioned + unversioned), detect collisions, get baseline from annotation DBs
  4. Grade evidence – T1-T4 tiers on all claims; summarize quality per section
  5. Require quantified completeness – Numeric minimums, not just “include X”
  6. Report content, not process – Methodology in appendix only if asked; aggregate gaps in one section
  7. Synthesize – Biological models and testable hypotheses, not just paper lists

Apply these principles to any ToolUniverse research skill for better user experience and actionable output.