tooluniverse-literature-deep-research

📁 mims-harvard/tooluniverse 📅 9 days ago

Install command

npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-literature-deep-research

Agent install distribution

claude-code 22

opencode 21

codex 21

amp 13

github-copilot 13

Skill documentation

Literature Deep Research Strategy (Enhanced)

A systematic approach to comprehensive literature research that starts with target disambiguation to prevent missing details, uses evidence grading to separate signal from noise, and produces a content-focused report with mandatory completeness sections.

KEY PRINCIPLES:

Target disambiguation FIRST – Resolve IDs, synonyms, naming collisions before literature search
Right-size the deliverable – Use Factoid / Verification Mode for single, answerable questions; use full report mode for “deep research”
Report-first output – Default deliverable is a report file; an inline answer is allowed (and recommended) for Factoid / Verification Mode
Evidence grading – Grade every claim by evidence strength (mechanistic paper vs screen hit vs review vs text-mined)
Mandatory completeness – All checklist sections must exist, even if “unknown/limited evidence”
Source attribution – Every piece of information traceable to database/tool
English-first queries – Always use English terms for literature searches and tool calls, even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user’s language

Workflow Overview

User Query
  ↓
Phase 0: CLARIFY + MODE SELECT (factoid vs deep report)
  ↓
Phase 1: TARGET DISAMBIGUATION + PROFILE (default ON for biological targets)
  ├─ Resolve official IDs (Ensembl, UniProt, HGNC)
  ├─ Gather synonyms/aliases + known naming collisions
  ├─ Get protein length, isoforms, domain architecture
  ├─ Get subcellular location, expression, GO terms, pathways
  └─ Output: Target Profile section + Collision-aware search plan
  ↓
Phase 2: LITERATURE SEARCH (internal methodology, not shown)
  ├─ High-precision seed queries (build mechanistic core)
  ├─ Citation network expansion from seeds
  ├─ Collision-filtered broader queries
  └─ Theme clustering + evidence grading
  ↓
Phase 3: REPORT SYNTHESIS
  ├─ Progressive writing to [topic]_report.md
  ├─ Mandatory completeness checklist validation
  └─ Biological model + testable hypotheses
  ↓
Optional: methods_appendix.md (only if user requests)

Phase 0: Initial Clarification

Mandatory Questions

Target type: Is this a biological target (gene/protein), a general topic, or a disease?
Scope: Is this a single factoid to verify (“Which antibiotic?”, “Which strain?”, “Which year?”) or a comprehensive/deep review?
Known aliases: Any specific gene symbols or protein names you use?
Constraints: Open access only? Include preprints? Specific organisms?
Methods appendix: Do you want methodology details in a separate file?

Mode Selection (CRITICAL)

Pick exactly one mode based on the user’s intent and the question structure:

Factoid / Verification Mode (single concrete question; answer should be a short phrase/sentence)
Mini-review Mode (narrow topic; 1–3 pages of synthesis)
Full Deep-Research Mode (use the full template + completeness checklist)

Heuristic:

If the user asks “X has been evolved to be resistant to which antibiotic?” → Factoid / Verification Mode
If the user asks “What does the literature say about X?” → Full Deep-Research Mode

Factoid / Verification Mode (Fast Path)

Goal: Provide a correct, source-verified single answer, with minimal but explicit evidence attribution.

Deliverables (still file-backed):

[topic]_factcheck_report.md (≤ 1 page)
[topic]_bibliography.json (+ CSV) containing the key paper(s)

Fact-check report template:

# [TOPIC]: Fact-check Report

*Generated: [Date]*
*Evidence cutoff: [Date]*

## Question
[User question]

## Answer
**[One-sentence answer]** [Evidence: ★★★/★★☆/★☆☆/☆☆☆]

## Source(s)
- [Primary paper citation: journal/year/PMID/DOI as available]

## Verification Notes
- [1–3 bullets: where in the paper the statement appears (Abstract/Results/Methods), and any key constraints]

## Limitations
- [If full text not available, or if only review evidence exists]

Required verification behavior:

Prefer ToolUniverse literature tools (Europe PMC / PubMed / PMC / Semantic Scholar) over general web browsing.
Use full-text snippet verification when possible (Europe PMC auto-snippet tier is ideal).
Avoid adding extra claims (e.g., “not X”) unless the paper explicitly supports them.

Suggested tool pattern:

EuropePMC_search_articles(query=..., extract_terms_from_fulltext=[...]) to pull OA full-text snippets for the key terms.
If OA snippets are unavailable: fall back to PMC_search_papers (if in PMC) or SemanticScholar_search_papers → SemanticScholar_get_pdf_snippets.

Evidence grading (factoid):

If the statement is explicitly made in a primary experimental paper (Results/Methods/Abstract): label T1 (★★★).
If it’s only in a review: label T4 (☆☆☆) and try to locate the primary source.

Detect Target Type

Query Pattern	Type	Action
Gene symbol (EGFR, TP53, ATP6V1A)	Biological target	Phase 1 required
Protein name (“V-ATPase”, “kinase”)	Biological target	Phase 1 required
UniProt ID (P00533, Q93050)	Biological target	Phase 1 required
Disease, pathway, method	General topic	Phase 1 optional
“Literature on X”	Depends on X	Assess X
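
As a rough illustration of the routing table above, the sketch below classifies a single-token query. The UniProt accession pattern follows the accession format documented by UniProt; the gene-symbol pattern, the return labels, and the function name `detect_target_type` are assumptions made for illustration. Free-text protein names such as “V-ATPase” fall through to manual assessment, and common acronyms can produce false positives.

```python
import re

# UniProt accession format (6 or 10 characters), per UniProt's documentation.
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumption: a gene symbol is 2-10 uppercase letters, digits, or hyphens,
# starting with a letter. This is a heuristic, not an HGNC lookup.
GENE_SYMBOL_RE = re.compile(r"[A-Z][A-Z0-9-]{1,9}")


def detect_target_type(query: str) -> str:
    """Return 'biological_target' (Phase 1 required) or 'needs_assessment'."""
    token = query.strip()
    if UNIPROT_RE.fullmatch(token) or GENE_SYMBOL_RE.fullmatch(token):
        return "biological_target"
    return "needs_assessment"
```

A result of `needs_assessment` only means the heuristic could not decide; the Phase 0 clarification questions still apply.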

Phase 1: Target Disambiguation + Profile (Default ON)

CRITICAL: This phase prevents “missing target details” when literature is sparse or noisy.

1.1 Resolve Official Identifiers

Use these tools to establish canonical identity:

UniProt_search → Get UniProt accession for human protein
UniProt_get_entry_by_accession → Full entry with cross-references
UniProt_id_mapping → Map between ID types
ensembl_lookup_gene → Ensembl gene ID, biotype
MyGene_get_gene_annotation → NCBI Gene ID, aliases, summary

Output for report:

## Target Identity

| Identifier | Value | Source |
|------------|-------|--------|
| Official Symbol | ATP6V1A | HGNC |
| UniProt | P38606 | UniProt |
| Ensembl Gene | ENSG00000114573 | Ensembl |
| NCBI Gene ID | 523 | NCBI |
| ChEMBL Target | CHEMBL2364682 | ChEMBL |

**Full Name**: V-type proton ATPase catalytic subunit A
**Synonyms/Aliases**: ATP6A1, VPP2, Vma1, VA68

1.2 Identify Naming Collisions

CRITICAL: Many gene names have collisions. Examples:

TRAG: T-cell regulatory gene vs bacterial TraG conjugation protein
WDR7-7: Could match gene WDR7 vs lncRNA
JAK: Janus kinase vs Just Another Kinase
CAT: Catalase vs chloramphenicol acetyltransferase

Detection strategy:

Search PubMed for "[SYMBOL]"[Title] – review first 20 titles
If >20% off-topic, identify collision terms
Build negative filter: NOT [collision_term1] NOT [collision_term2]
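
The detection steps above can be sketched as a small check over the first 20 titles. This is a minimal sketch under stated assumptions: "on-topic" is approximated by case-insensitive substring matching against terms you supply, and the function names are hypothetical. The off-topic titles are returned so collision terms can be picked by hand.

```python
def split_titles(titles, on_topic_terms):
    """Partition titles into (on_topic, off_topic) by substring match."""
    terms = [t.lower() for t in on_topic_terms]
    on_topic, off_topic = [], []
    for title in titles:
        bucket = on_topic if any(t in title.lower() for t in terms) else off_topic
        bucket.append(title)
    return on_topic, off_topic


def collision_suspected(titles, on_topic_terms, threshold=0.20, sample_size=20):
    """Return (flag, off_topic_titles) for the first `sample_size` titles."""
    sample = titles[:sample_size]
    if not sample:
        return False, []
    _, off_topic = split_titles(sample, on_topic_terms)
    return len(off_topic) / len(sample) > threshold, off_topic
```

When the flag is set, read the off-topic titles to choose the collision terms for the negative filter.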

Output for report:

### Known Naming Collisions

- Symbol "ATP6V1A" is unambiguous (no major collisions detected)
- Related but distinct: ATP6V0A1-4 (V0 subunits vs V1 subunits)
- Search filter applied: Include "vacuolar" OR "V-ATPase", exclude "V0 domain" when V1-specific

1.3 Protein Architecture & Domains

Use annotation tools (not literature):

InterPro_get_protein_domains → Domain architecture
UniProt_get_ptm_processing_by_accession → PTMs, active sites
proteins_api_get_protein → Additional protein features

Output for report:

### Protein Architecture

| Domain | Position | InterPro ID | Function |
|--------|----------|-------------|----------|
| V-ATPase A subunit, N-terminal | 1-90 | IPR022879 | ATP binding |
| V-ATPase A subunit, catalytic | 91-490 | IPR005725 | Catalysis |
| V-ATPase A subunit, C-terminal | 491-617 | IPR022878 | Complex assembly |

**Length**: 617 aa | **Isoforms**: 2 (canonical P38606-1, variant P38606-2 missing aa 1-45)
**Active sites**: Lys-168 (ATP binding), Glu-261 (catalytic)

*Sources: InterPro, UniProt*

1.4 Subcellular Location

HPA_get_subcellular_location → Human Protein Atlas localization
UniProt_get_subcellular_location_by_accession → UniProt annotation

Output for report:

### Subcellular Localization

| Location | Confidence | Source |
|----------|------------|--------|
| Lysosome membrane | High | HPA + UniProt |
| Endosome membrane | High | UniProt |
| Golgi apparatus | Medium | HPA |
| Plasma membrane (subset) | Low | Literature |

**Primary location**: Lysosomal/endosomal membranes (vacuolar ATPase complex)
*Sources: Human Protein Atlas, UniProt*

1.5 Baseline Expression

GTEx_get_median_gene_expression → Tissue expression (TPM)
HPA_get_rna_expression_by_source → HPA expression data

Output for report:

### Baseline Tissue Expression

| Tissue | Expression (TPM) | Specificity |
|--------|------------------|-------------|
| Kidney cortex | 145.3 | Elevated |
| Liver | 98.7 | Medium |
| Brain - Cerebellum | 87.2 | Medium |
| Lung | 76.4 | Medium |
| Ubiquitous baseline | ~50 | Broad |

**Tissue Specificity**: Low (τ = 0.28) - broadly expressed housekeeping gene
*Source: GTEx v8*

1.6 GO Terms & Pathway Placement

GO_get_annotations_for_gene → GO annotations
Reactome_map_uniprot_to_pathways → Reactome pathways
kegg_get_gene_info → KEGG pathways
OpenTargets_get_target_gene_ontology_by_ensemblID → Open Targets GO

Output for report:

### Functional Annotations (GO)

**Molecular Function**:
- ATP hydrolysis activity (GO:0016887) [Evidence: IDA]
- Proton-transporting ATPase activity (GO:0046961) [Evidence: IDA]

**Biological Process**:
- Lysosomal lumen acidification (GO:0007042) [Evidence: IMP]
- Autophagy (GO:0006914) [Evidence: IMP]
- Bone resorption (GO:0045453) [Evidence: IMP]

**Cellular Component**:
- Vacuolar proton-transporting V-type ATPase, V1 domain (GO:0000221) [Evidence: IDA]

### Pathway Involvement

| Pathway | Database | Significance |
|---------|----------|--------------|
| Lysosome | KEGG hsa04142 | Core component |
| Phagosome | KEGG hsa04145 | Acidification |
| Autophagy - animal | Reactome R-HSA-9612973 | mTORC1 regulation |

*Sources: GO Consortium, Reactome, KEGG*

Phase 2: Literature Search (Internal Methodology)

NOTE: This methodology is kept internal. The report shows findings, not process.

2.1 Query Strategy: Collision-Aware Synonym Plan

Step 1: High-Precision Seed Queries (Build Mechanistic Core)

Query 1: "[GENE_SYMBOL]"[Title] AND (mechanism OR function OR structure)
Query 2: "[FULL_PROTEIN_NAME]"[Title] 
Query 3: "[UNIPROT_ID]" (catches supplementary materials)

Purpose: Get 15-30 high-confidence, mechanistic papers that are definitely on-target.
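
The three seed queries above are plain string templates, so they can be filled in mechanically once Phase 1 has resolved the identifiers. The helper name `build_seed_queries` is hypothetical; the query syntax is copied from the templates above.

```python
def build_seed_queries(gene_symbol, full_protein_name, uniprot_id):
    """Return the three high-precision seed queries for a resolved target."""
    return [
        f'"{gene_symbol}"[Title] AND (mechanism OR function OR structure)',
        f'"{full_protein_name}"[Title]',
        f'"{uniprot_id}"',  # catches supplementary materials
    ]


queries = build_seed_queries(
    "ATP6V1A", "V-type proton ATPase catalytic subunit A", "P38606"
)
```

Each string can then be passed as the query argument of whichever literature search tool is in use.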

Step 2: Citation Network Expansion (Especially for Sparse Targets)

Once you have 5-15 core PMIDs:

PubMed_get_cited_by → Papers citing each seed
PubMed_get_related → Computationally related papers
EuropePMC_get_citations → Alternative citation source
EuropePMC_get_references → Backward citations from seeds

Citation-network-first option: For older targets with deprecated terminology, citation expansion often outperforms keyword searching.

Step 3: Collision-Filtered Broader Queries

Broader query: "[GENE_SYMBOL]" AND ([pathway1] OR [pathway2] OR [function])
Apply collision filter: NOT [collision_term1] NOT [collision_term2]
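
A minimal sketch of assembling the broader query and its collision filter from lists of terms. The function name `build_broad_query` is hypothetical; the output follows the template above, with one `NOT` clause per collision term.

```python
def build_broad_query(gene_symbol, context_terms, collision_terms):
    """Combine a quoted symbol, OR-joined context terms, and NOT filters."""
    query = f'"{gene_symbol}" AND ({" OR ".join(context_terms)})'
    for term in collision_terms:
        query += f" NOT {term}"
    return query


trag_query = build_broad_query(
    "TRAG",
    ["T-cell", "immune", "cancer"],
    ["plasmid", "conjugation", "bacterial"],
)
```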

Example for bacterial TraG collision:

"TRAG" AND (T-cell OR immune OR cancer) NOT plasmid NOT conjugation NOT bacterial

2.2 Database Tools

Literature Search (use all relevant):

PubMed_search_articles – Primary biomedical
PMC_search_papers – Full-text
EuropePMC_search_articles – European coverage
openalex_literature_search – Broad academic
Crossref_search_works – DOI registry
SemanticScholar_search_papers – AI-ranked
BioRxiv_search_preprints / MedRxiv_search_preprints – Preprints

Citation Tools (with failure handling):

PubMed_get_cited_by – Primary (NCBI elink can be flaky)
EuropePMC_get_citations – Fallback when PubMed fails
PubMed_get_related – Related articles
EuropePMC_get_references – Reference lists

Annotation Tools (not literature, but fill gaps):

UniProt_* tools – Protein data
InterPro_get_protein_domains – Domains
GTEx_* tools – Expression
HPA_* tools – Human Protein Atlas
OpenTargets_* tools – Target-disease associations
GO_get_annotations_for_gene – GO terms

2.3 Full-Text Verification Strategy

WHEN TO USE: When abstracts lack critical experimental details (exact drugs, cell lines, concentrations, specific protocols).

Three-Tier Strategy:

Tier 1: Auto-Snippet Mode (Europe PMC) – FASTEST

Use for: Exploratory queries with 3-5 specific terms

results = EuropePMC_search_articles(
    query="bacterial antibiotic resistance evolution",
    limit=10,
    extract_terms_from_fulltext=["ciprofloxacin", "meropenem", "A. baumannii", "MIC"]
)

# Check which articles have full-text snippets
for article in results:
    if "fulltext_snippets" in article:
        # Snippets automatically extracted from OA full text
        for snippet in article["fulltext_snippets"]:
            # Use snippet["term"] and snippet["snippet"] for verification
            pass

Advantages:

✅ Single tool call (search + snippets)
✅ Bounded latency (max 3 OA articles, ~3-5 seconds total)
✅ No manual URL extraction
✅ Max 5 search terms

Limitations:

❌ Only works for OA articles with fullTextXML
❌ Limited to first 3 OA articles
❌ Europe PMC coverage only (~30-40% OA)

When to use: Initial exploration, quick verification of 1-2 papers

Tier 2: Manual Two-Step (Semantic Scholar, ArXiv) – TARGETED

Use for: Specific high-value papers you identified from search

# Step 1: Search
papers = SemanticScholar_search_papers(
    query="machine learning interpretability",
    limit=10
)

# Step 2: Extract from specific OA papers
for paper in papers:
    if paper.get("open_access_pdf_url"):
        snippets = SemanticScholar_get_pdf_snippets(
            open_access_pdf_url=paper["open_access_pdf_url"],
            terms=["SHAP", "gradient attribution", "layer-wise relevance"],
            window_chars=300
        )
        if snippets["status"] == "success":
            # Process snippets["snippets"]
            pass

ArXiv variant (100% OA, no paywall):

# All arXiv papers are freely available
snippets = ArXiv_get_pdf_snippets(
    arxiv_id="2301.12345",
    terms=["attention mechanism", "self-attention", "layer normalization"],
    max_snippets_per_term=5
)

Advantages:

✅ Full control over which papers to process
✅ Adjustable window size (20-2000 chars)
✅ Works for Semantic Scholar (~15-20% OA PDFs) and ArXiv (100%)
✅ Can process any number of papers

Limitations:

❌ Two tool calls per article (search → extract)
❌ Manual loop needed
❌ Slower than auto-snippet mode

When to use: Thorough review of key papers, preprint analysis

Tier 3: Manual Download + Parse (Fallback) – SLOWEST

Use for: Paywalled content via institutional access

# For paywalled PDFs accessible via institution
webpage_text = get_webpage_text_from_url(
    url="https://doi.org/10.1016/...",
    # Requires institutional proxy or VPN
)

# Extract relevant sections manually
if "Methods" in webpage_text:
    # Parse methods section
    pass

Limitations:

❌ Requires institutional access
❌ No snippet extraction (full HTML)
❌ Quality varies by publisher
❌ Slowest approach

When to use: Last resort for critical paywalled papers

Decision Matrix

Scenario	Recommended Tier	Rationale
Quick verification (“Which antibiotic?”)	Tier 1 (Auto-snippet)	Fast, single call
Preprint deep-dive (arXiv, bioRxiv)	Tier 2 (Manual ArXiv)	100% coverage, no paywall
High-value paper deep analysis	Tier 2 (Manual S2)	Precise control
Systematic review (50+ papers)	Tier 1 + Tier 2	Auto for OA, manual for key papers
Paywalled critical paper	Tier 3 (Manual download)	Only option

Best Practices

1. Limit search terms to 3-5 specific keywords:

✅ Good: ["ciprofloxacin 5 μg/mL", "HEK293 cells", "RNA-seq"]
❌ Bad: ["drug", "method", "significant"] (too broad)

2. Check OA status before extraction:

if article.get("open_access") and article.get("fulltext_xml_url"):
    # Proceed with extraction
    pass

3. Adjust window size for context:

Methods: 400-500 chars (full sentences)
Quick verification: 150-200 chars
Default: 220 chars (balanced)

4. Handle failures gracefully:

if "fulltext_snippets" not in article:
    # Fallback: use abstract or skip
    print(f"No full text available: {article['title']}")

5. Document full-text sources in report:

## Methods Verification

**Antibiotic concentrations** (verified from full text):
- Study A: Ciprofloxacin 5 μg/mL [PMC12345, Methods section]
- Study B: Meropenem 8 μg/mL [arXiv:2301.12345, Experimental Design]

*Note: Full-text verification performed on 8/15 OA papers (53% coverage)*
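
To make the window-size guidance in best practice 3 concrete, here is a local sketch of term-centred snippet extraction over text you already hold. It is not the ToolUniverse implementation. It assumes `window_chars` is the total context around the first match, split evenly on both sides, and it reuses the `term` and `snippet` keys from the Tier 1 example.

```python
def extract_snippets(text, terms, window_chars=220):
    """Return one snippet per term found, centred on its first occurrence."""
    snippets = []
    lowered = text.lower()
    half = window_chars // 2
    for term in terms:
        idx = lowered.find(term.lower())  # case-insensitive search
        if idx == -1:
            continue  # term absent: no snippet for it
        start = max(0, idx - half)
        end = min(len(text), idx + len(term) + half)
        snippets.append({"term": term, "snippet": text[start:end]})
    return snippets
```

A larger `window_chars` (400-500) suits Methods passages, while 150-200 is usually enough to confirm that a term appears in the expected context.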

2.5 Tool Failure Handling

Automatic retry strategy:

Attempt 1: Call tool
If timeout/error:
  Wait 2 seconds
  Attempt 2: Retry
If still fails:
  Wait 5 seconds  
  Attempt 3: Try fallback tool
If fallback fails:
  Document "Data unavailable" in report

Fallback chains:

Primary Tool	Fallback 1	Fallback 2
`PubMed_get_cited_by`	`EuropePMC_get_citations`	OpenAlex citations
`PubMed_get_related`	SemanticScholar recommendations	Manual keyword search
`GTEx_get_median_gene_expression`	`HPA_get_rna_expression_by_source`	Document as unavailable
`Unpaywall_check_oa_status`	Europe PMC OA flags	OpenAlex OA field
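
The retry strategy and a single fallback step can be sketched as one wrapper. This is a minimal sketch, assuming each tool is wrapped in a zero-argument callable; the name `call_with_fallback` and the returned note strings (other than "Data unavailable") are hypothetical. The `sleep` parameter is injectable so the waits can be skipped in tests.

```python
import time


def call_with_fallback(primary, fallback, sleep=time.sleep):
    """Try `primary` twice (2 s, then 5 s back-off), then `fallback` once.

    Returns (result, note); note is None when the primary tool succeeded.
    """
    for wait in (2, 5):
        try:
            return primary(), None
        except Exception:
            sleep(wait)  # back off before the next attempt
    try:
        return fallback(), "fallback used"
    except Exception:
        # Record this in the report's Data Limitations section.
        return None, "Data unavailable"
```

For a chain with two fallbacks, the second fallback can itself be wrapped the same way and passed in as `fallback`.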

2.6 Open Access Handling (Best-Effort)

If Unpaywall email provided: Check OA status for all papers with DOIs

If no Unpaywall email: Use best-effort OA signals:

Europe PMC: isOpenAccess field
PMC: All PMC papers are OA
OpenAlex: is_oa field
DOAJ: All DOAJ papers are OA

Label in report:

*OA Status: Best-effort (Unpaywall not configured)*
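
A minimal sketch of combining the best-effort signals listed above into one flag. The field names `isOpenAccess` and `is_oa` come from the list above; the `source` key and the acceptance of both `"Y"` and `True` for `isOpenAccess` are assumptions about how records are shaped after retrieval.

```python
def best_effort_oa(record):
    """Return True, False, or None (unknown) from the available OA signals."""
    if record.get("source") in {"PMC", "DOAJ"}:
        return True  # treated as OA by the rules above
    if "isOpenAccess" in record:  # Europe PMC
        return record["isOpenAccess"] in (True, "Y")
    if "is_oa" in record:  # OpenAlex
        return bool(record["is_oa"])
    return None  # no signal: report OA status as unknown
```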

Phase 3: Evidence Grading

CRITICAL: Grade every claim by evidence strength to prevent low-signal mentions from diluting the report.

Evidence Tiers

Tier	Label	Description	Example
T1	★★★ Mechanistic	In-target mechanistic study with direct experimental evidence	CRISPR KO + rescue
T2	★★☆ Functional	Functional study showing role (may be in pathway context)	siRNA knockdown phenotype
T3	★☆☆ Association	Screen hit, GWAS association, correlation	High-throughput screen
T4	☆☆☆ Mention	Review mention, text-mined interaction, peripheral reference	Review article

How to Apply

In report, label sections and claims:

### Mechanism of Action

ATP6V1A is the catalytic subunit responsible for ATP hydrolysis in the V-ATPase 
complex [★★★ Mechanistic: PMID:12345678]. Loss-of-function mutations cause 
vacuolar pH dysregulation [★★★: PMID:23456789].

The target has been implicated in mTORC1 signaling through lysosomal amino acid 
sensing [★★☆ Functional: PMID:34567890], though direct interaction data is limited.

A genome-wide screen identified ATP6V1A as essential in cancer cell lines 
[★☆☆ Association: PMID:45678901, DepMap].

Theme-Level Grading

For each theme section, summarize evidence quality:

### 3.1 Lysosomal Acidification (12 papers)
**Evidence Quality**: Strong (8 mechanistic, 3 functional, 1 association)

[Theme content...]
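
The per-theme tally can be produced from the tier codes assigned to each paper. The sketch below only formats the counts; the cut-offs that turn a tally into Strong, Moderate, or Limited are not defined in this document, so that judgement is left to the author. The names `TIER_LABELS` and `summarize_theme` are hypothetical.

```python
from collections import Counter

# Tier codes and labels as defined in the Evidence Tiers table.
TIER_LABELS = {
    "T1": "mechanistic",
    "T2": "functional",
    "T3": "association",
    "T4": "mention",
}


def summarize_theme(tiers):
    """Format tier counts, strongest first, skipping tiers with no papers."""
    counts = Counter(tiers)
    parts = [f"{counts[code]} {label}" for code, label in TIER_LABELS.items() if counts[code]]
    return ", ".join(parts)
```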

Report Structure: Mandatory Completeness Checklist

CRITICAL: This checklist/template applies to Full Deep-Research Mode. For Factoid / Verification Mode, use a short fact-check report (see Phase 0) and do not force the full 15-section template.

Output Files

[topic]_report.md – Main narrative report (Full Deep-Research Mode)
[topic]_factcheck_report.md – Short verification report (Factoid / Verification Mode)
[topic]_bibliography.json – Full deduplicated bibliography (always created)
methods_appendix.md – Methodology details (ONLY if user requests)

Report Template

# [TARGET/TOPIC]: Comprehensive Research Report

*Generated: [Date]*
*Evidence cutoff: [Date]*
*Total unique papers: [N]*

---

## Executive Summary

[2-3 paragraphs synthesizing key findings across all sections]

**Bottom Line**: [One-sentence actionable conclusion]

---

## 1. Target Identity & Aliases
*[MANDATORY - even for non-target topics, clarify scope]*

### 1.1 Official Identifiers
[Table of IDs or scope definition]

### 1.2 Synonyms and Aliases  
[List all known names - critical for complete literature coverage]

### 1.3 Known Naming Collisions
[Document collisions and how they were handled]

---

## 2. Protein Architecture
*[MANDATORY for protein targets; state "N/A - not a protein target" otherwise]*

### 2.1 Domain Structure
[Table of domains with positions, InterPro IDs]

### 2.2 Isoforms
[List isoforms, functional differences if known]

### 2.3 Key Structural Features
[Active sites, binding sites, PTMs]

### 2.4 Available Structures
[PDB entries, AlphaFold availability]

---

## 3. Complexes & Interaction Partners
*[MANDATORY]*

### 3.1 Known Complexes
[List complexes the protein participates in]

### 3.2 Direct Interactors
[Table of top interactors with evidence type and scores]

### 3.3 Functional Interaction Network
[Describe network context]

---

## 4. Subcellular Localization
*[MANDATORY]*

[Table of locations with confidence levels and sources]

---

## 5. Expression Profile
*[MANDATORY]*

### 5.1 Tissue Expression
[Table of top tissues with TPM values]

### 5.2 Cell-Type Expression
[If single-cell data available]

### 5.3 Disease-Specific Expression
[Expression changes in disease contexts]

---

## 6. Core Mechanisms
*[MANDATORY - this is the heart of the report]*

### 6.1 Molecular Function
[What the protein does biochemically]
**Evidence Quality**: [Strong/Moderate/Limited]

### 6.2 Biological Role
[Role in cellular/organismal context]
**Evidence Quality**: [Strong/Moderate/Limited]

### 6.3 Key Pathways
[Pathway involvement with evidence grades]

### 6.4 Regulation
[How the target is regulated]

---

## 7. Model Organism Evidence
*[MANDATORY]*

### 7.1 Mouse Models
[Knockout/knockin phenotypes, if any]

### 7.2 Other Model Organisms
[Yeast, fly, zebrafish, worm data if relevant]

### 7.3 Cross-Species Conservation
[Conservation and functional studies]

---

## 8. Human Genetics & Variants
*[MANDATORY]*

### 8.1 Constraint Scores
[pLI, LOEUF, missense Z - with interpretation]

### 8.2 Disease-Associated Variants
[ClinVar pathogenic variants]

### 8.3 Population Variants
[gnomAD notable variants]

### 8.4 GWAS Associations
[Any GWAS hits for the locus]

---

## 9. Disease Links
*[MANDATORY - include evidence strength]*

### 9.1 Strong Evidence (Genetic + Functional)
[Diseases with causal evidence]

### 9.2 Moderate Evidence (Association + Mechanism)
[Diseases with supporting evidence]

### 9.3 Weak Evidence (Association Only)
[Diseases with correlation/association only]

### 9.4 Evidence Summary Table

| Disease | Evidence Type | Score | Key Papers | Grade |
|---------|---------------|-------|------------|-------|
| [Disease 1] | Genetic + Functional | 0.85 | PMID:xxx | ★★★ |
| [Disease 2] | GWAS + Expression | 0.45 | PMID:yyy | ★★☆ |

---

## 10. Pathogen Involvement
*[MANDATORY - state "None identified" if not applicable]*

### 10.1 Viral Interactions
[Any viral exploitation or targeting]

### 10.2 Bacterial Interactions
[Any bacterial relevance]

### 10.3 Host Defense Role
[Role in immune response if any]

---

## 11. Key Assays & Readouts
*[MANDATORY]*

### 11.1 Biochemical Assays
[Available assays for target activity]

### 11.2 Cellular Readouts
[Cell-based assays and phenotypes]

### 11.3 In Vivo Models
[Animal models and endpoints]

---

## 12. Research Themes
*[MANDATORY - structured theme extraction]*

### 12.1 [Theme 1 Name] (N papers)
**Evidence Quality**: [Strong/Moderate/Limited]
**Representative Papers**: [≥3 papers or state "insufficient"]

[Theme description with evidence-graded citations]

### 12.2 [Theme 2 Name] (N papers)
[Same structure]

[Continue for all themes - require ≥3 representative papers per theme, or state "limited evidence"]

---

## 13. Open Questions & Research Gaps
*[MANDATORY]*

### 13.1 Mechanistic Unknowns
[What we don't understand about the target]

### 13.2 Therapeutic Unknowns
[What we don't know for drug development]

### 13.3 Suggested Priority Questions
[Ranked list of important unanswered questions]

---

## 14. Biological Model & Testable Hypotheses
*[MANDATORY - synthesis section]*

### 14.1 Integrated Biological Model
[3-5 paragraph synthesis integrating all evidence into coherent model]

### 14.2 Testable Hypotheses

| # | Hypothesis | Perturbation | Readout | Expected Result | Priority |
|---|------------|--------------|---------|-----------------|----------|
| 1 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | HIGH |
| 2 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | HIGH |
| 3 | [Hypothesis] | [Experiment] | [Measure] | [Prediction] | MEDIUM |

### 14.3 Suggested Experiments
[Brief description of key experiments to test hypotheses]

---

## 15. Conclusions & Recommendations
*[MANDATORY]*

### 15.1 Key Takeaways
[Bullet points of most important findings]

### 15.2 Confidence Assessment
[Overall confidence in the findings: High/Medium/Low with justification]

### 15.3 Recommended Next Steps
[Prioritized action items]

---

## References

*[Summary reference list in report - full bibliography in separate file]*

### Key Papers (Must-Read)
1. [Citation with PMID] - [Why important] [Grade: ★★★]
2. ...

### By Theme
[Organized reference lists]

---

## Data Limitations

- [Any databases that failed or returned no data]
- [Any known gaps in coverage]
- [OA status method used]

*Full methodology available in methods_appendix.md upon request.*

Bibliography File Format

File: [topic]_bibliography.json

{
  "metadata": {
    "generated": "2026-02-04",
    "query": "ATP6V1A",
    "total_papers": 342,
    "unique_after_dedup": 287
  },
  "papers": [
    {
      "pmid": "12345678",
      "doi": "10.1038/xxx",
      "title": "Paper Title",
      "authors": ["Smith A", "Jones B"],
      "year": 2024,
      "journal": "Nature",
      "source_databases": ["PubMed", "OpenAlex"],
      "evidence_tier": "T1",
      "themes": ["lysosomal_acidification", "autophagy"],
      "oa_status": "gold",
      "oa_url": "https://...",
      "citation_count": 45,
      "in_core_set": true
    }
  ]
}

Also generate [topic]_bibliography.csv with same data in tabular format.
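
A minimal sketch of producing both bibliography files from one list of paper records. It assumes duplicates are matched on lowercase DOI, then PMID, then title, which will miss a paper that appears with a DOI in one database and only a PMID in another. The function names and the CSV column subset are hypothetical. Both functions return strings so the caller decides where to write them.

```python
import csv
import io
import json


def dedupe_papers(papers):
    """Merge records sharing a key, unioning their source_databases."""
    by_key = {}
    for paper in papers:
        key = (paper.get("doi") or "").lower() or paper.get("pmid") or paper.get("title", "").lower()
        dbs = list(paper.get("source_databases", []))
        if key in by_key:
            merged = by_key[key]["source_databases"]
            for db in dbs:
                if db not in merged:
                    merged.append(db)
        else:
            by_key[key] = {**paper, "source_databases": dbs}
    return list(by_key.values())


def to_bibliography_json(query, papers, generated):
    """Serialize metadata plus deduplicated papers in the format above."""
    unique = dedupe_papers(papers)
    metadata = {
        "generated": generated,
        "query": query,
        "total_papers": len(papers),
        "unique_after_dedup": len(unique),
    }
    return json.dumps({"metadata": metadata, "papers": unique}, indent=2)


def to_bibliography_csv(papers):
    """Write a flat CSV; list-valued fields are left out of this sketch."""
    fields = ["pmid", "doi", "title", "year", "journal", "evidence_tier"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(papers)
    return buffer.getvalue()
```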

Theme Extraction Protocol

Standardized Theme Clustering

Extract keywords from titles and abstracts
Cluster into themes using semantic similarity
Require minimum N papers per theme (default N=3)
Label themes with standardized names

Standard Theme Categories (adapt to target)

For V-ATPase target example:

lysosomal_acidification – Core function
autophagy_regulation – mTORC1 signaling
bone_resorption – Osteoclast function
cancer_metabolism – Tumor acidification
viral_infection – Viral entry mechanism
neurodegenerative – Neuronal dysfunction
kidney_function – Renal acid-base
methodology – Assays/tools papers

Theme Quality Requirements

Papers	Theme Status
≥10	Major theme (full section)
3-9	Minor theme (subsection)
<3	Insufficient (note in “limited evidence” or merge)
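
The thresholds in this table map directly onto a small function. The name `theme_status` and the returned labels are hypothetical shorthand for the three statuses.

```python
def theme_status(paper_count):
    """Classify a theme by its paper count using the table's thresholds."""
    if paper_count >= 10:
        return "major"  # full section
    if paper_count >= 3:
        return "minor"  # subsection
    return "insufficient"  # note as limited evidence, or merge
```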

Completeness Checklist (Verify Before Delivery)

ALL boxes must be checked or explicitly marked “N/A” or “Limited evidence”

Identity & Context

Official identifiers resolved (UniProt, Ensembl, NCBI, ChEMBL)
All synonyms/aliases documented
Naming collisions identified and handled
Protein architecture described (or N/A stated)
Subcellular localization documented
Baseline expression profile included

Mechanism & Function

Core mechanism section with evidence grades
Pathway involvement documented
Model organism evidence (or “none found”)
Complexes/interaction partners listed
Key assays/readouts described

Disease & Clinical

Human genetic variants documented
Constraint scores with interpretation
Disease links with evidence strength grades
Pathogen involvement (or “none identified”)

Synthesis

Research themes clustered with ≥3 papers each (or noted as limited)
Open questions/gaps articulated
Biological model synthesized
≥3 testable hypotheses with experiments
Conclusions with confidence assessment

Technical

All claims have source attribution
Evidence grades applied throughout
Bibliography file generated
Data limitations documented
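
Part of this checklist can be checked automatically: whether every mandatory numbered section of the report template exists as a heading. The sketch below does only that and assumes headings keep the `## N. Title` form used in the template. Whether a section's content is adequate still needs a human read. The names are hypothetical.

```python
import re

# Section titles taken from the report template (sections 1-15).
MANDATORY_SECTIONS = [
    "Target Identity & Aliases", "Protein Architecture",
    "Complexes & Interaction Partners", "Subcellular Localization",
    "Expression Profile", "Core Mechanisms", "Model Organism Evidence",
    "Human Genetics & Variants", "Disease Links", "Pathogen Involvement",
    "Key Assays & Readouts", "Research Themes",
    "Open Questions & Research Gaps",
    "Biological Model & Testable Hypotheses",
    "Conclusions & Recommendations",
]


def missing_sections(report_md):
    """Return mandatory section titles with no '## N. Title' heading."""
    headings = re.findall(r"^##\s+\d+\.\s+(.+?)\s*$", report_md, flags=re.MULTILINE)
    found = {heading.lower() for heading in headings}
    return [title for title in MANDATORY_SECTIONS if title.lower() not in found]
```

An empty list means all fifteen headings are present; sections marked "N/A" or "Limited evidence" still count because the heading exists.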

Quick Reference: Tool Categories

Literature Tools

PubMed_search_articles, PMC_search_papers, EuropePMC_search_articles, openalex_literature_search, Crossref_search_works, SemanticScholar_search_papers, BioRxiv_search_preprints, MedRxiv_search_preprints

Citation Tools

PubMed_get_cited_by, PubMed_get_related, EuropePMC_get_citations, EuropePMC_get_references

Protein/Gene Annotation Tools

UniProt_get_entry_by_accession, UniProt_search, UniProt_id_mapping, InterPro_get_protein_domains, proteins_api_get_protein

Expression Tools

GTEx_get_median_gene_expression, GTEx_get_gene_expression, HPA_get_rna_expression_by_source, HPA_get_comprehensive_gene_details_by_ensembl_id, HPA_get_subcellular_location

Variant/Disease Tools

gnomad_get_gene_constraints, gnomad_get_gene, clinvar_search_variants, OpenTargets_get_diseases_phenotypes_by_target_ensembl

Pathway Tools

GO_get_annotations_for_gene, Reactome_map_uniprot_to_pathways, kegg_get_gene_info, OpenTargets_get_target_gene_ontology_by_ensemblID

Interaction Tools

STRING_get_protein_interactions, intact_get_interactions, OpenTargets_get_target_interactions_by_ensemblID

OA Tools

Unpaywall_check_oa_status (if email provided), or use OA flags from Europe PMC/OpenAlex

Communication with User

During research (brief updates):

“Resolving target identifiers and gathering baseline profile…”
“Building core paper set with high-precision queries…”
“Expanding via citation network…”
“Clustering into themes and grading evidence…”

When the question looks like a factoid:

Ask (once) if the user wants just the verified answer or a full deep-research report.
If the user doesn’t specify, default to Factoid / Verification Mode and keep it short + source-backed.

DO NOT expose:

Raw tool outputs
Deduplication counts
Search round details
Database-by-database results

The report is the deliverable. Methodology stays internal.

Summary

This skill produces comprehensive, evidence-graded research reports that:

Start with disambiguation to prevent naming collisions and missing details
Use annotation tools to fill gaps when literature is sparse
Grade all evidence to separate signal from noise
Require completeness even if stating “limited evidence”
Synthesize into biological models with testable hypotheses
Separate narrative from bibliography for scalability
Keep methodology internal unless explicitly requested

The result is a detailed, actionable research report that reads like an expert synthesis, not a search log.
