deep-research
npx skills add https://github.com/lingzhi227/claude-skills --skill deep-research
Deep Research Skill
Trigger
Activate this skill when the user wants to:
- “Research a topic”, “literature review”, “find papers about”, “survey papers on”
- “Deep dive into [topic]”, “what’s the state of the art in [topic]”
- Or when the user runs the /research <topic> slash command
Overview
This skill conducts systematic academic literature reviews in 6 phases, producing structured notes, a curated paper database, and a synthesized final report. Output is organized by phase for clarity.
Installation: ~/.claude/skills/deep-research/ (scripts, references, and this skill definition).
Output: /Users/lingzhi/Code/deep-research-output/{slug}/. Phase paths such as phase1_frontier/frontier.md are relative to this directory.
CRITICAL: Strict Sequential Phase Execution
You MUST execute all 6 phases in strict order: 1 → 2 → 3 → 4 → 5 → 6. NEVER skip any phase.
This is the single most important rule of this skill. Violations include:
- ❌ Jumping from Phase 2 to Phase 5/6 (skipping Deep Dive and Code)
- ❌ Writing synthesis or report before completing Phase 3 deep reading
- ❌ Producing a final report based only on abstracts/titles from search results
- ❌ Combining or merging phases (e.g., doing “Phase 3-5 together”)
Phase Gate Protocol
Before starting Phase N+1, you MUST verify that Phase N’s required output files exist on disk. If they don’t exist, you have NOT completed that phase.
| Phase | Gate: Required Output Files |
|---|---|
| 1 → 2 | phase1_frontier/frontier.md exists AND contains ≥10 papers |
| 2 → 3 | phase2_survey/survey.md exists AND paper_db.jsonl has 35-80 papers |
| 3 → 4 | phase3_deep_dive/selection.md AND phase3_deep_dive/deep_dive.md exist AND deep_dive.md contains detailed notes for ≥8 papers |
| 4 → 5 | phase4_code/code_repos.md exists AND contains ≥3 repositories |
| 5 → 6 | phase5_synthesis/synthesis.md AND phase5_synthesis/gaps.md exist |
After completing each phase, print a phase completion checkpoint:
✅ Phase N complete. Output: [list files written]. Proceeding to Phase N+1.
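The file-existence part of the gate table can be checked mechanically. The following is a minimal sketch, not part of the skill's shipped scripts: the function names `check_gate` and `count_jsonl_records` are hypothetical, and it assumes paper_db.jsonl sits at the root of the topic directory with one JSON object per line. The content thresholds (≥10 papers in frontier.md, ≥8 detailed notes, ≥3 repositories) depend on the note format and are deliberately left to manual review.

```python
import json
from pathlib import Path

# Required files per transition, copied from the gate table.
# Key N means "gate between Phase N and Phase N+1".
GATES = {
    1: ["phase1_frontier/frontier.md"],
    2: ["phase2_survey/survey.md", "paper_db.jsonl"],
    3: ["phase3_deep_dive/selection.md", "phase3_deep_dive/deep_dive.md"],
    4: ["phase4_code/code_repos.md"],
    5: ["phase5_synthesis/synthesis.md", "phase5_synthesis/gaps.md"],
}


def count_jsonl_records(path: Path) -> int:
    """Count non-empty lines; json.loads raises if a line is malformed."""
    count = 0
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            json.loads(line)
            count += 1
    return count


def check_gate(topic_dir: Path, phase: int) -> list[str]:
    """Return the problems blocking the move from `phase` to `phase + 1`."""
    problems = [
        f"missing: {rel}"
        for rel in GATES[phase]
        if not (topic_dir / rel).is_file()
    ]
    db = topic_dir / "paper_db.jsonl"
    if phase == 2 and db.is_file():
        n = count_jsonl_records(db)
        if not 35 <= n <= 80:
            problems.append(f"paper_db.jsonl has {n} papers, expected 35-80")
    return problems
```

An empty list means the file-level gate passes; anything else should be resolved before printing the checkpoint line.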
Why Every Phase Matters
- Phase 3 (Deep Dive) is where you actually READ papers; without it, your synthesis is superficial and based only on abstracts
- Phase 4 (Code & Tools) grounds the research in practical implementations; without it, you miss the open-source ecosystem
- Phase 5 (Synthesis) requires deep knowledge from Phase 3; you cannot synthesize papers you haven’t read
- Phase 6 (Report) assembles content from ALL prior phases; it should cite specific findings from Phase 3 notes
Paper Quality Policy
Peer-reviewed conference papers take priority over arXiv preprints. Many arXiv papers have not undergone peer review and may contain unverified claims.
Source Priority (highest to lowest)
- Top AI conferences: NeurIPS, ICLR, ICML, ACL, EMNLP, NAACL, AAAI, IJCAI, CVPR, KDD, CoRL
- Peer-reviewed journals: JMLR, TACL, Nature, Science, etc.
- Workshop papers: NeurIPS/ICML workshops (lower bar but still reviewed)
- arXiv preprints with high citations: Likely high-quality but unverified
- Recent arXiv preprints: Use cautiously, note “preprint” status explicitly
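The priority order above can be expressed as a sort key over paper records. This is a sketch under stated assumptions: it uses the `venue_normalized`, `venue`, `peer_reviewed`, and `citationCount` fields from the skill's JSONL schema, assumes venue strings match the conference names case-insensitively, detects workshops by the word "workshop" in the venue, and uses a hypothetical threshold of 50 citations for "high citations". None of these choices are confirmed by the skill's own scripts.

```python
TOP_CONFERENCES = {
    "neurips", "iclr", "icml", "acl", "emnlp", "naacl",
    "aaai", "ijcai", "cvpr", "kdd", "corl",
}
HIGH_CITATION_THRESHOLD = 50  # hypothetical cutoff, not from the skill


def source_tier(paper: dict) -> int:
    """Map a paper record to a priority tier (0 = highest, 4 = lowest)."""
    venue = (paper.get("venue_normalized") or paper.get("venue") or "").lower()
    if "workshop" in venue:
        return 2  # workshop papers
    if venue in TOP_CONFERENCES:
        return 0  # top AI conferences
    if paper.get("peer_reviewed"):
        return 1  # other peer-reviewed venues, e.g. journals
    if (paper.get("citationCount") or 0) >= HIGH_CITATION_THRESHOLD:
        return 3  # highly cited preprints
    return 4  # recent or low-citation preprints


def rank_papers(papers: list[dict]) -> list[dict]:
    """Sort by tier first, then by citation count (descending) within a tier."""
    return sorted(
        papers,
        key=lambda p: (source_tier(p), -(p.get("citationCount") or 0)),
    )
```

Papers in tiers 3 and 4 are the ones that should carry the (preprint) marker in citations.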
When to Use arXiv Papers
- As supplementary evidence alongside peer-reviewed work
- For very recent results (< 3 months old) not yet at conferences
- When a peer-reviewed version doesn’t exist yet; note (preprint) in citations
- For survey/review papers (these are useful even without peer review)
Search Tools (by priority)
1. paper_finder (primary: conference papers only)
Location: /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py
Searches ai-paper-finder.info (HuggingFace Space) for published conference papers. Supports filtering by conference + year. Outputs JSONL with BibTeX.
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --mode scrape --config <config.yaml>
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --mode download --jsonl <results.jsonl>
python /Users/lingzhi/Code/documents/tool/paper_finder/paper_finder.py --list-venues
Config example:
searches:
  - query: "long horizon reasoning agent"
    num_results: 100
    venues:
      neurips: [2024, 2025]
      iclr: [2024, 2025, 2026]
      icml: [2024, 2025]
output:
  root: /Users/lingzhi/Code/deep-research-output/{slug}/phase1_frontier/search_results
  overwrite: true
2. search_semantic_scholar.py (supplementary: citation data + broader coverage)
Location: /Users/lingzhi/.claude/skills/deep-research/scripts/search_semantic_scholar.py
Supports --peer-reviewed-only and --top-conferences filters. API key: /Users/lingzhi/Code/keys.md (field S2_API_Key)
3. search_arxiv.py (supplementary: latest preprints)
Location: /Users/lingzhi/.claude/skills/deep-research/scripts/search_arxiv.py
For searching recent papers not yet published at conferences. Mark citations with (preprint).
Other Scripts
| Script | Location | Key Flags |
|---|---|---|
| download_papers.py | ~/.claude/skills/deep-research/scripts/ | --jsonl, --output-dir, --max-downloads, --sort-by-citations |
| extract_pdf.py | ~/.claude/skills/deep-research/scripts/ | --pdf, --pdf-dir, --output-dir, --sections-only |
| paper_db.py | ~/.claude/skills/deep-research/scripts/ | subcommands: merge, search, filter, tag, stats, add, export |
| bibtex_manager.py | ~/.claude/skills/deep-research/scripts/ | --jsonl, --output, --keys-only |
| compile_report.py | ~/.claude/skills/deep-research/scripts/ | --topic-dir |
WebFetch Mode (no Bash)
- Paper discovery: WebSearch + WebFetch to query Semantic Scholar/arXiv APIs
- Paper reading: WebFetch on ar5iv HTML or Read tool on downloaded PDFs
- Writing: Write tool for JSONL, notes, report files
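In WebFetch mode the main practical question is which URLs to fetch. The helpers below are a sketch that only builds request URLs for the public arXiv API, the Semantic Scholar Graph API paper search, and ar5iv. The function names are hypothetical, and the skill's own references/api-reference.md may prescribe different parameters, so treat the chosen fields and sort order as assumptions.

```python
from urllib.parse import urlencode


def arxiv_query_url(query: str, max_results: int = 20) -> str:
    """Build an arXiv API query URL, newest submissions first."""
    params = {
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def semantic_scholar_search_url(
    query: str,
    limit: int = 20,
    fields: tuple[str, ...] = ("title", "year", "venue", "citationCount", "externalIds"),
) -> str:
    """Build a Semantic Scholar Graph API paper-search URL."""
    params = {"query": query, "limit": limit, "fields": ",".join(fields)}
    return "https://api.semanticscholar.org/graph/v1/paper/search?" + urlencode(params)


def ar5iv_url(arxiv_id: str) -> str:
    """Build the ar5iv HTML URL for a given arXiv identifier."""
    return f"https://ar5iv.labs.arxiv.org/html/{arxiv_id}"
```

The arXiv endpoint returns Atom XML and the Semantic Scholar endpoint returns JSON, so the fetched content needs different parsing before it is written to JSONL.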
6-Phase Workflow
Phase 1: Frontier
Search the latest conference proceedings and preprints to understand current trends.
- Write phase1_frontier/paper_finder_config.yaml targeting latest 1-2 years
- Run paper_finder scrape
- WebSearch for latest accepted paper lists
- Identify trending directions, key breakthroughs
→ Output: phase1_frontier/frontier.md, phase1_frontier/search_results/
Phase 2: Survey
Build a comprehensive landscape with broader time range. Target 35-80 papers after filtering.
- Write phase2_survey/paper_finder_config.yaml covering 2023-2025
- Run paper_finder + Semantic Scholar + arXiv
- Merge all results: python /Users/lingzhi/.claude/skills/deep-research/scripts/paper_db.py merge
- Filter to 35-80 most relevant: python /Users/lingzhi/.claude/skills/deep-research/scripts/paper_db.py filter --min-score 0.80 --max-papers 70
- Cluster by theme, write survey notes
→ Output: phase2_survey/survey.md, phase2_survey/search_results/, paper_db.jsonl
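To illustrate what merging results from three search tools involves, here is a minimal deduplication sketch. It is not the implementation of paper_db.py merge, whose behavior is not shown here. The helper names are hypothetical, and the matching rule (arxiv_id first, then paperId, then a normalized title) is an assumption based on the paper ID convention in this skill.

```python
import json
from pathlib import Path


def paper_key(paper: dict) -> str:
    """Identity key: prefer arxiv_id, then Semantic Scholar paperId, then title."""
    if paper.get("arxiv_id"):
        return f"arxiv:{paper['arxiv_id']}"
    if paper.get("paperId"):
        return f"s2:{paper['paperId']}"
    return "title:" + " ".join(paper.get("title", "").lower().split())


def merge_records(records: list[dict]) -> list[dict]:
    """Collapse duplicates, filling empty fields and keeping the highest citation count."""
    merged: dict[str, dict] = {}
    for rec in records:
        key = paper_key(rec)
        if key not in merged:
            merged[key] = dict(rec)
            continue
        kept = merged[key]
        counts = [
            r["citationCount"]
            for r in (kept, rec)
            if r.get("citationCount") is not None
        ]
        for field, value in rec.items():
            if kept.get(field) in (None, "", []):
                kept[field] = value
        if counts:
            kept["citationCount"] = max(counts)
    return list(merged.values())


def load_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]
```

One limitation of this key: a conference record without an arxiv_id and an arXiv record for the same paper get different keys and are not merged, so a title-based second pass may still be needed.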
Phase 3: Deep Dive ⚠️ DO NOT SKIP
This phase is MANDATORY. You must actually READ 8-15 full papers, not just their abstracts.
- Select 8-15 papers from paper_db.jsonl with rationale → write phase3_deep_dive/selection.md
- Download PDFs: python download_papers.py --jsonl paper_db.jsonl --output-dir phase3_deep_dive/papers/ --sort-by-citations --max-downloads 15
- For EACH selected paper, read the full text (PDF via Read or HTML via WebFetch on ar5iv)
- Write detailed structured notes per paper (see note-format.md template): problem, contributions, methodology, experiments, limitations, connections
- Write ALL notes → phase3_deep_dive/deep_dive.md
Phase 3 Gate: deep_dive.md must contain detailed notes for ≥8 papers, each with methodology and experiment sections filled in. Abstract-only summaries do NOT count.
→ Output: phase3_deep_dive/selection.md, phase3_deep_dive/deep_dive.md, phase3_deep_dive/papers/
Phase 4: Code & Tools ⚠️ DO NOT SKIP
This phase is MANDATORY. You must survey the open-source ecosystem.
- Extract GitHub URLs from papers read in Phase 3
- WebSearch for implementations: “site:github.com {method name}”, “site:paperswithcode.com {topic}”
- For each repo found: record URL, stars, language, last updated, documentation quality
- Search for related benchmarks and datasets
- Write → phase4_code/code_repos.md (must contain ≥3 repositories)
Phase 4 Gate: code_repos.md must exist and contain at least 3 repositories with metadata.
→ Output: phase4_code/code_repos.md
Phase 5: Synthesis (REQUIRES Phase 3 + 4 complete)
Perform cross-paper analysis, weighting peer-reviewed findings higher. This phase MUST build on the detailed notes from Phase 3 and the code landscape from Phase 4. Produce a taxonomy, comparative tables, and a gap analysis.
Before starting: Verify phase3_deep_dive/deep_dive.md and phase4_code/code_repos.md exist. If not, go back and complete those phases first.
→ Output: phase5_synthesis/synthesis.md, phase5_synthesis/gaps.md
Phase 6: Compilation (REQUIRES Phase 1-5 complete)
Assemble final report from ALL prior phase outputs. Mark preprint citations with (preprint) suffix.
Before starting: Verify ALL phase outputs exist:
- phase1_frontier/frontier.md
- phase2_survey/survey.md
- phase3_deep_dive/deep_dive.md
- phase4_code/code_repos.md
- phase5_synthesis/synthesis.md + gaps.md
If ANY are missing, go back and complete the missing phase(s) first.
→ Output: phase6_report/report.md, phase6_report/references.bib
Output Directory
output/{topic-slug}/
├── paper_db.jsonl              # Master database (accumulated)
├── phase1_frontier/
│   ├── paper_finder_config.yaml
│   ├── search_results/
│   └── frontier.md
├── phase2_survey/
│   ├── paper_finder_config.yaml
│   ├── search_results/
│   └── survey.md
├── phase3_deep_dive/
│   ├── papers/
│   ├── selection.md
│   └── deep_dive.md
├── phase4_code/
│   └── code_repos.md
├── phase5_synthesis/
│   ├── synthesis.md
│   └── gaps.md
└── phase6_report/
    ├── report.md
    └── references.bib
Key Conventions
- Paper IDs: Use arxiv_id when available, otherwise Semantic Scholar paperId
- Citations: [@key] format, key = firstAuthorYearWord (e.g., [@vaswani2017attention])
- JSONL schema: title, authors, abstract, year, venue, venue_normalized, peer_reviewed, citationCount, paperId, arxiv_id, pdf_url, tags, source
- Preprint marking: Always note (preprint) when citing non-peer-reviewed work
- Incremental saves: Each phase writes to disk immediately
- Paper count: Target 35-80 papers in final paper_db.jsonl (use paper_db.py filter)
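The firstAuthorYearWord citation key can be derived from a JSONL record along these lines. This is a sketch, not the logic of bibtex_manager.py: it assumes `authors` is a list of "Given Family" strings, takes the last whitespace-separated token as the surname, and picks the first title word that is not in a small, hypothetical stopword list.

```python
import re
import unicodedata

# Hypothetical stopword list; the skill does not specify one.
STOPWORDS = {"a", "an", "the", "on", "of", "for", "in", "to", "and", "with", "is", "are"}


def _ascii_lower(text: str) -> str:
    """Strip accents and lowercase, so 'Schölkopf' becomes 'scholkopf'."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii").lower()


def citation_key(paper: dict) -> str:
    """Build a firstAuthorYearWord key such as 'vaswani2017attention'."""
    first_author = paper["authors"][0]
    surname = re.sub(r"[^a-z]", "", _ascii_lower(first_author.split()[-1]))
    words = re.findall(r"[a-z0-9]+", _ascii_lower(paper["title"]))
    word = next((w for w in words if w not in STOPWORDS), "untitled")
    return f"{surname}{paper['year']}{word}"


def cite(paper: dict) -> str:
    """Format an in-text citation, adding the preprint marker when needed."""
    suffix = "" if paper.get("peer_reviewed") else " (preprint)"
    return f"[@{citation_key(paper)}]{suffix}"
```

Two papers by the same first author in the same year can collide on this key when their titles start with the same word, so a real pipeline would need a disambiguation step.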
References
- /Users/lingzhi/.claude/skills/deep-research/references/workflow-phases.md: Detailed 6-phase methodology
- /Users/lingzhi/.claude/skills/deep-research/references/note-format.md: Note templates, BibTeX format, report structure
- /Users/lingzhi/.claude/skills/deep-research/references/api-reference.md: arXiv, Semantic Scholar, ar5iv API guide
Related Skills
- Downstream: literature-search, literature-review, citation-management
- See also: novelty-assessment, survey-generation