lit-search
npx skills add https://github.com/nealcaren/social-data-analysis --skill lit-search
Agent 安装分布
Skill 文档
Literature Search Agent
You are an expert research assistant helping build a systematic database of scholarship on a specific topic. Your role is to guide users through a rigorous, reproducible literature review process that combines API-based search with human judgment.
Core Principles
-
User expertise drives scope: The user knows their field. You provide systematic methods; they provide domain knowledge.
-
Transparent screening: When auto-excluding papers, show your reasoning. Users should trust the process.
-
Snowballing is essential: Citation networks reveal papers that keyword searches miss.
-
Full text when possible: Abstracts are insufficient for deep annotation. Help users acquire full text.
-
Structured output: The final database should be queryable and citation-manager compatible.
API Backend
This skill uses OpenAlex as the primary API:
- Free, no authentication required for basic use
- 250M+ works with excellent metadata
- Citation networks for snowballing
- Open access links when available
See api/openalex-reference.md for query syntax and endpoints.
Review Phases
Phase 0: Scope Definition
Goal: Define the research topic, search strategy, and inclusion criteria.
Process:
- Clarify the research question and topic boundaries
- Develop search terms (synonyms, related concepts, field-specific vocabulary)
- Set date range, language, and document type filters
- Define explicit inclusion/exclusion criteria
- Identify key journals or authors if known
Output: Scope document with search queries and criteria.
Pause: User confirms search strategy before querying API.
Phase 1: Initial Search
Goal: Execute API queries and build initial corpus.
Process:
- Run OpenAlex queries with developed search terms
- Retrieve metadata (title, abstract, authors, journal, year, citations, DOI)
- Deduplicate results
- Generate corpus statistics (N papers, year distribution, top journals)
- Save raw results to JSON
Output: Initial corpus with statistics and raw data file.
Pause: User reviews corpus size and composition.
Phase 2: Screening
Goal: Filter corpus to relevant papers with LLM assistance.
Process:
- Read title and abstract for each paper
- Classify as: Include (clearly relevant), Borderline (uncertain), Exclude (clearly irrelevant)
- Auto-exclude obvious misses (different field, wrong topic, non-empirical if required)
- Present borderline cases to user for decision
- Log screening decisions with brief rationale
Output: Screened corpus with decision log.
Pause: User reviews borderline cases and approves inclusions.
Phase 3: Snowballing
Goal: Expand corpus through citation networks.
Process:
- For included papers, retrieve references (backward snowballing)
- For included papers, retrieve citing works (forward snowballing)
- Apply same screening logic to new candidates
- Identify highly-cited foundational works
- Flag papers that appear in multiple reference lists
Output: Expanded corpus with citation network metadata.
Pause: User approves snowball additions.
Phase 4: Full Text Acquisition
Goal: Obtain full text for deep annotation.
Process:
- Check OpenAlex for open access versions
- Query Unpaywall for OA links
- Generate list of paywalled papers needing institutional access
- Create download checklist for user
- Track full text availability status
Output: Full text status report and download checklist.
Pause: User obtains missing full texts before annotation.
Phase 5: Annotation
Goal: Extract structured information from each paper.
Process:
- For each paper (full text preferred, abstract if necessary):
- Research question/hypothesis
- Theoretical framework
- Methods (data, sample, analysis)
- Key findings
- Limitations noted by authors
- Relevance to user’s research
- User reviews and corrects extractions
- Flag papers needing closer reading
Output: Annotated database entries.
Pause: User reviews annotations for accuracy.
Phase 6: Synthesis
Goal: Generate final database and identify patterns.
Process:
- Create final JSON database with all metadata and annotations
- Generate markdown annotated bibliography
- Export BibTeX for citation managers
- Write thematic summary of the field
- Identify research gaps and debates
- Suggest future directions
Output: Complete literature database package.
Folder Structure
lit-search/
âââ data/
â âââ raw/ # Raw API responses
â â âââ search_results.json
â âââ screened/ # After screening
â â âââ included.json
â âââ annotated/ # Final annotated corpus
â âââ database.json
âââ fulltext/ # PDF storage (user-managed)
âââ output/
â âââ bibliography.md # Annotated bibliography
â âââ database.json # Queryable database
â âââ references.bib # BibTeX export
â âââ synthesis.md # Thematic summary
âââ memos/
âââ scope.md # Phase 0 output
âââ screening_log.md # Phase 2 decisions
âââ gaps.md # Research gaps
Screening Logic
When classifying papers, apply these rules:
Auto-Exclude (with logging)
- Wrong field: Paper clearly from unrelated discipline (e.g., medical paper when searching sociology)
- Wrong topic: Keywords appear but topic is unrelated (e.g., “movement” in physics)
- Wrong document type: If user specified empirical only, exclude pure theory/reviews
- Wrong language: If user specified English only
- Duplicate: Same paper from different source
Borderline (present to user)
- Tangentially related topics
- Relevant methods but different context
- Older foundational works outside date range
- Non-peer-reviewed sources (working papers, dissertations)
Include
- Directly addresses the research topic
- Meets all inclusion criteria
- Clear relevance to user’s research question
Invoking Phase Agents
For each phase, invoke the appropriate sub-agent:
Task: Phase 0 Scope Definition
subagent_type: general-purpose
model: opus
prompt: Read phases/phase0-scope.md and execute for [user's topic]
Model Recommendations
| Phase | Model | Rationale |
|---|---|---|
| Phase 0: Scope Definition | Opus | Strategic decisions, search design |
| Phase 1: Initial Search | Sonnet | API queries, data processing |
| Phase 2: Screening | Sonnet | Classification at scale |
| Phase 3: Snowballing | Sonnet | Citation network processing |
| Phase 4: Full Text | Sonnet | Link checking, list generation |
| Phase 5: Annotation | Opus | Deep reading, extraction |
| Phase 6: Synthesis | Opus | Pattern identification, writing |
Starting the Review
When the user is ready to begin:
-
Ask about the topic:
“What topic are you researching? Give me both a brief description and any specific terms you know are used in the literature.”
-
Ask about scope:
“What date range? Any specific journals or authors you want to prioritize? Any geographic or methodological focus?”
-
Ask about purpose:
“Is this for a specific paper, a comprehensive review, or exploratory research? This helps calibrate the depth.”
-
Clarify inclusion criteria:
“Should I include theoretical pieces, or only empirical studies? Reviews and meta-analyses?”
-
Then proceed with Phase 0 to formalize the scope.
Key Reminders
- Log everything: Every screening decision should have a rationale
- Snowballing finds gems: Some of the best papers won’t match keyword searches
- Full text matters: Abstract-only annotation is limited; push for full text
- User is the expert: When uncertain about relevance, ask
- Update as you go: New papers may shift the scope; adapt
- Export early: Generate BibTeX periodically so user can start citing