tech-docs-research
npx skills add https://github.com/puwenyin/skills --skill tech-docs-research
Tech Docs Research
Systematic workflow for researching technical documentation using Firecrawl’s mapping and scraping capabilities.
Workflow
Follow this 5-step process for comprehensive documentation research:
Step 1: Create Research Directory
Create a timestamped directory for this research session:
# Generate timestamp and topic-based folder name
TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC="react-hooks" # Replace with sanitized topic name (lowercase, hyphens instead of spaces)
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"
# Create the research directory structure
mkdir -p "$RESEARCH_DIR/pages"
echo "Research directory: $RESEARCH_DIR"
Folder naming rules:
- Format: YYYY_MM_DD_HH_mm_ss_<topic>
- Topic should be lowercase, use hyphens for spaces
- Examples: 2026_02_08_14_30_45_react-hooks, 2026_02_08_15_20_10_nextjs-routing
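The naming rules above can be automated. The following is a minimal sketch, assuming a hypothetical helper named sanitize_topic (it is not part of the Firecrawl CLI or of this skill) that lowercases a free-form topic and collapses every run of characters other than letters and digits into a single hyphen:

```shell
# sanitize_topic is a hypothetical helper, not part of the Firecrawl CLI.
sanitize_topic() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC=$(sanitize_topic "React Hooks")
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"
echo "$RESEARCH_DIR"
```

Deriving TOPIC this way keeps folder names consistent even when the topic is typed with spaces, capitals, or punctuation.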
Step 2: Identify Documentation Base URL
Ask the user for the base documentation URL if not provided, or infer from the technology name:
# Examples:
# React: https://react.dev
# Next.js: https://nextjs.org/docs
# Python: https://docs.python.org
# FastAPI: https://fastapi.tiangolo.com
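Inferring the base URL can be as simple as a lookup table with a fallback to asking the user. The sketch below assumes a hypothetical helper named docs_base_url; its table contains only the four examples listed above and would need to be extended for other technologies:

```shell
# docs_base_url is a hypothetical lookup helper; the table holds only the
# examples listed above and should be extended as needed.
docs_base_url() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    react)          echo "https://react.dev" ;;
    next.js|nextjs) echo "https://nextjs.org/docs" ;;
    python)         echo "https://docs.python.org" ;;
    fastapi)        echo "https://fastapi.tiangolo.com" ;;
    *)              return 1 ;;  # unknown technology: ask the user for the URL
  esac
}

BASE_URL=$(docs_base_url "FastAPI") || echo "Unknown technology: ask the user" >&2
echo "$BASE_URL"
```

The non-zero return for unknown names makes the "ask the user" branch explicit instead of silently guessing a URL.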
Step 3: Map Documentation Structure
Use firecrawl map to discover all documentation URLs, filtering by the user’s topic:
# Basic mapping with search filter
firecrawl map https://docs.example.com --search "authentication" -o "$RESEARCH_DIR/docs-urls.txt"
# For comprehensive research (more URLs)
firecrawl map https://docs.example.com --search "api" --limit 100 -o "$RESEARCH_DIR/docs-urls.json" --json
# Include subdomains if documentation spans multiple domains
firecrawl map https://example.com --include-subdomains --search "guides" -o "$RESEARCH_DIR/all-docs.txt"
Key points:
- Use --search to filter URLs by topic keywords
- Output as JSON (--json) for easier processing
- Adjust --limit based on scope (default: all URLs found)
- Review the mapped URLs before scraping to ensure relevance
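Reviewing the mapped URLs before scraping can be done with standard text tools. This is a minimal sketch, assuming the plain-text map output holds one URL per line (an assumption about the output file, not something confirmed here) and using a hypothetical helper named filter_urls:

```shell
# filter_urls FILE PATTERN
# Hypothetical helper: prints the unique URLs in FILE (assumed to hold one URL
# per line) that match the extended regular expression PATTERN.
filter_urls() {
  sort -u "$1" | grep -E -- "$2"
}

# Example (commented out because it needs a real map output):
# filter_urls "$RESEARCH_DIR/docs-urls.txt" '/docs/(auth|security)' > "$RESEARCH_DIR/selected-urls.txt"
# wc -l < "$RESEARCH_DIR/selected-urls.txt"
```

Writing the reviewed subset to its own file keeps the original map output intact for later passes.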
Step 4: Scrape Documentation in Parallel
Extract URLs from the map output and scrape them in parallel:
# Check concurrency limit first
firecrawl --status
# Extract URLs from JSON and scrape in parallel (example for 5 URLs)
jq -r '.urls[]' "$RESEARCH_DIR/docs-urls.json" | head -5 | {
  # The loop runs in a pipeline subshell, so wait inside it for the background jobs
  while IFS= read -r url; do
    filename=$(echo "$url" | sed 's|https://||' | sed 's|/|_|g')
    firecrawl scrape "$url" --only-main-content -o "$RESEARCH_DIR/pages/${filename}.md" &
  done
  wait
}
# Or use xargs for better parallel control (adjust -P based on concurrency limit)
jq -r '.urls[]' "$RESEARCH_DIR/docs-urls.json" | head -10 | \
  xargs -P 10 -I {} sh -c 'firecrawl scrape "{}" --only-main-content -o "'"$RESEARCH_DIR"'/pages/$(echo {} | md5sum | cut -d" " -f1).md"'
Best practices:
- Always check firecrawl --status for concurrency limits
- Use --only-main-content to remove navigation and boilerplate
- Organize scraped pages in the $RESEARCH_DIR/pages/ subdirectory
- Use meaningful filenames or hash-based names for URLs
- Scrape incrementally for large documentation sets (10-20 pages at a time)
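Two of the practices above, meaningful filenames and incremental batches, can be sketched with small helpers. Both names below (url_to_filename and split_into_batches) are hypothetical and not part of the Firecrawl CLI. The filename helper also handles http:// URLs and query strings, which the simpler sed pipeline in the loop above does not:

```shell
# Both helpers are hypothetical; neither is part of the Firecrawl CLI.

# url_to_filename URL: readable, filesystem-safe name derived from the URL
url_to_filename() {
  printf '%s\n' "$1" | sed -E 's|^https?://||; s|[/?#&=]+|_|g; s|_+$||'
}

# split_into_batches FILE SIZE PREFIX: writes PREFIXaa, PREFIXab, ...
# each holding at most SIZE lines, so batches can be scraped one at a time
split_into_batches() {
  split -l "$2" "$1" "$3"
}

url_to_filename "https://docs.example.com/guides/auth?tab=cli"
# prints docs.example.com_guides_auth_tab_cli
```

Readable names make the pages easier to browse, while hash-based names avoid collisions and over-long paths; which to prefer depends on how the pages will be read afterwards.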
Step 5: Analyze and Summarize
Read the scraped documentation incrementally and generate a structured summary:
# Check what was scraped
ls -lh "$RESEARCH_DIR/pages/"
# Preview first file to understand structure
head -50 "$RESEARCH_DIR/pages/"[first-file].md
# Use grep to find specific information across all files
grep -r "example" "$RESEARCH_DIR/pages/" | head -20
grep -r "configuration" "$RESEARCH_DIR/pages/" -A 5
Summary structure:
Generate a summary following this template:
IMPORTANT:
- Save to both locations:
  - Save the final summary to $RESEARCH_DIR/summary.md (for archival)
  - Also save to the docs/ directory for easy access (e.g., docs/react-hooks-research.md)
- Include source URLs: Every finding, code example, and key point MUST include the source documentation URL for traceability and reference
- Preserve research directory: The $RESEARCH_DIR folder contains all raw scraped pages, URLs, and the summary for future reference
- Use Mermaid diagrams: When documenting processes, workflows, or relationships, use Mermaid diagrams (flowcharts, sequence diagrams, etc.) to make complex concepts visual and easier to understand
# [Technology/Topic] Documentation Research
**Date**: [YYYY-MM-DD]
**Research Scope**: [Brief description of what was researched]
**Pages Analyzed**: [Number of documentation pages]
## Overview
[Brief description of what was researched and total pages analyzed]
## Key Findings
### [Topic Area 1]
- **Main concept**: [explanation]
- **Key points**:
  - [point 1] ([Source URL])
  - [point 2] ([Source URL])
- **Code example**:
```[language]
[code snippet]
```
Source: [URL to documentation page]
[Topic Area 2]
- Summary: [explanation] ([Source URL])
- Details:
- [finding with source link]
- [finding with source link]
Process Flow (Use Mermaid diagrams when applicable)
Authentication Flow Example
sequenceDiagram
participant Client
participant Server
participant Database
Client->>Server: POST /login {username, password}
Server->>Database: Query user credentials
Database-->>Server: User data
Server->>Server: Validate password
Server-->>Client: JWT token
Client->>Server: API request with token
Server->>Server: Verify token
Server-->>Client: Protected resource
Source: [URL to authentication documentation]
Request Lifecycle Example
flowchart TD
A[Incoming Request] --> B{Middleware}
B -->|Authenticated| C[Route Handler]
B -->|Not Authenticated| D[Return 401]
C --> E{Validation}
E -->|Valid| F[Process Request]
E -->|Invalid| G[Return 400]
F --> H[Query Database]
H --> I[Transform Data]
I --> J[Return Response]
Source: [URL to request handling documentation]
State Machine Example
stateDiagram-v2
[*] --> Idle
Idle --> Loading: startFetch()
Loading --> Success: onSuccess()
Loading --> Error: onError()
Success --> Idle: reset()
Error --> Idle: reset()
Error --> Loading: retry()
Source: [URL to state management documentation]
When to use Mermaid diagrams:
- Sequence Diagrams: API calls, authentication flows, multi-step processes, component interactions
- Flowcharts: Decision trees, request lifecycle, data processing pipelines, workflow logic
- State Diagrams: Component states, application lifecycle, form validation states
- Class/ER Diagrams: Data models, database schemas, type relationships
- Gantt Charts: Migration timelines, deprecation schedules
Common Patterns
[Recurring themes, best practices, or conventions found across documentation]
- Pattern 1 – [Description] ([Source URLs])
- Pattern 2 – [Description] ([Source URLs])
Important Notes
[Warnings, deprecations, or critical information highlighted in the docs]
- Warning: [description] – See: [URL]
- Deprecation: [description] – See: [URL]
Documentation Resources
- Page Title 1 – [Brief description]
- Page Title 2 – [Brief description]
- API Reference
- Tutorial/Guide
Next Steps
[Suggested actions based on findings]
Output saved to:
- Research archive: $RESEARCH_DIR/summary.md
- Quick access: docs/[filename].md
Research conducted using: Firecrawl mapping and scraping
**Saving the summary:**
```bash
# Save summary to both locations
cat > "$RESEARCH_DIR/summary.md" << 'EOF'
[Your generated summary content here]
EOF
# Copy to docs/ for easy access (create the directory if it does not exist)
mkdir -p docs
cp "$RESEARCH_DIR/summary.md" "docs/${TOPIC}-research.md"
echo "Research completed!"
echo "Archive: $RESEARCH_DIR"
echo "Summary: docs/${TOPIC}-research.md"
```
Reading strategy:
- Don’t load entire files at once; use head, grep, or incremental reads
- Focus on sections relevant to the user’s question
- Extract code examples, configuration patterns, and API signatures
- Cross-reference information across multiple pages
- Identify process flows and workflows that would benefit from visual diagrams
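Incremental reading can be sketched with sed line ranges. The helper name read_chunk below is hypothetical; it prints a fixed-size window of a file so that a long page can be read in pieces rather than all at once:

```shell
# read_chunk FILE START COUNT
# Hypothetical helper: prints COUNT lines of FILE starting at line START,
# then stops reading so large files are not scanned to the end.
read_chunk() {
  end=$(( $2 + $3 - 1 ))
  sed -n "$2,${end}p;${end}q" "$1"
}

# Example (commented out because it needs scraped pages):
# read_chunk "$RESEARCH_DIR/pages/page1.md" 1 50    # first 50 lines
# read_chunk "$RESEARCH_DIR/pages/page1.md" 51 50   # next 50 lines
```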
Visualization best practices:
- Use Mermaid diagrams to illustrate complex processes, flows, and relationships
- Always include source URLs for each diagram to trace back to the documentation
- Common diagram types:
  - sequenceDiagram: API interactions, authentication flows, multi-service communication
  - flowchart: Decision logic, request handling, data pipelines
  - stateDiagram-v2: Component lifecycle, application states, form flows
  - classDiagram: Type hierarchies, data models, interface relationships
  - erDiagram: Database schemas, entity relationships
  - gantt: Project timelines, migration schedules, deprecation roadmaps
Directory structure after completion:
.firecrawl/
└── 2026_02_08_14_30_45_react-hooks/
    ├── docs-urls.json          # Mapped URLs
    ├── pages/                  # Scraped documentation
    │   ├── page1.md
    │   ├── page2.md
    │   └── ...
    └── summary.md              # Research summary
docs/
└── react-hooks-research.md     # Copy for quick access
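A quick check that a finished session matches this layout can catch a missing summary or an empty pages folder. The sketch below uses a hypothetical helper named check_research_dir and assumes pages are stored directly under pages/ (the version-specific pattern with pages/v4 and pages/v5 would need an adjusted glob):

```shell
# check_research_dir DIR
# Hypothetical helper: succeeds only if DIR contains a pages/ folder with at
# least one .md file and a non-empty summary.md.
check_research_dir() {
  [ -d "$1/pages" ] || { echo "missing: $1/pages" >&2; return 1; }
  [ -s "$1/summary.md" ] || { echo "missing or empty: $1/summary.md" >&2; return 1; }
  ls "$1"/pages/*.md >/dev/null 2>&1 || { echo "no pages in: $1/pages" >&2; return 1; }
}

# Example (commented out because it needs a completed session):
# check_research_dir "$RESEARCH_DIR" && echo "Research directory looks complete"
```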
Advanced Patterns
Multi-Site Research
Research across multiple documentation sources:
# Create research directory for multi-site research
TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC="react-nextjs-comparison"
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"
mkdir -p "$RESEARCH_DIR/pages"
# Map multiple sites
firecrawl map https://react.dev --search "hooks" -o "$RESEARCH_DIR/react-urls.json" --json &
firecrawl map https://nextjs.org/docs --search "routing" -o "$RESEARCH_DIR/nextjs-urls.json" --json &
wait
# Combine and scrape
cat "$RESEARCH_DIR/react-urls.json" "$RESEARCH_DIR/nextjs-urls.json" | \
jq -r '.urls[]' | \
  xargs -P 10 -I {} sh -c 'firecrawl scrape "{}" --only-main-content -o "'"$RESEARCH_DIR"'/pages/$(echo {} | md5sum | cut -d" " -f1).md"'
Topic-Focused Deep Dive
When researching a specific API or feature:
# 1. Create research directory
TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC="useeffect-deep-dive"
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"
mkdir -p "$RESEARCH_DIR/pages"
# 2. Search for the exact topic
firecrawl map https://docs.example.com --search "useEffect hook" -o "$RESEARCH_DIR/topic-urls.json" --json
# 3. Scrape with additional context (include related sections)
jq -r '.urls[]' "$RESEARCH_DIR/topic-urls.json" | \
  xargs -P 5 -I {} sh -c 'firecrawl scrape "$1" -o "$2/pages/$(basename "$1").md"' _ {} "$RESEARCH_DIR"
# 4. Extract all code examples
grep -r '```' "$RESEARCH_DIR/pages/" -A 10 > "$RESEARCH_DIR/code-examples.txt"
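The grep command above captures a fixed 10 lines after every fence line, so it can cut long examples short and include prose after short ones. As an alternative sketch, a hypothetical helper named extract_code_blocks can use awk to print only the lines between fence pairs. It assumes the scraped Markdown uses triple-backtick fences:

```shell
# extract_code_blocks FILE...
# Hypothetical helper: prints only the lines inside fenced code blocks,
# followed by a "----" separator after each block.
extract_code_blocks() {
  awk '
    /^[[:space:]]*```/ {            # a fence line toggles the state
      in_block = !in_block
      if (!in_block) print "----"   # closing fence: end of one block
      next
    }
    in_block { print }
  ' "$@"
}

# Example (commented out because it needs scraped pages):
# extract_code_blocks "$RESEARCH_DIR"/pages/*.md > "$RESEARCH_DIR/code-examples.txt"
```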
Version-Specific Research
Compare documentation across versions:
# Create research directory
TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC="framework-v4-to-v5-migration"
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"
mkdir -p "$RESEARCH_DIR/pages/v4" "$RESEARCH_DIR/pages/v5"
# Map different versions
firecrawl map https://v4.docs.example.com --search "migration" -o "$RESEARCH_DIR/v4-urls.json" --json
firecrawl map https://v5.docs.example.com --search "migration" -o "$RESEARCH_DIR/v5-urls.json" --json
# Scrape into version-specific directories
jq -r '.urls[]' "$RESEARCH_DIR/v4-urls.json" | \
  xargs -P 5 -I {} sh -c 'firecrawl scrape "{}" -o "'"$RESEARCH_DIR"'/pages/v4/$(echo {} | md5sum | cut -d" " -f1).md"'
jq -r '.urls[]' "$RESEARCH_DIR/v5-urls.json" | \
  xargs -P 5 -I {} sh -c 'firecrawl scrape "{}" -o "'"$RESEARCH_DIR"'/pages/v5/$(echo {} | md5sum | cut -d" " -f1).md"'
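Once both versions are scraped, a rough first comparison is to count how often a term appears in each version's pages. The sketch below uses a hypothetical helper named count_mentions; the search term is only an illustration:

```shell
# count_mentions TERM DIR
# Hypothetical helper: number of lines under DIR that mention TERM
# (case-insensitive, matched as a fixed string rather than a regex).
count_mentions() {
  grep -riF -- "$1" "$2" | wc -l | tr -d ' '
}

# Example (commented out because it needs scraped pages):
# for v in v4 v5; do
#   echo "$v: $(count_mentions "deprecated" "$RESEARCH_DIR/pages/$v")"
# done
```

A large difference in counts is only a hint about where to read more closely, not a finding in itself.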
Tips
- Start narrow, expand if needed: Begin with specific search terms, then broaden if results are insufficient
- Check file sizes: Use wc -l and ls -lh to gauge content volume before reading
- Use grep effectively: Search for specific terms, function names, or error codes across all scraped files
- Respect rate limits: Monitor concurrency with firecrawl --status and adjust parallel operations
- Organized archives: Each research session creates a timestamped directory in .firecrawl/ for complete traceability
- Dual saving: Save summaries to both $RESEARCH_DIR/summary.md (archive) and docs/ (quick access)
- Review past research: Browse .firecrawl/ to find previous research sessions by timestamp and topic name
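Because the folder prefix is a zero-padded timestamp, sorting folder names as text also sorts them chronologically. The sketch below relies on that property and uses a hypothetical helper named latest_session to find the most recent session for a topic:

```shell
# latest_session TOPIC [ROOT]
# Hypothetical helper: prints the newest session directory for TOPIC under
# ROOT (default: .firecrawl), or nothing if no session exists.
latest_session() {
  root=${2:-.firecrawl}
  ls -1d "$root"/*_"$1" 2>/dev/null | sort | tail -n 1
}

# Example (commented out because it needs earlier sessions):
# less "$(latest_session react-hooks)/summary.md"
```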
Common Use Cases
- API Integration: Research authentication, endpoints, rate limits, and SDKs
- Migration Planning: Gather breaking changes, deprecations, and migration guides
- Feature Implementation: Find usage patterns, configuration options, and examples
- Troubleshooting: Search error codes, known issues, and solutions in official docs
- Best Practices: Extract recommended patterns, performance tips, and security guidelines