tech-docs-research

Install command

npx skills add https://github.com/puwenyin/skills --skill tech-docs-research


Tech Docs Research

Systematic workflow for researching technical documentation using Firecrawl’s mapping and scraping capabilities.

Workflow

Follow this 5-step process for comprehensive documentation research:

Step 1: Create Research Directory

Create a timestamped directory for this research session:

# Generate timestamp and topic-based folder name
TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC="react-hooks"  # Replace with sanitized topic name (lowercase, hyphens instead of spaces)
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"

# Create the research directory structure
mkdir -p "$RESEARCH_DIR/pages"

echo "Research directory: $RESEARCH_DIR"

Folder naming rules:

Format: YYYY_MM_DD_HH_mm_ss_<topic>
Topic should be lowercase, use hyphens for spaces
Examples: 2026_02_08_14_30_45_react-hooks, 2026_02_08_15_20_10_nextjs-routing
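The naming rules above could be applied automatically rather than by hand. The sketch below assumes a hypothetical helper, `slugify_topic`, which is not part of Firecrawl or of this skill. It lowercases the topic and collapses every run of characters other than letters and digits into a single hyphen.

```shell
# Hypothetical helper (not part of Firecrawl): make a topic name folder-safe.
slugify_topic() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

TOPIC=$(slugify_topic "React Hooks")
echo "$TOPIC"   # prints: react-hooks
```

Note that this sketch also turns dots into hyphens, so "Next.js Routing" becomes `next-js-routing` rather than `nextjs-routing`; adjust the pattern if a different convention is preferred.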

Step 2: Identify Documentation Base URL

Ask the user for the base documentation URL if not provided, or infer from the technology name:

# Examples:
# React: https://react.dev
# Next.js: https://nextjs.org/docs
# Python: https://docs.python.org
# FastAPI: https://fastapi.tiangolo.com
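One way to make the "infer from the technology name" step repeatable is a small lookup table. The sketch below uses a hypothetical function, `docs_base_url`, seeded only with the four examples listed above. It returns a non-zero status for anything else, which is the cue to ask the user.

```shell
# Hypothetical lookup (illustration only): map a technology name to its docs URL.
docs_base_url() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    react)           echo "https://react.dev" ;;
    next.js|nextjs)  echo "https://nextjs.org/docs" ;;
    python)          echo "https://docs.python.org" ;;
    fastapi)         echo "https://fastapi.tiangolo.com" ;;
    *)               return 1 ;;   # unknown: ask the user for the base URL
  esac
}

docs_base_url "FastAPI"   # prints: https://fastapi.tiangolo.com
```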

Step 3: Map Documentation Structure

Use firecrawl map to discover all documentation URLs, filtering by the user’s topic:

# Basic mapping with search filter
firecrawl map https://docs.example.com --search "authentication" -o "$RESEARCH_DIR/docs-urls.txt"

# For comprehensive research (more URLs)
firecrawl map https://docs.example.com --search "api" --limit 100 -o "$RESEARCH_DIR/docs-urls.json" --json

# Include subdomains if documentation spans multiple domains
firecrawl map https://example.com --include-subdomains --search "guides" -o "$RESEARCH_DIR/all-docs.txt"

Key points:

Use --search to filter URLs by topic keywords
Output as JSON (--json) for easier processing
Adjust --limit based on scope (default: all URLs found)
Review the mapped URLs before scraping to ensure relevance
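For the review step, a quick count of URLs per top-level path section can show whether the map result is on topic before anything is scraped. This is a minimal sketch: `summarize_url_sections` is a hypothetical helper, and it assumes a plain-text list with one absolute URL per line on standard input.

```shell
# Hypothetical helper: count URLs per first path segment, most frequent first.
# Reads one absolute URL per line from stdin (an assumption about the input).
summarize_url_sections() {
  awk -F/ 'NF >= 3 { section = ($4 == "" ? "(root)" : $4); count[section]++ }
           END { for (s in count) printf "%d\t%s\n", count[s], s }' \
    | sort -k1,1nr -k2,2
}

# Example (commented out because it needs a real map output):
# summarize_url_sections < "$RESEARCH_DIR/docs-urls.txt"
```

A section with an unexpectedly large count is usually a sign that the `--search` term was too broad.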

Step 4: Scrape Documentation in Parallel

Extract URLs from the map output and scrape them in parallel:

# Check concurrency limit first
firecrawl --status

# Extract URLs from JSON and scrape in parallel (example for 5 URLs)
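jq -r '.urls[]' "$RESEARCH_DIR/docs-urls.json" | head -5 | {
  while IFS= read -r url; do
    # Strip the scheme and turn slashes into underscores to get a flat filename
    filename=$(printf '%s' "$url" | sed -e 's|^https://||' -e 's|/|_|g')
    firecrawl scrape "$url" --only-main-content -o "$RESEARCH_DIR/pages/${filename}.md" &
  done
  # The loop runs in a pipeline subshell, so wait there for the background scrapes
  wait
}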

# Or use xargs for better parallel control (adjust -P based on concurrency limit)
jq -r '.urls[]' "$RESEARCH_DIR/docs-urls.json" | head -10 | \
  xargs -P 10 -I {} sh -c 'firecrawl scrape "$1" --only-main-content -o "'"$RESEARCH_DIR"'/pages/$(echo "$1" | md5sum | cut -d" " -f1).md"' _ {}

Best practices:

Always check firecrawl --status for concurrency limits
Use --only-main-content to remove navigation and boilerplate
Organize scraped pages in $RESEARCH_DIR/pages/ subdirectory
Use meaningful filenames or hash-based names for URLs
Scrape incrementally for large documentation sets (10-20 pages at a time)
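As an illustration of the "meaningful filenames" option, the sketch below derives a flat, readable filename from a URL. `url_to_filename` is a hypothetical helper; it drops the scheme, any query string or fragment, and trailing slashes, then replaces every remaining unsafe character with an underscore.

```shell
# Hypothetical helper: turn a URL into a flat, filesystem-safe name (no extension).
url_to_filename() {
  printf '%s' "$1" | sed \
    -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
    -e 's|[?#].*$||' \
    -e 's|/*$||' \
    -e 's|[^A-Za-z0-9._-]|_|g'
}

url_to_filename "https://react.dev/reference/react/useEffect"
# prints: react.dev_reference_react_useEffect
```

Readable names make `ls` output self-explanatory, while hash-based names are safer for very long URLs; two URLs that differ only in their query string would collide under this scheme.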

Step 5: Analyze and Summarize

Read the scraped documentation incrementally and generate a structured summary:

# Check what was scraped
ls -lh "$RESEARCH_DIR/pages/"

# Preview the first scraped file to understand its structure
head -50 "$(ls "$RESEARCH_DIR/pages/"*.md | head -1)"

# Use grep to find specific information across all files
grep -r "example" "$RESEARCH_DIR/pages/" | head -20
grep -r "configuration" "$RESEARCH_DIR/pages/" -A 5
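The grep calls above print raw matches. To decide which pages to read first, it can help to rank files by how many lines mention a term. The following is a sketch with a hypothetical helper, `rank_pages_by_term`, built only on standard `grep`, `awk`, and `sort`.

```shell
# Hypothetical helper: list files under a directory by number of matching lines.
# Usage: rank_pages_by_term <term> <dir>
rank_pages_by_term() {
  grep -r -i -c -F -- "$1" "$2" \
    | awk -F: '$NF > 0 { print $NF "\t" substr($0, 1, length($0) - length($NF) - 1) }' \
    | sort -rn
}

# Example (commented out because it needs scraped pages):
# rank_pages_by_term "configuration" "$RESEARCH_DIR/pages" | head -5
```

The match is case-insensitive and literal (`-F`), so terms containing dots or brackets are not treated as regular expressions.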

Summary structure:

Generate a summary following this template:

IMPORTANT:

Save to both locations:
- Save the final summary to $RESEARCH_DIR/summary.md (for archival)
- Also save to docs/ directory for easy access (e.g., docs/react-hooks-research.md)
Include source URLs: Every finding, code example, and key point MUST include the source documentation URL for traceability and reference
Preserve research directory: The $RESEARCH_DIR folder contains all raw scraped pages, URLs, and the summary for future reference
Use Mermaid diagrams: When documenting processes, workflows, or relationships, use Mermaid diagrams (flowcharts, sequence diagrams, etc.) to make complex concepts visual and easier to understand

# [Technology/Topic] Documentation Research

**Date**: [YYYY-MM-DD]
**Research Scope**: [Brief description of what was researched]
**Pages Analyzed**: [Number of documentation pages]

## Overview
[Brief description of what was researched and total pages analyzed]

## Key Findings

### [Topic Area 1]
- **Main concept**: [explanation]
- **Key points**:
  - [point 1] ([Source URL])
  - [point 2] ([Source URL])
- **Code example**:
  ```[language]
  [code snippet]
  ```
  Source: [URL to documentation page]

### [Topic Area 2]
- **Summary**: [explanation] ([Source URL])
- **Details**:
  - [finding with source link]
  - [finding with source link]

## Process Flow (Use Mermaid diagrams when applicable)

### Authentication Flow Example

sequenceDiagram
    participant Client
    participant Server
    participant Database

    Client->>Server: POST /login {username, password}
    Server->>Database: Query user credentials
    Database-->>Server: User data
    Server->>Server: Validate password
    Server-->>Client: JWT token
    Client->>Server: API request with token
    Server->>Server: Verify token
    Server-->>Client: Protected resource

Source: [URL to authentication documentation]

### Request Lifecycle Example

flowchart TD
    A[Incoming Request] --> B{Middleware}
    B -->|Authenticated| C[Route Handler]
    B -->|Not Authenticated| D[Return 401]
    C --> E{Validation}
    E -->|Valid| F[Process Request]
    E -->|Invalid| G[Return 400]
    F --> H[Query Database]
    H --> I[Transform Data]
    I --> J[Return Response]

Source: [URL to request handling documentation]

### State Machine Example

stateDiagram-v2
    [*] --> Idle
    Idle --> Loading: startFetch()
    Loading --> Success: onSuccess()
    Loading --> Error: onError()
    Success --> Idle: reset()
    Error --> Idle: reset()
    Error --> Loading: retry()

Source: [URL to state management documentation]

**When to use Mermaid diagrams:**
- **Sequence Diagrams**: API calls, authentication flows, multi-step processes, component interactions
- **Flowcharts**: Decision trees, request lifecycle, data processing pipelines, workflow logic
- **State Diagrams**: Component states, application lifecycle, form validation states
- **Class/ER Diagrams**: Data models, database schemas, type relationships
- **Gantt Charts**: Migration timelines, deprecation schedules

## Common Patterns
[Recurring themes, best practices, or conventions found across documentation]

1. **Pattern 1** – [Description] ([Source URLs])
2. **Pattern 2** – [Description] ([Source URLs])

## Important Notes
[Warnings, deprecations, or critical information highlighted in the docs]

- **Warning**: [description] – See: [URL]
- **Deprecation**: [description] – See: [URL]

## Documentation Resources
- Page Title 1 – [Brief description]
- Page Title 2 – [Brief description]
- API Reference
- Tutorial/Guide

## Next Steps
[Suggested actions based on findings]

**Output saved to**:
- Research archive: $RESEARCH_DIR/summary.md
- Quick access: docs/[filename].md

**Research conducted using**: Firecrawl mapping and scraping


**Saving the summary:**

```bash
# Save summary to both locations
cat > "$RESEARCH_DIR/summary.md" << 'EOF'
[Your generated summary content here]
EOF

# Copy to docs/ for easy access (create the directory if it does not exist yet)
mkdir -p docs
cp "$RESEARCH_DIR/summary.md" "docs/${TOPIC}-research.md"

echo "Research completed!"
echo "Archive: $RESEARCH_DIR"
echo "Summary: docs/${TOPIC}-research.md"
```

Reading strategy:

Don’t load entire files at once; use head, grep, or incremental reads
Focus on sections relevant to the user’s question
Extract code examples, configuration patterns, and API signatures
Cross-reference information across multiple pages
Identify process flows and workflows that would benefit from visual diagrams
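The "incremental reads" idea can be sketched as reading a fixed window of lines at a time instead of the whole file. `read_chunk` below is a hypothetical helper that wraps `sed`; it stops reading at the end of the window, so large pages are not scanned in full.

```shell
# Hypothetical helper: print <count> lines of <file> starting at line <start>.
# Usage: read_chunk <file> <start> <count>
read_chunk() {
  chunk_start=$2
  chunk_end=$((chunk_start + $3 - 1))
  # Print the window, then quit so the rest of the file is never read
  sed -n -e "${chunk_start},${chunk_end}p" -e "${chunk_end}q" "$1"
}

# Example (commented out because it needs a scraped page):
# read_chunk "$RESEARCH_DIR/pages/page1.md" 1 50     # first 50 lines
# read_chunk "$RESEARCH_DIR/pages/page1.md" 51 50    # next 50 lines
```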

Visualization best practices:

Use Mermaid diagrams to illustrate complex processes, flows, and relationships
Always include source URLs for each diagram to trace back to the documentation
Common diagram types:
- sequenceDiagram: API interactions, authentication flows, multi-service communication
- flowchart: Decision logic, request handling, data pipelines
- stateDiagram-v2: Component lifecycle, application states, form flows
- classDiagram: Type hierarchies, data models, interface relationships
- erDiagram: Database schemas, entity relationships
- gantt: Project timelines, migration schedules, deprecation roadmaps

Directory structure after completion:

.firecrawl/
└── 2026_02_08_14_30_45_react-hooks/
    ├── docs-urls.json          # Mapped URLs
    ├── pages/                  # Scraped documentation
    │   ├── page1.md
    │   ├── page2.md
    │   └── ...
    └── summary.md              # Research summary

docs/
└── react-hooks-research.md     # Copy for quick access
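When pages are stored under hash-based names, as in the xargs examples from Step 4, the filename no longer reveals which URL it came from, which makes the required source citations harder to write. One possible remedy, sketched here as an assumption rather than as part of the skill, is a small manifest that maps each hash to its URL. `manifest_line` and the `manifest.tsv` filename are hypothetical.

```shell
# Hypothetical helper: print "<md5-of-url><TAB><url>".
# The URL is hashed the same way as `echo "$url" | md5sum` (trailing newline included).
manifest_line() {
  url_hash=$(echo "$1" | md5sum | cut -d' ' -f1)
  printf '%s\t%s\n' "$url_hash" "$1"
}

# Example (commented out; urls.txt is an assumed one-URL-per-line list):
# while IFS= read -r url; do manifest_line "$url"; done < urls.txt \
#   > "$RESEARCH_DIR/manifest.tsv"
```

Looking up the source of a page is then a matter of grepping the manifest for the filename without its `.md` extension. On systems without `md5sum` (for example stock macOS), a different hashing tool would be needed.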

Advanced Patterns

Multi-Site Research

Research across multiple documentation sources:

# Create research directory for multi-site research
TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC="react-nextjs-comparison"
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"
mkdir -p "$RESEARCH_DIR/pages"

# Map multiple sites
firecrawl map https://react.dev --search "hooks" -o "$RESEARCH_DIR/react-urls.json" --json &
firecrawl map https://nextjs.org/docs --search "routing" -o "$RESEARCH_DIR/nextjs-urls.json" --json &
wait

# Combine and scrape
cat "$RESEARCH_DIR/react-urls.json" "$RESEARCH_DIR/nextjs-urls.json" | \
  jq -r '.urls[]' | \
  xargs -P 10 -I {} sh -c 'firecrawl scrape "$1" --only-main-content -o "'"$RESEARCH_DIR"'/pages/$(echo "$1" | md5sum | cut -d" " -f1).md"' _ {}
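The `&` / `wait` pattern used for the two map calls runs them concurrently, but a bare `wait` reports success even if one of the background jobs failed. A minimal sketch of a stricter variant follows; `run_all` is a hypothetical helper that takes each command as a single string, waits on every process ID individually, and returns the number of failures.

```shell
# Hypothetical helper: run each argument as a shell command in the background,
# then wait for every job and return how many of them failed.
run_all() {
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  return "$failed"
}

run_all "true" "true" && echo "all jobs succeeded"
```

With this in place, the two `firecrawl map` commands could be passed as quoted strings, and the scrape step would only run if `run_all` returned zero.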

Topic-Focused Deep Dive

When researching a specific API or feature:

# 1. Create research directory
TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC="useeffect-deep-dive"
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"
mkdir -p "$RESEARCH_DIR/pages"

# 2. Search for the exact topic
firecrawl map https://docs.example.com --search "useEffect hook" -o "$RESEARCH_DIR/topic-urls.json" --json

# 3. Scrape with additional context (include related sections)
jq -r '.urls[]' "$RESEARCH_DIR/topic-urls.json" | \
  xargs -P 5 -I {} sh -c 'firecrawl scrape "$1" -o "'"$RESEARCH_DIR"'/pages/$(basename "$1").md"' _ {}

# 4. Extract all code examples (single quotes keep the shell from treating backticks as command substitution)
grep -r -A 10 '```' "$RESEARCH_DIR/pages/" > "$RESEARCH_DIR/code-examples.txt"
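Step 4 above captures a fixed ten lines after every fence marker, which truncates long examples and also picks up prose that follows closing fences. As an alternative sketch, the hypothetical helper `extract_code_blocks` below prints only the lines between an opening and a closing fence, separating blocks with `----`. It assumes fences are written with three backticks and builds that marker with `printf` to avoid quoting issues.

```shell
# Hypothetical helper: print the contents of fenced code blocks in a markdown file.
# Assumes three-backtick fences; leading indentation before a fence is ignored.
extract_code_blocks() {
  fence=$(printf '\140\140\140')   # three backticks (octal 140)
  awk -v fence="$fence" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
    }
    substr(line, 1, 3) == fence {
      inside = !inside              # toggle on every fence line
      if (!inside) print "----"     # separator after each closed block
      next
    }
    inside { print }
  ' "$1"
}

# Example (commented out because it needs scraped pages):
# for f in "$RESEARCH_DIR/pages/"*.md; do extract_code_blocks "$f"; done \
#   > "$RESEARCH_DIR/code-examples.txt"
```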

Version-Specific Research

Compare documentation across versions:

# Create research directory
TIMESTAMP=$(date +"%Y_%m_%d_%H_%M_%S")
TOPIC="framework-v4-to-v5-migration"
RESEARCH_DIR=".firecrawl/${TIMESTAMP}_${TOPIC}"
mkdir -p "$RESEARCH_DIR/pages/v4" "$RESEARCH_DIR/pages/v5"

# Map different versions
firecrawl map https://v4.docs.example.com --search "migration" -o "$RESEARCH_DIR/v4-urls.json" --json
firecrawl map https://v5.docs.example.com --search "migration" -o "$RESEARCH_DIR/v5-urls.json" --json

# Scrape into version-specific directories
jq -r '.urls[]' "$RESEARCH_DIR/v4-urls.json" | \
  xargs -P 5 -I {} sh -c 'firecrawl scrape "$1" -o "'"$RESEARCH_DIR"'/pages/v4/$(echo "$1" | md5sum | cut -d" " -f1).md"' _ {}
jq -r '.urls[]' "$RESEARCH_DIR/v5-urls.json" | \
  xargs -P 5 -I {} sh -c 'firecrawl scrape "$1" -o "'"$RESEARCH_DIR"'/pages/v5/$(echo "$1" | md5sum | cut -d" " -f1).md"' _ {}
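Before reading the scraped pages, comparing the two URL lists can already hint at pages that were removed or added between versions. The sketch below assumes plain-text lists with one URL per line (for example produced with `jq -r '.urls[]'`) and uses two hypothetical helpers, `url_paths` and `paths_only_in_first`, to compare paths while ignoring the differing hostnames.

```shell
# Hypothetical helper: strip scheme and host, leaving sorted unique paths.
url_paths() {
  sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*||' "$1" | sort -u
}

# Hypothetical helper: print paths that appear in the first list but not the second.
paths_only_in_first() {
  tmp_first=$(mktemp)
  tmp_second=$(mktemp)
  url_paths "$1" > "$tmp_first"
  url_paths "$2" > "$tmp_second"
  comm -23 "$tmp_first" "$tmp_second"
  rm -f "$tmp_first" "$tmp_second"
}

# Example (commented out; v4.txt and v5.txt are assumed URL lists):
# paths_only_in_first v4.txt v5.txt   # pages that exist only in v4
# paths_only_in_first v5.txt v4.txt   # pages that are new in v5
```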

Tips

Start narrow, expand if needed: Begin with specific search terms, then broaden if results are insufficient
Check file sizes: Use wc -l and ls -lh to gauge content volume before reading
Use grep effectively: Search for specific terms, function names, or error codes across all scraped files
Respect rate limits: Monitor concurrency with firecrawl --status and adjust parallel operations
Organized archives: Each research session creates a timestamped directory in .firecrawl/ for complete traceability
Dual saving: Save summaries to both $RESEARCH_DIR/summary.md (archive) and docs/ (quick access)
Review past research: Browse .firecrawl/ to find previous research sessions by timestamp and topic name
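The "check file sizes" tip can be sketched as a single listing of line counts per page, smallest first, so that short pages can be read whole and long ones approached with grep. `page_line_counts` is a hypothetical helper built on `wc -l`.

```shell
# Hypothetical helper: print "<line-count><TAB><file>" for each .md file in a
# directory, sorted from shortest to longest.
page_line_counts() {
  for f in "$1"/*.md; do
    [ -f "$f" ] || continue   # skip the unexpanded pattern when no files match
    printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
  done | sort -n
}

# Example (commented out because it needs scraped pages):
# page_line_counts "$RESEARCH_DIR/pages"
```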

Common Use Cases

API Integration: Research authentication, endpoints, rate limits, and SDKs
Migration Planning: Gather breaking changes, deprecations, and migration guides
Feature Implementation: Find usage patterns, configuration options, and examples
Troubleshooting: Search error codes, known issues, and solutions in official docs
Best Practices: Extract recommended patterns, performance tips, and security guidelines
