crawl4ai
Total installs: 25
Weekly installs: 25
Site-wide rank: #7783
Install command:
npx skills add https://github.com/tao3k/omni-dev-fusion --skill crawl4ai
Install distribution by agent:
gemini-cli: 20
opencode: 19
codex: 17
github-copilot: 16
claude-code: 14
kimi-cli: 10
Skill Documentation
crawl4ai
High-performance web crawler with intelligent chunking. Crawls web pages and extracts content as markdown using LLM-based skeleton planning.
Commands
crawl_url (alias: webCrawl)
Crawl a web page with LangGraph workflow and LLM-based intelligent chunking.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | str | – | Target URL to crawl (required) |
| `action` | str | "smart" | Action mode: "smart", "skeleton", "crawl" |
| `fit_markdown` | bool | true | Clean and simplify markdown output |
| `max_depth` | int | 0 | Maximum crawling depth (0 = single page) |
| `return_skeleton` | bool | false | Also return the document skeleton (TOC) |
| `chunk_indices` | list[int] | – | List of section indices to extract |
Action Modes:
| Mode | Description | Use Case |
|---|---|---|
| `smart` (default) | LLM generates a chunk plan, then extracts the relevant sections | Large docs where you need specific information |
| `skeleton` | Extract a lightweight TOC without full content | Quick overview; decide what to read |
| `crawl` | Return full markdown content | Small pages where the complete content is needed |
Examples:
# Smart crawl with LLM chunking (default)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com"})
# Skeleton only - get TOC quickly
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "skeleton"})
# Full content crawl
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "crawl"})
# Extract specific sections
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "chunk_indices": [0, 1, 2]})
# Deep crawl (follow links up to depth N)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "max_depth": 2})
# Get skeleton with full content
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "return_skeleton": true})
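The skeleton-first pattern implied by these examples (fetch the TOC first, then request only the sections you need via `chunk_indices`) can be sketched in plain Python. The `omni` function below is a hypothetical stub standing in for the real `@omni` dispatcher, and the skeleton's shape (a list of section titles) is an assumption, not the skill's documented return format:

```python
def omni(command, params):
    """Stub standing in for the real @omni call (assumed return shapes)."""
    skeleton = ["Intro", "Installation", "API Reference", "FAQ"]
    if params.get("action") == "skeleton":
        # Lightweight TOC: one entry per top-level section.
        return {"skeleton": skeleton}
    if "chunk_indices" in params:
        # Only the requested sections are returned.
        return {"sections": [skeleton[i] for i in params["chunk_indices"]]}
    return {"markdown": "...full page..."}

def fetch_relevant(url, keyword):
    """Step 1: fetch the TOC; step 2: fetch only the matching sections."""
    toc = omni("crawl4ai.CrawlUrl", {"url": url, "action": "skeleton"})["skeleton"]
    indices = [i for i, title in enumerate(toc) if keyword.lower() in title.lower()]
    return omni("crawl4ai.CrawlUrl", {"url": url, "chunk_indices": indices})["sections"]

print(fetch_relevant("https://example.com", "api"))  # → ['API Reference']
```

The point of the two-step shape is token economy: the model reads a few hundred tokens of TOC instead of the full page before deciding what to extract.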
Core Concepts
| Topic | Description | Reference |
|---|---|---|
| Skeleton Planning | LLM sees the TOC (~500 tokens) instead of the full content (~10k+ tokens) | smart-chunking.md |
| Chunk Extraction | Token-aware section extraction | chunking.md |
| Deep Crawling | Multi-page crawling with BFS strategy | deep-crawl.md |
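The "token-aware section extraction" idea can be illustrated with a greedy packer that groups sections into chunks under a token budget. The tokenizer here (whitespace word count) is a deliberate simplification; the skill's actual token counting is not specified in this document:

```python
def pack_sections(sections, max_tokens):
    """Greedily group sections into chunks whose rough token count
    (approximated as whitespace-separated words) stays within budget."""
    chunks, current, used = [], [], 0
    for text in sections:
        n = len(text.split())
        # Start a new chunk when adding this section would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        chunks.append(current)
    return chunks

sections = ["a b c", "d e", "f g h i", "j"]
print(pack_sections(sections, max_tokens=5))
# → [['a b c', 'd e'], ['f g h i', 'j']]
```

A greedy packer like this never splits a section, so any single section larger than the budget becomes its own chunk.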
Best Practices
- Use `skeleton` mode first for large documents to understand their structure
- Use `chunk_indices` to extract specific sections instead of the full content
- Set `max_depth` > 0 with care, and keep it low to prevent runaway crawling
- Keep `fit_markdown=true` for cleaner output; set it to false for raw content
Advanced
- Batch multiple URLs with separate calls
- Combine with knowledge tools for RAG pipelines
- Use skeleton + LLM to auto-generate chunk plans for custom extraction
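Batching several URLs as separate calls can be sketched as a simple loop that collects per-URL results. As above, `omni` is a hypothetical stub for the real `@omni` dispatcher, and its return shape is an assumption:

```python
def omni(command, params):
    """Stub for the real @omni call (assumed return shape)."""
    return {"url": params["url"], "markdown": f"# Content of {params['url']}"}

def crawl_batch(urls):
    """One crawl4ai.CrawlUrl call per URL; results keyed by URL."""
    results = {}
    for url in urls:
        results[url] = omni("crawl4ai.CrawlUrl", {"url": url, "action": "crawl"})
    return results

batch = crawl_batch(["https://example.com/a", "https://example.com/b"])
print(sorted(batch))  # → ['https://example.com/a', 'https://example.com/b']
```

Keeping each URL in its own call (rather than one deep crawl) keeps failures isolated: one unreachable page does not abort the whole batch.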