crawl4ai
Total installs: 25
Weekly installs: 25
Site-wide rank: #7783
Install command:
npx skills add https://github.com/tao3k/omni-dev-fusion --skill crawl4ai
Install distribution by agent:
gemini-cli: 20
opencode: 19
codex: 17
github-copilot: 16
claude-code: 14
kimi-cli: 10
Skill Documentation
crawl4ai
High-performance web crawler with intelligent chunking. Crawls web pages and extracts content as markdown using LLM-based skeleton planning.
Commands
crawl_url (alias: webCrawl)
Crawl a web page with LangGraph workflow and LLM-based intelligent chunking.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | str | – | Target URL to crawl (required) |
| `action` | str | "smart" | Action mode: "smart", "skeleton", "crawl" |
| `fit_markdown` | bool | true | Clean and simplify markdown output |
| `max_depth` | int | 0 | Maximum crawling depth (0 = single page) |
| `return_skeleton` | bool | false | Also return the document skeleton (TOC) |
| `chunk_indices` | list[int] | – | List of section indices to extract |
Action Modes:
| Mode | Description | Use Case |
|---|---|---|
| `smart` (default) | LLM generates a chunk plan, then extracts the relevant sections | Large docs where you need specific information |
| `skeleton` | Extract a lightweight TOC without full content | Quick overview; decide what to read |
| `crawl` | Return full markdown content | Small pages where the complete content is needed |
Examples:
# Smart crawl with LLM chunking (default)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com"})
# Skeleton only - get TOC quickly
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "skeleton"})
# Full content crawl
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "action": "crawl"})
# Extract specific sections
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "chunk_indices": [0, 1, 2]})
# Deep crawl (follow links up to depth N)
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "max_depth": 2})
# Get skeleton with full content
@omni("crawl4ai.CrawlUrl", {"url": "https://example.com", "return_skeleton": true})
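The skeleton-first pattern implied by these examples (fetch the TOC first, then request only the sections you need via `chunk_indices`) can be sketched in plain Python. The `omni` function below is a hypothetical stub standing in for the real `@omni` dispatcher, and the skeleton's shape (a list of section titles) is an assumption, not the skill's documented return format:

```python
def omni(command, params):
    """Stub standing in for the real @omni call (assumed return shapes)."""
    skeleton = ["Intro", "Installation", "API Reference", "FAQ"]
    if params.get("action") == "skeleton":
        # Lightweight TOC: one entry per top-level section.
        return {"skeleton": skeleton}
    if "chunk_indices" in params:
        # Only the requested sections are returned.
        return {"sections": [skeleton[i] for i in params["chunk_indices"]]}
    return {"markdown": "...full page..."}

def fetch_relevant(url, keyword):
    """Step 1: fetch the TOC; step 2: fetch only the matching sections."""
    toc = omni("crawl4ai.CrawlUrl", {"url": url, "action": "skeleton"})["skeleton"]
    indices = [i for i, title in enumerate(toc) if keyword.lower() in title.lower()]
    return omni("crawl4ai.CrawlUrl", {"url": url, "chunk_indices": indices})["sections"]

print(fetch_relevant("https://example.com", "api"))  # → ['API Reference']
```

The point of the two-step shape is token economy: the model reads a few hundred tokens of TOC instead of the full page before deciding what to extract.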
Core Concepts
| Topic | Description | Reference |
|---|---|---|
| Skeleton Planning | LLM sees the TOC (~500 tokens) instead of the full content (~10k+ tokens) | smart-chunking.md |
| Chunk Extraction | Token-aware section extraction | chunking.md |
| Deep Crawling | Multi-page crawling with BFS strategy | deep-crawl.md |
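The "token-aware section extraction" idea can be illustrated with a greedy packer that groups sections into chunks under a token budget. The tokenizer here (whitespace word count) is a deliberate simplification; the skill's actual token counting is not specified in this document:

```python
def pack_sections(sections, max_tokens):
    """Greedily group sections into chunks whose rough token count
    (approximated as whitespace-separated words) stays within budget."""
    chunks, current, used = [], [], 0
    for text in sections:
        n = len(text.split())
        # Start a new chunk when adding this section would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        chunks.append(current)
    return chunks

sections = ["a b c", "d e", "f g h i", "j"]
print(pack_sections(sections, max_tokens=5))
# → [['a b c', 'd e'], ['f g h i', 'j']]
```

A greedy packer like this never splits a section, so any single section larger than the budget becomes its own chunk.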
Best Practices
- Use `skeleton` mode first for large documents to understand their structure
- Use `chunk_indices` to extract specific sections instead of the full content
- Set `max_depth` > 0 with care, and keep it low to prevent runaway crawling
- Keep `fit_markdown=true` for cleaner output; set it to false for raw content
Advanced
- Batch multiple URLs with separate calls
- Combine with knowledge tools for RAG pipelines
- Use skeleton + LLM to auto-generate chunk plans for custom extraction
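Batching several URLs as separate calls can be sketched as a simple loop that collects per-URL results. As above, `omni` is a hypothetical stub for the real `@omni` dispatcher, and its return shape is an assumption:

```python
def omni(command, params):
    """Stub for the real @omni call (assumed return shape)."""
    return {"url": params["url"], "markdown": f"# Content of {params['url']}"}

def crawl_batch(urls):
    """One crawl4ai.CrawlUrl call per URL; results keyed by URL."""
    results = {}
    for url in urls:
        results[url] = omni("crawl4ai.CrawlUrl", {"url": url, "action": "crawl"})
    return results

batch = crawl_batch(["https://example.com/a", "https://example.com/b"])
print(sorted(batch))  # → ['https://example.com/a', 'https://example.com/b']
```

Keeping each URL in its own call (rather than one deep crawl) keeps failures isolated: one unreachable page does not abort the whole batch.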