extract
Total installs: 528
Weekly installs: 538
Site-wide rank: #541
Install command:
npx skills add https://github.com/tavily-ai/skills --skill extract
Install distribution by agent:
- claude-code: 369
- opencode: 348
- gemini-cli: 303
- codex: 274
- openclaw: 193
Skill Documentation
Extract Skill
Extract clean content from specific URLs. Ideal when you know which pages you want content from.
Prerequisites
Tavily API Key Required – Get your key at https://tavily.com
Add to ~/.claude/settings.json:

```json
{
  "env": {
    "TAVILY_API_KEY": "tvly-your-api-key-here"
  }
}
```
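If you call the API directly with curl, as in the examples below, exporting the key as a shell environment variable also works, since those examples read `$TAVILY_API_KEY` from the environment:

```bash
# Make the key available to the curl examples in this document
export TAVILY_API_KEY="tvly-your-api-key-here"
```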
Quick Start
Using the Script

```bash
./scripts/extract.sh '<json>'
```

Examples:

```bash
# Single URL
./scripts/extract.sh '{"urls": ["https://example.com/article"]}'

# Multiple URLs
./scripts/extract.sh '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'

# With query focus and chunks
./scripts/extract.sh '{"urls": ["https://example.com/docs"], "query": "authentication API", "chunks_per_source": 3}'

# Advanced extraction for JS pages
./scripts/extract.sh '{"urls": ["https://app.example.com"], "extract_depth": "advanced", "timeout": 60}'
```
Basic Extraction
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/article"]
  }'
```
Multiple URLs with Query Focus
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/ml-healthcare",
      "https://example.com/ai-diagnostics"
    ],
    "query": "AI diagnostic tools accuracy",
    "chunks_per_source": 3
  }'
```
API Reference
Endpoint
```
POST https://api.tavily.com/extract
```
Headers
| Header | Value |
|---|---|
| `Authorization` | `Bearer <TAVILY_API_KEY>` |
| `Content-Type` | `application/json` |
Request Body
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | array | Required | URLs to extract (max 20) |
| `query` | string | null | Reranks chunks by relevance |
| `chunks_per_source` | integer | 3 | Chunks per URL (1-5; requires `query`) |
| `extract_depth` | string | `"basic"` | `basic` or `advanced` (for JS pages) |
| `format` | string | `"markdown"` | `markdown` or `text` |
| `include_images` | boolean | false | Include image URLs |
| `timeout` | float | varies | Max wait (1-60 seconds) |
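None of the later examples exercise `format` or `include_images`; a request asking for plain text plus image URLs might look like this (the URL is illustrative):

```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://example.com/gallery"],
    "format": "text",
    "include_images": true
  }'
```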
Response Format
```json
{
  "results": [
    {
      "url": "https://example.com/article",
      "raw_content": "# Article Title\n\nContent..."
    }
  ],
  "failed_results": [],
  "response_time": 2.3
}
```
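For scripting, the documented response fields are straightforward to pull out with jq (assuming jq is installed; the file names here are illustrative):

```bash
# Save a response, then inspect it with jq.
./scripts/extract.sh '{"urls": ["https://example.com/article"]}' > response.json

# List the URLs that were extracted successfully
jq -r '.results[].url' response.json

# Write the first result's content to a Markdown file
jq -r '.results[0].raw_content' response.json > article.md
```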
Extract Depth
| Depth | When to Use |
|---|---|
| `basic` | Simple text extraction; faster |
| `advanced` | Dynamic/JS-rendered pages, tables, structured data |
Examples
Single URL Extraction
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://docs.python.org/3/tutorial/classes.html"],
    "extract_depth": "basic"
  }'
```
Targeted Extraction with Query
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/react-hooks",
      "https://example.com/react-state"
    ],
    "query": "useState and useEffect patterns",
    "chunks_per_source": 2
  }'
```
JavaScript-Heavy Pages
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": ["https://app.example.com/dashboard"],
    "extract_depth": "advanced",
    "timeout": 60
  }'
```
Batch Extraction
```bash
curl --request POST \
  --url https://api.tavily.com/extract \
  --header "Authorization: Bearer $TAVILY_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3",
      "https://example.com/page4",
      "https://example.com/page5"
    ],
    "extract_depth": "basic"
  }'
```
Tips
- Max 20 URLs per request; batch larger lists (see the sketch after this list)
- Use `query` + `chunks_per_source` to get only the relevant content
- Try `basic` first; fall back to `advanced` if content is missing
- Set a longer `timeout` for slow pages (up to 60s)
- Check `failed_results` for URLs that couldn't be extracted
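A minimal batching sketch for the 20-URL cap, assuming a urls.txt file with one URL per line and jq installed:

```bash
# Split urls.txt into chunks of 20 and send one extract request per chunk.
split -l 20 urls.txt batch_
for f in batch_*; do
  jq -R . "$f" | jq -s '{urls: .}' | curl --silent --request POST \
    --url https://api.tavily.com/extract \
    --header "Authorization: Bearer $TAVILY_API_KEY" \
    --header 'Content-Type: application/json' \
    --data @-
  echo   # separate batch responses with a newline
done
rm batch_*
```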