arcfetch

📁 briansunter/arcfetch 📅 12 days ago

总安装量

周安装量

#64256

全站排名

安装命令

npx skills add https://github.com/briansunter/arcfetch --skill arcfetch

Agent 安装分布

opencode 2

claude-code 2

mcpjam 1

kilo 1

windsurf 1

zencoder 1

Skill 文档

arcfetch

Guide for using the arcfetch CLI to fetch web content, extract articles as clean markdown, and manage cached references.

Overview

arcfetch converts web pages to clean markdown with:

Automatic Playwright fallback when simple HTTP fetch produces low-quality results
Quality scoring (0-100) with boilerplate/error page/login wall/paywall detection
Content-to-source ratio analysis to catch JS-rendered or gated content
Anti-bot detection measures (stealth plugin, viewport/timezone rotation, realistic headers)

Installation

# Use via bunx (no install needed)
bunx arcfetch <command>

# Or install globally
bun install -g arcfetch

Commands

fetch

Fetch a URL and save extracted markdown to the temp folder.

arcfetch fetch <url> [options]

Options:

-q, --query <text> – Search query (saved as metadata)
-o, --output <format> – Output format:
- text – Plain text, LLM-friendly (default)
- json – Structured JSON
- path – Just the filepath
- summary – slug|filepath format
--pretty – Human-friendly output with emojis
-v, --verbose – Show detailed output (quality scores, pipeline decisions)
--min-quality <n> – Minimum quality score 0-100 (default: 60)
--temp-dir <path> – Temp folder (default: .tmp/arcfetch)
--docs-dir <path> – Docs folder (default: docs/ai/references)
--wait-strategy <mode> – Playwright wait: networkidle (default), domcontentloaded, load
--force-playwright – Skip simple fetch, use Playwright directly
--refetch – Re-fetch even if URL already cached

Examples:

# Basic fetch (plain text output for LLMs)
arcfetch fetch https://example.com/article

# Get just the filepath
arcfetch fetch https://example.com -o path

# Human-friendly output
arcfetch fetch https://example.com --pretty

# JSON output for scripting
arcfetch fetch https://example.com -o json

# With search query metadata
arcfetch fetch https://example.com -q "search term"

# Verbose mode to see pipeline decisions
arcfetch fetch https://example.com -v

# Force Playwright for JS-heavy sites
arcfetch fetch https://example.com --force-playwright

Quality Pipeline:

Simple HTTP fetch with browser-like User-Agent
Extract content with Readability + Turndown
Quality score (0-100) with boilerplate detection
Score >= 85: accept as-is
Score 60-84: try Playwright, use whichever scores higher
Score < 60: require Playwright, fail if still below threshold

list

List all cached references.

arcfetch list [options]

Options:

-o, --output <format> – Output format: text, json
--pretty – Human-friendly output

arcfetch list --pretty
arcfetch list -o json

links

Extract all links from a cached reference.

arcfetch links <ref-id> [options]

Options:

-o, --output <format> – Output format: text, json
--pretty – Human-friendly output

arcfetch links my-article --pretty
arcfetch links my-article -o json

fetch-links

Fetch all links from a cached reference (parallel, max 5 concurrent).

arcfetch fetch-links <ref-id> [options]

Options:

--refetch – Force re-fetch even if already cached
-o, --output <format> – Output format: text, json
--pretty – Human-friendly output

arcfetch fetch-links my-article --pretty

promote

Move reference from temp to permanent docs folder.

arcfetch promote <ref-id> [options]

arcfetch promote my-article --pretty
arcfetch promote my-article -o json

delete

Delete a cached reference.

arcfetch delete <ref-id> [options]

arcfetch delete my-article --pretty

config

Show current configuration.

arcfetch config

mcp

Start the MCP server (for Claude Code integration).

arcfetch mcp

Workflow Patterns

Single Article

arcfetch fetch https://example.com/guide --pretty
cat .tmp/arcfetch/example-guide.md  # Review content
arcfetch promote example-guide      # Move to docs if good

Batch Fetch

for url in "url1" "url2" "url3"; do
  arcfetch fetch "$url" --pretty
done
arcfetch list --pretty         # Review all
arcfetch promote my-article    # Promote desired ones

Fetch All Links from a Page

arcfetch fetch https://example.com/resources --pretty
arcfetch links resources --pretty           # See what links exist
arcfetch fetch-links resources --pretty     # Fetch them all

Scripting with JSON Output

RESULT=$(arcfetch fetch https://example.com -o json)
REF_ID=$(echo "$RESULT" | jq -r '.refId')
QUALITY=$(echo "$RESULT" | jq -r '.quality')

if (( QUALITY >= 85 )); then
  arcfetch promote "$REF_ID"
fi

Cleanup

arcfetch list --pretty
arcfetch delete unwanted-ref

Configuration

Priority Order

CLI arguments
Environment variables
arcfetch.config.json
.arcfetchrc / .arcfetchrc.json
Built-in defaults

Config File (`arcfetch.config.json`)

{
  "quality": {
    "minScore": 60,
    "jsRetryThreshold": 85
  },
  "paths": {
    "tempDir": ".tmp/arcfetch",
    "docsDir": "docs/ai/references"
  },
  "playwright": {
    "timeout": 30000,
    "waitStrategy": "networkidle"
  }
}

Environment Variables

export SOFETCH_MIN_SCORE=60
export SOFETCH_TEMP_DIR=".tmp/arcfetch"
export SOFETCH_DOCS_DIR="docs/ai/references"

Quality Scoring

Score starts at 100 and deductions apply:

Check	Deduction	Severity
Blank content	Score = 0	Issue
Content < 50 chars	-50	Issue
Content < 300 chars	-15	Warning
HTML tags > 100	-40	Issue
HTML tags > 50	-20	Warning
HTML tags > 10	-5	Warning
HTML ratio > 30%	-25	Issue
HTML ratio > 15%	-10	Warning
Table tags > 50	-30	Issue
Script tags present	-15	Warning
Style tags present	-10	Warning
Extraction ratio < 0.5% (from large page)	-35	Issue
Extraction ratio < 2% (from large page)	-20	Warning
Boilerplate detected (error/login/paywall)	-40	Issue
Excessive newlines	-5	Warning

Boilerplate patterns detected (on short content < 2000 chars):

Error pages: “something went wrong”, “an error occurred”, “unexpected error”
404 pages: “page not found”, “404 not found”
Login walls: “log in to continue”, “please log in”, “sign in to continue”
Paywalls: “subscribe to continue reading”
Bot detection: “are you a robot”, “complete the captcha”, “verify you are human”
Access denied, JS-required, unsupported browser pages

Score thresholds:

>= 90: Excellent
>= 75: Good
>= 60: Acceptable (minimum to pass)
< 60: Poor (rejected)

MCP Server

The MCP server exposes 6 tools for Claude Code integration:

Tool	Description
`fetch_url`	Fetch URL, extract markdown, save to temp
`list_cached`	List all cached references
`promote_reference`	Move temp reference to docs folder
`delete_cached`	Delete a cached reference
`extract_links`	Extract links from a cached reference
`fetch_links`	Fetch all links from a cached reference

Start via CLI: arcfetch mcp

Or configure in Claude Code MCP settings to run bunx arcfetch as stdio server.

File Format

Cached files use markdown with YAML frontmatter:

---
title: "Article Title"
source_url: https://example.com/article
fetched_date: 2026-02-06
type: web
status: temporary
query: "optional search query"
---

# Article Title

Extracted markdown content...

Ref IDs are slugified titles (e.g., how-to-build-react-apps)
Temp storage: .tmp/arcfetch/<slug>.md (status: temporary)
Permanent storage: docs/ai/references/<slug>.md (status: permanent, after promote)
Duplicate detection: re-fetching same URL returns existing ref unless --refetch

References

Package: https://www.npmjs.com/package/arcfetch
Repository: https://github.com/briansunter/arcfetch
MCP Protocol: https://modelcontextprotocol.io

GitHub 仓库 ↗ ← 返回陌讯 Skills 聚合平台

arcfetch

Agent 安装分布

Skill 文档

arcfetch

Overview

Installation

Commands

fetch

list

links

fetch-links

promote

delete

config

mcp

Workflow Patterns

Single Article

Batch Fetch

Fetch All Links from a Page

Scripting with JSON Output

Cleanup

Configuration

Priority Order

Config File (arcfetch.config.json)

Environment Variables

Quality Scoring

MCP Server

File Format

References

Config File (`arcfetch.config.json`)