llms-txt-crawler

📁 agykit/agykit 📅 Jan 30, 2026
Total installs: 8
Weekly installs: 8
Site-wide rank: #33882

Install command
npx skills add https://github.com/agykit/agykit --skill llms-txt-crawler

Agent install distribution

opencode 6
claude-code 5
openclaw 5
trae 4
kiro-cli 4
cursor 4

Skill documentation

llms.txt Crawler Skill

This skill fetches a website's llms.txt file and crawls every page listed in it. The llms.txt format is a proposed convention that lets websites publish an LLM-friendly listing of their content.

Overview

The llms.txt file typically follows this format:

# Site Name

## Section Name

- [Page Title](https://example.com/page.md): Description of the page
- [Another Page](https://example.com/another.md): Another description

This skill parses these files and downloads all linked content.
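
The parsing step can be sketched as follows. This is a minimal illustration that assumes the link-line format shown above; `parseLlmsTxt` and the shape of the returned objects are hypothetical and are not taken from crawl.js.

```javascript
// Hypothetical helper (not from crawl.js): turn llms.txt text into link entries.
function parseLlmsTxt(text) {
  const entries = [];
  let section = null;
  // Matches "- [Title](url)" with an optional ": description" suffix.
  const linkPattern = /^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?\s*$/;

  for (const line of text.split(/\r?\n/)) {
    // "## Section Name" starts a new section; "# Site Name" is ignored.
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      section = heading[1];
      continue;
    }
    const link = line.match(linkPattern);
    if (link) {
      entries.push({
        section,
        title: link[1],
        url: link[2],
        description: link[3] ? link[3].trim() : '',
      });
    }
  }
  return entries;
}
```

Lines that are neither a section heading nor a link are skipped, so free-form prose in the file does not break the parse.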

Usage

Basic Usage

Run the crawl script with a target URL:

cd /path/to/skills/llms-txt-crawler/scripts
npm install  # First time only
node crawl.js --url https://example.com

Command Line Options

Option        Short  Description                              Default
--url         -u     Base URL of the site with llms.txt       Required
--output      -o     Output directory for crawled files       ./output
--format      -f     Output format: md, json, or txt          md
--delay       -d     Delay between requests in milliseconds   500
--concurrent  -c     Maximum concurrent requests              3
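
For readers who want to see how these flags and defaults might be read, here is a sketch built on Node's built-in `util.parseArgs` (available since Node 18.3). The function name `parseCrawlerArgs` is hypothetical, and crawl.js may well parse its arguments differently.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical parser mirroring the documented options; not the script's own code.
function parseCrawlerArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: 'string', short: 'u' },
      output: { type: 'string', short: 'o' },
      format: { type: 'string', short: 'f' },
      delay: { type: 'string', short: 'd' },
      concurrent: { type: 'string', short: 'c' },
    },
  });

  if (!values.url) {
    throw new Error('--url is required');
  }
  const format = values.format ?? 'md';
  if (!['md', 'json', 'txt'].includes(format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  // Apply the documented defaults for any option that was omitted.
  return {
    url: values.url,
    output: values.output ?? './output',
    format,
    delay: Number(values.delay ?? 500),
    concurrent: Number(values.concurrent ?? 3),
  };
}
```

Calling `parseCrawlerArgs(process.argv.slice(2))` would return a plain options object with numeric `delay` and `concurrent` values.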

Examples

Crawl agentskills.io documentation:

node crawl.js --url https://agentskills.io --output ./agentskills-docs

Crawl with custom rate limiting:

node crawl.js --url https://example.com --delay 1000 --concurrent 2
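
One plausible reading of how --delay and --concurrent interact is a small worker pool in which each worker pauses between its own requests. The sketch below illustrates that reading only; `runWithLimits` and `sleep` are hypothetical names, and the actual scheduling in crawl.js is not documented here.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical pool: at most `concurrent` tasks in flight, and each worker
// waits `delayMs` after finishing a task before taking the next one.
async function runWithLimits(items, task, { concurrent = 3, delayMs = 500 } = {}) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
      if (next < items.length) {
        await sleep(delayMs);
      }
    }
  }

  const workerCount = Math.min(concurrent, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results are stored by input index, so the returned array keeps the order of `items` even when tasks finish out of order.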

Output as JSON:

node crawl.js --url https://example.com --format json

Output Structure

The script creates the following output structure:

output/
├── llms.txt              # Original llms.txt file
├── index.json            # Metadata about all crawled pages
└── pages/
    ├── page-1.md
    ├── page-2.md
    └── ...

Error Handling

  • Network errors: Retries up to 3 times with exponential backoff
  • Rate limiting: Respects delay settings between requests
  • Missing pages: Logs warnings but continues crawling other pages
  • Invalid URLs: Skips and logs invalid URLs
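
The retry behavior described in the first bullet can be sketched as a generic wrapper. This assumes "up to 3 times" means three retries after the initial attempt and a doubling delay; the name `withRetry` and the 500 ms base delay are illustrative assumptions, not values taken from the script.

```javascript
// Hypothetical retry wrapper: waits baseMs, 2*baseMs, 4*baseMs, ... between attempts.
async function withRetry(operation, { retries = 3, baseMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) {
        break; // out of retries; rethrow below
      }
      const waitMs = baseMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
  throw lastError;
}
```

A caller could wrap each page download in `withRetry`, catch the final error, log a warning, and move on to the next page, which matches the "logs warnings but continues" behavior listed above.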

Integration Tips

When using this skill in an agent workflow:

  1. Run the crawler to download the content.
  2. Read index.json for metadata about all crawled pages.
  3. Use the downloaded Markdown files for context or analysis.
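
As a sketch of the last step, the downloaded files can be loaded straight from the pages/ directory shown under Output Structure. The helper name `loadPages` is hypothetical, and the example deliberately avoids index.json because its schema is not specified here.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical loader: reads every .md file under <outputDir>/pages.
function loadPages(outputDir) {
  const pagesDir = path.join(outputDir, 'pages');
  return fs
    .readdirSync(pagesDir)
    .filter((name) => name.endsWith('.md'))
    .sort()
    .map((name) => ({
      name,
      content: fs.readFileSync(path.join(pagesDir, name), 'utf8'),
    }));
}
```

An agent could then pass the `content` of selected pages into its context, for example `loadPages('./output').map((page) => page.content).join('\n\n')`.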

See Also