llms-txt-crawler

Total installs: 8
Weekly installs: 8
Site-wide rank: #33882

Install command:

```bash
npx skills add https://github.com/agykit/agykit --skill llms-txt-crawler
```

Install distribution by agent:

- opencode: 6
- claude-code: 5
- openclaw: 5
- trae: 4
- kiro-cli: 4
- cursor: 4

Skill Documentation
llms.txt Crawler Skill
This skill enables you to fetch llms.txt files from websites and crawl all pages listed within them. The llms.txt format is a standard way for websites to provide LLM-friendly content listings.
Overview
The llms.txt file typically follows this format:

```markdown
# Site Name

## Section Name

- [Page Title](https://example.com/page.md): Description of the page
- [Another Page](https://example.com/another.md): Another description
```
This skill parses these files and downloads all linked content.
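As a sketch of how such a file might be parsed (the function name and the returned object shape are illustrative, not the skill's actual API):

```javascript
// Parse llms.txt content into a list of { title, url, description } entries.
// Illustrative sketch only, not the skill's actual implementation.
function parseLlmsTxt(text) {
  const entries = [];
  // Matches lines like: - [Page Title](https://example.com/page.md): Description
  const linkPattern = /^-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$/;
  for (const line of text.split("\n")) {
    const match = line.trim().match(linkPattern);
    if (match) {
      entries.push({
        title: match[1],
        url: match[2],
        description: match[3] ?? "",
      });
    }
  }
  return entries;
}
```

Heading lines (`#`, `##`) carry no links, so a line-by-line link match is enough to recover every crawlable page.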
Usage
Basic Usage
Run the crawl script with a target URL:

```bash
cd /path/to/skills/llms-txt-crawler/scripts
npm install   # First time only
node crawl.js --url https://example.com
```
Command Line Options
| Option | Short | Description | Default |
|---|---|---|---|
| `--url` | `-u` | Base URL of the site with llms.txt | Required |
| `--output` | `-o` | Output directory for crawled files | `./output` |
| `--format` | `-f` | Output format: `md`, `json`, or `txt` | `md` |
| `--delay` | `-d` | Delay between requests in milliseconds | `500` |
| `--concurrent` | `-c` | Maximum concurrent requests | `3` |
Examples
Crawl agentskills.io documentation:

```bash
node crawl.js --url https://agentskills.io --output ./agentskills-docs
```

Crawl with custom rate limiting:

```bash
node crawl.js --url https://example.com --delay 1000 --concurrent 2
```

Output as JSON:

```bash
node crawl.js --url https://example.com --format json
```
Output Structure
The script creates the following output structure:
```
output/
├── llms.txt     # Original llms.txt file
├── index.json   # Metadata about all crawled pages
└── pages/
    ├── page-1.md
    ├── page-2.md
    └── ...
```
Error Handling
- **Network errors**: Retries up to 3 times with exponential backoff
- **Rate limiting**: Respects delay settings between requests
- **Missing pages**: Logs warnings but continues crawling other pages
- **Invalid URLs**: Skips and logs invalid URLs
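The retry behavior described above can be sketched as follows. The helper name and base delay are illustrative; only the "up to 3 retries with exponential backoff" behavior comes from this document:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `fn` up to `maxRetries` times, doubling the wait after each failure.
// Illustrative sketch of "retries up to 3 times with exponential backoff".
async function withRetry(fn, maxRetries = 3, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(baseDelayMs * 2 ** attempt); // e.g. 500ms, 1s, 2s
      }
    }
  }
  throw lastError; // all attempts exhausted
}
```

Doubling the wait between attempts gives a struggling server progressively more breathing room instead of hammering it at a fixed rate.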
Integration Tips
When using this skill in an agent workflow:
- First run the crawler to download content
- The `index.json` file contains metadata about all pages
- Use the downloaded markdown files for context or analysis
See Also
- llms.txt Specification
- scripts/crawl.js – The main crawler script