seo-technical-robots
Install command
npx skills add https://github.com/kostja94/marketing-skills --skill seo-technical-robots
Skill Documentation
SEO Technical: robots.txt
Guides configuration and auditing of robots.txt for search engine and AI crawler control.
Scope (Technical SEO)
- Robots.txt: Review Disallow/Allow; avoid blocking important pages
- Crawler access: Ensure crawlers (including AI crawlers) can access key pages
- Indexing: Misconfigured robots.txt can block indexing; verify no accidental blocks
Initial Assessment
Check for product marketing context first: if .claude/product-marketing-context.md or .cursor/product-marketing-context.md exists, read it for the site URL and indexing goals.
Identify:
- Site URL: Base domain (e.g., https://example.com)
- Indexing scope: Full site, partial, or specific paths to exclude
- AI crawler strategy: Allow search/indexing vs. block training data crawlers
Best Practices
Purpose and Limitations
| Point | Note |
|---|---|
| Purpose | Controls crawler access; does NOT prevent indexing (disallowed URLs may still appear in search without snippet) |
| No-index | Use noindex meta or auth for sensitive content; robots.txt is publicly readable |
| Advisory | Rules are advisory; malicious crawlers may ignore |
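Because robots.txt only controls crawling, keeping a page out of the index requires a page-level signal instead. A sketch of the two common mechanisms (the header name `X-Robots-Tag` is standard; the page URL is a placeholder):

```
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">

HTTP/1.1 200 OK
X-Robots-Tag: noindex
```

Note that a crawler must be able to fetch the page to see either signal, so a noindex page must not also be disallowed in robots.txt.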
Location and Format
| Item | Requirement |
|---|---|
| Path | Site root: https://example.com/robots.txt |
| Encoding | UTF-8 plain text |
| Standard | RFC 9309 (Robots Exclusion Protocol) |
Core Directives
| Directive | Purpose | Example |
|---|---|---|
| User-agent: | Target crawler | User-agent: Googlebot, User-agent: * |
| Disallow: | Block path prefix | Disallow: /admin/ |
| Allow: | Allow path (can override Disallow) | Allow: /public/ |
| Sitemap: | Declare sitemap absolute URL | Sitemap: https://example.com/sitemap.xml |
| Clean-param: | Strip query params (Yandex) | See below |
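Putting the core directives together, a minimal file might look like this (paths and the sitemap URL are illustrative placeholders):

```
# robots.txt — illustrative sketch
User-agent: *
Disallow: /admin/
Allow: /public/

Sitemap: https://example.com/sitemap.xml
```

Sitemap lines are group-independent and can appear anywhere in the file; directives under a User-agent line apply only to that group.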
Critical: Do Not Block Rendering Resources
- Do not block CSS, JS, images; Google needs them to render pages
- Only block paths that don’t need crawling: admin, API, temp files
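A sketch of the distinction, with placeholder paths:

```
# Too broad — would block assets Google needs to render pages:
# Disallow: /assets/

# Safer — block only non-content paths, leave CSS/JS/images crawlable:
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /tmp/
```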
AI Crawler Strategy
| User-agent | Purpose | Typical |
|---|---|---|
| OAI-SearchBot | ChatGPT search | Allow |
| GPTBot | OpenAI training | Disallow |
| Claude-SearchBot | Claude search | Allow |
| ClaudeBot | Anthropic training | Disallow |
| PerplexityBot | Perplexity search | Allow |
| Google-Extended | Gemini training | Disallow |
| CCBot | Common Crawl | Disallow |
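The strategy in the table above can be expressed as per-agent groups; a sketch:

```
# Allow AI search/answer crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block training-data crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```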
Clean-param (Yandex)
Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid
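When auditing, proposed rules can be sanity-checked programmatically. A quick sketch using Python's standard `urllib.robotparser` (the rules and URLs are illustrative; note the stdlib parser applies rules in file order rather than Google's longest-match rule, so the Allow line is listed first here):

```python
from urllib import robotparser

# Illustrative rules; real audits should fetch the live /robots.txt
ROBOTS = """\
User-agent: *
Allow: /admin/help
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Check whether a generic crawler may fetch specific URLs
print(rp.can_fetch("*", "https://example.com/admin/secret"))  # False
print(rp.can_fetch("*", "https://example.com/admin/help"))    # True
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```

For live sites, `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` fetches and parses the deployed file.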
Output Format
- Current state (if auditing)
- Recommended robots.txt (full file)
- Compliance checklist
- References: Google's robots.txt documentation
Related Skills
- seo-technical-sitemap: Sitemap URL to reference in robots.txt
- seo-technical-crawlability: Broader crawl and structure guidance