seo-engine
2
总安装量
2
周安装量
#70746
全站排名
安装命令
npx skills add https://github.com/agharsallah/seo-fix-skills --skill seo-engine
Agent 安装分布
amp
2
gemini-cli
2
github-copilot
2
codex
2
kimi-cli
2
cursor
2
Skill 文档
SEO Engine
Use this skill to run deterministic checks on HTML, headers, robots.txt, and related resources. It returns pass/fail outcomes with minimal heuristics and clear remediation steps.
When to use
- Auditing a pageâs indexability and crawlability
- Verifying content structure (title, headings, links, images)
- Flagging spam policy violations (cloaking, hidden text, keyword stuffing)
- Sanity-checking redirect behavior and HTTP status
- Verifying dashboard filters/metrics for SEO reporting
Included Rules
| Rule ID | Title | Priority | Category | Description |
|---|---|---|---|---|
| PAGE_TITLE_EXISTS | Page title tag present and non-empty | Low | Content Basics | Ensures the <title> tag exists and is not empty. |
| MAIN_HEADING_EXISTS | Main heading (<h1>) present and non-empty | Low | Content Basics | At least one <h1> element exists with non-empty text. |
| IMAGE_ALT_TEXT | All images have non-empty alt attributes | Medium | Content Basics | Every <img> element includes a non-empty alt attribute. |
| CRAWLABLE_LINKS | All anchor links are crawlable (have valid href) | Low | Content Basics | Every <a> element has a non-empty href that does not start with “javascript:”. |
| GOOGLEBOT_NOT_BLOCKED | Googlebot is not blocked by robots.txt | High | Technical Requirements | No Disallow rule for Googlebot (or *) matches the page URL. |
| PAGE_HTTP_200_STATUS | Page returns HTTP 200 status | Critical | Technical Requirements | Page responds with HTTP 200, not error or redirect. |
| PAGE_INDEXABLE_CONTENT | Page has indexable textual content | Medium | Technical Requirements | Body contains at least one alphanumeric character after stripping markup. |
| HEAD_SECTION_VALID_HTML | Head section must be valid HTML | High | Technical Requirements | The page contains exactly one <head> element and the HTML parses without syntax errors. |
| CLOAKING_DETECTION | Detect cloaking by comparing bot vs user content | High | Spam Policies | Content served to Googlebot and regular users is substantially identical (⥠90% similarity). |
| HIDDEN_TEXT_DETECTION | Detect hidden text or links intended for search | Medium | Spam Policies | No elements with hidden styles contain visible text or links. |
| KEYWORD_STUFFING_DETECTION | Detect excessive repetition of keywords | Medium | Spam Policies | No single keyword exceeds a density of 5% of total words. |
| SNEAKY_REDIRECT_DETECTION | Detect sneaky redirects for bots vs users | High | Spam Policies | Both HTTP status codes and final URLs are identical for Googlebot and regular users. |
| MALWARE_HOSTING_DETECTION | Detect presence of known malware signatures | Critical | Security | Resource’s hash does not match any known malware signatures. |
| TITLE_DESCRIPTIVE | Page title is descriptive, specific, and accurate | Medium | Content Optimization | Title text contains at least two words. |
| META_DESCRIPTION_PRESENT | Meta description tag is present and descriptive | Medium | Content Optimization | Meta description is present and its length is at least 50 characters. |
| IMAGE_ALT_ATTRIBUTES | All images have descriptive alt attributes | Medium | Content Optimization | Every <img> element has a non-empty alt attribute. |
| HEADING_HIERARCHY | Page uses heading elements for hierarchy | Low | Content Optimization | At least one <h1> element is present. |
| GA_FILTER_SOURCE_MEDIUM | GA data filtered to source=google, medium=organic | Medium | Dashboard Setup | Both source=google and medium=organic filter conditions are present in dashboard config. |
| DASHBOARD_METRICS_PRESENT | Dashboard includes required five metrics | Medium | Dashboard Setup | All five required metrics are present in the dashboard configuration. |
| DASHBOARD_DATA_SOURCES_CONNECTED | Dashboard connects to GA and SC | Medium | Dashboard Setup | Both Search Console and Google Analytics data sources are referenced in dashboard config. |
| AMP_PAGE_MUST_FOLLOW_SPEC | AMP page must follow AMP HTML specification | Critical | AMP Validation | Ensures the page complies with the AMP HTML specification for Google Search features. |
| BANNER_DATA_NOSNIPPET_PRESENT | Ensure banner or popup uses data-nosnippet attribute | Medium | Site Functionality | Prevents banner or popup content from being shown in search result snippets. |
| ROBOTS_TXT_NOT_503 | robots.txt must not return HTTP 503 status | High | Site Availability | A 503 response for robots.txt blocks all crawling, preventing indexing. |
| RETRY_AFTER_HEADER_PRESENT_ON_503 | 503 error pages must include a Retry-After header | Medium | Site Availability | Provides crawlers with guidance on when to retry, reducing unnecessary load. |
| NO_URL_FRAGMENTS | Avoid URL fragments that change content | High | URL Structure | Google Search may not crawl URLs where fragments are used to change content. |
| HYPHENS_IN_PATH | Use hyphens to separate words in URL path | Medium | URL Structure | Hyphens improve readability for users and search engines, aiding crawlability. |
| PERCENT_ENCODING_NECESSARY | Percentâencode nonâASCII characters in URLs | Medium | URL Structure | Percentâencoding ensures URLs are valid, crawlable, and correctly interpreted. |
| CHECK_REL_CANONICAL_PRESENT | Presence of rel=”canonical” link element | Medium | Canonicalization | rel=”canonical” link annotations influence how Google determines the canonical URL. |
| CHECK_URL_IN_SITEMAP | URL presence in sitemap.xml | Medium | Canonicalization | Presence of the URL in a sitemap is a factor influencing canonical selection. |
| CHECK_HTTP_HTTPS_CONSISTENCY | Consistent use of HTTPS scheme | Low | Canonicalization | The page’s protocol (HTTP vs HTTPS) is a factor that influences canonicalization. |
| DATA_NOSNIPPET_VALID_HTML | Ensure HTML containing data-nosnippet attribute is wellâformed | High | Data Nosnippet | HTML section must be valid HTML for dataânosnippet to be machineâreadable. |
| ROBOTS_TXT_ALLOW_INDEXING_RULES | URLs containing robots meta or XâRobotsâTag must not be disallowed | Critical | Robots.txt Rules | URLs with indexing/serving rules cannot be disallowed from crawling via robots.txt. |
| NO_CLOAKING_DETECTED | Ensure no cloaking between Googlebot and users | High | A/B Testing | Cloaking violates spam policies and can cause demotion or removal from search results. |
| REL_CANONICAL_PRESENT | Use rel=”canonical” on test variant URLs | Medium | A/B Testing | rel=canonical signals the preferred URL, preventing duplicate indexing of test variants. |
| TEMPORARY_REDIRECT_302 | Use 302 redirects for temporary test redirects | Medium | A/B Testing | A 302 redirect signals a temporary change, ensuring the original URL remains indexed. |
| CANONICAL_LINK_IN_HEAD | rel=”canonical” link element must be placed in | High | Canonicalization | The rel=”canonical” link element is only accepted if it appears in the section. |
| CANONICAL_LINK_ABSOLUTE_URL | rel=”canonical” link element must use an absolute URL | Medium | Canonicalization | Documentation recommends using absolute URLs for rel=”canonical” link elements. |
| CANONICAL_HEADER_ABSOLUTE_URL | rel=”canonical” HTTP header must use an absolute URL | Medium | Canonicalization | Documentation states that absolute URLs must be used in the rel=”canonical” HTTP header. |
| AVOID_ROBOTS_TXT_FOR_CANONICAL | Do not use robots.txt for canonicalization | Low | Canonicalization | Documentation explicitly advises against using robots.txt for canonicalization. |
| CONSISTENT_CANONICAL_METHOD | Do not specify different canonical URLs using different methods | High | Canonicalization | Specifying different canonical URLs via different techniques can cause conflicts. |
Rule Categories and Included Rules
Technical Requirements
- PAGE_HTTP_200_STATUS (Critical): Page returns HTTP 200 status. Ensures the page responds with HTTP 200, not error or redirect.
- GOOGLEBOT_NOT_BLOCKED (High): Googlebot is not blocked by robots.txt. No Disallow rule for Googlebot (or *) matches the page URL.
- PAGE_INDEXABLE_CONTENT (Medium): Page has indexable textual content. Body contains at least one alphanumeric character after stripping markup.
- HEAD_SECTION_VALID_HTML (High): Head section must be valid HTML. The page contains exactly one <head> element and the HTML parses without syntax errors.
Spam Policies
- CLOAKING_DETECTION (High): Detect cloaking by comparing bot vs user content. Content served to Googlebot and regular users is substantially identical (⥠90% similarity).
- HIDDEN_TEXT_DETECTION (Medium): Detect hidden text or links intended for search. No elements with hidden styles contain visible text or links.
- KEYWORD_STUFFING_DETECTION (Medium): Detect excessive repetition of keywords. No single keyword exceeds a density of 5% of total words.
- SNEAKY_REDIRECT_DETECTION (High): Detect sneaky redirects for bots vs users. Both HTTP status codes and final URLs are identical for Googlebot and regular users.
Content Basics
- PAGE_TITLE_EXISTS (Low): Page title tag present and non-empty. Ensures the <title> tag exists and is not empty.
- MAIN_HEADING_EXISTS (Low): Main heading (<h1>) present and non-empty. At least one <h1> element exists with non-empty text.
- CRAWLABLE_LINKS (Low): All anchor links are crawlable (have valid href). Every <a> element has a non-empty href that does not start with “javascript:”.
- IMAGE_ALT_TEXT (Medium): All images have non-empty alt attributes. Every <img> element includes a non-empty alt attribute.
Content Optimization
- TITLE_DESCRIPTIVE (Medium): Page title is descriptive, specific, and accurate. Title text contains at least two words.
- META_DESCRIPTION_PRESENT (Medium): Meta description tag is present and descriptive. Meta description is present and its length is at least 50 characters.
- IMAGE_ALT_ATTRIBUTES (Medium): All images have descriptive alt attributes. Every <img> element has a non-empty alt attribute.
- HEADING_HIERARCHY (Low): Page uses heading elements for hierarchy. At least one <h1> element is present.
Security
- MALWARE_HOSTING_DETECTION (Critical): Detect presence of known malware signatures. Resource’s hash does not match any known malware signatures.
Dashboard Setup
- GA_FILTER_SOURCE_MEDIUM (Medium): GA data filtered to source=google, medium=organic. Both source=google and medium=organic filter conditions are present in dashboard config.
- DASHBOARD_METRICS_PRESENT (Medium): Dashboard includes required five metrics. All five required metrics are present in the dashboard configuration.
- DASHBOARD_DATA_SOURCES_CONNECTED (Medium): Dashboard connects to GA and SC. Both Search Console and Google Analytics data sources are referenced in dashboard config.
AMP Validation
- AMP_PAGE_MUST_FOLLOW_SPEC (Critical): AMP page must follow AMP HTML specification. Ensures the page complies with the AMP HTML specification for Google Search features.
Site Functionality
- BANNER_DATA_NOSNIPPET_PRESENT (Medium): Ensure banner or popup uses data-nosnippet attribute. Prevents banner or popup content from being shown in search result snippets.
Site Availability
- ROBOTS_TXT_NOT_503 (High): robots.txt must not return HTTP 503 status. A 503 response for robots.txt blocks all crawling, preventing indexing.
- RETRY_AFTER_HEADER_PRESENT_ON_503 (Medium): 503 error pages must include a Retry-After header. Provides crawlers with guidance on when to retry, reducing unnecessary load.
URL Structure
- NO_URL_FRAGMENTS (High): Avoid URL fragments that change content. Google Search may not crawl URLs where fragments are used to change content.
- HYPHENS_IN_PATH (Medium): Use hyphens to separate words in URL path. Hyphens improve readability for users and search engines, aiding crawlability.
- PERCENT_ENCODING_NECESSARY (Medium): Percentâencode nonâASCII characters in URLs. Percentâencoding ensures URLs are valid, crawlable, and correctly interpreted.
Canonicalization
- CHECK_REL_CANONICAL_PRESENT (Medium): Presence of rel=”canonical” link element. rel=”canonical” link annotations influence how Google determines the canonical URL.
- CHECK_URL_IN_SITEMAP (Medium): URL presence in sitemap.xml. Presence of the URL in a sitemap is a factor influencing canonical selection.
- CHECK_HTTP_HTTPS_CONSISTENCY (Low): Consistent use of HTTPS scheme. The page’s protocol (HTTP vs HTTPS) is a factor that influences canonicalization.
- CANONICAL_LINK_IN_HEAD (High): rel=”canonical” link element must be placed in <head>. The rel=”canonical” link element is only accepted if it appears in the <head> section.
- CANONICAL_LINK_ABSOLUTE_URL (Medium): rel=”canonical” link element must use an absolute URL. Documentation recommends using absolute URLs for rel=”canonical” link elements.
- CANONICAL_HEADER_ABSOLUTE_URL (Medium): rel=”canonical” HTTP header must use an absolute URL. Documentation states that absolute URLs must be used in the rel=”canonical” HTTP header.
- AVOID_ROBOTS_TXT_FOR_CANONICAL (Low): Do not use robots.txt for canonicalization. Documentation explicitly advises against using robots.txt for canonicalization.
- CONSISTENT_CANONICAL_METHOD (High): Do not specify different canonical URLs using different methods. Specifying different canonical URLs via different techniques can cause conflicts.
Data Nosnippet
- DATA_NOSNIPPET_VALID_HTML (High): Ensure HTML containing data-nosnippet attribute is wellâformed. HTML section must be valid HTML for dataânosnippet to be machineâreadable.
Robots.txt Rules
- ROBOTS_TXT_ALLOW_INDEXING_RULES (Critical): URLs containing robots meta or XâRobotsâTag must not be disallowed. URLs with indexing/serving rules cannot be disallowed from crawling via robots.txt.
A/B Testing
- NO_CLOAKING_DETECTED (High): Ensure no cloaking between Googlebot and users. Cloaking violates spam policies and can cause demotion or removal from search results.
- REL_CANONICAL_PRESENT (Medium): Use rel=”canonical” on test variant URLs. rel=canonical signals the preferred URL, preventing duplicate indexing of test variants.
- TEMPORARY_REDIRECT_302 (Medium): Use 302 redirects for temporary test redirects. A 302 redirect signals a temporary change, ensuring the original URL remains indexed.
How to Use
Read individual rule files for detailed explanations and code examples:
rules/PAGE_TITLE_EXISTS.md
rules/MAIN_HEADING_EXISTS.md
rules/IMAGE_ALT_TEXT.md
rules/CRAWLABLE_LINKS.md
rules/GOOGLEBOT_NOT_BLOCKED.md
rules/PAGE_HTTP_200_STATUS.md
rules/PAGE_INDEXABLE_CONTENT.md
rules/CLOAKING_DETECTION.md
rules/HIDDEN_TEXT_DETECTION.md
rules/KEYWORD_STUFFING_DETECTION.md
rules/SNEAKY_REDIRECT_DETECTION.md
rules/MALWARE_HOSTING_DETECTION.md
rules/TITLE_DESCRIPTIVE.md
rules/META_DESCRIPTION_PRESENT.md
rules/IMAGE_ALT_ATTRIBUTES.md
rules/HEADING_HIERARCHY.md
rules/GA_FILTER_SOURCE_MEDIUM.md
rules/DASHBOARD_METRICS_PRESENT.md
rules/DASHBOARD_DATA_SOURCES_CONNECTED.md
rules/HEAD_SECTION_VALID_HTML.md
rules/AMP_PAGE_MUST_FOLLOW_SPEC.md
rules/BANNER_DATA_NOSNIPPET_PRESENT.md
rules/ROBOTS_TXT_NOT_503.md
rules/RETRY_AFTER_HEADER_PRESENT_ON_503.md
rules/NO_URL_FRAGMENTS.md
rules/HYPHENS_IN_PATH.md
rules/PERCENT_ENCODING_NECESSARY.md
rules/CHECK_REL_CANONICAL_PRESENT.md
rules/CHECK_URL_IN_SITEMAP.md
rules/CHECK_HTTP_HTTPS_CONSISTENCY.md
rules/DATA_NOSNIPPET_VALID_HTML.md
rules/ROBOTS_TXT_ALLOW_INDEXING_RULES.md
rules/NO_CLOAKING_DETECTED.md
rules/REL_CANONICAL_PRESENT.md
rules/TEMPORARY_REDIRECT_302.md
rules/CANONICAL_LINK_IN_HEAD.md
rules/CANONICAL_LINK_ABSOLUTE_URL.md
rules/CANONICAL_HEADER_ABSOLUTE_URL.md
rules/AVOID_ROBOTS_TXT_FOR_CANONICAL.md
rules/CONSISTENT_CANONICAL_METHOD.md
Each rule file contains:
- Brief explanation of why it matters
- Incorrect example with explanation
- Correct example with explanation
- Additional context and references
Full Compiled Document
For the complete guide with all rules expanded: rules/_sections.md