devtu-docs-quality
npx skills add https://github.com/mims-harvard/tooluniverse --skill devtu-docs-quality
Documentation Quality Assurance
A systematic documentation quality process that combines automated validation scripts with ToolUniverse-specific structural audits.
When to Use
- Pre-release documentation review
- After major refactoring (commands, APIs, tool counts changed)
- User reports confusing or outdated documentation
- Circular navigation or structural problems suspected
- An automated validation pipeline needs to be established
Approach: Two-Phase Strategy
Phase A: Automated Validation (15-20 min)
- Create validation scripts for systematic detection
- Test commands, links, terminology consistency
- Priority-based fixes (blockers → polish)
Phase B: ToolUniverse-Specific Audit (20-25 min)
- Circular navigation checks
- MCP configuration duplication
- Tool count consistency
- Auto-generated file conflicts
Phase A: Automated Validation
A1. Build Validation Script
Create scripts/validate_documentation.py:
#!/usr/bin/env python3
"""Documentation validator for ToolUniverse"""
import glob
import re
import sys
from pathlib import Path

DOCS_ROOT = Path("docs")

# ToolUniverse-specific patterns: (deprecated regex, suggested replacement)
DEPRECATED_PATTERNS = [
    (r"python -m tooluniverse\.server", "tooluniverse-server"),
    (r"600\+?\s+tools", "1000+ tools"),
    (r"750\+?\s+tools", "1000+ tools"),
]


def is_false_positive(match, content):
    """Smart context checking to avoid false positives"""
    # Look at 100 characters on either side of the match
    start = max(0, match.start() - 100)
    end = min(len(content), match.end() + 100)
    context = content[start:end].lower()
    # Skip if discussing deprecation itself
    if any(kw in context for kw in ['deprecated', 'old version', 'migration']):
        return True
    # Skip technical values (ports, dimensions, etc.)
    if any(kw in context for kw in ['width', 'height', 'port', '":"']):
        return True
    return False


def validate_file(filepath):
    """Check one file for issues"""
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()
    issues = []
    # Check deprecated patterns
    for old_pattern, new_text in DEPRECATED_PATTERNS:
        for match in re.finditer(old_pattern, content):
            if is_false_positive(match, content):
                continue
            # 1-based line number of the match
            line_num = content[:match.start()].count('\n') + 1
            issues.append({
                'file': filepath,
                'line': line_num,
                'severity': 'HIGH',
                'found': match.group(),
                'suggestion': new_text
            })
    return issues


# Scan all docs
all_issues = []
for doc_file in glob.glob(str(DOCS_ROOT / "**/*.md"), recursive=True):
    all_issues.extend(validate_file(doc_file))
for doc_file in glob.glob(str(DOCS_ROOT / "**/*.rst"), recursive=True):
    all_issues.extend(validate_file(doc_file))

# Report; a non-zero exit code lets CI fail the build
if all_issues:
    print(f"❌ Found {len(all_issues)} issues\n")
    for issue in all_issues:
        print(f"{issue['file']}:{issue['line']} [{issue['severity']}]")
        print(f"  Found: {issue['found']}")
        print(f"  Should be: {issue['suggestion']}\n")
    sys.exit(1)
else:
    print("✅ Documentation validation passed")
    sys.exit(0)
A2. Command Accuracy Check
Test that commands in docs actually work:
# Extract and test commands
grep -r "^\s*\$\s*" docs/ | while IFS= read -r line; do
    # Take the first word after the "$ " prompt as the command name
    cmd=$(echo "$line" | sed 's/.*\$ //' | cut -d' ' -f1)
    if ! command -v "$cmd" > /dev/null 2>&1; then
        echo "❌ Command not found: $cmd in $line"
    fi
done
A3. Link Integrity Check
For RST docs:
import glob
import re
from pathlib import Path


def check_rst_links(docs_root):
    """Validate :doc: references"""
    pattern = r':doc:`([^`]+)`'
    for rst_file in glob.glob(f"{docs_root}/**/*.rst", recursive=True):
        with open(rst_file, encoding='utf-8') as f:
            content = f.read()
        for match in re.finditer(pattern, content):
            ref = match.group(1)
            # Check if target exists (resolved against docs_root)
            possible = [f"{ref}.rst", f"{ref}.md", f"{ref}/index.rst"]
            if not any(Path(docs_root, p).exists() for p in possible):
                print(f"❌ Broken link in {rst_file}: {ref}")
A4. Terminology Consistency
Track variations and standardize:
# Define standard terms
TERMINOLOGY = {
    'api_endpoint': ['endpoint', 'url', 'route', 'path'],
    'tool_count': ['tools', 'resources', 'integrations'],
}


def check_terminology(content):
    """Find inconsistent terminology"""
    for standard, variations in TERMINOLOGY.items():
        counts = {v: content.lower().count(v) for v in variations}
        if len([c for c in counts.values() if c > 0]) > 2:
            return f"Inconsistent terminology: {counts}"
    return None
Phase B: ToolUniverse-Specific Audit
B1. Circular Navigation Check
Issue: Documentation pages that reference each other in loops.
Check manually:
# Find cross-references
grep -r ":doc:\`" docs/*.rst | grep -E "(quickstart|getting_started|installation)"
Checklist:
- Is there a clear “Start Here” on docs/index.rst?
- Does navigation follow a linear path: index → quickstart → getting_started → guides?
- No “you should have completed X first” statements that create dependency loops?
Common patterns to fix:
- quickstart.rst → “See getting_started”
- getting_started.rst → “Complete quickstart first”
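The manual trace can be complemented by a small script. The following is a minimal sketch rather than part of the skill: it assumes pages are addressed by flat names without relative paths, and the helper names `doc_targets` and `find_cycle` are hypothetical.

```python
import re

DOC_REF = re.compile(r":doc:`([^`]+)`")


def doc_targets(text):
    """Extract :doc: targets, handling the 'Title <target>' form."""
    targets = set()
    for ref in DOC_REF.findall(text):
        explicit = re.search(r"<([^>]+)>\s*$", ref)
        target = explicit.group(1) if explicit else ref
        targets.add(target.strip().lstrip("/"))
    return targets


def find_cycle(pages):
    """Return one reference loop as a list of page names, or None.

    `pages` maps a page name (no extension) to its RST source text.
    Assumption: targets are flat page names; relative paths are not resolved.
    """
    graph = {name: doc_targets(text) & set(pages) for name, text in pages.items()}
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Close the loop by repeating the page where it started
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in sorted(graph[node]):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for name in sorted(graph):
        cycle = visit(name)
        if cycle:
            return cycle
    return None
```

Not every loop is a problem, since a guide may legitimately link back to the index; the result is a list of candidates to review by hand.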
B2. Duplicate Content Check
Common duplicates in ToolUniverse:
- Multiple FAQs: docs/faq.rst and docs/help/faq.rst
- Getting started: docs/installation.rst, docs/quickstart.rst, docs/getting_started.rst
- MCP configuration: All files in docs/guide/building_ai_scientists/
Detection:
# Find MCP config duplication
rg "MCP.*configuration" docs/ -l | wc -l
rg "pip install tooluniverse" docs/ -l | wc -l
Action: Consolidate or clearly differentiate
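Counting matching files shows where a topic is mentioned, but not whether the text is actually copied. As a rough sketch under stated assumptions (paragraphs are separated by blank lines, and the hypothetical helper `duplicated_paragraphs` and its `min_chars` threshold are illustrative only), identical paragraphs across files can be found like this:

```python
import re
from collections import defaultdict


def duplicated_paragraphs(files, min_chars=80):
    """Return {normalized paragraph: sorted paths} for paragraphs found in 2+ files.

    `files` maps a path to its text. Paragraphs are compared after collapsing
    whitespace and lowercasing; short ones (headings, one-liners) are ignored.
    """
    seen = defaultdict(set)
    for path, text in files.items():
        for para in re.split(r"\n\s*\n", text):
            norm = " ".join(para.split()).lower()
            if len(norm) >= min_chars:
                seen[norm].add(path)
    return {norm: sorted(paths) for norm, paths in seen.items() if len(paths) > 1}
```

Exact matching misses lightly reworded copies, so an empty result does not prove the pages are distinct.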
B3. Tool Count Consistency
Standard: Use “1000+ tools” consistently.
Detection:
# Find all tool count mentions
rg "[0-9]+\+?\s+(tools|resources|integrations)" docs/ --no-filename | sort -u
Check:
- Are different numbers used (600, 750, 1195)?
- Is “1000+ tools” used consistently?
- Exact counts avoided in favor of “1000+”?
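The same check can be scripted so that it returns only the mentions that deviate from the standard. This is a minimal sketch; the function names are hypothetical and the pattern mirrors the `rg` expression above, with thousands separators additionally tolerated.

```python
import re
from collections import Counter

COUNT_MENTION = re.compile(
    r"\b(\d[\d,]*)(\+?)\s+(tools|resources|integrations)\b", re.IGNORECASE
)
STANDARD = "1000+ tools"


def count_mentions(text):
    """Tally every '<number>[+] tools|resources|integrations' phrase, normalized."""
    tally = Counter()
    for number, plus, noun in COUNT_MENTION.findall(text):
        tally[f"{number.replace(',', '')}{plus} {noun.lower()}"] += 1
    return tally


def nonstandard_mentions(text, standard=STANDARD):
    """Return the sorted distinct mentions that differ from the standard phrase."""
    return sorted(m for m in count_mentions(text) if m != standard)
```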
B4. Auto-Generated File Headers
Auto-generated directories:
- docs/tools/*_tools.rst (from generate_config_index.py)
- docs/api/*.rst (from sphinx-apidoc)
Required header:
.. AUTO-GENERATED - DO NOT EDIT MANUALLY
.. Generated by: docs/generate_config_index.py
.. Last updated: 2024-02-05
..
.. To modify, edit source files and regenerate.
Check:
head -5 docs/tools/*_tools.rst | grep "AUTO-GENERATED"
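For use inside a validation script, the header check might look like the sketch below. It assumes the marker sits in an RST comment line (starting with `..`) within the first five lines; both function names are hypothetical.

```python
HEADER_MARKER = "AUTO-GENERATED"


def has_generated_header(text, max_lines=5):
    """True if the marker appears in an RST comment within the first lines."""
    for line in text.splitlines()[:max_lines]:
        if line.startswith("..") and HEADER_MARKER in line:
            return True
    return False


def files_missing_header(files):
    """`files` maps path -> content; return the sorted paths lacking the header."""
    return sorted(path for path, text in files.items() if not has_generated_header(text))
```

Requiring the `..` prefix keeps ordinary prose that merely mentions the marker from passing the check.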
B5. CLI Tools Documentation
Check pyproject.toml for all CLIs:
grep -A 20 "\[project.scripts\]" pyproject.toml
Common undocumented:
- tooluniverse-expert-feedback
- tooluniverse-expert-feedback-web
- generate-mcp-tools
Action: Ensure every CLI is documented in docs/reference/cli_tools.rst
B6. Environment Variables
Discovery:
# Find all env vars in code
rg "os\.getenv|os\.environ" src/tooluniverse/ -o | sort -u
rg "TOOLUNIVERSE_[A-Z_]+" src/tooluniverse/ -o | sort -u
Categories to document:
- Cache: TOOLUNIVERSE_CACHE_*
- Logging: TOOLUNIVERSE_LOG_*
- LLM: TOOLUNIVERSE_LLM_*
- API keys: *_API_KEY
Check:
- Does docs/reference/environment_variables.rst exist?
- Are variables categorized?
- Each has: default, description, example?
- Is there .env.template at project root?
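To turn the discovery step into a gap report, variable names found in the code can be compared with the reference page. This is a sketch under the assumption that variables are read with a string literal via `os.getenv(...)`, `os.environ.get(...)` or `os.environ[...]`; names built dynamically will be missed, and the function names are hypothetical.

```python
import re

# Matches os.getenv("NAME"), os.environ.get("NAME") and os.environ["NAME"]
ENV_READ = re.compile(
    r"""os\.(?:getenv\(|environ\.get\(|environ\[)\s*["']([A-Z][A-Z0-9_]*)["']"""
)


def env_vars_in_source(source):
    """Return the set of environment variable names read in the source text."""
    return set(ENV_READ.findall(source))


def undocumented_env_vars(source, reference_text):
    """Return the sorted variable names that the reference page never mentions."""
    return sorted(
        name
        for name in env_vars_in_source(source)
        # \b keeps FOO from matching inside FOO_BAR, since "_" is a word character
        if not re.search(rf"\b{re.escape(name)}\b", reference_text)
    )
```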
B7. ToolUniverse-Specific Jargon
Terms to define on first use:
- Tool Specification
- EFO ID
- MCP, SMCP
- Compact Mode
- Tool Finder
- AI Scientist
Check:
- Is there docs/glossary.rst?
- Terms defined inline with :term: references?
- Glossary linked from main index?
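Glossary coverage can be checked mechanically. The sketch below assumes the glossary uses the Sphinx `.. glossary::` directive, with each term on its own line at the directive's body indentation and definitions indented further; the parser is deliberately simplified and the function names are hypothetical.

```python
REQUIRED_TERMS = [
    "Tool Specification", "EFO ID", "MCP", "SMCP",
    "Compact Mode", "Tool Finder", "AI Scientist",
]


def glossary_terms(rst_text):
    """Collect the term lines found under '.. glossary::' directives."""
    terms, term_indent, in_glossary = set(), None, False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. glossary::"):
            in_glossary, term_indent = True, None
            continue
        if not in_glossary or not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            in_glossary = False  # unindented text ends the directive body
            continue
        if term_indent is None and stripped.startswith(":"):
            continue  # directive option such as :sorted:
        if term_indent is None:
            term_indent = indent
        if indent == term_indent:
            terms.add(stripped)
    return terms


def missing_terms(required, rst_text):
    """Return the required terms with no glossary entry, in the given order."""
    defined = glossary_terms(rst_text)
    return [term for term in required if term not in defined]
```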
B8. CI/CD Documentation Regeneration
Required in .github/workflows/deploy-docs.yml:
- name: Regenerate tool documentation
  run: |
    cd docs
    python generate_config_index.py
    python generate_remote_tools_docs.py
    python generate_tool_reference.py
Check:
- CI/CD regenerates docs before build?
- Regeneration happens BEFORE Sphinx build?
- docs/api/ excluded from cache?
Priority Framework
Issue Severity
| Severity | Definition | Examples | Timeline |
|---|---|---|---|
| CRITICAL | Blocks release | Broken builds, dangerous instructions | Immediate |
| HIGH | Blocks users | Wrong commands, broken setup | Same day |
| MEDIUM | Causes confusion | Inconsistent terminology, unclear examples | Same week |
| LOW | Reduces quality | Long files, minor formatting | Future task |
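The severity levels in the table translate directly into a sort order for collected issues. A minimal sketch, assuming issue dictionaries with `file`, `line` and `severity` keys as produced by the validator in A1; `prioritize` and `blocks_release` are hypothetical helper names.

```python
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def prioritize(issues):
    """Sort issues by severity (most urgent first), then by file and line."""
    rank = {name: position for position, name in enumerate(SEVERITY_ORDER)}
    return sorted(
        issues,
        key=lambda issue: (rank[issue["severity"]], issue["file"], issue["line"]),
    )


def blocks_release(issues):
    """True if any issue is CRITICAL, the only level defined as blocking a release."""
    return any(issue["severity"] == "CRITICAL" for issue in issues)
```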
Fix Order
- Run automated validation → Fix HIGH issues
- Check circular navigation → Fix CRITICAL loops
- Verify tool counts → Standardize to “1000+”
- Check auto-generated headers → Add missing
- Validate CLI docs → Document all from pyproject.toml
- Check env vars → Create reference page
- Review jargon → Create/update glossary
- Verify CI/CD → Add regeneration steps
Validation Checklist
Before considering docs “done”:
Accuracy
- Automated validation passes
- All commands tested
- Version numbers current
- Counts match reality
Structure (ToolUniverse-specific)
- No circular navigation
- Clear “Start Here” entry point
- Linear learning path
- Max 2-3 level hierarchy
Consistency
- “1000+ tools” everywhere
- Same terminology throughout
- Auto-generated files have headers
- All CLIs documented
Completeness
- All features documented
- All CLIs in pyproject.toml covered
- All env vars documented
- Glossary includes all jargon
Output: Audit Report
# Documentation Quality Report
**Date**: [date]
**Scope**: Automated validation + ToolUniverse audit
## Executive Summary
- Files scanned: X
- Issues found: Y (Critical: A, High: B, Medium: C, Low: D)
## Critical Issues
1. **[Issue]** - Location: file:line
   - Problem: [description]
   - Fix: [action]
   - Effort: [time]
## Automated Validation Results
- Deprecated commands: X instances
- Inconsistent counts: Y instances
- Broken links: Z instances
## ToolUniverse-Specific Findings
- Circular navigation: [yes/no]
- Tool count variations: [list]
- Missing CLI docs: [list]
- Auto-generated headers: X missing
## Recommendations
1. Immediate (today): [list]
2. This week: [list]
3. Next sprint: [list]
## Validation Command
Run `python scripts/validate_documentation.py` to verify fixes
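The Executive Summary section of this template can be filled in from the validator's output instead of by hand. A small sketch, assuming each issue is a dictionary with a `severity` key as in A1; `executive_summary` is a hypothetical helper.

```python
from collections import Counter

SEVERITY_LABELS = [
    ("Critical", "CRITICAL"),
    ("High", "HIGH"),
    ("Medium", "MEDIUM"),
    ("Low", "LOW"),
]


def executive_summary(files_scanned, issues):
    """Render the '## Executive Summary' section as markdown text."""
    counts = Counter(issue["severity"] for issue in issues)
    # Counter returns 0 for severities that never occurred
    breakdown = ", ".join(f"{label}: {counts[key]}" for label, key in SEVERITY_LABELS)
    return "\n".join([
        "## Executive Summary",
        f"- Files scanned: {files_scanned}",
        f"- Issues found: {len(issues)} ({breakdown})",
    ])
```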
CI/CD Integration
Add to .github/workflows/validate-docs.yml:
name: Validate Documentation
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run validation
        run: python scripts/validate_documentation.py
      - name: Check auto-generated headers
        run: |
          for f in docs/tools/*_tools.rst; do
            if ! head -1 "$f" | grep -q "AUTO-GENERATED"; then
              echo "Missing header: $f"
              exit 1
            fi
          done
Common Issues Quick Reference
| Issue | Detection | Fix |
|---|---|---|
| Deprecated command | rg "old-cmd" docs/ | Replace with new-cmd |
| Wrong tool count | rg "[0-9]+ tools" docs/ | Change to “1000+ tools” |
| Circular nav | Manual trace | Remove back-references |
| Missing header | head -1 file.rst | Add AUTO-GENERATED header |
| Undocumented CLI | Check pyproject.toml | Add to cli_tools.rst |
| Missing env var | rg "os.getenv" src/ | Add to env vars reference |
Best Practices
- Automate first – Build validation before manual audit
- Context matters – Smart pattern matching avoids false positives
- Fix systematically – Batch similar issues together
- Validate continuously – Add to CI/CD pipeline
- ToolUniverse-specific last – Automated checks catch most issues
Success Criteria
Documentation quality achieved when:
- ✅ Automated validation reports 0 HIGH issues
- ✅ No circular navigation
- ✅ “1000+ tools” used consistently
- ✅ All auto-generated files have headers
- ✅ All CLIs from pyproject.toml documented
- ✅ All env vars have reference page
- ✅ Glossary covers all technical terms
- ✅ CI/CD validates on every PR