ai-check
npx skills add https://github.com/astoreyai/ai_scientist --skill ai-check
Agent 安装分布
Skill 文档
AI-Generated Text Detection Skill
Purpose
Detect patterns typical of LLM-generated text to ensure natural, human-authored academic writing. This skill helps maintain authenticity in research publications, dissertations, and documentation.
When to Use This Skill
Primary Use Cases
- Pre-Commit Validation – Automatically check manuscripts and documentation before git commits
- Manuscript Review – Validate academic writing before submission to journals or committees
- Quality Assurance – Part of systematic QA workflow for research artifacts
- On-Demand Analysis – Manual review of any text file for AI patterns
- Writing Evolution Tracking – Monitor writing style changes over time
Specific Scenarios
- Before submitting dissertation chapters to advisor
- Prior to journal article submission
- When reviewing team member contributions
- After making significant edits to documentation
- When suspicious patterns are noticed in writing
- During peer review or committee review preparation
Detection Methodology
1. Grammar Pattern Analysis
What We Check:
- Excessive Perfection: Zero typos, missing commas, or minor errors throughout
- Comma Placement: Perfect comma usage in all complex sentences
- Formal Register: Consistent formal tone with no informal elements
- Grammar Consistency: No variations in grammatical choices
Red Flags:
- Absolutely no grammatical errors in 10+ pages
- Every semicolon and colon used perfectly
- No sentence fragments or run-ons even in appropriate contexts
- Overly formal language even in methods sections
Human Writing Typically Has:
- Occasional minor typos or comma splices
- Some inconsistency in formality
- Natural variations in grammar choices
- Context-appropriate informality
2. Sentence Structure Uniformity
What We Check:
- Sentence Length Distribution: Variation in sentence lengths
- Structural Patterns: Repetitive sentence structures
- Complexity Variation: Mix of simple, compound, complex sentences
- Opening Patterns: How sentences begin
Red Flags:
- Most sentences 15-25 words (AI sweet spot)
- Repetitive subject-verb-object patterns
- Every paragraph starts with topic sentence
- Excessive use of transition words at sentence starts
- Predictable sentence complexity patterns
Human Writing Typically Has:
- Wide variation (5-40+ word sentences)
- Unpredictable sentence structures
- Occasional fragments for emphasis
- Natural flow without forced transitions
3. Paragraph Structure Analysis
What We Check:
- Paragraph Length: Uniformity vs. natural variation
- Structural Pattern: Topic sentence + support + conclusion pattern
- Information Flow: Natural vs. algorithmic organization
- Paragraph Transitions: Connection between paragraphs
Red Flags:
- All paragraphs 4-6 sentences long
- Every paragraph follows same structure
- Mechanical transitions between paragraphs
- Perfectly balanced paragraph lengths
- No single-sentence paragraphs
Human Writing Typically Has:
- Paragraph length variation (1-10+ sentences)
- Structural diversity based on content
- Natural transitions
- Strategic use of short/long paragraphs for emphasis
4. Word Frequency Analysis (AI-Typical Words)
High-Risk AI Words (overused by LLMs):
Verbs:
- “delve” (rarely used by humans)
- “leverage” (business jargon)
- “utilize” (instead of “use”)
- “facilitate” (overly formal)
- “demonstrate” (overused)
- “implement” (in non-technical contexts)
- “enhance” (marketing language)
Adjectives:
- “robust” (technical overuse)
- “comprehensive” (vague intensifier)
- “innovative” (buzzword)
- “cutting-edge” (cliché)
- “significant” (statistical overuse)
- “substantial” (formal overuse)
- “considerable” (formal overuse)
- “crucial” (intensity overuse)
Transition Words (overused):
- “furthermore” (very formal)
- “moreover” (archaic feeling)
- “additionally” (redundant)
- “consequently” (overused)
- “subsequently” (temporal overuse)
- “nevertheless” (formal overuse)
- “nonetheless” (synonym overuse)
Phrases:
- “it is important to note that”
- “it should be emphasized that”
- “a comprehensive analysis of”
- “in the context of”
- “with respect to”
- “in terms of”
- “in order to” (instead of “to”)
Detection Criteria:
- Count frequency per 1000 words
- Compare to human academic writing baselines
- Flag if 3+ high-risk words per 1000 words
- Weight by word rarity (delve = high weight)
5. Punctuation Patterns
What We Check:
- Semicolon Usage: Frequency and correctness
- Colon Usage: Perfect usage patterns
- Em-Dash Usage: Consistent stylistic choices
- Comma Patterns: Perfection vs. natural variation
- Ellipsis/Exclamation: Absence in informal contexts
Red Flags:
- Excessive semicolon use (2+ per paragraph)
- Perfect colon usage throughout
- Consistent em-dash formatting (â)
- No missing commas anywhere
- Zero informal punctuation
Human Writing Typically Has:
- Inconsistent punctuation choices
- Occasional missing/extra commas
- Variable dash formatting (- vs — vs â)
- Some informal punctuation where appropriate
Confidence Scoring System
Scoring Formula
Overall Confidence = Weighted average of:
- Grammar perfection: 20%
- Sentence uniformity: 25%
- Paragraph structure: 20%
- AI-typical words: 25%
- Punctuation patterns: 10%
Each metric scored 0-100, then combined with weights.
Confidence Levels
Low Confidence (0-30%): Likely Human Writing
Characteristics:
- Natural sentence length variation (5-40+ words)
- Occasional grammatical imperfections
- Authentic voice and natural flow
- Domain-specific terminology used naturally
- Structural variety in paragraphs
- Minimal AI-typical words (0-2 per 1000 words)
Action: â Writing appears authentic, no changes needed
Medium Confidence (30-70%): Possible AI Assistance
Characteristics:
- Some uniformity in sentence structure
- Mix of AI-typical and natural patterns
- May be human-edited AI output
- Overly formal in places
- Some transition word overuse
- 3-5 AI-typical words per 1000 words
Action: â ï¸ Review flagged sections, apply suggestions selectively
Examples of Mixed Writing:
- AI-generated first draft with heavy human editing
- Human writing that mimics academic formality excessively
- Non-native English speakers using formal templates
- Multiple authors with different styles
High Confidence (70-100%): Likely AI-Generated
Characteristics:
- Excessive uniformity across all metrics
- Multiple AI-typical word clusters
- Perfect grammar and punctuation throughout
- Artificial transition patterns
- Mechanical paragraph structure
- 6+ AI-typical words per 1000 words
Action: ð« Significant revision needed, rewrite in authentic voice
Output Format
When running AI-check analysis, generate a comprehensive report:
1. Executive Summary
Overall Confidence Score: 65%
Status: MEDIUM - Possible AI assistance detected
Files Analyzed: 1
Total Words: 3,456
Recommendation: Review flagged sections
2. Metric Breakdown
Grammar Perfection: 85% (High - suspiciously few errors)
Sentence Uniformity: 72% (High - repetitive structures)
Paragraph Structure: 68% (Medium - some variation)
AI-Typical Words: 58% (Medium - 4.2 per 1000 words)
Punctuation Patterns: 45% (Low - natural variation)
3. Flagged Sections
Lines 45-67 (Confidence: 82%)
Pattern: Excessive transition words + uniform sentences
AI Words: "moreover", "furthermore", "leverage", "robust"
Lines 112-134 (Confidence: 76%)
Pattern: Perfect grammar + mechanical structure
AI Words: "delve", "comprehensive", "facilitate"
4. Specific Issues Detected
High-Risk AI Words Found (per 1000 words):
⢠"delve" (2 occurrences) - RARELY used by humans
⢠"leverage" (3 occurrences) - Business jargon overuse
⢠"robust" (4 occurrences) - Technical overuse
⢠"furthermore" (6 occurrences) - Formal transition overuse
Sentence Uniformity Issues:
⢠67% of sentences are 15-25 words (AI sweet spot)
⢠82% of paragraphs start with transition words
⢠Low variation in sentence complexity
Paragraph Structure Issues:
⢠All paragraphs 4-6 sentences long
⢠Mechanical topic-sentence pattern throughout
5. Word Frequency Report
Top AI-Typical Words:
1. "furthermore" - 6x (baseline: 0.5x per 1000 words)
2. "robust" - 4x (baseline: 0.8x per 1000 words)
3. "leverage" - 3x (baseline: 0.3x per 1000 words)
4. "comprehensive" - 3x (baseline: 1.2x per 1000 words)
5. "delve" - 2x (baseline: 0.1x per 1000 words)
Comparison to Human Academic Writing:
Your text: 4.2 AI-typical words per 1000
Human baseline: 1.5 AI-typical words per 1000
Ratio: 2.8x higher than human baseline
Improvement Suggestions
For High Confidence (70-100%) Detections
Sentence Structure:
- â “Furthermore, the results demonstrate a comprehensive analysis of the robust dataset.”
- â “The results show our analysis covered the full dataset.”
Why Better: Simpler words, no transition word, more direct
Word Choice:
- â “This study delves into the utilization of innovative methodologies.”
- â “We examine how researchers use new methods.”
Why Better: Active voice, common words, clearer meaning
Paragraph Variation:
- â All paragraphs 5 sentences, topic sentence + 3 support + conclusion
- â Mix paragraph lengths: 2, 7, 4, 3, 6 sentences based on content needs
Why Better: Natural flow based on content, not formula
Specific Suggestion Categories
1. Vary Sentence Lengths
Current: 15-25 word sentences consistently
Suggestion: Mix short (5-10), medium (15-20), long (25-35) sentences
Example:
- Short: "The effect was significant."
- Medium: "We observed a 23% increase across all conditions."
- Long: "This finding aligns with previous work showing that..."
2. Replace AI-Typical Words
Replace â With
- "delve into" â "examine", "explore", "investigate"
- "leverage" â "use", "apply", "employ"
- "utilize" â "use"
- "robust" â "strong", "reliable", "thorough"
- "facilitate" â "enable", "help", "allow"
- "furthermore" â "also", "next", [or remove]
- "moreover" â "additionally", "also", [or use dash]
- "comprehensive" â "complete", "thorough", "full"
3. Add Natural Imperfections (Where Appropriate)
- Use contractions in appropriate contexts ("it's", "we'll")
- Include domain-specific jargon naturally
- Allow informal phrasing in methods/procedures
- Use occasional sentence fragments for emphasis
- Add personal observations or interpretations
- Include field-specific colloquialisms
4. Break Paragraph Uniformity
Current: All paragraphs follow topic-support-support-support-conclusion
Suggestion: Vary based on content
- Use single-sentence paragraphs for emphasis
- Combine related ideas into longer paragraphs
- Don't force every paragraph to have 5 sentences
- Let content determine structure, not formula
5. Remove Mechanical Transitions
â "Furthermore, the results show... Moreover, the analysis reveals..."
â
"The results show... The analysis also reveals..." [simpler transitions]
â
"The results show... Looking closer, the analysis..." [natural bridges]
Integration Points
1. Pre-Commit Hook Integration
Automatic checking before git commits
# Configured in .claude/settings.json
"gitPreCommit": {
"command": "python3 hooks/pre-commit-ai-check.py",
"enabled": true
}
Behavior:
- Runs on staged
.md,.tex,.rstfiles - Warns if confidence 30-70%
- Blocks commit if confidence >70%
- User can override with
git commit --no-verify
Exit Codes:
- 0: Pass (confidence <30%)
- 1: Warning (confidence 30-70%, commit allowed)
- 2: Block (confidence >70%, commit blocked)
2. Quality Assurance Integration
Part of comprehensive QA workflow
Integrated into code/quality_assurance/qa_manager.py:
- Runs during manuscript phase QA validation
- Checks all deliverable documents
- Generates detailed QA report section
- Fails QA if confidence >40%
Configuration (.ai-check-config.yaml):
qa_integration:
enabled: true
max_confidence_threshold: 0.40
check_manuscripts: true
check_documentation: true
generate_detailed_reports: true
3. Manuscript Writer Agent Integration
Real-time feedback during writing
Agent checks writing incrementally:
- After drafting each section
- Before moving to next phase
- Applies suggestions automatically
- Re-checks until confidence <30%
Agent Workflow:
- Draft section
- Run ai-check skill
- Review detection results
- Apply improvement suggestions
- Re-check until authentic
- Proceed to next section
4. Standalone Skill Usage
Manual invocation by user or agents
User Invocation:
Please run ai-check on docs/manuscript/discussion.tex and provide detailed feedback.
Agent Invocation:
I'll use the ai-check skill to verify this text before proceeding.
CLI Tool:
python tools/ai_check.py path/to/file.md
python tools/ai_check.py --directory docs/
python tools/ai_check.py --format html --output report.html
Tracking System
Historical Tracking
Log all AI-check runs to database for evolution tracking:
Database Schema (PostgreSQL via research-database MCP):
CREATE TABLE ai_check_history (
id SERIAL PRIMARY KEY,
file_path TEXT NOT NULL,
git_commit TEXT,
timestamp TIMESTAMP DEFAULT NOW(),
overall_confidence FLOAT,
grammar_score FLOAT,
sentence_score FLOAT,
paragraph_score FLOAT,
word_score FLOAT,
punctuation_score FLOAT,
ai_words_found JSONB,
flagged_sections JSONB
);
Trend Analysis
Track writing evolution:
File: docs/manuscript/discussion.tex
Version History:
2025-01-15: 78% confidence (HIGH - likely AI)
2025-01-18: 52% confidence (MEDIUM - revision 1)
2025-01-20: 34% confidence (LOW-MEDIUM - revision 2)
2025-01-22: 18% confidence (LOW - authentic writing)
Trend: â
Improving toward authentic writing
Use Cases:
- Monitor dissertation chapters over time
- Track improvements after applying suggestions
- Demonstrate writing authenticity to committee
- Identify sections needing more work
Configuration
Configuration File: .ai-check-config.yaml
# AI-Check Skill Configuration
# Pre-Commit Hook Settings
pre_commit:
enabled: true
check_files: [".md", ".tex", ".rst", ".txt"]
check_docstrings: true # Check Python docstrings
block_threshold: 0.70 # Block commit if >= 70%
warn_threshold: 0.30 # Warn if >= 30%
exclude_patterns:
- "*/examples/*"
- "*/tests/*"
- "*/node_modules/*"
- "*/.venv/*"
# Quality Assurance Integration
qa_integration:
enabled: true
max_confidence_threshold: 0.40 # Fail QA if >= 40%
check_manuscripts: true
check_documentation: true
generate_detailed_reports: true
track_history: true
# Detection Parameters
detection:
# Weight each metric (must sum to 1.0)
weights:
grammar_perfection: 0.20
sentence_uniformity: 0.25
paragraph_structure: 0.20
ai_word_frequency: 0.25
punctuation_patterns: 0.10
# AI-typical word lists
ai_words:
high_risk: ["delve", "leverage", "utilize"]
medium_risk: ["robust", "comprehensive", "facilitate"]
transitions: ["furthermore", "moreover", "additionally"]
# Thresholds
ai_words_per_1000_threshold: 3.0
human_baseline_per_1000: 1.5
# Report Generation
reporting:
default_format: "markdown" # markdown, json, html
include_suggestions: true
include_word_frequency: true
include_flagged_sections: true
max_flagged_sections: 10
# Tracking
tracking:
enabled: true
database: "research-database-mcp"
retention_days: 365
Per-Project Overrides
Create .ai-check.local.yaml for project-specific settings:
# Project-specific overrides
pre_commit:
block_threshold: 0.60 # More lenient for early drafts
detection:
ai_words:
high_risk: ["delve"] # Only flag worst offenders
Examples
Example 1: High Confidence Detection
Input Text:
Furthermore, this comprehensive study delves into the robust
methodologies utilized to facilitate the implementation of innovative
approaches. Moreover, the analysis demonstrates significant findings
that leverage state-of-the-art techniques. Subsequently, the results
indicate substantial improvements across all metrics. Nevertheless,
additional research is crucial to fully comprehend the implications.
AI-Check Report:
Overall Confidence: 89% (HIGH - Likely AI-generated)
Issues Detected:
- 8 AI-typical words in 60 words (13.3 per 1000 words!)
- Every sentence starts with transition word
- Uniform sentence length (15-18 words each)
- Perfect grammar, zero natural imperfections
- Mechanical paragraph structure
AI Words Found:
- furthermore, comprehensive, delves, robust
- utilized, facilitate, innovative, leverage
- demonstrates, significant, subsequently, substantial
- nevertheless, crucial, comprehend
Recommendation: Complete rewrite recommended
Suggested Revision:
We examined the methods used in this approach. The analysis shows
clear improvements across metrics. However, more research is needed
to understand the full implications.
(23 words, 12% confidence - much more natural)
Example 2: Medium Confidence Detection
Input Text:
The experimental design followed standard protocols established in
previous work (Smith et al., 2023). We collected data from 150
participants over six months. Statistical analysis used mixed-effects
models to account for repeated measures. The results showed a
significant main effect of condition (p < 0.001).
AI-Check Report:
Overall Confidence: 35% (MEDIUM - Possible minor AI assistance)
Issues Detected:
- Slightly uniform sentence length (11-15 words)
- One AI-typical word: "significant" (statistical context acceptable)
- Otherwise natural academic writing
Recommendation: Minor revisions optional, writing appears largely authentic
Example 3: Low Confidence (Human Writing)
Input Text:
OK so here's what we found. The effect was huge - way bigger than
expected. Participants in the experimental group scored 23% higher
on average. This wasn't just statistically significant; it was
practically meaningful.
We're still not sure why. Maybe it's the timing? Could be the
instructions were clearer. Need to run follow-ups.
AI-Check Report:
Overall Confidence: 8% (LOW - Clearly human writing)
Human Writing Indicators:
- Natural sentence variation (4-19 words)
- Informal elements ("OK so", "way bigger")
- Incomplete thoughts and questions
- Natural uncertainty expressions
- Zero AI-typical words
- Authentic voice throughout
Recommendation: Writing is authentic, no changes needed
Best Practices
For PhD Students
-
Run Before Advisor Meetings
- Check chapters before sending to advisor
- Ensure authenticity before committee review
- Track improvements over time
-
Use During Drafting
- Check each section after writing
- Apply suggestions immediately
- Develop natural writing habits
-
Pre-Submission Validation
- Run on complete manuscripts before journal submission
- Check supplementary materials
- Verify all documentation
For Research Teams
-
Establish Team Standards
- Set agreed-upon confidence thresholds
- Define when to block vs. warn
- Create team-specific word lists
-
Code Review Integration
- Check documentation in pull requests
- Validate README files and guides
- Ensure authentic technical writing
-
Track Team Writing
- Monitor trends across team members
- Identify systematic issues
- Share improvement strategies
For Journal Submission
-
Pre-Submission Checklist
- Overall confidence <30%
- No flagged high-risk sections
- AI-typical words <2 per 1000 words
- Natural sentence variation present
- Authentic academic voice throughout
-
Demonstrating Authenticity
- Include AI-check reports in submission materials
- Show writing evolution over time
- Document revision process
Limitations
What This Skill Cannot Do
-
Not 100% Accurate
- LLMs constantly improving
- Patterns evolve over time
- False positives possible (very formal human writing)
- False negatives possible (heavily edited AI text)
-
Cannot Detect All AI Usage
- Well-edited AI text may pass
- Human writing in AI style may be flagged
- Paraphrasing tools may evade detection
- Future models may have different patterns
-
Domain Limitations
- Trained primarily on academic writing
- May not work well for creative writing
- Technical jargon may affect scores
- Non-English text not supported
Use Alongside Human Judgment
This skill is a tool, not a replacement for human judgment:
- Use confidence scores as guidance, not absolute truth
- Consider context and field-specific norms
- Combine with plagiarism detection tools
- Maintain academic integrity standards
- Update word lists as AI patterns evolve
Support
Troubleshooting
Problem: False positive on authentic writing Solution: Check if writing is overly formal. Consider field-specific norms. Adjust thresholds in config.
Problem: AI text passing with low confidence Solution: Update AI-typical word lists. Check for heavily edited text. Report patterns for skill updates.
Problem: Pre-commit hook too slow Solution: Reduce checked file types. Enable caching. Check only modified sections.
Problem: Disagreement with manual review Solution: Generate detailed report. Review flagged sections specifically. Consider multiple metrics not just overall score.
Getting Help
- Documentation: See
docs/skills/ai-check-reference.md - Issues: https://github.com/astoreyai/ai_scientist/issues
- Updates: Check for skill updates regularly as AI patterns evolve
Last Updated: 2025-11-09 Version: 1.0.0 License: MIT