deep-dive-analysis
npx skills add https://github.com/acaprino/alfio-claude-plugins --skill deep-dive-analysis
Deep Dive Analysis Skill
Overview
This skill combines mechanical structure extraction with Claude’s semantic understanding to produce comprehensive codebase documentation. Unlike simple AST parsing, this skill captures:
- WHAT the code does (structure, functions, classes)
- WHY it exists (business purpose, design decisions)
- HOW it integrates (dependencies, contracts, flows)
- CONSEQUENCES of changes (side effects, failure modes)
Capabilities
Mechanical Analysis (Scripts):
- Extract code structure (classes, functions, imports)
- Map dependencies (internal/external)
- Find symbol usages across the codebase
- Track analysis progress
- Classify files by criticality
Semantic Analysis (Claude AI):
- Recognize architectural and design patterns
- Identify red flags and anti-patterns
- Trace data and control flows
- Document contracts and invariants
- Assess quality and maintainability
Documentation Maintenance:
- Review and maintain documentation (Phase 8)
- Fix broken links and update navigation indexes
- Analyze and rewrite code comments (antirez standards)
Use this skill when:
- Analyzing a codebase you’re unfamiliar with
- Generating documentation that explains WHY, not just WHAT
- Identifying architectural patterns and anti-patterns
- Performing code review with semantic understanding
- Onboarding to a new project
- Creating documentation for new contributors
Prerequisites
- analysis_progress.json must exist in project root (created by DEEP_DIVE_PLAN setup)
- DEEP_DIVE_PLAN.md should be reviewed to understand phase structure
CRITICAL PRINCIPLE: ABSOLUTE SOURCE OF TRUTH
THE DOCUMENTATION GENERATED BY THIS SKILL IS THE ABSOLUTE AND UNQUESTIONABLE SOURCE OF TRUTH FOR YOUR PROJECT.
ANY INFORMATION NOT VERIFIED WITH IRREFUTABLE EVIDENCE FROM SOURCE CODE IS FALSE, UNRELIABLE, AND UNACCEPTABLE.
IMPORTANT LIMITATION: Verification is Multi-Layer
────────────────────────────────────────────────────────────────────────
VERIFICATION TRUST MODEL
────────────────────────────────────────────────────────────────────────
Layer 1: TOOL-VALIDATED
├── Automated checks: file exists, line in range, AST symbol match
└── Marker: [VALIDATED: file.py:123 @ 2025-12-20]

Layer 2: HUMAN-VERIFIED
├── Manual review: semantic correctness, behavior match
└── Marker: [VERIFIED: file.py:123 by @reviewer @ 2025-12-20]

Layer 3: RUNTIME-CONFIRMED
├── Log/trace evidence of actual behavior
└── Marker: [CONFIRMED: trace_id=abc123 @ 2025-12-20]
────────────────────────────────────────────────────────────────────────
Tool validation catches STRUCTURAL issues (file moved, line shifted, symbol renamed).
Human verification ensures SEMANTIC correctness (code does what doc says).
Runtime confirmation proves BEHAVIORAL truth (system actually works this way).
ALL THREE LAYERS are required for critical documentation.
The Iron Law of Documentation
────────────────────────────────────────────────────────────────────────
DOCUMENTATION = f(SOURCE_CODE) + VERIFICATION

If NOT verified_against_code(statement) → statement is FALSE
If NOT exists_in_codebase(reference) → reference is FABRICATED
If NOT traceable_to_source(claim) → claim is SPECULATION
────────────────────────────────────────────────────────────────────────
Mandatory Rules (VIOLATION = FAILURE)
- NEVER document anything without reading the actual source code first
- NEVER assume any existing documentation, comment, or docstring is accurate
- NEVER write documentation based on memory, inference, or “what should be”
- ALWAYS derive truth EXCLUSIVELY from reading and tracing actual code
- ALWAYS provide source file + line number for every technical claim
- ALWAYS verify state machines, enums, constants against actual definitions
- TREAT all pre-existing docs as unverified claims requiring validation
- MARK any unverifiable statement as `[UNVERIFIED - REQUIRES CODE CHECK]`
Verification Requirements
| Documentation Type | Required Evidence |
|---|---|
| Enum/State values | Exact match with source code enum definition |
| Function behavior | Code path tracing, actual implementation reading |
| Constants/Timeouts | Variable definition in source with file:line |
| Message formats | Message class definition, field validation |
| Architecture claims | Import graph analysis, actual class relationships |
| Flow diagrams | Verified against runtime logs OR code path analysis |
Documentation Verification Status
Every section of documentation MUST have one of these status markers:
- `[VERIFIED: source_file.py:123]` – Confirmed against source code
- `[VERIFIED: trace_id=xyz]` – Confirmed against runtime logs
- `[UNVERIFIED]` – Requires verification before trusting
- `[DEPRECATED]` – Code has changed, documentation outdated
UNVERIFIED documentation is UNTRUSTED documentation.
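The marker grammar above is regular, so a quick audit of a document's verification coverage can be scripted. A minimal sketch (the `audit_markers` helper and its regex are illustrative, not part of the skill's scripts):

```python
import re

# Matches the three status families defined above.
MARKER_RE = re.compile(r"\[(VERIFIED: [^\]]+|UNVERIFIED[^\]]*|DEPRECATED)\]")

def audit_markers(doc_text):
    """Return counts of VERIFIED / UNVERIFIED / DEPRECATED markers."""
    counts = {"VERIFIED": 0, "UNVERIFIED": 0, "DEPRECATED": 0}
    for match in MARKER_RE.finditer(doc_text):
        # Check UNVERIFIED before relying on prefix order: dict order here
        # puts VERIFIED first, and "UNVERIFIED..." does not start with it.
        for status in counts:
            if match.group(1).startswith(status):
                counts[status] += 1
                break
    return counts

sample = (
    "State machine [VERIFIED: lifecycle.py:42]\n"
    "Retry policy [UNVERIFIED - REQUIRES CODE CHECK]\n"
)
print(audit_markers(sample))  # {'VERIFIED': 1, 'UNVERIFIED': 1, 'DEPRECATED': 0}
```

A nonzero UNVERIFIED count is a signal to block trust in the document, per the rule above.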
CRITICAL PRINCIPLE: NO HISTORICAL DEPTH
DOCUMENTATION DESCRIBES ONLY THE CURRENT STATE OF THE ART.
NO HISTORY. NO ARCHAEOLOGY. NO “WAS”. ONLY “IS”.
────────────────────────────────────────────────────────────────────────
THE TEMPORAL PURITY PRINCIPLE
────────────────────────────────────────────────────────────────────────
Documentation = PRESENT_TENSE(current_implementation)

FORBIDDEN:
✗ "was/were/previously/formerly/used to"
✗ "deprecated since version X" → just REMOVE it
✗ "changed from X to Y" → only describe Y
✗ "in the old system..." → irrelevant, delete
✗ inline changelogs → use CHANGELOG.md or git

REQUIRED:
✓ Present tense: "The system uses..." not "The system used..."
✓ Current state only: Document what IS, not what WAS
✓ Git for archaeology: History lives in version control, not docs
────────────────────────────────────────────────────────────────────────
The Rule:
When you find documentation containing historical language, DELETE IT. Git blame exists for archaeology. Documentation exists for the present.
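A first-pass screen for historical language can be purely lexical. The sketch below checks a subset of the forbidden phrases (the bare tense words "was/were" are deliberately omitted to limit false positives); `flag_history` is a hypothetical helper, not part of the skill:

```python
import re

# Illustrative subset of the forbidden historical phrasing listed above.
HISTORICAL = re.compile(
    r"\b(previously|formerly|used to|deprecated since|"
    r"changed from|in the old system)\b",
    re.IGNORECASE,
)

def flag_history(lines):
    """Yield (line_number, line) for lines containing historical language."""
    for number, line in enumerate(lines, start=1):
        if HISTORICAL.search(line):
            yield number, line

doc = [
    "The scheduler uses a priority queue.",
    "Previously, jobs were polled every five seconds.",
]
print(list(flag_history(doc)))  # flags line 2 only
```

Flagged lines are candidates for deletion, not rewording: the rule above says the history belongs in git, not in the doc.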
Available Commands
1. Analyze Single File
Extract structure, dependencies, and usages for one file:
python .claude/skills/deep-dive-analysis/scripts/analyze_file.py \
--file src/utils/circuit_breaker.py \
--output-format markdown
Parameters:
- `--file/-f`: Relative path to file to analyze – REQUIRED
- `--output-format/-o`: Output format (json, markdown, summary) – default: summary
- `--find-usages/-u`: Also find all usages of exported symbols – default: false
- `--update-progress/-p`: Update analysis_progress.json – default: false
Output includes:
- File classification (Critical/High-Complexity/Standard/Utility)
- Classes with methods and attributes
- Functions with signatures
- Internal imports (within project)
- External imports (third-party)
- External calls (database, network, filesystem, messaging, ipc)
- State mutations identified
- Error handling patterns
2. Check Progress
View analysis progress by phase:
python .claude/skills/deep-dive-analysis/scripts/check_progress.py \
--phase 1 \
--status pending
Parameters:
- `--phase/-p`: Filter by phase number (1-7)
- `--status/-s`: Filter by status (pending, analyzing, done, blocked)
- `--classification/-c`: Filter by classification (critical, high-complexity, standard, utility)
- `--verification-needed`: Show only files needing runtime verification
3. Find Usages
Find all usages of a symbol across the codebase:
python .claude/skills/deep-dive-analysis/scripts/analyze_file.py \
--symbol CircuitBreaker \
--file src/utils/circuit_breaker.py
4. Generate Phase Report
Generate documentation for an entire phase:
python .claude/skills/deep-dive-analysis/scripts/analyze_file.py \
--phase 1 \
--output-format markdown \
--output-file docs/01_domains/COMMON_LIBRARY.md
Phase 8: Documentation Review Commands
5. Scan Documentation Health
Discover all documentation files and generate health report:
python .claude/skills/deep-dive-analysis/scripts/doc_review.py scan \
--path docs/ \
--output doc_health_report.json
Output includes:
- Total file count per directory
- Files with TODO/FIXME/TBD markers
- Files missing last_updated metadata
- Large files (>1500 lines) candidates for splitting
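The per-file signals in that report can be sketched in a few lines. The helper and field names below are illustrative; `doc_review.py`'s real output may differ (the 1500-line split threshold comes from the list above):

```python
# Markers and threshold taken from the health criteria listed above.
MARKERS = ("TODO", "FIXME", "TBD")
SPLIT_THRESHOLD = 1500

def doc_health(name, text):
    """Summarize one documentation file's health signals."""
    lines = text.splitlines()
    return {
        "file": name,
        "line_count": len(lines),
        "has_markers": any(m in line for line in lines for m in MARKERS),
        "split_candidate": len(lines) > SPLIT_THRESHOLD,
    }

report = doc_health("docs/example.md", "# Title\nTODO: expand this section\n")
print(report)
```

A scan is then just this function mapped over every `*.md` file under `docs/`, aggregated per directory.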
6. Validate Links
Find all broken links in documentation:
python .claude/skills/deep-dive-analysis/scripts/doc_review.py validate-links \
--path docs/ \
--fix # Optional: auto-remove broken links
Actions:
- Extracts all relative markdown links (e.g. `](../path/to/file.md)`)
- Verifies target files exist
- Reports broken links with source file and line number
- With `--fix`: removes or updates broken references
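The core of link validation is a regex for relative links plus an existence check. A hedged sketch (`broken_links` and its regex are illustrative; the real command may handle anchors and path normalization differently):

```python
import re
from pathlib import Path

# Relative markdown link targets: skip absolute http(s) URLs, strip #anchors.
LINK_RE = re.compile(r"\]\((?!https?://)([^)#]+?)(?:#[^)]*)?\)")

def broken_links(doc_path, text, exists=lambda p: Path(p).exists()):
    """Return (line_number, target) pairs whose target file is missing."""
    base = Path(doc_path).parent
    missing = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LINK_RE.finditer(line):
            target = base / match.group(1)  # resolve relative to the doc
            if not exists(target):
                missing.append((lineno, match.group(1)))
    return missing

doc = "See [plan](DEEP_DIVE_PLAN.md) and [old](removed/notes.md)."
fake_fs = {Path("docs") / "DEEP_DIVE_PLAN.md"}
print(broken_links("docs/guide.md", doc, exists=lambda p: p in fake_fs))
```

The injected `exists` predicate keeps the logic testable without touching the real filesystem.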
7. Verify Against Source Code
Verify documentation accuracy against actual source code:
python .claude/skills/deep-dive-analysis/scripts/doc_review.py verify \
--doc docs/agents/lifecycle.md \
--source src/agents/lifecycle.py
Verification includes:
- Documented states vs actual enum values
- Documented methods vs actual class methods
- Documented constants vs actual values
- Flags discrepancies as DRIFT
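The enum check can be approximated with Python's `ast` module: collect class-level assignments from the source and diff them against the states the doc mentions. The helpers below are illustrative, not `doc_review.py`'s actual implementation:

```python
import ast
import re

def class_constants(source):
    """Names assigned at class-body level (covers simple Enum members)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for stmt in node.body:
                if isinstance(stmt, ast.Assign):
                    names.update(
                        t.id for t in stmt.targets if isinstance(t, ast.Name)
                    )
    return names

def drift(doc_text, source):
    """States the doc mentions in backticks that the code does not define."""
    documented = set(re.findall(r"`([A-Z_]+)`", doc_text))
    return documented - class_constants(source)

code = "class State:\n    IDLE = 1\n    RUNNING = 2\n"
doc = "Valid states are `IDLE`, `RUNNING`, and `PAUSED`."
print(drift(doc, code))  # {'PAUSED'} -> flagged as DRIFT
```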
8. Update Navigation Indexes
Refresh SEARCH_INDEX.md and BY_DOMAIN.md with current file counts:
python .claude/skills/deep-dive-analysis/scripts/doc_review.py update-indexes \
--search-index docs/00_navigation/SEARCH_INDEX.md \
--by-domain docs/00_navigation/BY_DOMAIN.md
Updates:
- Total file counts
- Files per directory statistics
- Version and last_updated timestamps
- Removes references to deleted files
9. Full Documentation Maintenance
Run complete Phase 8 workflow:
python .claude/skills/deep-dive-analysis/scripts/doc_review.py full-maintenance \
--path docs/ \
--auto-fix \
--output doc_health_report.json
Executes in order:
- Scan documentation health
- Validate and fix broken links
- Identify obsolete files (no inbound links, references deleted code)
- Update navigation indexes
- Generate final health report
Comment Quality Commands (Antirez Standards)
These commands analyze and rewrite code comments following the antirez commenting standards.
10. Analyze Comment Quality
Analyze comments in a single file:
python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py analyze \
src/main.py \
--report
Options:
- `--report/-r`: Generate detailed markdown report
- `--json`: Output as JSON for programmatic use
- `--issues-only/-i`: Show only problematic comments
Output includes:
- Comment classification (function, design, why, teacher, checklist, guide)
- Issue detection (trivial, debt, backup comments)
- Suggested rewrites for problematic comments
- Statistics and ratios
11. Scan Directory for Comment Issues
Analyze all Python files in a directory:
python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py scan \
src/ \
--recursive \
--issues-only
Options:
- `--recursive/-r`: Include subdirectories
- `--issues-only/-i`: Show only files with issues
- `--json`: Output as JSON
12. Generate Comment Health Report
Create comprehensive markdown report for entire codebase:
python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py report \
src/ \
--output comment_health.md
Report includes:
- Executive summary with totals
- Comment quality breakdown (keep/enhance/rewrite/delete)
- Comment type distribution
- Files needing attention (ranked by issue count)
- Sample issues with file:line references
- Actionable recommendations
13. Rewrite Comments
Apply comment improvements to a file:
# Dry run (preview changes)
python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py rewrite \
src/main.py
# Apply changes with backup
python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py rewrite \
src/main.py \
--apply \
--backup
Options:
- `--apply/-a`: Actually modify the file (default: dry run)
- `--backup/-b`: Create a .bak backup before modifying
- `--output/-o`: Write to a different file instead of in-place
Actions taken:
- DELETE: Remove trivial comments and backup (commented-out code)
- REWRITE: Add suggested improvements for debt comments (TODO/FIXME)
14. View Standards Reference
Display the antirez commenting standards:
python .claude/skills/deep-dive-analysis/scripts/rewrite_comments.py standards
Shows the complete taxonomy of good vs bad comments with examples.
Comment Type Classification
| Type | Category | Description | Action |
|---|---|---|---|
| function | GOOD | API docs at function/class top | Keep/Enhance |
| design | GOOD | File-level algorithm explanations | Keep |
| why | GOOD | Explains reasoning behind code | Keep |
| teacher | GOOD | Educates about domain concepts | Keep |
| checklist | GOOD | Reminds of coordinated changes | Keep |
| guide | GOOD | Section dividers, structure | Keep sparingly |
| trivial | BAD | Restates what code says | Delete |
| debt | BAD | TODO/FIXME without plan | Rewrite/Resolve |
| backup | BAD | Commented-out code | Delete |
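Trivial-comment detection can be approximated by word overlap between a comment and the code it annotates: if most of the comment's words already appear in the code, the comment restates it. This heuristic is a sketch; the classifier in `rewrite_comments.py` may use different signals:

```python
import re

STOPWORDS = {"the", "a", "an", "to", "of"}

def is_trivial(comment, code_line):
    """True when at least half the comment's words already appear in the code."""
    words = set(re.findall(r"[a-z_]+", comment.lower())) - STOPWORDS
    if not words:
        return False
    hits = sum(1 for word in words if word in code_line.lower())
    return hits / len(words) >= 0.5

# Restates the code -> trivial, delete.
print(is_trivial("# increment the counter", "counter += 1"))
# Explains WHY -> shares no words with the code, keep.
print(is_trivial("# backoff avoids a thundering herd on reconnect",
                 "time.sleep(delay)"))
```

Note how a good "why" comment naturally scores low: its vocabulary is about intent, not syntax.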
Comment Quality Workflow
1. SCAN
   ├── Run: rewrite_comments.py scan <dir> --recursive
   ├── Review files with most issues
   └── Generate: rewrite_comments.py report <dir> --output report.md
2. TRIAGE
   ├── Identify high-priority files (critical modules)
   ├── Focus on DEBT comments (convert to issues or design docs)
   └── Plan bulk TRIVIAL/BACKUP deletions
3. REWRITE
   ├── Run: rewrite_comments.py rewrite <file> --apply --backup
   ├── Review changes in diff
   └── Verify no functional changes
4. VERIFY
   ├── Run tests to confirm no breakage
   ├── Re-scan to confirm improvements
   └── Update comment_health.md report
File Classification Criteria
| Classification | Criteria | Verification |
|---|---|---|
| Critical | Handles authentication, security, encryption, sensitive data | Mandatory |
| High-Complexity | >300 LOC, >5 dependencies, state machines, async patterns | Mandatory |
| Standard | Normal business logic, data models, utilities | Recommended |
| Utility | Pure functions, helpers, constants | Optional |
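The mechanical part of this table can be expressed directly. The keyword list below is illustrative (the real `classifier.py` may differ), and the standard/utility split ultimately needs semantic judgment, so everything non-critical and non-complex falls back to "standard":

```python
# Illustrative substrings signalling security-sensitive code.
CRITICAL_KEYWORDS = ("auth", "security", "encrypt", "secret", "token")

def classify(path, loc, num_dependencies):
    """Apply the criteria table: critical > high-complexity > standard."""
    haystack = path.lower()
    if any(keyword in haystack for keyword in CRITICAL_KEYWORDS):
        return "critical"
    if loc > 300 or num_dependencies > 5:
        return "high-complexity"
    return "standard"

print(classify("src/auth/login.py", 120, 3))      # critical
print(classify("src/orders/report.py", 450, 2))   # high-complexity
print(classify("src/utils/strings.py", 40, 0))    # standard
```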
AI-Powered Semantic Analysis
This skill leverages Claude’s code comprehension capabilities for deep semantic analysis beyond mechanical structure extraction.
The Semantic Analysis Mandate
────────────────────────────────────────────────────────────────────────
STRUCTURE vs MEANING
────────────────────────────────────────────────────────────────────────
Scripts extract STRUCTURE: "class Foo with method bar()"
Claude extracts MEANING:   "Foo implements Repository pattern for
                            caching user sessions with TTL expiration"

NEVER stop at structure. ALWAYS pursue understanding.
────────────────────────────────────────────────────────────────────────
Five Layers of Understanding
| Layer | What | Who Does It |
|---|---|---|
| 1. WHAT | Classes, functions, imports | Scripts (AST) |
| 2. HOW | Algorithm details, data flow | Claude’s first pass |
| 3. WHY | Business purpose, design decisions | Claude’s deep analysis |
| 4. WHEN | Triggers, lifecycle, concurrency | Claude’s behavioral analysis |
| 5. CONSEQUENCES | Side effects, failure modes | Claude’s systems thinking |
Semantic Analysis Questions
For every code unit, Claude must answer:
Identity:
- What is this code’s single responsibility?
- What abstraction does it represent?
- What would break if this didn’t exist?
Behavior:
- What are ALL inputs and outputs (including side effects)?
- What state does it read? What does it mutate?
- What are preconditions and postconditions?
Integration:
- Who calls this? Under what circumstances?
- What does this call? Why those dependencies?
- What contracts does it fulfill?
Quality:
- What could go wrong? How is failure handled?
- Are there implicit assumptions that could break?
- Are there race conditions or timing dependencies?
Pattern Recognition
Claude should actively recognize and document common patterns:
| Pattern Type | Examples | Documentation Focus |
|---|---|---|
| Architectural | Repository, Service, CQRS, Event-Driven | Responsibilities, boundaries |
| Behavioral | State Machine, Strategy, Observer, Chain | Transitions, variations |
| Resilience | Circuit Breaker, Retry, Bulkhead, Timeout | Thresholds, fallbacks |
| Data | DTO, Value Object, Aggregate | Invariants, relationships |
| Concurrency | Producer-Consumer, Worker Pool | Thread safety, backpressure |
See references/SEMANTIC_PATTERNS.md for detailed recognition guides.
Red Flags to Identify
Claude should actively flag these issues:
ARCHITECTURE:
⚠ GOD CLASS: >10 public methods or >500 LOC
⚠ CIRCULAR DEPENDENCY: A → B → C → A
⚠ LEAKY ABSTRACTION: Implementation details in interface
RELIABILITY:
⚠ SWALLOWED EXCEPTION: Empty catch blocks
⚠ MISSING TIMEOUT: Network calls without timeout
⚠ RACE CONDITION: Shared mutable state without sync
SECURITY:
⚠ HARDCODED SECRET: Passwords, API keys in code
⚠ SQL INJECTION: String concatenation in queries
⚠ MISSING VALIDATION: Unsanitized user input
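Some of these flags are mechanically detectable from the AST, for example swallowed exceptions and the god-class method count. A sketch (thresholds from the list above; the skill's semantic pass goes well beyond this):

```python
import ast

def swallowed_exceptions(source):
    """Line numbers of except blocks whose body is only `pass`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
        and all(isinstance(stmt, ast.Pass) for stmt in node.body)
    ]

def god_class_candidates(source, method_limit=10):
    """Names of classes exceeding the public-method threshold."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            public = [
                stmt for stmt in node.body
                if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef))
                and not stmt.name.startswith("_")
            ]
            if len(public) > method_limit:
                flagged.append(node.name)
    return flagged

code = "try:\n    risky()\nexcept Exception:\n    pass\n"
print(swallowed_exceptions(code))  # [3]
```

Flags like MISSING VALIDATION or LEAKY ABSTRACTION have no such mechanical test; they require the semantic pass.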
Semantic Analysis Template
Use templates/semantic_analysis.md for comprehensive per-file analysis that includes:
- Executive summary (purpose, responsibility, patterns)
- Behavioral analysis (triggers, processing, side effects)
- Dependency analysis (why each dependency exists)
- Quality assessment (strengths, concerns, red flags)
- Contract documentation (full interface semantics)
- Flow tracing (primary and error paths)
- Testing implications (what must be tested)
AI Analysis Workflow
1. SCRIPTS RUN FIRST
   ├── classifier.py → File classification
   ├── ast_parser.py → Structure extraction
   └── usage_finder.py → Cross-references
2. CLAUDE ANALYZES
   ├── Read actual source code
   ├── Apply semantic questions
   ├── Recognize patterns
   ├── Identify red flags
   └── Trace flows
3. CLAUDE DOCUMENTS
   ├── Use semantic_analysis.md template
   ├── Explain WHY, not just WHAT
   ├── Document contracts and invariants
   └── Flag concerns with severity
4. VERIFY
   ├── Check against runtime behavior
   ├── Validate with code traces
   └── Mark verification status
Reference Documents
- `references/AI_ANALYSIS_METHODOLOGY.md` – Complete analysis methodology
- `references/SEMANTIC_PATTERNS.md` – Pattern recognition guide
- `templates/semantic_analysis.md` – Per-file analysis template
Analysis Loop Workflow
When analyzing a file, follow this sequence:
1. CLASSIFY
   ├── Count lines of code
   ├── Count dependencies
   ├── Check for critical patterns (auth, security, encryption)
   └── Assign classification
2. READ & MAP
   ├── Parse AST to extract structure
   ├── Identify classes and their methods
   ├── Identify standalone functions
   ├── Find global variables and constants
   └── Detect state mutations
3. DEPENDENCY CHECK
   ├── Internal imports (from project modules)
   ├── External imports (third-party)
   └── External calls (database, network, filesystem, messaging, ipc)
4. CONTEXT ANALYSIS
   ├── Where are exported symbols used?
   ├── What modules import this file?
   └── What message types flow through here?
5. RUNTIME VERIFICATION (if Critical/High-Complexity)
   ├── Use log analysis to trace actual behavior
   ├── Verify documented flow matches actual flow
   └── Note any discrepancies
6. DOCUMENTATION
   ├── Update analysis_progress.json
   ├── Generate module report section
   └── Cross-reference with CONTEXT.md
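The bookkeeping in step 6 might look like the sketch below. The `files`/`status` shape is an assumption; the real `analysis_progress.json` schema is defined by the DEEP_DIVE_PLAN setup:

```python
import json

def mark_done(progress, file_path):
    """Set status=done for one file entry, returning the updated dict."""
    for entry in progress.get("files", []):
        if entry.get("path") == file_path:
            entry["status"] = "done"
    return progress

# Hypothetical progress snapshot for one file mid-analysis.
progress = json.loads(
    '{"files": [{"path": "src/utils/circuit_breaker.py",'
    ' "status": "analyzing"}]}'
)
print(mark_done(progress, "src/utils/circuit_breaker.py"))
```

In practice the skill's `--update-progress` flag performs this update (reading and rewriting the JSON file) so the loop stays resumable.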
Runtime Verification Integration
For runtime verification of critical/high-complexity files, use your project’s log aggregation system:
- Trace actual behavior through components using correlation IDs
- Verify that documented flows match actual runtime behavior
- Use distributed tracing or structured logs to follow request paths
The goal is to confirm that code paths match documented behavior through runtime evidence.
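If your logs carry correlation IDs, the observed component path can be reconstructed and compared against the documented flow. The `[id] component:` log format below is an assumption; adapt the pattern to your aggregation system:

```python
import re

def component_path(log_lines, correlation_id):
    """Ordered list of components one correlation ID passed through."""
    pattern = re.compile(rf"\[{re.escape(correlation_id)}\]\s+(\w+):")
    return [
        match.group(1)
        for line in log_lines
        if (match := pattern.search(line))
    ]

logs = [
    "[req-42] gateway: accepted request",
    "[req-42] orders: created order",
    "[req-99] gateway: accepted request",
    "[req-42] billing: charged card",
]
print(component_path(logs, "req-42"))  # ['gateway', 'orders', 'billing']
```

A mismatch between this observed path and the documented flow diagram is exactly the kind of discrepancy step 5 of the analysis loop requires you to note.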
Output Interpretation
JSON Output Structure
{
"file": "src/utils/circuit_breaker.py",
"classification": "critical",
"metrics": {
"lines_of_code": 245,
"num_classes": 2,
"num_functions": 8,
"num_dependencies": 12
},
"structure": {
"classes": [...],
"functions": [...],
"constants": [...]
},
"dependencies": {
"internal": [...],
"external": [...],
"external_calls": [...]
},
"usages": [...],
"verification_required": true
}
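A downstream consumer of this JSON might filter for files still awaiting runtime verification. Field names follow the example above; this consumer itself is hypothetical:

```python
import json

# Abbreviated instance of the output structure shown above.
raw = """{
  "file": "src/utils/circuit_breaker.py",
  "classification": "critical",
  "metrics": {"lines_of_code": 245, "num_classes": 2,
              "num_functions": 8, "num_dependencies": 12},
  "verification_required": true
}"""

report = json.loads(raw)
if report["verification_required"]:
    print(f"{report['file']} ({report['classification']}): "
          "runtime verification pending")
```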
Markdown Output Format
The markdown output follows the template in templates/analysis_report.md and produces sections suitable for inclusion in phase deliverable documents.
Best Practices
Source Code Analysis (Phases 1-7)
- Start with Phase 1: Foundation modules inform understanding of everything else
- Track Progress: Always use `--update-progress` when completing analysis
- Verify Critical Files: Never skip runtime verification for critical/high-complexity files
- Cross-Reference: After analysis, update CONTEXT.md links
- Document Drift: Note any discrepancies between existing docs and actual code
Documentation Maintenance (Phase 8)
- Run scan first: Always start with `doc_review.py scan` to understand current state
- Fix links before content: Broken links indicate structural issues to address first
- Verify against code: Never update documentation without verifying against actual source
- Update indexes last: Navigation indexes should reflect final state after all changes
- Generate health report: Always produce `doc_health_report.json` as evidence of completion
Documentation Maintenance Workflow
When invoking Phase 8 documentation maintenance, follow this sequence:
1. PLANNING
   ├── Run: doc_review.py scan --path docs/
   ├── Review health report
   ├── Identify priority fixes (broken links, obsolete files)
   └── Create todo list with specific actions
2. EXECUTION (in batches)
   ├── Batch 1: Fix broken links
   │   └── Run: doc_review.py validate-links --fix
   ├── Batch 2: Verify critical docs against source
   │   └── Run: doc_review.py verify --doc <file> --source <code>
   ├── Batch 3: Delete obsolete files
   │   └── Manual review + deletion
   ├── Batch 4: Update navigation indexes
   │   └── Run: doc_review.py update-indexes
   └── Batch 5: Update timestamps
       └── Set last_updated on verified files
3. VERIFICATION
   ├── Run: doc_review.py scan (confirm improvements)
   ├── Run: doc_review.py validate-links (confirm zero broken)
   └── Generate final doc_health_report.json
Resources
- Scripts:
  - `scripts/` – Python analysis tools
    - `analyze_file.py` – Source code analysis (Phases 1-7)
    - `check_progress.py` – Progress tracking
    - `doc_review.py` – Documentation maintenance (Phase 8)
    - `comment_rewriter.py` – Comment analysis engine (antirez standards)
    - `rewrite_comments.py` – Comment quality CLI tool
- Templates:
  - `templates/` – Output templates
    - `analysis_report.md` – Module-level report template
    - `semantic_analysis.md` – AI-powered per-file analysis template
- References:
  - `references/` – Analysis methodology docs
    - `DEEP_DIVE_PLAN.md` – Master analysis plan with all phase definitions
    - `ANTIREZ_COMMENTING_STANDARDS.md` – Complete antirez comment taxonomy
    - `AI_ANALYSIS_METHODOLOGY.md` – AI semantic analysis methodology
    - `SEMANTIC_PATTERNS.md` – Pattern recognition guide for Claude
- analysis_progress.json: Progress tracking state
- doc_health_report.json: Documentation health metrics (generated)
- comment_health.md: Comment quality report (generated)