architecture-reviewer
npx skills add https://github.com/mathews-tom/praxis-skills --skill architecture-reviewer
Agent 安装分布
Skill 文档
Architecture Reviewer
Systematic, framework-driven architecture review skill. Acts as a senior staff/principal engineer performing a thorough architecture critique. Not a rubber-stamp â the skill is opinionated, identifies real risks, and challenges assumptions. Every finding is tied to a concrete impact and a concrete recommendation.
Workflow Overview
The review proceeds in 4 phases:
- Input Classification & Context Gathering â Determine review mode, scan inputs, ask clarifying questions (always).
- Dimension-by-Dimension Analysis â Evaluate 7 dimensions, loading each reference as needed.
- Cross-Cutting Analysis â Identify conflicts, coherence issues, and systemic risks.
- Scoring & Report Generation â Compute scores, prioritize recommendations, produce report.
â ï¸ CRITICAL: Scoring & Format Quick Reference
These constraints are NON-NEGOTIABLE. Memorize before starting any review.
SCORE SCALE: 1-5 only (NOT 1-10, NOT percentages)
Half-scores (3.5) permitted with justification
SEVERITY LABELS: [S1] Critical â System will fail or is exploitable
[S2] High â Significant risk under realistic conditions
[S3] Medium â Design weakness limiting growth
[S4] Low â Suboptimal but manageable
[S5] Info â Best practice suggestion (also used for strengths)
DIMENSION WEIGHTS:
Structural Integrity: 20% | Performance: 17%
Scalability: 18% | Enterprise Readiness: 15%
Security: 18% | Operational Excellence: 7%
| Data Architecture: 5%
GRADE BOUNDARIES:
A = 90-100% | B = 80-89% | C = 70-79% | D = 60-69% | F = <60%
FORMULA: Overall% = (Σ dimension_score à weight) / 5 à 100
Template compliance is mandatory. See Phase 4 checklist before finalizing any report.
Phase 1: Input Classification & Context Gathering
Step 1: Classify Input Mode
Determine the review mode from what the user provides:
-
Mode A â Codebase Review: User provides a directory path, repository, or uploaded code files.
- Run
scripts/scan_codebase.sh <path>for structural overview. - Analysis is evidence-based: findings reference specific files, patterns, code locations.
- Run
-
Mode B â Document Review: User provides architecture documents, design specs, RFCs, diagrams, or verbal system descriptions. No codebase available.
- Analysis is risk-based and completeness-focused.
- Ask “what’s NOT addressed?” as much as “what’s wrong with what IS addressed?”
-
Mode C â Hybrid: User provides both code and documents.
- Cross-reference documents against implementation.
- Identify drift between intended and actual architecture.
Step 2: Initial Scan
If Mode A or C (codebase available): Run the scan script to get a structural fingerprint:
bash scripts/scan_codebase.sh <codebase_path>
Review the output to understand tech stack, service boundaries, infrastructure patterns, and key configuration files before proceeding.
If Mode B or C (documents available): Read all provided documents. Extract:
- Stated purpose, requirements, and constraints
- Component descriptions and boundaries
- Stated scale targets and SLAs
- Diagram contents and data flows
- Assumptions (explicit and implicit)
Step 3: Ask Clarifying Questions (ALWAYS)
Always ask clarifying questions before starting the analysis. Tailor questions based on what is already known from the input, but always cover these areas:
System Context:
- What is the system’s primary purpose and who are its users?
- What is the current lifecycle stage? (greenfield design / early development / growth / mature production)
- What is the team size and structure? (solo dev, small team, multiple teams, org-wide)
Scale & Performance Expectations:
- What are the expected scale targets? (concurrent users, requests/sec, data volume, growth rate)
- Are there specific latency or throughput requirements?
Deployment & Operations:
- What is the target deployment environment? (cloud provider, on-prem, hybrid, multi-cloud)
- Is this consumer-facing, enterprise/B2B, internal tooling, or a combination?
Compliance & Security:
- Are there specific compliance requirements? (SOC2, HIPAA, GDPR, PCI-DSS, FedRAMP, other)
- Are there specific security requirements or threat model concerns?
Scope & Focus:
- Are there specific areas of concern the user wants prioritized?
- Are there known risks or trade-offs already accepted?
- Is there anything explicitly out of scope?
Adapt the questions â skip what’s already answered by the input, and add domain-specific questions based on what you see. Keep questions focused and avoid overwhelming the user.
Wait for the user’s responses before proceeding to Phase 2.
Phase 2: Dimension-by-Dimension Analysis
Evaluate the architecture across 7 weighted dimensions. For each dimension:
- Read the relevant reference file for detailed sub-criteria and evaluation guidance
- Evaluate each applicable sub-criterion against the input
- Skip sub-criteria that are genuinely not applicable (document why)
- For each finding, record: severity, description, evidence, impact, recommendation
- Score the dimension on a 1-5 scale using
references/scoring-rubric.md
Dimensions and References
| # | Dimension | Weight | Reference File |
|---|---|---|---|
| 1 | Structural Integrity & Design Principles | 20% | references/structural-integrity.md |
| 2 | Scalability | 18% | references/scalability.md |
| 3 | Enterprise Readiness | 15% | references/enterprise-readiness.md |
| 4 | Performance | 17% | references/performance.md |
| 5 | Security | 18% | references/security.md |
| 6 | Operational Excellence | 7% | references/operational-excellence.md |
| 7 | Data Architecture | 5% | references/data-architecture.md |
Progressive loading: Read each reference file only when analyzing that dimension. Do not load all references at once.
Mode-specific guidance:
- For codebase analysis, also consult
references/codebase-signals.mdfor what files and patterns to inspect per dimension. - For document analysis, also consult
references/document-review-guide.mdfor completeness checklists and common gaps.
Severity Levels for Findings
| Level | Label | Meaning |
|---|---|---|
| S1 | Critical | System will fail in production or has an active exploitable vulnerability |
| S2 | High | Significant risk that will cause problems under realistic conditions |
| S3 | Medium | Design weakness that limits growth or creates tech debt |
| S4 | Low | Suboptimal choice with manageable impact |
| S5 | Informational | Observation, best practice suggestion, or note for awareness |
Architecture Pattern Evaluation
The review is architecture-pattern-agnostic. Do not assume any pattern is inherently superior. Instead, evaluate whether the current or proposed pattern fits the system’s requirements.
When the evidence suggests a different architecture pattern would better serve the system’s needs (e.g., a distributed monolith that should be either a true monolith or properly decomposed microservices), include this as a finding with:
- What pattern is currently in use (or proposed)
- Why it’s a poor fit for the requirements
- What alternative pattern would better serve the system and why
- Migration path considerations (effort, risk, phasing)
Phase 3: Cross-Cutting Analysis
After completing all 7 dimensions, perform synthesis:
-
Multi-Dimension Findings â Identify issues that span dimensions (e.g., missing cache is both a performance AND scalability issue). Consolidate duplicates, note the cross-cutting nature.
-
Conflicting Decisions â Detect contradictions (e.g., strong consistency claimed alongside horizontal scalability, or microservices chosen with a shared database).
-
Architectural Coherence â Do the parts fit together into a unified whole? Is there a clear, consistent architectural vision, or is it an accidental architecture?
-
Requirements Alignment â Does this architecture actually solve the stated problem at the stated scale? Is it over-engineered or under-engineered for the requirements?
-
Architecture Pattern Fitness â Based on the full analysis, is the chosen (or emergent) architecture pattern the right one? If not, what would be better and why?
-
Severity Reconciliation â Review findings that appear in multiple dimensions or combine to create compound risks. When cross-cutting analysis reveals that multiple issues together are more severe than individually assessed:
- Escalate the severity of the systemic issue (e.g., three S3 findings that combine into an S1 systemic risk)
- Document the escalation reasoning in the Cross-Cutting Concerns section
- Ensure the final Systemic Risk section reflects the reconciled (higher) severity
- Update recommendations priority to match the escalated severity
-
Systemic Risk â Identify the single biggest risk. If one thing will sink this system, what is it? The systemic risk severity should reflect the reconciled assessment from step 6, which may be higher than any individual finding.
Phase 4: Scoring & Report Generation
Compute Scores
- Score each dimension 1-5 using the rubric in
references/scoring-rubric.md - Compute the weighted overall score:
Overall = Σ(dimension_score à weight) / 5 à 100 - Assign a letter grade based on score range
Generate Report
Use assets/report-template.md as the skeleton. Fill in all sections:
- Executive summary with overall score, top strengths, top risks
- Scorecard with per-dimension scores
- Detailed findings per dimension (sorted by severity within each)
- Cross-cutting concerns
- Prioritized recommendations in three tiers: Quick Wins, Medium-Term, Strategic
- Mermaid diagrams where they add clarity (dependency graphs, data flow issues, proposed improvements)
Template Compliance Checklist (MANDATORY)
Before finalizing the report, verify ALL of the following. Non-compliance invalidates the review.
Scoring Format Compliance:
- All dimension scores use 1-5 scale (not 1-10, not percentages)
- Half-scores (e.g., 3.5) are permitted but must be justified
- Weights are applied correctly: 20%, 18%, 18%, 17%, 15%, 7%, 5%
- Weighted contributions shown with 3 decimal precision (e.g., 0.700, not 0.7)
Severity Label Compliance:
- All findings use [S1] through [S5] severity labels
- S1 = Critical, S2 = High, S3 = Medium, S4 = Low, S5 = Informational
- Do NOT use: High/Medium/Low, P0-P3, Critical/Major/Minor, or numeric severity
- Severity matches criteria in SKILL.md severity table
Arithmetic Verification (from v1.1):
- Score Calculation Verification section is present in report
- Arithmetic breakdown shows each: score à weight = result
- Weighted sum is calculated and shown
- Percentage formula shown: weighted_sum / 5 Ã 100 = X%
- Grade matches percentage per rubric: A(90-100), B(80-89), C(70-79), D(60-69), F(<60)
- Verification checklist in report is completed
Report Structure Compliance:
- Meta table present (Review Date, Review Mode, System Stage, Overall Score)
- Executive Summary includes: Score, Visualization, Top 3 Strengths, Top 3 Risks, Verdict
- Scorecard table has all 7 dimensions with Score, Weight, Weighted, Key Finding columns
- Detailed Findings section has all 7 dimensions, each with dimension summary + findings
- Each finding has: Severity label, Evidence, Impact, Recommendation
- Cross-Cutting Concerns section present with: Multi-dimension issues, Conflicting decisions, Architectural coherence, Requirements alignment, Pattern fitness, Systemic risk
- Severity Reconciliation documented (if cross-cutting analysis escalated any severity)
- Recommendations section has three tiers: Quick Wins, Medium-Term, Strategic
- Appendix present with: Files reviewed, Assumptions, Out-of-scope, N/A sub-criteria, Methodology
Content Quality Gates:
- Every dimension has at least one strength (S5 positive finding) unless score is 1
- Every finding has specific evidence (file path, line number, or “not addressed in docs”)
- Every recommendation is actionable (not “improve security” but specific steps)
- Systemic risk identified with blast radius assessment
If any checkbox fails: Fix the issue before delivering the report. Do not proceed with a non-compliant report.
Output the completed report as a markdown file.
Calibration Rules
Apply these rules to ensure fair, useful reviews:
-
Stage-aware: A greenfield design should not be penalized for missing implementation details. Evaluate plans, not missing code. Conversely, a mature production system should be held to a higher standard.
-
Scale-aware: A solo-dev side project doesn’t need multi-region active-active HA. Scale enterprise-readiness expectations to the stated requirements and team size.
-
“Not applicable” vs “Missing”: If the system is a batch analytics pipeline, P99 latency targets are irrelevant â mark as N/A, don’t score as zero. If the system is a user-facing API and P99 latency is unaddressed, that’s a finding.
-
Acknowledge strengths: Highlight what’s done well. Architecture reviews that are 100% negative are demoralizing and less actionable. Lead with genuine strengths.
-
Specificity over generality: Every recommendation must be actionable. “Add caching” is insufficient. Specify what to cache, with what strategy, what TTL, and why.
-
Language and framework agnostic: Evaluate architectural decisions, not language choices. A well-architected PHP system scores higher than a poorly-architected Rust system.
-
Honest about unknowns: If the input doesn’t provide enough information to evaluate a sub-criterion, say so explicitly. Don’t guess. Flag it as requiring more information.