software-architecture
npx skills add https://github.com/sunnypatneedi/claude-starter-kit --skill software-architecture
Software Architecture
Complete framework for designing software systems that are scalable, maintainable, and aligned with business requirements.
When to Use
- Starting a new project or greenfield development
- Refactoring a monolith
- The system is outgrowing its current architecture
- Making technology stack decisions
- Designing for scale (10x users expected)
- Multiple teams working on the same codebase
- Performance or reliability issues
- Planning microservices migration
Core Principles
Architecture Serves Business:
- Technology choices follow business needs
- Trade-offs are intentional
- Over-engineering is waste
- Simplest solution that works
SOLID Principles:
S - Single Responsibility Principle
O - Open/Closed Principle
L - Liskov Substitution Principle
I - Interface Segregation Principle
D - Dependency Inversion Principle
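To make the last principle concrete, here is a minimal TypeScript sketch of Dependency Inversion. The high-level `CheckoutService` depends on a `PaymentGateway` abstraction rather than on a concrete client. All names are hypothetical. The gateway is synchronous only to keep the sketch short; a real payment client would almost certainly be async.

```typescript
// Hypothetical abstraction, owned by the high-level policy.
interface PaymentGateway {
  charge(userId: string, amountCents: number): boolean;
}

// High-level module: knows only the PaymentGateway interface.
class CheckoutService {
  private readonly payments: PaymentGateway;

  constructor(payments: PaymentGateway) {
    this.payments = payments; // injected, never constructed here
  }

  checkout(userId: string, amountCents: number): "PAID" | "PAYMENT_FAILED" {
    if (amountCents <= 0) throw new Error("Amount must be positive");
    return this.payments.charge(userId, amountCents) ? "PAID" : "PAYMENT_FAILED";
  }
}

// Low-level detail: an in-memory fake; a production adapter would wrap a payment provider.
class FakePaymentGateway implements PaymentGateway {
  readonly charges: Array<{ userId: string; amountCents: number }> = [];
  private readonly succeed: boolean;

  constructor(succeed = true) {
    this.succeed = succeed;
  }

  charge(userId: string, amountCents: number): boolean {
    this.charges.push({ userId, amountCents });
    return this.succeed;
  }
}

// Usage: swap implementations without touching CheckoutService.
const gateway = new FakePaymentGateway();
const service = new CheckoutService(gateway);
console.log(service.checkout("user-1", 4999)); // prints PAID
```

`CheckoutService` receives its gateway through the constructor. Tests can therefore pass the fake while production code passes a real adapter, and the high-level module never changes.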
Other Key Principles:
- DRY (Don’t Repeat Yourself)
- KISS (Keep It Simple, Stupid)
- YAGNI (You Aren’t Gonna Need It)
- Separation of Concerns
- Principle of Least Surprise
Workflow
Step 1: Understand Requirements
Functional Requirements:
## What the System Must Do
**User Stories:**
- As a [user], I want to [action] so that [benefit]
**Features:**
- User authentication
- Product catalog
- Shopping cart
- Payment processing
- Order tracking
**Business Rules:**
- Discount codes can only be used once per user
- Orders over $50 get free shipping
- Inventory decrements on successful payment
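Business rules like these are easiest to verify when each one lives in a single place as a plain function. The sketch below makes several assumptions:

- amounts are in cents;
- "over $50" means strictly greater than $50;
- the standard shipping fee is a hypothetical 599 cents.

`CheckoutRules` is an illustrative name, not part of any framework.

```typescript
// Hypothetical domain sketch; amounts are in cents to avoid floating-point errors.
const FREE_SHIPPING_THRESHOLD_CENTS = 5_000; // "Orders over $50 get free shipping"

class CheckoutRules {
  private readonly redeemed = new Map<string, Set<string>>(); // userId -> codes already used
  private readonly stock = new Map<string, number>(); // sku -> units available

  setStock(sku: string, units: number): void {
    this.stock.set(sku, units);
  }

  stockOf(sku: string): number {
    return this.stock.get(sku) ?? 0;
  }

  // Rule: a discount code can be used only once per user.
  redeemDiscount(userId: string, code: string): boolean {
    const codes = this.redeemed.get(userId) ?? new Set<string>();
    if (codes.has(code)) return false;
    codes.add(code);
    this.redeemed.set(userId, codes);
    return true;
  }

  // Rule: orders strictly over $50 ship free; otherwise charge an assumed standard fee.
  shippingCents(subtotalCents: number, standardFeeCents = 599): number {
    return subtotalCents > FREE_SHIPPING_THRESHOLD_CENTS ? 0 : standardFeeCents;
  }

  // Rule: inventory decrements only after payment succeeds.
  onPaymentResult(sku: string, quantity: number, paymentSucceeded: boolean): void {
    if (!paymentSucceeded) return; // a failed payment leaves stock untouched
    const available = this.stockOf(sku);
    if (available < quantity) throw new Error(`Insufficient stock for ${sku}`);
    this.stock.set(sku, available - quantity);
  }
}
```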
Non-Functional Requirements (The “ilities”):
## How the System Must Perform
**Scalability:**
- Support 10K concurrent users
- Handle 100K products in catalog
- Process 1K orders per hour
**Performance:**
- Page load <2 seconds
- API response <100ms (p95)
- Search results <500ms
**Reliability:**
- 99.9% uptime (≈8.76 hours of downtime/year)
- Zero data loss
- Graceful degradation under load
**Security:**
- PCI DSS compliant for payments
- GDPR compliant for EU users
- Data encrypted at rest and in transit
**Maintainability:**
- New developers productive in 1 week
- Deploy multiple times per day
- Rollback within 5 minutes
**Observability:**
- Full request tracing
- Error rate monitoring
- Performance metrics
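Targets like these become checkable once they are written as arithmetic. This is a minimal sketch with two assumptions:

- a year has 365 days;
- percentiles use the nearest-rank definition. Other definitions interpolate and can give slightly different values.

The function names are illustrative.

```typescript
const HOURS_PER_YEAR = 365 * 24; // 8,760 (ignores leap years)

// Allowed downtime per year for an availability target, e.g. 0.999 for "99.9%".
function downtimeBudgetHours(availability: number): number {
  if (availability <= 0 || availability > 1) throw new RangeError("availability must be in (0, 1]");
  return (1 - availability) * HOURS_PER_YEAR;
}

// Percentile by the nearest-rank method: the smallest sample with at least p% of samples <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}

// Check measured latencies against a target such as "API response <100ms (p95)".
function meetsLatencyTarget(samplesMs: number[], targetMs = 100, p = 95): boolean {
  return percentile(samplesMs, p) < targetMs;
}

console.log(downtimeBudgetHours(0.999).toFixed(2)); // prints 8.76
```

For 99.9%, the budget is (1 − 0.999) × 8,760 ≈ 8.76 hours per year.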
Step 2: Choose Architectural Pattern
Monolith:
Best for:
- Small teams (<10 people)
- Simple domains
- Early-stage startups
- Rapid iteration
Architecture:
┌─────────────────────────┐
│     Web Application     │
│ ┌──────┬──────┬──────┐  │
│ │  UI  │Logic │ Data │  │
│ └──────┴──────┴──────┘  │
└────────────┬────────────┘
             │
      Single Database
Pros:
✅ Simple to develop
✅ Simple to deploy
✅ Simple to test
✅ Low latency between components
Cons:
❌ Scaling one part means scaling the whole application
❌ Tight coupling
❌ One failure can affect the entire application
❌ Hard for teams to work independently
Microservices:
Best for:
- Large teams (multiple squads)
- Complex domains
- Independent scaling needs
- Polyglot requirements
Architecture:
┌──────────┐  ┌──────────┐  ┌──────────┐
│   User   │  │  Order   │  │ Payment  │
│ Service  │  │ Service  │  │ Service  │
└────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │
     │             │             │
  User DB      Order DB     Payment DB
Pros:
✅ Independent deployment
✅ Technology flexibility
✅ Team autonomy
✅ Fault isolation
Cons:
❌ Network complexity
❌ Distributed transactions are hard
❌ More operational overhead
❌ Debugging across services is harder
Event-Driven:
Best for:
- Async workflows
- Real-time data processing
- Audit trails
- Decoupled systems
Architecture:
┌─────────┐       ┌────────────┐
│Producer ├──────>│Event Queue │
└─────────┘       └─────┬──────┘
                        │
         ┌──────────────┼──────────────┐
         │              │              │
    Consumer 1     Consumer 2     Consumer 3
Pros:
✅ Loose coupling
✅ Easy to add consumers
✅ Natural audit log
✅ Handles spikes well
Cons:
❌ Eventual consistency
❌ Harder to debug
❌ Message ordering challenges
❌ More moving parts
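Here is a minimal in-memory sketch of the producer, queue and consumer shape in the diagram. `EventQueue` is a hypothetical stand-in for a real broker such as RabbitMQ. It delivers synchronously and in order, so it sidesteps the ordering and eventual-consistency issues listed under Cons rather than demonstrating them.

```typescript
// Hypothetical event types and in-memory queue.
type OrderEvent =
  | { type: "OrderCreated"; orderId: string; totalCents: number }
  | { type: "OrderCancelled"; orderId: string };

type Consumer = (event: OrderEvent) => void;

class EventQueue {
  private readonly consumers: Consumer[] = [];
  readonly log: OrderEvent[] = []; // append-only history doubles as an audit trail

  subscribe(consumer: Consumer): void {
    this.consumers.push(consumer);
  }

  publish(event: OrderEvent): void {
    this.log.push(event);
    // Real brokers deliver asynchronously; a synchronous loop keeps the sketch simple.
    for (const consumer of this.consumers) consumer(event);
  }
}

// The producer knows only the queue, not who consumes its events.
const queue = new EventQueue();
const emailsSent: string[] = [];
let revenueCents = 0;

queue.subscribe((e) => { if (e.type === "OrderCreated") emailsSent.push(e.orderId); }); // Consumer 1
queue.subscribe((e) => { if (e.type === "OrderCreated") revenueCents += e.totalCents; }); // Consumer 2
queue.subscribe((e) => console.log(`audit: ${e.type} ${e.orderId}`)); // Consumer 3

queue.publish({ type: "OrderCreated", orderId: "o-1", totalCents: 2500 });
queue.publish({ type: "OrderCancelled", orderId: "o-1" });
```

Adding a fourth consumer is one more `subscribe` call, with no change to the producer.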
Layered Architecture (N-Tier):
Best for:
- Traditional enterprise apps
- Clear separation of concerns
- Team specialization (frontend/backend/data)
Architecture:
┌─────────────────────────┐
│   Presentation Layer    │ (UI, API)
├─────────────────────────┤
│  Business Logic Layer   │ (Domain, Services)
├─────────────────────────┤
│    Data Access Layer    │ (Repositories, ORM)
├─────────────────────────┤
│     Database Layer      │ (PostgreSQL, etc.)
└─────────────────────────┘
Rules:
- Upper layers can call lower layers
- Lower layers cannot call upper layers
- Each layer has clear responsibility
Pros:
✅ Clear separation
✅ Testable layers
✅ Familiar pattern
Cons:
❌ Can become rigid
❌ Changes ripple across layers
❌ Performance overhead
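The sketch below illustrates the layering rules: the controller calls the service, the service calls the repository, and nothing calls upward. All class names and the 20% tax rate are hypothetical, and a `Map` stands in for the database and ORM.

```typescript
// Hypothetical three-layer sketch; each layer only calls the layer directly below it.
interface Product { id: string; name: string; priceCents: number }

type ApiResponse =
  | { status: 200; body: { id: string; priceCents: number } }
  | { status: 404; body: { error: string } };

// Data Access Layer: hides storage details (a Map stands in for PostgreSQL + ORM).
class ProductRepository {
  private readonly rows = new Map<string, Product>();

  save(product: Product): void {
    this.rows.set(product.id, product);
  }

  findById(id: string): Product | undefined {
    return this.rows.get(id);
  }
}

// Business Logic Layer: owns the rules and knows nothing about HTTP.
class ProductService {
  private readonly repo: ProductRepository;
  private readonly taxRate = 0.2; // hypothetical flat tax rate

  constructor(repo: ProductRepository) {
    this.repo = repo;
  }

  priceWithTax(id: string): number | undefined {
    const product = this.repo.findById(id);
    return product ? Math.round(product.priceCents * (1 + this.taxRate)) : undefined;
  }
}

// Presentation Layer: maps requests to service calls and results to responses.
class ProductController {
  private readonly service: ProductService;

  constructor(service: ProductService) {
    this.service = service;
  }

  getPrice(id: string): ApiResponse {
    const priceCents = this.service.priceWithTax(id);
    return priceCents === undefined
      ? { status: 404, body: { error: "Product not found" } }
      : { status: 200, body: { id, priceCents } };
  }
}
```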
Hexagonal Architecture (Ports & Adapters):
Best for:
- Domain-driven design
- Testing-heavy environments
- Swappable infrastructure
Architecture:
        ┌─────────────┐
        │   Domain    │
        │   (Core)    │
        └──────┬──────┘
               │
     ┌─────────┼─────────┐
     │         │         │
 HTTP API  Database    Queue
 (Adapter) (Adapter) (Adapter)
Core never depends on adapters
Adapters depend on core
Pros:
✅ Highly testable
✅ Infrastructure-agnostic
✅ DDD-friendly
Cons:
❌ More abstraction
❌ Steeper learning curve
❌ Can be over-engineered
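Here is a minimal sketch of ports and adapters, assuming an order-confirmation use case (all names hypothetical). The core declares the interfaces it needs and adapters implement them. The dependency therefore points from adapters to the core, never the other way.

```typescript
// ---- Core (domain): owns the ports and imports nothing from adapters ----
interface Order {
  id: string;
  totalCents: number;
  status: "PENDING" | "CONFIRMED";
}

// Outbound ports: what the core needs from infrastructure.
interface OrderRepositoryPort {
  findById(id: string): Order | undefined;
  save(order: Order): void;
}
interface EventPublisherPort {
  publish(eventName: string, payload: Record<string, unknown>): void;
}

// Use case: application logic written against ports only.
class ConfirmOrder {
  private readonly orders: OrderRepositoryPort;
  private readonly events: EventPublisherPort;

  constructor(orders: OrderRepositoryPort, events: EventPublisherPort) {
    this.orders = orders;
    this.events = events;
  }

  execute(orderId: string): Order {
    const order = this.orders.findById(orderId);
    if (!order) throw new Error(`Order ${orderId} not found`);
    const confirmed: Order = { ...order, status: "CONFIRMED" };
    this.orders.save(confirmed);
    this.events.publish("OrderConfirmed", { orderId });
    return confirmed;
  }
}

// ---- Adapters: depend on the core's ports and can be swapped freely ----
class InMemoryOrderRepository implements OrderRepositoryPort {
  private readonly store = new Map<string, Order>();
  findById(id: string): Order | undefined {
    return this.store.get(id);
  }
  save(order: Order): void {
    this.store.set(order.id, order);
  }
}

class RecordingEventPublisher implements EventPublisherPort {
  readonly published: string[] = [];
  publish(eventName: string, payload: Record<string, unknown>): void {
    this.published.push(`${eventName} ${JSON.stringify(payload)}`);
  }
}
```

A PostgreSQL repository or a RabbitMQ publisher would implement the same two interfaces. An HTTP adapter would parse the request and call `execute`.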
Step 3: Design System Components
Component Design Template:
## [Component Name]
**Purpose:**
What does this component do?
**Responsibilities:**
- Responsibility 1
- Responsibility 2
**Dependencies:**
- Component A (for X)
- Component B (for Y)
**Interfaces:**
```typescript
interface ComponentAPI {
  operation1(input: Type): Promise<Result>;
  operation2(input: Type): Result;
}
```
**Data:**
What data does it own/manage?
**Events:**
What events does it emit/consume?
**Error Handling:**
How does it handle failures?
Example – Order Service:
## Order Service
**Purpose:**
Manage order lifecycle from creation to fulfillment
**Responsibilities:**
- Create orders
- Update order status
- Calculate totals with discounts
- Validate inventory availability
**Dependencies:**
- User Service (get user details)
- Inventory Service (check/reserve stock)
- Payment Service (process payment)
**Interfaces:**
```typescript
interface OrderService {
  createOrder(cart: Cart, userId: string): Promise<Order>;
  getOrder(orderId: string): Promise<Order>;
  updateStatus(orderId: string, status: OrderStatus): Promise<void>;
}
```
**Events Emitted:**
- OrderCreated
- OrderPaid
- OrderShipped
- OrderCancelled
**Events Consumed:**
- PaymentSucceeded
- PaymentFailed
**Error Handling:**
- Invalid cart → 400 Bad Request
- Out of stock → 409 Conflict
- Payment fails → Reverse inventory reservation
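Here is a minimal in-memory sketch of the Order Service interface. The template names `Cart`, `Order` and `OrderStatus` without defining them, so their shapes here are hypothetical, as are `InventoryClient` and `PaymentClient`.

For brevity, the sketch charges payment inside `createOrder` instead of reacting to `PaymentSucceeded`/`PaymentFailed` events. It still applies the error-handling rules:

- 400 for an invalid cart;
- 409 when stock cannot be reserved;
- release the reservation when payment fails.

```typescript
// Hypothetical types: the template names Cart, Order and OrderStatus but does not define them.
type OrderStatus = "CREATED" | "PAID" | "SHIPPED" | "CANCELLED";
interface CartItem { sku: string; quantity: number; priceCents: number }
interface Cart { items: CartItem[] }
interface Order { id: string; userId: string; items: CartItem[]; totalCents: number; status: OrderStatus }

// Hypothetical clients for the Inventory and Payment services.
interface InventoryClient {
  reserve(items: CartItem[]): Promise<boolean>; // false when stock is insufficient
  release(items: CartItem[]): Promise<void>;
}
interface PaymentClient {
  charge(userId: string, amountCents: number): Promise<boolean>;
}

// Carries the HTTP status from the error-handling rules.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

class InMemoryOrderService {
  private readonly orders = new Map<string, Order>();
  private nextId = 1;
  private readonly inventory: InventoryClient;
  private readonly payments: PaymentClient;

  constructor(inventory: InventoryClient, payments: PaymentClient) {
    this.inventory = inventory;
    this.payments = payments;
  }

  async createOrder(cart: Cart, userId: string): Promise<Order> {
    if (cart.items.length === 0 || cart.items.some((item) => item.quantity <= 0)) {
      throw new HttpError(400, "Invalid cart");
    }
    if (!(await this.inventory.reserve(cart.items))) {
      throw new HttpError(409, "Out of stock");
    }
    const totalCents = cart.items.reduce((sum, item) => sum + item.quantity * item.priceCents, 0);
    const order: Order = { id: `order-${this.nextId++}`, userId, items: cart.items, totalCents, status: "CREATED" };
    this.orders.set(order.id, order);

    if (await this.payments.charge(userId, totalCents)) {
      order.status = "PAID";
    } else {
      await this.inventory.release(cart.items); // compensate: reverse the reservation
      order.status = "CANCELLED";
    }
    return order;
  }

  async getOrder(orderId: string): Promise<Order> {
    const order = this.orders.get(orderId);
    if (!order) throw new HttpError(404, `Order ${orderId} not found`);
    return order;
  }

  async updateStatus(orderId: string, status: OrderStatus): Promise<void> {
    const order = await this.getOrder(orderId);
    order.status = status;
  }
}
```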
Step 4: Make Technology Choices
Decision Framework:
## Technology Decision: [Name]
**Problem:**
What are we trying to solve?
**Options:**
1. Option A
2. Option B
3. Option C
**Criteria:**
- Performance requirements
- Team expertise
- Community support
- Cost
- Scalability
- Security
**Evaluation:**
| Criteria | Option A | Option B | Option C |
|----------|----------|----------|----------|
| Performance | 8/10 | 9/10 | 7/10 |
| Expertise | 9/10 | 5/10 | 8/10 |
| Community | 10/10 | 7/10 | 9/10 |
| Cost | Free | $X/mo | Free |
| Scalability | 7/10 | 10/10 | 8/10 |
**Decision:** Option A
**Rationale:**
Why we chose this option.
**Trade-offs:**
What we're giving up.
**Review Date:**
When we'll reconsider this decision.
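If the criteria differ in importance, the evaluation table can be turned into a weighted score. In this minimal sketch, the scores come from the template's table and the weights are hypothetical. Cost is left out because it is not on the same 1-10 scale.

```typescript
type Scores = Record<string, number>; // criterion -> score out of 10

// Hypothetical weights (sum to 1); the template lists criteria but does not weight them.
const weights: Scores = { performance: 0.3, expertise: 0.3, community: 0.2, scalability: 0.2 };

// Scores from the template's evaluation table (Cost omitted).
const options: Record<string, Scores> = {
  "Option A": { performance: 8, expertise: 9, community: 10, scalability: 7 },
  "Option B": { performance: 9, expertise: 5, community: 7, scalability: 10 },
  "Option C": { performance: 7, expertise: 8, community: 9, scalability: 8 },
};

function weightedScore(scores: Scores, w: Scores): number {
  return Object.entries(w).reduce((total, [criterion, weight]) => total + weight * (scores[criterion] ?? 0), 0);
}

// Highest weighted score first.
function rankOptions(opts: Record<string, Scores>, w: Scores): Array<[string, number]> {
  return Object.entries(opts)
    .map(([option, s]): [string, number] => [option, weightedScore(s, w)])
    .sort((a, b) => b[1] - a[1]);
}

for (const [option, score] of rankOptions(options, weights)) {
  console.log(`${option}: ${score.toFixed(2)}`);
}
// Prints Option A: 8.50, then Option C: 7.90, then Option B: 7.60
```

With these particular weights Option A ranks first, but different weights can change the ranking. That is why the weights themselves are worth recording in the Rationale.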
Example – Database Choice:
## Database for Order Service
**Problem:**
Need persistent storage for orders with ACID guarantees
**Options:**
1. PostgreSQL (Relational)
2. MongoDB (Document)
3. DynamoDB (NoSQL)
**Criteria:**
- ACID compliance (critical)
- Complex queries (important)
- Scalability (important)
- Team expertise (important)
**Evaluation:**
| Criteria | PostgreSQL | MongoDB | DynamoDB |
|----------|------------|---------|----------|
| ACID | ✅ Full | ⚠️ Limited | ⚠️ Eventual |
| Queries | ✅ Excellent | ⚠️ Good | ❌ Limited |
| Scale | ✅ Vertical+ | ✅ Horizontal | ✅ Managed |
| Expertise | ✅ High | ⚠️ Medium | ❌ Low |
**Decision:** PostgreSQL
**Rationale:**
- ACID compliance is non-negotiable for financial transactions
- Team has 5 years PostgreSQL experience
- Can scale vertically to meet current needs
- Complex reporting queries needed
**Trade-offs:**
- Harder to horizontally scale than MongoDB
- More expensive at large scale than DynamoDB
- Self-managed PostgreSQL means more operational work than fully managed DynamoDB
**Review Date:** When we hit 100K orders/day
Step 5: Plan for Scale
Scaling Strategies:
## Vertical Scaling (Scale Up)
Add more resources to a single machine
**When:**
- Quick fix needed
- Simple deployment
- Under 10K users
**How:**
- Bigger CPU
- More RAM
- Faster disk
**Limits:**
- Hardware ceiling
- Single point of failure
- Expensive at scale
---
## Horizontal Scaling (Scale Out)
Add more machines
**When:**
- Growth expected
- High availability needed
- Cost-effective at scale
**How:**
- Load balancer
- Stateless services
- Shared database or sharding
**Challenges:**
- Session management
- Distributed state
- Data consistency
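This minimal sketch shows why horizontal scaling leans on stateless services. A round-robin balancer can send any request to any instance because session data lives in a shared store. `RoundRobinBalancer`, `sharedSessions` and the instance names are hypothetical. Production load balancers add health checks and other algorithms, such as least connections.

```typescript
// Hypothetical request shape and instance handlers.
interface AppRequest { userId: string; path: string }
type InstanceHandler = (request: AppRequest) => string;

class RoundRobinBalancer {
  private readonly instances: InstanceHandler[];
  private next = 0;

  constructor(instances: InstanceHandler[]) {
    if (instances.length === 0) throw new Error("Need at least one instance");
    this.instances = instances;
  }

  handle(request: AppRequest): string {
    const instance = this.instances[this.next];
    this.next = (this.next + 1) % this.instances.length; // rotate to the next instance
    return instance(request);
  }
}

// Session state lives in a shared store (e.g. Redis in practice), not in instance memory,
// so whichever instance receives the request sees the same session.
const sharedSessions = new Map<string, string>([["u1", "cart-42"]]);
const makeInstance = (instanceName: string): InstanceHandler => (request) =>
  `${instanceName} served ${request.path} for session ${sharedSessions.get(request.userId) ?? "none"}`;

const lb = new RoundRobinBalancer([makeInstance("app-1"), makeInstance("app-2"), makeInstance("app-3")]);
for (let i = 0; i < 4; i++) console.log(lb.handle({ userId: "u1", path: "/cart" }));
// Served by app-1, app-2, app-3, then app-1 again; every line reports session cart-42
```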
---
## Caching Strategy
Reduce load on database/services
**Layers:**
Browser Cache → CDN → App Cache → Database Cache
Patterns:
- Cache-Aside (lazy loading)
- Write-Through (sync write)
- Write-Behind (async write)
- Refresh-Ahead (proactive)
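Example (Cache-Aside):
// Assumed clients, declared only for typing: `cache` (e.g. a Redis wrapper) and `db` (e.g. an ORM).
interface User { id: string; name: string }
declare const cache: {
  get(key: string): Promise<User | null>;
  set(key: string, value: User, ttlSeconds: number): Promise<void>;
};
declare const db: { users: { findById(id: string): Promise<User | null> } };
async function getUser(id: string): Promise<User | null> {
  // 1. Check cache
  const cached = await cache.get(`user:${id}`);
  if (cached) return cached;
  // 2. Cache miss: fetch from DB
  const user = await db.users.findById(id);
  // 3. Store in cache (TTL: 1 hour); don't cache a missing user
  if (user) await cache.set(`user:${id}`, user, 3600);
  return user;
}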
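This self-contained sketch exercises the Cache-Aside and Write-Through patterns without Redis or a database. It uses a hypothetical in-memory `TtlCache`, with a `Map` standing in for the database. The clock is injectable so TTL expiry can be tested without waiting.

```typescript
// Hypothetical in-memory TTL cache; production systems would typically use Redis or similar.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

interface User { id: string; name: string }
const database = new Map<string, User>(); // stands in for the real database
let dbReads = 0;

// Cache-Aside: check the cache, fall back to the database on a miss, then populate the cache.
function getUserCached(cache: TtlCache<User>, id: string): User | undefined {
  const hit = cache.get(`user:${id}`);
  if (hit) return hit;
  dbReads++;
  const user = database.get(id);
  if (user) cache.set(`user:${id}`, user, 3600);
  return user;
}

// Write-Through: write the database and the cache together, so later cache reads reflect the write.
function saveUserWriteThrough(cache: TtlCache<User>, user: User): void {
  database.set(user.id, user);
  cache.set(`user:${user.id}`, user, 3600);
}
```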
Database Scaling
Read Replicas:
          ┌────────┐
          │Primary │ (writes)
          └───┬────┘
              │
   ┌──────────┼──────────┐
   │          │          │
Replica    Replica    Replica
(reads)    (reads)    (reads)
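Here is a minimal routing sketch for read replicas, with hypothetical names and fake connections. Replication is often asynchronous, so a replica can lag behind the primary. For that reason the sketch lets callers force a read to the primary when they must see their own latest write.

```typescript
// Hypothetical connection shape; fakeConnection just echoes which node ran the query.
interface DbConnection {
  node: string;
  query(sql: string): string;
}
const fakeConnection = (node: string): DbConnection => ({ node, query: (sql) => `${node}: ${sql}` });

class ReplicaRouter {
  private readonly primary: DbConnection;
  private readonly replicas: DbConnection[];
  private nextReplica = 0;

  constructor(primary: DbConnection, replicas: DbConnection[]) {
    this.primary = primary;
    this.replicas = replicas;
  }

  // All writes go to the primary.
  write(sql: string): string {
    return this.primary.query(sql);
  }

  // Reads rotate across replicas; requireFresh sends read-your-own-write queries
  // to the primary, since asynchronous replicas may lag behind it.
  read(sql: string, requireFresh = false): string {
    if (requireFresh || this.replicas.length === 0) return this.primary.query(sql);
    const replica = this.replicas[this.nextReplica];
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return replica.query(sql);
  }
}
```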
Sharding:
User IDs 0-999 → Shard 1
User IDs 1000-1999 → Shard 2
User IDs 2000-2999 → Shard 3
Challenges:
- Rebalancing
- Cross-shard queries
- Transactions across shards
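Here is a minimal router for this range scheme. It assumes numeric user IDs and exactly three configured shards; all names are hypothetical. One trade-off is worth noting: with sequential IDs, all new users land on the newest shard, which can become a hot spot.

```typescript
// Hypothetical configuration mirroring the example: 1,000 user IDs per shard, 3 shards.
const USERS_PER_SHARD = 1000;
const SHARD_COUNT = 3;

// Range-based routing: 0-999 → 1, 1000-1999 → 2, 2000-2999 → 3.
function shardForUser(userId: number): number {
  if (!Number.isInteger(userId) || userId < 0) throw new RangeError(`Invalid user ID: ${userId}`);
  const shard = Math.floor(userId / USERS_PER_SHARD) + 1;
  if (shard > SHARD_COUNT) throw new RangeError(`No shard configured for user ${userId}`);
  return shard;
}

// Cross-shard queries: find every shard a multi-user query must fan out to.
function shardsForUsers(userIds: number[]): number[] {
  return Array.from(new Set(userIds.map((id) => shardForUser(id)))).sort((a, b) => a - b);
}
```

The explicit error for IDs past the last range is deliberate. Routing them silently would hide the point at which capacity planning, and possibly rebalancing, is needed.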
Partitioning:
Orders by date:
├── 2024-Q1 → Partition 1
├── 2024-Q2 → Partition 2
├── 2024-Q3 → Partition 3
└── 2024-Q4 → Partition 4
Benefits:
- Query performance
- Easier archival
- Smaller indexes
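This minimal sketch computes the quarterly partition for an order date. It also computes which partitions a date-range query would touch, which is the idea behind partition pruning. It uses UTC to avoid time-zone edge cases, and the function names are hypothetical. Databases such as PostgreSQL support range partitioning and pruning natively, so code like this is mainly useful for reasoning about the layout.

```typescript
// Hypothetical helpers; names follow the "2024-Q1" labels in the example.
function quarterPartition(orderDate: Date): string {
  const quarter = Math.floor(orderDate.getUTCMonth() / 3) + 1; // months 0-2 → Q1, 3-5 → Q2, ...
  return `${orderDate.getUTCFullYear()}-Q${quarter}`;
}

// Which partitions a date-range query touches; the others can be skipped entirely.
function partitionsForRange(from: Date, to: Date): string[] {
  const partitions: string[] = [];
  // Start at the first day of the quarter containing `from`.
  const cursor = new Date(Date.UTC(from.getUTCFullYear(), Math.floor(from.getUTCMonth() / 3) * 3, 1));
  while (cursor.getTime() <= to.getTime()) {
    partitions.push(quarterPartition(cursor));
    cursor.setUTCMonth(cursor.getUTCMonth() + 3); // advance one quarter
  }
  return partitions;
}
```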
Step 6: Document Decisions (ADRs)
Architecture Decision Record Template:
# ADR [Number]: [Title]
**Status:** [Proposed | Accepted | Deprecated | Superseded]
**Date:** YYYY-MM-DD
**Deciders:** [Names]
---
## Context
What is the issue we're trying to solve?
**Current Situation:**
[Describe current state]
**Problem:**
[What needs to change and why]
**Constraints:**
- Technical constraints
- Business constraints
- Time constraints
---
## Decision
We will [decision].
**Details:**
[Explain the decision in detail]
---
## Options Considered
### Option 1: [Name]
**Pros:**
- Pro 1
- Pro 2
**Cons:**
- Con 1
- Con 2
### Option 2: [Name]
**Pros:**
- Pro 1
- Pro 2
**Cons:**
- Con 1
- Con 2
---
## Consequences
**Positive:**
- What improves
- What becomes easier
**Negative:**
- What becomes harder
- What we give up
**Risks:**
- What could go wrong
- Mitigation strategies
**Technical Debt:**
- What shortcuts are we taking
- When will we revisit
---
## Follow-up Actions
- [ ] Action 1 (Owner, Due Date)
- [ ] Action 2 (Owner, Due Date)
---
## References
- Link to design doc
- Link to RFC
- Related ADRs
Example ADR:
# ADR 001: Migrate from Monolith to Microservices
**Status:** Accepted
**Date:** 2026-01-15
**Deciders:** Architecture Team, Engineering Leads
---
## Context
**Current Situation:**
Single Rails monolith serving all traffic. 50K daily active users.
**Problem:**
- Deployment takes 30 minutes, blocks all teams
- Database at 80% capacity
- Cannot scale teams independently
- Different services have different scaling needs (API vs background jobs)
**Constraints:**
- Must maintain 99.9% uptime during migration
- Complete within 6 months
- Team of 15 engineers
---
## Decision
We will migrate to microservices using the Strangler Fig pattern.
**Approach:**
1. Start with highest-value, lowest-risk services (User Service, Notifications)
2. Extract one service per month
3. API Gateway routes to new services
4. The monolith continues to serve all functionality not yet extracted
5. Gradual data migration
**Tech Stack:**
- Services: Node.js/TypeScript
- Communication: REST + Message Queue (RabbitMQ)
- Deployment: Kubernetes
- Data: PostgreSQL per service
---
## Options Considered
### Option 1: Continue Scaling Monolith
**Pros:**
- Simplest
- Team already knows it
- No migration risk
**Cons:**
- Doesn't solve team scaling
- Database still bottleneck
- Deployment still blocking
### Option 2: Big Bang Rewrite
**Pros:**
- Fresh start
- Modern architecture
**Cons:**
- High risk
- 6+ months with no new features shipped
- Likely to fail
### Option 3: Strangler Fig Migration (CHOSEN)
**Pros:**
- Low risk (gradual)
- Continuous value delivery
- Reversible
- Learn as we go
**Cons:**
- Longer timeline
- Temporary complexity
- Some duplication
---
## Consequences
**Positive:**
- Teams can deploy independently
- Services scale independently
- Technology flexibility
- Fault isolation
**Negative:**
- Operational complexity (15+ services)
- Distributed debugging harder
- Network latency between services
- More infrastructure cost
**Risks:**
- Data consistency across services
- Authentication/authorization complexity
- Monitoring/observability gaps
**Mitigation:**
- Event sourcing for data sync
- Shared auth service
- OpenTelemetry from day 1
**Technical Debt:**
- Monolith will coexist for 12-18 months
- Some duplication during migration
- Revisit architecture Q3 2026
---
## Follow-up Actions
- [x] Create migration roadmap (Sarah, 2026-01-20)
- [x] Set up Kubernetes cluster (DevOps, 2026-01-25)
- [ ] Extract User Service (Team A, 2026-02-15)
- [ ] Implement API Gateway (Team B, 2026-02-01)
- [ ] Set up observability (DevOps, 2026-01-30)
---
## References
- [Migration Roadmap](link)
- [Microservices RFC](link)
- Related: ADR 002 (Service Communication Pattern)
Common Patterns & Practices
API Gateway Pattern:
Client
  ↓
API Gateway (routes, auth, rate limiting)
  ├──→ User Service
  ├──→ Order Service
  └──→ Payment Service
Benefits:
- Single entry point
- Handles cross-cutting concerns
- Enables the Backend for Frontend (BFF) pattern
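This minimal sketch shows the gateway's cross-cutting duties in order: API-key authentication, a fixed-window rate limit per client, then prefix-based routing to backend services. `ApiGateway` and the backend stubs are hypothetical. Real gateways proxy requests over the network to separately deployed services.

```typescript
// Hypothetical request/response shapes; backends are in-process stubs here.
interface GatewayRequest { path: string; clientId: string; apiKey?: string }
interface GatewayResponse { status: number; body: string }
type Backend = (request: GatewayRequest) => GatewayResponse;

class ApiGateway {
  private readonly routes: Array<[prefix: string, backend: Backend]> = [];
  private readonly requestCounts = new Map<string, number>();
  private readonly validKeys: Set<string>;
  private readonly limitPerWindow: number;

  constructor(validKeys: string[], limitPerWindow: number) {
    this.validKeys = new Set(validKeys);
    this.limitPerWindow = limitPerWindow;
  }

  register(prefix: string, backend: Backend): void {
    this.routes.push([prefix, backend]);
  }

  // Fixed-window rate limiting: call this on a timer (e.g. every minute) to start a new window.
  resetWindow(): void {
    this.requestCounts.clear();
  }

  handle(request: GatewayRequest): GatewayResponse {
    // 1. Authentication, handled once at the edge
    if (!request.apiKey || !this.validKeys.has(request.apiKey)) {
      return { status: 401, body: "Unauthorized" };
    }
    // 2. Rate limiting per client
    const count = (this.requestCounts.get(request.clientId) ?? 0) + 1;
    this.requestCounts.set(request.clientId, count);
    if (count > this.limitPerWindow) return { status: 429, body: "Too Many Requests" };

    // 3. Routing by path prefix (first match wins)
    const route = this.routes.find(([prefix]) => request.path.startsWith(prefix));
    return route ? route[1](request) : { status: 404, body: "No route" };
  }
}
```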
Circuit Breaker Pattern:
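type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class CircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;      // consecutive failures before opening
  private readonly resetTimeoutMs: number; // how long to stay OPEN before a trial call

  constructor(threshold = 5, resetTimeoutMs = 60_000) {
    this.threshold = threshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      // Fail fast until the timeout elapses, then allow a trial call (HALF_OPEN)
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit breaker OPEN');
      }
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onFailure() {
    this.failures++;
    // A failed trial call in HALF_OPEN reopens the circuit immediately
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }
}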
Saga Pattern (Distributed Transactions):
Order Saga:
1. Create Order → Success
2. Reserve Inventory → Success
3. Charge Payment → FAILS
Compensation (rollback):
3. Refund Payment → (skipped, never charged)
2. Release Inventory → Execute
1. Cancel Order → Execute
Result: Consistent state, no partial orders
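This minimal orchestration-style saga runner reproduces the walkthrough. Each step pairs an action with a compensation. On failure, the completed steps are compensated in reverse order, and the failed step, which never completed, is skipped. `runSaga` and `SagaStep` are hypothetical names. Real sagas are asynchronous and need compensations that are idempotent and safe to retry.

```typescript
// Hypothetical step shape: an action plus the compensation that undoes it.
interface SagaStep {
  name: string;
  action: () => void;     // throws on failure
  compensate: () => void; // undoes a completed action
}

function runSaga(steps: SagaStep[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
      log.push(`${step.name}: done`);
    } catch {
      log.push(`${step.name}: failed`);
      // Compensate completed steps in reverse order; the failed step never completed.
      for (const done of completed.reverse()) {
        done.compensate();
        log.push(`${done.name}: compensated`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}
```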
CQRS (Command Query Responsibility Segregation):
Commands (Writes):       Queries (Reads):
  Create Order             Get Order
  Update User              List Orders
  Delete Product           Search Products
        ↓                        ↓
     Write DB ──────────→     Read DB
  (normalized)            (denormalized)
Benefits:
- Optimize read/write separately
- Scale independently
- Complex queries without impacting writes
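Here is a minimal in-memory CQRS sketch with hypothetical names. The command side writes a normalized model and publishes an event. A projection maintains a denormalized summary that queries read directly. The projection runs synchronously here; with a real message bus, the read side would be eventually consistent.

```typescript
// Hypothetical names throughout.
interface OrderLine { sku: string; quantity: number; priceCents: number }
interface OrderCreated { type: "OrderCreated"; orderId: string; customerName: string; lines: OrderLine[] }
interface OrderSummary { orderId: string; customerName: string; itemCount: number; totalCents: number }

// Command side: normalized data (orders reference customers by ID) plus validation.
class OrderCommands {
  private readonly customers = new Map<string, string>(); // customerId -> name
  private readonly orders = new Map<string, { customerId: string; lines: OrderLine[] }>();
  private readonly listeners: Array<(event: OrderCreated) => void> = [];

  subscribe(listener: (event: OrderCreated) => void): void {
    this.listeners.push(listener);
  }

  addCustomer(customerId: string, customerName: string): void {
    this.customers.set(customerId, customerName);
  }

  createOrder(orderId: string, customerId: string, lines: OrderLine[]): void {
    const customerName = this.customers.get(customerId);
    if (customerName === undefined) throw new Error(`Unknown customer ${customerId}`);
    if (this.orders.has(orderId)) throw new Error(`Order ${orderId} already exists`);
    this.orders.set(orderId, { customerId, lines });
    const event: OrderCreated = { type: "OrderCreated", orderId, customerName, lines };
    for (const listener of this.listeners) listener(event);
  }
}

// Query side: a denormalized projection shaped for reads (no joins, totals precomputed).
class OrderQueries {
  private readonly summaries = new Map<string, OrderSummary>();

  apply(event: OrderCreated): void {
    this.summaries.set(event.orderId, {
      orderId: event.orderId,
      customerName: event.customerName,
      itemCount: event.lines.reduce((n, line) => n + line.quantity, 0),
      totalCents: event.lines.reduce((n, line) => n + line.quantity * line.priceCents, 0),
    });
  }

  getOrder(orderId: string): OrderSummary | undefined {
    return this.summaries.get(orderId);
  }

  listOrders(): OrderSummary[] {
    return Array.from(this.summaries.values());
  }
}
```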
Architecture Checklist
## Pre-Development
- [ ] Functional requirements documented
- [ ] Non-functional requirements defined
- [ ] Architecture pattern chosen
- [ ] Technology stack decided
- [ ] Data model designed
- [ ] API contracts defined
- [ ] Security reviewed
- [ ] Scalability plan created
## During Development
- [ ] Code organized by domain/feature
- [ ] Dependencies point inward (clean architecture)
- [ ] Interfaces define contracts
- [ ] Error handling consistent
- [ ] Logging and monitoring instrumented
- [ ] Tests cover critical paths
- [ ] Documentation up to date
## Pre-Production
- [ ] Load testing completed
- [ ] Security audit passed
- [ ] Monitoring dashboards ready
- [ ] Alerts configured
- [ ] Runbooks written
- [ ] Rollback plan tested
- [ ] DR plan documented
- [ ] Team trained
Common Mistakes
| Don’t | Do |
|---|---|
| Microservices for everything | Start monolith, extract when needed |
| Premature optimization | Optimize when you have data |
| Architecture astronaut | Solve today’s problems, not future maybes |
| Copy Big Tech architecture | Your scale != their scale |
| Ignore non-functional requirements | Performance/security/reliability matter |
| Big Bang rewrites | Incremental refactoring |
| One size fits all | Different components, different patterns |
| Skip documentation | ADRs, diagrams, runbooks |
Tools & Resources
Diagramming:
- draw.io (free, versatile)
- Lucidchart (collaborative)
- Mermaid (code-based)
- C4 Model (structured approach)
Books:
- “Clean Architecture” by Robert Martin
- “Designing Data-Intensive Applications” by Martin Kleppmann
- “Building Microservices” by Sam Newman
- “Domain-Driven Design” by Eric Evans
Patterns:
- microservices.io (pattern catalog)
- martinfowler.com (architecture articles)
Related Skills
- /systems-decompose – Break down features
- /database-schema – Design data models
- /api-design – Design API contracts
- /code-review – Review architectural decisions
Last Updated: 2026-01-22