ml-system-design
Install command
npx skills add https://github.com/melodic-software/claude-code-plugins --skill ml-system-design
Skill Documentation
ML System Design
This skill provides frameworks for designing production machine learning systems, from data pipelines to model serving.
When to Use This Skill
Keywords: ML pipeline, machine learning system, feature store, model training, model serving, ML infrastructure, MLOps, A/B testing ML, feature engineering, model deployment
Use this skill when:
- Designing end-to-end ML systems for production
- Planning feature store architecture
- Designing model training pipelines
- Planning model serving infrastructure
- Preparing for ML system design interviews
- Evaluating ML platform tools and frameworks
ML System Architecture Overview
The ML System Lifecycle
                             ML SYSTEM LIFECYCLE

+-----------+   +-----------+   +-----------+   +-----------+   +----------+
|   Data    |-->|  Feature  |-->|   Model   |-->|   Model   |-->| Monitor  |
| Ingestion |   | Pipeline  |   | Training  |   |  Serving  |   |  & Eval  |
+-----------+   +-----------+   +-----------+   +-----------+   +----------+
      |               |               |               |              |
      v               v               v               v              v
+-----------+   +-----------+   +-----------+   +-----------+   +----------+
|   Data    |   |  Feature  |   |   Model   |   | Inference |   | Metrics  |
|   Lake    |   |   Store   |   | Registry  |   |   Cache   |   |  Store   |
+-----------+   +-----------+   +-----------+   +-----------+   +----------+
Key Components
| Component | Purpose | Examples |
|---|---|---|
| Data Ingestion | Collect raw data from sources | Kafka, Kinesis, Pub/Sub |
| Feature Pipeline | Transform raw data to features | Spark, Flink, dbt |
| Feature Store | Store and serve features | Feast, Tecton, Vertex AI |
| Model Training | Train and validate models | SageMaker, Vertex AI, Kubeflow |
| Model Registry | Version and track models | MLflow, Weights & Biases |
| Model Serving | Serve predictions | TensorFlow Serving, Triton, vLLM |
| Monitoring | Track model performance | Evidently, WhyLabs, Arize |
Feature Store Architecture
Why Feature Stores?
Problems without a feature store:
- Training-serving skew (features computed differently)
- Duplicate feature computation across teams
- No feature versioning or lineage
- Slow feature experimentation
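A minimal in-memory sketch (all names here are hypothetical, not a real feature-store API) shows the core idea behind avoiding training-serving skew: one registered transform drives both the offline (training-time) computation and the online materialization, so the two paths cannot diverge.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FeatureStore:
    """Toy feature store: one definition serves both offline and online paths."""
    definitions: dict = field(default_factory=dict)   # name -> transform fn
    online: dict = field(default_factory=dict)        # (name, entity) -> value

    def register(self, name: str, transform: Callable):
        self.definitions[name] = transform

    def materialize(self, name: str, raw_rows: dict):
        """Batch job: compute the feature and sync it to the online store."""
        fn = self.definitions[name]
        for entity_id, raw in raw_rows.items():
            self.online[(name, entity_id)] = fn(raw)

    def get_online(self, name: str, entity_id):
        """Low-latency point lookup at serving time."""
        return self.online[(name, entity_id)]

    def get_offline(self, name: str, raw):
        """Training-time computation uses the SAME transform."""
        return self.definitions[name](raw)


store = FeatureStore()
store.register("purchase_count_30d", lambda events: len(events))
store.materialize("purchase_count_30d", {"user_1": ["order_a", "order_b"]})

assert store.get_online("purchase_count_30d", "user_1") == 2
assert store.get_offline("purchase_count_30d", ["order_a", "order_b"]) == 2
```

Real feature stores (Feast, Tecton) add versioning, lineage, and point-in-time retrieval on top of this same single-definition principle.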
Feature Store Components
                         FEATURE STORE

+----------------------+          +----------------------+
|    OFFLINE STORE     |          |     ONLINE STORE     |
|                      |          |                      |
| - Historical data    |   sync   | - Low-latency        |
| - Training queries   |--------->| - Point lookups      |
| - Batch features     |          | - Real-time serving  |
|                      |          |                      |
|   (Data Warehouse)   |          |  (Redis, DynamoDB)   |
+----------------------+          +----------------------+

+------------------------------------------------------------+
|                      FEATURE REGISTRY                       |
|  - Feature definitions         - Version control            |
|  - Data lineage                - Access control             |
+------------------------------------------------------------+
Feature Types
| Type | Computation | Storage | Example |
|---|---|---|---|
| Batch | Scheduled (hourly/daily) | Offline → Online | User purchase count (30 days) |
| Streaming | Real-time event processing | Direct to online | Items in cart (current) |
| On-demand | Request-time computation | Not stored | Distance to nearest store |
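As an illustration of the on-demand row, here is a sketch of a request-time feature that would never be persisted: "distance to nearest store" via the standard haversine formula (function and variable names are made up for this example).

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))


def distance_to_nearest_store(user_loc, store_locs):
    """On-demand feature: computed per request from live inputs, not stored."""
    return min(haversine_km(*user_loc, *store) for store in store_locs)


# A user in lower Manhattan, two candidate stores
d = distance_to_nearest_store((40.7128, -74.0060),
                              [(40.7300, -74.0000), (41.0000, -73.9000)])
```

On-demand features like this still need a registered definition so that training can replay the same computation over historical request logs.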
Training-Serving Consistency
TRAINING (Historical):
+--------------+     +---------------+     +--------------+
|  Historical  |---->| Point-in-Time |---->|   Training   |
|    Events    |     |     Join      |     |   Dataset    |
+--------------+     +---------------+     +--------------+
                             |
                       uses feature
                       definitions
                             |
SERVING (Real-time):         v
+--------------+     +---------------+     +--------------+
|    Online    |---->| Same Feature  |---->|  Prediction  |
|    Store     |     |  Definitions  |     |   Request    |
+--------------+     +---------------+     +--------------+
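A point-in-time join can be sketched in a few lines of plain Python (pandas users would typically reach for `merge_asof`): for each label timestamp it attaches only the latest feature value observed at or before that time, which is what prevents label leakage.

```python
from bisect import bisect_right


def point_in_time_join(label_events, feature_history):
    """For each (timestamp, label), attach the most recent feature value
    observed at or before that timestamp; None if no value existed yet."""
    # feature_history: list of (timestamp, value), sorted by timestamp
    times = [t for t, _ in feature_history]
    rows = []
    for label_ts, label in label_events:
        i = bisect_right(times, label_ts) - 1   # last feature at or before label
        feature = feature_history[i][1] if i >= 0 else None
        rows.append((label_ts, feature, label))
    return rows


history = [(10, 1.0), (20, 2.0), (30, 3.0)]
labels = [(15, "neg"), (25, "pos"), (5, "neg")]
print(point_in_time_join(labels, history))
# -> [(15, 1.0, 'neg'), (25, 2.0, 'pos'), (5, None, 'neg')]
```

Note the label at t=5 gets no feature: using the t=10 value there would leak information from the future.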
Model Training Infrastructure
Training Pipeline Components
                         TRAINING PIPELINE

+------------+   +------------+   +------------+   +------------+
|    Data    |-->|  Feature   |-->|   Model    |-->|   Model    |
|   Loader   |   | Transform  |   |   Train    |   |  Validate  |
+------------+   +------------+   +------------+   +------------+
      |                |                |                |
      v                v                v                v
+------------+  +---------------+  +------------+  +------------+
| Experiment |  | Hyperparameter|  | Checkpoint |  |   Model    |
|  Tracking  |  |    Tuning     |  |  Storage   |  |  Registry  |
+------------+  +---------------+  +------------+  +------------+
Training Infrastructure Patterns
| Pattern | Use Case | Tools |
|---|---|---|
| Single-node | Small datasets, quick experiments | Jupyter, local GPU |
| Distributed data-parallel | Large datasets, same model | Horovod, PyTorch DDP |
| Model-parallel | Large models that don’t fit in memory | DeepSpeed, FSDP, Megatron |
| Hyperparameter tuning | Automated model optimization | Optuna, Ray Tune |
Experiment Tracking
Track for reproducibility:
| What to Track | Why |
|---|---|
| Hyperparameters | Reproduce training runs |
| Metrics | Compare model performance |
| Artifacts | Model files, datasets |
| Code version | Git commit hash |
| Environment | Docker image, dependencies |
| Data version | Dataset hash or snapshot |
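A tracking record can be as simple as a dictionary capturing the fields in the table above. This schema is illustrative, not any particular tracker's API, though tools like MLflow persist essentially the same fields.

```python
import hashlib
import json
import time


def log_run(params, metrics, code_version, data_snapshot):
    """Capture everything needed to reproduce a training run."""
    return {
        # unique id derived from params + wall clock
        "run_id": hashlib.sha1(repr((params, time.time())).encode()).hexdigest()[:12],
        "params": params,              # hyperparameters
        "metrics": metrics,            # final evaluation metrics
        "code_version": code_version,  # e.g. a git commit hash
        # dataset fingerprint: hash of a canonical JSON description
        "data_version": hashlib.sha1(
            json.dumps(data_snapshot, sort_keys=True).encode()
        ).hexdigest(),
    }


run = log_run(
    params={"lr": 3e-4, "batch_size": 32},
    metrics={"val_auc": 0.91},
    code_version="abc1234",            # hypothetical commit hash
    data_snapshot={"rows": 1000, "path": "train.parquet"},
)
```

The environment (Docker image, dependency lockfile) is usually attached as an artifact rather than inlined in the record.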
Model Serving Architecture
Serving Patterns
| Pattern | Latency | Throughput | Use Case |
|---|---|---|---|
| Online (REST/gRPC) | Low (<100ms) | Medium | Real-time predictions |
| Batch | High (hours) | Very high | Bulk scoring |
| Streaming | Medium | High | Event-driven predictions |
| Embedded | Very low | Varies | Edge/mobile inference |
Online Serving Architecture
                       MODEL SERVING SYSTEM

                      +----------------+
                      |    Clients     |
                      +-------+--------+
                              |
                              v
                      +----------------+
                      | Load Balancer  |
                      +-------+--------+
                              |
                              v
+--------------------------------------------------------------+
|                         API Gateway                          |
|  - Authentication   - Rate limiting   - Request validation   |
+------------------------------+-------------------------------+
                               |
         +---------------------+---------------------+
         v                     v                     v
  +------------+        +------------+        +------------+
  |  Model A   |        |  Model B   |        |  Model C   |
  |   (v1.2)   |        |   (v2.0)   |        |   (v1.0)   |
  +------+-----+        +------+-----+        +------+-----+
         |                     |                     |
         +---------------------+---------------------+
                               |
                               v
                      +----------------+
                      | Feature Store  |
                      |    (Online)    |
                      +----------------+
Latency Optimization
| Technique | Latency Impact | Trade-off |
|---|---|---|
| Batching | Amortizes per-request cost | Adds queueing delay to individual requests |
| Caching | 10-100x faster | May serve stale predictions |
| Quantization | 2-4x faster | Slight accuracy loss |
| Distillation | Variable | Training overhead |
| GPU inference | 10-100x faster | Cost increase |
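The caching row can be sketched as a TTL cache in front of the model. This is an illustrative toy (no eviction, no size bound), not a production cache, but it makes the staleness trade-off concrete: within the TTL window, repeated requests skip inference entirely.

```python
import time


class PredictionCache:
    """TTL cache in front of a model: repeat requests skip inference,
    at the cost of possibly serving slightly stale predictions."""

    def __init__(self, model_fn, ttl_seconds=60.0):
        self.model_fn = model_fn
        self.ttl = ttl_seconds
        self.store = {}   # key -> (expires_at, prediction)

    def predict(self, key, features):
        now = time.monotonic()
        hit = self.store.get(key)
        if hit and hit[0] > now:
            return hit[1]                     # cache hit: no model call
        pred = self.model_fn(features)        # cache miss: run inference
        self.store[key] = (now + self.ttl, pred)
        return pred


calls = []
cache = PredictionCache(lambda f: calls.append(f) or sum(f), ttl_seconds=60)
assert cache.predict("user_1", [1, 2]) == 3
assert cache.predict("user_1", [1, 2]) == 3   # served from cache
assert len(calls) == 1                        # model ran only once
```

Choosing the TTL is the design decision: it bounds how stale a served prediction can be versus how much inference load the cache absorbs.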
A/B Testing ML Models
Experiment Design
                     A/B TESTING ARCHITECTURE

                      +----------------+
                      |    Traffic     |
                      +-------+--------+
                              |
                              v
              +------------------------+
              | Experiment Assignment  | <----- Experiment Config
              | - User bucketing       |        - Allocation %
              | - Feature flags        |        - Target segments
              +-----------+------------+        - Guardrails
                          |
                 +--------+--------+
                 v                 v
           +----------+      +-----------+
           | Control  |      | Treatment |
           | Model A  |      |  Model B  |
           +----+-----+      +-----+-----+
                |                  |
                +--------+---------+
                         |
                         v
                +----------------+
                | Metrics Logger |
                +-------+--------+
                        |
                        v
                +----------------+
                |  Statistical   | -----> Decision: Ship / Iterate / Kill
                |    Analysis    |
                +----------------+
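The user-bucketing step above is typically a deterministic hash of (experiment, user), so each user always sees the same arm and different experiments are statistically independent. A minimal sketch (the hash scheme and names are illustrative):

```python
import hashlib


def assign_arm(user_id, experiment, treatment_pct=50):
    """Deterministic bucketing: hash (experiment, user) into buckets 0-99.
    Salting by experiment name keeps assignments uncorrelated across tests."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < treatment_pct else "control"


# Same user, same experiment -> stable assignment across requests
assert assign_arm("u42", "ranker_v2") == assign_arm("u42", "ranker_v2")

# Roughly the configured split over many users
arms = [assign_arm(f"u{i}", "ranker_v2", treatment_pct=50) for i in range(10_000)]
share = arms.count("treatment") / len(arms)
assert 0.45 < share < 0.55
```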
Metrics to Track
| Metric Type | Examples | Purpose |
|---|---|---|
| Model metrics | AUC, RMSE, precision/recall | Model quality |
| Business metrics | CTR, conversion, revenue | Business impact |
| Guardrail metrics | Latency, error rate, engagement | Prevent regressions |
| Segment metrics | Metrics by user segment | Detect heterogeneous effects |
Statistical Considerations
- Sample size: Calculate power before experiment
- Duration: Account for novelty effects and time patterns
- Multiple testing: Adjust for multiple metrics (Bonferroni, FDR)
- Early stopping: Use sequential testing methods
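The sample-size point can be made concrete with the standard pooled-variance normal approximation for a two-proportion test. This is a sketch of the textbook formula; real experiments often use a power-analysis library or sequential methods instead.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for 80% power
    p_bar = (p_control + p_treatment) / 2
    delta = abs(p_treatment - p_control)            # minimum detectable effect
    return ceil(2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2)


# Detecting a 5.0% -> 5.5% CTR lift (10% relative) needs ~31k users per arm
n = sample_size_per_arm(0.050, 0.055)
```

Note how the required n scales with 1/delta²: halving the minimum detectable effect quadruples the sample size, which is why small lifts need long experiments.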
Model Monitoring
What to Monitor
| Category | Metrics | Alert Threshold |
|---|---|---|
| Data quality | Missing values, schema drift | >1% change |
| Feature drift | Distribution shift (PSI, KL) | PSI >0.2 |
| Prediction drift | Output distribution shift | Depends on use case |
| Model performance | Accuracy, AUC (when labels available) | >5% degradation |
| Operational | Latency, throughput, errors | SLO violations |
Drift Detection
                    DRIFT DETECTION PIPELINE

    Training Data                      Production Data
  +----------------+                 +----------------+
  |   Reference    |                 |    Current     |
  |  Distribution  |                 |  Distribution  |
  +-------+--------+                 +-------+--------+
          |                                  |
          +----------------+-----------------+
                           v
          +-------------------------------------+
          |          Statistical Test           |
          |  - PSI (Population Stability Index) |
          |  - KS test                          |
          |  - Chi-squared                      |
          +------------------+------------------+
                             v
                   +------------------+
                   |   Drift Score    |
                   +---------+--------+
                             |
             +---------------+---------------+
             v               v               v
         No Drift         Warning         Critical
          (<0.1)         (0.1-0.2)         (>0.2)
             |               |               |
             v               v               v
         Continue       Investigate       Retrain
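The PSI score used in the pipeline above is straightforward to compute over binned distributions. A minimal sketch (the epsilon guard for empty bins is a common convention, not part of the formula itself):

```python
from math import log


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions,
    given as lists of bin proportions that each sum to 1."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)     # guard against empty bins
        total += (a - e) * log(a / e)
    return total


reference = [0.25, 0.25, 0.25, 0.25]               # training distribution
assert psi(reference, reference) < 1e-9             # identical -> no drift
assert psi(reference, [0.10, 0.20, 0.30, 0.40]) > 0.2   # shifted -> critical
```

Binning choices (count, edges, handling of new categories) materially affect the score, so reference and current data must use identical bins.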
Common ML System Design Patterns
Pattern 1: Recommendation System
Components needed:
- Candidate Generation (retrieve 100s-1000s)
- Ranking Model (score and sort)
- Feature Store (user features, item features)
- Real-time personalization (recent behavior)
- A/B testing infrastructure
Pattern 2: Fraud Detection
Components needed:
- Real-time feature computation
- Low-latency model serving (<50ms)
- High recall focus (can't miss fraud)
- Explainability for compliance
- Human-in-the-loop review
- Feedback loop for labels
Pattern 3: Search Ranking
Components needed:
- Two-stage ranking (retrieval + ranking)
- Feature store for query/document features
- Low latency (<200ms end-to-end)
- Learning to rank models
- Click-through rate prediction
- A/B testing with interleaving
Estimation for ML Systems
Training Infrastructure
Training time estimation:
- Dataset size: 100M examples
- Model: Transformer (100M params)
- GPU: A100 (80GB, 312 TFLOPS)
- Batch size: 32
- Training steps: Dataset / batch = 3.1M steps
- Time per step: ~100ms
- Total time: ~86 hours single GPU
- With 8 GPUs (data parallel): ~11 hours
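The arithmetic above, checked in code. The 100 ms step time and perfect 8-way scaling are the estimate's assumptions, not measurements; real data-parallel scaling loses some efficiency to gradient synchronization.

```python
dataset_size = 100_000_000   # examples
batch_size = 32
step_time_s = 0.100          # assumed seconds per step on one A100

steps = dataset_size // batch_size            # one pass over the data
single_gpu_hours = steps * step_time_s / 3600
eight_gpu_hours = single_gpu_hours / 8        # ideal data-parallel scaling

# steps ~ 3.1M, single GPU ~ 86.8 h, 8 GPUs ~ 10.9 h
```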
Serving Infrastructure
Inference estimation:
- QPS: 10,000
- Model latency: 20ms
- Batch size: 1 (real-time)
- GPU utilization: 50% (latency constraint)
- Requests per GPU/sec: 25
- GPUs needed: 10,000 / 25 = 400 GPUs
- With batching (batch 8): 100 GPUs (4x reduction)
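The serving arithmetic, checked in code. The 40 ms batch latency (2x the single-request latency for a batch of 8) is an assumption added here to make the stated 4x reduction work out; actual batched latency depends on the model and hardware.

```python
qps = 10_000
latency_ms = 20          # per-request model latency
utilization = 0.5        # headroom kept for tail latency

# Unbatched: one GPU serves 1000/latency_ms requests per second at full load
per_gpu_rps = (1000 / latency_ms) * utilization          # 25 req/s
gpus_unbatched = qps / per_gpu_rps                       # 400 GPUs

# Batch of 8, assuming the batch takes ~2x the single-request latency
batch_size, batch_latency_ms = 8, 40
per_gpu_rps_batched = batch_size * (1000 / batch_latency_ms) * utilization  # 100 req/s
gpus_batched = qps / per_gpu_rps_batched                 # 100 GPUs, a 4x reduction
```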
Related Skills
- `llm-serving-patterns` – LLM-specific serving and optimization
- `rag-architecture` – Retrieval-Augmented Generation patterns
- `vector-databases` – Vector search and embeddings
- `ml-inference-optimization` – Latency and cost optimization
- `estimation-techniques` – Back-of-envelope calculations
- `quality-attributes-taxonomy` – NFR definitions
Related Commands
- `/sd:ml-pipeline <problem>` – Design ML system interactively
- `/sd:estimate <scenario>` – Capacity calculations
Related Agents
- `ml-systems-designer` – Design ML architectures
- `ml-interviewer` – Mock ML system design interviews
Version History
- v1.0.0 (2025-12-26): Initial release
Last Updated
Date: 2025-12-26 Model: claude-opus-4-5-20251101