mlops-engineer
Total installs: 31
Weekly installs: 31
Site rank: #6581
Install command:
npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill mlops-engineer
Agent install distribution:
claude-code: 23
opencode: 23
gemini-cli: 21
cursor: 19
windsurf: 17
Skill Documentation
MLOps Engineer
Purpose
Provides expertise in Machine Learning Operations (MLOps), bridging data science and DevOps practices. Specializes in the end-to-end ML lifecycle: training pipelines, model versioning, production serving, and monitoring.
When to Use
- Building ML training and serving pipelines
- Implementing model versioning and registry
- Setting up feature stores
- Deploying models to production
- Monitoring model performance and drift
- Automating ML workflows (CI/CD for ML)
- Implementing A/B testing for models
- Managing experiment tracking
Quick Start
Invoke this skill when:
- Building ML pipelines and workflows
- Deploying models to production
- Setting up model versioning and registry
- Implementing feature stores
- Monitoring production ML systems
Do NOT invoke when:
- Model development and training → use /ml-engineer
- Data pipeline ETL → use /data-engineer
- Kubernetes infrastructure → use /kubernetes-specialist
- General CI/CD without ML → use /devops-engineer
Decision Framework
ML Lifecycle Stage?
├── Experimentation
│   └── MLflow/Weights & Biases for tracking
├── Training Pipeline
│   └── Kubeflow/Airflow/Vertex AI
├── Model Registry
│   └── MLflow Registry/Vertex Model Registry
├── Serving
│   ├── Batch → Spark/Dataflow
│   └── Real-time → TF Serving/Seldon/KServe
└── Monitoring
    └── Evidently/Fiddler/custom metrics
Core Workflows
1. ML Pipeline Setup
- Define pipeline stages (data prep, training, eval)
- Choose orchestrator (Kubeflow, Airflow, Vertex)
- Containerize each pipeline step
- Implement artifact storage
- Add experiment tracking (see the sketch after this list)
- Configure automated retraining triggers
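A minimal sketch of the artifact-storage and experiment-tracking steps above, assuming MLflow as the tracker and scikit-learn for the model; the experiment name and hyperparameters are illustrative placeholders, not part of this skill.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("churn-model")  # hypothetical experiment name

X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    # Log hyperparameters, a metric, and the model artifact so the run
    # is reproducible and comparable against other runs.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```

Each such run becomes an artifact-backed record that the registry and deployment steps below can reference by run ID.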
2. Model Deployment
- Register the model in the model registry (sketched after this list)
- Build serving container
- Deploy to serving infrastructure
- Configure autoscaling
- Implement canary/shadow deployment
- Set up monitoring and alerts
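A minimal sketch of the registration step above, assuming an MLflow tracking server with model aliases (available in MLflow 2.3+); the run ID and model name are hypothetical.

```python
import mlflow
from mlflow import MlflowClient

run_id = "abc123"  # hypothetical ID of a completed training run
model_uri = f"runs:/{run_id}/model"

# Register the run's logged model under a named registry entry.
version = mlflow.register_model(model_uri, "churn-model")

# Point the "champion" alias at the new version; the serving layer can then
# resolve "models:/churn-model@champion" without hard-coding version numbers,
# which also makes rollback a one-line alias change.
client = MlflowClient()
client.set_registered_model_alias("churn-model", "champion", version.version)
```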
3. Model Monitoring
- Define key metrics (latency, throughput, accuracy)
- Implement data drift detection (sketched after this list)
- Set up prediction monitoring
- Create alerting thresholds
- Build dashboards for visibility
- Automate retraining triggers
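A minimal sketch of the drift-detection step above, using a hand-rolled two-sample Kolmogorov-Smirnov test per feature; dedicated tools such as Evidently wrap this kind of check, and the p-value threshold here is an arbitrary example.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, current: np.ndarray,
                 p_threshold: float = 0.05) -> list[int]:
    """Return indices of features whose production distribution differs
    significantly from the training-time reference distribution."""
    drifted = []
    for i in range(reference.shape[1]):
        _, p_value = ks_2samp(reference[:, i], current[:, i])
        if p_value < p_threshold:
            drifted.append(i)
    return drifted

# Usage: compare a window of recent production inputs against training data.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(5_000, 3))
current = reference + np.array([0.0, 0.0, 0.8])  # feature 2 has shifted
print(detect_drift(reference, current))  # -> [2]
```

A result like this would feed the alerting thresholds and retraining triggers listed above.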
Best Practices
- Version everything: code, data, models, configs
- Use feature stores for consistency between training and serving (see the sketch after this list)
- Implement CI/CD specifically designed for ML workflows
- Monitor data drift and model performance continuously
- Use canary deployments for model rollouts
- Keep training and serving environments consistent
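A minimal sketch of one way to keep training and serving consistent without a full feature store: both paths import the same transformation function, so feature logic cannot silently diverge. The field names below are hypothetical.

```python
from datetime import datetime, timezone

def build_features(raw: dict) -> dict:
    """Single source of truth for feature logic, imported by both the
    batch training pipeline and the online serving endpoint."""
    signup = datetime.fromisoformat(raw["signup_date"]).replace(tzinfo=timezone.utc)
    now = datetime.now(timezone.utc)
    return {
        "days_since_signup": (now - signup).days,
        "spend_per_order": raw["total_spend"] / max(raw["order_count"], 1),
    }

# Training applies this row-by-row over historical data; serving applies it
# to each incoming request before calling the model.
features = build_features({"signup_date": "2024-01-15",
                           "total_spend": 240.0, "order_count": 8})
```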
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Manual deployments | Error-prone, slow | Automated ML CI/CD |
| Training-serving skew | Prediction errors | Feature stores |
| No model versioning | Can't reproduce or roll back | Model registry |
| Ignoring data drift | Silent degradation | Continuous monitoring |
| Notebook-to-production | Unmaintainable | Proper pipeline code |