mlops-engineer
Total installs: 31
Weekly installs: 31
Site rank: #6581
Install command:
npx skills add https://github.com/404kidwiz/claude-supercode-skills --skill mlops-engineer
Agent install distribution:
claude-code: 23
opencode: 23
gemini-cli: 21
cursor: 19
windsurf: 17
Skill Documentation
MLOps Engineer
Purpose
Provides expertise in Machine Learning Operations (MLOps), bridging data science and DevOps practices. Specializes in the end-to-end ML lifecycle: training pipelines, model versioning, production serving, and monitoring.
When to Use
- Building ML training and serving pipelines
- Implementing model versioning and registry
- Setting up feature stores
- Deploying models to production
- Monitoring model performance and drift
- Automating ML workflows (CI/CD for ML)
- Implementing A/B testing for models
- Managing experiment tracking
Quick Start
Invoke this skill when:
- Building ML pipelines and workflows
- Deploying models to production
- Setting up model versioning and registry
- Implementing feature stores
- Monitoring production ML systems
Do NOT invoke when:
- Model development and training → use /ml-engineer
- Data pipeline ETL → use /data-engineer
- Kubernetes infrastructure → use /kubernetes-specialist
- General CI/CD without ML → use /devops-engineer
Decision Framework
ML Lifecycle Stage?
├── Experimentation
│   └── MLflow/Weights & Biases for tracking
├── Training Pipeline
│   └── Kubeflow/Airflow/Vertex AI
├── Model Registry
│   └── MLflow Registry/Vertex Model Registry
├── Serving
│   ├── Batch → Spark/Dataflow
│   └── Real-time → TF Serving/Seldon/KServe
└── Monitoring
    └── Evidently/Fiddler/custom metrics
Core Workflows
1. ML Pipeline Setup
- Define pipeline stages (data prep, training, eval)
- Choose orchestrator (Kubeflow, Airflow, Vertex)
- Containerize each pipeline step
- Implement artifact storage
- Add experiment tracking (see the sketch after this list)
- Configure automated retraining triggers
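A minimal sketch of the artifact-storage and experiment-tracking steps above, assuming MLflow as the tracker and scikit-learn for the model; the experiment name and hyperparameters are illustrative placeholders, not part of this skill.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("churn-model")  # hypothetical experiment name

X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)

    # Log hyperparameters, a metric, and the model artifact so the run
    # is reproducible and comparable against other runs.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```

Each such run becomes an artifact-backed record that the registry and deployment steps below can reference by run ID.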
2. Model Deployment
- Register the model in the model registry (sketched after this list)
- Build serving container
- Deploy to serving infrastructure
- Configure autoscaling
- Implement canary/shadow deployment
- Set up monitoring and alerts
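A minimal sketch of the registration step above, assuming an MLflow tracking server with model aliases (available in MLflow 2.3+); the run ID and model name are hypothetical.

```python
import mlflow
from mlflow import MlflowClient

run_id = "abc123"  # hypothetical ID of a completed training run
model_uri = f"runs:/{run_id}/model"

# Register the run's logged model under a named registry entry.
version = mlflow.register_model(model_uri, "churn-model")

# Point the "champion" alias at the new version; the serving layer can then
# resolve "models:/churn-model@champion" without hard-coding version numbers,
# which also makes rollback a one-line alias change.
client = MlflowClient()
client.set_registered_model_alias("churn-model", "champion", version.version)
```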
3. Model Monitoring
- Define key metrics (latency, throughput, accuracy)
- Implement data drift detection (sketched after this list)
- Set up prediction monitoring
- Create alerting thresholds
- Build dashboards for visibility
- Automate retraining triggers
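A minimal sketch of the drift-detection step above, using a hand-rolled two-sample Kolmogorov-Smirnov test per feature; dedicated tools such as Evidently wrap this kind of check, and the p-value threshold here is an arbitrary example.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, current: np.ndarray,
                 p_threshold: float = 0.05) -> list[int]:
    """Return indices of features whose production distribution differs
    significantly from the training-time reference distribution."""
    drifted = []
    for i in range(reference.shape[1]):
        _, p_value = ks_2samp(reference[:, i], current[:, i])
        if p_value < p_threshold:
            drifted.append(i)
    return drifted

# Usage: compare a window of recent production inputs against training data.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(5_000, 3))
current = reference + np.array([0.0, 0.0, 0.8])  # feature 2 has shifted
print(detect_drift(reference, current))  # -> [2]
```

A result like this would feed the alerting thresholds and retraining triggers listed above.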
Best Practices
- Version everything: code, data, models, configs
- Use feature stores for consistency between training and serving (see the sketch after this list)
- Implement CI/CD specifically designed for ML workflows
- Monitor data drift and model performance continuously
- Use canary deployments for model rollouts
- Keep training and serving environments consistent
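A minimal sketch of one way to keep training and serving consistent without a full feature store: both paths import the same transformation function, so feature logic cannot silently diverge. The field names below are hypothetical.

```python
from datetime import datetime, timezone

def build_features(raw: dict) -> dict:
    """Single source of truth for feature logic, imported by both the
    batch training pipeline and the online serving endpoint."""
    signup = datetime.fromisoformat(raw["signup_date"]).replace(tzinfo=timezone.utc)
    now = datetime.now(timezone.utc)
    return {
        "days_since_signup": (now - signup).days,
        "spend_per_order": raw["total_spend"] / max(raw["order_count"], 1),
    }

# Training applies this row-by-row over historical data; serving applies it
# to each incoming request before calling the model.
features = build_features({"signup_date": "2024-01-15",
                           "total_spend": 240.0, "order_count": 8})
```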
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Manual deployments | Error-prone, slow | Automated ML CI/CD |
| Training-serving skew | Prediction errors | Feature stores |
| No model versioning | Can't reproduce or roll back | Model registry |
| Ignoring data drift | Silent degradation | Continuous monitoring |
| Notebook-to-production | Unmaintainable | Proper pipeline code |