mlops-engineer
Total installs: 12
Weekly installs: 12
Site rank: #26219
Install command
npx skills add https://github.com/anton-abyzov/specweave --skill mlops-engineer
Installs by agent
- claude-code: 10
- opencode: 9
- antigravity: 9
- codex: 9
- gemini-cli: 9
- cursor: 8
Skill documentation
MLOps Engineer
Expert in ML infrastructure, automation, and production ML systems.
⚠️ Chunking Rule
Large MLOps platforms = 1000+ lines. Generate ONE component per response:
1. Experiment Tracking → 2. Model Registry → 3. Training Pipelines → 4. Deployment → 5. Monitoring
Core Capabilities
ML Pipelines
- Kubeflow Pipelines: K8s-native ML workflows
- Apache Airflow: DAG-based orchestration
- Prefect: Modern dataflow automation
- MLflow Projects: Reproducible ML runs
Model Registry
- Model versioning and staging
- Model metadata and lineage
- Promotion workflows (dev → staging → prod)
- A/B testing infrastructure
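The promotion workflow listed above can be pictured as a small state machine. The following is a minimal sketch, not a registry API: `ALLOWED`, `ModelVersion`, and `promote` are hypothetical names, and it assumes a version may only advance one stage at a time.

```python
from dataclasses import dataclass

# Hypothetical stage graph: each stage may only move to the next one.
ALLOWED = {"dev": "staging", "staging": "prod"}


@dataclass
class ModelVersion:
    name: str
    version: int
    stage: str = "dev"


def promote(mv: ModelVersion, target: str) -> ModelVersion:
    """Return a copy of mv in the target stage, or raise if the step is not allowed."""
    if ALLOWED.get(mv.stage) != target:
        raise ValueError(
            f"cannot promote {mv.name} v{mv.version} from {mv.stage} to {target}"
        )
    return ModelVersion(mv.name, mv.version, target)


candidate = ModelVersion("fraud-detection-model", 3)
staged = promote(candidate, "staging")
released = promote(staged, "prod")
```

Skipping a stage, for example `promote(candidate, "prod")`, raises `ValueError`, which is the point of encoding the workflow explicitly rather than setting the stage field directly.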
Deployment
- Docker containerization
- Kubernetes deployment (Seldon, KServe)
- Serverless (AWS Lambda, GCP Functions)
- Edge deployment (ONNX, TensorRT)
Monitoring
- Model performance drift detection
- Data quality monitoring
- Inference latency tracking
- Alerting and auto-retraining triggers
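One common way to quantify drift is the Population Stability Index (PSI), which compares the binned distribution of a live sample against a reference sample. The sketch below is one possible implementation using only the standard library. The function name `psi`, the equal-width binning, and the 0.2 threshold (a frequently cited rule of thumb) are assumptions to tune per use case, not part of this skill.

```python
import math
from typing import List, Sequence

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff; tune for your data


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
drifted = psi(reference, shifted) > DRIFT_THRESHOLD
```

Comparing a sample with itself yields a PSI of zero, while the shifted sample above leaves half of the reference bins empty and lands well over the threshold.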
CI/CD for ML
- Automated testing (unit, integration, model)
- Model validation gates
- Automated retraining pipelines
- GitOps for ML
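A model validation gate can be as simple as a function a CI job calls before registration. The sketch below assumes higher-is-better metrics. The function name `validation_failures`, the metric names, and the thresholds are all hypothetical, chosen only for illustration.

```python
from typing import List, Mapping


def validation_failures(
    candidate: Mapping[str, float],
    baseline: Mapping[str, float],
    floors: Mapping[str, float],
    max_regression: float = 0.01,
) -> List[str]:
    """Return the reasons a candidate fails the gate; an empty list means it passes."""
    failures = []
    for metric, floor in floors.items():
        value = candidate.get(metric)
        if value is None:
            failures.append(f"{metric}: missing")
            continue
        if value < floor:
            failures.append(f"{metric}: {value:.3f} below floor {floor:.3f}")
        base = baseline.get(metric)
        # Compare against the current production model, if it reports this metric.
        if base is not None and base - value > max_regression:
            failures.append(f"{metric}: regressed {base - value:.3f} vs. production")
    return failures


floors = {"auc": 0.90, "recall": 0.75}
production = {"auc": 0.92, "recall": 0.85}
failures = validation_failures({"auc": 0.93, "recall": 0.80}, production, floors)
```

In a pipeline, a non-empty `failures` list would typically fail the job (for instance via a non-zero exit code) so the candidate never reaches the registry.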
Best Practices
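# Kubeflow Pipeline Example
from kfp import dsl, compiler

@dsl.component
def preprocess_data(input_path: str, output_path: str) -> str:
    # Data preprocessing logic goes here; return where the processed data was written
    return output_path

@dsl.component
def train_model(data_path: str, model_path: str):
    # Training logic goes here
    pass

@dsl.pipeline(name="ml-training-pipeline")
def ml_pipeline(input_data: str):
    preprocess = preprocess_data(input_path=input_data, output_path="/data/processed")
    # A component's single return value is available as .output on its task
    train = train_model(data_path=preprocess.output, model_path="/models")

# Compile the pipeline to a YAML spec that can be submitted to a KFP backend
compiler.Compiler().compile(ml_pipeline, package_path="ml_pipeline.yaml")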
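# Model Registry with MLflow
import mlflow
from mlflow.tracking import MlflowClient

# Register model
# run_id must be the ID of the MLflow run that logged the model
model_uri = f"runs:/{run_id}/model"
mlflow.register_model(model_uri, "fraud-detection-model")

# Transition to production
# (recent MLflow releases deprecate stage transitions in favor of model version aliases)
client = MlflowClient()
client.transition_model_version_stage(
    name="fraud-detection-model",
    version=3,
    stage="Production"
)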
# Kubernetes Deployment (Seldon)
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: fraud-detector
spec:
  predictors:
    - name: default
      replicas: 3
      graph:
        name: model
        type: MODEL
        modelUri: s3://models/fraud-v3
DAG Patterns
Training DAG
data_ingestion → validation → preprocessing → training → evaluation → registration
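As an orchestrator-agnostic illustration, the training DAG above can be written as a dependency map and ordered with the standard library's `graphlib` (Python 3.9+). This is a sketch only: `TRAINING_DAG` is a hypothetical name, and a real pipeline would hand the same structure to Airflow, Prefect, or Kubeflow rather than sort it by hand.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it starts.
TRAINING_DAG = {
    "data_ingestion": set(),
    "validation": {"data_ingestion"},
    "preprocessing": {"validation"},
    "training": {"preprocessing"},
    "evaluation": {"training"},
    "registration": {"evaluation"},
}

# static_order() yields the steps in an order that respects every dependency.
order = list(TopologicalSorter(TRAINING_DAG).static_order())
```

Because the graph is a single chain, `order` lists the steps exactly as they appear in the arrow diagram. A cycle in the map would raise `graphlib.CycleError`.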
Inference DAG
request → preprocessing → model_inference → postprocessing → response
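The inference path is a linear chain, so it can be sketched as plain function composition. Everything below is illustrative: the step bodies are stand-ins (the "model" just averages its inputs), and the `features` request field and 0.5 cutoff are assumptions.

```python
from functools import reduce
from typing import List


def preprocessing(request: dict) -> List[float]:
    # Coerce raw request fields into a numeric feature vector.
    return [float(x) for x in request["features"]]


def model_inference(features: List[float]) -> float:
    # Stand-in for a real model: the score is the mean of the features.
    return sum(features) / len(features)


def postprocessing(score: float) -> dict:
    # Turn the raw score into a response payload.
    return {"score": score, "label": "fraud" if score >= 0.5 else "ok"}


INFERENCE_STEPS = [preprocessing, model_inference, postprocessing]


def handle(request: dict) -> dict:
    """Pass the request through each step in order and return the response."""
    return reduce(lambda value, step: step(value), INFERENCE_STEPS, request)


response = handle({"features": ["1", 0]})
```

Keeping the steps in a list makes it easy to time or log each stage separately, which ties in with the inference latency tracking mentioned under Monitoring.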
Monitoring DAG
collect_metrics → detect_drift → alert_if_needed → trigger_retrain
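Unlike the other two DAGs, the monitoring flow is conditional: alerting and retraining only happen when drift is found. The sketch below shows that control flow with deliberately crude stand-ins. The mean-shift check, the 0.1 tolerance, and the `alert` / `retrain` callbacks are hypothetical placeholders for a real drift test, pager, and pipeline trigger.

```python
from statistics import mean
from typing import Callable, Dict, Sequence


def collect_metrics(predictions: Sequence[float]) -> Dict[str, float]:
    return {"mean_score": mean(predictions)}


def detect_drift(metrics: Dict[str, float], baseline_mean: float,
                 tolerance: float = 0.1) -> bool:
    # Crude stand-in: flag drift when the mean score moves past the tolerance.
    return abs(metrics["mean_score"] - baseline_mean) > tolerance


def run_monitoring(predictions: Sequence[float], baseline_mean: float,
                   alert: Callable[[str], None],
                   retrain: Callable[[], None]) -> bool:
    """Run one monitoring cycle; alert and retrain only if drift is detected."""
    metrics = collect_metrics(predictions)
    drifted = detect_drift(metrics, baseline_mean)
    if drifted:
        alert(f"drift detected: mean score {metrics['mean_score']:.2f}")
        retrain()
    return drifted


events = []
stable = run_monitoring([0.2, 0.2, 0.2], 0.2, events.append,
                        lambda: events.append("retrain"))
```

In the stable case above nothing is appended to `events`. A batch whose mean score has moved by more than the tolerance produces one alert message followed by one retrain trigger.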
When to Use
- Building ML training pipelines
- Setting up model registry
- Deploying models to production
- ML monitoring and observability
- CI/CD for machine learning
- Infrastructure automation for ML