ml-deployment-helper
16
Total installs
16
Weekly installs
#21208
Site-wide rank
Install command
npx skills add https://github.com/anton-abyzov/specweave --skill ml-deployment-helper
Install distribution by agent
claude-code
13
antigravity
11
cursor
11
opencode
10
codex
10
Skill Documentation
ML Deployment Helper
Overview
Bridges the gap between trained models and production systems. Generates deployment artifacts, APIs, monitoring, and A/B testing infrastructure following MLOps best practices.
Deployment Checklist
Before deploying any model, this skill ensures:
- ✅ Model versioned and tracked
- ✅ Dependencies documented (requirements.txt/Dockerfile)
- ✅ API endpoint created
- ✅ Input validation implemented
- ✅ Monitoring configured
- ✅ A/B testing ready
- ✅ Rollback plan documented
- ✅ Performance benchmarked
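The checklist above can be enforced mechanically before a deploy. The sketch below is a hypothetical pre-flight helper (not part of the SpecWeave API): it checks that the artifacts the checklist expects actually exist under the increment directory. The file names are illustrative assumptions.

```python
from pathlib import Path

# Hypothetical artifact list matching the checklist; adjust to your layout.
REQUIRED_ARTIFACTS = [
    "requirements.txt",
    "Dockerfile",
    "api/main.py",
    "monitoring/alerts.yaml",
]

def missing_artifacts(root: str) -> list[str]:
    """Return the checklist artifacts that are not present under `root`."""
    base = Path(root)
    return [p for p in REQUIRED_ARTIFACTS if not (base / p).exists()]
```

A CI step can then fail the deploy when `missing_artifacts(...)` is non-empty.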
Deployment Patterns
Pattern 1: REST API (FastAPI)
from specweave import create_model_api
# Generates production-ready API
api = create_model_api(
model_path="models/model-v3.pkl",
increment="0042",
framework="fastapi"
)
# Creates:
# - api/
#   ├── main.py (FastAPI app)
#   ├── models.py (Pydantic schemas)
#   ├── predict.py (Prediction logic)
#   ├── Dockerfile
#   ├── requirements.txt
#   └── tests/
Generated main.py:
from datetime import datetime

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib

app = FastAPI(title="Recommendation Model API", version="0042-v3")
model = joblib.load("model-v3.pkl")
class PredictionRequest(BaseModel):
user_id: int
context: dict
@app.post("/predict")
async def predict(request: PredictionRequest):
try:
prediction = model.predict([request.dict()])
return {
"recommendations": prediction.tolist(),
"model_version": "0042-v3",
"timestamp": datetime.now()
}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.get("/health")
async def health():
return {"status": "healthy", "model_loaded": model is not None}
Pattern 2: Batch Prediction
from specweave import create_batch_predictor
# For offline scoring
batch_predictor = create_batch_predictor(
model_path="models/model-v3.pkl",
increment="0042",
input_path="s3://bucket/data/",
output_path="s3://bucket/predictions/"
)
# Creates:
# - batch/
#   ├── predictor.py
#   ├── scheduler.yaml (Airflow/Kubernetes CronJob)
#   └── monitoring.py
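The core of a generated predictor.py is a simple score-and-write loop. The sketch below uses local CSV directories standing in for the S3 URIs above, and assumes `model` is any object with a `.predict(rows)` method; it is an illustration of the batch pattern, not the generated code itself.

```python
import csv
from pathlib import Path

def run_batch(model, input_dir: str, output_dir: str) -> int:
    """Score every CSV under input_dir; write <name>.pred.csv to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    total = 0
    for src in sorted(Path(input_dir).glob("*.csv")):
        with src.open() as f:
            rows = list(csv.DictReader(f))
        preds = model.predict(rows)
        with (out / f"{src.stem}.pred.csv").open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["row", "prediction"])
            writer.writerows(enumerate(preds))
        total += len(rows)
    return total
```

The scheduler (Airflow DAG or CronJob) simply invokes this entry point on a cadence.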
Pattern 3: Real-Time Streaming
from specweave import create_streaming_predictor
# For Kafka/Kinesis streams
streaming = create_streaming_predictor(
model_path="models/model-v3.pkl",
increment="0042",
input_topic="user-events",
output_topic="predictions"
)
# Creates:
# - streaming/
#   ├── consumer.py
#   ├── predictor.py
#   ├── producer.py
#   └── docker-compose.yaml
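Conceptually, consumer.py, predictor.py, and producer.py form one consume → predict → produce loop. The sketch below shows that loop with in-memory `queue.Queue` objects standing in for the Kafka topics; a real implementation would use a Kafka client library instead.

```python
import queue

def pump(model, input_topic, output_topic):
    """Drain input_topic, scoring each event and emitting a prediction message."""
    processed = 0
    while True:
        try:
            event = input_topic.get_nowait()  # a Kafka consumer poll in production
        except queue.Empty:
            return processed
        # Emit the prediction alongside the original event for traceability
        output_topic.put({"event": event, "prediction": model.predict(event)})
        processed += 1
```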
Containerization
from specweave import containerize_model
# Generates optimized Dockerfile
dockerfile = containerize_model(
model_path="models/model-v3.pkl",
framework="sklearn",
python_version="3.10",
increment="0042"
)
Generated Dockerfile:
FROM python:3.10-slim
WORKDIR /app
# Copy model and dependencies
COPY models/model-v3.pkl /app/model.pkl
COPY requirements.txt /app/
# Install dependencies (curl is needed for the health check below;
# it is not included in the slim base image)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY api/ /app/api/
# Health check
HEALTHCHECK --interval=30s --timeout=5s \
    CMD curl -f http://localhost:8000/health || exit 1
# Run API
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
Monitoring Setup
from specweave import setup_model_monitoring
# Configures monitoring for production
monitoring = setup_model_monitoring(
model_name="recommendation-model",
increment="0042",
metrics=[
"prediction_latency",
"throughput",
"error_rate",
"prediction_distribution",
"feature_drift"
]
)
# Creates:
# - monitoring/
#   ├── prometheus.yaml
#   ├── grafana-dashboard.json
#   ├── alerts.yaml
#   └── drift-detector.py
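One common way a drift-detector.py flags the `feature_drift` metric is the Population Stability Index (PSI) between a feature's training-time distribution and its live distribution. The sketch below is a minimal stdlib-only PSI, not the generated detector; the 0.25 "significant drift" threshold used in the test is a common rule of thumb, not a SpecWeave default.

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one feature."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def bucket_fractions(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Smooth empty buckets to avoid log(0)
        return [max(c / len(xs), 1e-6) for c in counts]

    e, a = bucket_fractions(expected), bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical distributions score near 0; a shifted live distribution scores well above the alerting threshold.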
A/B Testing Infrastructure
from specweave import create_ab_test
# Sets up A/B test framework
ab_test = create_ab_test(
control_model="model-v2.pkl",
treatment_model="model-v3.pkl",
traffic_split=0.1, # 10% to new model
success_metric="click_through_rate",
increment="0042"
)
# Creates:
# - ab-test/
#   ├── router.py (traffic splitting)
#   ├── metrics.py (success tracking)
#   ├── statistical-tests.py (significance testing)
#   └── dashboard.py (real-time monitoring)
A/B Test Router:
import hashlib

def route_prediction(user_id, features, control_model, treatment_model, treatment_pct=10):
    """Route to control or treatment based on a stable hash of user_id."""
    # Consistent hashing (same user always gets the same model); hashlib is
    # used because Python's built-in hash() is salted per process for strings
    user_bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
    if user_bucket < treatment_pct:  # e.g. 10% to treatment
        return treatment_model.predict(features), "treatment"
    else:
        return control_model.predict(features), "control"
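A minimal significance check of the kind statistical-tests.py would run on the `click_through_rate` metric is a two-proportion z-test on click counts for control vs. treatment. This stdlib-only version is a sketch of the standard test, not the generated file.

```python
import math

def two_proportion_pvalue(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: the two click-through rates are equal."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF (erf identity)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

The test would typically be re-run as data accumulates, declaring a winner only once the p-value clears a pre-registered threshold.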
Model Versioning
from specweave import ModelVersion
# Register model version
version = ModelVersion.register(
model_path="models/model-v3.pkl",
increment="0042",
metadata={
"accuracy": 0.87,
"training_date": "2024-01-15",
"data_version": "v2024-01",
"framework": "xgboost==1.7.0"
}
)
# Easy rollback
if production_metrics["error_rate"] > threshold:
ModelVersion.rollback(to_version="0042-v2")
Load Testing
from specweave import load_test_model
# Benchmark model performance
results = load_test_model(
api_url="http://localhost:8000/predict",
requests_per_second=[10, 50, 100, 500, 1000],
duration_seconds=60,
increment="0042"
)
Output:
Load Test Results:
==================
| RPS | Latency P50 | Latency P95 | Latency P99 | Error Rate |
|------|-------------|-------------|-------------|------------|
| 10 | 35ms | 45ms | 50ms | 0.00% |
| 50 | 38ms | 52ms | 65ms | 0.00% |
| 100 | 45ms | 70ms | 95ms | 0.02% |
| 500 | 120ms | 250ms | 400ms | 1.20% |
| 1000 | 350ms | 800ms | 1200ms | 8.50% |
Recommendation: Deploy with max 100 RPS per instance
Target: <100ms P95 latency (achieved at 100 RPS)
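The P50/P95/P99 columns above are percentiles over the raw per-request latencies. The sketch below shows one standard way to compute them (the nearest-rank method); it illustrates the arithmetic, not the `load_test_model` implementation.

```python
import math

def percentile(latencies_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value covering pct% of samples."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```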
Deployment Commands
# Generate deployment artifacts
/ml:deploy-prepare 0042
# Create API
/ml:create-api --increment 0042 --framework fastapi
# Setup monitoring
/ml:setup-monitoring 0042
# Create A/B test
/ml:create-ab-test --control v2 --treatment v3 --split 0.1
# Load test
/ml:load-test 0042 --rps 100 --duration 60s
# Deploy to production
/ml:deploy 0042 --environment production
Deployment Increment
The skill creates a deployment increment:
.specweave/increments/0043-deploy-recommendation-model/
├── spec.md (deployment requirements)
├── plan.md (deployment strategy)
├── tasks.md
│   ├── [ ] Containerize model
│   ├── [ ] Create API
│   ├── [ ] Setup monitoring
│   ├── [ ] Configure A/B test
│   ├── [ ] Load test
│   ├── [ ] Deploy to staging
│   ├── [ ] Validate staging
│   └── [ ] Deploy to production
├── api/ (FastAPI app)
├── monitoring/ (Grafana dashboards)
├── ab-test/ (A/B testing logic)
└── load-tests/ (Performance benchmarks)
Best Practices
- Always load test before production
- Start with 1-5% traffic in A/B test
- Monitor model drift in production
- Version everything (model, data, code)
- Document rollback plan before deploying
- Set up alerts for anomalies
- Gradual rollout (canary deployment)
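The last practice, gradual (canary) rollout, can be sketched as a staged traffic ramp with a health gate. The stages and the 1% error-rate gate below are illustrative assumptions, not SpecWeave defaults.

```python
# Percent-of-traffic stages for the canary (assumed values)
STAGES = [1, 5, 10, 25, 50, 100]

def next_stage(current_pct: int, canary_error_rate: float, gate: float = 0.01) -> int:
    """Advance to the next traffic stage while healthy; roll back to 0% otherwise."""
    if canary_error_rate > gate:
        return 0  # abort the rollout and return all traffic to the stable model
    later = [s for s in STAGES if s > current_pct]
    return later[0] if later else 100
```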
Integration with SpecWeave
# After training model (increment 0042)
/sw:inc "0043-deploy-recommendation-model"
# Generates deployment increment with all artifacts
/sw:do
# Deploy to production when ready
/ml:deploy 0043 --environment production
Model deployment is not the end; it's the beginning of the MLOps lifecycle.