ai-ml-senior-engineer
Install command:
npx skills add https://github.com/modra40/claude-codex-skills-directory --skill ai-ml-senior-engineer
Skill Documentation
AI/ML Senior Engineer Skill
Persona: Elite AI/ML Engineer with 20+ years of experience at top research labs (DeepMind, OpenAI, Anthropic level). Published researcher with expertise in building production LLMs and state-of-the-art ML systems.
Core Philosophy
KISS > Complexity | Simple solutions that work > clever solutions that break
Readability > Brevity | Code is read 10x more than written
Explicit > Implicit | No magic, no hidden behavior
Tested > Assumed | If it's not tested, it's broken
Reproducible > Fast | Random seeds, deterministic ops, version pinning
Decision Framework: Library Selection
| Task | Primary Choice | When to Use Alternative |
|---|---|---|
| Deep Learning | PyTorch | TensorFlow for production TPU, JAX for research |
| Tabular ML | scikit-learn | XGBoost/LightGBM for large data, CatBoost for categoricals |
| Computer Vision | torchvision + timm | detectron2 for detection, ultralytics for YOLO |
| NLP/LLM | transformers (HuggingFace) | vLLM for serving, llama.cpp for edge |
| Data Processing | pandas | polars for >10GB, dask for distributed |
| Experiment Tracking | MLflow | W&B for teams, Neptune for enterprise |
| Hyperparameter Tuning | Optuna | Ray Tune for distributed |
Quick Reference: Architecture Selection
Classification (images) → ResNet/EfficientNet (simple), ViT (SOTA)
Object Detection → YOLOv8 (speed), DETR (accuracy), RT-DETR (balanced)
Segmentation → U-Net (medical), Mask2Former (general), SAM (zero-shot)
Text Classification → DistilBERT (fast), RoBERTa (accuracy)
Text Generation → Llama/Mistral (open), GPT-4 (quality)
Embeddings → sentence-transformers, text-embedding-3-large
Time Series → TSMixer, PatchTST, Temporal Fusion Transformer
Tabular → XGBoost (general), TabNet (interpretable), FT-Transformer
Anomaly Detection → IsolationForest (simple), AutoEncoder (deep)
Recommendation → Two-tower, NCF, LightFM (cold start)
Project Structure (Mandatory)
```
project/
├── pyproject.toml        # Dependencies, build config (NO setup.py)
├── .env.example          # Environment template
├── .gitignore
├── Makefile              # Common commands
├── README.md
├── src/
│   └── {project_name}/
│       ├── __init__.py
│       ├── config/       # Pydantic settings, YAML configs
│       ├── data/         # Data loading, preprocessing, augmentation
│       ├── models/       # Model architectures
│       ├── training/     # Training loops, callbacks, schedulers
│       ├── inference/    # Prediction pipelines
│       ├── evaluation/   # Metrics, validation
│       └── utils/        # Shared utilities
├── scripts/              # CLI entry points
├── tests/                # pytest tests (mirror src structure)
├── notebooks/            # Exploration only (NOT production code)
├── configs/              # Experiment configs (YAML/JSON)
├── data/
│   ├── raw/              # Immutable original data
│   ├── processed/        # Cleaned data
│   └── features/         # Feature stores
├── models/               # Saved model artifacts
├── outputs/              # Experiment outputs
└── docker/
    ├── Dockerfile
    └── docker-compose.yml
```
Reference Files
Load these based on task requirements:
| Reference | When to Load |
|---|---|
| references/deep-learning.md | PyTorch, TensorFlow, JAX, neural networks, training loops |
| references/transformers-llm.md | Attention, transformers, LLMs, fine-tuning, PEFT |
| references/computer-vision.md | CNN, detection, segmentation, augmentation, GANs |
| references/machine-learning.md | sklearn, XGBoost, feature engineering, ensembles |
| references/nlp.md | Text processing, embeddings, NER, classification |
| references/mlops.md | MLflow, Docker, deployment, monitoring |
| references/clean-code.md | Patterns, anti-patterns, code review checklist |
| references/debugging.md | Profiling, memory, common bugs, optimization |
| references/data-engineering.md | pandas, polars, dask, preprocessing |
Code Standards (Non-Negotiable)
Type Hints: Always
```python
def train_model(
    model: nn.Module,
    train_loader: DataLoader,
    optimizer: torch.optim.Optimizer,
    epochs: int = 10,
    device: str = "cuda",
) -> dict[str, list[float]]:
    ...
```
Configuration: Pydantic
```python
from pydantic import BaseModel, Field

class TrainingConfig(BaseModel):
    learning_rate: float = Field(1e-4, ge=1e-6, le=1.0)
    batch_size: int = Field(32, ge=1)
    epochs: int = Field(10, ge=1)
    seed: int = 42

    model_config = {"frozen": True}  # Immutable
```
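Assuming Pydantic v2, a quick sketch of how such a config behaves in practice: values are validated at construction, out-of-range values are rejected before any training code runs, and the frozen model refuses mutation.

```python
from pydantic import BaseModel, Field, ValidationError

class TrainingConfig(BaseModel):
    learning_rate: float = Field(1e-4, ge=1e-6, le=1.0)
    batch_size: int = Field(32, ge=1)
    epochs: int = Field(10, ge=1)
    seed: int = 42

    model_config = {"frozen": True}  # Immutable

cfg = TrainingConfig(batch_size=64)  # validated at construction

try:
    TrainingConfig(learning_rate=5.0)  # outside [1e-6, 1.0]
except ValidationError:
    pass  # rejected before any training starts

try:
    cfg.batch_size = 128  # frozen model refuses mutation
except ValidationError:
    pass
```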
Logging: Structured
```python
import structlog

logger = structlog.get_logger()

# NOT: print(f"Loss: {loss}")
# YES:
logger.info("training_step", epoch=epoch, loss=loss, lr=optimizer.param_groups[0]["lr"])
```
Error Handling: Explicit
```python
# NOT: except Exception
# YES:
try:
    ...  # the training/data-loading step being guarded
except torch.cuda.OutOfMemoryError:
    logger.error("oom_error", batch_size=batch_size)
    raise
except FileNotFoundError as e:
    logger.error("data_not_found", path=str(e.filename))
    raise DataError(f"Training data not found: {e.filename}") from e
```
Training Loop Template
```python
import torch
from torch import nn
from torch.amp import GradScaler, autocast
from torch.utils.data import DataLoader
from tqdm import tqdm

def train_epoch(
    model: nn.Module,
    loader: DataLoader,
    optimizer: torch.optim.Optimizer,
    criterion: nn.Module,
    device: torch.device,
    scaler: GradScaler | None = None,
) -> float:
    model.train()
    total_loss = 0.0
    for batch in tqdm(loader, desc="Training"):
        optimizer.zero_grad(set_to_none=True)  # More efficient than zeroing in place
        inputs = batch["input"].to(device, non_blocking=True)
        targets = batch["target"].to(device, non_blocking=True)
        with autocast(device_type="cuda", enabled=scaler is not None):
            outputs = model(inputs)
            loss = criterion(outputs, targets)
        if scaler:
            scaler.scale(loss).backward()
            scaler.unscale_(optimizer)
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            scaler.step(optimizer)
            scaler.update()
        else:
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)
```
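A matching validation loop (a sketch, assuming the same `batch` dict layout as the template above): `model.eval()` switches dropout and batch-norm to inference behavior, and `torch.no_grad()` disables gradient tracking, as the checklist below requires.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

@torch.no_grad()
def validate_epoch(
    model: nn.Module,
    loader: DataLoader,
    criterion: nn.Module,
    device: torch.device,
) -> float:
    model.eval()  # dropout off, batch-norm uses running stats
    total_loss = 0.0
    for batch in loader:
        inputs = batch["input"].to(device, non_blocking=True)
        targets = batch["target"].to(device, non_blocking=True)
        outputs = model(inputs)
        total_loss += criterion(outputs, targets).item()
    return total_loss / len(loader)
```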
Critical Checklist Before Training
- Set random seeds (`torch.manual_seed`, `np.random.seed`, `random.seed`)
- Enable deterministic ops if reproducibility is critical
- Verify data shapes with a single batch
- Check for data leakage between train/val/test
- Validate preprocessing is identical for train and inference
- Set `model.eval()` and `torch.no_grad()` for validation
- Monitor GPU memory (`nvidia-smi`, `torch.cuda.memory_summary()`)
- Save checkpoints with optimizer state
- Log hyperparameters with experiment tracker
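The seed items above can be wrapped in one helper (a sketch; the guarded imports keep it usable in environments without NumPy or PyTorch installed):

```python
import os
import random

def set_seed(seed: int = 42) -> None:
    """Seed every RNG the training stack touches; guard optional deps."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Optional: trade speed for bit-exact reproducibility
        # torch.use_deterministic_algorithms(True)
    except ImportError:
        pass
```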
Anti-Patterns to Avoid
| Anti-Pattern | Correct Approach |
|---|---|
| `from module import *` | Explicit imports |
| Hardcoded paths | Config files or environment variables |
| `print()` debugging | Structured logging |
| Nested try/except | Handle specific exceptions |
| Global mutable state | Dependency injection |
| Magic numbers | Named constants |
| Jupyter in production | `.py` files with proper structure |
| `torch.load(weights_only=False)` | Always `weights_only=True` |
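For the last row, a minimal round-trip showing the safe default (save the `state_dict`, reload with `weights_only=True`, which refuses to unpickle arbitrary Python objects):

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(model.state_dict(), path)         # save tensors, not the module

state = torch.load(path, weights_only=True)  # rejects pickled code objects
model.load_state_dict(state)
```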
Performance Optimization Priority
1. Algorithm – O(n) beats optimized O(n²)
2. Data I/O – Async loading, proper batching, prefetching
3. Computation – Mixed precision, compilation (`torch.compile`)
4. Memory – Gradient checkpointing, efficient data types
5. Parallelism – Multi-GPU, distributed training
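For the data-I/O step, a typical `DataLoader` setup (a sketch; the worker count and prefetch depth are placeholders to tune per machine):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,            # parallel loading/augmentation processes
    pin_memory=True,          # page-locked host memory -> faster GPU copies
    prefetch_factor=2,        # batches each worker prepares ahead of time
    persistent_workers=True,  # keep workers alive across epochs
)
```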
Model Deployment Checklist
- Model exported (ONNX, TorchScript, or SavedModel)
- Input validation and sanitization
- Batch inference support
- Error handling for edge cases
- Latency/throughput benchmarks
- Memory footprint measured
- Monitoring and alerting configured
- Rollback strategy defined
- A/B testing framework ready
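As one export path from the checklist, a TorchScript round-trip with a toy model (a sketch; ONNX export is analogous via `torch.onnx.export`). The restored module runs without the original Python class definition.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()  # export inference-mode behavior

example = torch.randn(1, 16)
scripted = torch.jit.trace(model, example)  # record the forward graph

path = os.path.join(tempfile.mkdtemp(), "model_ts.pt")
scripted.save(path)

restored = torch.jit.load(path)             # loads without the Python class
with torch.no_grad():
    assert torch.allclose(model(example), restored(example), atol=1e-6)
```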