notebook-ml-architect

📁 bjornmelin/dev-skills 📅 Jan 26, 2026
Total installs: 3
Weekly installs: 3
Site-wide rank: #55224

Install command:
npx skills add https://github.com/bjornmelin/dev-skills --skill notebook-ml-architect

Agent install distribution

opencode 2
gemini-cli 2
antigravity 2
claude-code 2
github-copilot 2
codex 2

Skill documentation

Notebook ML Architect

Expert guidance for production-quality ML notebooks.

Quick Reference

| Operation | Use Case |
|-----------|----------|
| audit | Analyze notebook for anti-patterns, leakage, reproducibility issues |
| refactor | Transform notebook into modular Python pipeline |
| template | Generate new notebook from EDA/classification/experiment template |
| report | Create markdown summary from executed notebook |
| convert | Extract Python script from notebook |

Audit Workflow

When auditing a notebook:

  1. Read the notebook using the Read tool
  2. Check structure against ml-workflow-guide.md
  3. Detect anti-patterns using anti-patterns.md
  4. Check for data leakage using leakage-checklist.md
  5. Run analysis script if deeper inspection needed:
    python scripts/analyze_notebook.py <notebook.ipynb>
    

Audit Checklist

  • Execution order: Cells numbered sequentially (no gaps, no out-of-order)
  • Random seeds: Set early (np.random.seed, torch.manual_seed, random.seed)
  • Imports at top: All imports in first code cell(s)
  • No hardcoded paths: Use relative paths or config variables
  • Train/test split: Clear separation before any modeling
  • No data leakage: Pre-processing after split, no test data peeking
  • Modularization: Functions/classes for reusable logic
  • Dependencies documented: requirements.txt or environment.yml referenced
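The "random seeds set early" item can be satisfied with a header cell along the lines of the snippet in assets/snippets/reproducibility_header.py. A minimal sketch (the torch import is guarded, since torch may not be installed):

```python
# Seed-setting header for the first code cell of a notebook (sketch).
import os
import random

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
os.environ["PYTHONHASHSEED"] = str(SEED)  # affects hash-based ordering

try:
    import torch

    torch.manual_seed(SEED)
except ImportError:
    pass  # torch is optional; skip silently when absent
```

Running this cell first means every later cell draws from the same seeded generators.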

Severity Levels

  • CRITICAL: Data leakage, missing train/test split, results unreproducible
  • HIGH: No seeds, hardcoded paths, execution order issues
  • MEDIUM: Missing modularization, no dependency docs
  • LOW: Naming conventions, missing comments, style issues

Refactoring Guide

Transform notebooks into production pipelines:

Step 1: Identify Sections

Look for markdown headers that indicate logical sections:

  • Data loading
  • Preprocessing
  • Feature engineering
  • Model definition
  • Training
  • Evaluation

Step 2: Extract Functions

Convert repeated or complex cell code into functions:

# Before: inline code
import pandas as pd

df = pd.read_csv('data.csv')
df = df.dropna()
df['feature'] = df['a'] * df['b']

# After: function
def load_and_prepare_data(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    df = df.dropna()
    df['feature'] = df['a'] * df['b']
    return df

Step 3: Create Module Structure

project/
├── data.py          # Data loading and preprocessing
├── features.py      # Feature engineering
├── model.py         # Model definition
├── train.py         # Training loop
├── evaluate.py      # Evaluation metrics
├── config.py        # Configuration parameters
└── main.py          # Pipeline entry point
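The entry point then wires the modules together. A minimal sketch of what main.py could look like, with the per-module functions inlined here so it runs standalone; every name and the toy data are illustrative, not part of the skill:

```python
# Sketch of a pipeline entry point. In the real layout, each function
# would live in the module noted in its comment.
import pandas as pd


def load_data() -> pd.DataFrame:  # data.py
    # Stand-in for reading a real dataset from disk.
    return pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "target": [0, 1, 0]})


def build_features(df: pd.DataFrame) -> pd.DataFrame:  # features.py
    df = df.copy()
    df["ab"] = df["a"] * df["b"]
    return df


def train(df: pd.DataFrame) -> dict:  # train.py
    # Stand-in for a real model fit.
    return {"mean_target": df["target"].mean()}


def evaluate(model: dict, df: pd.DataFrame) -> float:  # evaluate.py
    return abs(model["mean_target"] - df["target"].mean())


def main() -> float:
    df = build_features(load_data())
    model = train(df)
    return evaluate(model, df)


if __name__ == "__main__":
    print(main())
```

Keeping main() free of logic beyond orchestration makes each stage independently testable.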

Step 4: Use the convert script

python scripts/convert_to_script.py notebook.ipynb output.py --group-by-sections

Template Generation

Generate new notebooks from templates:

Available Templates

  1. EDA Template (assets/templates/eda_template.ipynb)
    • Data loading, basic info, missing values, distributions, correlations
  2. Classification Template (assets/templates/classification_template.ipynb)
    • Full supervised learning pipeline with evaluation metrics
  3. Experiment Template (assets/templates/experiment_template.ipynb)
    • Parameterized notebook for experiment tracking

Using Templates

Copy template to project and customize:

cp ~/.claude/skills/notebook-ml-architect/assets/templates/classification_template.ipynb ./my_experiment.ipynb

Or generate programmatically with modifications.

Reproducibility Checklist

Required Elements

  1. Random Seeds: use the reproducibility header snippet:

    # Copy from assets/snippets/reproducibility_header.py
    
  2. Environment Capture

    import sys
    print(f"Python: {sys.version}")
    for pkg in ['numpy', 'pandas', 'sklearn', 'torch']:
        try:
            mod = __import__(pkg)
            print(f"{pkg}: {mod.__version__}")
        except ImportError:
            pass
    
  3. Dependency File

    pip freeze > requirements.txt
    # Or for conda:
    conda env export > environment.yml
    
  4. Data Versioning

    • Record data source, download date, preprocessing steps
    • Use relative paths from project root
    • Consider DVC for large datasets
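A lightweight way to record source, download date, and preprocessing steps is a metadata file stored next to the data itself. A sketch; the field names and paths are illustrative, not a convention defined by this skill:

```python
# Write a small provenance record alongside the raw data file.
import json
from pathlib import Path

DATA_DIR = Path("data")  # relative to project root, per the checklist

metadata = {
    "source_url": "https://example.com/dataset.csv",  # hypothetical source
    "downloaded": "2026-01-26",
    "preprocessing": [
        "dropped rows with missing target",
        "lowercased column names",
    ],
}

DATA_DIR.mkdir(exist_ok=True)
(DATA_DIR / "dataset.meta.json").write_text(json.dumps(metadata, indent=2))
```

For large datasets, a tool like DVC replaces this hand-rolled record, but the same fields are worth capturing either way.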

MCP Tool Usage

Context7 – Library API Lookups

When you need accurate API information:

1. Call resolve-library-id with library name
2. Call get-library-docs with the returned ID and topic

Examples:

  • sklearn train_test_split parameters
  • papermill execute_notebook options
  • nbformat cell structure

Exa Search – Current Best Practices

When you need up-to-date recommendations:

  • Use web_search_exa for discovery
  • Use crawling_exa to pull full content from good URLs
  • Use deep_search_exa for focused queries

Examples:

  • “PyTorch reproducibility best practices 2024”
  • “How to handle class imbalance”
  • “MLflow notebook integration”

GitHub Search – Real-World Patterns

When you need to see how others do it:

searchGitHub with:
- query: specific code pattern
- language: ["Python"]
- path: ".ipynb" for notebooks

Examples:

  • Production notebook seeding patterns
  • Evaluation metric implementations
  • Config management in notebooks

Script Reference

analyze_notebook.py

Parse notebook and extract structure:

python scripts/analyze_notebook.py <notebook.ipynb> [--output json|text]

Output includes:

  • Cell counts by type
  • Import statements
  • Function/class definitions
  • Detected issues

run_notebook.py

Execute notebook with parameters:

python scripts/run_notebook.py input.ipynb output.ipynb \
  --params '{"learning_rate": 0.01, "epochs": 100}' \
  --timeout 3600
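If run_notebook.py wraps papermill, as the --params flag suggests (an assumption, since the script itself is not shown here), the input notebook needs a cell tagged "parameters" holding the defaults; papermill injects the overrides in a new cell just after it:

```python
# Contents of the cell tagged "parameters" in input.ipynb (sketch).
# Values passed via --params override these defaults at execution time.
learning_rate = 0.001
epochs = 10
```

Without the tagged cell, papermill still injects parameters, but at the top of the notebook, where later cells may shadow them.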

convert_to_script.py

Extract Python from notebook:

python scripts/convert_to_script.py notebook.ipynb output.py \
  --include-markdown \
  --group-by-sections \
  --add-main

Common Issues and Fixes

Data Leakage

Problem: Preprocessing on full dataset before split

# BAD
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Fits on all data
X_train, X_test = train_test_split(X_scaled)

Fix: Split first, fit on train only

# GOOD
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)  # Transform only
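A further safeguard, not part of the skill's own snippets, is to wrap preprocessing and model in a sklearn Pipeline: during cross-validation the scaler is then refit on each fold's training portion only, so this class of leakage cannot occur. A sketch with synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# The scaler is fit inside each CV fold, never on held-out rows.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

The same pipe object can then be fit on the full training split and applied to the test set with a single fit/predict pair.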

Hidden State

Problem: Variables from previous runs affect results

# Cell 1 run multiple times
results.append(model.score(X_test, y_test))  # results grows each run

Fix: Initialize state in cell

results = []  # Always start fresh
results.append(model.score(X_test, y_test))

Missing Seeds

Problem: Different results each run

X_train, X_test, y_train, y_test = train_test_split(X, y)  # Random each time

Fix: Set seeds explicitly

SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=SEED)
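With a fixed random_state the split is identical on every run, which can be checked directly (synthetic data via make_classification, purely for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

SEED = 42
X, y = make_classification(n_samples=50, random_state=0)

# Two independent calls with the same seed produce the same partition.
a_train, a_test, _, _ = train_test_split(X, y, random_state=SEED)
b_train, b_test, _, _ = train_test_split(X, y, random_state=SEED)
assert np.array_equal(a_train, b_train)
```

The same applies to any other seeded operation: re-running the notebook from a clean kernel should reproduce results exactly.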