notebook-ml-architect
npx skills add https://github.com/bjornmelin/dev-skills --skill notebook-ml-architect
Notebook ML Architect
Expert guidance for production-quality ML notebooks.
Quick Reference
| Operation | Use Case |
|---|---|
| audit | Analyze notebook for anti-patterns, leakage, reproducibility issues |
| refactor | Transform notebook into modular Python pipeline |
| template | Generate new notebook from EDA/classification/experiment template |
| report | Create markdown summary from executed notebook |
| convert | Extract Python script from notebook |
Audit Workflow
When auditing a notebook:
- Read the notebook using the Read tool
- Check structure against ml-workflow-guide.md
- Detect anti-patterns using anti-patterns.md
- Check for data leakage using leakage-checklist.md
- Run analysis script if deeper inspection needed:
python scripts/analyze_notebook.py <notebook.ipynb>
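The core execution-order check can be sketched in a few lines, since a `.ipynb` file is plain JSON (nbformat v4). `check_execution_order` below is an illustrative helper, not the actual analysis script:

```python
import json

def check_execution_order(nb_path: str) -> list[str]:
    """Flag code cells whose execution counts are missing, out of order, or gapped."""
    with open(nb_path) as f:
        nb = json.load(f)
    counts = [c.get("execution_count") for c in nb.get("cells", [])
              if c.get("cell_type") == "code"]
    issues = []
    if any(c is None for c in counts):
        issues.append("unexecuted code cells present")
    executed = [c for c in counts if c is not None]
    if executed != sorted(executed):
        issues.append("cells executed out of order")
    if executed and executed != list(range(1, len(executed) + 1)):
        issues.append("execution counts have gaps (stale re-runs)")
    return issues
```

A clean notebook returns an empty list; any non-empty result maps onto the checklist items below.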
Audit Checklist
- Execution order: Cells executed sequentially (execution counts in order, with no gaps)
- Random seeds: Set early (np.random.seed, torch.manual_seed, random.seed)
- Imports at top: All imports in first code cell(s)
- No hardcoded paths: Use relative paths or config variables
- Train/test split: Clear separation before any modeling
- No data leakage: Pre-processing after split, no test data peeking
- Modularization: Functions/classes for reusable logic
- Dependencies documented: requirements.txt or environment.yml referenced
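The seed items above can live in one helper called at the top of the notebook. This is a sketch in the spirit of the reproducibility header snippet, not its actual contents; the NumPy/PyTorch blocks run only if those libraries are installed:

```python
import os
import random

def set_seeds(seed: int = 42) -> None:
    """Seed every RNG the project touches; extend as libraries are added."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:  # optional: NumPy, if present
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:  # optional: PyTorch, if present
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass
```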
Severity Levels
- CRITICAL: Data leakage, missing train/test split, results unreproducible
- HIGH: No seeds, hardcoded paths, execution order issues
- MEDIUM: Missing modularization, no dependency docs
- LOW: Naming conventions, missing comments, style issues
Refactoring Guide
Transform notebooks into production pipelines:
Step 1: Identify Sections
Look for markdown headers that indicate logical sections:
- Data loading
- Preprocessing
- Feature engineering
- Model definition
- Training
- Evaluation
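A header scan like the one this step describes needs nothing beyond the notebook's JSON. `SECTION_KEYWORDS` and `find_sections` are illustrative names, not part of the skill's scripts:

```python
import json

SECTION_KEYWORDS = ("load", "preprocess", "feature", "model", "train", "eval")

def find_sections(nb_path: str) -> list[str]:
    """Return markdown headers that look like pipeline sections."""
    with open(nb_path) as f:
        nb = json.load(f)
    headers = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        # nbformat stores source as a list of lines (sometimes a single string)
        for line in cell.get("source", []):
            if line.lstrip().startswith("#"):
                headers.append(line.strip("# \n"))
    return [h for h in headers if any(k in h.lower() for k in SECTION_KEYWORDS)]
```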
Step 2: Extract Functions
Convert repeated or complex cell code into functions:
```python
# Before: inline code
df = pd.read_csv('data.csv')
df = df.dropna()
df['feature'] = df['a'] * df['b']

# After: function
def load_and_prepare_data(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    df = df.dropna()
    df['feature'] = df['a'] * df['b']
    return df
```
Step 3: Create Module Structure
```
project/
├── data.py       # Data loading and preprocessing
├── features.py   # Feature engineering
├── model.py      # Model definition
├── train.py      # Training loop
├── evaluate.py   # Evaluation metrics
├── config.py     # Configuration parameters
└── main.py       # Pipeline entry point
```
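A minimal config.py for this layout might be a frozen dataclass; the field names here are placeholders to adapt per project:

```python
# config.py -- one place for every knob the pipeline reads (illustrative fields)
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class Config:
    data_path: Path = Path("data/raw.csv")
    test_size: float = 0.2
    seed: int = 42
    learning_rate: float = 0.01
    epochs: int = 100

# main.py would construct one Config and pass it down, e.g.:
# train.run(Config(epochs=10))
```

Keeping the config frozen means no module can silently mutate a parameter mid-run, which keeps the pipeline reproducible.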
Step 4: Use convert script
python scripts/convert_to_script.py notebook.ipynb output.py --group-by-sections
Template Generation
Generate new notebooks from templates:
Available Templates
- EDA Template (assets/templates/eda_template.ipynb): Data loading, basic info, missing values, distributions, correlations
- Classification Template (assets/templates/classification_template.ipynb): Full supervised learning pipeline with evaluation metrics
- Experiment Template (assets/templates/experiment_template.ipynb): Parameterized notebook for experiment tracking
Using Templates
Copy template to project and customize:
cp ~/.claude/skills/notebook-ml-architect/assets/templates/classification_template.ipynb ./my_experiment.ipynb
Or generate programmatically with modifications.
Reproducibility Checklist
Required Elements
- Random Seeds: Use the reproducibility header snippet (copy from assets/snippets/reproducibility_header.py)
- Environment Capture:

  ```python
  import sys

  print(f"Python: {sys.version}")
  for pkg in ['numpy', 'pandas', 'sklearn', 'torch']:
      try:
          mod = __import__(pkg)
          print(f"{pkg}: {mod.__version__}")
      except ImportError:
          pass
  ```

- Dependency File:

  ```shell
  pip freeze > requirements.txt
  # Or for conda:
  conda env export > environment.yml
  ```

- Data Versioning:
- Record data source, download date, preprocessing steps
- Use relative paths from project root
- Consider DVC for large datasets
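The "record source, date, checksum" habit can be one small stdlib helper. `record_data_version` and the `data_manifest.json` filename are illustrative choices, not a fixed convention:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def record_data_version(data_file: str, source: str,
                        out: str = "data_manifest.json") -> dict:
    """Write a small manifest: where the data came from, when, and its checksum."""
    digest = hashlib.sha256(Path(data_file).read_bytes()).hexdigest()
    manifest = {
        "source": source,
        "retrieved": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
        "path": data_file,
    }
    Path(out).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Re-running the helper later and comparing checksums catches silent changes to the raw data.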
MCP Tool Usage
Context7 – Library API Lookups
When you need accurate API information:
1. Call resolve-library-id with library name
2. Call get-library-docs with the returned ID and topic
Examples:
- sklearn train_test_split parameters
- papermill execute_notebook options
- nbformat cell structure
Exa Search – Current Best Practices
When you need up-to-date recommendations:
- Use web_search_exa for discovery
- Use crawling_exa to pull full content from good URLs
- Use deep_search_exa for focused queries
Examples:
- “PyTorch reproducibility best practices 2024”
- “How to handle class imbalance”
- “MLflow notebook integration”
GitHub Search – Real-World Patterns
When you need to see how others do it:
Call searchGitHub with:
- query: specific code pattern
- language: ["Python"]
- path: ".ipynb" for notebooks
Examples:
- Production notebook seeding patterns
- Evaluation metric implementations
- Config management in notebooks
Script Reference
analyze_notebook.py
Parse notebook and extract structure:
python scripts/analyze_notebook.py <notebook.ipynb> [--output json|text]
Output includes:
- Cell counts by type
- Import statements
- Function/class definitions
- Detected issues
run_notebook.py
Execute notebook with parameters:
```shell
python scripts/run_notebook.py input.ipynb output.ipynb \
  --params '{"learning_rate": 0.01, "epochs": 100}' \
  --timeout 3600
```
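Parameterized execution usually works by prepending an injected-parameters cell to the notebook before running it (this is papermill's convention). A stripped-down sketch of just the injection step, with `inject_parameters` as a hypothetical helper:

```python
import json

def inject_parameters(nb_path: str, out_path: str, params: dict) -> None:
    """Prepend a code cell assigning the given parameters, papermill-style."""
    with open(nb_path) as f:
        nb = json.load(f)
    src = [f"{k} = {v!r}\n" for k, v in params.items()]
    cell = {"cell_type": "code",
            "metadata": {"tags": ["injected-parameters"]},
            "execution_count": None, "outputs": [], "source": src}
    nb["cells"].insert(0, cell)  # runs before every original cell
    with open(out_path, "w") as f:
        json.dump(nb, f)
```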
convert_to_script.py
Extract Python from notebook:
```shell
python scripts/convert_to_script.py notebook.ipynb output.py \
  --include-markdown \
  --group-by-sections \
  --add-main
```
Common Issues and Fixes
Data Leakage
Problem: Preprocessing on the full dataset before the split

```python
# BAD
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # Fits on all data
X_train, X_test = train_test_split(X_scaled)
```

Fix: Split first, fit on train only

```python
# GOOD
X_train, X_test = train_test_split(X)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)  # Transform only
```
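The same ordering can be demonstrated without sklearn at all. `SimpleScaler` below is a stripped-down stand-in, used only to make the fit-on-train/transform-on-test split explicit:

```python
from statistics import mean, stdev

class SimpleScaler:
    """Toy standardizer: learns mean/std from whatever it is fitted on."""
    def fit(self, xs):
        self.mu = mean(xs)
        self.sigma = stdev(xs) or 1.0  # guard against zero variance
        return self
    def transform(self, xs):
        return [(x - self.mu) / self.sigma for x in xs]

train, test = [1.0, 2.0, 3.0, 4.0], [10.0]
scaler = SimpleScaler().fit(train)      # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # test is transformed, never fitted
```

Had the scaler been fitted on train and test together, the outlier at 10.0 would have shifted the mean and inflated the variance, leaking test information into every training feature.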
Hidden State
Problem: Variables from previous runs affect results

```python
# Cell 1 run multiple times
results.append(model.score(X_test, y_test))  # results grows each run
```

Fix: Initialize state in the cell itself

```python
results = []  # Always start fresh
results.append(model.score(X_test, y_test))
```
Missing Seeds
Problem: Different results each run

```python
X_train, X_test = train_test_split(X, y)  # Random each time
```

Fix: Set seeds explicitly

```python
SEED = 42
X_train, X_test = train_test_split(X, y, random_state=SEED)
```
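Seeding pays off even when the split is hand-rolled. A stdlib-only sketch (`split_indices` is a hypothetical helper) showing that a fixed seed yields a stable split without touching the global RNG:

```python
import random

def split_indices(n: int, test_frac: float = 0.25, seed: int = 42):
    """Deterministic shuffle-and-split; same seed always yields the same split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # local RNG: global state is untouched
    cut = int(n * (1 - test_frac))
    return idx[:cut], idx[cut:]
```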