databricks-repl-consolidate
Total installs: 4 · Weekly installs: 4 · Site rank: #48823
Install command:

```shell
npx skills add https://github.com/wedneyyuri/databricks-repl --skill databricks-repl-consolidate
```
Installs by agent:
- opencode: 4
- gemini-cli: 4
- github-copilot: 4
- codex: 4
- kimi-cli: 4
- cursor: 4
Skill Documentation
Session Consolidation
Produce a single, clean `.py` file from a Databricks REPL session by reading `session.json` and the `.cmd.py` files.
Workflow
- Read `session.json`: the `steps` array contains the ordered list of steps with status and command file paths.
- Read each `.cmd.py` file in step order, skipping failed steps (only successful steps survive).
- Strip REPL boilerplate: remove or convert REPL-specific calls (see Boilerplate Rules).
- Deduplicate: if a step was retried after an error, keep only the final successful version.
- Resolve imports: collect all imports across cells and deduplicate them at the top of the file.
- Write the output: a single `.py` file with a clear structure.
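The first two workflow steps can be sketched as below. The `session.json` field names `tag`, `status`, and `command_file` are assumptions about the session schema (the doc only says steps carry status and command file paths); adjust them to match your actual sessions.

```python
import json
from pathlib import Path


def load_successful_steps(session_path):
    """Read session.json and return its successful steps, in order.

    Field names ("steps", "status", "command_file") are assumed; adapt
    them to the real session.json schema.
    """
    session = json.loads(Path(session_path).read_text())
    return [
        step for step in session["steps"]
        if step.get("status") == "Finished"  # skip failed/aborted steps
    ]


def read_step_sources(session_dir, steps):
    """Read each step's .cmd.py file, preserving step order."""
    return [
        (step["tag"], (Path(session_dir) / step["command_file"]).read_text())
        for step in steps
    ]
```

From here, each `(tag, source)` pair feeds into boilerplate stripping and import resolution.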
Output Structure
"""
Consolidated from session: <session_name>
Source: <session_file_path>
Steps: <N> (of <total> attempted)
"""
# --- Dependencies ---
# Requires: scikit-learn, xgboost
# --- Imports ---
import os
import json
from sklearn.ensemble import RandomForestClassifier
# ...
# --- Step 1: load_data ---
df = spark.read.table("catalog.schema.table")
# ...
# --- Step 2: feature_engineering ---
# ...
# --- Step 3: train ---
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
joblib.dump(model, "/Volumes/catalog/schema/vol/model.pkl")
# ...
# --- Step 4: evaluate ---
# ...
Boilerplate Rules
Transform REPL-specific code into clean Python:
| REPL Code | Consolidated Form |
|---|---|
| `%pip install xgboost` | Move to `# Requires: xgboost` in the header |
| `sub_llm(prompt, ...)` | Keep as-is (it's business logic) |
| `sub_llm_batch(prompts, ...)` | Keep as-is (it's business logic) |
Key distinctions:
- `%pip install` → collect into a `# Requires:` header comment
- `sub_llm()` / `sub_llm_batch()` → keep unchanged; these are meaningful business logic
- `print()` statements used only for REPL feedback → remove
- `print()` statements that display meaningful results → keep
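The `%pip install` rule can be sketched as a simple line filter. This is a minimal sketch: it handles only the `%pip install <packages>` form and does not attempt to judge which `print()` calls are meaningful, since that requires human-level context.

```python
import re

# Matches "%pip install pkg1 pkg2 ..." at the start of a line.
PIP_RE = re.compile(r"^\s*%pip\s+install\s+(.+)$")


def strip_boilerplate(source):
    """Remove %pip install lines, collecting package names for the
    # Requires: header.  sub_llm()/sub_llm_batch() calls are left
    untouched: they are business logic, not REPL boilerplate."""
    requires, kept = [], []
    for line in source.splitlines():
        m = PIP_RE.match(line)
        if m:
            requires.extend(m.group(1).split())
        else:
            kept.append(line)
    return "\n".join(kept), requires
```

Package names collected here feed the `# Requires:` comment in the output header.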
Deduplication Rules
Sessions often contain retries after errors. When multiple steps share the same tag:
- Find all steps with the same tag in `session.json`
- Keep only the last one with `status: "Finished"`
- Discard earlier failed attempts

When adjacent steps do the same thing (e.g., loading the same table with slight variations), keep only the final version.
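The tag-based rule above can be sketched as follows, assuming each step dict carries `tag` and `status` keys (an assumption about the session schema). Later successful attempts overwrite earlier ones, while failed attempts are dropped entirely.

```python
def dedupe_by_tag(steps):
    """Keep only the last step with status 'Finished' per tag,
    preserving the order in which tags first appear."""
    last, order = {}, []
    for step in steps:
        if step.get("status") != "Finished":
            continue  # discard failed attempts outright
        if step["tag"] not in last:
            order.append(step["tag"])
        last[step["tag"]] = step  # later successes overwrite earlier ones
    return [last[tag] for tag in order]
```

Note this only covers exact tag matches; collapsing *adjacent near-duplicate* steps still needs a manual or heuristic pass.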
Import Resolution
- Scan all surviving steps for `import` and `from ... import` statements
- Deduplicate: the same import appearing in multiple steps becomes one line
- Place all imports at the top of the file, after the docstring and the dependencies comment
- Remove imports that are no longer used after boilerplate stripping
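A minimal sketch of the first two resolution steps, using a line-based scan (a real implementation might prefer `ast.parse` to catch multi-line or aliased imports). Detecting imports that become unused after stripping is left out here, since it requires usage analysis.

```python
def resolve_imports(step_sources):
    """Pull import lines out of each step's source, deduplicate them
    in first-seen order, and return (imports, stripped_sources)."""
    seen, stripped = [], []
    for src in step_sources:
        kept = []
        for line in src.splitlines():
            s = line.strip()
            if s.startswith("import ") or (s.startswith("from ") and " import " in s):
                if s not in seen:
                    seen.append(s)  # one line per distinct import
            else:
                kept.append(line)
        stripped.append("\n".join(kept))
    return seen, stripped
```

The returned import list goes under the `# --- Imports ---` header; the stripped sources become the numbered step sections.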
Before / After Example
Before (3 separate .cmd.py files)
001_install.cmd.py:

```python
%pip install scikit-learn pandas
```

002_load.cmd.py:

```python
import pandas as pd
df = spark.read.table("catalog.schema.customers").toPandas()
print(f"Loaded {len(df)} rows")
```

003_train.cmd.py:

```python
from sklearn.ensemble import RandomForestClassifier
import joblib

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(df[features], df["label"])
joblib.dump(model, "/Volumes/catalog/schema/vol/model.pkl")
print("Training complete")
```
After (consolidated .py)
"""
Consolidated from session: customer-classifier
Source: ./session.json
Steps: 3 (of 3 attempted)
"""
# --- Dependencies ---
# Requires: scikit-learn, pandas
# --- Imports ---
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# --- Step 1: load ---
df = spark.read.table("catalog.schema.customers").toPandas()
# --- Step 2: train ---
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(df[features], df["label"])
joblib.dump(model, "/Volumes/catalog/schema/vol/model.pkl")
Usage
- Ensure `session.json` has a `steps` array with at least one successful step
- Read `session.json` to understand the session structure
- Read each `.cmd.py` file referenced in the steps
- Apply the boilerplate rules, deduplication, and import resolution
- Write the consolidated file (default: `<session_name>.py` in the repo root)
- Review the output for correctness: automated consolidation may miss nuances in variable dependencies across steps
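Once the steps are cleaned, assembling the final file is string concatenation matching the Output Structure above. A sketch, where `steps` is a list of `(tag, source)` pairs with boilerplate already stripped; the function name and parameters are illustrative, not a fixed API.

```python
def render_consolidated(session_name, session_path, steps, attempted, requires, imports):
    """Assemble the consolidated .py text: header docstring,
    # Requires: comment, deduplicated imports, then numbered steps."""
    parts = [
        '"""',
        f"Consolidated from session: {session_name}",
        f"Source: {session_path}",
        f"Steps: {len(steps)} (of {attempted} attempted)",
        '"""',
        "",
        "# --- Dependencies ---",
        "# Requires: " + ", ".join(requires),
        "",
        "# --- Imports ---",
        *imports,
    ]
    for i, (tag, source) in enumerate(steps, start=1):
        parts += ["", f"# --- Step {i}: {tag} ---", source]
    return "\n".join(parts) + "\n"
```

Writing the result to `<session_name>.py` and then reviewing it by hand covers the last two usage steps.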