ml-lifecycle

📁 many871027/ai-skills 📅 2 days ago

Total installs

Weekly installs

#75174

Site-wide rank

Install command

npx skills add https://github.com/many871027/ai-skills --skill ML-LIFECYCLE

Installs by agent

amp 2

github-copilot 2

codex 2

kimi-cli 2

gemini-cli 2

cursor 2

Skill documentation

ML Full Cycle Orchestration

Instructions

You are executing a production-grade Machine Learning pipeline. When a user provides a dataset and a target variable, follow this sequential workflow.

Step 1: Data Validation & EDA

Inspect the provided dataset format (CSV, Parquet, JSON).
Identify the target column specified by the user.
Conduct brief univariate analysis to identify categorical vs. numerical features and check for severe class imbalances.
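
The univariate checks above can be sketched with the standard library alone. This is a minimal illustration, not the skill's own implementation: the function names and the sample rows (`age`, `plan`, `churned`) are hypothetical, and the rows are assumed to be dictionaries of strings such as those produced by `csv.DictReader`.

```python
from collections import Counter


def is_number(value):
    """Return True if a raw cell value parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def infer_feature_kinds(rows, target_col):
    """Label every non-target column as 'numerical' or 'categorical'."""
    kinds = {}
    for col in rows[0]:
        if col == target_col:
            continue
        # Ignore empty cells so missing values do not force a column to categorical.
        values = [r[col] for r in rows if r[col] not in ("", None)]
        numeric = bool(values) and all(is_number(v) for v in values)
        kinds[col] = "numerical" if numeric else "categorical"
    return kinds


def imbalance_ratio(rows, target_col):
    """Ratio of the most to the least frequent class; 1.0 means balanced."""
    counts = Counter(r[target_col] for r in rows)
    return max(counts.values()) / min(counts.values())


# Hypothetical sample data, shaped like csv.DictReader output.
rows = [
    {"age": "34", "plan": "basic", "churned": "no"},
    {"age": "51", "plan": "premium", "churned": "no"},
    {"age": "27", "plan": "basic", "churned": "yes"},
    {"age": "45", "plan": "basic", "churned": "no"},
]

print(infer_feature_kinds(rows, "churned"))
print(imbalance_ratio(rows, "churned"))
```

What counts as a "severe" imbalance is a judgment call that the source leaves open, so any cutoff applied to the ratio should be agreed with the user.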

Step 2: Pipeline Execution

Run the generalized ML pipeline script to handle preprocessing, training, and tracking. Execute the following command, replacing the variables with the user’s inputs:

python scripts/generalized_ml_pipeline.py --data_path "PATH_TO_DATA" --target_col "TARGET_COLUMN_NAME" --experiment_name "EXPERIMENT_NAME"
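
If the command is launched from Python rather than typed into a shell, passing the arguments as a list avoids quoting problems with paths or column names that contain spaces. The sketch below uses only the three flags shown above; the helper names and the example values are hypothetical.

```python
import subprocess
import sys


def build_pipeline_command(data_path, target_col, experiment_name):
    """Assemble the argument list for the pipeline script."""
    return [
        sys.executable,  # the interpreter currently running this code
        "scripts/generalized_ml_pipeline.py",
        "--data_path", data_path,
        "--target_col", target_col,
        "--experiment_name", experiment_name,
    ]


def run_pipeline(data_path, target_col, experiment_name):
    """Run the pipeline and raise CalledProcessError on a non-zero exit code."""
    cmd = build_pipeline_command(data_path, target_col, experiment_name)
    return subprocess.run(cmd, check=True)


# Hypothetical inputs; replace them with the values supplied by the user.
cmd = build_pipeline_command("data/churn.csv", "churned", "churn-baseline")
print(cmd[1:])
```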

Step 3: Observability & Evaluation

After the pipeline finishes execution:

Review the generated classification_report_[model].txt and confusion matrices.
Select the champion model based on the highest F1-Score (to account for potential class imbalances).
Translate the technical metrics (Precision, Recall, F1) into business impact.
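
The champion selection described above can be sketched as follows. The model names and confusion-matrix counts are invented for illustration; in practice the numbers come from the generated reports.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def select_champion(results):
    """Return the name of the model with the highest F1-Score."""
    return max(results, key=lambda name: results[name]["f1"])


# Hypothetical counts for the positive class of two candidate models.
results = {
    "logistic_regression": precision_recall_f1(tp=80, fp=20, fn=20),
    "naive_bayes": precision_recall_f1(tp=90, fp=60, fn=10),
}

champion = select_champion(results)
print(champion, results[champion])
```

In this made-up example the second model has higher recall but far more false positives, so its F1 is lower. For the business translation, false negatives can be read as missed cases and false positives as unnecessary interventions, each with its own cost to the user.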

Reference: Consult references/business_kpi_mapping.md for business translation frameworks.

Output Requirement: Format your final pipeline summary using the structures defined in assets/model_card_template.md and assets/data_storytelling_report.md.

Step 4: MLOps Deployment & Serving

Transition the champion model to the deployment phase:

Explain the architectural trade-offs between Shadow Testing, Canary Releases, and A/B Testing.
Guide the user to serve the model using the provided FastAPI script:
```
python scripts/serve_model.py
```
Advise the user to containerize the solution for Kubernetes or Cloud Run using the scripts/Dockerfile.
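
To make the trade-offs between the three release strategies concrete, the sketch below shows how each one might route a single request. It is an illustrative assumption, not code from the skill's repository: the `route` function, the `champion` and `candidate` labels, and the percentage parameter are all hypothetical.

```python
import hashlib


def bucket(request_id, buckets=100):
    """Map a request or user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def route(request_id, strategy, candidate_percent=10):
    """Return (models_to_call, model_whose_answer_is_served)."""
    if strategy == "shadow":
        # Both models score the request; only the champion's answer is served.
        return (["champion", "candidate"], "champion")
    if strategy in ("canary", "ab"):
        # A stable slice of traffic is served by the candidate.
        use_candidate = bucket(request_id) < candidate_percent
        chosen = "candidate" if use_candidate else "champion"
        return ([chosen], chosen)
    raise ValueError(f"unknown strategy: {strategy}")


print(route("user-42", "shadow"))
print(route("user-42", "canary", candidate_percent=5))
print(route("user-42", "ab", candidate_percent=50))
```

Shadow testing carries no user-facing risk but doubles inference cost. Canary and A/B routing look the same in code: a canary usually starts with a small slice that is widened or rolled back based on health metrics, while an A/B test holds a fixed split until the outcome metric can be compared.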

Reference: Consult references/deployment_strategies.md and references/monitoring_observability.md to define a monitoring strategy (tracking Inference Latency and Data Drift).
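
As one possible starting point for the two monitoring signals named above, the sketch below computes a nearest-rank latency percentile and the Population Stability Index (PSI), a commonly used drift measure. The choice of PSI is an assumption; the referenced documents may prescribe different metrics. The sample numbers are made up.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as proportions per bin."""
    total = 0.0
    for e, a in zip(expected, actual):
        # Clamp empty bins so the logarithm stays defined.
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical inference latencies in milliseconds.
latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile(latencies_ms, 95))

# Hypothetical share of one feature's values per bin, training vs. live traffic.
training_bins = [0.25, 0.25, 0.25, 0.25]
live_bins = [0.10, 0.20, 0.30, 0.40]
print(population_stability_index(training_bins, live_bins))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but alert thresholds should be validated against the user's own data.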

Common Issues & Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| Missing Target Column | The user provided a column name that doesn’t exactly match the dataset headers. | Check the dataset schema and ask the user to clarify the exact target column name. |
| Non-Numeric Features for Naive Bayes | Raw string data passed to models requiring numerical input. | Assure the user that `generalized_ml_pipeline.py` automatically applies `OneHotEncoder` to categorical variables and `MinMaxScaler` to numerical ones to prevent this issue. |
| Port 8000 Already in Use | Another application is running on the default FastAPI port during Step 4. | Advise the user to run the uvicorn server on a different port (e.g., `--port 8001`). |
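
Two of these issues can be detected before anything is run. The sketch below is a hedged illustration with hypothetical helper names: one function proposes the closest header when the target column does not match exactly, and the other checks whether a local port already has a listener.

```python
import difflib
import socket


def suggest_target_column(requested, headers):
    """Return the headers most likely meant by the requested column name."""
    if requested in headers:
        return [requested]
    # Compare case-insensitively and ignore surrounding whitespace.
    normalized = {h.strip().lower(): h for h in headers}
    key = requested.strip().lower()
    if key in normalized:
        return [normalized[key]]
    close = difflib.get_close_matches(key, list(normalized), n=3, cutoff=0.6)
    return [normalized[c] for c in close]


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


# Hypothetical dataset headers.
headers = ["age", "plan", "churned"]
print(suggest_target_column("Churned ", headers))
print(suggest_target_column("chrned", headers))
```

The suggestions are candidates to confirm with the user, not a silent correction. If `port_is_free(8000)` returns `False`, the `--port 8001` workaround from the table applies.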
