ml-lifecycle
npx skills add https://github.com/many871027/ai-skills --skill ML-LIFECYCLE
Skill Documentation
ML Full Cycle Orchestration
Instructions
You are executing a production-grade Machine Learning pipeline. When a user provides a dataset and a target variable, follow this sequential workflow.
Step 1: Data Validation & EDA
- Inspect the provided dataset format (CSV, Parquet, JSON).
- Identify the target column specified by the user.
- Conduct brief univariate analysis to identify categorical vs. numerical features and check for severe class imbalances.
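The checks in Step 1 can be sketched with the standard library alone. This is a minimal illustration, not the skill's own implementation: the function name `profile_rows`, the sample data, and the "rarest class divided by most common class" imbalance ratio are all assumptions made for the example, and it only covers CSV input.

```python
import csv
import io
from collections import Counter
from difflib import get_close_matches


def profile_rows(rows, target_col):
    """Split feature columns into numerical vs. categorical and count target classes."""
    if not rows:
        raise ValueError("dataset is empty")
    columns = list(rows[0].keys())
    if target_col not in columns:
        # Suggest near-matches so the user can confirm the exact header.
        hints = get_close_matches(target_col, columns, n=3, cutoff=0.6)
        raise KeyError(f"target column {target_col!r} not found; close matches: {hints}")

    def is_number(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    numerical, categorical = [], []
    for col in columns:
        if col == target_col:
            continue
        values = [r[col] for r in rows if r[col] not in ("", None)]
        if values and all(is_number(v) for v in values):
            numerical.append(col)
        else:
            categorical.append(col)

    class_counts = Counter(r[target_col] for r in rows)
    # Rarest class divided by most common class; small values signal imbalance.
    imbalance_ratio = min(class_counts.values()) / max(class_counts.values())
    return {
        "numerical": numerical,
        "categorical": categorical,
        "class_counts": dict(class_counts),
        "imbalance_ratio": imbalance_ratio,
    }


# Hypothetical sample data, inlined so the sketch runs without any files.
sample_csv = "age,city,churn\n34,Lima,no\n51,Quito,no\n29,Lima,no\n42,Bogota,yes\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
report = profile_rows(rows, "churn")
print(report)
```

For Parquet or JSON input, the same checks would need a different loader; the classification and counting logic stays the same once the rows are dictionaries.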
Step 2: Pipeline Execution
Run the generalized ML pipeline script to handle preprocessing, training, and tracking. Execute the following command, replacing the variables with the user’s inputs:
python scripts/generalized_ml_pipeline.py --data_path "PATH_TO_DATA" --target_col "TARGET_COLUMN_NAME" --experiment_name "EXPERIMENT_NAME"
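If the command above is launched from Python rather than typed into a shell, one option is to build it as an argument list so that paths or column names containing spaces need no manual quoting. The sketch below assumes only the three flags shown in the command; the helper names and the sample inputs are hypothetical.

```python
import subprocess
import sys


def build_pipeline_command(data_path, target_col, experiment_name):
    """Return the Step 2 command as an argument list (no shell quoting needed)."""
    return [
        sys.executable,
        "scripts/generalized_ml_pipeline.py",
        "--data_path", data_path,
        "--target_col", target_col,
        "--experiment_name", experiment_name,
    ]


def run_pipeline(data_path, target_col, experiment_name):
    """Run the pipeline and raise CalledProcessError on a non-zero exit code."""
    cmd = build_pipeline_command(data_path, target_col, experiment_name)
    return subprocess.run(cmd, check=True)


# Hypothetical inputs, used only to show the resulting argument list.
cmd = build_pipeline_command("data/customers.csv", "churn", "churn_baseline")
print(cmd[1:])
```

`run_pipeline` is defined but not called here, since it needs the skill's script and a real dataset to be present.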
Step 3: Observability & Evaluation
After the pipeline finishes execution:
- Review the generated classification_report_[model].txt and confusion matrices.
- Select the champion model based on the highest F1-Score (to account for potential class imbalances).
- Translate the technical metrics (Precision, Recall, F1) into business impact.
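The champion selection rule can be illustrated with a small sketch. It assumes a binary problem where the true-positive, false-positive, and false-negative counts have already been read off each model's confusion matrix; the model names and counts below are made up, and the sketch does not parse the pipeline's report files.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def select_champion(results):
    """Return the model name with the highest F1 plus every model's metrics."""
    scored = {name: precision_recall_f1(**counts) for name, counts in results.items()}
    champion = max(scored, key=lambda name: scored[name][2])
    return champion, scored


# Hypothetical counts for two candidate models.
results = {
    "logistic_regression": {"tp": 80, "fp": 20, "fn": 20},
    "naive_bayes": {"tp": 90, "fp": 60, "fn": 10},
}
champion, scored = select_champion(results)
print(champion, scored[champion])
```

In this made-up case the second model has higher recall but much lower precision, so its F1 is lower and the first model is selected.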
Reference: Consult references/business_kpi_mapping.md for business translation frameworks.
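As a rough illustration of what a business translation might look like, the sketch below restates precision and recall in plain language and attaches a cost to each error type. The per-error costs, the counts, and the dictionary keys are hypothetical; the actual framework is whatever the reference file defines.

```python
def error_cost_summary(tp, fp, fn, cost_per_fp, cost_per_fn):
    """Turn confusion-matrix counts into plain-language rates and a cost figure."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "flagged_cases_that_were_correct": precision,
        "real_cases_that_were_caught": recall,
        "cost_of_false_alarms": fp * cost_per_fp,
        "cost_of_missed_cases": fn * cost_per_fn,
        "total_error_cost": fp * cost_per_fp + fn * cost_per_fn,
    }


# Hypothetical numbers: a false alarm costs 5.0, a missed case costs 120.0.
summary = error_cost_summary(tp=80, fp=20, fn=20, cost_per_fp=5.0, cost_per_fn=120.0)
print(summary)
```

When a missed case is far more expensive than a false alarm, as in these assumed numbers, recall tends to dominate the cost figure even if precision looks comparable.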
Output Requirement: Format your final pipeline summary using the structures defined in assets/model_card_template.md and assets/data_storytelling_report.md.
Step 4: MLOps Deployment & Serving
Transition the champion model to the deployment phase:
- Explain the architectural trade-offs between Shadow Testing, Canary Releases, and A/B Testing.
- Guide the user to serve the model using the provided FastAPI script:
  python scripts/serve_model.py
- Advise the user to containerize the solution for Kubernetes or Cloud Run using the scripts/Dockerfile.
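The difference between the release strategies named in Step 4 is easier to explain with a concrete routing sketch. In shadow testing the candidate model sees live traffic but its answers are only logged; in a canary release a small, stable share of requests is actually answered by the candidate. An A/B test uses the same kind of split, but holds it fixed and compares a business metric between the groups. The function names and the hash-based bucketing below are assumptions for illustration, not part of the skill's scripts.

```python
import hashlib


def canary_bucket(request_id, canary_percent):
    """Deterministically route a stable share of request IDs to the candidate."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 100
    return "candidate" if slot < canary_percent else "champion"


def shadow_predict(features, champion, candidate, log):
    """Shadow testing: always answer with the champion, only record the candidate."""
    live = champion(features)
    try:
        log.append({"features": features, "live": live, "shadow": candidate(features)})
    except Exception as exc:  # a failing candidate must never affect the live answer
        log.append({"features": features, "live": live, "shadow_error": repr(exc)})
    return live


# Hypothetical stand-in models: plain functions instead of trained estimators.
shadow_log = []
answer = shadow_predict([1.0, 2.0], lambda x: "no", lambda x: "yes", shadow_log)
share = sum(canary_bucket(f"req-{i}", 10) == "candidate" for i in range(10000))
print(answer, shadow_log[0]["shadow"], share)
```

Hashing the request ID (rather than drawing a random number per request) keeps a given ID in the same bucket across calls, which makes canary results easier to compare over time.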
Reference: Consult references/deployment_strategies.md and references/monitoring_observability.md to define a monitoring strategy (tracking Inference Latency and Data Drift).
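The two monitoring signals named above can be sketched as follows. This is one possible approach, not necessarily what the reference files prescribe: data drift is measured here with the Population Stability Index (PSI) over equal-width bins of a single numerical feature, and latency is summarized with a nearest-rank percentile. The function names, the bin count, and the sample values are assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample and a serving-time sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct, e.g. pct=95 for p95 latency."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling division
    return ordered[rank - 1]


# Hypothetical samples: the serving data is the training data shifted by 50.
train_feature = [float(v) for v in range(100)]
serve_feature = [v + 50.0 for v in train_feature]
psi_same = population_stability_index(train_feature, train_feature)
psi_shifted = population_stability_index(train_feature, serve_feature)
p95_ms = percentile(list(range(1, 101)), 95)
print(psi_same, psi_shifted > 0.25, p95_ms)
```

A commonly quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but any alert threshold should be tuned to the feature and the use case.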
Common Issues & Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Missing Target Column | The user provided a column name that doesn’t exactly match the dataset headers. | Check the dataset schema and ask the user to clarify the exact target column name. |
| Non-Numeric Features for Naive Bayes | Raw string data passed to models requiring numerical input. | Assure the user that generalized_ml_pipeline.py automatically applies OneHotEncoder to categorical variables and MinMaxScaler to numerical ones to prevent this issue. |
| Port 8000 Already in Use | Another application is running on the default FastAPI port during Step 4. | Advise the user to run the uvicorn server on a different port (e.g., --port 8001). |
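For the last row of the table, a quick way to confirm that a port is taken, and to find a nearby free one, is to try binding to it. The sketch below uses only the standard `socket` module; the helper names are hypothetical, and how the chosen port is passed to the server depends on how scripts/serve_model.py starts uvicorn.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=8000, attempts=20, host="127.0.0.1"):
    """Scan upward from `start` and return the first port that can be bound."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")


# Occupy an OS-assigned port to demonstrate the "already in use" case.
blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
blocker.bind(("127.0.0.1", 0))
blocker.listen(1)
busy_port = blocker.getsockname()[1]
busy_is_free = port_is_free(busy_port)
blocker.close()
print(busy_is_free, port_is_free(busy_port))
```

There is an unavoidable gap between checking a port and starting the server on it, so the check is a convenience rather than a guarantee.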