data-exploration-visualization
npx skills add https://github.com/liangdabiao/claude-data-analysis-ultra-main --skill data-exploration-visualization
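Data Exploration and Visualization Skill
Skill Overview
The Data Exploration and Visualization skill is an automated EDA toolkit based on the theory in Lesson 2 of 《数据分析咖哥十话》. It covers the full path from data loading to generating a professional analysis report. The skill combines data exploration, visualization, and machine learning techniques to help users quickly understand the characteristics and patterns in their data.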
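Core Features
Smart data exploration
- Automatic data diagnostics: detects data quality issues, outliers, and missing-value patterns
- Descriptive statistics: generates comprehensive statistical summaries and distribution characteristics
- Correlation analysis: identifies relationships and dependency patterns between features
- Data quality report: professional-grade data quality assessment and recommendations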
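The skill's own diagnostic routines are not shown in this document. As a rough illustration of what a per-column quality check can look like, the sketch below uses plain pandas to compute the share of missing values and an IQR-based outlier count. The function name `diagnose`, the 1.5 × IQR rule, and the sample table are all assumptions made for this example, not part of the skill.

```python
import pandas as pd


def diagnose(df: pd.DataFrame) -> pd.DataFrame:
    """Return the missing-value ratio and IQR outlier count for each column."""
    rows = []
    for col in df.columns:
        s = df[col]
        n_outliers = 0
        # Outliers are only counted for numeric (non-boolean) columns.
        if pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s):
            q1, q3 = s.quantile(0.25), s.quantile(0.75)
            iqr = q3 - q1
            # Assumed rule: values beyond 1.5 * IQR from the quartiles.
            mask = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
            n_outliers = int(mask.sum())
        rows.append(
            {
                "column": col,
                "missing_ratio": float(s.isna().mean()),
                "outliers": n_outliers,
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Made-up sample: one extreme age and one missing value per column.
sample = pd.DataFrame(
    {
        "age": [31, 29, 35, 33, 30, 120, None],
        "city": ["A", "B", None, "A", "B", "A", "B"],
    }
)
quality = diagnose(sample)
print(quality)
```

The IQR rule is only one of several common outlier heuristics; a z-score or domain-specific threshold may suit a given dataset better.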
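Professional visualization generation
- Distribution plots: histograms, density plots, violin plots, Q-Q plots
- Statistical plots: box plots, error bar plots, confidence interval plots
- Relationship plots: scatter plots, heatmaps, pair plots, 3D scatter plots
- Specialized charts: ROC curves, confusion matrices, feature importance plots
- Interactive charts: dynamic visualizations powered by Plotly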
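Medical data specialization
- Medical coding support: ICD-10, SNOMED CT, and other medical standards
- Biomarker analysis: dedicated handling of medical indicators
- Diagnostic model building: medical prediction models and their evaluation
- Medical interpretability: an explanation framework aligned with medical practice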
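Automated modeling and evaluation
- Multiple algorithms: logistic regression, random forest, XGBoost, neural networks
- Automatic feature engineering: feature selection, transformation, and optimization
- Hyperparameter tuning: grid search and Bayesian optimization
- Model interpretability: SHAP values, feature importance, partial dependence plots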
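How the skill wires its tuning step together is not documented here. As a minimal sketch of the grid search idea mentioned in the list above, the following example uses scikit-learn's `GridSearchCV` on a scaled logistic regression. The choice of scikit-learn's bundled breast cancer dataset, the candidate values for `C`, and the ROC AUC scoring are assumptions for illustration. Bayesian optimization is not shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: scikit-learn's bundled breast cancer dataset.
X, y = load_breast_cancer(return_X_y=True)

# Scaling inside the pipeline keeps each CV fold free of test-fold statistics.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Assumed search space: four regularization strengths.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Every candidate is scored with 5-fold cross-validated ROC AUC.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)

best_c = search.best_params_["logisticregression__C"]
best_auc = search.best_score_
print(f"best C = {best_c}, mean CV ROC AUC = {best_auc:.3f}")
```

Grid search evaluates every combination, so its cost grows quickly with the number of parameters; that is the usual motivation for switching to Bayesian optimization on larger search spaces.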
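Professional report generation
- HTML reports: publication-grade interactive analysis reports
- PDF export: high-quality document output
- Markdown support: a lightweight report format
- Custom templates: a configurable report template system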
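Use Cases
Healthcare
- Disease prediction: disease risk prediction based on clinical data
- Diagnostic support: analysis of medical imaging and lab test results
- Epidemiological research: outbreak data analysis and trend forecasting
- Clinical trials: statistical analysis and visualization of trial data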
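Financial risk control
- Credit assessment: credit risk modeling for individuals and businesses
- Fraud detection: identification of anomalous transaction patterns
- Investment analysis: market trend and risk assessment
- Compliance reporting: analysis reports required by regulators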
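E-commerce and retail
- User analysis: customer behavior and preference analysis
- Sales forecasting: sales volume forecasting and inventory optimization
- Recommender systems: evaluation of personalized recommendation algorithms
- Market segmentation: customer group analysis and segmentation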
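Research and education
- Academic research: support for data-driven academic research
- Teaching cases: data analysis teaching and practice
- Paper writing: research data analysis and figure preparation
- Skills training: a training tool for data science skills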
Tool Usage Guide
Quick Start
- Basic data exploration
from scripts.eda_analyzer import EDAAnalyzer
# Initialize the analyzer
analyzer = EDAAnalyzer()
# Load the data and run the automated analysis
data = analyzer.load_data('data.csv')
report = analyzer.auto_eda(data)
- Visualization generation
from scripts.visualizer import DataVisualizer
# Initialize the visualizer
visualizer = DataVisualizer()
# Generate all charts automatically
charts = visualizer.auto_visualize(data)
# Generate specific chart types
dist_plot = visualizer.plot_distribution(data, 'column_name')
corr_heatmap = visualizer.plot_correlation(data)
- Modeling and evaluation
from scripts.modeling_evaluator import ModelingEvaluator
# Initialize the modeler
modeler = ModelingEvaluator()
# Run automated modeling and evaluation
results = modeler.auto_modeling(
    data=data,
    target_col='target',
    algorithms=['logistic', 'rf', 'xgboost']
)
- Report generation
from scripts.report_generator import ReportGenerator
# Generate the full report
generator = ReportGenerator()
report = generator.generate_comprehensive_report(
    data=data,
    model_results=results,  # output of auto_modeling in the previous step
    output_path='analysis_report.html'
)
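The `auto_eda` call in the quick start is specific to this skill, and its internals are not shown in this document. To give a rough idea of the kind of summary an automated EDA pass usually starts from, the sketch below uses plain pandas on a small made-up table. The column names and values are invented for illustration and have no connection to the skill's sample data.

```python
import io

import pandas as pd

# Made-up CSV content standing in for 'data.csv'.
csv_text = """height_cm,weight_kg,group
170,65,a
180,80,b
160,55,a
175,72,b
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()

# Pearson correlation between numeric columns only.
corr = df.select_dtypes("number").corr()

# Number of rows per category in the non-numeric column.
group_counts = df["group"].value_counts()

print(summary)
print(corr)
print(group_counts)
```

`select_dtypes("number")` is used instead of `corr(numeric_only=True)` so the sketch also runs on older pandas releases within the version range listed under the dependencies.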
Advanced Features
- Medical data analysis
# Special handling for medical data
from scripts.medical_analyzer import MedicalDataAnalyzer
# medical_df is assumed to be a DataFrame loaded beforehand
medical_analyzer = MedicalDataAnalyzer()
medical_report = medical_analyzer.analyze_medical_data(
    data=medical_df,
    diagnosis_col='diagnosis',
    biomarker_cols=['biomarker1', 'biomarker2']
)
- Interactive dashboard
# Generate an interactive dashboard (reuses the visualizer from the quick start)
dashboard = visualizer.create_dashboard(
    data=data,
    charts=['distribution', 'correlation', 'model_performance']
)
- Batch data processing
# Analyze several datasets in one batch (reuses the analyzer from the quick start)
batch_results = analyzer.batch_analyze(
    data_files=['data1.csv', 'data2.csv'],
    analysis_types=['eda', 'modeling', 'visualization']
)
Technical Dependencies
Core libraries
- pandas (>=1.3.0): data processing and analysis
- numpy (>=1.20.0): numerical computing
- scikit-learn (>=1.0.0): machine learning algorithms
- xgboost (>=1.5.0): gradient boosting
Visualization libraries
- matplotlib (>=3.4.0): basic plotting
- seaborn (>=0.11.0): statistical visualization
- plotly (>=5.0.0): interactive charts
Statistical analysis libraries
- scipy (>=1.7.0): scientific computing
- statsmodels (>=0.13.0): statistical modeling
Report generation
- jinja2 (>=3.0.0): template engine
- weasyprint: PDF generation
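Before running the skill, it can help to check which of the packages listed above are installed and whether they meet the stated minimums. The following sketch does this with the standard library's `importlib.metadata`. The helper names are hypothetical, and the version comparison is deliberately rough: it only looks at the first three numeric components and ignores pre-release tags.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed in the dependency section (None = no minimum stated).
MINIMUMS = {
    "pandas": "1.3.0", "numpy": "1.20.0", "scikit-learn": "1.0.0",
    "xgboost": "1.5.0", "matplotlib": "3.4.0", "seaborn": "0.11.0",
    "plotly": "5.0.0", "scipy": "1.7.0", "statsmodels": "0.13.0",
    "jinja2": "3.0.0", "weasyprint": None,
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def version_tuple(text):
    """Rough parse: first three numeric components, padded with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum (rough check)."""
    if installed is None:
        return False
    if minimum is None:
        return True
    return version_tuple(installed) >= version_tuple(minimum)


for name, found in installed_versions(MINIMUMS).items():
    status = "ok" if meets_minimum(found, MINIMUMS[name]) else "missing or too old"
    print(f"{name}: {found} ({status})")
```

For strict comparisons, including pre-releases, a dedicated version parser is a better fit than this tuple comparison.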
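Best Practices
Data preparation
- Make sure the data is in a standard format (CSV, Excel, etc.)
- Check the file encoding to avoid garbled Chinese text
- Handle missing values and outliers
- Validate data types and formats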
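The encoding check above can be approximated by trying candidate encodings in order and keeping the first one that decodes cleanly. The sketch below is a minimal standard-library version. The function name `decode_with_fallback` and the candidate order are assumptions for this example and do not describe how the skill itself detects encodings.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "gbk")):
    """Decode bytes with the first encoding that succeeds.

    Returns (text, encoding_used). UTF-8 is tried first because it is strict:
    GBK-encoded Chinese text usually fails UTF-8 decoding, whereas GBK can
    decode many UTF-8 byte sequences into the wrong characters without error.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode the input")


# The same two characters encoded in two different ways.
gbk_bytes = "数据".encode("gbk")
utf8_bytes = "数据".encode("utf-8")

text_a, enc_a = decode_with_fallback(gbk_bytes)
text_b, enc_b = decode_with_fallback(utf8_bytes)
print(enc_a, text_a)
print(enc_b, text_b)
```

Once the encoding is known, it can be passed on explicitly when loading the file, for example through the `encoding` argument of `pandas.read_csv`.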
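Analysis workflow
- Data loading and inspection: confirm data quality and completeness
- Exploratory analysis: understand the basic characteristics and distributions of the data
- Visual exploration: use charts to discover patterns in the data
- Preprocessing: data cleaning and feature engineering
- Modeling: build and evaluate predictive models
- Interpretation: extract insights and business recommendations
- Report generation: create a professional analysis report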
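Choosing a visualization
- Univariate analysis: histograms, box plots, violin plots
- Bivariate analysis: scatter plots, grouped box plots
- Multivariate analysis: heatmaps, pair plots, 3D plots
- Time series: timeline charts, trend charts
- Geographic data: map visualizations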
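The selection guide above can be expressed as a small lookup. The sketch below is one hypothetical encoding of it. The function name `suggest_charts`, the column kind labels, and the rule that two numeric columns map to a scatter plot while a numeric and a categorical column map to a grouped box plot are assumptions layered on top of the list. Cases the list does not cover, such as a single categorical column, return an empty list.

```python
def suggest_charts(column_kinds):
    """Suggest chart types for columns that are plotted together.

    column_kinds: list with one entry per column, each being
    "numeric", "categorical", "datetime", or "geo".
    """
    kinds = list(column_kinds)
    if not kinds:
        raise ValueError("at least one column kind is required")
    if "geo" in kinds:
        return ["map visualization"]
    if "datetime" in kinds:
        return ["timeline chart", "trend chart"]
    if len(kinds) == 1:
        # The guide lists univariate charts that suit numeric columns.
        if kinds[0] == "numeric":
            return ["histogram", "box plot", "violin plot"]
        return []
    if len(kinds) == 2:
        if kinds.count("numeric") == 2:
            return ["scatter plot"]
        if "numeric" in kinds and "categorical" in kinds:
            return ["grouped box plot"]
        return []
    # Three or more columns: multivariate charts.
    return ["heatmap", "pair plot", "3D plot"]


print(suggest_charts(["numeric"]))
print(suggest_charts(["numeric", "categorical"]))
print(suggest_charts(["numeric", "numeric", "numeric"]))
```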
Sample Data
Medical data example
# Breast exam data example
medical_data = {
'patient_id': ['P001', 'P002', ...],
'diagnosis': ['Malignant', 'Benign', ...],
'radius_mean': [17.99, 20.57, ...],
'texture_mean': [10.38, 17.77, ...],
'perimeter_mean': [122.8, 132.9, ...]
}
Financial data example
# Credit scoring data example
financial_data = {
'customer_id': ['C001', 'C002', ...],
'credit_score': [720, 680, ...],
'income': [85000, 62000, ...],
'debt_ratio': [0.15, 0.32, ...],
'default': [0, 1, ...]
}
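FAQ
Q: How is Chinese-language data handled?
A: The skill automatically detects and handles Chinese text encodings, and supports UTF-8, GBK, and other encodings.
Q: Which data formats are supported?
A: CSV, Excel, JSON, Parquet, and other common formats are supported, as are database connections.
Q: How can visualization styles be customized?
A: Style parameters such as colors, fonts, and chart layout can be customized through a configuration file.
Q: How is model accuracy ensured?
A: The skill uses cross-validation, multiple evaluation metrics, and ensemble methods to improve model reliability and generalization.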
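To make the cross-validation mentioned in the last answer concrete, the sketch below shows the mechanics of k-fold splitting using only the standard library: every sample is held out exactly once, and the score is averaged over the folds. The majority-class "model", the toy labels, and the helper name `k_fold_indices` are assumptions chosen to keep the example short. The skill's actual validation setup is not shown in this document.

```python
from collections import Counter
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Toy binary labels (made up for illustration).
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

fold_scores = []
for train_idx, test_idx in k_fold_indices(len(labels), k=5):
    # "Train": pick the most common label in the training folds.
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    # "Evaluate": accuracy of always predicting that label on the held-out fold.
    correct = sum(1 for i in test_idx if labels[i] == majority)
    fold_scores.append(correct / len(test_idx))

mean_accuracy = mean(fold_scores)
print(fold_scores, mean_accuracy)
```

In practice the data is usually shuffled or stratified before splitting, which utilities such as scikit-learn's `StratifiedKFold` handle.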
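Skill Highlights
- Highly automated: 90% of EDA work is automated
- Domain expertise: specialized handling of medical data
- Rich visualization: 20+ professional chart types
- Strong modeling: multiple integrated algorithms with automatic tuning
- High-quality reports: publication-grade analysis reports
- Easy to use: a simple API that automates complex workflows
- Extensible: modular design that is easy to customize and extend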
Changelog
v1.0.0 (2025-01-19)
- Initial release
- Complete EDA functionality
- Basic visualization support
- Logistic regression modeling
- HTML report generation
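Future Plans
- Support for more machine learning algorithms
- Deep learning model support
- Expanded medical data analysis features
- Cloud deployment support
- Real-time data analysis
With this skill, you can substantially improve the efficiency of your data analysis, free yourself from repetitive work, and focus on finding insights and supporting decisions.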