data-analyst

📁 borghei/claude-skills 📅 Jan 24, 2026

Total installs: 29 · Weekly installs: 29 · Overall rank: #7198

Install command:

npx skills add https://github.com/borghei/claude-skills --skill data-analyst

Installs by agent:

  • claude-code 20
  • opencode 20
  • gemini-cli 15
  • antigravity 13
  • openclaw 12
  • codex 12

Skill Documentation

Data Analyst

Expert-level data analysis for business insights.

Core Competencies

  • SQL and database querying
  • Data visualization
  • Statistical analysis
  • Business intelligence
  • Data storytelling
  • Dashboard development
  • Reporting automation
  • Stakeholder communication

SQL Mastery

Query Patterns

Aggregation:

SELECT
    date_trunc('month', created_at) as month,
    COUNT(*) as total_orders,
    COUNT(DISTINCT customer_id) as unique_customers,
    SUM(amount) as total_revenue,
    AVG(amount) as avg_order_value
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY 1
ORDER BY 1;

Window Functions:

SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as order_number,
    LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as previous_order
FROM orders;

CTEs for Clarity:

WITH monthly_metrics AS (
    SELECT
        date_trunc('month', created_at) as month,
        SUM(amount) as revenue
    FROM orders
    GROUP BY 1
),
growth_calc AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_revenue
    FROM monthly_metrics
)
SELECT
    month,
    revenue,
    ROUND((revenue - prev_revenue) / NULLIF(prev_revenue, 0) * 100, 1) as growth_pct
FROM growth_calc;

Cohort Analysis:

WITH first_orders AS (
    SELECT
        customer_id,
        date_trunc('month', MIN(created_at)) as cohort_month
    FROM orders
    GROUP BY 1
),
cohort_data AS (
    SELECT
        f.cohort_month,
        date_trunc('month', o.created_at) as order_month,
        COUNT(DISTINCT o.customer_id) as customers
    FROM orders o
    JOIN first_orders f ON o.customer_id = f.customer_id
    GROUP BY 1, 2
)
SELECT
    cohort_month,
    order_month,
    (EXTRACT(YEAR FROM AGE(order_month, cohort_month)) * 12
     + EXTRACT(MONTH FROM AGE(order_month, cohort_month)))::int as months_since_cohort,
    customers
FROM cohort_data
ORDER BY 1, 2;
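
The long-format output above is easiest to read as a retention matrix. A minimal pandas sketch, assuming the query result has been loaded into a DataFrame named cohort_df with the same column names as the query:

import pandas as pd

# cohort_df columns: cohort_month, order_month, months_since_cohort, customers
retention = cohort_df.pivot_table(
    index='cohort_month',
    columns='months_since_cohort',
    values='customers',
    aggfunc='sum'
)

# Divide each cohort row by its month-0 size to express retention as a percentage
retention_pct = retention.div(retention[0], axis=0).round(3) * 100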

Query Optimization

Use EXPLAIN:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;

Best Practices:

  • Use indexes on filtered columns
  • Avoid SELECT * in production
  • Use LIMIT for exploratory queries
  • Filter early, aggregate late
  • Use appropriate data types
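
Wrapping EXPLAIN in a small helper makes plan checks part of the routine workflow. A sketch using psycopg2 against a Postgres database; the psycopg2 dependency and connection string are assumptions, not part of this skill:

import psycopg2

def explain_query(dsn, sql):
    """Run EXPLAIN ANALYZE on a query and return the plan lines."""
    # dsn is a standard libpq connection string, e.g. "dbname=analytics"
    # Caution: EXPLAIN ANALYZE actually executes the statement; use plain EXPLAIN for writes
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("EXPLAIN ANALYZE " + sql)
            return [row[0] for row in cur.fetchall()]

# Example usage:
# for line in explain_query("dbname=analytics",
#                           "SELECT * FROM orders WHERE customer_id = 123"):
#     print(line)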

Data Visualization

Chart Selection Guide

Data Type        | Best Chart   | Alternative
-----------------|--------------|-------------
Trend over time  | Line chart   | Area chart
Part of whole    | Pie/Donut    | Stacked bar
Comparison       | Bar chart    | Column chart
Distribution     | Histogram    | Box plot
Correlation      | Scatter plot | Heatmap
Geographic       | Map          | Choropleth

Visualization Best Practices

Do:

  • Start Y-axis at zero (for bars)
  • Use consistent colors
  • Label axes clearly
  • Include context (benchmarks, targets)
  • Order categories meaningfully

Don’t:

  • Use 3D charts
  • Use more than 5-7 colors
  • Truncate axes misleadingly
  • Clutter with gridlines
  • Use pie charts for many categories
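
A short matplotlib sketch that applies several of these rules: a zero baseline, labeled axes, consistent color, and a benchmark line for context. The tooling choice and the sample numbers are illustrative assumptions only:

import matplotlib.pyplot as plt

# Illustrative data only
segments = ['Enterprise', 'Mid-market', 'SMB', 'Self-serve']
revenue = [4.2, 3.1, 2.4, 1.3]  # $M
target = 2.5                    # $M benchmark

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(segments, revenue, color='#4C72B0')

# Bars start at zero, axes are labeled, and a target line adds context
ax.set_ylim(bottom=0)
ax.set_ylabel('Q1 revenue ($M)')
ax.set_title('Revenue by segment vs. target')
ax.axhline(target, color='gray', linestyle='--', linewidth=1, label='Target')
ax.legend()

fig.tight_layout()
fig.savefig('revenue_by_segment.png', dpi=150)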

Dashboard Layout

┌─────────────────────────────────────────────────────────────┐
│  EXECUTIVE SUMMARY                                          │
│  [KPI 1: $X]  [KPI 2: X%]  [KPI 3: X]  [KPI 4: X%]         │
├─────────────────────────────────────────────────────────────┤
│  TRENDS                          │  BREAKDOWN               │
│  [Line Chart - Primary Metric]   │  [Bar Chart - Segments]  │
│                                  │                          │
├──────────────────────────────────┼──────────────────────────┤
│  COMPARISON                      │  DETAIL TABLE            │
│  [Bar Chart - vs Target/LY]      │  [Top N with metrics]    │
│                                  │                          │
└──────────────────────────────────┴──────────────────────────┘
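
The same layout can be prototyped in code before committing to a BI tool. A rough matplotlib GridSpec sketch of the wireframe above (panel contents omitted; using matplotlib here is an assumption, not a requirement):

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12, 7))
# Row heights: a short KPI strip on top, two equal rows of charts below
gs = fig.add_gridspec(3, 2, height_ratios=[1, 3, 3])

kpi_strip = fig.add_subplot(gs[0, :])   # EXECUTIVE SUMMARY: KPI tiles
trends = fig.add_subplot(gs[1, 0])      # TRENDS: line chart, primary metric
breakdown = fig.add_subplot(gs[1, 1])   # BREAKDOWN: bar chart by segment
comparison = fig.add_subplot(gs[2, 0])  # COMPARISON: vs target / last year
detail = fig.add_subplot(gs[2, 1])      # DETAIL TABLE: top N with metrics

for ax, title in [(kpi_strip, 'Executive summary'), (trends, 'Trends'),
                  (breakdown, 'Breakdown'), (comparison, 'Comparison'),
                  (detail, 'Detail')]:
    ax.set_title(title, loc='left')
    ax.set_xticks([])
    ax.set_yticks([])

fig.tight_layout()
fig.savefig('dashboard_wireframe.png', dpi=150)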

Statistical Analysis

Descriptive Statistics

import pandas as pd
import numpy as np

def describe_data(df, column):
    """Return a dict of summary statistics for one numeric column."""
    stats = {
        'count': df[column].count(),
        'mean': df[column].mean(),
        'median': df[column].median(),
        'std': df[column].std(),
        'min': df[column].min(),
        'max': df[column].max(),
        'q25': df[column].quantile(0.25),
        'q75': df[column].quantile(0.75),
        'skewness': df[column].skew(),
        'kurtosis': df[column].kurtosis()
    }
    return stats

Hypothesis Testing

import numpy as np
from scipy import stats

# T-test: Compare two groups
def compare_groups(group_a, group_b, alpha=0.05):
    stat, p_value = stats.ttest_ind(group_a, group_b)

    result = {
        't_statistic': stat,
        'p_value': p_value,
        'significant': p_value < alpha,
        # Cohen's d using a pooled standard deviation (assumes similar group sizes)
        'effect_size': (group_a.mean() - group_b.mean()) / np.sqrt(
            (group_a.std()**2 + group_b.std()**2) / 2
        )
    }
    return result

# Chi-square: Test independence
def test_independence(observed, alpha=0.05):
    stat, p_value, dof, expected = stats.chi2_contingency(observed)

    return {
        'chi2': stat,
        'p_value': p_value,
        'degrees_of_freedom': dof,
        'significant': p_value < alpha
    }

Regression Analysis

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

def simple_regression(X, y):
    # X is a 1-D numpy array of the predictor; reshape to the (n, 1) matrix sklearn expects
    model = LinearRegression()
    model.fit(X.reshape(-1, 1), y)

    predictions = model.predict(X.reshape(-1, 1))

    return {
        'coefficient': model.coef_[0],
        'intercept': model.intercept_,
        'r_squared': r2_score(y, predictions),
        'mae': mean_absolute_error(y, predictions)
    }

Business Analysis

Analysis Framework

# Analysis: [Topic]

## Business Question
[What are we trying to answer?]

## Hypothesis
[What do we expect to find?]

## Data Sources
- [Source 1]: [Description]
- [Source 2]: [Description]

## Methodology
1. [Step 1]
2. [Step 2]
3. [Step 3]

## Findings

### Finding 1: [Title]
[Description with supporting data]

### Finding 2: [Title]
[Description with supporting data]

## Recommendations
1. [Recommendation]: [Expected impact]
2. [Recommendation]: [Expected impact]

## Limitations
- [Limitation 1]
- [Limitation 2]

## Next Steps
- [Action item]

Key Business Metrics

Acquisition:

  • Customer Acquisition Cost (CAC)
  • Cost per Lead (CPL)
  • Conversion Rate

Engagement:

  • Daily/Monthly Active Users
  • Session Duration
  • Feature Adoption

Retention:

  • Churn Rate
  • Retention Rate
  • Net Revenue Retention

Revenue:

  • Monthly Recurring Revenue (MRR)
  • Average Revenue Per User (ARPU)
  • Lifetime Value (LTV)
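
Several of these metrics fall out of transaction data directly. A minimal pandas sketch, assuming a DataFrame subs of monthly subscription rows with customer_id, month, and mrr columns (the DataFrame and column names are assumptions for illustration):

import pandas as pd

def monthly_metrics(subs):
    """Compute MRR, ARPU, and logo churn per month from subscription rows."""
    by_month = subs.groupby('month').agg(
        mrr=('mrr', 'sum'),
        customers=('customer_id', 'nunique')
    )
    by_month['arpu'] = by_month['mrr'] / by_month['customers']

    # Logo churn: share of last month's customers with no row this month
    churn = []
    months = list(by_month.index)
    for prev, curr in zip(months, months[1:]):
        prev_ids = set(subs.loc[subs['month'] == prev, 'customer_id'])
        curr_ids = set(subs.loc[subs['month'] == curr, 'customer_id'])
        churn.append(len(prev_ids - curr_ids) / len(prev_ids) if prev_ids else None)
    by_month['churn_rate'] = [None] + churn

    return by_month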

Data Storytelling

Presentation Structure

1. CONTEXT
   - Why does this matter?
   - What question are we answering?

2. KEY FINDING
   - Lead with the insight
   - Make it memorable

3. EVIDENCE
   - Show the data
   - Use effective visuals

4. IMPLICATIONS
   - What does this mean?
   - So what?

5. RECOMMENDATIONS
   - What should we do?
   - Clear next steps

Insight Template

## [Headline: Action-oriented finding]

**What:** [One sentence description of the finding]

**So What:** [Why this matters to the business]

**Now What:** [Recommended action]

**Evidence:**
[Chart or data supporting the finding]

**Confidence:** [High/Medium/Low]

Reference Materials

  • references/sql_patterns.md – Advanced SQL queries
  • references/visualization.md – Chart selection guide
  • references/statistics.md – Statistical methods
  • references/storytelling.md – Presentation best practices

Scripts

# Data profiler
python scripts/data_profiler.py --table orders --output profile.html

# SQL query analyzer
python scripts/query_analyzer.py --query query.sql --explain

# Dashboard generator
python scripts/dashboard_gen.py --config dashboard.yaml

# Report automation
python scripts/report_gen.py --template monthly --output report.pdf