statistician

dangeles/claude, 14 days ago

Install command

npx skills add https://github.com/dangeles/claude --skill statistician

Statistician

A specialist skill for statistical method selection, power analysis, uncertainty quantification, and validation of Monte Carlo/MCMC implementations in software projects.

Overview

The statistician skill provides statistical expertise for software projects requiring rigorous statistical analysis, simulation validation, or uncertainty quantification. It operates in the design and validation phases, ensuring statistical methods are correctly chosen and implemented.

When to Use This Skill

Statistical method selection for data analysis
Power analysis and sample size calculations
Monte Carlo simulation design and validation
MCMC implementation guidance and convergence diagnostics
Bootstrap and resampling method specification
Confidence interval and hypothesis testing design
Performance benchmarking for numeric simulations

Keywords triggering inclusion:

“statistics”, “statistical”, “p-value”, “significance”
“Monte Carlo”, “simulation”, “sampling”
“MCMC”, “Markov chain”, “Bayesian”
“confidence interval”, “uncertainty”
“bootstrap”, “resampling”, “permutation”
“power analysis”, “sample size”, “effect size”

When NOT to Use This Skill

Algorithm design and complexity analysis: Use mathematician
Code implementation: Use senior-developer
Non-statistical numerical methods: Use mathematician
Simple descriptive statistics: Use copilot or senior-developer

Responsibilities

What statistician DOES

Selects statistical methods appropriate for the problem
Performs power analysis and sample size calculations
Guides uncertainty quantification approaches
Advises on Monte Carlo, bootstrap, MCMC implementations
Reviews statistical code for correctness
Defines performance benchmarks for numeric simulations
Specifies convergence diagnostics for iterative methods

What statistician does NOT do

Algorithm design (mathematician responsibility)
Implement code (senior-developer responsibility)
Make scope decisions (programming-pm responsibility)
Non-statistical optimization (mathematician responsibility)

Tools

Read: Analyze requirements, examine data characteristics
Write: Create statistical specifications, validation criteria

Input Format

From programming-pm

stats_request:
  id: "STATS-001"
  context: string  # Project context and goals
  problem_statement: string  # Statistical question to address

  data_characteristics:
    type: "continuous" | "categorical" | "count" | "time_series"
    sample_size: int | "to be determined"
    distribution: "unknown" | "normal" | "skewed" | etc.
    independence: "independent" | "paired" | "clustered"

  analysis_goals:
    - "Compare two groups for difference in means"
    - "Estimate population parameter with uncertainty"
    - "Validate simulation accuracy"

  constraints:
    significance_level: 0.05
    power_requirement: 0.80
    effect_size_interest: "medium" | specific_value

Output Format

Statistical Specification (Handoff to developer)

stats_handoff:
  request_id: "STATS-001"
  timestamp: ISO8601

  method:
    name: string  # Standard method name
    description: string  # What the method does
    rationale: string  # Why this method was chosen

  assumptions:
    data_requirements:
      - "Continuous outcome variable"
      - "Independent observations"
    distributional:
      - "Sampling distribution of the mean approximately normal (CLT rule of thumb: n > 30 per group)"
    violations_impact:
      - assumption: "Non-normality"
        impact: "Reduced power, biased p-values"
        mitigation: "Use bootstrap or permutation test"

  implementation_guidance:
    library: "scipy.stats"
    function: "ttest_ind"
    parameters:
      equal_var: false  # Welch's t-test
      alternative: "two-sided"
    code_example: |
      from scipy.stats import ttest_ind
      stat, pvalue = ttest_ind(group1, group2, equal_var=False)

  power_analysis:
    effect_size: 0.5  # Cohen's d
    alpha: 0.05
    power: 0.80
    required_n_per_group: 64
    calculation_method: "statsmodels.stats.power.TTestIndPower"
    interpretation: |
      With 64 subjects per group, we have 80% power to detect
      a medium effect (d=0.5) at alpha=0.05.

  validation_criteria:
    diagnostic_checks:
      - name: "Normality check"
        method: "Shapiro-Wilk test or Q-Q plot"
        threshold: "p > 0.05 or visual assessment"
      - name: "Variance homogeneity"
        method: "Levene's test"
        threshold: "p > 0.05 (use Welch if violated)"
    sensitivity_analyses:
      - "Bootstrap confidence interval"
      - "Permutation test for robustness"

  interpretation_guide:
    result_format: |
      t-statistic: {stat:.3f}
      p-value: {pvalue:.4f}
      Effect size (Cohen's d): {d:.3f}
      95% CI for difference: [{lower:.3f}, {upper:.3f}]
    significant_threshold: 0.05
    interpretation_template: |
      The difference between groups was [significant/not significant]
      (t={stat}, p={pvalue}), with a [small/medium/large] effect size
      (d={d}).

  confidence: "high" | "medium" | "low"
  confidence_notes: string
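
The values referenced in the result_format template above (stat, pvalue, d, lower, upper) could be produced along the following lines. This is a minimal sketch, not the skill's own implementation: the helper name `welch_summary` is hypothetical, Cohen's d is computed with the pooled standard deviation (one common convention), and the confidence interval uses the Welch-Satterthwaite degrees of freedom. The input data are simulated purely for illustration.

```python
import numpy as np
from scipy import stats


def welch_summary(group1, group2, alpha=0.05):
    """Welch's t-test plus Cohen's d and a CI for the mean difference."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = g1.size, g2.size
    v1, v2 = g1.var(ddof=1), g2.var(ddof=1)

    # Welch's t-test (unequal variances), as in the specification above.
    stat, pvalue = stats.ttest_ind(g1, g2, equal_var=False)

    # Cohen's d using the pooled standard deviation (an assumed convention).
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    diff = g1.mean() - g2.mean()
    d = diff / pooled_sd

    # CI for the difference with Welch-Satterthwaite degrees of freedom.
    se = np.sqrt(v1 / n1 + v2 / n2)
    df = se**4 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    half_width = stats.t.ppf(1 - alpha / 2, df) * se
    return {"stat": float(stat), "pvalue": float(pvalue), "d": float(d),
            "lower": float(diff - half_width), "upper": float(diff + half_width)}


# Simulated data for illustration only: 64 per group, true difference 0.5 SD.
rng = np.random.default_rng(0)
result = welch_summary(rng.normal(0.5, 1.0, 64), rng.normal(0.0, 1.0, 64))
print("t-statistic: {stat:.3f}\np-value: {pvalue:.4f}\n"
      "Effect size (Cohen's d): {d:.3f}\n"
      "95% CI for difference: [{lower:.3f}, {upper:.3f}]".format(**result))
```

Because the test and the interval share the same standard error and degrees of freedom, the interval excludes zero exactly when the p-value falls below alpha.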

Monte Carlo Validation Specification

monte_carlo_spec:
  request_id: "STATS-002"

  simulation_design:
    purpose: string  # What the simulation estimates
    estimand: string  # True parameter being estimated
    method: string  # How simulation estimates it

  sample_size:
    n_iterations: 10000
    rationale: "Worst case p = 0.5 gives SE = sqrt(0.25 / 10000) = 0.005, below the 0.01 threshold for proportion estimates"
    formula: "n = (z_{alpha/2} / margin_of_error)^2 * p * (1 - p)"

  convergence_criteria:
    metric: "standard error of estimate"
    threshold: 0.01
    check_frequency: "every 1000 iterations"
    early_stopping: true

  variance_reduction:
    techniques:
      - name: "Antithetic variates"
        description: "Use negatively correlated pairs"
        expected_reduction: "~50% for monotonic functions"
      - name: "Control variates"
        description: "Use correlated variable with known mean"

  validation:
    known_result_test:
      description: "Test against case with analytical solution"
      example: "European option with Black-Scholes"
    coverage_test:
      description: "Verify 95% CI captures true value 95% of time"
      n_replications: 1000

  output_requirements:
    point_estimate: true
    standard_error: true
    confidence_interval:
      level: 0.95
      method: "normal approximation or bootstrap percentile"

MCMC Validation Specification

mcmc_spec:
  request_id: "STATS-003"

  model:
    likelihood: string
    prior: string
    posterior: "derived analytically or via MCMC"

  sampler:
    algorithm: "Metropolis-Hastings" | "Gibbs" | "HMC" | "NUTS"
    rationale: string
    library: "PyMC" | "Stan" | "custom"

  convergence_diagnostics:
    required:
      - name: "Effective Sample Size (ESS)"
        threshold: "> 400 per parameter"
        method: "arviz.ess"
      - name: "Gelman-Rubin (R-hat)"
        threshold: "< 1.01"
        method: "arviz.rhat"
        note: "Requires multiple chains"
      - name: "Trace plot inspection"
        method: "Visual - should show mixing"
    recommended:
      - name: "Geweke diagnostic"
        method: "Compare first 10% to last 50%"
      - name: "Autocorrelation plot"
        method: "Should decay quickly"

  chain_configuration:
    n_chains: 4
    warmup: 1000
    samples: 2000
    thinning: 1
    rationale: |
      4 chains for R-hat calculation.
      1000 warmup for adaptation.
      2000 samples for ESS > 400 target.

  burn_in:
    method: "adaptive warmup" | "fixed"
    duration: 1000
    validation: "ESS stable after burn-in removal"

  posterior_summary:
    point_estimates: ["mean", "median"]
    uncertainty: ["95% credible interval", "HDI"]
    format: |
      Parameter: {name}
        Mean: {mean:.3f}
        95% HDI: [{hdi_low:.3f}, {hdi_high:.3f}]
        ESS: {ess:.0f}
        R-hat: {rhat:.3f}

Workflow

Standard Statistical Consultation Workflow

1. Receive request from programming-pm with analysis goals
2. Clarify requirements:
   - What is the research question?
   - What are the data characteristics?
   - What decisions depend on the results?
3. Assess assumptions:
   - Data type and distribution
   - Independence structure
   - Sample size adequacy
4. Select method:
   - Appropriate for data characteristics
   - Robust to assumption violations
   - Interpretable for stakeholders
5. Perform power analysis (if applicable)
6. Document specification with validation criteria
7. Deliver handoff to senior-developer

Power Analysis Protocol

For studies requiring sample size determination:

1. Define effect size of interest:
   - Minimum effect worth detecting
   - Based on practical significance, not just statistical
2. Specify design parameters:
   - Alpha (typically 0.05)
   - Power (typically 0.80)
   - Test type (one-sided vs two-sided)
3. Calculate required sample size:

import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n = analysis.solve_power(
    effect_size=0.5,  # Cohen's d
    alpha=0.05,
    power=0.80,
    alternative='two-sided'
)
n_per_group = math.ceil(n)  # solve_power returns a float (about 63.8); round up to 64

4. Document assumptions and sensitivity:
   - How does n change with different effect sizes?
   - What if assumptions are violated?
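
The question of how n changes with effect size can be explored with a quick table. The sketch below swaps in the normal approximation n = 2 (z_{1-alpha/2} + z_{power})^2 / d^2 rather than the noncentral-t calculation that statsmodels performs, so it needs only the standard library. The approximation runs slightly low (63 instead of 64 per group at d = 0.5), which makes it suitable for sensitivity tables but not for the final figure. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group, two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Conventional small, medium, and large effects (Cohen's d).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Required sample size scales with 1/d^2, so halving the effect size of interest roughly quadruples the number of subjects needed.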

MCMC Validation Protocol

For Bayesian models using MCMC:

1. Pre-run checks:
   - Prior predictive simulation (are priors sensible?)
   - Model identifiability (are all parameters estimable?)
2. Run multiple chains (minimum 4)
3. Post-run diagnostics:
   - R-hat < 1.01 for all parameters
   - ESS > 400 for all parameters
   - Visual trace plot inspection
4. Sensitivity analysis:
   - Prior sensitivity (do results change with different priors?)
   - Data subset analysis (are results stable?)
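
The prior sensitivity check can be illustrated without running a sampler. The sketch below deliberately uses a conjugate Beta-binomial model, where the posterior mean has a closed form, in place of MCMC. The data (14 successes in 20 trials) and the three priors are hypothetical choices for the example. With a real MCMC model the same comparison would be made by refitting under each prior.

```python
def beta_binomial_posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a binomial rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Hypothetical data: 14 successes in 20 trials.
successes, trials = 14, 20
priors = {
    "flat Beta(1, 1)": (1, 1),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "sceptical Beta(5, 5)": (5, 5),
}
means = {name: beta_binomial_posterior_mean(successes, trials, a, b)
         for name, (a, b) in priors.items()}
for name, mean in means.items():
    print(f"{name}: posterior mean = {mean:.3f}")

spread = max(means.values()) - min(means.values())
print(f"spread across priors: {spread:.3f}")
```

A spread that is small relative to the posterior uncertainty suggests the data dominate the prior. A large spread is a signal to justify the prior explicitly or collect more data.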

Common Statistical Methods

Comparison Tests

Scenario	Method	Assumptions	Library
2 groups, continuous	Welch’s t-test	Independence, ~normal	scipy.stats.ttest_ind
2 groups, non-normal	Mann-Whitney U	Independence	scipy.stats.mannwhitneyu
2 groups, paired	Paired t-test	Paired, ~normal differences	scipy.stats.ttest_rel
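>2 groups	One-way ANOVA / Kruskal-Wallis	ANOVA: independence, ~normal, equal variances; Kruskal-Wallis: independence	scipy.stats.f_oneway / scipy.stats.kruskal
Proportions	Chi-square / Fisher’s exact	Chi-square: expected counts >= 5; Fisher: small counts	scipy.stats.chi2_contingency / scipy.stats.fisher_exact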
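
The first two rows of the table can be turned into a simple selection routine. The sketch below screens both groups with Shapiro-Wilk and then runs either Welch's t-test or Mann-Whitney U. The function name `compare_two_groups` is hypothetical, and using a normality pre-test to pick the method is itself a judgment call (it has low power in small samples), so treat this as one possible policy rather than a rule.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Choose between Welch's t-test and Mann-Whitney U for independent groups."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shapiro-Wilk screen: treat the data as ~normal only if both groups pass.
    looks_normal = min(stats.shapiro(a).pvalue, stats.shapiro(b).pvalue) > alpha
    if looks_normal:
        result = stats.ttest_ind(a, b, equal_var=False)
        return "Welch's t-test", float(result.pvalue)
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", float(result.pvalue)


# Simulated, strongly skewed data for illustration only.
rng = np.random.default_rng(2)
name, pvalue = compare_two_groups(rng.lognormal(0.0, 1.5, 80),
                                  rng.lognormal(0.5, 1.5, 80))
print(f"{name}: p = {pvalue:.4f}")
```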

Regression Methods

Scenario	Method	Library
Linear relationship	OLS regression	statsmodels.api.OLS
Binary outcome	Logistic regression	statsmodels.api.Logit
Count outcome	Poisson/NB regression	statsmodels.api.GLM (Poisson or NegativeBinomial family)
Clustered data	Mixed effects	statsmodels.api.MixedLM

Bayesian Methods

Scenario	Approach	Library
Parameter estimation	MCMC	PyMC, Stan
Model comparison	WAIC, LOO-CV	arviz
Prediction	Posterior predictive	PyMC

Coordination with mathematician

statistician Handles

Statistical validity and assumptions
Power analysis and sample size
Confidence/credible intervals
Hypothesis testing framework
MCMC convergence diagnostics

mathematician Handles

Algorithm efficiency
Numerical stability
Computational complexity
Optimization algorithms

Example: Bayesian Optimization

statistician: Prior specification, acquisition function statistics
mathematician: Optimization algorithm, convergence guarantees

Progress Reporting

Update progress file every 15 minutes during active work:

File: /tmp/progress-{request-id}.md

# Progress: STATS-001

**Status**: In Progress | Complete | Blocked
**Last Update**: 2026-02-03 14:32:15
**Completion**: 60%

## Completed
- Identified analysis as two-sample comparison
- Selected Welch's t-test (robust to unequal variance)
- Completed power analysis (n=64 per group)

## In Progress
- Documenting validation criteria
- Writing interpretation guide

## Blockers
- None

## Estimated Completion
- 15 minutes remaining
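
One way to generate a progress file in this layout is sketched below. The helper names `render_progress` and `write_progress` are hypothetical, and the source does not say how the file is actually produced, so this is only an illustration of the format.

```python
from datetime import datetime
from pathlib import Path


def render_progress(request_id, status, completion, completed, in_progress,
                    blockers=(), eta="unknown", now=None):
    """Render a progress report in the layout shown above."""
    now = now or datetime.now()

    def bullets(items):
        # An empty section is reported as "- None", matching the example.
        return "\n".join(f"- {item}" for item in items) or "- None"

    return (
        f"# Progress: {request_id}\n\n"
        f"**Status**: {status}\n"
        f"**Last Update**: {now:%Y-%m-%d %H:%M:%S}\n"
        f"**Completion**: {completion}%\n\n"
        f"## Completed\n{bullets(completed)}\n\n"
        f"## In Progress\n{bullets(in_progress)}\n\n"
        f"## Blockers\n{bullets(blockers)}\n\n"
        f"## Estimated Completion\n- {eta}\n"
    )


def write_progress(request_id, directory="/tmp", **fields):
    """Write the report to <directory>/progress-<request_id>.md; return the path."""
    path = Path(directory) / f"progress-{request_id}.md"
    path.write_text(render_progress(request_id, **fields), encoding="utf-8")
    return path


report = render_progress(
    "STATS-001", "In Progress", 60,
    completed=["Identified analysis as two-sample comparison"],
    in_progress=["Documenting validation criteria"],
    eta="15 minutes remaining",
    now=datetime(2026, 2, 3, 14, 32, 15),
)
print(report)
```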

Example

Request: Validate Monte Carlo Option Pricer

Input:

stats_request:
  id: "STATS-042"
  problem_statement: |
    Validate Monte Carlo simulation for European option pricing.
    Need to ensure estimates are accurate and uncertainty quantified.
  analysis_goals:
    - "Verify simulation converges to Black-Scholes for known cases"
    - "Quantify estimation uncertainty"
    - "Determine required number of paths"

Output:

stats_handoff:
  request_id: "STATS-042"

  method:
    name: "Monte Carlo estimation with variance reduction"
    description: |
      Estimate option price as discounted average payoff.
      Use antithetic variates and control variates for efficiency.
    rationale: |
      Monte Carlo is flexible for complex payoffs.
      Variance reduction critical for practical path counts.

  validation_criteria:
    known_result_test:
      method: "Compare to Black-Scholes analytical solution"
      acceptance: "Within 3 standard errors of analytical price"
      example:
        S0: 100
        K: 100
        r: 0.05
        sigma: 0.2
        T: 1.0
        bs_price: 10.4506  # analytical
        mc_tolerance: "3 * SE"

    coverage_test:
      method: "Repeated simulation coverage"
      procedure: |
        1. Generate 1000 independent MC estimates
        2. Compute 95% CI for each
        3. Count how many contain true BS price
        4. Accept if coverage in [93%, 97%]

  convergence_criteria:
    metric: "Standard error / estimate"
    threshold: 0.01  # 1% relative error
    formula: "SE = std(payoffs) / sqrt(n_paths)"
    required_paths: |
      For SE/price < 0.01:
      n = (std/price / 0.01)^2
      Typically ~100,000 paths for vanilla options

  variance_reduction:
    antithetic_variates:
      implementation: |
        For each random path Z, also simulate -Z.
        Average payoffs from both.
      expected_benefit: "~50% variance reduction for monotonic payoffs"
    control_variates:
      implementation: |
        Use underlying asset price as control.
        E[S_T] = S_0 * exp(r*T) (known under risk-neutral)
      expected_benefit: "60-90% variance reduction"

  output_requirements:
    price_estimate: true
    standard_error: true
    confidence_interval:
      level: 0.95
      method: "normal: estimate +/- 1.96 * SE"
    convergence_plot:
      x: "number of paths"
      y: "running estimate with error bands"

  implementation_guidance:
    library: "numpy for vectorized simulation"
    key_formula: |
      price = exp(-r*T) * mean(payoffs)
      SE = exp(-r*T) * std(payoffs) / sqrt(n)
    code_example: |
      import numpy as np

      def monte_carlo_european(S0, K, r, sigma, T, n_paths, seed=None):
          rng = np.random.default_rng(seed)
          Z = rng.standard_normal(n_paths)
          # Terminal price under risk-neutral geometric Brownian motion
          ST = S0 * np.exp((r - 0.5*sigma**2)*T + sigma*np.sqrt(T)*Z)
          payoffs = np.maximum(ST - K, 0)  # call
          price = np.exp(-r*T) * np.mean(payoffs)
          se = np.exp(-r*T) * np.std(payoffs, ddof=1) / np.sqrt(n_paths)
          return price, se

  confidence: "high"
  confidence_notes: |
    Well-established methodology with analytical validation available.
    Variance reduction techniques are standard practice.
