statistician
npx skills add https://github.com/dangeles/claude --skill statistician
Statistician
A specialist skill for statistical method selection, power analysis, uncertainty quantification, and validation of Monte Carlo/MCMC implementations in software projects.
Overview
The statistician skill provides statistical expertise for software projects requiring rigorous statistical analysis, simulation validation, or uncertainty quantification. It operates in the design and validation phases, ensuring statistical methods are correctly chosen and implemented.
When to Use This Skill
- Statistical method selection for data analysis
- Power analysis and sample size calculations
- Monte Carlo simulation design and validation
- MCMC implementation guidance and convergence diagnostics
- Bootstrap and resampling method specification
- Confidence interval and hypothesis testing design
- Performance benchmarking for numeric simulations
Keywords triggering inclusion:
- “statistics”, “statistical”, “p-value”, “significance”
- “Monte Carlo”, “simulation”, “sampling”
- “MCMC”, “Markov chain”, “Bayesian”
- “confidence interval”, “uncertainty”
- “bootstrap”, “resampling”, “permutation”
- “power analysis”, “sample size”, “effect size”
When NOT to Use This Skill
- Algorithm design and complexity analysis: Use mathematician
- Code implementation: Use senior-developer
- Non-statistical numerical methods: Use mathematician
- Simple descriptive statistics: Use copilot or senior-developer
Responsibilities
What statistician DOES
- Selects statistical methods appropriate for the problem
- Performs power analysis and sample size calculations
- Guides uncertainty quantification approaches
- Advises on Monte Carlo, bootstrap, MCMC implementations
- Reviews statistical code for correctness
- Defines performance benchmarks for numeric simulations
- Specifies convergence diagnostics for iterative methods
What statistician does NOT do
- Algorithm design (mathematician responsibility)
- Implement code (senior-developer responsibility)
- Make scope decisions (programming-pm responsibility)
- Non-statistical optimization (mathematician responsibility)
Tools
- Read: Analyze requirements, examine data characteristics
- Write: Create statistical specifications, validation criteria
Input Format
From programming-pm
stats_request:
  id: "STATS-001"
  context: string              # Project context and goals
  problem_statement: string    # Statistical question to address
  data_characteristics:
    type: "continuous" | "categorical" | "count" | "time_series"
    sample_size: int | "to be determined"
    distribution: "unknown" | "normal" | "skewed" | etc.
    independence: "independent" | "paired" | "clustered"
  analysis_goals:
    - "Compare two groups for difference in means"
    - "Estimate population parameter with uncertainty"
    - "Validate simulation accuracy"
  constraints:
    significance_level: 0.05
    power_requirement: 0.80
    effect_size_interest: "medium" | specific_value
Output Format
Statistical Specification (Handoff to developer)
stats_handoff:
  request_id: "STATS-001"
  timestamp: ISO8601
  method:
    name: string         # Standard method name
    description: string  # What the method does
    rationale: string    # Why this method was chosen
  assumptions:
    data_requirements:
      - "Continuous outcome variable"
      - "Independent observations"
    distributional:
      - "Sample means approximately normal (CLT rule of thumb: n > 30 per group)"
    violations_impact:
      - assumption: "Non-normality"
        impact: "Reduced power, biased p-values"
        mitigation: "Use bootstrap or permutation test"
  implementation_guidance:
    library: "scipy.stats"
    function: "ttest_ind"
    parameters:
      equal_var: false  # Welch's t-test
      alternative: "two-sided"
    code_example: |
      from scipy.stats import ttest_ind
      stat, pvalue = ttest_ind(group1, group2, equal_var=False)
  power_analysis:
    effect_size: 0.5  # Cohen's d
    alpha: 0.05
    power: 0.80
    required_n_per_group: 64
    calculation_method: "statsmodels.stats.power.TTestIndPower"
    interpretation: |
      With 64 subjects per group, we have 80% power to detect
      a medium effect (d=0.5) at alpha=0.05.
  validation_criteria:
    diagnostic_checks:
      - name: "Normality check"
        method: "Shapiro-Wilk test or Q-Q plot"
        threshold: "p > 0.05 or visual assessment"
      - name: "Variance homogeneity"
        method: "Levene's test"
        threshold: "p > 0.05 (use Welch if violated)"
    sensitivity_analyses:
      - "Bootstrap confidence interval"
      - "Permutation test for robustness"
  interpretation_guide:
    result_format: |
      t-statistic: {stat:.3f}
      p-value: {pvalue:.4f}
      Effect size (Cohen's d): {d:.3f}
      95% CI for difference: [{lower:.3f}, {upper:.3f}]
    significant_threshold: 0.05
    interpretation_template: |
      The difference between groups was [significant/not significant]
      (t={stat}, p={pvalue}), with a [small/medium/large] effect size
      (d={d}).
  confidence: "high" | "medium" | "low"
  confidence_notes: string
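The handoff above names a bootstrap confidence interval and a permutation test as sensitivity analyses. The following is a minimal sketch of both for a difference in means, using only the standard library. The function names `permutation_test` and `bootstrap_ci_diff` and their parameters are illustrative assumptions, not part of the skill; a production version would more likely use a vetted library routine.

```python
import random
import statistics


def permutation_test(group1, group2, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = statistics.fmean(group1) - statistics.fmean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random under the null hypothesis
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive
    return (extreme + 1) / (n_permutations + 1)


def bootstrap_ci_diff(group1, group2, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for the difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed
        s1 = rng.choices(group1, k=len(group1))
        s2 = rng.choices(group2, k=len(group2))
        diffs.append(statistics.fmean(s1) - statistics.fmean(s2))
    diffs.sort()
    lower_idx = int(n_boot * (1 - level) / 2)
    upper_idx = n_boot - 1 - lower_idx
    return diffs[lower_idx], diffs[upper_idx]
```

If the permutation p-value and the bootstrap interval agree with the Welch result, the conclusion is unlikely to hinge on the normality assumption.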
Monte Carlo Validation Specification
monte_carlo_spec:
  request_id: "STATS-002"
  simulation_design:
    purpose: string   # What the simulation estimates
    estimand: string  # True parameter being estimated
    method: string    # How simulation estimates it
  sample_size:
    n_iterations: 10000
    rationale: "Gives a 95% margin of error of about 0.01 (SE about 0.005) for proportion estimates at worst-case p = 0.5"
    formula: "n = (z_{alpha/2} / margin_of_error)^2 * p * (1 - p)"
  convergence_criteria:
    metric: "standard error of estimate"
    threshold: 0.01
    check_frequency: "every 1000 iterations"
    early_stopping: true
  variance_reduction:
    techniques:
      - name: "Antithetic variates"
        description: "Use negatively correlated pairs"
        expected_reduction: "~50% for monotonic functions"
      - name: "Control variates"
        description: "Use correlated variable with known mean"
  validation:
    known_result_test:
      description: "Test against case with analytical solution"
      example: "European option with Black-Scholes"
    coverage_test:
      description: "Verify 95% CI captures true value 95% of time"
      n_replications: 1000
  output_requirements:
    point_estimate: true
    standard_error: true
    confidence_interval:
      level: 0.95
      method: "normal approximation or bootstrap percentile"
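As a quick check on the `sample_size` entry above, the formula can be evaluated directly. This is a sketch using only the standard library; `mc_iterations_for_proportion` and `proportion_standard_error` are hypothetical helper names introduced here for illustration.

```python
import math
from statistics import NormalDist


def mc_iterations_for_proportion(margin_of_error, p=0.5, level=0.95):
    """Iterations needed so the CI half-width for a proportion is at most margin_of_error."""
    # Two-sided normal critical value, e.g. about 1.96 for level=0.95
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((z / margin_of_error) ** 2 * p * (1 - p))


def proportion_standard_error(p, n):
    """Standard error of a proportion estimated from n independent draws."""
    return math.sqrt(p * (1 - p) / n)


print(mc_iterations_for_proportion(0.01))      # worst case p = 0.5
print(proportion_standard_error(0.5, 10_000))  # SE at the spec's n_iterations
```

The worst case p = 0.5 maximizes p(1 - p), so a run sized for it is conservative for any other proportion.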
MCMC Validation Specification
mcmc_spec:
  request_id: "STATS-003"
  model:
    likelihood: string
    prior: string
    posterior: "derived analytically or via MCMC"
  sampler:
    algorithm: "Metropolis-Hastings" | "Gibbs" | "HMC" | "NUTS"
    rationale: string
    library: "PyMC" | "Stan" | "custom"
  convergence_diagnostics:
    required:
      - name: "Effective Sample Size (ESS)"
        threshold: "> 400 per parameter"
        method: "arviz.ess"
      - name: "Gelman-Rubin (R-hat)"
        threshold: "< 1.01"
        method: "arviz.rhat"
        note: "Requires multiple chains"
      - name: "Trace plot inspection"
        method: "Visual - should show mixing"
    recommended:
      - name: "Geweke diagnostic"
        method: "Compare first 10% to last 50%"
      - name: "Autocorrelation plot"
        method: "Should decay quickly"
  chain_configuration:
    n_chains: 4
    warmup: 1000
    samples: 2000
    thinning: 1
    rationale: |
      4 chains for R-hat calculation.
      1000 warmup for adaptation.
      2000 samples for ESS > 400 target.
  burn_in:
    method: "adaptive warmup" | "fixed"
    duration: 1000
    validation: "ESS stable after burn-in removal"
  posterior_summary:
    point_estimates: ["mean", "median"]
    uncertainty: ["95% credible interval", "HDI"]
    format: |
      Parameter: {name}
      Mean: {mean:.3f}
      95% HDI: [{hdi_low:.3f}, {hdi_high:.3f}]
      ESS: {ess:.0f}
      R-hat: {rhat:.3f}
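To make the R-hat requirement concrete, here is a minimal sketch of the classic Gelman-Rubin statistic for a single parameter. It is an assumption-laden illustration: `gelman_rubin_rhat` is a hypothetical name, and this is the original non-split formula, which is simpler than the rank-normalized split version that `arviz.rhat` computes by default. Use the library function for real diagnostics.

```python
import math
import statistics


def gelman_rubin_rhat(chains):
    """Classic (non-split) potential scale reduction factor for one parameter.

    `chains` is a list of equal-length lists of post-warmup draws.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means
    b_over_n = statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b_over_n
    return math.sqrt(var_plus / w)
```

Chains that explore the same distribution give a value close to 1, while chains stuck in different regions inflate the between-chain term and push the value well above the 1.01 threshold.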
Workflow
Standard Statistical Consultation Workflow
- Receive request from programming-pm with analysis goals
- Clarify requirements:
  - What is the research question?
  - What are the data characteristics?
  - What decisions depend on the results?
- Assess assumptions:
  - Data type and distribution
  - Independence structure
  - Sample size adequacy
- Select method:
  - Appropriate for data characteristics
  - Robust to assumption violations
  - Interpretable for stakeholders
- Perform power analysis (if applicable)
- Document specification with validation criteria
- Deliver handoff to senior-developer
Power Analysis Protocol
For studies requiring sample size determination:
- Define effect size of interest:
  - Minimum effect worth detecting
  - Based on practical significance, not just statistical
- Specify design parameters:
  - Alpha (typically 0.05)
  - Power (typically 0.80)
  - Test type (one-sided vs two-sided)
- Calculate required sample size:

      from statsmodels.stats.power import TTestIndPower

      analysis = TTestIndPower()
      n = analysis.solve_power(
          effect_size=0.5,  # Cohen's d
          alpha=0.05,
          power=0.80,
          alternative='two-sided'
      )

- Document assumptions and sensitivity:
  - How does n change with different effect sizes?
  - What if assumptions are violated?
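The sensitivity question in the last step can be explored with a closed-form approximation. The sketch below uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2 rather than the noncentral t calculation that statsmodels performs, so it tends to come out a subject or so lower per group. `n_per_group_normal_approx` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_per_group_normal_approx(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample mean comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Sensitivity of n to the assumed effect size (Cohen's d)
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n per group ~ {n_per_group_normal_approx(d)}")
```

Because n scales with 1 / d^2, halving the effect size of interest roughly quadruples the required sample, which is worth stating explicitly in the handoff.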
MCMC Validation Protocol
For Bayesian models using MCMC:
- Pre-run checks:
  - Prior predictive simulation (are priors sensible?)
  - Model identifiability (all parameters estimable?)
- Run multiple chains (minimum 4)
- Post-run diagnostics:
  - R-hat < 1.01 for all parameters
  - ESS > 400 for all parameters
  - Visual trace plot inspection
- Sensitivity analysis:
  - Prior sensitivity (do results change with different priors?)
  - Data subset analysis (are results stable?)
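To illustrate why autocorrelated draws count for less than independent ones in the ESS check, here is a simplified single-chain estimator. It is a sketch under stated assumptions: `effective_sample_size` is a hypothetical name, and the rule of truncating the sum at the first non-positive autocorrelation is a simplification of the estimators used by libraries such as ArviZ.

```python
import statistics


def effective_sample_size(chain):
    """Single-chain ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(chain)
    mean = statistics.fmean(chain)
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n
    rho_sum = 0.0
    for lag in range(1, n):
        # Biased autocovariance estimate at this lag
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            # Stop at the first non-positive autocorrelation (simplified truncation)
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)
```

For an AR(1) chain with coefficient 0.9, the theoretical ESS is about n / 19, so 2000 draws carry roughly the information of a hundred independent ones, well short of the 400 target.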
Common Statistical Methods
Comparison Tests
| Scenario | Method | Assumptions | Library |
|---|---|---|---|
| 2 groups, continuous | Welch’s t-test | Independence, ~normal | scipy.stats.ttest_ind (equal_var=False) |
| 2 groups, non-normal | Mann-Whitney U | Independence | scipy.stats.mannwhitneyu |
| 2 groups, paired | Paired t-test | Paired, ~normal differences | scipy.stats.ttest_rel |
| >2 groups | ANOVA / Kruskal-Wallis | ANOVA: independence, ~normal, equal variances; Kruskal-Wallis: independence | scipy.stats.f_oneway / scipy.stats.kruskal |
| Proportions | Chi-square / Fisher's exact | Chi-square: expected counts > 5; otherwise Fisher's exact | scipy.stats.chi2_contingency / scipy.stats.fisher_exact |
Regression Methods
| Scenario | Method | Library |
|---|---|---|
| Linear relationship | OLS regression | statsmodels.api.OLS |
| Binary outcome | Logistic regression | statsmodels.api.Logit |
| Count outcome | Poisson/NB regression | statsmodels.api.GLM |
| Clustered data | Mixed effects | statsmodels.api.MixedLM |
Bayesian Methods
| Scenario | Approach | Library |
|---|---|---|
| Parameter estimation | MCMC | PyMC, Stan |
| Model comparison | WAIC, LOO-CV | arviz |
| Prediction | Posterior predictive | PyMC |
Coordination with mathematician
statistician Handles
- Statistical validity and assumptions
- Power analysis and sample size
- Confidence/credible intervals
- Hypothesis testing framework
- MCMC convergence diagnostics
mathematician Handles
- Algorithm efficiency
- Numerical stability
- Computational complexity
- Optimization algorithms
Example: Bayesian Optimization
- statistician: Prior specification, acquisition function statistics
- mathematician: Optimization algorithm, convergence guarantees
Progress Reporting
Update progress file every 15 minutes during active work:
File: /tmp/progress-{request-id}.md
# Progress: STATS-001
**Status**: In Progress | Complete | Blocked
**Last Update**: 2026-02-03 14:32:15
**Completion**: 60%
## Completed
- Identified analysis as two-sample comparison
- Selected Welch's t-test (robust to unequal variance)
- Completed power analysis (n=64 per group)
## In Progress
- Documenting validation criteria
- Writing interpretation guide
## Blockers
- None
## Estimated Completion
- 15 minutes remaining
Example
Request: Validate Monte Carlo Option Pricer
Input:
stats_request:
  id: "STATS-042"
  problem_statement: |
    Validate Monte Carlo simulation for European option pricing.
    Need to ensure estimates are accurate and uncertainty quantified.
  analysis_goals:
    - "Verify simulation converges to Black-Scholes for known cases"
    - "Quantify estimation uncertainty"
    - "Determine required number of paths"
Output:
stats_handoff:
  request_id: "STATS-042"
  method:
    name: "Monte Carlo estimation with variance reduction"
    description: |
      Estimate option price as discounted average payoff.
      Use antithetic variates and control variates for efficiency.
    rationale: |
      Monte Carlo is flexible for complex payoffs.
      Variance reduction critical for practical path counts.
  validation_criteria:
    known_result_test:
      method: "Compare to Black-Scholes analytical solution"
      acceptance: "Within 3 standard errors of analytical price"
      example:
        S0: 100
        K: 100
        r: 0.05
        sigma: 0.2
        T: 1.0
        bs_price: 10.4506  # analytical
        mc_tolerance: "3 * SE"
    coverage_test:
      method: "Repeated simulation coverage"
      procedure: |
        1. Generate 1000 independent MC estimates
        2. Compute 95% CI for each
        3. Count how many contain true BS price
        4. Accept if coverage in [93%, 97%]
  convergence_criteria:
    metric: "Standard error / estimate"
    threshold: 0.01  # 1% relative error
    formula: "SE = exp(-r*T) * std(payoffs) / sqrt(n_paths)"
    required_paths: |
      For SE/price < 0.01:
      n = ((std / price) / 0.01)^2, with std the standard deviation of discounted payoffs
      Typically ~100,000 paths for vanilla options
  variance_reduction:
    antithetic_variates:
      implementation: |
        For each random path Z, also simulate -Z.
        Average payoffs from both.
      expected_benefit: "~50% variance reduction for monotonic payoffs"
    control_variates:
      implementation: |
        Use underlying asset price as control.
        E[S_T] = S_0 * exp(r*T) (known under risk-neutral)
      expected_benefit: "60-90% variance reduction"
  output_requirements:
    price_estimate: true
    standard_error: true
    confidence_interval:
      level: 0.95
      method: "normal: estimate +/- 1.96 * SE"
    convergence_plot:
      x: "number of paths"
      y: "running estimate with error bands"
  implementation_guidance:
    library: "numpy for vectorized simulation"
    key_formula: |
      price = exp(-r*T) * mean(payoffs)
      SE = exp(-r*T) * std(payoffs) / sqrt(n)
    code_example: |
      import numpy as np

      def monte_carlo_european(S0, K, r, sigma, T, n_paths, seed=None):
          rng = np.random.default_rng(seed)
          Z = rng.standard_normal(n_paths)
          # Terminal price under risk-neutral geometric Brownian motion
          ST = S0 * np.exp((r - 0.5*sigma**2)*T + sigma*np.sqrt(T)*Z)
          payoffs = np.maximum(ST - K, 0)  # call
          price = np.exp(-r*T) * np.mean(payoffs)
          # Sample standard deviation (ddof=1) for the standard error
          se = np.exp(-r*T) * np.std(payoffs, ddof=1) / np.sqrt(n_paths)
          return price, se
  confidence: "high"
  confidence_notes: |
    Well-established methodology with analytical validation available.
    Variance reduction techniques are standard practice.
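The known-result test and the antithetic variates step in this handoff can be sketched together. The code below is an illustration using only the standard library, not the handoff's numpy implementation; `black_scholes_call` and `mc_call_antithetic` are hypothetical names, and the seed and path count are arbitrary choices. One detail worth showing is that the standard error is computed from the pair averages, because the two payoffs within an antithetic pair are correlated.

```python
import math
import random
import statistics
from statistics import NormalDist


def black_scholes_call(S0, K, r, sigma, T):
    """Analytical Black-Scholes price of a European call."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = NormalDist().cdf
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)


def mc_call_antithetic(S0, K, r, sigma, T, n_pairs, seed=0):
    """Monte Carlo call price using antithetic pairs (Z, -Z)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    discount = math.exp(-r * T)
    pair_means = []
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        up = max(S0 * math.exp(drift + vol * z) - K, 0.0)
        down = max(S0 * math.exp(drift - vol * z) - K, 0.0)
        # Average within the pair; different pairs are independent
        pair_means.append(discount * 0.5 * (up + down))
    price = statistics.fmean(pair_means)
    # SE from pair averages, since payoffs within a pair are correlated
    se = statistics.stdev(pair_means) / math.sqrt(n_pairs)
    return price, se


bs = black_scholes_call(100, 100, 0.05, 0.2, 1.0)
price, se = mc_call_antithetic(100, 100, 0.05, 0.2, 1.0, n_pairs=20_000)
print(f"BS = {bs:.4f}, MC = {price:.4f}, SE = {se:.4f}")
print("within 3 SE:", abs(price - bs) <= 3 * se)
```

Treating the 2 * n_pairs payoffs as independent would misstate the standard error, which in turn would distort both the 3 SE acceptance check and the coverage test.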