ab-test-stats
Install command
npx skills add https://github.com/guia-matthieu/clawfu-skills --skill ab-test-stats
Skill documentation
A/B Test Statistics Calculator
Calculate statistical significance for A/B tests – know when your results are real, not random chance.
When to Use This Skill
- Test analysis – Determine if results are statistically significant
- Sample planning – Calculate required sample size before testing
- Duration estimation – Know how long to run experiments
- Power analysis – Ensure tests can detect meaningful differences
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures analysis frameworks | Metric definitions |
| Identifies patterns in data | Business interpretation |
| Creates visualization templates | Dashboard design |
| Suggests optimization areas | Action priorities |
| Calculates statistical measures | Decision thresholds |
Dependencies
pip install scipy numpy click
Commands
Check Significance
python scripts/main.py significance --control 1000,50 --variant 1000,65
python scripts/main.py significance --control 5000,250 --variant 5000,300 --confidence 0.99
Calculate Sample Size
python scripts/main.py sample-size --baseline 0.05 --mde 0.02
python scripts/main.py sample-size --baseline 0.10 --mde 0.01 --power 0.90
Estimate Duration
python scripts/main.py duration --traffic 1000 --baseline 0.05 --mde 0.02
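The arithmetic behind a duration estimate can be sketched as below. This is a minimal illustration, not the script's actual code: the function name `estimate_duration_days` is hypothetical, and it assumes all daily traffic enters the test and is split evenly across two variants.

```python
from math import ceil


def estimate_duration_days(per_variant: int, daily_traffic: int, variants: int = 2) -> int:
    """Days needed to collect the full sample (hypothetical helper).

    Assumes every daily visitor enters the test and traffic is split
    evenly across all variants.
    """
    total_required = per_variant * variants
    # Round up: a partial day still has to be run in full.
    return ceil(total_required / daily_traffic)


# Figures from Example 2: 3,842 visitors per variant at 1,000 visitors per day.
print(estimate_duration_days(3842, 1000))  # -> 8
```

In practice, tests are often rounded up to whole weeks so that weekday and weekend behaviour are both covered.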
Examples
Example 1: Analyze Test Results
# Control: 1000 visitors, 50 conversions (5%)
# Variant: 1000 visitors, 65 conversions (6.5%)
python scripts/main.py significance --control 1000,50 --variant 1000,65
# Output:
# A/B Test Results
# ─────────────────────────
# Control: 5.00% (50/1000)
# Variant: 6.50% (65/1000)
# Lift: +30.0%
#
# Statistical Analysis
# ─────────────────────────
# p-value: 0.089
# Confidence: 91.1%
# Result: NOT SIGNIFICANT (need 95%)
#
# Recommendation: Continue test for more data
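For readers who want to check a result by hand, here is a minimal sketch of one common approach: a two-sided, pooled two-proportion z-test. The document does not say which test `scripts/main.py` runs, so the choice of test and the function name `two_proportion_z_test` are assumptions, and the p-value it produces need not match the sample output above. Only the standard library is used; `scipy.stats.norm` would work equally well.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(n_control, conv_control, n_variant, conv_variant):
    """Two-sided pooled z-test for a difference in conversion rates.

    Assumption: this is one standard test, not necessarily the one the
    skill's script implements.
    """
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    z = (p_v - p_c) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "lift": (p_v - p_c) / p_c,
        "z": z,
        "p_value": p_value,
    }


result = two_proportion_z_test(1000, 50, 1000, 65)
print(f"Lift:    {result['lift']:+.1%}")
print(f"z:       {result['z']:.3f}")
print(f"p-value: {result['p_value']:.3f}")
```

A one-sided test, a continuity correction, or an unpooled standard error would each give a somewhat different p-value, so compare against the script's output with that in mind.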
Example 2: Plan Sample Size
# Baseline 5% conversion; want to detect a 20% relative lift (1 percentage point absolute)
python scripts/main.py sample-size --baseline 0.05 --mde 0.01
# Output:
# Sample Size Calculator
# ──────────────────────────────
# Baseline conversion: 5.0%
# Minimum detectable effect: 1.0% (20% relative)
# Target conversion: 6.0%
#
# Required per variant: 3,842 visitors
# Total required: 7,684 visitors
#
# At 1000 daily visitors: ~8 days
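To show how the inputs interact, the sketch below uses a common normal-approximation formula for comparing two proportions. The formula used by `scripts/main.py` is not stated in this document, so this is an assumption, the function name `sample_size_per_variant` is hypothetical, and its result can differ from the sample output above. As in Example 2, `mde` is treated as an absolute difference.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion test.

    Normal approximation with unpooled variances (an assumption, not
    necessarily the script's formula). `mde` is an absolute difference,
    e.g. 0.01 for one percentage point.
    """
    target = baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = baseline * (1 - baseline) + target * (1 - target)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


n = sample_size_per_variant(baseline=0.05, mde=0.01)
print(f"Required per variant: {n:,}")
print(f"Total required:       {2 * n:,}")
```

Because the required sample scales with 1 / mde², halving the MDE roughly quadruples the number of visitors needed, which is why the MDE is usually the most consequential input to choose.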
Key Concepts
| Term | Definition |
|---|---|
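| p-value | Probability of seeing a difference at least this large if control and variant truly performed the same |
| Confidence | Reported here as 1 – p-value (usually want 95%+, i.e. p < 0.05) |
| Power | Probability of detecting a real effect of at least the MDE (usually 80%) |
| MDE | Minimum Detectable Effect – smallest lift worth detecting; Example 2 passes it to --mde as an absolute difference (0.01 = 1 percentage point) |
| Lift | Relative improvement: (variant – control) / control |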
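To make the power concept concrete, the following sketch approximates the power of a two-sided test at a given sample size using the normal approximation. The function name `approximate_power` is hypothetical, and the calculation is a textbook approximation rather than anything confirmed to be in the skill's script.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(baseline, variant_rate, n_per_variant, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    se = sqrt(
        (baseline * (1 - baseline) + variant_rate * (1 - variant_rate))
        / n_per_variant
    )
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability that the observed z-score clears the critical value.
    return NormalDist().cdf(abs(variant_rate - baseline) / se - z_alpha)


# Rates and sample size from Example 1: 5.0% vs 6.5%, 1,000 visitors per variant.
print(f"Power: {approximate_power(0.05, 0.065, 1000):.0%}")
```

With the Example 1 inputs the approximate power is well below the usual 80% target, which fits that example's recommendation to keep collecting data.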
When Results Are Significant
| p-value | Confidence | Verdict |
|---|---|---|
| < 0.01 | > 99% | Highly Significant |
| < 0.05 | > 95% | Significant |
| < 0.10 | > 90% | Marginally Significant |
| ≥ 0.10 | < 90% | Not Significant |
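The thresholds in this table can be expressed as a small lookup. The function name `verdict` is hypothetical and the sketch simply mirrors the rows above.

```python
def verdict(p_value: float) -> str:
    """Map a p-value to the verdict labels used in the table above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.01:
        return "Highly Significant"
    if p_value < 0.05:
        return "Significant"
    if p_value < 0.10:
        return "Marginally Significant"
    return "Not Significant"


for p in (0.004, 0.03, 0.089, 0.25):
    print(f"p = {p:<5} -> {verdict(p)}")
```

Note that a "Marginally Significant" result such as p = 0.089 still falls short of the default 95% confidence requirement, so Example 1 reports it as not significant at that threshold.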
Skill Boundaries
What This Skill Does Well
- Structuring data analysis
- Identifying patterns and trends
- Creating visualization frameworks
- Calculating statistical measures
What This Skill Cannot Do
- Access your actual data
- Replace statistical expertise
- Make business decisions
- Guarantee prediction accuracy
Related Skills
- cohort-analysis – Analyze user cohorts
- funnel-analyzer – Analyze conversion funnels
Skill Metadata
- mode: centaur
- category: analytics
- subcategory: statistics
- dependencies: [scipy, numpy]
- difficulty: intermediate
- time_saved: 3+ hours/week