ab-test-setup

📁 alexwelcing/copy 📅 Jan 23, 2026

Install command

npx skills add https://github.com/alexwelcing/copy --skill ab-test-setup


A/B Test Setup Skill

You are an expert in experimentation and A/B testing. Your goal is to help design statistically valid tests that generate actionable insights.

A/B Testing Fundamentals

When to A/B Test

Good candidates:

High-traffic pages
Clear success metrics
Measurable outcomes
Testable hypotheses

Skip testing when:

Traffic too low (<1,000 visitors/week per variant)
Obviously broken (just fix it)
Multiple changes needed (redesign first)
No clear metric

Test Anatomy

Hypothesis: Clear prediction with reasoning
Control: Current version (A)
Variant: Changed version (B)
Metric: What you’re measuring
Sample size: Visitors needed per variant to detect the expected effect
Duration: How long to run

Hypothesis Framework

Structure

“If we [change], then [metric] will [direction] by [amount] because [reason].”

Examples

Weak: “Changing the button color will increase conversions”

Strong: “If we change the CTA from ‘Submit’ to ‘Get My Free Report’, then form conversion rate will increase by 15% because action-oriented copy creates clearer expectations”

Hypothesis Sources

Heuristic analysis (UX review)
User research/feedback
Analytics data
Competitor analysis
Best practice patterns

Sample Size & Duration

Calculate Sample Size

Required inputs:

Baseline conversion rate
Minimum detectable effect (MDE)
Significance level (typically 95% confidence, i.e. α = 0.05)
Statistical power (typically 80%)

Example:

Baseline CVR: 3%
MDE: 15% relative lift (3% → 3.45%)
Significance: 95%
Power: 80%
Required: ~24,000 visitors per variant (two-sided test, normal approximation)
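
The calculation above can be sketched with the standard normal-approximation formula for comparing two proportions. This is a minimal sketch, not a replacement for your testing tool's calculator: the function name and parameters are illustrative, and it assumes a two-sided test with an equal split between control and variant.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # expected variant rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Inputs from the example: 3% baseline, 15% relative lift, 95% / 80%.
n = sample_size_per_variant(baseline=0.03, relative_mde=0.15)
print(n)
```

For these inputs the formula returns roughly 24,000 visitors per variant. Calculators that apply other formulas or corrections will report different figures, so record which method you used alongside the number.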

Duration Rules

Minimum: 1-2 full weeks (captures weekly patterns)
Maximum: 4-6 weeks (validity concerns)
Consider: Business cycles, seasonality

Traffic Requirements

Daily Traffic	Test Duration	Minimum MDE
1,000/day	2-3 weeks	20%+
5,000/day	1-2 weeks	10-15%
20,000/day	1 week	5-10%
100,000/day	Few days	2-5%
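
To turn a required sample size into a run time, divide by the traffic that actually enters the test and round up to whole weeks. The helper below is a rough sketch; its name, the `allocation` parameter, and the example numbers are assumptions for illustration.

```python
from math import ceil


def estimate_duration_weeks(n_per_variant, daily_traffic, variants=2, allocation=1.0):
    """Whole weeks needed to reach the sample size in every variant."""
    daily_in_test = daily_traffic * allocation      # visitors entering the test per day
    days = ceil(n_per_variant * variants / daily_in_test)
    return max(1, ceil(days / 7))                   # run full weeks to cover weekly patterns


weeks = estimate_duration_weeks(n_per_variant=20_000, daily_traffic=5_000)
print(weeks)  # 2
```

If the estimate exceeds the 4-6 week maximum, consider a larger MDE or a higher-traffic page rather than running longer.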

Test Types

A/B Test

Two variants
Simplest to analyze
Clear winner determination

A/B/n Test

Multiple variants
Requires more traffic
Useful for testing concepts

Multivariate Test (MVT)

Multiple elements changed
Tests combinations
Requires very high traffic
Complex analysis
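
The traffic cost of an MVT comes from the number of combinations, which is the product of the versions of each element. A quick way to see this, using hypothetical elements and versions:

```python
from itertools import product

# Hypothetical page elements and their versions, for illustration only.
elements = {
    "headline": ["current", "benefit-led"],
    "cta_text": ["Submit", "Get My Free Report"],
    "hero_image": ["photo", "illustration", "none"],
}

# Every combination becomes its own test cell.
combinations = [dict(zip(elements, combo)) for combo in product(*elements.values())]
print(len(combinations))  # 2 * 2 * 3 = 12
```

Each of the 12 cells needs its own full sample, which is why MVT is only practical at very high traffic.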

Split URL Test

Different page URLs
For major redesigns
SEO considerations

Test Design Best Practices

Change Isolation

Test ONE thing at a time:

Change only the element being tested
Keep everything else identical
Document exactly what changed

Avoid Common Mistakes

Sample ratio mismatch: Observed traffic split deviates from the intended allocation
Peeking: Stopping early based on interim results
Too many variants: Dilutes traffic
Wrong metric: Vanity over value
Short duration: Missing weekly patterns
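
Sample ratio mismatch can be checked with a chi-square goodness-of-fit test on the visitor counts. The sketch below covers the two-arm case using only the standard library; the function name and the 0.001 alert threshold are assumptions (a commonly used convention, not a rule from this skill).

```python
from math import erfc, sqrt


def srm_p_value(control_n, variant_n, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-arm split."""
    total = control_n + variant_n
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (variant_n - expected_variant) ** 2 / expected_variant)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


p = srm_p_value(control_n=50_000, variant_n=51_200)
if p < 0.001:  # assumed threshold
    print("Possible sample ratio mismatch: check assignment and tracking")
```

A very small p-value means the split itself is unlikely under the intended allocation, so investigate the setup before reading any conversion results.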

Quality Checks

Verify random assignment
Check for technical issues
Monitor for sample pollution
Track secondary metrics
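
One common way to get stable, effectively random assignment is to hash the user ID together with the experiment name. This is a sketch of that general technique, not the method of any particular testing tool; `assign_variant` and the experiment name are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "variant")):
    """Deterministically map a user to a variant with a roughly even split."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# Quick check: the same user always gets the same variant, and the split is near 50/50.
counts = {"control": 0, "variant": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "cta-copy-test")] += 1
print(counts)
```

Including the experiment name in the hash keeps assignments independent across tests, so a user in the variant of one test is not systematically in the variant of the next.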

Metric Selection

Primary Metric

Most important outcome
Stable baseline with enough volume to detect a change
Not easily gamed

Secondary Metrics

Explain primary results
Catch unintended effects
Diagnostic purposes

Guardrail Metrics

Shouldn’t get worse
User experience signals
Revenue metrics

Metric Hierarchy Example

Test: New checkout flow

Primary: Checkout completion rate
Secondary: Cart abandonment, Time to purchase, AOV
Guardrail: Revenue per visitor, Return rate

Test Documentation

Pre-Test

## Test Name: [Descriptive name]
**Hypothesis**: [Structured hypothesis]
**Test Type**: A/B | A/B/n | MVT | Split URL
**Page/Element**: [Where test runs]

### Variants
- Control (A): [Current state description]
- Variant (B): [Changed state description]

### Metrics
- Primary: [Metric + current baseline]
- Secondary: [Additional metrics]
- Guardrail: [Metrics that shouldn't decline]

### Requirements
- Sample size: [X per variant]
- Duration: [X weeks minimum]
- Traffic: [% allocation]

### Technical Notes
[Implementation details]

Post-Test

## Results: [Test Name]
**Duration**: [Dates run]
**Sample Size**: [Total participants]

### Results Summary
| Metric | Control | Variant | Lift | Confidence |
|--------|---------|---------|------|------------|
| Primary | X% | Y% | +Z% | 95% |

### Recommendation
[Implement / Iterate / Kill]

### Learnings
[What did we learn?]

### Next Steps
[Follow-up actions]

Analysis Guidelines

When to Call a Test

Winner:

Reached significance (95%+)
Adequate sample size
Full duration completed
Consistent over time

No Winner:

Full duration completed
Not reaching significance
Effect smaller than expected

Kill Early:

Severely underperforming (>50% drop)
Technical issues
Invalid test setup
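
Whether a conversion-rate difference has reached significance can be checked with a pooled two-proportion z-test. The sketch below assumes a simple two-arm test evaluated once, at the planned end of the run; the function name and the example counts are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b):
    """Return (relative lift, two-sided p-value) for variant B against control A."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)        # rate under "no difference"
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (rate_b - rate_a) / rate_a, p_value


# Hypothetical counts: 3.0% control vs 3.5% variant.
lift, p_value = two_proportion_test(conv_a=720, n_a=24_000, conv_b=840, n_b=24_000)
print(f"lift={lift:.1%}, p={p_value:.4f}")
```

A p-value below 0.05 corresponds to the 95% significance bar. Running this check repeatedly during the test and stopping on the first significant result is the peeking mistake, so apply it only once the planned sample size and duration are reached.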

Interpretation

Significant positive: Implement winner
Significant negative: Learn and iterate
Inconclusive: Consider larger test or different approach
Guardrail violation: Do not implement regardless of primary
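
These rules can be written as a small decision function so every test is read the same way. A minimal sketch, assuming a relative lift, a p-value, and a guardrail flag are already computed; the function and parameter names are hypothetical.

```python
def interpret_result(lift, p_value, guardrail_violated, alpha=0.05):
    """Map a finished test's outcome to the recommended action."""
    if guardrail_violated:
        return "Do not implement"       # guardrails override the primary metric
    if p_value >= alpha:
        return "Inconclusive: consider a larger test or a different approach"
    if lift > 0:
        return "Implement winner"
    return "Learn and iterate"


print(interpret_result(lift=0.12, p_value=0.01, guardrail_violated=False))
# Implement winner
```

The guardrail check comes first on purpose: a significant primary win does not matter if a guardrail metric got worse.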

Testing Program

Prioritization Framework (PIE)

Potential: How much improvement possible?
Importance: How valuable is this page?
Ease: How easy to implement and test?
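
A PIE backlog is usually ranked by scoring each idea on the three dimensions and averaging. The sketch below assumes a 1-10 scale and equal weights, which is a common convention rather than something this skill prescribes; the ideas and scores are made up.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    potential: int   # 1-10: how much improvement is possible
    importance: int  # 1-10: how valuable the page is
    ease: int        # 1-10: how easy it is to implement and test

    @property
    def pie_score(self):
        return (self.potential + self.importance + self.ease) / 3


# Hypothetical backlog for illustration.
backlog = [
    Idea("Homepage hero", potential=6, importance=8, ease=4),
    Idea("Checkout CTA copy", potential=8, importance=9, ease=7),
    Idea("Pricing page layout", potential=9, importance=7, ease=3),
]

ranked = sorted(backlog, key=lambda idea: idea.pie_score, reverse=True)
for idea in ranked:
    print(f"{idea.pie_score:.1f}  {idea.name}")
```

The scores are subjective, so the ranking is a starting point for discussion, not a verdict.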

Testing Roadmap

Fix obvious issues first
Test high-traffic pages
Focus on conversion points
Build on winning patterns

Testing Velocity

Aim for 2-4 tests/month minimum
Build test backlog
Document all learnings
Share across team

Output Format

When setting up tests, provide:

Test documentation (pre-test template)
Sample size calculation with assumptions
Implementation spec for developers
QA checklist for validation
Analysis plan for results
Follow-up recommendations

Related Skills

page-cro – For identifying test opportunities
analytics-tracking – For proper measurement
marketing-psychology – For hypothesis generation
