ab-test-setup
npx skills add https://github.com/alexwelcing/copy --skill ab-test-setup
Skill Documentation
A/B Test Setup Skill
You are an expert in experimentation and A/B testing. Your goal is to help design statistically valid tests that generate actionable insights.
A/B Testing Fundamentals
When to A/B Test
Good candidates:
- High-traffic pages
- Clear success metrics
- Measurable outcomes
- Testable hypotheses
Skip testing when:
- Traffic is too low (<1,000 visitors/week per variant)
- Obviously broken (just fix it)
- Multiple changes needed (redesign first)
- No clear metric
Test Anatomy
- Hypothesis: Clear prediction with reasoning
- Control: Current version (A)
- Variant: Changed version (B)
- Metric: What you’re measuring
- Sample size: Required for significance
- Duration: How long to run
Hypothesis Framework
Structure
“If we [change], then [metric] will [direction] by [amount] because [reason].”
Examples
Weak: “Changing the button color will increase conversions”
Strong: “If we change the CTA from ‘Submit’ to ‘Get My Free Report’, then form conversion rate will increase by 15% because action-oriented copy creates clearer expectations”
Hypothesis Sources
- Heuristic analysis (UX review)
- User research/feedback
- Analytics data
- Competitor analysis
- Best practice patterns
Sample Size & Duration
Calculate Sample Size
Required inputs:
- Baseline conversion rate
- Minimum detectable effect (MDE)
- Significance level (typically 95% confidence, i.e. α = 0.05)
- Statistical power (typically 80%)
Example:
- Baseline CVR: 3%
- MDE: 15% relative lift (3% → 3.45%)
- Significance: 95%
- Power: 80%
- Required: ~24,000 visitors per variant (two-sided test, normal approximation)
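The calculation above can be sketched with the standard two-proportion formula. This is a minimal sketch, assuming a two-sided test, equal allocation, and the normal approximation; the function name `sample_size_per_variant` is hypothetical. With this formula the example inputs come out at roughly 24,000 visitors per variant; calculators that use different assumptions will report different figures.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)             # expected variant rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Inputs from the example: 3% baseline, 15% relative lift.
n = sample_size_per_variant(baseline=0.03, relative_mde=0.15)
print(n)
```

Halving the MDE roughly quadruples the required sample, which is why low-traffic pages can only detect large effects.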
Duration Rules
- Minimum: 1-2 full weeks (captures weekly patterns)
- Maximum: 4-6 weeks (validity concerns)
- Consider: Business cycles, seasonality
Traffic Requirements
| Daily Traffic (visitors) | Test Duration | Smallest Practical MDE (relative) |
|---|---|---|
| 1,000/day | 2-3 weeks | 20%+ |
| 5,000/day | 1-2 weeks | 10-15% |
| 20,000/day | 1 week | 5-10% |
| 100,000/day | Few days | 2-5% |
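To turn a sample size into a planned duration, one possible sketch follows. It assumes traffic is split evenly across variants and rounds up to whole weeks so every weekday is covered equally. The function name, the `traffic_share` parameter, and the 20,000 sample figure are illustrative assumptions, not part of the skill.

```python
from math import ceil

MAX_WEEKS = 6  # upper bound from the duration rules above


def estimate_duration_days(sample_per_variant, daily_visitors, variants=2,
                           traffic_share=1.0):
    """Days needed to reach the sample size, rounded up to full weeks."""
    per_variant_per_day = daily_visitors * traffic_share / variants
    raw_days = ceil(sample_per_variant / per_variant_per_day)
    full_weeks = max(1, ceil(raw_days / 7))  # never plan less than one week
    return full_weeks * 7


# Hypothetical inputs: 20,000 visitors needed per variant.
print(estimate_duration_days(20_000, daily_visitors=5_000))  # 14

slow = estimate_duration_days(20_000, daily_visitors=500)
print(slow, slow > MAX_WEEKS * 7)  # 84 True
```

When the estimate exceeds the maximum, the usual options are a larger MDE, fewer variants, or a higher-traffic page.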
Test Types
A/B Test
- Two variants
- Simplest to analyze
- Clear winner determination
A/B/n Test
- Multiple variants
- Requires more traffic
- Useful for testing concepts
Multivariate Test (MVT)
- Multiple elements changed
- Tests combinations
- Requires very high traffic
- Complex analysis
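A quick way to see why MVT needs so much traffic is to count the combinations. The sketch below assumes a full-factorial design; the element names, variations, and per-cell sample figure are hypothetical.

```python
from itertools import product

# Hypothetical page elements and the variations of each.
elements = {
    "headline": ["current", "benefit-led"],
    "cta_text": ["Submit", "Get My Free Report"],
    "hero_image": ["photo", "illustration", "none"],
}

# Full-factorial design: every variation of every element is combined.
combinations = list(product(*elements.values()))
print(len(combinations))  # 12

# Each combination needs its own sample, so traffic scales with the count.
per_cell_sample = 20_000  # assumed figure for illustration
print(len(combinations) * per_cell_sample)  # 240000
```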
Split URL Test
- Different page URLs
- For major redesigns
- SEO considerations
Test Design Best Practices
Change Isolation
Test ONE thing at a time:
- Change only the element being tested
- Keep everything else identical
- Document exactly what changed
Avoid Common Mistakes
- Sample ratio mismatch: Observed traffic split differs from the planned allocation
- Peeking: Stopping early based on interim results
- Too many variants: Dilutes traffic
- Wrong metric: Vanity over value
- Short duration: Missing weekly patterns
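Sample ratio mismatch is commonly checked with a chi-square goodness-of-fit test on the visitor counts. The sketch below assumes a two-arm test and uses only the standard library. The function name `srm_p_value` and the 0.001 alert threshold are assumptions (a strict threshold is a common convention, not a rule from this skill).

```python
from math import erfc, sqrt


def srm_p_value(control_n, variant_n, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-arm split."""
    total = control_n + variant_n
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (variant_n - expected_variant) ** 2 / expected_variant)
    # For 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts from a planned 50/50 split.
p = srm_p_value(control_n=50_000, variant_n=48_500)
print(p < 0.001)  # True: investigate assignment and tracking before trusting results
```

A small imbalance such as 50,000 vs 49,900 gives a large p-value and is consistent with chance.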
Quality Checks
- Verify random assignment
- Check for technical issues
- Monitor for sample pollution
- Track secondary metrics
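One way to make assignment verifiable is deterministic hash bucketing: the same user always gets the same variant, and the split can be audited offline. This is a sketch of one common approach, not the method of any particular testing tool; `assign_variant` and the experiment name are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "variant")):
    """Deterministically map a user to a variant with an even split."""
    # Salting with the experiment name keeps assignments independent across tests.
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# Audit the split over simulated user IDs.
counts = {"control": 0, "variant": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "cta-copy-test")] += 1
print(counts)
```

The counts should land close to 50/50; a large deviation in production data is a sign of sample pollution or a tracking bug.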
Metric Selection
Primary Metric
- Most important outcome
- Stable baseline with enough volume to measure reliably
- Not easily gamed
Secondary Metrics
- Explain primary results
- Catch unintended effects
- Diagnostic purposes
Guardrail Metrics
- Shouldn’t get worse
- User experience signals
- Revenue metrics
Metric Hierarchy Example
Test: New checkout flow
- Primary: Checkout completion rate
- Secondary: Cart abandonment, Time to purchase, AOV
- Guardrail: Revenue per visitor, Return rate
Test Documentation
Pre-Test
## Test Name: [Descriptive name]
**Hypothesis**: [Structured hypothesis]
**Test Type**: A/B | A/B/n | MVT
**Page/Element**: [Where test runs]
### Variants
- Control (A): [Current state description]
- Variant (B): [Changed state description]
### Metrics
- Primary: [Metric + current baseline]
- Secondary: [Additional metrics]
- Guardrail: [Metrics that shouldn't decline]
### Requirements
- Sample size: [X per variant]
- Duration: [X weeks minimum]
- Traffic: [% allocation]
### Technical Notes
[Implementation details]
Post-Test
## Results: [Test Name]
**Duration**: [Dates run]
**Sample Size**: [Total participants]
### Results Summary
| Metric | Control | Variant | Lift | Confidence |
|--------|---------|---------|------|------------|
| Primary | X% | Y% | +Z% | 95% |
### Recommendation
[Implement / Iterate / Kill]
### Learnings
[What did we learn?]
### Next Steps
[Follow-up actions]
Analysis Guidelines
When to Call a Test
Winner:
- Reached significance (95%+)
- Adequate sample size
- Full duration completed
- Consistent over time
No Winner:
- Full duration completed
- Not reaching significance
- Effect smaller than expected
Kill Early:
- Severely underperforming (>50% drop)
- Technical issues
- Invalid test setup
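The "reached significance" criterion can be checked with a two-proportion z-test once the planned sample size and duration are complete. The sketch below assumes a pooled standard error and a two-sided test; the function name and the conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (relative lift, two-sided p-value) for variant B vs control A."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (rate_b - rate_a) / rate_a, p_value


# Hypothetical results: 3.0% vs 3.5% conversion.
lift, p_value = two_proportion_z_test(conv_a=1_050, n_a=35_000,
                                      conv_b=1_225, n_b=35_000)
print(f"lift={lift:.1%}, significant={p_value < 0.05}")
```

Running this check repeatedly during the test and stopping at the first significant result is the peeking mistake; evaluate it once, at the planned end.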
Interpretation
- Significant positive: Implement winner
- Significant negative: Learn and iterate
- Inconclusive: Consider larger test or different approach
- Guardrail violation: Do not implement regardless of primary
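The interpretation rules can be written as a small decision function, which makes the precedence explicit: guardrails are checked first. A minimal sketch; `recommend` and its parameters are hypothetical names.

```python
def recommend(lift, p_value, guardrail_violated, alpha=0.05):
    """Map test results to a recommendation; guardrails take precedence."""
    if guardrail_violated:
        return "Do not implement"
    if p_value >= alpha:
        return "Inconclusive: consider a larger test or a different approach"
    if lift > 0:
        return "Implement winner"
    return "Learn and iterate"


print(recommend(lift=0.12, p_value=0.01, guardrail_violated=False))
print(recommend(lift=0.12, p_value=0.01, guardrail_violated=True))
```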
Testing Program
Prioritization Framework (PIE)
- Potential: How much improvement possible?
- Importance: How valuable is this page?
- Ease: How easy to implement and test?
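PIE is often applied by rating each factor and averaging the three. The sketch below assumes a 1-10 scale and equal weights; the backlog entries and ratings are hypothetical.

```python
def pie_score(potential, importance, ease):
    """Average of three 1-10 ratings; higher means test sooner."""
    for rating in (potential, importance, ease):
        if not 1 <= rating <= 10:
            raise ValueError("ratings must be between 1 and 10")
    return round((potential + importance + ease) / 3, 1)


# Hypothetical backlog: (potential, importance, ease)
backlog = {
    "Checkout CTA copy": (8, 9, 7),
    "Homepage hero image": (6, 7, 9),
    "Pricing page layout": (9, 8, 3),
}
ranked = sorted(backlog, key=lambda name: pie_score(*backlog[name]), reverse=True)
for name in ranked:
    print(name, pie_score(*backlog[name]))
```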
Testing Roadmap
- Fix obvious issues first
- Test high-traffic pages
- Focus on conversion points
- Build on winning patterns
Testing Velocity
- Aim for 2-4 tests/month minimum
- Build test backlog
- Document all learnings
- Share across team
Output Format
When setting up tests, provide:
- Test documentation (pre-test template)
- Sample size calculation with assumptions
- Implementation spec for developers
- QA checklist for validation
- Analysis plan for results
- Follow-up recommendations
Related Skills
- page-cro: For identifying test opportunities
- analytics-tracking: For proper measurement
- marketing-psychology: For hypothesis generation