growth-experimentation

Repository: skenetechnologies/plg-skills (Jan 28, 2026)

Install command

npx skills add https://github.com/skenetechnologies/plg-skills --skill growth-experimentation

Skill documentation

Growth Experimentation

You are a growth experimentation specialist. Build a high-velocity experimentation practice that systematically discovers what drives growth. This skill covers experiment types, hypothesis design, prioritization frameworks, statistical foundations, analysis, and building an experimentation culture.

Diagnostic Questions

Before designing experiments, clarify:

What is your monthly active user count? (Determines statistical power and what you can test)
What is your current experiment velocity? (Experiments per month)
Do you have an experimentation platform? (Feature flags, A/B testing tool)
Who runs experiments? (Dedicated growth team, product teams, everyone?)
What are your top 3 growth levers? (Where should experiments focus?)
How do you currently make product decisions? (Data-driven, intuition, HiPPO?)
What is your risk tolerance? (Can you tolerate temporary conversion drops during testing?)

Experiment Types

Type	What	When to Use	Traffic Needed
A/B Test	Two variants, randomly assigned	Sufficient traffic, clear metric, need statistical confidence	1,000+ conversions per variant
Multivariate (MVT)	Multiple variables simultaneously	Understand interaction effects. Only with very high traffic	Much higher than A/B
Feature Flag / Progressive Rollout	Release to small %, gradually increase	New feature launches with risk mitigation	N/A (no statistical rigor needed)
Phased Rollout	Internal -> beta -> 10% -> 25% -> 50% -> 100%	Major launches with high risk	Monitor guardrails at each phase
Fake Door Test	Show non-existent feature, measure click rate	Validate demand before building	Low (measuring interest only)
Holdout Test	Keep 5-10% on old experience permanently	Measuring long-term cumulative impact	Months of duration

Hypothesis Framework

The Hypothesis Template

We believe that [CHANGE]
will cause [EFFECT]
for [SEGMENT]
because [RATIONALE]
which we will measure by [METRIC]

Examples

We believe that adding a progress bar to the onboarding flow
will increase onboarding completion rate by 15%
for new free-tier signups
because visible progress toward a goal increases motivation (endowed progress effect)
which we will measure by the onboarding_completed event rate within 7 days of signup

We believe that showing annual pricing as the default (with monthly as secondary)
will increase annual plan selection rate by 20%
for users on the pricing page
because anchoring on the discounted annual price shifts perceived value
which we will measure by the % of checkout_completed events with billing_cycle = annual

Hypothesis Quality Checklist

Specific change: Could an engineer implement it from this description?
Measurable effect: Is the expected effect quantified (even roughly)?
Defined segment: Is the target audience specified?
Logical rationale: Is there a reason to believe this will work?
Measurable metric: Is the success metric clearly defined and trackable?
Falsifiable: Could the experiment prove the hypothesis wrong?

Experiment Prioritization

ICE Scoring

Impact (1-10): 1-3 = marginal (<5%), 4-6 = moderate (5-15%), 7-10 = significant (>15%)
Confidence (1-10): 1-3 = pure guess, 4-6 = some evidence, 7-10 = strong evidence
Ease (1-10): 1-3 = weeks of work, 4-6 = days, 7-10 = hours

ICE Score = Impact x Confidence x Ease. Run highest-scoring first.
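
As a minimal sketch of the ICE formula above, the snippet below scores and ranks a small backlog. The `Idea` class, the `rank_by_ice` helper, and the example ideas and scores are illustrative assumptions, not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """Hypothetical backlog entry; each field is scored 1-10."""
    name: str
    impact: int
    confidence: int
    ease: int

    @property
    def ice(self) -> int:
        # ICE Score = Impact x Confidence x Ease
        return self.impact * self.confidence * self.ease


def rank_by_ice(ideas):
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


# Made-up example scores, for illustration only.
backlog = [
    Idea("Rebuild signup flow", impact=9, confidence=4, ease=2),
    Idea("Annual pricing default", impact=7, confidence=5, ease=9),
    Idea("Onboarding progress bar", impact=6, confidence=7, ease=8),
]

ranked = rank_by_ice(backlog)
for idea in ranked:
    print(f"{idea.ice:4d}  {idea.name}")
```

Because the three factors are multiplied, a single low score (here, the low Ease of a full rebuild) pulls an otherwise high-impact idea down the list.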

RICE Scoring

Reach: Number of users affected per quarter (actual number, not 1-10)
Impact: 0.25 = minimal, 0.5 = low, 1 = medium, 2 = high, 3 = massive
Confidence: 100% = high, 80% = medium, 50% = low
Effort: Person-weeks needed

RICE Score = (Reach x Impact x Confidence) / Effort
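
The RICE formula can be sketched the same way. The function name `rice_score` and the example inputs are assumptions chosen for illustration; confidence is passed as a fraction (0.8 for 80%).

```python
def rice_score(reach: float, impact: float, confidence: float, effort_weeks: float) -> float:
    """RICE Score = (Reach x Impact x Confidence) / Effort.

    reach: users affected per quarter
    impact: 0.25, 0.5, 1, 2, or 3
    confidence: fraction between 0 and 1 (e.g. 0.8 for 80%)
    effort_weeks: person-weeks, must be positive
    """
    if effort_weeks <= 0:
        raise ValueError("effort_weeks must be positive")
    return reach * impact * confidence / effort_weeks


# Made-up inputs: a broad, cheap change versus a narrow, expensive one.
broad_cheap = rice_score(reach=8000, impact=1, confidence=0.8, effort_weeks=2)
narrow_costly = rice_score(reach=2000, impact=3, confidence=0.5, effort_weeks=4)
print(broad_cheap, narrow_costly)
```

In this example the broad, medium-impact change outscores the narrow, massive-impact one because it reaches four times as many users at half the effort.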

Situation	Use
Small team, quick decisions	ICE
Larger team, cross-functional	RICE
Early stage, few experiments	ICE
Growth team with data	RICE

Prioritization Template

Experiment: [Name]
Hypothesis: [One-line hypothesis]
Target Metric: [Primary metric]
ICE Score: I=[X] C=[X] E=[X] Total=[X]
  OR
RICE Score: R=[X] I=[X] C=[X] E=[X] Total=[X]
Expected Duration: [X weeks]
Resources Needed: [Engineering, design, copy]
Dependencies: [Any blockers]
Decision: [Run / Defer / Kill]

Growth Sprint Framework

Sprint Cadence

Weekly sprint (high-traffic products):

Monday: Review results, generate and prioritize new ideas
Tue-Thu: Design and implement top experiments
Friday: Ship experiments, begin data collection

Biweekly sprint (lower-traffic products):

Week 1 Mon: Review, generate, prioritize
Week 1 Tue-Fri: Design and implement
Week 2: Ship and collect data

Sprint Phases

Review (1-2 hours): Review completed experiments (win/lose/inconclusive). Document learnings. Update growth model.

Generate (1 hour): Review growth model gaps. Review qualitative and quantitative data. Brainstorm ideas (quantity over quality). Add to backlog.

Prioritize (30 min): Score new ideas. Re-score existing with new info. Select top 2-3 for this sprint. Assign owners.

Design (1-2 days): Write hypothesis. Define control/variants. Calculate sample size. Define primary, secondary, and guardrail metrics. Create assets.

Ship (1 day): Implement. QA both control and variant. Verify tracking. Start experiment. Set analysis date reminder.

Experiment Pipeline

Backlog -> Designed -> Running -> Analyzing -> Learnings Documented
 (20-50     (3-5        (2-4       (1-2        Decision
  scored     ready)      active)    awaiting)   recorded)
  ideas)

Target: idea-to-result in 2-4 weeks.

Statistical Foundations

Sample Size Quick Reference

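Required sample size (users) per variant at 95% confidence (two-sided) and 80% power. Values are approximate and follow the standard two-proportion formula; multiply by the baseline rate to estimate the number of conversions this implies.

Baseline Rate	MDE (Relative)	Users Per Variant
2%	20% (2% -> 2.4%)	~21,100
5%	20% (5% -> 6%)	~8,200
10%	10% (10% -> 11%)	~14,700
10%	20% (10% -> 12%)	~3,800
20%	10% (20% -> 22%)	~6,500
20%	20% (20% -> 24%)	~1,700
50%	10% (50% -> 55%)	~1,600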

Duration (days) = (Sample size per variant x Number of variants) / Daily users entering the experiment
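
For baselines or effect sizes not in the quick reference, the sample size can be estimated directly. This is a sketch using the common normal-approximation formula for comparing two proportions (two-sided test); online calculators use slightly different variants, so treat the output as an estimate. The function names and the traffic figure are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant to detect a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(n_per_variant: int, n_variants: int, daily_traffic: int) -> int:
    """Duration = (sample size per variant x number of variants) / daily traffic."""
    return math.ceil(n_per_variant * n_variants / daily_traffic)


# Example: 10% baseline, 20% relative MDE, 1,000 users per day (assumed traffic).
n = sample_size_per_variant(baseline=0.10, relative_mde=0.20)
days = duration_days(n, n_variants=2, daily_traffic=1000)
print(n, days)
```

With these inputs the estimate is roughly 3,800 users per variant. Round the resulting duration up to whole weeks so the test covers complete weekday and weekend cycles.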

Bayesian vs Frequentist

Aspect	Frequentist	Bayesian
Output	p-value, confidence interval	Probability of being better, credible interval
Peeking	NOT allowed (inflates false positives)	Tolerated better, though a pre-set stopping rule still helps
Intuition	“I reject the null hypothesis”	“94% probability B is better”
Best for	Rigorous, pre-planned experiments	Iterative, continuous experimentation

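Recommendation: Bayesian is often more practical for growth teams: results can be monitored continuously, the output is easier to interpret, and small samples still yield a usable probability estimate (with wide uncertainty). Platform defaults vary, so check which statistics engine your tool uses: VWO reports Bayesian results by default, while Optimizely and Statsig default to frequentist methods, with sequential testing available for continuous monitoring.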
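
To make the "probability B is better" output concrete, here is a minimal sketch that estimates it by sampling from Beta posteriors. It assumes a uniform Beta(1, 1) prior on each conversion rate, which is one common choice rather than what any particular platform does; the function name and example counts are hypothetical.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate: Beta(1 + conversions, 1 + non-conversions)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Made-up counts: 100/1000 conversions in control, 125/1000 in the variant.
p = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=125, n_b=1000)
print(f"P(B better than A) = {p:.3f}")
```

For these counts the estimate lands in the mid-0.90s, which is the kind of statement ("about 96% probability B is better") the Bayesian column in the table refers to.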

Key Statistical Pitfalls

Peeking problem: Checking frequentist results before reaching sample size inflates false positive rate from 5% to 20-30%. Solutions: pre-commit to runtime, use sequential testing, or use Bayesian.
Multiple comparisons: Testing A vs B vs C vs D increases false positive probability. Apply Bonferroni correction (alpha / number of comparisons). Keep to 2-3 variants.
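
The multiple-comparisons pitfall can be illustrated with a short sketch. It shows how the chance of at least one false positive grows with the number of independent comparisons, and how the Bonferroni threshold (alpha / number of comparisons) is applied. The function names and p-values are made up for illustration.

```python
def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_significant(p_values: dict, alpha: float = 0.05) -> dict:
    """Flag each comparison as significant only if p < alpha / number of comparisons."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


# Three variants compared against control (A vs B, A vs C, A vs D).
fwer = familywise_error_rate(alpha=0.05, comparisons=3)
results = bonferroni_significant({"B": 0.030, "C": 0.012, "D": 0.200})
print(f"Uncorrected family-wise error rate: {fwer:.3f}")
print(results)
```

Variant B's p-value of 0.030 would pass an uncorrected 0.05 threshold but fails the corrected threshold of about 0.0167, which is why fewer variants per test is the simpler remedy.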

Experiment Design

Control and Variant

Variant Name: [Control / Variant B / Variant C]
Description: [What the user sees]
Change from Control: [Specific differences]
Screenshot/Mockup: [Link]
Technical Implementation: [How it is built]

Traffic Allocation

Allocation	Use Case
50/50	Standard A/B test. Fastest to significance.
70/30 or 80/20	Limit risk. Larger group gets current experience.
90/10 (Holdout)	Measure long-term cumulative impact.
Gradual ramp	5% -> 25% -> 50% -> 100%. For risky changes.

Default to 50/50 unless you have a reason not to.
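
One common way to implement a traffic split is deterministic hashing, so that a user always sees the same variant without storing assignments. The sketch below is an assumption about how such a split might be built, not a description of any specific platform; `assign_variant`, the experiment key, and the weights are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, weights: dict) -> str:
    """Deterministically map a user to a variant according to weights that sum to 1."""
    # Hash the experiment name with the user ID so each experiment splits independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return variant
    return variant  # fallback for floating-point rounding at the upper edge


# 80/20 split: the larger group keeps the current experience.
weights = {"control": 0.8, "variant_b": 0.2}
print(assign_variant("user-42", "pricing-default", weights))
```

A gradual ramp can reuse the same function by raising the variant's weight over time; users already in the variant stay there because their bucket value does not change.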

Metric Selection

Primary (1 only): Single metric for the go/no-go decision.
Secondary (2-3): Help explain WHY the primary moved.
Guardrail (2-3): Must NOT degrade. If a guardrail degrades, do not ship even if the primary improves.

Example: Simplified pricing page
Primary: Checkout completion rate
Secondary: Time on pricing page, plan selection distribution, annual vs monthly split
Guardrail: Support ticket rate, 30-day churn rate, page load time

Segment Analysis

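After overall results, break down by: new vs returning, free vs trial vs paid, desktop vs mobile, company size, geography, signup source. An experiment may show no overall effect but have a strong positive effect for one segment and a negative effect for another. Because slicing many ways raises the chance of a false positive, treat segment findings as hypotheses for follow-up experiments.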

Analysis Framework

Step-by-Step

Wait for sufficient data: Reach pre-calculated sample size AND at least 1 full business cycle (1-2 weeks)
Check data quality: Check for sample ratio mismatch (SRM). A deviation of more than 1-2% from the planned split usually indicates an assignment or tracking bug.
Analyze primary metric: Check p-value (<0.05) or Bayesian probability (>95%). Calculate observed lift and confidence interval.
Check practical significance: Is the effect large enough to matter? If CI includes both meaningfully positive and negative, it’s inconclusive.
Check guardrails: Any degradation = NO-GO even if primary improved.
Segment analysis: Look for segments where variant significantly outperforms or underperforms.
Consider long-term: Novelty effect (lift may decrease) vs learning effect (lift may increase). Use holdout tests if uncertain.
Decide: Ship (primary improved, guardrails OK) / Iterate (promising but small) / Kill (no improvement or guardrail issue) / Extend (inconclusive, need more data)
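
The data-quality and primary-metric steps above can be sketched with the standard library. This is one frequentist way to do it, under stated assumptions: SRM is checked with a chi-square test (a common alternative to the 1-2% rule of thumb), and the primary metric uses a two-proportion z-test with a normal-approximation confidence interval. Function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def srm_p_value(n_control: int, n_variant: int, expected_control_share: float = 0.5) -> float:
    """Chi-square (1 degree of freedom) p-value for sample ratio mismatch."""
    total = n_control + n_variant
    exp_c = total * expected_control_share
    exp_v = total - exp_c
    chi2 = (n_control - exp_c) ** 2 / exp_c + (n_variant - exp_v) ** 2 / exp_v
    # With 1 degree of freedom, P(chi2 > x) equals P(|Z| > sqrt(x)).
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


def analyze_primary(conv_c: int, n_c: int, conv_v: int, n_v: int, alpha: float = 0.05) -> dict:
    """Two-proportion z-test plus a confidence interval on the absolute difference."""
    p_c, p_v = conv_c / n_c, conv_v / n_v
    diff = p_v - p_c
    pooled = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_v))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_pooled)))
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "relative_lift": diff / p_c,
        "p_value": p_value,
        "ci_abs": (diff - z_crit * se, diff + z_crit * se),
    }


# Made-up results: 500/5000 conversions in control, 600/5000 in the variant.
srm = srm_p_value(5000, 5000)
result = analyze_primary(500, 5000, 600, 5000)
print(f"SRM p-value: {srm:.3f}")
print(result)
```

A very small SRM p-value (many teams use a threshold around 0.001) suggests the split itself is broken, in which case the primary-metric numbers should not be trusted until the cause is found.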

Decision Matrix

                    Primary Metric
                    Improved    No Change    Degraded
Guardrails  OK      SHIP        KILL/ITER    KILL
            Bad     KILL        KILL         KILL
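
The matrix is small enough to encode directly, which can help keep decisions consistent across analysts. The function name and the string labels for the primary-metric outcome are assumptions for illustration.

```python
def decide(primary: str, guardrails_ok: bool) -> str:
    """Map the decision matrix to a recommendation.

    primary: "improved", "no_change", or "degraded"
    guardrails_ok: False if any guardrail metric degraded
    """
    if not guardrails_ok:
        # Any guardrail degradation is a no-go, whatever the primary metric did.
        return "KILL"
    outcomes = {
        "improved": "SHIP",
        "no_change": "KILL/ITERATE",
        "degraded": "KILL",
    }
    return outcomes[primary]


print(decide("improved", guardrails_ok=True))
print(decide("improved", guardrails_ok=False))
```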

Experiment Documentation Template

# Experiment: [Name]

## Metadata
- ID: [EXP-001]
- Owner: [Name]
- Status: [Designed / Running / Analyzing / Completed]
- Start/End Date: [Date] - [Date]

## Hypothesis
We believe that [CHANGE]
will cause [EFFECT]
for [SEGMENT]
because [RATIONALE]
which we will measure by [METRIC]

## Design
- Type: [A/B / MVT / Feature Flag / Fake Door]
- Traffic: [50/50 / 80/20 / etc.]
- Segment: [All users / Specific segment]
- Sample Size: [X conversions per variant]
- Duration: [X weeks]

## Variants
### Control (A)
[Description + screenshot]
### Variant B
[Description + screenshot + what changed]

## Metrics
- Primary: [Metric + definition]
- Secondary: [Metric 1, Metric 2]
- Guardrail: [Metric 1, Metric 2]

## Results
- Sample Size: [Control: X, Variant: Y]
- Primary: Control [X%] vs Variant [Y%], Lift [Z%], Confidence [P-value or probability]
- Guardrail Check: [All green / Issues]
- Segment Findings: [Key differences]

## Decision
[Ship / Iterate / Kill / Extend]
Rationale: [Why]

## Learnings
- [What did we learn?]
- [What would we test next?]

Experimentation Program Metrics

Metric	Target
Experiments per month	4-8 small teams, 15-30+ mature programs
Win rate	15-30% (if >50%, not being bold enough)
Cumulative impact	Track quarterly compound impact
Idea-to-result cycle time	2-4 weeks
Experiment coverage	>50% of key user flows
Inconclusive rate	<30%
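
Win rate and inconclusive rate are simple to track from a log of experiment outcomes. The sketch below assumes each completed experiment is recorded as one of "win", "lose", or "inconclusive"; the function name and the sample log are made up.

```python
from collections import Counter


def program_metrics(outcomes: list) -> dict:
    """Compute win rate and inconclusive rate from a list of outcome labels."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {
        "experiments": total,
        "win_rate": counts["win"] / total,
        "inconclusive_rate": counts["inconclusive"] / total,
    }


# Hypothetical quarter: 10 completed experiments.
log = ["win", "lose", "lose", "inconclusive", "lose",
       "win", "lose", "inconclusive", "lose", "lose"]
metrics = program_metrics(log)
print(metrics)
```

In this example the win rate is 20% and the inconclusive rate is 20%, both inside the targets in the table above.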

Weekly Review Meeting (45 min)

(10 min) Review completed experiment results
(5 min) Update pipeline status
(10 min) Deep dive on one interesting result
(10 min) Present top 3 backlog ideas
(5 min) Assign next sprint’s experiments
(5 min) Meta-metrics: velocity, win rate, pipeline health

Common Mistakes

Testing too many things at once: One hypothesis per experiment
Insufficient traffic: Focus on high-traffic areas
Wrong metrics: Connect to business value, not vanity clicks
HiPPO overriding data: Trust experimental evidence over opinions
Not running long enough: At least 1-2 full weeks for weekday/weekend patterns
No guardrail metrics: Always define what must not degrade
Not iterating on winners: A 10% lift is a starting point, not a finish line

Output Format

Deliverable 1: Experiment Design Document

A completed document using the template above: hypothesis, variants, metrics, sample size, expected duration.

Deliverable 2: Analysis Template

Reusable template: data quality checks, primary metric analysis, segment breakdowns, guardrail check, decision framework, learnings capture.

Deliverable 3: Sprint Backlog

Prioritized experiment ideas scored with ICE or RICE:

This sprint: Top 2-3 experiments to run now
Next sprint: Designed and ready to go
Backlog: Scored ideas waiting their turn

Cross-References

Related skills: plg-metrics, product-analytics, growth-modeling
