trustworthy-experiments

📁 pmprompt/claude-plugin-product-management 📅 13 days ago

Site-wide rank: #12295

Install command

npx skills add https://github.com/pmprompt/claude-plugin-product-management --skill trustworthy-experiments

Installs by agent

gemini-cli 24

opencode 23

github-copilot 23

codex 23

kimi-cli 23

amp 23

Skill documentation

Trustworthy Experiments

What It Is

Trustworthy Experiments is a framework for running controlled experiments (A/B tests) that produce reliable, actionable results. The core insight: most experiments fail, and many “successful” results are actually false positives.

The key shift: Move from “Did the experiment show a positive result?” to “Can I trust this result enough to act on it?”

Ronny Kohavi, who built experimentation platforms at Microsoft, Amazon, and Airbnb, found that:

66-92% of experiments fail to improve the target metric
8% of experiments have invalid results due to sample ratio mismatch alone
When only 8% of experiments truly succeed, a statistically significant result at p < 0.05 still carries about a 26% risk of being a false positive
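
The 26% figure can be reproduced with Bayes' rule. The sketch below is illustrative only: it assumes 80% statistical power and a two-sided significance level of 0.05 (so 0.025 for the positive tail), neither of which is stated in this document, and the function name `false_positive_risk` is made up for the example.

```python
def false_positive_risk(prior_success: float,
                        alpha: float = 0.05,
                        power: float = 0.80) -> float:
    """Probability that a statistically significant positive result is a false positive.

    prior_success: share of experiments that truly improve the metric.
    alpha: two-sided significance level (assumed; only half applies to "wins").
    power: probability of detecting a real improvement (assumed 80%).
    """
    if not 0.0 < prior_success < 1.0:
        raise ValueError("prior_success must be strictly between 0 and 1")
    one_sided_alpha = alpha / 2
    # Significant "wins" that come from experiments with no real effect.
    false_wins = one_sided_alpha * (1.0 - prior_success)
    # Significant wins that come from experiments with a real effect.
    true_wins = power * prior_success
    return false_wins / (false_wins + true_wins)


if __name__ == "__main__":
    risk = false_positive_risk(prior_success=0.08)
    print(f"False positive risk at an 8% success rate: {risk:.0%}")
```

Under these assumptions, lowering the significance threshold or raising the prior success rate both reduce the risk, which is why a low base rate calls for stricter evidence before shipping.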

When to Use It

Use Trustworthy Experiments when you need to:

Design an A/B test that will produce valid, actionable results
Determine sample size and runtime for statistical power
Validate experiment results before making ship/no-ship decisions
Build an experimentation culture at your company
Choose an Overall Evaluation Criterion (OEC) that balances short-term gains with long-term value
Diagnose why results look suspicious (Twyman’s Law)
Speed up experimentation without sacrificing validity
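
Two of the tasks above, sizing a test and checking whether a result can be trusted, lend themselves to quick back-of-the-envelope checks. The sketch below is not taken from the skill itself. It assumes the common rule of thumb n ≈ 16σ²/δ² per variant (roughly 80% power at a 0.05 significance level) and a chi-square test with one degree of freedom for sample ratio mismatch. The function names, the example counts, and the 0.001 alert threshold are all illustrative assumptions.

```python
import math


def sample_size_per_variant(baseline_rate: float, relative_lift: float) -> int:
    """Rule-of-thumb users needed per variant for a conversion-rate metric.

    Uses n ~= 16 * sigma^2 / delta^2, where sigma^2 = p * (1 - p) and
    delta is the absolute change to detect.
    """
    variance = baseline_rate * (1.0 - baseline_rate)
    delta = baseline_rate * relative_lift
    return math.ceil(16.0 * variance / delta ** 2)


def srm_p_value(control_users: int, treatment_users: int,
                expected_control_share: float = 0.5) -> float:
    """p-value of a chi-square test (1 degree of freedom) for sample ratio mismatch."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # With one degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical inputs: 5% baseline conversion, 5% relative lift to detect.
    n = sample_size_per_variant(baseline_rate=0.05, relative_lift=0.05)
    print(f"Users needed per variant: about {n:,}")

    # Hypothetical counts for a test designed as a 50/50 split.
    p = srm_p_value(control_users=50_000, treatment_users=48_500)
    verdict = "investigate before trusting results" if p < 0.001 else "split looks fine"
    print(f"SRM p-value: {p:.2e} ({verdict})")
```

The sample size estimate also gives a rough sense of runtime: divide the total users needed by the daily traffic entering the experiment.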

When Not to Use It

Don’t use controlled experiments when:

You don’t have enough users: you need tens of thousands at minimum
The decision is one-time: you can’t A/B test mergers or acquisitions
There’s no real user choice: for example, employer-mandated software
You need immediate decisions: experiments need time to run
The metric can’t be measured: there is no experiment without observable outcomes

Resources

Book:

Trustworthy Online Controlled Experiments by Ronny Kohavi, Diane Tang, and Ya Xu
