Confidence Interval

A range of values within which the true effect of an A/B test variant is likely to fall — more informative than a single point estimate.

By Mario Kuren

A confidence interval (CI) is a range of values within which the true effect of an A/B test is likely to fall, with a specified level of confidence.

Example: “Variant B shows a 15% improvement in conversion rate, or CVR (95% CI: 4.2%–25.8%)”

This means: with 95% confidence, the true effect of the variant is somewhere between a 4.2% and 25.8% improvement over control. The 15% is the point estimate (most likely value); the CI tells you how certain you should be about that estimate.
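A result like this can be computed directly from raw counts. The sketch below uses a normal approximation on the difference of proportions and expresses the bounds relative to the control rate (a common simplification); the visitor and conversion counts are illustrative, not taken from the example above:

```python
import math

def lift_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """Relative lift of B over A with an approximate 95% CI.

    Normal approximation on the difference of proportions; bounds
    are divided by the control rate to express them as relative lift.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    lo, hi = diff - z * se, diff + z * se
    return diff / p_a, lo / p_a, hi / p_a

# Illustrative counts: 5.00% control CVR vs 5.75% variant CVR
lift, lo, hi = lift_ci(conv_a=1000, n_a=20000, conv_b=1150, n_b=20000)
print(f"Lift {lift:.1%} (95% CI: {lo:.1%} to {hi:.1%})")
```

With these counts the point estimate is a 15% lift, but the interval spans roughly 6% to 24% — the CI is what tells you how much trust to place in that 15%.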

What Confidence Interval Actually Means

A 95% confidence interval does not mean “there is a 95% probability that the true value is in this range.” (This is the most common misinterpretation.)

What it actually means: if you conducted this experiment 100 times with fresh samples each time and calculated a 95% CI from each, approximately 95 of those 100 CIs would contain the true value.

Practically: A 95% CI gives you a plausible range for the true effect, constructed using a method that captures the true value 95% of the time.
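This operational definition can be checked by simulation: generate many experiments with a known true conversion rate, compute a 95% CI from each, and count how often the interval actually captures the truth. A minimal sketch with illustrative parameters:

```python
import math
import random

random.seed(0)  # reproducible illustration
TRUE_P, N, Z = 0.05, 2000, 1.96  # known true rate, sample size, 95% z
runs, covered = 500, 0

for _ in range(runs):
    # Simulate one experiment: N visitors, each converting with prob TRUE_P
    conversions = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = conversions / N
    se = math.sqrt(p_hat * (1 - p_hat) / N)
    # Does this run's 95% CI contain the true rate?
    if p_hat - Z * se <= TRUE_P <= p_hat + Z * se:
        covered += 1

print(f"{covered} of {runs} intervals contain the true rate")  # roughly 95%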

CI Width and What It Tells You

| CI Width | Interpretation | Cause | Action |
| --- | --- | --- | --- |
| Narrow (e.g., 13%–17%) | Precise estimate, high confidence in magnitude | Large sample size | Strong basis for decision |
| Moderate (e.g., 8%–22%) | Reasonable estimate with expected uncertainty | Adequate sample | Proceed with informed caution |
| Wide (e.g., −2% to 32%) | Highly uncertain estimate | Underpowered test | Extend test or collect more data |
| CI includes zero | No detected effect | True null or underpowered | Do not ship |
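The driver behind these widths is sample size: the half-width of a normal-approximation CI scales roughly with 1/√n, so 10× the traffic narrows the interval by about 3.16×. A numeric sketch, assuming a fixed observed conversion rate of 5%:

```python
import math

P_HAT, Z = 0.05, 1.96  # assumed observed rate, 95% z-value
for n in (1_000, 10_000, 100_000):
    half = Z * math.sqrt(P_HAT * (1 - P_HAT) / n)  # half-width of the CI
    print(f"n={n:>7,}: {P_HAT - half:.4f} to {P_HAT + half:.4f} "
          f"(width {2 * half:.4f})")
```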

CI vs P-Value: The Full Picture

P-value and confidence interval are complementary:

| Metric | Tells you | Doesn’t tell you |
| --- | --- | --- |
| P-value | Is the result likely due to chance? | How large or precise the effect is |
| Confidence interval | The plausible range of the effect | Whether the result is “significant” |

Best practice: Always report both.

  • Significant p-value + narrow CI = high confidence in the effect size → ship
  • Significant p-value + very wide CI = significant but imprecise → consider extending
  • Non-significant p-value + CI that includes zero = no detectable effect → null result
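These three rules can be expressed as a small decision helper. The significance and width thresholds below are illustrative assumptions, not standards — calibrate them to your own implementation cost and MDE:

```python
def decide(p_value, ci_low, ci_high, alpha=0.05, max_width=0.25):
    """Combine p-value and CI into one of the three verdicts above.

    alpha and max_width are assumed example thresholds.
    """
    if p_value >= alpha or ci_low <= 0 <= ci_high:
        return "null result - do not ship"
    if ci_high - ci_low > max_width:
        return "significant but imprecise - consider extending"
    return "high confidence in the effect size - ship"

print(decide(0.01, 0.13, 0.17))   # narrow CI -> ship
print(decide(0.04, 0.02, 0.40))   # wide CI -> consider extending
print(decide(0.30, -0.02, 0.32))  # not significant -> null result
```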

Practical Application: “Is This Worth Shipping?”

Use the lower bound of the confidence interval for conservative decision-making:

Example: Test shows 15% CVR improvement, 95% CI: 4%–26%.

Question: “Even in the worst case (4% improvement), is this variant worth implementing?”

If yes → ship. The minimum realistic benefit (lower CI bound) still exceeds your implementation cost.

If no → the uncertainty means the risk of implementing isn’t justified by the potential reward.
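Reduced to code, this check is a single comparison between the CI's lower bound and a break-even lift (the numbers below are assumed for illustration):

```python
def worth_shipping(ci_lower, break_even_lift):
    """Ship only if even the worst plausible lift covers the cost."""
    return ci_lower >= break_even_lift

# Example from the text: 15% lift, 95% CI 4%-26%
print(worth_shipping(ci_lower=0.04, break_even_lift=0.02))  # True: ship
print(worth_shipping(ci_lower=0.04, break_even_lift=0.10))  # False: risk not justified
```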

Confidence Intervals and Sequential Testing

For teams using sequential testing (making ongoing decisions as data comes in), confidence intervals are especially important because:

  • Point estimates fluctuate wildly early in a test
  • CI width decreases as sample size grows
  • Decisions made when CI is still wide are far more likely to be wrong

The standard: wait until the CI has narrowed enough that the lower bound exceeds your minimum acceptable effect size.
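A sketch of that stopping rule, checked at successive interim looks (the checkpoint intervals are made up for illustration). Note that a real sequential design also needs corrections for repeated peeking, such as alpha spending or always-valid intervals, beyond this naive check:

```python
def ready_to_stop(ci_lower, min_effect):
    """Stop once the CI's lower bound exceeds the minimum acceptable effect."""
    return ci_lower > min_effect

MIN_EFFECT = 0.03  # assumed minimum acceptable lift
checkpoints = [(-0.05, 0.35), (0.01, 0.29), (0.06, 0.24)]  # (lo, hi) narrowing over time
for lo, hi in checkpoints:
    print(f"CI {lo:+.2f} to {hi:+.2f}: stop={ready_to_stop(lo, MIN_EFFECT)}")
```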

For the complete framework on when to stop a test, see How Long Should You Run an A/B Test?.

Frequently Asked Questions

What is a confidence interval in A/B testing?

A confidence interval (CI) is a range of values within which the true conversion rate effect is likely to fall, with a specified level of confidence. A 95% confidence interval means: if you ran this exact test 100 times, approximately 95 of those tests would produce a confidence interval that contains the true effect. In A/B testing, a result might show “Variant B improves CVR by 15% (95% CI: 3% to 27%)” — meaning the true improvement could plausibly be anywhere from 3% to 27%.

What does a wide vs narrow confidence interval mean?

The width of a confidence interval reflects the precision of your estimate — which is primarily determined by sample size. A narrow CI (e.g., 14%–16% improvement) means you have a precise estimate of the true effect. A wide CI (e.g., 2%–28%) means high uncertainty — the true effect could be anywhere in that range. Wide confidence intervals are typical in underpowered tests (insufficient sample size). They don't necessarily mean the result is wrong, but they mean you should be cautious about acting on it.

Should I use p-value or confidence interval to judge A/B test results?

Use both — they provide complementary information. P-value tells you whether the result is statistically significant (unlikely to be random noise). Confidence interval tells you the magnitude and precision of the effect. A statistically significant result with a very wide CI might technically be significant but practically uncertain. A result where the lower bound of the CI is still above your MDE is more compelling than one where the CI spans from near-zero to very large. Together, they give you significance AND practical meaning.