A/B Test Statistical Significance Calculator

Enter your control and variant visitor/conversion numbers and get an instant verdict: is your result statistically significant? The calculator uses a two-tailed z-test for difference in proportions — the correct method for A/B testing conversion rates — and shows you confidence level, z-score, p-value, and whether you have a winner.


How to Use This Calculator

  1. Control (A) — enter the number of visitors and conversions for your original version. Get this from your A/B testing tool or GA4 Funnel Exploration report.
  2. Variant (B) — enter the visitors and conversions for your test variant. Make sure both are from the same date range.
  3. Read the verdict — the calculator shows confidence level, relative lift, and a clear pass/fail/borderline result. Green = 95%+ confidence. Yellow = 90–94% (keep running). Red = below 90% (not significant).
  4. Check your sample size — significance alone is not enough. Use the Sample Size Calculator to confirm you've reached your pre-planned minimum before acting on this result.
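The colour bands in step 3 can be sketched as a small lookup. This is only an illustration, not the calculator's actual code: the function name `verdict` is hypothetical, and treating everything from 90% up to (but not including) 95% as yellow is an assumption, since the page lists the band as 90–94%.

```python
def verdict(confidence_pct: float) -> str:
    """Map a confidence level (0-100) to the green/yellow/red bands.

    Assumption: values between 94% and 95% fall in the yellow band.
    """
    if confidence_pct >= 95.0:
        return "green: significant"
    if confidence_pct >= 90.0:
        return "yellow: borderline, keep running"
    return "red: not significant"


if __name__ == "__main__":
    for level in (97.3, 92.0, 71.5):
        print(level, "->", verdict(level))
```

A green verdict still needs the sample-size check from step 4 before you act on it.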

What the Numbers Mean

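  • Confidence Level — how strongly the data argue against "no real difference", corresponding to 1 minus the p-value (p < 0.05 is 95%+ confidence). It is not the probability that the difference is real. 95%+ is the standard threshold for declaring a winner.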
  • Relative Lift — the percentage improvement of variant over control. A +15% relative lift means the variant converts 15% better than the baseline.
  • Absolute Difference (pp) — the raw difference in conversion rate, in percentage points. A 2% vs 2.3% CVR = 0.3pp absolute difference.
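  • Z-score — the number of standard errors the observed difference sits from zero, the value expected under the null hypothesis of no difference. An absolute value above 1.96 = 95% significance (two-tailed).
  • p-value — the probability of seeing a difference at least this large if control and variant truly converted at the same rate. Below 0.05 = significant at 95% confidence.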
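To make the definitions above concrete, here is a minimal sketch of a two-tailed z-test for a difference in proportions using only the Python standard library. It assumes a pooled standard error and reports confidence as 1 minus the p-value; the calculator's exact internals are not stated on this page, so treat both as assumptions. The function name `ab_test_summary` and the output keys are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ab_test_summary(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-tailed z-test for a difference in conversion rates.

    Assumes a pooled standard error, at least one conversion in control,
    and a pooled rate strictly between 0 and 1 (otherwise it divides by zero).
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

    z = (rate_b - rate_a) / std_err
    # Two-tailed: count both tails of the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "absolute_diff_pp": (rate_b - rate_a) * 100,
        "relative_lift_pct": (rate_b - rate_a) / rate_a * 100,
        "z_score": z,
        "p_value": p_value,
        "confidence_pct": (1 - p_value) * 100,
    }


if __name__ == "__main__":
    # 2% vs 2.3% CVR on 10,000 visitors each (illustrative numbers).
    for name, value in ab_test_summary(10_000, 200, 10_000, 230).items():
        print(f"{name}: {value:.4f}")
```

With these illustrative inputs the absolute difference is 0.3pp and the relative lift is 15%, yet the p-value stays above 0.05, so the result would not count as significant at 95%.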

Frequently Asked Questions

What confidence level do I need to declare a winner?

95% is the industry standard (p-value < 0.05). For high-revenue decisions, use 99%. For low-stakes tests, 90% is the absolute minimum. Never ship based on anything below 90%.
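Each confidence level corresponds to a two-tailed critical z-score. The sketch below derives those cut-offs with `statistics.NormalDist` from the standard library; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(confidence_pct: float) -> float:
    """Two-tailed critical z-score for a confidence level given in percent."""
    alpha = 1 - confidence_pct / 100
    # Split alpha evenly across both tails of the standard normal.
    return NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    for level in (90, 95, 99):
        print(level, round(critical_z(level), 3))
    # prints:
    # 90 1.645
    # 95 1.96
    # 99 2.576
```

Moving from 95% to 99% raises the bar from |z| > 1.96 to |z| > 2.576, which is why high-revenue decisions need noticeably more data.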

Can I stop my test as soon as it hits 95% significance?

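No. You must also reach your pre-planned sample size AND run for at least 14 days to capture day-of-week variation. Checking repeatedly and stopping the first time the test crosses 95% (peeking) pushes the false-positive rate well above the nominal 5%; with frequent checks it can exceed 30%. Plan your end date before launch and stick to it.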
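The effect of peeking can be illustrated with a small A/A simulation, where both arms share the same true conversion rate and any "significant" result is therefore a false positive. This is a rough sketch under assumed parameters (300 runs, 20 looks, 100 visitors per arm between looks, a 10% conversion rate); the function names are hypothetical and the exact rates depend on those choices and on the random seed.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(n_a, c_a, n_b, c_b, alpha=0.05):
    """Two-tailed pooled z-test; returns True when p < alpha."""
    pooled = (c_a + c_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return False  # no variation yet, so nothing to test
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (c_b / n_b - c_a / n_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate_peeking(runs=300, looks=20, batch=100, rate=0.10, seed=1):
    """Return (false-positive rate with peeking, rate when testing once at the end)."""
    rng = random.Random(seed)
    flagged_at_any_look = 0
    flagged_at_end = 0
    for _ in range(runs):
        n = c_a = c_b = 0
        any_look = False
        last_look = False
        for _ in range(looks):
            n += batch
            # Both arms draw from the same true rate: an A/A test.
            c_a += sum(rng.random() < rate for _ in range(batch))
            c_b += sum(rng.random() < rate for _ in range(batch))
            last_look = is_significant(n, c_a, n, c_b)
            any_look = any_look or last_look
        flagged_at_any_look += any_look
        flagged_at_end += last_look
    return flagged_at_any_look / runs, flagged_at_end / runs


if __name__ == "__main__":
    peeking, fixed = simulate_peeking()
    print(f"stop at first significant look: {peeking:.1%}")
    print(f"test once at planned end:       {fixed:.1%}")
```

The peeking rate can never be lower than the end-only rate, because the final look is one of the looks a peeker checks; the more often you look, the wider the gap tends to get.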

What is the p-value?

The probability of seeing a result at least this extreme if there were actually no difference between control and variant. p = 0.05 means that, with no real difference, a result this extreme would still turn up about 5% of the time. Lower = stronger evidence.

Why two-tailed instead of one-tailed?

A two-tailed test checks whether the variant is different in either direction, better or worse. Since variants can and do hurt performance, always use two-tailed. For the same data, a one-tailed test halves the p-value whenever the variant is ahead, so borderline results look significant and the chance of shipping a losing variant goes up.
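A short sketch of the difference, assuming the one-tailed alternative is "variant is better than control". The function name `p_values` and the example z-score of 1.70 are illustrative only.

```python
from statistics import NormalDist


def p_values(z: float) -> tuple[float, float]:
    """Return (one-tailed p, two-tailed p) for a z-score.

    The one-tailed value tests only "variant is better" (upper tail).
    """
    normal = NormalDist()
    one_tailed = 1 - normal.cdf(z)
    two_tailed = 2 * (1 - normal.cdf(abs(z)))
    return one_tailed, two_tailed


if __name__ == "__main__":
    one, two = p_values(1.70)
    print(f"one-tailed p = {one:.4f}")  # below 0.05
    print(f"two-tailed p = {two:.4f}")  # above 0.05
```

The same z-score of 1.70 passes a one-tailed test at 95% but fails the two-tailed test, which is exactly the kind of borderline result that is risky to ship.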

My result is significant but the lift is tiny — should I ship it?

Statistical significance ≠ practical significance. A 0.1pp absolute lift might be significant with large enough traffic, but it's probably not worth the implementation effort. Consider your minimum detectable effect (MDE) — if the lift is below your MDE threshold, treat it as a null result and move to a bigger test.
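The two checks can be combined into one decision rule. The sketch below is an assumption-laden illustration rather than a recommended policy: the function name `ship_decision` is hypothetical, and it assumes the MDE is expressed as an absolute difference in percentage points.

```python
def ship_decision(p_value: float, absolute_diff_pp: float, mde_pp: float,
                  alpha: float = 0.05) -> str:
    """Combine statistical and practical significance.

    Assumption: mde_pp is the minimum detectable effect in percentage points.
    """
    if p_value >= alpha:
        return "not significant"
    if abs(absolute_diff_pp) < mde_pp:
        return "significant but below MDE: treat as null"
    if absolute_diff_pp < 0:
        return "significant loss: keep control"
    return "significant and meaningful: ship"


if __name__ == "__main__":
    # A 0.1pp lift that is significant on large traffic, against a 0.3pp MDE.
    print(ship_decision(p_value=0.01, absolute_diff_pp=0.1, mde_pp=0.3))
```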

Not Sure What to Test Next?

Knowing whether a test won is step one. Knowing what to test, in what order, to maximise revenue impact — that's the hard part. A CRO audit builds the prioritised test roadmap so every experiment has the highest possible upside.

Get a Free CRO Audit