A/B Test Statistical Significance Calculator
Enter your control and variant visitor/conversion numbers and get an instant verdict: is your result statistically significant? The calculator uses a two-tailed z-test for difference in proportions — the correct method for A/B testing conversion rates — and shows you confidence level, z-score, p-value, and whether you have a winner.
How to Use This Calculator
- Control (A) — enter the number of visitors and conversions for your original version. Get this from your A/B testing tool or GA4 Funnel Exploration report.
- Variant (B) — enter the visitors and conversions for your test variant. Make sure both are from the same date range.
- Read the verdict — the calculator shows confidence level, relative lift, and a clear pass/fail/borderline result. Green = 95%+ confidence. Yellow = 90% to just under 95% (keep running). Red = below 90% (not significant).
- Check your sample size — significance alone is not enough. Use the Sample Size Calculator to confirm you've reached your pre-planned minimum before acting on this result.
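The last two steps can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual code: the function name `verdict` and its parameters are hypothetical, and it assumes the yellow band covers everything from 90% up to (but not including) 95%.

```python
def verdict(confidence_pct, visitors_per_arm, planned_visitors_per_arm):
    """Map a confidence level to the colour bands described above.

    Assumption: yellow covers 90% up to, but not including, 95%.
    """
    if confidence_pct >= 95:
        colour = "green"
    elif confidence_pct >= 90:
        colour = "yellow"
    else:
        colour = "red"

    # Significance alone is not enough: the pre-planned sample must be reached too.
    reached_sample = visitors_per_arm >= planned_visitors_per_arm
    actionable = colour == "green" and reached_sample
    return colour, actionable


# A green reading on an incomplete sample is not yet actionable.
print(verdict(96.2, visitors_per_arm=4_000, planned_visitors_per_arm=10_000))
print(verdict(96.2, visitors_per_arm=12_000, planned_visitors_per_arm=10_000))
```

Returning the colour and the "actionable" flag separately keeps the two checks visible: a test can be green and still not ready to ship.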
What the Numbers Mean
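- Confidence Level — 1 minus the p-value, shown as a percentage. It tells you how unlikely a difference this large would be if both versions actually converted at the same rate; it is not the probability that the variant is truly better. 95%+ is the standard threshold for declaring a winner.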
- Relative Lift — the percentage improvement of variant over control. A +15% relative lift means the variant converts 15% better than the baseline.
- Absolute Difference (pp) — the raw difference in conversion rate, in percentage points. A 2% vs 2.3% CVR = 0.3pp absolute difference.
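- Z-score — the number of standard errors the observed difference sits from zero, the value expected under the null hypothesis of no difference. An absolute value above 1.96 = significant at 95% (two-tailed).
- p-value — the probability of seeing a difference at least this large if control and variant truly converted at the same rate. Below 0.05 = significant at 95% confidence.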
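To make these definitions concrete, here is a minimal sketch of a two-tailed z-test for a difference in proportions, using the 2% vs 2.3% example from above. It assumes a pooled standard error and treats confidence as 1 minus the p-value; the calculator's exact implementation is not shown on this page, so the function name and these details are assumptions.

```python
import math


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-tailed z-test for a difference in conversion rates.

    Assumption: pooled standard error, the textbook form of the test.
    """
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Pooled rate: the shared conversion rate under the null hypothesis.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

    z = (rate_b - rate_a) / std_err
    # Two-tailed p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return {
        "absolute_diff_pp": (rate_b - rate_a) * 100,
        "relative_lift_pct": (rate_b - rate_a) / rate_a * 100,
        "z_score": z,
        "p_value": p_value,
        "confidence_pct": (1 - p_value) * 100,
    }


# 10,000 visitors per arm: 200 conversions (2.0%) vs 230 conversions (2.3%).
result = two_proportion_z_test(10_000, 200, 10_000, 230)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the lift is +15% relative (0.3pp absolute), yet the z-score is about 1.46, which is below 1.96, so the result is not significant at 95%.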
Frequently Asked Questions
What confidence level do I need to declare a winner?
95% is the industry standard (p-value < 0.05). For high-revenue decisions, use 99%. For low-stakes tests, 90% is the absolute minimum. Never ship based on anything below 90%.
Can I stop my test as soon as it hits 95% significance?
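No. You must also reach your pre-planned sample size AND run for at least 14 days to capture day-of-week variation. Stopping early (peeking) inflates false positives: checking repeatedly and stopping at the first significant reading can push the false-positive rate from the nominal 5% to 30% or more. Plan your end date before launch and stick to it.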
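The effect of peeking can be seen in a small simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first significant look against evaluating once at the planned end. All parameter values (number of tests, traffic, number of looks, the 5% baseline rate) are illustrative assumptions.

```python
import math
import random


def is_significant(n_a, conv_a, n_b, conv_b, z_crit=1.96):
    """Two-tailed pooled z-test at the 95% level."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return False  # no variation yet, so nothing to test
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return abs(conv_b / n_b - conv_a / n_a) / std_err > z_crit


def simulate_aa_tests(n_tests=300, visitors_per_arm=2_000, n_looks=10,
                      true_rate=0.05, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    step = visitors_per_arm // n_looks
    any_look_hits = 0
    final_look_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        hit_any = False
        significant = False
        for look in range(1, n_looks + 1):
            # Both arms draw from the same true rate: there is no real difference.
            conv_a += sum(rng.random() < true_rate for _ in range(step))
            conv_b += sum(rng.random() < true_rate for _ in range(step))
            n = look * step
            significant = is_significant(n, conv_a, n, conv_b)
            hit_any = hit_any or significant
        any_look_hits += hit_any
        final_look_hits += significant  # result of the last, planned look
    return any_look_hits / n_tests, final_look_hits / n_tests


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"False positives when stopping at the first significant look: {peeking_rate:.1%}")
print(f"False positives with a single planned look: {fixed_rate:.1%}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the single-look rate. Adding more looks gives chance more opportunities to cross the threshold.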
What is the p-value?
The probability of seeing a result at least this extreme if there were actually no difference between control and variant. Using p < 0.05 as the cutoff means accepting a 5% false-positive rate when no real difference exists. Lower = stronger evidence.
Why two-tailed instead of one-tailed?
A two-tailed test checks whether the variant is different in either direction, better or worse. Since variants can and do hurt performance, always use two-tailed. For the same data, a one-tailed test halves the p-value whenever the variant is ahead, which overstates significance and increases the chance of shipping a losing variant.
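A short sketch shows the difference for a single z-score. The function name `p_values` and the z-score of 1.75 are illustrative choices, and the one-tailed version assumes the alternative hypothesis "variant is better".

```python
import math


def p_values(z):
    """Return (two-tailed p, one-tailed p for 'variant is better')."""
    two_tailed = math.erfc(abs(z) / math.sqrt(2))
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    return two_tailed, one_tailed


# Variant ahead by 1.75 standard errors.
two_tailed, one_tailed = p_values(1.75)
print(f"z = +1.75: two-tailed p = {two_tailed:.4f}, one-tailed p = {one_tailed:.4f}")

# Variant behind by the same margin: the one-tailed test cannot flag the harm.
two_tailed_neg, one_tailed_neg = p_values(-1.75)
print(f"z = -1.75: two-tailed p = {two_tailed_neg:.4f}, one-tailed p = {one_tailed_neg:.4f}")
```

At z = +1.75 the one-tailed p-value falls below 0.05 while the two-tailed p-value stays above it, so the same data "wins" only under the one-tailed test. At z = -1.75 the one-tailed p-value is close to 1, so the test gives no warning that the variant is losing.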
My result is significant but the lift is tiny — should I ship it?
Statistical significance ≠ practical significance. A 0.1pp absolute lift might be significant with large enough traffic, but it's probably not worth the implementation effort. Consider your minimum detectable effect (MDE) — if the lift is below your MDE threshold, treat it as a null result and move to a bigger test.
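The gap between the two kinds of significance is easy to reproduce. The sketch below checks a 0.1pp lift (2.0% vs 2.1%) at two traffic levels against a hypothetical MDE of 0.3pp. Expressing the MDE in absolute percentage points, the function name `assess`, and the pooled z-test are all assumptions made for illustration.

```python
import math


def assess(visitors_a, conv_a, visitors_b, conv_b, mde_pp, alpha=0.05):
    """Return (statistically significant?, practically significant?, diff in pp)."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed

    diff_pp = (rate_b - rate_a) * 100
    statistically = p_value < alpha
    # Assumption: the MDE is given as an absolute difference in percentage points.
    practically = abs(diff_pp) >= mde_pp
    return statistically, practically, diff_pp


# Same 0.1pp lift, two traffic levels, hypothetical MDE of 0.3pp.
small = assess(10_000, 200, 10_000, 210, mde_pp=0.3)
large = assess(1_000_000, 20_000, 1_000_000, 21_000, mde_pp=0.3)
print("10k visitors per arm:", small)
print("1M visitors per arm: ", large)
```

The lift is identical in both runs. Only the million-visitor test reaches statistical significance, and in neither case does the lift clear the 0.3pp MDE.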
Not Sure What to Test Next?
Knowing whether a test won is step one. Knowing what to test, in what order, to maximise revenue impact — that's the hard part. A CRO audit builds the prioritised test roadmap so every experiment has the highest possible upside.
Get a Free CRO Audit