A/B Testing

A controlled experiment comparing two versions of a webpage to determine which produces more conversions.

By Mario Kuren · March 2026 · Updated March 2026

A/B testing is a randomised controlled experiment that compares two versions of a webpage, email, or interface element to determine which produces a higher conversion rate. Version A (the control) represents the current design; Version B (the variant) contains a single proposed change.

Traffic is randomly split between both versions. After collecting sufficient data, statistical analysis determines whether the observed difference in conversion rate is likely real or the result of random variation.
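One common way to implement the random split is deterministic hashing, so a returning visitor always sees the same version. The sketch below is illustrative only: the function name `assign_variant`, the experiment key `homepage-headline`, and the visitor ID format are assumptions, not part of any particular testing tool.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "homepage-headline") -> str:
    """Deterministically map a visitor to 'A' or 'B' with a roughly 50/50 split."""
    # Hash the experiment name together with the visitor id so the same visitor
    # always gets the same version, and separate experiments split independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # integer in 0..99
    return "A" if bucket < 50 else "B"

# Rough check of the split on hypothetical visitor ids.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"visitor-{i}")] += 1
print(counts)
```

Hashing rather than calling a random generator on every page view keeps the experience consistent across sessions without storing an assignment table.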

How A/B Testing Works

1. Identify a conversion problem: use analytics, heatmaps, and session recordings to find pages where visitors drop off or fail to convert.
2. Form a hypothesis: “Because we observed [data], we believe [change] will improve [metric] for [segment].”
3. Calculate required sample size: before the test starts, determine how many visitors per variant you need at your chosen significance level and power.
4. Run the test: split traffic 50/50 and collect data until both the sample size and the minimum duration are met.
5. Analyse results: check statistical significance and effect size, and segment by device, source, and audience.
6. Implement or discard: ship winners and log learnings from both outcomes.

The hypothesis step is often skipped, yet it is what separates systematic CRO from random tinkering. Every test should be connected to a specific observation from user research.

What to A/B Test (and What Not to)

Element	Impact Potential	Notes
Headlines and value proposition	Very high — often 20–50% lift	Start here
CTA copy and placement	High	First-person copy typically wins
Hero section / above the fold	High	Affects first impression and bounce
Social proof placement and type	Medium	Specific testimonials beat generic
Form length	Medium	Remove unnecessary fields
Page layout	Medium	Requires design resources
Button colour	Low	Only matters if current has no contrast

The biggest A/B testing gains come from copy, offer framing, and trust architecture — not cosmetic changes. For full guidance on test prioritisation, see A/B Testing Best Practices.

A/B Test Benchmarks and Effect Sizes

Practitioner reports suggest that well-researched A/B tests produce roughly these outcomes over time (figures are approximate and vary by programme):

Outcome	Frequency
Statistically significant winner	~25–30% of tests
Inconclusive (insufficient data)	~40–50% of tests
Control wins (variant loses)	~15–20% of tests
Statistically significant loser	~10% of tests

This means most tests don’t produce clear winners, and that’s expected. The value of A/B testing is cumulative: the tests that do win compound into significant long-term CVR improvement. Kohavi et al. at Microsoft reported that only about one in three ideas tested there improved the metric it was designed to improve.

The Peeking Problem

The most common A/B testing mistake: checking results before hitting your sample size and stopping when you see a winning variant.

Statistical significance fluctuates constantly during a test. A variant showing 95% confidence on day 3 may drop to 60% by day 14. If you stop on day 3, you’ve shipped a false positive.

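Checking results repeatedly and stopping at the first significant reading inflates the false positive rate well above the nominal 5%: with five evenly spaced checks it is roughly 14%, and it keeps climbing with every additional look. The fix: decide when the test ends before it starts, and don’t open the dashboard until then.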
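A small simulation makes the inflation concrete. The sketch below runs A/A tests, where both arms share the same true conversion rate so any "winner" is a false positive, and compares a single final check against stopping at the first significant interim check. The arm size, conversion rate, number of runs, and function names are illustrative assumptions.

```python
import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(looks, n_per_arm=1000, rate=0.10, runs=1500, seed=1):
    """Share of A/A tests declared 'significant' when checked `looks` times."""
    rng = random.Random(seed)
    # Evenly spaced checkpoints; the last one is the planned sample size.
    checkpoints = [n_per_arm * k // looks for k in range(1, looks + 1)]
    hits = 0
    for _ in range(runs):
        conv_a = conv_b = seen = 0
        for stop in checkpoints:
            for _ in range(stop - seen):
                conv_a += rng.random() < rate  # both arms convert at the same rate
                conv_b += rng.random() < rate
            seen = stop
            if p_value(conv_a, seen, conv_b, seen) < 0.05:
                hits += 1  # stopped early on a difference that does not exist
                break
    return hits / runs

single = false_positive_rate(looks=1)
peeking = false_positive_rate(looks=5)
print(f"one final check: {single:.1%}   five checks: {peeking:.1%}")
```

The single-check rate should land near the nominal 5%, while the five-check rate comes out well above it; exact values depend on the seed and the simulation settings.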

For the full list of statistical mistakes that invalidate test results, see A/B Testing Mistakes.

A/B Testing vs Multivariate Testing

	A/B Test	Multivariate Test
What’s tested	One element, two variations	Multiple elements simultaneously
Traffic needed	Lower	Much higher (5–10× more)
Results	Which version wins	Which combination of elements wins
Best for	90%+ of all tests	High-traffic pages with multiple hypotheses

A/B testing is the right tool for the vast majority of CRO scenarios. Multivariate testing requires enough traffic to support many variant combinations simultaneously — typically 100,000+ monthly sessions. See Multivariate Testing for when to escalate.
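The traffic gap follows from simple arithmetic: every combination of elements is its own cell and needs its own sample. The sketch below assumes a full-factorial design in which each cell needs the same sample as one arm of an A/B test; the element counts and the 15,000-visitor figure are hypothetical.

```python
from math import prod

def multivariate_traffic(levels_per_element, visitors_per_cell):
    """Return (number of combinations, total visitors) for a full-factorial test."""
    cells = prod(levels_per_element)  # e.g. 3 headlines x 2 images x 2 CTAs
    return cells, cells * visitors_per_cell

cells, total = multivariate_traffic([3, 2, 2], visitors_per_cell=15_000)
ab_total = 2 * 15_000  # a plain A/B test has only two cells
print(cells, total, total / ab_total)
```

Under these assumptions, three headlines, two images, and two CTAs give 12 combinations and six times the traffic of a two-version test.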

When to Use A/B Testing vs Other Methods

Not every conversion problem requires an A/B test. Use this framework to decide:

Situation	Recommended approach
Clear UX problem identified in usability testing	Fix without testing
Hypothesis based on analytics + qualitative data	A/B test
Site with under 5,000 monthly conversions	Focus on qualitative research, test only high-confidence hypotheses
Multiple competing hypotheses on same page	A/B test sequentially, not simultaneously
Critical bug or legal requirement	Fix immediately, no test needed

For teams with limited traffic, read CRO for Low-Traffic Sites — A/B testing requires adequate sample sizes and there are more appropriate research methods below certain traffic thresholds.

Sample Size by Baseline Conversion Rate

Pre-calculating sample size is non-negotiable. Tests stopped before reaching the required sample produce false positives at a dramatically elevated rate.

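Baseline CVR	MDE (relative)	Visitors per variant needed
1%	20%	~42,700
2%	15%	~36,700
3%	15%	~24,200
5%	10%	~31,200
10%	10%	~14,700

At 80% statistical power and a 95% confidence level (two-sided), using the standard normal-approximation formula for comparing two proportions. Other calculators use slightly different approximations and may return somewhat different figures.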

For the exact calculation methodology, see How Long to Run an A/B Test.
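As a rough guide to where such figures come from, the sketch below applies the textbook normal-approximation formula for comparing two proportions (two-sided test, unpooled variance). The function name and defaults are assumptions, and dedicated calculators make slightly different approximations, so treat the output as a ballpark.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # conversion rate if the lift is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

for cvr, mde in [(0.01, 0.20), (0.03, 0.15), (0.10, 0.10)]:
    print(f"baseline {cvr:.0%}, MDE {mde:.0%}: {visitors_per_variant(cvr, mde):,}")
```

Because the required sample scales with the inverse square of the absolute difference, halving the MDE roughly quadruples the visitors needed.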

Common A/B Testing Mistakes

Testing without a hypothesis — Changes made without research backing are random guesses
Running too many tests simultaneously — Overlapping tests pollute each other’s data
Stopping at significance without hitting sample size — The peeking problem in practice
Not segmenting results — A test that “loses” overall may win on mobile or for paid traffic
Ignoring interaction effects — A winning headline may perform differently with a different hero image
Treating null results as failures — A test that shows no difference is still valuable learning

Tools for A/B Testing

Popular platforms include VWO, Optimizely, AB Tasty, and Convert. Statistical analysis can also be done manually using a chi-squared test or a dedicated significance calculator.
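For a manual check, a pooled two-proportion z-test gives the same p-value as a chi-squared test on the 2×2 table without continuity correction. The sketch below is a minimal version using only the standard library; the function name and the visitor and conversion counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Pooled z-test plus a confidence interval for the absolute difference."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    diff = rate_b - rate_a
    # Pooled standard error: assumes no true difference (the null hypothesis).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_pooled)))
    # Unpooled standard error for the interval around the observed difference.
    se_diff = (rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b) ** 0.5
    margin = NormalDist().inv_cdf(0.5 + confidence / 2) * se_diff
    return {
        "relative_lift": diff / rate_a,
        "p_value": p_value,
        "ci": (diff - margin, diff + margin),
    }

result = two_proportion_test(conv_a=300, n_a=10_000, conv_b=360, n_b=10_000)
print(result)
```

Reading the interval alongside the p-value shows how large or small the real difference could plausibly be, not just whether it cleared the threshold.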

Running tests correctly requires more than a tool — it requires a structured testing methodology that prevents common statistical errors. A/B testing is the primary delivery mechanism of any CRO programme — every insight from research eventually becomes a test hypothesis.

Understanding statistical significance and confidence intervals is prerequisite reading before interpreting results.

Frequently Asked Questions

What is A/B testing?

A/B testing (also called split testing) is a controlled experiment that compares two versions of a webpage, email, or interface element, Version A (control) and Version B (variant), to determine which produces more conversions. Visitor traffic is randomly split between both versions and statistical analysis determines whether the difference is real or due to chance. The approach predates the web, but online controlled experiments were popularised in the 2000s by practitioners such as Ron Kohavi at Microsoft, and the method is now the gold standard for evidence-based conversion optimisation.

How long should an A/B test run?

An A/B test should run for a minimum of 14 days (two complete weekly cycles) AND until each variant reaches the pre-calculated minimum sample size, whichever takes longer. Stopping tests early, even when results look significant, leads to false positives. Ronny Kohavi and Roger Longbotham, among others, have warned against this: checking results repeatedly and stopping at the first significant reading pushes the false positive rate well above the nominal 5% (roughly 14% with five evenly spaced checks). The 14-day minimum accounts for weekly behavioural cycles: Tuesday traffic converts differently from Sunday traffic.
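The "whichever takes longer" rule can be sketched as a small calculation. The function name and traffic figures below are hypothetical, and rounding up to whole weeks is an added assumption so that every weekday is represented equally.

```python
from math import ceil

def run_length_days(visitors_per_variant, daily_visitors, variants=2, min_days=14):
    """Days to run: the longer of the minimum duration and the time to reach sample size."""
    days_for_sample = ceil(visitors_per_variant * variants / daily_visitors)
    whole_weeks = ceil(days_for_sample / 7) * 7  # assumption: round up to full weeks
    return max(min_days, whole_weeks)

print(run_length_days(24_000, daily_visitors=3_000))  # sample size is the binding constraint
print(run_length_days(5_000, daily_visitors=2_000))   # the 14-day minimum is the binding constraint
```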

How many visitors do I need for an A/B test?

Sample size requirements depend on your baseline conversion rate, minimum detectable effect (MDE), statistical power (typically 80%), and confidence level (typically 95%, i.e. a 5% significance level). At a 3% baseline CVR targeting a 15% relative improvement, you need approximately 24,000 visitors per variant. At a 1% baseline with the same MDE, you need roughly 74,000 per variant. Always calculate sample size before starting, not after, using Evan Miller's sample size calculator (evanmiller.org/ab-testing/sample-size.html) or VWO's duration calculator.

What is the most important element to A/B test first?

The highest-impact A/B tests, in order of typical effect size, are: (1) headlines and value proposition copy, which often produce 20–50% CVR differences, (2) CTA copy and placement, (3) hero section or above-the-fold layout, (4) social proof type and position, (5) form length. Button colour is frequently tested but rarely moves the needle meaningfully. Start with the hypothesis most grounded in research (a customer interview insight or a session recording observation), not with cosmetic changes.

What is the peeking problem in A/B testing?

The peeking problem is the practice of checking A/B test results before reaching the pre-set sample size and stopping the test early when a 'winner' appears. Statistical significance fluctuates constantly during a test: a variant showing 95% confidence on day 3 may drop to 60% by day 14. If you stop on day 3, you've shipped a false positive. Repeated checking with early stopping inflates the false positive rate well above the nominal 5% (Kohavi et al., 2014); with five evenly spaced checks it is roughly 14%. The solution: decide the stopping conditions before the test starts and do not open the dashboard until those conditions are met.

What is the difference between A/B testing and split testing?

A/B testing and split testing are synonymous terms for the same method. Both describe randomly splitting traffic between a control (original) and variant (changed) version of a page, then using statistical analysis to determine which performs better. The distinction sometimes drawn is between 'A/B testing' (comparing two page variants at the same URL using JavaScript injection) and 'split URL testing' (redirecting visitors to entirely different URLs). For CRO purposes, both methods apply the same statistical framework — the difference is technical implementation.

What does a 95% confidence level mean in an A/B test?

A 95% confidence level means the test uses a 5% significance threshold: if there were truly no difference between the versions and you ran the same test 100 times with fresh samples, about 5 of those tests would still show a statistically significant result purely by chance. It does not mean you are 95% certain the variant is better; it means the method has a 5% false positive rate when no real effect exists. For most business decisions, 95% confidence is the standard threshold. For high-stakes changes (site-wide, pricing), some teams require 99% confidence, which reduces false positives but requires larger sample sizes.
