Introducing, an Open Source A/B Test Significance Calculator

I spent some time recently working on a small side project that I’m excited to share with you all today. It’s an A/B test significance calculator and you can check it out at


What’s an A/B test significance calculator, you ask? At a high level, A/B testing is a technique that allows you to improve your website by showing visitors one of several versions of something and then measuring the impact each version has on some other event. For example, you might A/B test the wording of a button on your homepage and find that it increases the number of people who sign up by 10%. An A/B test significance calculator helps you analyze the results of an A/B test to determine whether there is a statistically significant change that is not just the result of random chance.

The math is somewhat complicated, which is why a number of A/B test calculators exist, including one by Evan Solomon, another by KISSmetrics, another by VWO, and many more.

Why build another? Three reasons: to learn the math, to get better at JavaScript, and to build a tool that makes the results of an A/B test easier to understand.

I think most of these other tools do users a disservice by not clearly explaining how to interpret the results. They tend to throw around the percentage improvement and significance figures without explaining what they mean, which in the past has led me to make uninformed and often wrong decisions. Worse, most don’t tell you when you don’t have enough participants or conversions in your test and will happily apply statistical analysis to your results even when those methods don’t apply.

It is my hope with this tool that users leave with a clearer understanding of how to interpret the results. A few features:

  • An executive summary that provides an overview in plain English about how to interpret the results
  • One graph showing where the true conversion rate for each variation falls (using something called a Wald approximation) and another showing the percentage change between those two distributions
  • It handles ties as well as tests where there aren’t enough participants or conversions to come to a conclusion
  • Results are significant when there is at least a 90% chance that one variation will lead to an improvement
  • The ability to copy a URL for the results to make them easier to share
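
To make the 90% threshold above concrete, here is a rough sketch in R of how such a probability can be estimated with a Wald (normal) approximation. It is not the calculator's actual code (which is JavaScript): the function name `prob_b_beats_a`, the example counts, and the way the threshold is applied are all assumptions for illustration.

```r
# Hypothetical helper (not taken from the calculator's source): estimates the
# probability that variation B's true conversion rate exceeds variation A's,
# treating each sample proportion as normally distributed (Wald approximation)
prob_b_beats_a = function( conversions_a, participants_a, conversions_b, participants_b ) {
	p_a = conversions_a / participants_a
	p_b = conversions_b / participants_b

	# Standard error of each sample proportion: sqrt( p * ( 1 - p ) / n )
	se_a = sqrt( p_a * ( 1 - p_a ) / participants_a )
	se_b = sqrt( p_b * ( 1 - p_b ) / participants_b )

	# The difference of two independent normal variables is itself normal,
	# with a variance equal to the sum of the two variances
	se_diff = sqrt( se_a ^ 2 + se_b ^ 2 )

	return ( pnorm( ( p_b - p_a ) / se_diff ) )
}

# Made-up example counts: 40 of 250 converted on A, 60 of 270 on B
chance = prob_b_beats_a( 40, 250, 60, 270 )

# Assumed reading of the 90% rule: significant if either variation is at
# least 90% likely to be the better one
significant = chance >= 0.9 || chance <= 0.1
print( chance )
print( significant )
```

The sketch does not guard against tests with too few participants or conversions, which is exactly the situation where this approximation stops being trustworthy.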

If you have any suggestions on how to make it better please don’t hesitate to let me know.

On the coding side of things, most of the JavaScript I’ve done in the past (including Preceden and Lean Domain Search) has been with lots and lots of messy jQuery. A lot of new JavaScript technologies have come out in the last few years and I was put on a project at Automattic not too long ago that uses many of them. I fumbled around with it, getting stuff done but not really understanding the fundamentals.

I’m happy to say that this tool uses React for the view layer, npm and Browserify for dependency management, Gulp for its build system, parts of ES6 for the code (courtesy of Babel), JSHint for code analysis, Mocha for testing, and GitHub Pages for hosting, all of which I had little to no experience with when I started this project. If you’re interested in checking it out, all of the code is open source (my first!) so you can view it on GitHub.

This project is the best JavaScript I know how to do so if you do check out the code, please let me know if you have any suggestions on how to improve it.

One last note in case you were wondering about the domain: the former owner had a simple A/B test calculator up on it but wasn’t actively working on it, so I found his email via WHOIS and offered him $200 for it. He countered with $700, I countered with $300, and we had a deal. Normally I wouldn’t pay someone for a domain (I heard there is this amazing service to help people find available domains…), but the price was right and I figured it was worth it for the credibility and SEO value it adds. When I showed him the new site recently all he responded with was “I’m pretty glad I sold the domain now!” which was nice :).

Thanks for checking it out!

Visualizing the Sampling Distribution of a Proportion with R

In yesterday’s post, we showed that a binomial distribution can be approximated by a normal distribution and walked through some of the math behind it.

Today we’ll take it a step further, showing how those results can help us understand the distribution of a sample proportion.

Consider the following example:

Out of the last 250 visitors to your website, 40 signed up for an account.

The conversion rate for that group is 16%, but it’s possible (and likely!) that the true conversion rate differs from this. Statistics can help us determine a range (a confidence interval) for what the true conversion rate actually is.

Recall that in the last post we said that a binomial distribution can be approximated by a normal distribution with a mean and standard deviation calculated as follows, where n is the number of trials, p is the probability of success on each trial, and q = 1 - p:

\mu = np

\sigma = \sqrt{npq}

For a proportion, we want to figure out the mean and standard deviation on a per-trial basis so we divide each formula by n, the number of trials:

\mu = \frac{np}{n} = p

\sigma = \frac{\sqrt{npq}}{n} = \sqrt{\frac{npq}{n^2}} = \sqrt{\frac{pq}{n}}
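
Plugging the example into these formulas gives concrete numbers. The sketch below does so in R. The 95% confidence level is an assumption chosen for illustration, and the resulting interval is the simple Wald interval, which is only one of several ways to build a confidence interval for a proportion.

```r
n = 250             # visitors
conversions = 40    # sign-ups

p = conversions / n        # sample proportion (the mean), 0.16
q = 1 - p
se = sqrt( p * q / n )     # standard deviation of the sample proportion

# Assumed 95% confidence level; qnorm( 0.975 ) is roughly 1.96
z = qnorm( 0.975 )
interval = c( lower = p - z * se, upper = p + z * se )

print( round( se, 4 ) )
print( round( interval, 4 ) )
```

With these inputs the standard deviation comes out to a little over two percentage points, so the interval spans roughly 11% to 21%.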

With the mean and standard deviation of the sample proportion in hand, we can plot the distribution for this example:
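
A minimal sketch of such a plot, assuming base R graphics only. The variable names, colors, and labels are illustrative choices rather than anything prescribed by the math:

```r
n = 250
p = 40 / n
se = sqrt( p * ( 1 - p ) / n )

# Evaluate the normal density over +/- 4 standard deviations around the mean
x = seq( -4, 4, length = 1000 ) * se + p
y = dnorm( x, mean = p, sd = se )

plot( x, y, type = "l", lwd = 3, col = "blue",
	xlab = "Conversion rate", ylab = "",
	main = "Distribution of the Sample Proportion" )
```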


As you can see, the most likely conversion rate is 16% (which is no surprise), but the true conversion rate can fall anywhere under that curve, becoming less and less likely the farther you move from 16%.

Where it gets really interesting is when you want to compare multiple proportions.

Let’s say we’re running an A/B test and the original had 40 conversions out of 250 like the example above and the experimental version had 60 conversions out of 270 participants. We can plot the distribution of both sample proportions with this code:
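
A minimal sketch of such a plot, assuming base R graphics. The helper name `proportion_curve`, the colors, and the labels are illustrative assumptions:

```r
# Hypothetical helper: returns the x and y values of the normal curve that
# approximates the sampling distribution of a proportion
proportion_curve = function( conversions, participants ) {
	p = conversions / participants
	se = sqrt( p * ( 1 - p ) / participants )
	x = seq( -4, 4, length = 1000 ) * se + p

	return ( list( x = x, y = dnorm( x, mean = p, sd = se ) ) )
}

original = proportion_curve( 40, 250 )
experimental = proportion_curve( 60, 270 )

# Size the axes so that both curves fit on the same plot
xlim = range( original$x, experimental$x )
ylim = range( original$y, experimental$y )

plot( NULL, type = "n", xlim = xlim, ylim = ylim,
	xlab = "Conversion rate", ylab = "", main = "Original vs. Experimental" )
lines( original$x, original$y, col = "blue", lwd = 3 )
lines( experimental$x, experimental$y, col = "red", lwd = 3 )
legend( "topright", c( "Original", "Experimental" ), lty = 1, lwd = 3,
	col = c( "blue", "red" ), bty = "n" )
```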

Here’s the result:


What can we determine from this? Is the experimental version better than the original? What if the true proportion for the original is 20% (towards the upper end of its distribution) and the true proportion for the experimental version is 16% (towards the lower end of its distribution)?

We’ll save the answer for a future post :)

Visualizing How a Normal Distribution Approximates a Binomial Distribution with R

My handy Elementary Statistics textbook, which I’m using to get smart on the math behind A/B testing, states the following:

Normal Distribution as Approximation to Binomial Distribution

If np \geq 5 and nq \geq 5, then the binomial random variable has a probability distribution that can be approximated by a normal distribution with the mean and standard deviation given as:

\mu = np

\sigma = \sqrt{npq}

In easier to understand terms, take the following example:

Each visitor to your website has a 30% chance of signing up for an account. Over the next 250 visitors, how many can you expect to sign up for an account?

The first formula lets us figure out the mean by simply multiplying the number of visitors by the probability of a successful conversion:

\mu = np = 250 * 0.30 = 75

Simple enough and fairly easy to understand.

The second formula, the one to figure out the standard deviation, is less intuitive:

\sigma = \sqrt{npq} = \sqrt{250 * 0.30 * (1 - 0.30)} \approx 7.25
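
Both calculations, along with the textbook's precondition, can be checked in a few lines of R. This is only a sketch with illustrative variable names:

```r
n = 250
p = 0.30
q = 1 - p

# The approximation applies only when both np >= 5 and nq >= 5
stopifnot( n * p >= 5, n * q >= 5 )

mu = n * p                  # 75
sigma = sqrt( n * p * q )   # about 7.25

print( c( mu = mu, sigma = sigma ) )
```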

Why are we taking the square root of the product of these three values? The textbook doesn’t explain, noting that “the formal justification that allows us to use the normal distribution as an approximation to the binomial distribution results from more advanced mathematics”.

Because this standard deviation formula plays a big role in calculating the confidence intervals for sample proportions, I decided to simulate the scenario above to prove to myself that the standard deviation formula is accurate.

The R script below simulates 250 visitors coming to a website, each with a 30% chance of signing up. After each group of 250 visitors we track how many of them wound up converting. After all of the runs (the default is 1,000, though the higher the number the more accurate the distribution will be) we plot the probability distribution of the results in blue and a curve representing what we’d expect the distribution to look like if the standard deviation formula above is correct in red.
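
A minimal sketch of such a simulation, assuming base R only. The function name `simulate_conversions`, the fixed seed, and the plotting choices are illustrative:

```r
# Hypothetical helper: for each run, draw one uniform random number per
# visitor and count how many fall below p (i.e. how many visitors converted)
simulate_conversions = function( runs = 1000, visitors = 250, p = 0.30 ) {
	replicate( runs, sum( runif( visitors ) < p ) )
}

set.seed( 42 )  # arbitrary seed so that reruns give the same result
results = simulate_conversions()

# What the two formulas predict
expected_mean = 250 * 0.30
expected_sd = sqrt( 250 * 0.30 * ( 1 - 0.30 ) )

# Observed distribution of conversions per run, in blue
hist( results, freq = FALSE,
	breaks = seq( min( results ) - 0.5, max( results ) + 0.5 ),
	border = "blue", xlab = "Conversions per 250 visitors",
	main = "Simulated vs. Expected" )

# Normal curve implied by the formulas, in red
curve( dnorm( x, mean = expected_mean, sd = expected_sd ),
	add = TRUE, col = "red", lwd = 2 )

print( c( mean = mean( results ), sd = sd( results ) ) )
```

Raising `runs` makes the blue histogram hug the red curve more closely, at the cost of a longer run time.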

The distribution of results from this experiment paints a telling picture:


Not only is the mean what we expect (around 75), but the standard deviation formula (which said it would be 7.25) does predict the standard deviation from this experiment (7.25). Go figure :)

As we’ll see, we can use the fact that the normal distribution approximates a binomial distribution to figure out the distribution of a sample proportion, which we can then compare to other sample proportion distributions to draw conclusions about whether they differ and by how much (i.e., how to analyze the results of an A/B test).

Rendering Two Normal Distribution Curves on a Single Plot with R

As a follow-up to my last post about how to render a normal distribution curve with R, here’s how you can render two on the same plot:


# Define a class that stores everything needed to plot one distribution
setClass(
	Class = "Distribution",
	representation = representation(
		name = "character",
		mean = "numeric",
		sd = "numeric",
		color = "character",
		x = "vector",
		y = "vector"
	)
)

# We rewrite the initialize method for Distribution objects so that we can
# set the x and y values which are used throughout the plotting process
setMethod(
	f = "initialize",
	signature = "Distribution",
	definition = function( .Object, name, mean, sd, color ) {
		.Object@name = name
		.Object@mean = mean
		.Object@sd = sd
		.Object@color = color
		.Object@x = seq( -4, 4, length = 1000 ) * sd + mean
		.Object@y = dnorm( .Object@x, mean, sd )

		return ( .Object )
	}
)

# Given a list of distributions, this returns a list of the x and y axis range
get_axis_ranges = function( distributions ) {
	x_all = vector()
	y_all = vector()

	for ( distribution in distributions ) {
		x_all = c( x_all, distribution@x )
		y_all = c( y_all, distribution@y )
	}

	xlim = c( min( x_all ), max( x_all ) )
	ylim = c( min( y_all ), max( y_all ) )

	# Return both ranges in a single list. Indexing that list with single
	# brackets yields a one-element list, so callers need to unlist() each
	# element to get a plain vector back for plotting
	return ( list( xlim, ylim ) )
}

# Define the distributions that we want to plot
distributions = list(
	new( Class = "Distribution", name = "women", mean = 63.6, sd = 2.5, color = "pink" ),
	new( Class = "Distribution", name = "men", mean = 69, sd = 2.8, color = "blue" )
)

# Determine the range to use for each axis
axis_range = get_axis_ranges( distributions )
xlim = unlist( axis_range[ 1 ] )
ylim = unlist( axis_range[ 2 ] )

# Create the plot
plot( NULL, NULL, type = "n", xlim = xlim, ylim = ylim, xlab = "Height (inches)", ylab = "", main = "Distribution of Heights", axes = FALSE )

# Render each of the curves
line_width = 3
for( distribution in distributions ) {
	lines( distribution@x, distribution@y, col = distribution@color, lwd = line_width )
}

# Render the x axis
axis_bounds <- seq( floor( min( xlim ) ), ceiling( max( xlim ) ) )
axis( side = 1, at = axis_bounds, pos = 0, col = "#aaaaaa", col.axis = "#444444" )

# Finally, render a legend
legend_text = vector()
legend_colors = vector()
for ( distribution in distributions ) {
	legend_text = c( legend_text, distribution@name )
	legend_colors = c( legend_colors, distribution@color )
}
legend('right', legend_text, lty = 1, lwd = line_width, col = legend_colors, bty = 'n' )