Visualizing the Sampling Distribution of a Proportion with R

In yesterday’s post, we showed that a binomial distribution can be approximated by a normal distribution and some of the math behind it.

Today we’ll take it a step further, showing how those results can help us understand the distribution of a sample proportion.

Consider the following example:

Out of the last 250 visitors to your website, 40 signed up for an account.

The conversion rate for that group is 16%, but it’s possible (and likely!) that the true conversion rate differs from this. Statistics can help us determine a range (a confidence interval) for what the true conversion rate actually is.

Recall that in the last post we said that the mean of the binomial distribution can be approximated with a normal distribution with a mean and standard deviation calculated by:

\mu = np

\sigma = \sqrt{npq}

For a proportion, we want to figure out the mean and standard deviation on a per-trial basis so we divide each formula by n, the number of trials:

\mu = \frac{np}{n} = p

\sigma = \frac{\sqrt{npq}}{n} = \sqrt{\frac{npq}{n^2}} = \sqrt{\frac{pq}{n}}

With the mean and standard deviation of the sample proportion in hand, we can plot the distribution for this example:


As you can see, the most likely conversion rate is 16% (which is no surprise), but the true conversion rate can fall anywhere under that curve with it being less and less likely as you move farther away.

Where it gets really interesting is when you want to compare multiple proportions.

Let’s say we’re running an A/B test and the original had 40 conversions out of 250 like the example above and the experimental version had 60 conversions out of 270 participants. We can plot the distribution of both sample proportions with this code:

Here’s the result:


What can we determine from this? Is the experimental version better than the original? What if the true proportion for the original is 20% (towards the upper end of its distribution) and the true proportion for the experimental version is 16% (towards the lower end of its distribution)?

We’ll save the answer for a future post :)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s