Removing Gaps from Stacked Area Charts in R

Creating a stacked area chart in R is fairly painless, unless your data has gaps. For example, consider CSV data showing the number of plan signups per week, with week, plan, and signups columns.

Plotting this highlights the problem:
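Here is a minimal ggplot2 sketch of this kind of chart. It assumes a data frame named data with week, plan, and signups columns. The week numbers and the Silver plan's values are hypothetical; only Gold's first four weeks come from the text. Exactly how ggplot2 draws the missing Gold point can vary between versions.

```r
library(ggplot2)

# Hypothetical toy data: Gold's 55, 37, 42, 26 come from the text;
# week numbers and all Silver values are made up for illustration.
# Gold has no row for week 5, which is what produces the gap.
data <- data.frame(
  week    = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
  plan    = c(rep("Gold", 4), rep("Silver", 5)),
  signups = c(55, 37, 42, 26, 20, 25, 30, 28, 33),
  stringsAsFactors = FALSE
)

# One stacked band per plan, with week on the x-axis
p <- ggplot(data, aes(x = week, y = signups, fill = plan)) +
  geom_area()
```

Printing p draws the chart.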

[Chart: stacked area chart of weekly signups by plan, with gaps where a plan has no data for a week]

The gaps exist because not all plans have data points every week. Consider Gold, for example: during the first four weeks there are 55, 37, 42, and 26 signups, but during the last week there isn’t a data point at all. That’s why the chart shows a gap: the data doesn’t say Gold had zero signups in the final week; it says nothing at all about that week.

To remedy this, we need to ensure that every week contains a data point for every plan. That means for weeks where there isn’t a data point for a plan, we need to fill it in with 0 so that R knows that the signups are in fact 0 for that week.

I asked Charles Bordet, an R expert whom I hired through Upwork to help me level up my R skills, how he would go about filling in the data.

He provided two solutions:

1. Using expand.grid and full_join
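A minimal sketch of this approach, assuming dplyr and a data frame with week, plan, and signups columns. The toy values are hypothetical apart from Gold’s first four weeks, and the intermediate names joined and filled are illustrative.

```r
library(dplyr)

# Hypothetical toy data: only Gold's first four weeks come from the text.
# Gold is missing week 5.
data <- data.frame(
  week    = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
  plan    = c(rep("Gold", 4), rep("Silver", 5)),
  signups = c(55, 37, 42, 26, 20, 25, 30, 28, 33),
  stringsAsFactors = FALSE
)

# 1. Every week/plan combination: 5 weeks x 2 plans = 10 rows
combinations <- expand.grid(
  week = unique(data$week),
  plan = unique(data$plan),
  stringsAsFactors = FALSE  # keep plan as character so it joins cleanly
)

# 2. full_join keeps rows from both sides; the (week 5, Gold) combination
#    has no match in data, so its signups value is NA
joined <- full_join(data, combinations, by = c("week", "plan"))

# 3. Replace the NAs with 0 so the chart drops to zero instead of leaving a gap
filled <- joined %>%
  mutate(signups = ifelse(is.na(signups), 0, signups)) %>%
  arrange(plan, week)
```

Plotting filled instead of data gives every plan a value for every week.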

Here’s how it works:

expand.grid creates “a data frame from all combinations of the supplied vectors or factors”. By passing it the unique weeks and plans, we get a data frame called combinations with one row for every week/plan pair.

The full_join then takes all of the rows from data and combines them with combinations based on week and plan. Where there is no match (which happens when a week has no value for a plan), signups is set to NA.

Then we just use dplyr’s mutate to replace all of the NA values with zero, and voilà: every plan now has a data point for every week.

2. Using spread and gather

The second method Charles provided uses the tidyr package’s spread and gather functions.
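A minimal sketch of the spread/gather round trip, assuming tidyr and the same hypothetical toy data (only Gold’s first four weeks come from the text).

```r
library(tidyr)  # tidyr re-exports the %>% pipe

# Hypothetical toy data: Gold is missing week 5
data <- data.frame(
  week    = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
  plan    = c(rep("Gold", 4), rep("Silver", 5)),
  signups = c(55, 37, 42, 26, 20, 25, 30, 28, 33),
  stringsAsFactors = FALSE
)

filled <- data %>%
  # long -> wide: one column per plan (key = plan, value = signups);
  # week/plan cells with no data are filled with 0
  spread(key = plan, value = signups, fill = 0) %>%
  # wide -> long: gather every column except week back into plan/signups
  gather(key = "plan", value = "signups", -week)
```

In tidyr 1.0 and later, spread() and gather() are superseded by pivot_wider() and pivot_longer(). They still work, and the same round trip can be written with the newer functions.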

The spread function takes a key-value pair of columns (plan and signups in this case) and spreads it across multiple columns, one per plan, making the “long” data “wider” and filling in the missing values with 0.

Then we take the wide data and convert it back to long data using gather. The -week argument tells gather to leave the week column alone and gather only the plan columns that spread produced.

Using either method, we get a stacked area chart without the gaps ⚡️:

[Chart: the same stacked area chart with the gaps removed]

An impractical guide to doubling your conversion rates

Let’s imagine a standard web app with three main steps:

  1. Viewed Homepage
  2. Signed Up
  3. Purchased

Furthermore, let’s say 20% of the visitors to the homepage sign up, and 5% of the users who sign up complete a purchase, giving you a 1% overall conversion rate.

Your boss comes to you and says “You need to double the overall conversion rate from 1% to 2%. Do whatever it takes.”

As a thought experiment, consider at a high level what the numbers have to be for this to work out. Your first thought might be to double the homepage conversion rate from 20% to 40% (giving you a 40% * 5% = 2% overall conversion rate) or double the purchase conversion rate (20% * 10% = 2% overall conversion rate). You could also increase both by some smaller amount to get a similar result: 30% * 6.67% = 2%.

In the real world, this winds up being really hard. If you increase the percentage of people signing up, you will probably decrease the percentage of people who then purchase. Why? Consider what would happen if a site that has a pricing section on its homepage removed it completely. More people would sign up, but once they did and saw the pricing, many of those extra people (who signed up because they thought your service was free) would leave (because they realized after signing up that it isn’t). So if you somehow increase the sign-up rate to 40%, your sign-up-to-purchase conversion rate might be cut in half from 5% to, say, 2.5%, giving you the same 40% * 2.5% = 1% overall conversion rate. If you’re lucky, some of those extra users will convert and you’ll get something like 40% * 2.8% = 1.12%. That’s closer to 2%, but still a long way away.

The trick to getting to 2% is to improve the quality of the traffic at each step.

Consider that homepage conversion rate of 20%. How could you increase that without making any changes to your website?

That 20% conversion rate is actually a composite of different segments. For example, imagine your website has three traffic sources:

  1. Direct traffic (50% of your traffic) converts at 30%
  2. Search traffic (40% of your traffic) converts at 10%
  3. Social traffic (10% of your traffic) converts at 10%

50% * 30% + 40% * 10% + 10% * 10% = 20% homepage conversion rate.

What would happen to your conversion rate if you delisted your site from search engines completely? The numbers now become:

  1. Direct traffic (83.3% of your traffic) converts at 30%
  2. Social traffic (16.7% of your traffic) converts at 10%

83.3% * 30% + 16.7% * 10% = 26.7% homepage conversion rate.

Without making any changes to your site itself, you increased the homepage conversion rate by 26.7% / 20% – 1 ≈ +33%. Because the quality of your traffic has improved, the sign-up-to-purchase conversion rate will likely increase as well. Instead of 5% purchasing, it might wind up being 8% (+60%). What’s your overall conversion rate now? 26.7% * 8% ≈ 2.13%!
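The arithmetic of this made-up example can be checked with a few lines of R. The traffic shares and per-source rates are the example’s numbers; the 8% post-change purchase rate is the example’s assumption, not a measured figure.

```r
# Traffic mix and per-source homepage sign-up rates from the example
share <- c(direct = 0.50, search = 0.40, social = 0.10)
rate  <- c(direct = 0.30, search = 0.10, social = 0.10)

# Blended homepage conversion rate: weighted average by traffic share (0.20)
homepage_before <- sum(share * rate)

# Delist from search: drop it and renormalize the remaining shares
kept <- share[c("direct", "social")]
kept <- kept / sum(kept)                          # about 0.833 and 0.167
homepage_after <- sum(kept * rate[names(kept)])   # about 0.267

# Overall conversion = homepage rate * sign-up-to-purchase rate
overall_before <- homepage_before * 0.05          # 0.01
overall_after  <- homepage_after * 0.08           # about 0.0213
```

The exact post-change homepage rate is 26.67%, so the overall rate comes out at about 2.13%, slightly more than double the original 1%.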

So in order to double your conversion rate in this made-up example, all you had to do was delist your site from search engines. Mission accomplished 🍻. As an added bonus, many of your other metrics will improve as well. Because your site’s traffic is of higher quality, your retention rate will go up, your churn will go down, your average revenue per user will go up, your refunds will go down, and more.

You could also try similar hacks: make your website non-mobile-friendly so Google lowers your mobile rankings, which (assuming mobile visitors convert more poorly than desktop visitors) would increase your conversion rates. You could block certain poorly converting browsers from visiting your site (I’m looking at you, IE). You could block users from poorly converting countries, or even non-English-speaking users if your site isn’t translated.

Of course you should never do any of these things.

In the process of improving your metrics by hacking off a large portion of your traffic, you’ll also wind up decreasing the number of people who make it all the way through the funnel (purchasing in this case).

It’s tempting to want to focus on a single metric like conversion rate, but it’s also important to remember that individual metrics can almost always be artificially boosted:

  • You can increase your conversion rates by preventing low-converting traffic from reaching your site (at the expense of revenue)
  • You can increase your revenue by increasing paid advertising (at the expense of profit)
  • You can increase your profit by laying off a bunch of employees (possibly at the expense of your long-term growth and profitability)

Instead, try to identify the set of metrics that are most important to your company and pay attention to the group as a whole. More often than not, when one metric goes up another will go down, but with solid execution and a little luck, the overall impact of your changes will be a net win for your website or company.

Promoting best sellers in Shopify

My friend Tom Davies just launched a new Shopify app called Best Seller Insights that enables shop owners to effortlessly promote and track trends for their best-selling products.

He’s running a promotion as part of the launch that offers new customers 20% off all the plans through June 25th. If you run a Shopify store, be sure to check it out.

Analytics Event Name Cardinality

As a follow-up to my post about analytics event naming conventions, I want to share a few thoughts about event name cardinality, a.k.a. how many distinct event names to track in your analytics tools.

Consider four events that a user can perform: signing up for an account, publishing a post, publishing a page, and publishing an image.

How many analytics events should this be? We have a few options:

Four

  1. Sign Up
  2. Publish Post
  3. Publish Page
  4. Publish Image

Three

  1. Sign Up
  2. Publish Post with an Image property set to true or false depending on whether the post consists of only an image
  3. Publish Page

Two

  1. Sign Up
  2. Publish with a Type property set to Post, Page, or Image

One

  1. Perform Event with a Name property set to Sign Up or Publish, plus a Type property set to Post, Page, or Image when the Name is Publish

If you use a robust analytics tool like Mixpanel, Amplitude, KISSmetrics, or Tracks (Automattic’s internal analytics tool), you should in theory be able to perform any type of analysis using any of the options above.
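To make that claim concrete, here is a small R sketch with a hypothetical event log recorded in the two-event shape, showing that the four-event and three-event names can be derived from it. The column names name and type are illustrative, not any particular tool’s schema.

```r
# Hypothetical event log in the "Two" shape: Sign Up, or Publish + Type
events <- data.frame(
  name = c("Sign Up", "Publish", "Publish", "Publish"),
  type = c(NA, "Post", "Page", "Image"),
  stringsAsFactors = FALSE
)

# "Four" shape: fold Type into the event name for Publish events
events$four_name <- ifelse(
  events$name == "Publish",
  paste("Publish", events$type),
  events$name
)

# "Three" shape: images count as posts (with an Image flag), pages stay separate.
# %in% is used instead of == so the NA type on Sign Up rows yields FALSE, not NA.
events$three_name <- ifelse(
  events$type %in% c("Post", "Image"), "Publish Post",
  ifelse(events$type %in% "Page", "Publish Page", events$name)
)
events$image <- events$type %in% "Image"
```

Going the other way (from distinct names back to one Publish event) works the same way, which is why the choice is mostly about convenience.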

For me, it comes down to what type of analysis you want to perform on the data, the types of properties on the event, and convenience.

Using the single-event option will be a pain because you’ll constantly have to specify the Name property in your analytics tools to get the data you really want.

The decision between two, three, and four is close in this example. I think it comes down to whether you’ll need a single Publish event in the types of analysis you’re performing. If knowing that the user published anything is important and each type of publishing is conceptually similar, then having a single event might make sense. However, if your analysis is frequently going to focus on whether the user published a post, a page, or an image specifically, distinct Publish Post/Publish Page/Publish Image events are more convenient because you won’t constantly have to specify that you want the Publish event where the Type is Post. If publishing an image is similar to publishing a normal post, then maybe the three-event option is best.

At Automattic we went with three (Sign Up, Publish Post, Publish Page) and then added a feature to some of our tools (like our funnel builder) that lets you specify that a step can be any one of several events (like publishing a post or publishing a page).
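A rough sketch of what such a step means, using a hypothetical event log. It ignores event order and timing (a real funnel builder would enforce both) and only shows the “any one of several events” matching: a user reaches the publish step if they have any of the listed events.

```r
# Hypothetical event log: one row per event
events <- data.frame(
  user = c(1, 1, 2, 2, 3),
  name = c("Sign Up", "Publish Page", "Sign Up", "Publish Post", "Sign Up"),
  stringsAsFactors = FALSE
)

# Funnel steps; the second step accepts either publish event
steps <- list(
  signed_up = "Sign Up",
  published = c("Publish Post", "Publish Page")
)

# Users who performed at least one of the events allowed for each step
users_per_step <- lapply(steps, function(allowed) {
  unique(events$user[events$name %in% allowed])
})

# Users who completed both steps (order is not checked in this sketch)
completed <- Reduce(intersect, users_per_step)
length(completed)  # 2
```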

Hopefully this gives you a few things to think about next time you go to name new analytics events. If you can’t decide which route to go, feel free to reach out over email and I’d be happy to brainstorm with you.

My Star Wars Action Figure Megacollection

In honor of the release of Star Wars: The Force Awakens today, I’d like to share with you all my flash talk from Automattic’s Grand Meetup this year.

Flash talks are short talks that every Automattician has to give at the Grand Meetup, a once-a-year gathering where the entire company gets together for a week to work and play. We can talk about anything we’d like, so I showed off my childhood Star Wars action figure collection.

Enjoy.

Related SNL clip from last week: