Removing Gaps from Stacked Area Charts in R

Creating a stacked area chart in R is fairly painless, unless your data has gaps. For example, consider the following CSV data showing the number of plan signups per week:

Plotting this highlights the problem:

chart.png

The reason the gaps exist is that not all plans have data points every week. Consider Gold, for example: during the first four weeks there are 55, 37, 42, and 26 signups, but during the last week there isn’t a data point at all. That’s why the chart shows the gap: it’s not that the data indicates Gold went to zero signups the final week; it indicates no data at all.

To remedy this, we need to ensure that every week contains a data point for every plan. That means for weeks where there isn’t a data point for a plan, we need to fill it in with 0 so that R knows that the signups are in fact 0 for that week.

I asked Charles Bordet, an R expert who I hired through Upwork to help me level up my R skills, how he would go about filling in the data.

He provided two solutions:

1. Using expand.grid and full_join

Here’s how it works:

expand.grid creates “a data frame from all combinations of the supplied vectors or factors”. By passing it in the weeks and plans, it generates the following data frame called combinations:

The full_join then takes all of the rows from data and combines them with combinations based on week and plan. When there aren’t any matches (which will happen when a week doesn’t have a value for a plan), signups gets set to NA:

Then we just use dplyr’s mutate to replace all of the NA values with zero, and voila:

2. Using spread and gather

The second method Charles provided uses the tidyr package’s spread and gather functions:

The spread function takes the key-value pairs (week and plan in this case) and spreads it across multiple columns, making the “long” data “wider”, and filling in the missing values with 0:

Then we take the wide data and convert it back to long data using gather The - week means to exclude the week column when gathering the data that spread produced:

Using either methods, we get a stacked area chart without the gaps ⚡️:

chart.png

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s