Creating a stacked area chart in R is fairly painless, unless your data has gaps. For example, consider the following CSV data showing the number of plan signups per week:
Plotting this highlights the problem:
The reason the gaps exist is that not all plans have data points every week. Consider Gold, for example: during the first four weeks there are 55, 37, 42, and 26 signups, but during the last week there isn’t a data point at all. That’s why the chart shows the gap: it’s not that the data indicates Gold went to zero signups the final week; it indicates no data at all.
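To make the gap concrete, here is a small sketch in base R. The data frame is a hypothetical stand-in for the CSV: only the Gold values for the first four weeks come from the text above, while the Bronze and Silver plans, their numbers, and the 1 to 5 week numbering are assumptions made up for illustration.

```r
# Hypothetical sample data. Only Gold's 55, 37, 42, 26 come from the post;
# the Bronze and Silver rows are invented so there is something to stack.
data <- data.frame(
  week    = c(1, 2, 3, 4, 5,  1, 2, 3, 4, 5,  1, 2, 3, 4),
  plan    = c(rep("Bronze", 5), rep("Silver", 5), rep("Gold", 4)),
  signups = c(12, 18, 9, 14, 11,  30, 25, 28, 22, 27,  55, 37, 42, 26),
  stringsAsFactors = FALSE
)

# Count rows per plan: Gold has four rows where the other plans have five.
rows_per_plan <- table(data$plan)
print(rows_per_plan)

# Cross-tabulate plan by week. A 0 in this table means "no row at all"
# for that plan in that week, which is what shows up as a gap in the chart.
counts <- table(data$plan, data$week)
print(counts)

# Locate the empty plan/week cells.
print(which(counts == 0, arr.ind = TRUE))
```

A zero in the cross-tab is different from a zero in signups: it means the row is absent, which is exactly the case the chart cannot draw.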
To remedy this, we need to ensure that every week contains a data point for every plan. That means for weeks where there isn’t a data point for a plan, we need to fill it in with 0 so that R knows that the signups are in fact 0 for that week.
I asked Charles Bordet, an R expert who I hired through Upwork to help me level up my R skills, how he would go about filling in the data.
He provided two solutions:
1. Using expand.grid and full_join
Here’s how it works:
expand.grid creates “a data frame from all combinations of the supplied vectors or factors”. By passing in the weeks and plans, we get a data frame called combinations that has one row for every week and plan pair.
full_join then takes all of the rows from data and combines them with combinations based on week and plan. When there aren’t any matches (which will happen when a week doesn’t have a value for a plan), signups gets set to NA.
Then we just use dplyr’s mutate to replace all of the NA values with zero, and voilà, every week has a data point for every plan.
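The steps above can be sketched roughly as follows. This is a minimal reconstruction under assumptions, not necessarily the exact code Charles wrote: the sample data is hypothetical (only Gold's first four values come from the post), and the `filled` name and the final `arrange` call are additions for readability.

```r
library(dplyr)

# Hypothetical sample data; only Gold's 55, 37, 42, 26 come from the post.
data <- data.frame(
  week    = c(1, 2, 3, 4, 5,  1, 2, 3, 4, 5,  1, 2, 3, 4),
  plan    = c(rep("Bronze", 5), rep("Silver", 5), rep("Gold", 4)),
  signups = c(12, 18, 9, 14, 11,  30, 25, 28, 22, 27,  55, 37, 42, 26),
  stringsAsFactors = FALSE
)

# Every week paired with every plan: 5 weeks x 3 plans = 15 rows.
# stringsAsFactors = FALSE keeps plan as character so it joins cleanly.
combinations <- expand.grid(
  week = unique(data$week),
  plan = unique(data$plan),
  stringsAsFactors = FALSE
)

# Keep all rows from both sides; pairs missing from data get signups = NA,
# which mutate then replaces with 0.
filled <- full_join(data, combinations, by = c("week", "plan")) %>%
  mutate(signups = ifelse(is.na(signups), 0, signups)) %>%
  arrange(plan, week)

print(filled)
```

Setting `stringsAsFactors = FALSE` on both data frames is a deliberate choice here, since joining a factor column to a character column makes dplyr coerce types.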
2. Using spread and gather
The second method Charles provided uses the tidyr package’s spread and gather functions.
The spread function takes the key-value pairs (plan and signups in this case) and spreads them across multiple columns, making the “long” data “wider”, and filling in the missing values with 0.
Then we take the wide data and convert it back to long data using gather. The -week argument means to exclude the week column when gathering the data.
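Here is a rough sketch of the spread and gather round trip, again on hypothetical sample data (only Gold's first four values come from the post). The `wide` and `long` names are assumptions chosen for illustration.

```r
library(tidyr)

# Hypothetical sample data; only Gold's 55, 37, 42, 26 come from the post.
data <- data.frame(
  week    = c(1, 2, 3, 4, 5,  1, 2, 3, 4, 5,  1, 2, 3, 4),
  plan    = c(rep("Bronze", 5), rep("Silver", 5), rep("Gold", 4)),
  signups = c(12, 18, 9, 14, 11,  30, 25, 28, 22, 27,  55, 37, 42, 26),
  stringsAsFactors = FALSE
)

# Long to wide: one row per week, one column per plan.
# fill = 0 supplies a value wherever a week/plan pair had no row.
wide <- spread(data, key = plan, value = signups, fill = 0)
print(wide)

# Wide back to long: gather every column except week into
# plan/signups pairs, giving one row per week and plan again.
long <- gather(wide, key = "plan", value = "signups", -week)
print(long)
```

Newer versions of tidyr mark spread and gather as superseded in favor of pivot_wider and pivot_longer, though the older functions still work.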
Using either method, we get a stacked area chart without the gaps ⚡️
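For completeness, here is one way the filled data could be plotted. The post does not say which plotting package was used, so ggplot2 is an assumption, and the data frame below is the same hypothetical sample with the missing Gold week already set to 0.

```r
library(ggplot2)

# Hypothetical filled data: every plan has a row for every week.
# Only Gold's 55, 37, 42, 26 come from the post; the trailing 0 is the
# filled-in value for Gold's missing fifth week.
filled <- data.frame(
  week    = rep(1:5, times = 3),
  plan    = rep(c("Bronze", "Silver", "Gold"), each = 5),
  signups = c(12, 18, 9, 14, 11,  30, 25, 28, 22, 27,  55, 37, 42, 26, 0)
)

# geom_area stacks the series by default (position = "stack"),
# so mapping fill to plan gives one stacked band per plan.
chart <- ggplot(filled, aes(x = week, y = signups, fill = plan)) +
  geom_area()

print(chart)
```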