My R Cheat Sheet, now available on GitHub

Despite working on and off with R for about two years now, I can never seem to remember how to do basic things when I return to it after a few weeks away.

I recently started keeping detailed notes for myself to cut down on the time I spend relearning things I’ve already figured out.

You can check out my cheat sheet on GitHub here:

https://github.com/mattm/r-cheat-sheet

It covers everything from data frames to working with dates and times to using ggplot and a lot more. I’ll update it periodically as I add new notes.

If you spot any mistakes or have any suggestions for how to improve it, don’t hesitate to shoot me an email.

Chronos: An R Script to Analyze the Distribution of Time Between Two Events

If someone asked you about your site’s conversion rates, you could probably tell them what the conversion rates are (right?). But what if someone asked you what % convert within an hour, a day, or a week?

We’ve been looking at this at Automattic, and I wound up putting together an R script to help with the analysis. Because everything needs a fancy name, I dubbed it Chronos; you can check it out on GitHub.

To use it, all you need is a CSV with two columns: the Unix timestamp of the first event and the Unix timestamp of the second event:

1350268044,1408676495
1307322538,1350061315
1307676110,1340667657
1307661905,1337311786
1307758702,1428877904
...

The script will then show you the distribution of time between the two events, as well as the percentage that occur within a few fixed durations (30 minutes, 1 hour, etc.):

Distribution:
5% within 2 minutes 
10% within 5 minutes 
15% within 1 hour 21 minutes 
20% within 1 day 38 minutes 
25% within 3 days 2 hours 58 minutes 
30% within 6 days 9 hours 20 minutes 
33.33333% within 11 days 
35% within 14 days 
40% within 23 days 
45% within 42 days 
50% within 67 days 
55% within 95 days 
60% within 148 days 
65% within 210 days 
66.66667% within 232 days 
70% within 288 days 
75% within 390 days 
80% within 550 days 
85% within 677 days 
90% within 920 days 
95% within 1288 days 
100% within 1715 days 

Percentage by certain durations:
13% within 30 minutes
14% within 1 hour
17% within 5 hours
20% within 1 day
30% within 7 days

In addition to analyzing conversion rates, you can use this to measure things like retention rates. The data above, for example, measures how much time passed between when users logged their first beer and their last beer in Adam Week’s handy beer tracking app, BrewskiMe (thank you again, Adam, for providing the data).
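For anyone curious what this kind of analysis involves, here is a minimal Python sketch of the same idea. Chronos itself is an R script and this is not its code. The sketch assumes the two-column CSV format described above, computes the gap in seconds for each row, reports nearest-rank percentiles, and reports the share of rows whose second event falls within fixed durations. The names `read_gaps`, `quantile`, `share_within`, and `describe` are hypothetical, the sample is just the first five rows shown earlier, and the duration formatting is simplified (it always spells out days, hours, and minutes, whereas the output above drops hours and minutes for longer gaps).

```python
import csv
import io
import math

# Hypothetical sample: the first five rows shown above, in the same format
# (unix timestamp of the first event, unix timestamp of the second event).
SAMPLE_CSV = """1350268044,1408676495
1307322538,1350061315
1307676110,1340667657
1307661905,1337311786
1307758702,1428877904
"""

MINUTE, HOUR, DAY = 60, 3600, 86400


def read_gaps(text):
    """Seconds elapsed between the two events on each non-empty CSV row."""
    return [int(row[1]) - int(row[0]) for row in csv.reader(io.StringIO(text)) if row]


def quantile(sorted_gaps, p):
    """Nearest-rank percentile: smallest gap with at least a fraction p of gaps <= it."""
    index = max(math.ceil(p * len(sorted_gaps)) - 1, 0)
    return sorted_gaps[index]


def share_within(gaps, seconds):
    """Fraction of rows whose second event happened within `seconds` of the first."""
    return sum(1 for gap in gaps if gap <= seconds) / len(gaps)


def describe(seconds):
    """Format seconds as e.g. '3 days 2 hours 58 minutes' (zero parts omitted)."""
    days, rest = divmod(seconds, DAY)
    hours, rest = divmod(rest, HOUR)
    minutes = rest // MINUTE
    parts = [(days, "day"), (hours, "hour"), (minutes, "minute")]
    text = " ".join(f"{n} {unit}{'' if n == 1 else 's'}" for n, unit in parts if n)
    return text or "0 minutes"


gaps = sorted(read_gaps(SAMPLE_CSV))

print("Distribution:")
for pct in (25, 50, 75, 100):
    print(f"{pct}% within {describe(quantile(gaps, pct / 100))}")

print("Percentage by certain durations:")
for label, seconds in [("30 minutes", 30 * MINUTE), ("1 day", DAY), ("7 days", 7 * DAY)]:
    print(f"{share_within(gaps, seconds):.0%} within {label}")
```

Nearest-rank is only one way to define a percentile; R’s `quantile()` defaults to linear interpolation, so on small samples these numbers may differ slightly from whatever method Chronos uses.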

If you run into any issues or have any suggestions for how to improve it, just let me know.

The impact of a $15 minimum wage on a McDonald’s

There was a really interesting thread on Reddit earlier this week in the Explain Like I’m Five (ELI5) subreddit titled “How would a $15 minimum wage ACTUALLY affect a franchised business like McDonalds?”

In an effort to make sure I understand the math, I’m going to try to summarize the top response. Here we go:

The Cost of Labor (COL) is the sum of your employees’ wages, benefits, and payroll taxes. In a business’s operational reports, COL is usually also expressed as a percentage of net sales. Net sales are gross sales minus returns and discounts, which for a franchise like McDonald’s probably just means subtracting the value of coupons.

For the franchise the commenter is using for his analysis (which may or may not be an actual McDonald’s), COL is currently 28% of net sales. So for every $1 in sales, $0.28 goes toward labor: if you buy a $15 meal, on average $4.20 of it goes to labor costs.

(Some commenters point out that 28% is high: where they worked, the goal was 15%, and if a restaurant ran above 20% for a week, the manager would get fired. Those figures are for higher-end restaurants, though.)

For restaurants, there’s also the Cost of Sales (COS), aka Cost of Goods, which is basically the cost of the ingredients. For this franchise, it’s also 28% of net sales. So for a $15 meal, 28% COL + 28% COS = 56%, or $8.40, goes toward the labor and ingredients to make it.

Then there are franchise fees (aka royalty fees, which corporate charges each franchisee for running a store under its brand), which are ~10% of net sales.

COL + COS + the franchise fee make up the majority of operating costs.

For the franchise he’s looking at, in one particular week, the numbers work out to $27,321 in net sales, minus $7,702 for COL (~28%), $7,908 for COS (~29%), and $2,732 for the franchise fee (10%), leaving $8,979. Here, COL + COS are ~57% of net sales. The remaining amount pays for the manager, assistant manager, rent/mortgage, garbage, utilities, maintenance, advertising, administrative overhead, etc.
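The weekly arithmetic is easy to check with a few lines of Python. This is a minimal sketch that uses only the figures quoted above; the function name `weekly_breakdown` is made up for illustration, and the franchise fee is computed as 10% of net sales, rounded to the nearest dollar.

```python
# Weekly figures for the franchise in the Reddit example (dollars).
NET_SALES = 27_321
COL = 7_702                 # cost of labor: wages + benefits + payroll taxes
COS = 7_908                 # cost of sales: ingredients
FRANCHISE_FEE_RATE = 0.10   # royalty paid to corporate, as a share of net sales


def weekly_breakdown(net_sales, col, cos, fee_rate):
    """Return the franchise fee, what's left after COL + COS + fee, and the COL + COS share."""
    fee = round(net_sales * fee_rate)
    remaining = net_sales - col - cos - fee
    col_cos_share = (col + cos) / net_sales
    return fee, remaining, col_cos_share


fee, remaining, share = weekly_breakdown(NET_SALES, COL, COS, FRANCHISE_FEE_RATE)
print(f"franchise fee ${fee:,}, remaining ${remaining:,}, COL + COS = {share:.1%} of net sales")
```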

At this restaurant, employees make $9.25/hour on average. Raising the minimum to $15/hour would be a 62% wage increase and, assuming benefits and payroll taxes scale with wages, a 62% increase in COL (to keep it simple, we assume everyone would make $15/hour). With the same $27,321 in net sales, COL would rise to $12,477, shrinking the remaining amount to $4,204, which won’t be enough to cover all of the other costs. COL + COS would now be ~75% of net sales.

For fast food restaurants, a general rule is that you want COL + COS to be under 60%, and it needs to be under 65% to be profitable. Another commenter said a good goal is 50% for COL + COS. It varies by type of restaurant; fast food is extremely competitive, so margins are thin.

Increasing COL by 62% would cause major issues. Raising the hourly wage to $15 increases COL by $12,477 – $7,702 = $4,775/week. To keep the same $8,979 remaining, you’d have to increase net sales by that $4,775/week, to $32,096, an increase of about 17%. That would probably have to come from higher menu prices, assuming customers were willing to pay them.
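Here is a Python sketch of the $15/hour scenario, using the same weekly figures. Like the post, it rounds the wage increase to 62%, assumes all of COL scales with wages, and assumes the extra sales come from higher prices, so COS stays the same. It also adds one refinement that the arithmetic above skips: because the franchise fee is 10% of net sales, raising sales also raises the fee, so keeping $8,979 actually takes about $32,627 in weekly net sales (roughly 19% more) rather than $32,096. `raise_scenario` is a hypothetical name.

```python
# Weekly figures for the franchise in the example above (dollars).
NET_SALES = 27_321
COL = 7_702                 # cost of labor at the current $9.25/hour average
COS = 7_908                 # cost of sales (ingredients)
FEE_RATE = 0.10             # franchise fee as a share of net sales
FEE = 2_732                 # franchise fee on current net sales
CURRENT_WAGE, NEW_WAGE = 9.25, 15.00


def raise_scenario():
    remaining_now = NET_SALES - COL - COS - FEE            # $8,979
    # Wage increase rounded to 62%, as in the post.
    wage_increase = round(NEW_WAGE / CURRENT_WAGE - 1, 2)
    # Simplification: all of COL (including benefits and payroll taxes)
    # is assumed to grow in proportion to wages.
    new_col = round(COL * (1 + wage_increase))             # $12,477
    extra_labor = new_col - COL                            # $4,775/week
    # Post's estimate: add the extra labor cost to net sales. Assumes the
    # extra sales come from higher prices, so COS does not change.
    required_sales = NET_SALES + extra_labor               # $32,096
    # Refinement: the 10% franchise fee also grows with net sales, so solve
    # (1 - FEE_RATE) * sales - new_col - COS = remaining_now for sales.
    required_sales_with_fee = round((remaining_now + new_col + COS) / (1 - FEE_RATE))
    return {
        "new_col": new_col,
        "extra_labor": extra_labor,
        "remaining_after": remaining_now - extra_labor,
        "col_cos_share_after": (new_col + COS) / NET_SALES,
        "required_sales": required_sales,
        "required_increase": extra_labor / NET_SALES,
        "required_sales_with_fee": required_sales_with_fee,
    }


for name, value in raise_scenario().items():
    print(name, round(value, 3))
```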

Another response in the thread, and the comments on it, are worth a read as well.


I’ll end by saying that I do believe the current US minimum wage is too low and think we should raise it, but… it’s complicated. If the national minimum wage were raised to $15/hour, that would also raise McDonald’s COS, because it would cost more for companies to produce the ingredients, correct? But it would also mean that people who were making less than $15/hour would have more money to spend, so a hypothetical 10%-20% increase in menu prices might not be that bad. But if the price of everything increases, doesn’t that erode the value of those extra wages? While the Reddit discussion is interesting, it made me appreciate that there are professional economists out there who can take into account the full impact of a change like this.

Visualizing Your SaaS App’s Monthly Active Users Broken Down by Signup Cohort

This week at Automattic, I’ve been helping with a tool that will allow us to visualize the number of active WordPress.com users each month, broken down by when those users signed up for an account. I think this type of chart, and what you can learn from it, is incredibly valuable, so I wanted to show you all how to quickly create one for your own service.

Here’s an example of what this type of chart looks like, courtesy of Buffer’s Joel Gascoigne:

What I really like about it is that for each month you can see how many active users there are and when those users signed up for an account. This gives you a sense not only of how long ago your active users signed up, but also of your service’s ability to retain users over time.

If you’d like to create a similar chart to visualize your SaaS app’s active users, I put together a small R script on GitHub that will help you do just that.

The only thing that you need to provide the script is a CSV file that contains user IDs and dates that those users performed an action in your app.

For example, the test data set that comes with it contains user IDs and actions performed by users of one of my apps (Preceden, a web-based timeline maker) for the first year that the site existed (as determined by the automatically set created_at and updated_at values on the Ruby on Rails Active Record objects that each user is associated with):

2   2010-03-28
2   2010-04-09
2   2010-04-10
2   2010-05-16
3   2010-01-31
3   2014-05-07
3   2014-09-30
3   2015-04-11
4   2010-01-31
4   2010-10-06
...

In this example, user IDs 2 and 3 each performed actions on four dates, and user ID 4 performed actions on two dates. The script analyzes this data to assign each user to a cohort based on the earliest date that user performed an action, then counts the user toward the active users for each month from then on in which he or she performed an action:

[Chart: monthly active users broken down by signup cohort]

As you can see, there was a huge spike at the beginning of the year when Preceden launched on Hacker News and was subsequently covered on other tech sites, but by December only a fraction of those users were still active. On that note, I encourage you to strive to build a service like Buffer that delivers long-term value, so your chart doesn’t wind up looking like this one :).
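The cohort logic described above can be sketched in a few lines of Python. This is not the R script’s code, just a minimal illustration of the same rule: a user’s cohort is the month of their earliest action, and each user counts once toward the active users of every month in which they did something. It uses only the sample rows shown earlier, and the function names `month_of` and `active_users_by_cohort` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# The sample rows shown above: (user_id, date of an action).
ACTIONS = [
    (2, "2010-03-28"), (2, "2010-04-09"), (2, "2010-04-10"), (2, "2010-05-16"),
    (3, "2010-01-31"), (3, "2014-05-07"), (3, "2014-09-30"), (3, "2015-04-11"),
    (4, "2010-01-31"), (4, "2010-10-06"),
]


def month_of(iso_date):
    """Truncate an ISO date string to a (year, month) tuple."""
    d = date.fromisoformat(iso_date)
    return (d.year, d.month)


def active_users_by_cohort(actions):
    """Map each activity month to {signup cohort month: number of active users}."""
    # A user's cohort is the month of their earliest action.
    cohort = {}
    for user, day in actions:
        m = month_of(day)
        if user not in cohort or m < cohort[user]:
            cohort[user] = m
    # Collect the set of users active in each month, so each counts once.
    active = defaultdict(set)
    for user, day in actions:
        active[month_of(day)].add(user)
    # For each month (in order), count its active users per cohort.
    table = {}
    for month in sorted(active):
        counts = defaultdict(int)
        for user in active[month]:
            counts[cohort[user]] += 1
        table[month] = dict(counts)
    return table


for month, counts in active_users_by_cohort(ACTIONS).items():
    print(month, counts)
```

Stacking each month’s per-cohort counts gives the kind of chart described in this post. Months with no activity are simply absent from the result, so a real chart would need to fill them in with zeros.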

If you have any questions or need help customizing the script in any way, please don’t hesitate to drop me a note.

Thanks to Joel Martinez and Rob Felty for providing feedback on the code.