My R Cheat Sheet, now available on GitHub

Despite working on and off with R for about two years now, I can never seem to remember how to do basic things when I return to it after a few weeks away.

I recently started keeping detailed notes for myself to minimize how much time I spend figuring things out that I already learned about in the past.

You can check out my cheat sheet on GitHub here:

https://github.com/mattm/r-cheat-sheet

It covers everything from data frames to working with dates and times to using ggplot and a lot more. I’ll update it periodically as I add new notes.

If you spot any mistakes or have any suggestions for how to improve it, don’t hesitate to shoot me an email.

Chronos: An R Script to Analyze the Distribution of Time Between Two Events

If someone asked you about your site’s conversion rates, you could probably tell them what the conversion rates are (right?). But what if someone asked you what % convert within an hour, a day, or a week?

We’ve been looking at this at Automattic, and I wound up putting together an R script to help with the analysis. Because everything needs a fancy name, I dubbed it Chronos, and you can check it out on GitHub.

All you need to do to use it is generate a CSV containing two columns: one with the unix timestamp of the first event and another with the unix timestamp of the second event:

1350268044,1408676495
1307322538,1350061315
1307676110,1340667657
1307661905,1337311786
1307758702,1428877904
...

The script will then show you the distribution of time between the two events as well as the percentage that occur within a few fixed durations (30 minutes, 1 hour, etc.):

Distribution:
5% within 2 minutes 
10% within 5 minutes 
15% within 1 hour 21 minutes 
20% within 1 day 38 minutes 
25% within 3 days 2 hours 58 minutes 
30% within 6 days 9 hours 20 minutes 
33.33333% within 11 days 
35% within 14 days 
40% within 23 days 
45% within 42 days 
50% within 67 days 
55% within 95 days 
60% within 148 days 
65% within 210 days 
66.66667% within 232 days 
70% within 288 days 
75% within 390 days 
80% within 550 days 
85% within 677 days 
90% within 920 days 
95% within 1288 days 
100% within 1715 days 

Percentage by certain durations:
13% within 30 minutes
14% within 1 hour
17% within 5 hours
20% within 1 day
30% within 7 days
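To make the two kinds of output concrete, here is a minimal Python sketch of the same idea. It is not the Chronos R script itself: the function names are made up for illustration, the quantile rule (smallest observed duration covering at least the given percent of rows) is an assumption, and the data is just the five sample rows shown earlier rather than the full data set behind the numbers above.

```python
import csv
import io
import math

# The five sample rows from the post; a real file would have many more.
SAMPLE_CSV = """1350268044,1408676495
1307322538,1350061315
1307676110,1340667657
1307661905,1337311786
1307758702,1428877904
"""


def load_durations(text):
    """Return the sorted gaps, in seconds, between the two timestamps in each row."""
    rows = csv.reader(io.StringIO(text))
    return sorted(int(row[1]) - int(row[0]) for row in rows if row)


def percent_within(durations, limit_seconds):
    """Percent of rows whose second event happened within limit_seconds."""
    hits = sum(1 for d in durations if d <= limit_seconds)
    return 100.0 * hits / len(durations)


def duration_at_percent(durations, percent):
    """Smallest observed gap that covers at least `percent` of the rows."""
    index = max(math.ceil(percent / 100.0 * len(durations)) - 1, 0)
    return durations[index]


durations = load_durations(SAMPLE_CSV)
one_year = 365 * 24 * 60 * 60

print("within one year:", percent_within(durations, one_year), "%")
print("50% within", duration_at_percent(durations, 50) // 86400, "days")
```

With only five rows the results are coarse, but the two functions correspond to the "Distribution" and "Percentage by certain durations" sections of the output.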

In addition to analyzing conversion rates, you can use this to measure things like retention rates. The data above, for example, shows how much time passed between users logging their first and last beers in Adam Week’s handy beer tracking app, BrewskiMe (thank you again, Adam, for providing the data).

If you run into any issues or have any suggestions for how to improve it, just let me know.

The impact of a $15 minimum wage on a McDonalds

There was a really interesting thread on Reddit earlier this week in the Explain It Like I’m 5 (ELI5) subreddit titled How would a $15 minimum wage ACTUALLY affect a franchised business like McDonalds?

In an effort to make sure I understand the math, I’m going to try to summarize the top response. Here we go:

The Cost of Labor (COL) is the sum of your employees’ wages, benefits, and payroll taxes. On an operational report for a business, the COL is usually also expressed as a percentage of net sales. Net sales is gross sales minus returns and discounts, which for a franchise like McDonalds probably just means subtracting the value of coupons.

For the franchise the commenter uses in his analysis (which may or may not be an actual McDonalds), the COL is currently 28% of net sales. So for every $1 they sell, $0.28 goes towards labor. If you buy a $15 meal, it costs $4.20 in labor to produce it on average.

(Some commenters point out that 28% is high: where they worked the goal was 15%, and if the restaurant operated at more than 20% for a week the manager would get fired. Those figures are for higher-end restaurants, though.)

For restaurants, there’s also Cost of Sales (COS), aka Cost of Goods, which is basically the cost of the ingredients. For this franchise, it’s also 28% of net sales. So for a $15 meal, 28% COL + 28% COS = 56%, or $8.40, goes towards the labor and ingredients to make it.

Then there are franchise fees (aka royalty fees, which corporate charges each franchise for running a store under its brand), which are ~10% of net sales.

COL + COS + the franchise fee make up the majority of operating costs.

For the franchise he’s looking at, the numbers for one particular week work out as follows: on $27,321 of net sales, roughly 28% goes to COL ($7,702), roughly 28% to COS ($7,908), and 10% to the franchise fee ($2,732), leaving $8,979. Here, COL + COS come to ~57% of net sales. The remaining amount is used to pay the manager, assistant manager, rent/mortgage, garbage, utilities, maintenance, advertising, administrative overhead, etc.

At this restaurant, employees make $9.25/hour on average. Raising the minimum to $15/hour would be a 62% increase in COL (assuming, to keep it simple, that everyone would make exactly $15/hour). With the same $27,321 in net sales, that would bump COL to $12,477 and reduce the remaining amount to $4,204, which won’t be enough to cover all of the remaining costs. COL + COS would now be ~75% of net sales.

For fast food restaurants, a general rule is that you want COL + COS to be under 60%, and you need it to be under 65% to be profitable. Another commenter said a good goal is 50% for COL + COS. It will vary by the type of restaurant; fast food is extremely competitive, so margins are thin.

Increasing the COL by 62% would cause major issues. Raising the hourly wage to $15 increases the COL by $12,477 - $7,702 = $4,775/week. If you wanted the same $8,979 remaining, you’d have to increase net sales by that $4,775/week to $32,096, an increase of 17%. That would probably come from higher menu prices, assuming customers were willing to pay them.
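To double-check the arithmetic, here is a small Python sketch that reruns the numbers from the Reddit comment as summarized above. The variable and function names are mine, and it uses the rounded 62% increase from the summary rather than the exact $15 / $9.25 ratio.

```python
# Weekly figures from the example franchise, in dollars.
NET_SALES = 27_321
COL = 7_702            # cost of labor
COS = 7_908            # cost of sales (ingredients)
FRANCHISE_FEE = 2_732
COL_INCREASE = 0.62    # rounded increase from $9.25/hour to $15/hour


def remaining(net_sales, col, cos, fee):
    """What is left after labor, ingredients, and the franchise fee."""
    return net_sales - col - cos - fee


baseline = remaining(NET_SALES, COL, COS, FRANCHISE_FEE)
new_col = round(COL * (1 + COL_INCREASE))
after_raise = remaining(NET_SALES, new_col, COS, FRANCHISE_FEE)

# Extra net sales needed to get back to the baseline, holding COS and the
# franchise fee fixed in dollars (the same simplification the summary makes).
extra_sales = new_col - COL
needed_sales = NET_SALES + extra_sales
increase_pct = 100 * extra_sales / NET_SALES
labor_and_goods_pct = 100 * (new_col + COS) / NET_SALES

print("remaining today:      ", baseline)
print("COL at $15/hour:      ", new_col)
print("remaining after raise:", after_raise)
print("net sales needed:     ", needed_sales, f"(+{increase_pct:.0f}%)")
print(f"COL + COS after raise: {labor_and_goods_pct:.1f}% of net sales")
```

Note that in practice COS and the 10% franchise fee would rise along with sales, so the real break-even increase would be somewhat higher than this simplified figure.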

This other response and the comments on it are worth a read as well.


I’ll end by saying that I do believe the current US minimum wage is too low and think we should raise it, but… it’s complicated. If the national minimum wage were raised to $15/hour, that would also lead to higher COS for McDonalds because it would cost more for companies to produce the ingredients, correct? But it would also mean that people who were making less than $15 would have more money to spend, so a hypothetical 10%-20% increase in menu prices might not be that bad. But if the price of everything increases, doesn’t that decrease the value of those extra wages? While the Reddit discussion is interesting, it made me appreciate that there are professional economists out there who can take into account the full impact of a change like this.

Visualizing Your SaaS App’s Monthly Active Users Broken Down by Signup Cohort

This week at Automattic I’ve been helping with a tool that will allow us to visualize the number of active WordPress.com users each month broken down by when those users signed up for an account. I think this type of chart and what you can learn from it are incredibly valuable so I wanted to show you all how to quickly create one for your own service.

Here’s an example of what this type of chart looks like courtesy of Buffer’s Joel Gascoigne:

What I really like about it is that for each month you can see how many active users there are and when those users signed up for an account. This not only gives you a sense of how long ago your active users signed up, but also of your service’s ability to retain users over time.

If you’d like to create a similar chart to visualize your SaaS app’s active users, I put together a small R script on GitHub that will help you do just that.

The only thing that you need to provide the script is a CSV file that contains user IDs and dates that those users performed an action in your app.

For example, the test data set that comes with it contains user IDs and the dates of actions performed by users of one of my apps (Preceden, a web-based timeline maker) for the first year that the site existed. The dates come from the automatically set created_at and updated_at values on the Ruby on Rails Active Record objects that each user is associated with:

2   2010-03-28
2   2010-04-09
2   2010-04-10
2   2010-05-16
3   2010-01-31
3   2014-05-07
3   2014-09-30
3   2015-04-11
4   2010-01-31
4   2010-10-06
...

In this example, user IDs 2 and 3 each performed actions on four dates and user ID 4 performed actions on two dates. The script analyzes that data to figure out which cohort each user belongs to, based on the earliest date the user performed an action, and counts that user toward the active users for each subsequent month in which he or she performed an action:

[Chart: monthly active users broken down by signup cohort]

As you can see, there was a huge spike at the beginning of the year when Preceden launched on Hacker News and was subsequently covered on other tech sites, but by December only a fraction of those users were still active. On that note, I encourage you to strive to build a service like Buffer that delivers long-term value so your chart doesn’t wind up looking like this one :).
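For anyone curious about the cohort logic, here is a minimal Python sketch of the counting step described earlier. It is not the R script from the repository: the function name and data layout are assumptions for illustration, cohorts are assumed to be calendar months, and the rows are just the ten sample rows shown above.

```python
from collections import defaultdict

# (user_id, date) pairs copied from the sample rows in the post.
SAMPLE_ROWS = [
    (2, "2010-03-28"), (2, "2010-04-09"), (2, "2010-04-10"), (2, "2010-05-16"),
    (3, "2010-01-31"), (3, "2014-05-07"), (3, "2014-09-30"), (3, "2015-04-11"),
    (4, "2010-01-31"), (4, "2010-10-06"),
]


def monthly_active_by_cohort(rows):
    """Map each activity month to {cohort month: number of distinct active users}."""
    # A user's cohort is the month of their earliest action. ISO "YYYY-MM"
    # strings sort chronologically, so plain string comparison is enough.
    cohort_of = {}
    for user_id, date in rows:
        month = date[:7]
        if user_id not in cohort_of or month < cohort_of[user_id]:
            cohort_of[user_id] = month

    # Collect distinct users per (activity month, cohort) so that several
    # actions by one user in the same month count only once.
    active = defaultdict(lambda: defaultdict(set))
    for user_id, date in rows:
        active[date[:7]][cohort_of[user_id]].add(user_id)

    return {
        month: {cohort: len(users) for cohort, users in sorted(cohorts.items())}
        for month, cohorts in sorted(active.items())
    }


counts = monthly_active_by_cohort(SAMPLE_ROWS)
for month, cohorts in counts.items():
    print(month, cohorts)
```

Each entry of the result is one bar in a stacked chart: the month is the x-axis position and each cohort is one colored segment of that bar.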

If you have any questions or need help customizing the script in any way, please don’t hesitate to drop me a note.

Thanks Joel Martinez and Rob Felty for providing feedback on the code.

The Innovator’s Dilemma, Facebook, and the Oculus Acquisition

In The Age of Spiritual Machines, Ray Kurzweil describes the life cycle of a technology:

We can identify seven distinct stages in the life cycle of a technology.

1. During the precursor stage, the prerequisites of a technology exist, and dreamers may contemplate these elements coming together. We do not, however, regard dreaming to be the same as inventing, even if the dreams are written down. Leonardo da Vinci drew convincing pictures of airplanes and automobiles, but he is not considered to have invented either.

2. The next stage, one highly celebrated in our culture, is invention, a very brief stage, similar in some respects to the process of birth after an extended period of labor. Here the inventor blends curiosity, scientific skills, determination, and usually a measure of showmanship to combine methods in a new way and brings a new technology to life.

3. The next stage is development, during which the invention is protected and supported by doting guardians (who may include the original inventor). Often this stage is more crucial than invention and may involve additional creation that can have greater significance than the invention itself. Many tinkerers had constructed finely handtuned horseless carriages, but it was Henry Ford’s innovation of mass production that enabled the automobile to take root and flourish.

4. The fourth stage is maturity. Although continuing to evolve, the technology now has a life of its own and has become an established part of the community. It may become so interwoven in the fabric of life that it appears to many observers that it will last forever. This creates an interesting drama when the next stage arrives, which I call the stage of the false pretenders.

5. Here an upstart threatens to eclipse the older technology. Its enthusiasts prematurely predict victory. While providing some distinct benefits, the newer technology is found on reflection to be lacking some key element of functionality or quality. When it indeed fails to dislodge the established order, the technology conservatives take this as evidence that the original approach will indeed live forever.

6. This is usually a short-lived victory for the aging technology. Shortly thereafter, another new technology typically does succeed in rendering the original technology to the stage of obsolescence. In this part of the life cycle, the technology lives out its senior years in gradual decline, its original purpose and functionality now subsumed by a more spry competitor.

7. In this stage, which may comprise 5 to 10 percent of a technology’s life cycle, it finally yields to antiquity (as did the horse and buggy, the harpsichord, the vinyl record, and the manual typewriter).

Fred Wilson argues in The Search For The Next Platform that Facebook’s acquisition of Oculus is “Zuck and his team looking up and saying ‘what’s next?’”. Viewed through the lens of Kurzweil’s seven stages, Facebook is at stage 4 (maturity) and Oculus is the potentially disruptive upstart in stage 5. It might seem far-fetched now, but what if Oculus’s virtual reality platform did eventually evolve into a communication platform? Could it threaten Facebook’s current dominance? Maybe. The acquisition can then be seen as part of Facebook’s attempt to beat the Innovator’s Dilemma, the tendency of mature companies to lose out to startups by focusing too much on satisfying existing customers and not enough on disruptive new technologies.

Will Facebook succeed and still be relevant in 5, 10, or 20+ years? I have no idea, but I can’t wait to see how things play out. :)

Saving $X Per Week Nets You $752X After 10 Years

I’ve been slowly making my way through Mr. Money Mustache’s blog archive — something I encourage everyone to check out — and it’s been an incredibly eye opening experience.

Take this, for example:

A Starbucks habit of picking up a regular coffee and biscotti on the way to work each workday. $4/day = $20/week = $15,040 in coffee over just ten years!!

Without compounding, $4/day = $20/week = $1,040/year or $10,400 after ten years. However, to calculate its actual future value you have to take into account what would happen if you invested that $20/week instead.

He provides a helpful shortcut for calculating the future value assuming 7% growth compounded over 10 years:

To calculate a weekly expense compounded over ten years, multiply the price by 752. For a monthly expense, multiply by 173.

752 * $20 equals the $15,040 he calculated.

Curious, I looked into the math behind this. He provides a link to Future Value Calculator which derives the formula for an ordinary annuity, though I found this explanation on Investopedia much easier to follow.

Where does that 752 come from? From the Investopedia article:

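FV = C*[ \frac{(1+i)^{n}-1}{i} ]

where C is the payment per period, i is the interest rate per period, and n is the number of payments.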

In this case the interest rate per period is not 7% but 7% divided by 52 weeks per year, and the number of payments is 52 weeks per year multiplied by 10 years.

The future value then is:

FV = C*[ \frac{(1+\frac{0.07}{52})^{52*10}-1}{\frac{0.07}{52}} ]

Or, more simply:

FV = C * 752.34

That’s where the 752 comes from.

Similarly, the future value after 10 years of monthly savings is:

FV = C*[ \frac{(1+\frac{0.07}{12})^{12*10}-1}{\frac{0.07}{12}} ]

FV = C * 173.08
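If you want to check these multipliers yourself or try other assumptions, here is a small Python sketch of the same ordinary annuity formula. The function name and parameters are mine; the 7% rate and 10-year horizon are the ones used above.

```python
def annuity_factor(annual_rate, periods_per_year, years):
    """Future value of saving 1 unit per period (ordinary annuity).

    The per-period rate is annual_rate / periods_per_year and the number
    of payments is periods_per_year * years, as in the formulas above.
    """
    rate = annual_rate / periods_per_year
    payments = periods_per_year * years
    return ((1 + rate) ** payments - 1) / rate


weekly_factor = annuity_factor(0.07, 52, 10)
monthly_factor = annuity_factor(0.07, 12, 10)

print(f"weekly multiplier:  {weekly_factor:.2f}")
print(f"monthly multiplier: {monthly_factor:.2f}")
print(f"$20/week for 10 years: ${20 * weekly_factor:,.0f}")
```

The multipliers should come out at roughly 752 and 173. Changing the rate or the number of years shows how sensitive the shortcut is to those assumptions.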

What’s amazing is that the $15K comes from saving just $20/week. If you can save $100/week instead, you’ll net about $75,200 after ten years. Pretty crazy, isn’t it?