How Calypso’s A/B Test Module Works

Automattic recently open sourced Calypso, a JavaScript and REST-API powered interface that runs WordPress.com. I was fortunate to get to work on a few pieces of it, mainly its Analytics and A/B Test modules. In this post I’ll walk through how the A/B test module works because it might give you a few things to consider if you find yourself rolling your own A/B testing solution like we’ve done at Automattic.

Bucketing and Reporting

When it comes to A/B testing, there are two tools you need: one that buckets users and one that reports on the results of your tests. The bucketing tool lets you say “Show 50% of users a green button, show the other 50% the red button”. The analysis tool then lets you measure the impact of the green vs red button on other actions like signing up for an account, publishing a post, upgrading, etc.

The Calypso A/B Test module that I’ll be discussing in this post is our bucketing tool. We also have a separate internal tool for analyzing the results of the A/B tests, but that’s a topic for another post.

A/B Testing in Calypso

The A/B Test module’s README provides detailed instructions for how it works. You can also check out the module itself if you’re interested. I’ll give an overview here and elaborate on some of the decisions that went into it.

We have a file called active-tests.js that contains configuration information for all of the active tests we’re running in Calypso. For example, here’s one of the tests:


businessPluginsNudge: {
	datestamp: '20151119',
	variations: {
		drake: 50,
		nudge: 50
	},
	defaultVariation: 'drake'
}

What this says is that we have a test called businessPluginsNudge that started on November 19th and has two variations, drake and nudge, each of which is shown 50% of the time. Also, users who are ineligible to participate in the test should be shown the drake variation (more on what ineligible means below).

To assign a user to a test, there’s a function the A/B test module exports called abtest. It’s used like so:


// Here, the user is assigned one of the variations for this test
// so variation will be either `drake` or `nudge`
var variation = abtest( 'businessPluginsNudge' );
// We can then vary what the user sees based on the variation
// he or she was assigned to
if ( variation === 'drake' ) {
	// Do something
} else {
	// Do something else
}

The abtest function assigns the user to a variation and returns that variation. For this particular test, if the user is eligible then 50% of the time the function will return drake and the other 50% of the time it will return nudge. We can then use the variation to determine what the user sees.
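To make the 50/50 split concrete, here is a minimal sketch of how a weighted pick like this could work. It is not the module's actual code: the pickVariation name and the injectable random argument are assumptions made for illustration.

```javascript
// Hypothetical sketch: pick a variation name according to its weight.
// `variations` maps names to weights, e.g. { drake: 50, nudge: 50 }.
// `random` is injectable so the pick can be made deterministic in tests.
function pickVariation( variations, random = Math.random ) {
	const names = Object.keys( variations );
	const total = names.reduce( ( sum, name ) => sum + variations[ name ], 0 );
	let threshold = random() * total;

	for ( const name of names ) {
		threshold -= variations[ name ];
		if ( threshold < 0 ) {
			return name;
		}
	}

	// Fallback in case rounding leaves the threshold at exactly zero
	return names[ names.length - 1 ];
}

console.log( pickVariation( { drake: 50, nudge: 50 } ) ); // either 'drake' or 'nudge'
```

Because the weights are summed rather than assumed to total 100, the same function would also handle an uneven split such as { drake: 90, nudge: 10 }.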

The abtest function also sends the test name and the user’s variation back to us via an API endpoint so that we can record it and later use it to measure the impact on other events using our internal reporting tool.

Eligibility

Consider an A/B test that tests the wording of a button. If the new wording isn’t properly translated but a large percentage of the users don’t speak English, it can throw off the results of the test. For example, if the new wording underperforms, was it because the new wording was truly inferior or was it because a lot of non-English users saw the English wording and simply couldn’t read it?

To account for that and similar issues, we have this idea of eligibility. In certain situations we don’t want users to count towards the test. We need to show them something of course, but we don’t want to track it. That’s what the defaultVariation property is for in the test configuration. Ineligible users are shown that variation, but we don’t send the information about the test and the user’s variation back to our servers. By default in the A/B test module, only English language users are eligible for tests, so non-English users will always be shown the variation specified by defaultVariation.

We also only want to include users that have local storage enabled because that’s where we save the user’s variation. We save the variation locally because we always want to show the user the same variation that they originally saw. Saving it to local storage keeps things fast. We could fetch it from the server, but we don’t want to slow down the UI while we wait for the response so we read it from local storage instead. A side effect of this is that we don’t handle situations where users change browsers, switch devices, or clear their local storage. That’s only a fraction of users though so it doesn’t impact the results very much.
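A rough sketch of the "assign once, then always read the saved value" idea might look like the following. The ABTests storage key, the getOrAssignVariation helper, and the in-memory stand-in for localStorage are all assumptions for illustration, not the module's real names.

```javascript
// Hypothetical helper: return the saved variation for a test,
// or assign one with `assign()` and save it for next time.
function getOrAssignVariation( storage, testName, assign ) {
	const key = 'ABTests'; // assumed storage key
	const saved = JSON.parse( storage.getItem( key ) || '{}' );

	if ( ! Object.prototype.hasOwnProperty.call( saved, testName ) ) {
		saved[ testName ] = assign();
		storage.setItem( key, JSON.stringify( saved ) );
	}

	return saved[ testName ];
}

// Minimal in-memory stand-in for window.localStorage so the sketch runs anywhere
function createMemoryStorage() {
	const data = new Map();
	return {
		getItem: ( key ) => ( data.has( key ) ? data.get( key ) : null ),
		setItem: ( key, value ) => {
			data.set( key, String( value ) );
		},
	};
}

const storage = createMemoryStorage();
const first = getOrAssignVariation( storage, 'businessPluginsNudge', () => 'nudge' );
const second = getOrAssignVariation( storage, 'businessPluginsNudge', () => 'drake' );
console.log( first, second ); // nudge nudge
```

The second call ignores its assign callback because a variation is already saved, which is what keeps the experience consistent for a returning user in the same browser.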

One last point on eligibility: imagine you’re testing the wording on a particular button. You run a test where one variation shows “Upgrade Now” and the other shows “Upgrade Today” (this is a silly test, but just to give you an idea). Let’s say “Upgrade Today” wins and you make that the default. Then you run another test comparing “Upgrade Today” to “Turbocharge Your Site”. If a user participated in the original test and saw the “Upgrade Now” variation, it could impact their behavior on this new test. To account for that, if a user has participated in a previous test with the same name as a new test, then he or she won’t be eligible for the new test. We only want to include users who are participating in the test for the first time because it will result in numbers that better represent the impact of each variation.
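Putting the three eligibility rules together, a check could be sketched like this. The isEligible function, the user.locale field, and the testName_datestamp key format for saved variations are assumptions made for the example and may not match the module.

```javascript
// Hypothetical eligibility check covering the rules described above:
// English-language users only, local storage available, and no earlier
// run of a test with the same name.
// `saved` maps assumed keys of the form 'testName_datestamp' to variations.
function isEligible( user, testName, datestamp, saved, hasLocalStorage ) {
	if ( user.locale !== 'en' ) {
		return false;
	}
	if ( ! hasLocalStorage ) {
		return false;
	}

	const currentKey = testName + '_' + datestamp;
	const sawEarlierRun = Object.keys( saved ).some(
		( key ) => key.startsWith( testName + '_' ) && key !== currentKey
	);

	return ! sawEarlierRun;
}

const user = { locale: 'en' };
console.log( isEligible( user, 'upgradeButton', '20160101', {}, true ) ); // true
console.log(
	isEligible( user, 'upgradeButton', '20160101', { upgradeButton_20151119: 'upgradeNow' }, true )
); // false
```

A user who fails the check would simply be shown the defaultVariation, with nothing recorded.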

Multiple active tests

In A/B testing parlance, there’s a concept known as multivariate tests. The idea is that if you have a test running on one page (green button, red button) and another test running on another page (“Upgrade Now”, “Upgrade Today”), the combination of the variations from those tests might be important. For example, what if green button + “Upgrade Today” leads to a higher conversion rate than the other combinations? There is that possibility, but we generally don’t worry about that to keep the analysis simpler.

Dealing with A/B tests that span multiple pages

There’s one final situation I want to note:

Imagine you have two pricing pages, one for your Silver Plan and one for your Gold Plan. On the Silver Plan’s pricing page, you assign the user a variation and use that to adjust the page:


var variation = abtest( 'silverPlan' );

So far so good. Now imagine that you want to adjust the payment form on a different page if the user saw a particular variation for the Silver Plan.

If you call abtest( 'silverPlan' ) to grab the variation on the payment page, it will also assign the user to a variation for that test. Many of the users viewing the payment page, though, will be purchasing the Gold Plan and will never have seen the pricing page for your Silver Plan. Assigning those users to a variation will distort the results of the test. To account for that, the A/B test module also exports a getABTestVariation function that just returns a user’s variation without assigning him or her to one:


var variation = getABTestVariation( 'silverPlan' );

This doesn’t come up in simple tests, but for complex tests that affect multiple parts of the user’s experience, it’s essential to be able to determine if a user is part of a variation without assigning him or her to one.
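The difference between the two calls can be illustrated with a small stand-alone sketch. The function names below are deliberately different from the module's, the variation names are made up, and returning null for an unassigned user is an assumption of the sketch.

```javascript
// Hypothetical in-memory record of assigned variations
const assignments = new Map();

// Assigns on the first call, then returns the saved value.
// To keep the sketch short it always picks the first variation.
function abtestSketch( testName, variations ) {
	if ( ! assignments.has( testName ) ) {
		assignments.set( testName, variations[ 0 ] );
	}
	return assignments.get( testName );
}

// Read-only lookup: never assigns, returns null if there is nothing saved
function getVariationSketch( testName ) {
	return assignments.has( testName ) ? assignments.get( testName ) : null;
}

// A Gold Plan buyer reaches the payment page without visiting the Silver Plan page
console.log( getVariationSketch( 'silverPlan' ) ); // null
console.log( assignments.size ); // 0

// A Silver Plan visitor is assigned on the pricing page, then read on the payment page
abtestSketch( 'silverPlan', [ 'original', 'discountBanner' ] );
console.log( getVariationSketch( 'silverPlan' ) ); // original
```

The read-only lookup leaves the record untouched, so users who never saw the Silver Plan page never enter the test's results.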

Wrapping Up

As you can see, there are a lot of subtle issues that can impact the results of your tests. Hopefully this gives you an idea of a few of the things to watch out for if you do roll your own A/B testing tools.

If you have any questions, suggestions on how to improve it, or just want to chat about A/B testing tools, don’t hesitate to drop me a note.

Nothing beats Dash for quickly checking dev documentation

Dash is an incredibly useful Mac app that I’d highly recommend all developers check out. It lets you instantly search developer documentation (devdocs) straight from your computer:

[Image: Dash.gif]

You can configure it to check only the devdocs you use on a regular basis. For example, I have Dash configured to search the devdocs for Sass, Rails 4, jQuery, HTML, jQuery UI, PHP, MySQL, CSS, JavaScript, Ruby, WordPress, Node.js, Lo-Dash, R, and D3.js. It supports more than 150 sets of documentation and also lets you generate your own. Dash also keeps the documentation automatically up to date as it changes.

If you want to constrain your search to a specific set of devdocs, you can add a prefix to your search, as in ruby:gsub, and it will only check the Ruby docs.

I also set Cmd+Shift+D to load Dash so that I can pull it up while I’m coding, perform a search, and Cmd+Tab back to Sublime without ever touching the mouse.

It’s free to try and $24.99 to buy. Give it a shot and rejoice that you’ll never again have to Google for documentation.

The impact of a $15 minimum wage on a McDonald’s

There was a really interesting thread on Reddit earlier this week in the Explain It Like I’m 5 (ELI5) subreddit titled “How would a $15 minimum wage ACTUALLY affect a franchised business like McDonalds?”

In an effort to make sure I understand the math, I’m going to try to summarize the top response. Here we go:

The Cost of Labor (COL) is the sum of your employees’ wages + benefits + payroll taxes. When viewing an operational report for a business, the COL is usually also expressed as a percentage of net sales. Net sales is gross sales minus returns and discounts, which for a franchise like McDonald’s probably just means subtracting the value of coupons.

For the franchise the commenter is considering for his analysis (which may or may not be an actual McDonald’s), the COL is currently 28% of its net sales. So for every $1 they sell, $0.28 goes towards labor. If you buy a $15 meal, it costs $4.20 in labor to produce it on average.

(Some commenters point out that 28% is high: where they worked the goal was 15%, and if the restaurant operated at more than 20% for a week the manager would get fired. Those figures are for higher end restaurants though.)

For restaurants, there’s also Cost of Sales aka Cost of Goods which is basically the cost of the ingredients. For this franchise, it’s also 28% of net sales. So for a $15 meal, 28% COL + 28% COS = 56% or $8.40 towards the wages and ingredients to make it.

Then there are franchise fees (aka royalty fees, which corporate charges each franchise for running a store with their brand), which are ~10% of net sales.

COL + COS + the franchise fee make up the majority of operating costs.

For the particular week he’s looking at, the franchise’s numbers work out to: $27,321 net sales, minus roughly 28% for COL ($7,702), roughly 28% for COS ($7,908), and 10% for the franchise fee ($2,732), which leaves $8,979 remaining. Here, COL + COS are ~57% of the net sales. The remaining amount is used to pay the manager, assistant manager, rent/mortgage, garbage, utilities, maintenance, advertising, administrative overhead, etc.

At this restaurant, employees make $9.25/hour on average. Increasing the minimum to $15/hour would be a 62% increase in COL (we assume everyone would make $15/hour to keep it simple). With the same $27,321 net sales, that would bump COL to $12,477, reducing the remaining amount to $4,204. That won’t be enough to cover all of the remaining costs. Now COL + COS are ~75% of net sales.

For fast food restaurants, a general rule is that you want COL + COS to be under 60% and need it to be under 65% to be profitable. Another commenter said a good goal is 50% for COL + COS. It will vary by the type of restaurant; the fast food industry is extremely competitive so there are thin margins.

Increasing the COL by 62% would cause major issues. Raising the hourly wage to $15 increases the COL by $12,477 – $7,702 = $4,775/week. If you wanted the same $8,979 remaining, you’d have to increase the net sales by that $4,775/week to $32,096, an increase of 17%. That would probably come from higher menu prices, assuming customers were willing to pay them.
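To double-check the arithmetic, the weekly figures can be run through a few lines of JavaScript. The variable names are mine, and the calculation follows the same simplification as the thread: COS and the franchise fee are held at their current dollar amounts even when sales rise.

```javascript
// Weekly figures quoted from the Reddit comment
const netSales = 27321;
const col = 7702; // cost of labor
const cos = 7908; // cost of sales (ingredients)
const franchiseFee = 2732;

// What is left today after the three big costs
const remaining = netSales - col - cos - franchiseFee;

// A 62% increase in labor cost ($9.25/hour to $15/hour), rounded to whole dollars
const raisedCol = Math.round( col * 1.62 );
const remainingAfterRaise = netSales - raisedCol - cos - franchiseFee;

// Extra sales needed to get back to the original remaining amount
const extraLabor = raisedCol - col;
const requiredSales = netSales + extraLabor;
const increasePercent = Math.round( ( extraLabor / netSales ) * 100 );

console.log( remaining ); // 8979
console.log( raisedCol, remainingAfterRaise ); // 12477 4204
console.log( extraLabor, requiredSales, increasePercent ); // 4775 32096 17
```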

This other response and the comments on it are worth a read as well.


I’ll end by saying that I do believe the current US minimum wage is too low and think we should raise it, but… it’s complicated. If the national minimum wage were raised to $15/hour, that would also lead to higher COS for McDonald’s because it would cost more for companies to produce the ingredients, correct? But it would also mean that people who were making less than $15 would have more money to spend, so a hypothetical 10%-20% increase in menu prices might not be that bad. But if the price of everything increases, doesn’t it decrease the value of those extra wages? While the Reddit discussion is interesting, it made me appreciate that there are professional economists out there who can take into account the full impact of a change like this.

Using the ESLint Gem in Rails

ESLint is a popular linting utility for JavaScript. In this post I’ll show you how I use it in a Ruby on Rails app.

A quick intro to ESLint

ESLint lets you specify how you want to style your JavaScript and it will then check your code and report any issues. For example, if you use the quotes rule to specify that you want to use single quotes everywhere, ESLint will check whether that’s true and report back anywhere you accidentally used double quotes.

Whether you’re a part of a team or working on a project by yourself, ESLint is a great way to ensure clean, consistent code and identify bugs before they ever make their way into production.

The ESLint Gem

Jon Kessler and Justin Force created a handy ESLint gem for Rails. You simply create a configuration file in config/eslint.json, execute rake eslint:run, and it will check your application.js file for any issues.
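As a starting point, a minimal config/eslint.json that enforces only the single quotes rule mentioned earlier might look like this. In ESLint rule settings, a severity of 2 reports violations as errors, 1 as warnings, and 0 turns the rule off. This is an illustrative fragment rather than a recommended configuration.

```json
{
	"rules": {
		"quotes": [ 2, "single" ]
	}
}
```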

If you’re looking for a solid eslint config file, check out the one we at Automattic use for Calypso.

Customizing the workflow

I wound up customizing how I use the gem for two reasons:

  1. The gem checks application.js which concatenates all of your JavaScript assets based on the manifest file. If your assets include third party scripts like jQuery, ESLint will wind up linting those as well which you probably don’t care about.
  2. Similarly, because all of your JavaScript files are concatenated in application.js, the line numbers that ESLint spits out in its report don’t correspond to the line numbers in the individual files, making it difficult to pinpoint the offending lines of code.

To account for this, I first moved all of the third party JavaScript files out of the top level of app/assets/javascripts and into app/assets/javascripts/lib. With them in a subdirectory, I then wrote a new Rake task that takes advantage of the gem’s ability to lint a single file:


namespace :lint do
  task run: :environment do
    js_dir = "#{Rails.root}/app/assets/javascripts"
    Dir.chdir(js_dir)
    # No directories and no application.js
    files = Dir.glob('*').delete_if { |f| File.directory?(f) || f == 'application.js' }
    files.each do |file|
      puts file
      # There is likely a way to do this with Rake::Task's `invoke` and `reenable` methods
      # but I couldn't figure out how to get it to check more than the first file.
      puts `rake eslint:run[#{js_dir}/#{file}]`
    end
  end
end

With this in place, you can run rake lint:run and it will iterate over each of your JavaScript files within the javascripts directory and execute ESLint on each one:

$ rake lint:run

account.js
48:5 low indent Expected indentation of 3 tab chrs but found 4
49:5 low indent Expected indentation of 3 tab chrs but found 4
50:5 low indent Expected indentation of 3 tab chrs but found 4

interface.js
612:3 low quote-props Unnecessarily quoted property `class` found
613:28 low quote-props Unnecessarily quoted property `class` found
915:1 low valid-jsdoc Missing JSDoc parameter type for 'reason'

If you also use ESLint in your Rails project, I’d love to hear more about your setup.