A/B Test Simulator v1

A few weeks ago I wrote about how I was working on an A/B test simulator and asked if anyone was interested in working on it. A few of you reached out (thank you!) but the discussions quickly stalled because I realized that I didn’t have a good plan for where to take it from there.

Rather than let it linger on my MacBook forevermore, I made a push to ship the v1 and am happy to say you can check it out on GitHub here:

https://github.com/mattm/abtest-simulator

How it works

Here’s the idea:

Let’s say you’re running a big test on your homepage which has a conversion rate of 10% and you think your test will either do really well (+20%) or fail terribly (-20%). You configure this in the script:

Also, you want to run your A/B test until you’ve either had more than 10,000 participants or until the test has reached 99% significance. You configure this in the evaluate method:

When you run the script (ruby abtest-simulator.rb) it then simulates 1,000 A/B tests, where for each A/B test we assign visitors one of the variations and continue until we declare a winner or pass if a winner is never decided on:

Summary:
Passes: 74
Correct: 908
Incorrect: 18
Correct Decisions: 908/926: 98.06%

908 times out of 1,000 our criteria made the “correct” decision: we chose the winning +20% variation or didn’t choose the -20% variation. In 18 tests we incorrectly chose the -20% variation or didn’t choose the +20% variation. And in 74 tests out of 1,000 we never reached significance.
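As a rough illustration of the loop described above, here is a minimal sketch. It is not the code from abtest-simulator.rb: the constant and method names are assumptions, it uses a pooled two-proportion z-test as the significance check, and it only checks significance every 100 participants to keep things fast.

```ruby
# Hypothetical sketch of the simulation loop; not the code from the repository.
BASE_RATE = 0.10            # conversion rate of the original
LIFTS = [0.20, -0.20]       # possible true effects of the variation
MAX_PARTICIPANTS = 10_000   # stop the test after this many visitors
Z_CRITICAL = 2.576          # two-sided 99% significance

# z statistic for the difference between two conversion proportions (pooled).
def z_score(conv_a, n_a, conv_b, n_b)
  return 0.0 if n_a.zero? || n_b.zero?
  pooled = (conv_a + conv_b).to_f / (n_a + n_b)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
  return 0.0 if se.zero?
  (conv_b.to_f / n_b - conv_a.to_f / n_a) / se
end

# Runs one simulated test and returns :correct, :incorrect, or :pass.
def simulate_test(rng)
  lift = LIFTS.sample(random: rng)
  rates = [BASE_RATE, BASE_RATE * (1 + lift)]  # [original, variation]
  visitors = [0, 0]
  conversions = [0, 0]
  MAX_PARTICIPANTS.times do |i|
    arm = rng.rand(2)                          # assign the visitor at random
    visitors[arm] += 1
    conversions[arm] += 1 if rng.rand < rates[arm]
    next unless ((i + 1) % 100).zero?          # only check every 100 visitors
    z = z_score(conversions[0], visitors[0], conversions[1], visitors[1])
    next if z.abs < Z_CRITICAL
    # A positive z means the variation was declared the winner.
    correct = (z.positive? == lift.positive?)
    return correct ? :correct : :incorrect
  end
  :pass                                        # never reached significance
end

# Runs num_tests simulated tests and tallies the outcomes.
def simulate(num_tests, seed: 42)
  rng = Random.new(seed)
  results = Hash.new(0)
  num_tests.times { results[simulate_test(rng)] += 1 }
  results
end
```

Calling `simulate(1_000)` returns a hash of counts keyed by `:correct`, `:incorrect`, and `:pass`. The exact numbers depend on the seed and on how often significance is checked, so they will not match the summary above.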

The idea with this project is that you can play around with the numbers to see what impact they have. For example:

  • What is the impact of 99% significance vs 95% significance?
  • What if you just wait until there are 50 conversions and pick the best performer?
  • What if you don’t expect the test to result in a big change, but only smaller ones? (Hint: A/B testing small changes is a mess.)

Next steps

If anyone is interested in helping on this project, now’s a good time to get involved.

Specifically, I’d love it for folks to go through the script and verify that I haven’t made any logical mistakes. I don’t think I have, but also wouldn’t bet my house on it. That’s also why I’m not including any general “lessons learned” from this simulator just yet – I don’t want to report on results until others have verified that all is well with the script. I also wouldn’t rule out someone saying “Matt, the way you’ve done this doesn’t make any sense”. If I figure out any mistakes on my own or from others, I’ll write posts about them so others can learn as well.

If you can’t find any bugs, just play around with it. How does the original conversion rate impact the results? How does the distribution of results impact it? How do the criteria for ending the test impact it? Eventually we can publish our findings – the more people that contribute, the better the writeup will be.

Analyzing an A/B Test’s Impact Using Funnel Segmentation

If you decide to roll your own in-house A/B testing solution, you’re going to need a way to measure how each variation in each test influences user behavior.

In my experience the best way to do this is to take advantage of a third party analytics tool and piggyback on its funnel segmentation features. This post is about how to do that.

Funnel Segmentation 101

Consider this funnel from Lean Domain Search:

  1. A user performs a search
  2. Then clicks on a search result
  3. Then clicks on a registration link

In Mixpanel, the funnel looks like this:

[Screenshot: the funnel in Mixpanel]

Of the 35K people who performed a search, 9K (26%) clicked on a search result, and 900 of those (10% of those who clicked, about 2.6% overall) clicked a registration link.
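The percentages are simple ratios of the step counts. Here is a small sketch using the rounded counts quoted above, so the results can differ slightly from the exact figures in Mixpanel; the step names and the `funnel_rates` helper are assumptions for illustration.

```ruby
# Step counts as quoted above (rounded), in funnel order.
FUNNEL = [
  ['Performed search', 35_000],
  ['Clicked search result', 9_000],
  ['Clicked registration link', 900]
].freeze

# Returns one row per step: [name, rate vs. previous step, rate vs. first step].
def funnel_rates(steps)
  first_count = steps.first[1].to_f
  steps.each_with_index.map do |(name, count), i|
    previous = i.zero? ? count : steps[i - 1][1]
    [name, count / previous.to_f, count / first_count]
  end
end

funnel_rates(FUNNEL).each do |name, step_rate, overall_rate|
  puts format('%-26s %5.1f%% of previous step, %5.1f%% overall',
              name, step_rate * 100, overall_rate * 100)
end
```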

We can then use Mixpanel’s segmentation feature to segment on various properties to see how they impact the funnel. For example, here’s what segmenting on Browser looks like:

[Screenshot: the funnel segmented by Browser in Mixpanel]

We can see that 27% of Chrome searchers click on a search result compared to only 18% of iOS Mobile visitors. We could also segment on other properties that Mixpanel’s tracking client automatically collects such as the visitor’s country, which search engine he or she came from, and most importantly for our purposes here, custom event properties.

Passing Variations as Custom Event Properties

Segmenting on a property like the visitor’s country is very similar conceptually to segmenting on which A/B test variation a user sees. In both cases we’re breaking down the funnel to see what impact the property value (each country or each variation) has on the rest of the funnel.

Consider a toy A/B test that measures the impact of the homepage’s background color on sign ups.

When the visitor lands on the homepage, we fire a Visited Homepage event with an abtest_variation property set to the name of the variation the user sees:

With this in place, you can then set up a funnel such as:

  1. Visited Homepage
  2. Signed Up

Then segment on abtest_variation to see what impact each variation has on the rest of the funnel.

In the real world, you’re not going to have white hardcoded like it is in the code snippet above. You’ll want to make sure that whatever A/B test variation the user is assigned to gets passed as the variation property’s value on that tracking event.
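One way to avoid hardcoding is to compute the variation from the user and pass the result straight into the event’s properties. The sketch below is an assumption-heavy illustration: the variation names, the test name, the hashing scheme, and both helper methods are hypothetical, and it only builds the event data rather than calling any particular analytics client.

```ruby
require 'digest/md5'

VARIATIONS = %w[white blue].freeze  # hypothetical variation names

# Deterministically maps a user to a variation so repeat visits see the same one.
def assign_variation(test_name, user_id, variations = VARIATIONS)
  digest = Digest::MD5.hexdigest("#{test_name}:#{user_id}")
  variations[digest.to_i(16) % variations.length]
end

# Builds the event name and properties to hand to your analytics client.
def homepage_event(user_id)
  variation = assign_variation('homepage background color', user_id)
  { event: 'Visited Homepage', properties: { 'abtest_variation' => variation } }
end
```

Hashing the user ID is just one option. What matters is that the value sent as abtest_variation is the variation the user actually saw.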

Further improvements

The setup above should work fine for your v1, but there are several ways you can improve the setup for long term testing.

Pass the test name as an event property

I recommend also passing an abtest_name property on the event:

The advantage of this is that if you’re running back to back tests, you’ll be able to set up your funnel to ensure you’re only looking at the results of a specific test without worrying that identically-named variations from earlier tests are impacting the results (which would happen if you started a test the same day a previous test ended). The funnel would look like this:

  1. Visited Homepage where abtest_name = homepage test 3
  2. Signed Up

Then segment on abtest_variation like before to see just the results of this A/B test.

Generalize the event name

In the examples above, we’re passing the A/B test details as properties on the Visited Homepage event. If we’re running multiple tests on the site, we’d have to pass the A/B test properties on all of the relevant events.

A better way to do it is to fire a generic A/B test event name with those properties instead:

Now the funnel would look like this:

  1. Assigned Variation where abtest_name = homepage test 3
  2. Signed Up

Then segment on abtest_variation again.
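To make the funnel-plus-segmentation idea concrete, here is a sketch of the same calculation done by hand on a list of raw events. The event data, the hash layout, and the `segment_funnel` helper are all made up for illustration; an analytics tool does this for you.

```ruby
require 'set'

# Hypothetical raw events, oldest first.
EVENTS = [
  { user: 1, name: 'Assigned Variation',
    props: { 'abtest_name' => 'homepage test 3', 'abtest_variation' => 'white' } },
  { user: 2, name: 'Assigned Variation',
    props: { 'abtest_name' => 'homepage test 3', 'abtest_variation' => 'blue' } },
  { user: 3, name: 'Assigned Variation',
    props: { 'abtest_name' => 'homepage test 2', 'abtest_variation' => 'white' } },
  { user: 4, name: 'Assigned Variation',
    props: { 'abtest_name' => 'homepage test 3', 'abtest_variation' => 'blue' } },
  { user: 2, name: 'Signed Up', props: {} },
  { user: 3, name: 'Signed Up', props: {} }
].freeze

# For one test, returns { variation => { assigned: n, converted: m } }.
def segment_funnel(events, test_name, goal_event)
  variation_for = {}  # user => variation, only for users in this test
  converted = Set.new
  events.each do |event|
    if event[:name] == 'Assigned Variation' && event[:props]['abtest_name'] == test_name
      variation_for[event[:user]] ||= event[:props]['abtest_variation']
    elsif event[:name] == goal_event && variation_for.key?(event[:user])
      converted << event[:user]  # goal reached after being assigned
    end
  end
  results = Hash.new { |hash, key| hash[key] = { assigned: 0, converted: 0 } }
  variation_for.each do |user, variation|
    results[variation][:assigned] += 1
    results[variation][:converted] += 1 if converted.include?(user)
  end
  results
end

segment_funnel(EVENTS, 'homepage test 3', 'Signed Up').each do |variation, counts|
  puts "#{variation}: #{counts[:converted]}/#{counts[:assigned]} signed up"
end
```

User 3 signed up but was assigned in an earlier test, so filtering on abtest_name keeps that sign up out of the results for homepage test 3.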

To see this in action, check out Calypso’s A/B test module (more on that module in this post). When a user is assigned an A/B test variation, we fire a calypso_abtest_start event with the name and variation:

We can then analyze the test’s impact on other events using Tracks, our internal analytics platform.

Benefits

The nice thing about using an analytics tool to analyze an A/B test is that you can measure the test’s impact on any event even after the test has finished. For example, at first you might decide you want to measure the test’s impact on sign ups, but later decide you also want to measure the test’s impact on users visiting your support page. Doing that is as easy as setting up a new funnel. You can even measure your test’s impact on multiple steps of your funnel because that’s just another funnel.

Also, you don’t have to litter your code with lots of conversion events specific to your A/B test (like how A/Bingo does it) because you’ll probably already have analytics events set up for the core parts of your funnel.

Lastly, if your analytics provider offers an API, as Mixpanel does, you can pull the results of your A/B tests into an internal report where you can also add significance results and other details about the test.

If you have any questions about any of this, don’t hesitate to drop me a note.

Transferring ROMs to RetroPie

[Image: Contra III: The Alien Wars]

I recently bought a Raspberry Pi and configured it to play some of my favorite old-school SNES video games. Transferring the video game ROMs over to the Raspberry Pi was one of the more confusing aspects of the setup, so in this post I’ll share the steps I took to do it.

Obtaining ROMs

There are two main ways for obtaining ROMs:

  1. The legal way: buy a device that lets you create ROMs from your physical game cartridges. More on how to do that in this ArsTechnica article.
  2. The not-so-legal way: go on ThePirateBay and download a torrent containing a library of ROMs that others have created.

Regardless of which way you go, in the end you should end up with one or more ROMs on your computer:

[Screenshot: ROM files on the computer]

Transferring the ROMs to your Raspberry Pi

There are a bunch of ways to do this: USB, SFTP, scp, and more.

I have a great Mac app called Transmit that provides SFTP functionality which made it my go-to choice for performing the transfer.

Simply set up a new favorite with your Raspberry Pi’s credentials:

[Screenshot: a Transmit favorite set up with the Raspberry Pi’s credentials]

Then connect and transfer the ROMs from your computer to the appropriate subdirectory in the Raspberry Pi’s RetroPie/roms directory. For example, this Contra III ROM is an SNES ROM so I transferred it into the RetroPie/roms/snes directory:

[Screenshot: the Contra III ROM in the RetroPie/roms/snes directory]

After the ROM is transferred, restart the RetroPie (Menu > Quit > Restart System), select the appropriate gaming system (Super Nintendo in this case), find the ROM in the game list, and you’re ready to play.

Enjoy!

[Image: Contra]

Analyzing the impact of an email campaign on user behavior

We recently launched an email campaign for WordPress.com where we sent out an email to a large number of free, active users to promote one of our paid plans. I helped analyze the results of the campaign and wanted to share a few lessons learned.

A first pass at the high level funnel looks something like this:

  1. We send the email to a user
  2. The user opens the email
  3. The user clicks a link in the email
  4. The user makes a purchase

However, unlike measuring a website funnel where it’s a lot more straightforward to track users moving from one step of your funnel to the next, the email medium adds a lot of complexity.

For example, email open rates aren’t reliable. Open tracking works by including an image tag in the email (typically a 1 by 1 transparent pixel) whose source is a URL unique to that specific email sent to you. When your email client loads that image, the sender sees the request for it and can infer that you opened the email. The problem is that some email clients don’t load images automatically, so a user can read the email without the sender ever knowing. Because of this, opening the email shouldn’t be a step in the funnel: users who read the email without loading the image would be incorrectly eliminated from the remaining funnel steps.
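For a sense of what such a tracking pixel looks like, here is a minimal sketch. The host name, URL layout, and `tracking_pixel` helper are hypothetical; real email providers generate and log these for you.

```ruby
require 'securerandom'

# Hypothetical tracking host; a real sender would use its own endpoint.
TRACKING_HOST = 'https://tracking.example.com'

# Returns [token, img_tag]. The sender stores the token against the recipient
# so that a later request for this image URL can be tied to this one email.
def tracking_pixel
  token = SecureRandom.hex(16)
  tag = %(<img src="#{TRACKING_HOST}/open/#{token}.gif" width="1" height="1" alt="">)
  [token, tag]
end

token, tag = tracking_pixel
puts tag
```

If the email client never requests the image, the sender never sees the token, which is exactly why the open count undercounts readers.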

We don’t use MailChimp, but they do something neat where if a user clicks on a link in the email, they’ll automatically factor that into the open rate. That way if the user’s email client blocks the tracking pixel from loading, that person will still get counted in the open rate. But there’s still the issue of people who open the email but don’t click a link.

OK, so let’s remove that step from the funnel and see where that gets us:

  1. We send the email to a user
  2. The user clicks a link in the email
  3. The user makes a purchase

Better, but there are still issues. What happens if a user reads the email, doesn’t click the link, but winds up going to your site on their own and buying whatever you’re promoting? This is not an edge case either: many people read email on their phones, but switch over to their desktop to make a purchase. Maybe they’ll start that process by clicking the link in your email, but maybe they won’t. Therefore, we should remove that step as well.

Now we’re at:

  1. We send the email to a user
  2. The user makes a purchase

If you’re promoting a product that is only available to users who were sent the email, this could work. The problem arises when users who don’t receive the email can also purchase the same product.

Let’s say we sent the email to 1,000 users and 50 made a purchase. But how many of those users would have made a purchase anyway? What you really want to know is how many more people made a purchase because of the email.

To do this, you’ll need a holdout group. Before you start, you identify 2,000 users who meet certain criteria, send the email to 1,000 of them, and compare their purchase rate to that of the 1,000 who were not sent the email. Maybe 50 of the users who were sent the email made a purchase, but only 40 who weren’t sent it made a purchase. Ignoring statistical significance for the sake of simplicity, you then have a 5% purchase rate for those who were sent the email vs a 4% purchase rate for those who weren’t, so you could say the email resulted in a 25% lift (5% / 4% - 1). Nice.
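The lift is measured relative to the holdout group’s purchase rate, so a one-point gain on a 4% baseline is a 25% lift. Here is a small sketch of that arithmetic with the numbers from this example; the `relative_lift` helper is a made-up name.

```ruby
# Relative lift of the emailed group's purchase rate over the holdout group's.
def relative_lift(emailed_purchases, emailed_size, holdout_purchases, holdout_size)
  emailed_rate = emailed_purchases.to_f / emailed_size
  holdout_rate = holdout_purchases.to_f / holdout_size
  (emailed_rate - holdout_rate) / holdout_rate
end

lift = relative_lift(50, 1_000, 40, 1_000)
puts format('Lift: %+.0f%%', lift * 100)  # prints "Lift: +25%"
```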

We ran into another issue analyzing this on our first pass because we were only looking at the purchase rate of users after they were sent the email and comparing it to the purchase rate of everyone in the holdout group. The problem was that we sent out the emails over the course of several days so we wound up excluding purchases made by those users before they were sent the email. This artificially decreased the purchase rate for users we sent the email to and made it seem at first like our email resulted in fewer purchases. Once we realized this, we adjusted our queries to look at purchases made over the same date range (including purchases made by some users before they were sent the email) to give us an apples to apples comparison.

It’s interesting to consider how deep you can go on this type of analysis. For us, the analysis ended around here. But what else makes sense to look at? Should we try to look at the long term effect of the email on user behavior? What about short term: what did users do on the site immediately after being sent the email? What about the time distribution between being sent the email and making a purchase? This rabbit hole can go pretty deep, but you may reach a point of diminishing returns where additional analysis doesn’t yield additional insights.

If you’ve done this type of analysis before, I’d love to learn about what other metrics you look at.

Podcasts I’m listening to: July 2016 edition

Until recently, the only two podcasts I listened to were the Tim Ferriss podcast and Zen Founders podcast by Rob and Sherry Walling.

Tim’s interviews are top notch and I’ve learned a lot from them, but I’ve been listening to it less and less as time goes on. I think a big part of that is that I often finish listening to an interview and wind up with this feeling like I’m not doing enough with my life. For example, Tim recommends using the question “Am I working on something that I’ll be remembered for in 200 years?” to guide your efforts – not because being remembered is the goal, but because if you’re working on something at that scale, it’s probably going to be something really important. Maybe it’s because of the arrival of my daughter a few months ago, but I find myself caring less about professional ambition and more about family ambition.

Which brings me to Zen Founders, a podcast about building startups and balancing that with your family. I don’t have any plans to start another startup, but the discussions really resonate with me and Rob and Sherry are doing a huge service to the founder community through the podcast.

There are two other podcasts I recently started to listen to as well:

Revisionist History by Malcolm Gladwell. I’ve listened to all of Gladwell’s books and am a huge fan of his writing. If you like the kind of insightful storytelling that he’s so well known for, definitely check out this podcast. The episode titled The Big Man Can’t Shoot and the three-part series on higher education in America are great places to start. I learned about this podcast through Tim’s recent interview with Malcolm.

Exponent by Ben Thompson and James Allworth, where the two of them discuss Ben’s writing on Stratechery, a blog about tech, society, and how the internet is fundamentally changing how the world works. The podcast and Stratechery will give you a new lens to understand what’s happening in the tech world. Highly recommended.

Any recommendations for other podcasts to check out?

In Search of Unhackable North Star Metrics

In my last post, I wrote about why focusing too much on conversion rates alone is a bad idea because conversion rates can easily be manipulated by changing the quality of traffic to your site.

It got me thinking about whether there are metrics that can’t be manipulated. The key for such a metric is that improvements to it have to always be correlated with the health of the site or business. Put another way, if you can artificially inflate a metric or an improvement to that metric could actually be a bad thing, then it wouldn’t qualify; it wouldn’t be a so-called north star metric.

For example, a homepage to purchase conversion rate wouldn’t work. You can improve your conversion rate by dropping search traffic, reducing the price of your product, and more – all of which could actually be bad for your business.

It gets interesting when you start making the metric more and more specific to overcome ways it can be manipulated. What if instead of looking at your overall homepage to purchase conversion rate, you just looked at the conversion rate of direct traffic? That’s better, but it can still be manipulated by increasing your offline advertising which could change the quality of the direct traffic and therefore your conversion rates.

I go back and forth about whether it’s even possible. Like, will there always be a way to improve a metric but have it actually be bad for the business? Or is it possible to avoid manipulation by being really, really specific about the metric?

I’d love to hear from folks on this topic, especially if you have a metric in mind for your site/product/business that seems unhackable.

An impractical guide to doubling your conversion rates

Let’s imagine a standard web app with three main steps:

  1. Viewed Homepage
  2. Signed Up
  3. Purchased

Furthermore, let’s say 20% of the visitors to the homepage sign up, and 5% of the users that sign up complete a purchase, giving you a 1% overall conversion rate.

Your boss comes to you and says “You need to double the overall conversion rate from 1% to 2%. Do whatever it takes.”

As a thought experiment, consider at a high level what the numbers have to be for this to work out. Your first thought might be to double the homepage conversion rate from 20% to 40% (giving you a 40% * 5% = 2% overall conversion rate) or double the purchase conversion rate (20% * 10% = 2% overall conversion rate). You could also increase both by some smaller amount to get a similar result: 30% * 6.67% = 2%.
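The arithmetic here is just a product of step rates, which a few lines of code can make explicit. This is a sketch with made-up helper names (`overall_rate`, `required_rate`).

```ruby
# Overall conversion is the product of the step-to-step rates.
def overall_rate(*step_rates)
  step_rates.reduce(1.0) { |product, rate| product * rate }
end

# The rate one step needs in order to hit a target, given the other step's rate.
def required_rate(target, other_step_rate)
  target / other_step_rate
end

puts format('Baseline:          %.2f%%', overall_rate(0.20, 0.05) * 100)
puts format('Double sign ups:   %.2f%%', overall_rate(0.40, 0.05) * 100)
puts format('Double purchases:  %.2f%%', overall_rate(0.20, 0.10) * 100)
puts format('Purchase rate needed at 30%% sign up: %.2f%%', required_rate(0.02, 0.30) * 100)
```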

In the real world, this winds up being really hard. If you increase the percentage of people signing up, it’s probably going to decrease the percentage of people who then purchase. Why? Consider what would happen to a site that has a pricing section on its homepage and then removes it completely. More people will sign up, but once they do and see the pricing, many of those extra people you got to sign up (because they thought your service was free) will leave (because they realized after signing up that your service actually isn’t free). So if you increase the sign up rate to 40% somehow, your sign up to purchase conversion rate might drop by half, from 5% to, say, 2.5%, giving you that same 40% * 2.5% = 1% overall conversion rate. If you’re lucky, some of those extra users will convert and maybe you’ll get something like a 40% * 2.8% = 1.12% overall conversion rate. Getting closer to that 2%, but still a long way away.

The trick to getting to 2% is to improve the quality of the traffic at each step.

Consider that homepage conversion rate of 20%. How could you increase that without making any changes to your website?

That 20% conversion rate is actually a composite of different segments. For example, imagine your website has three traffic sources:

  1. Direct traffic (50% of your traffic) converts at 30%
  2. Search traffic (40% of your traffic) converts at 10%
  3. Social traffic (10% of your traffic) converts at 10%

50% * 30% + 40% * 10% + 10% * 10% = 20% homepage conversion rate.

What would happen to your conversion rate if you delisted your site from search engines completely? The numbers now become:

  1. Direct traffic (83.3% of your traffic) converts at 30%
  2. Social traffic (16.7% of your traffic) converts at 10%

83.3% * 30% + 16.7% * 10% = 26.7% homepage conversion rate.

Without making any changes to your site itself, you increased the homepage conversion rate by 26.7% / 20% - 1 = +33%. Because the quality of your traffic has improved, the sign up to purchase conversion rate will likely increase as well. Instead of 5% upgrading, it might wind up being 8% (+60%). What’s your overall conversion rate now? 26.7% * 8% = 2.13%!
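The blended homepage rate is a weighted average of the per-source rates, and dropping a source means the remaining traffic shares have to be rescaled to add up to 100%. Here is a sketch of that calculation with the made-up numbers from this example; the `blended_rate` helper and the hash layout are assumptions.

```ruby
# Traffic share and sign up rate per source, from the example above.
SOURCES = {
  'direct' => { share: 0.50, rate: 0.30 },
  'search' => { share: 0.40, rate: 0.10 },
  'social' => { share: 0.10, rate: 0.10 }
}.freeze

# Weighted average of the per-source rates. Shares are renormalized so they
# still sum to 1 after a source has been removed.
def blended_rate(sources)
  total_share = sources.values.sum { |source| source[:share] }
  sources.values.sum { |source| source[:share] / total_share * source[:rate] }
end

before = blended_rate(SOURCES)
after = blended_rate(SOURCES.reject { |name, _| name == 'search' })

puts format('With search:    %.1f%%', before * 100)
puts format('Without search: %.1f%%', after * 100)
puts format('Change:         %+.0f%%', (after / before - 1) * 100)
puts format('Overall at an 8%% purchase rate: %.2f%%', after * 0.08 * 100)
```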

So in order to double your conversion rate in this made-up example all you had to do was delist your site from search engines. Mission accomplished 🍻. As an added bonus many of your other metrics will increase as well. Because your site’s traffic is of a higher quality, your retention rates will go up, your churn will go down, your average revenue per user will go up, your refunds will go down, and more.

You could also do similar hacks where you make your website non-mobile-friendly so Google decreases its mobile rankings, which (assuming mobile users convert more poorly than desktop users) would increase your conversion rates. You could block certain poorly converting browsers from visiting your site (I’m looking at you, IE). You could block users from poorly converting countries, or even non-English language users if your site isn’t translated.

Of course you should never do any of these things.

In the process of improving your metrics by hacking off a large portion of your traffic, you’ll also wind up decreasing the number of people who make it all the way through the funnel (purchasing in this case).

It’s tempting to want to focus on a single metric like conversion rate, but it’s also important to remember that individual metrics can almost always be artificially boosted:

  • You can increase your conversion rates by preventing low converting traffic from reaching your site (at the expense of revenue)
  • You can increase your revenue by increasing paid advertising (at the expense of profit)
  • You can increase your profit by laying off a bunch of employees (possibly at the expense of your long term growth and profitability).

Instead, try to identify the set of metrics that are most important to your company and pay attention to the group as a whole. More often than not when one metric goes up, another will go down but with solid execution and a little luck, the overall impact of your changes will be a net win for your website or company.