I recently started working on a small side project to build an A/B test simulator. My goal is to measure the long-term conversion rate impact of different testing strategies, such as:
- What impact do various significance levels (90%, 95%, 99%) have?
- Is it better to run many shorter tests at a lower significance level, or fewer longer tests at a higher one?
- What’s the best strategy if your site does not receive a lot of traffic?
I have a preliminary script, but there's still a lot of work to be done, and I'd love to collaborate with somebody on it going forward. I think the results will run contrary to a lot of current A/B testing best practices, and they could have a big impact on how people run tests in the future.
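To give a rough sense of the idea, here's a minimal Ruby sketch of how such a simulation could work. It is not the actual script; every parameter, rate, and helper name below is an illustrative assumption. It runs a sequence of simulated tests against a baseline conversion rate, adopts a challenger only when a two-proportion z-test clears the chosen significance level, and reports the conversion rate you end up with.

```ruby
# Hypothetical sketch of an A/B test simulator (illustrative values only).

# Approximate two-sided critical z values for common alpha levels.
def z_critical(alpha)
  { 0.10 => 1.645, 0.05 => 1.960, 0.01 => 2.576 }.fetch(alpha)
end

# Two-sided z-test for the difference between two proportions.
def significant?(conv_a, n_a, conv_b, n_b, alpha)
  p_pool = (conv_a + conv_b).to_f / (n_a + n_b)
  se = Math.sqrt(p_pool * (1 - p_pool) * (1.0 / n_a + 1.0 / n_b))
  return false if se.zero?
  z = ((conv_b.to_f / n_b) - (conv_a.to_f / n_a)) / se
  z.abs >= z_critical(alpha)
end

# Simulate visitors for one variant; returns the number of conversions.
def run_variant(rate, visitors)
  visitors.times.count { rand < rate }
end

def simulate(baseline: 0.05, visitors_per_test: 5_000, tests: 50, alpha: 0.05)
  rate = baseline
  tests.times do
    # Each challenger's true rate differs from the control by a small random lift.
    challenger = rate * (1 + (rand - 0.5) * 0.2) # up to +/-10% relative change
    conv_a = run_variant(rate, visitors_per_test)
    conv_b = run_variant(challenger, visitors_per_test)
    # Adopt the challenger only if it beats the control with a significant result.
    if conv_b > conv_a &&
       significant?(conv_a, visitors_per_test, conv_b, visitors_per_test, alpha)
      rate = challenger
    end
  end
  rate
end

[0.10, 0.05, 0.01].each do |alpha|
  puts format("alpha %.2f -> final conversion rate %.4f", alpha, simulate(alpha: alpha))
end
```

Running something like this many times per strategy and averaging the final rates is one way to compare significance levels and test lengths on equal footing.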
If you have experience running A/B tests, a stats background, or familiarity with Ruby, those skills will come in handy, but they're not required.
Drop me a note if you’re interested in working together on it.
Nice project. If I can lend a hand let me know :)
Consider asking an even more fundamental question too: is significance even the best way to measure test accuracy? What alternatives could be used that might produce a paradigm change in this space?
(Not because there is likely to be an answer, but because if you find one you will change the world)