I recently learned something new about how retention rate calculations are performed and wanted to share. Consider this situation:
A user signs up for your service at 11:59pm on Tuesday, does something at 12:01am on Wednesday, leaves and never comes back again.
Should this user count towards your D1 retention rate?
When I initially built Retentioneer, a retention rate analysis script, it would say yes: the user should count towards the D1 retention rate because they were active on Tuesday and then again on Wednesday.
But I realized that performing the calculation this way would inflate the D1 retention rate, because it’s very common for a user to sign up for a service, try it out for a few minutes or an hour, then leave and never come back. Any of these users whose first session happened to straddle midnight UTC would wind up counted as D1 retained. In some test data I analyzed, the D1 retention rate was 7% when D1 meant a user was active on adjacent calendar days, but only 2% when D1 meant a user had to perform an event 24 to 48 hours after signing up (the magnitude of the difference will vary based on the type of service).
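To make the difference concrete, here is a minimal Python sketch of the two definitions applied to the user from the example. It is not Retentioneer's actual code; the function names `d1_calendar` and `d1_rolling` are hypothetical, the dates are arbitrary (March 1, 2016 happens to be a Tuesday), and I'm assuming the 24 to 48 hour window includes hour 24 and excludes hour 48.

```python
from datetime import datetime, timedelta, timezone

def d1_calendar(signup, events):
    """D1 retained if any event falls on the calendar day after signup (UTC)."""
    next_day = (signup + timedelta(days=1)).date()
    return any(e.date() == next_day for e in events)

def d1_rolling(signup, events):
    """D1 retained if any event falls 24 to 48 hours after signup.

    Assumption: the window is half-open, [24h, 48h).
    """
    lower, upper = timedelta(hours=24), timedelta(hours=48)
    return any(lower <= e - signup < upper for e in events)

# Signs up at 11:59pm Tuesday, acts at 12:01am Wednesday, never returns.
signup = datetime(2016, 3, 1, 23, 59, tzinfo=timezone.utc)
events = [datetime(2016, 3, 2, 0, 1, tzinfo=timezone.utc)]

print(d1_calendar(signup, events))  # True
print(d1_rolling(signup, events))   # False
```

The calendar-day version counts a two-minute visit as retention; the rolling version does not.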
I decided the latter was a better reflection of the true retention rate, updated the script to perform the calculations that way, shipped it, and thought that was the end of it.
At some point, though, I got the idea in my head that most other analytics services don’t calculate retention rate this way because it would be too slow, and that they simply count a user as retained if the user is active on Tuesday and again on Wednesday. I wanted my script to be consistent with these other services, so reverting Retentioneer to the original calculation method has been on my todo list for a while.
But… it turns out other analytics services perform the calculation the same way.
From Mixpanel’s How is retention calculated?:
Likewise, in daily retention, in order to be counted in the bucket marked 1, the customer must send whatever event we are looking for in the retention some time between 24 and 48 hours after he sent the cohortizing event.
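As a rough sketch of the bucketing that quote describes (my own illustration, not Mixpanel's code), the bucket number can be computed as the count of whole 24-hour periods elapsed since the cohortizing event. The function name `retention_bucket` and its `bucket` parameter are hypothetical.

```python
from datetime import datetime, timedelta

def retention_bucket(cohort_event, later_event, bucket=timedelta(hours=24)):
    """Return how many whole buckets have elapsed since the cohort event.

    Bucket 1 covers 24 to 48 hours after the cohort event, bucket 2 covers
    48 to 72 hours, and so on. Returns None if later_event precedes it.
    """
    elapsed = later_event - cohort_event
    if elapsed < timedelta(0):
        return None
    # Dividing two timedeltas with // yields an integer.
    return elapsed // bucket

cohort = datetime(2016, 3, 1, 23, 59)
print(retention_bucket(cohort, datetime(2016, 3, 2, 0, 1)))   # 0
print(retention_bucket(cohort, datetime(2016, 3, 3, 10, 0)))  # 1
```

Passing `bucket=timedelta(weeks=1)` would give the analogous weekly bucket under the same assumption.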
And from Amplitude’s Retention: General and Computation Methods:
A user is counted as “next day” retained if they perform any event on at least the 24th-incremented hour. For instance, if a user performs their first event on Dec 1st, 05:59pm, the user is counted as Day 1 retained if they perform an event on Dec 2nd, 05:00pm, and Day 2 retained if they perform an event on Dec 3rd, 05:00pm.
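Amplitude's example (first event at 05:59pm, Day 1 starting at 05:00pm the next day) suggests the first event is rounded down to the start of its hour before the 24-hour steps are counted. That is only my reading of the quote, not something the documentation states outright, so treat the sketch below as an assumption. The function name `hour_floored_day` is hypothetical and the year is arbitrary.

```python
from datetime import datetime, timedelta

def hour_floored_day(first_event, later_event):
    """Day index, assuming the first event is floored to its hour.

    This mirrors one interpretation of the quoted example, nothing more.
    """
    anchor = first_event.replace(minute=0, second=0, microsecond=0)
    elapsed = later_event - anchor
    if elapsed < timedelta(0):
        return None
    return elapsed // timedelta(hours=24)

first = datetime(2015, 12, 1, 17, 59)
print(hour_floored_day(first, datetime(2015, 12, 2, 17, 0)))   # 1
print(hour_floored_day(first, datetime(2015, 12, 2, 16, 59)))  # 0
print(hour_floored_day(first, datetime(2015, 12, 3, 17, 0)))   # 2
```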
Funnily enough, there’s a note at the bottom of that Amplitude page explaining that Amplitude previously used calendar dates to perform the retention calculations:
Note: these computation methods only apply as far back to August 18, 2015. Any retention computations that include dates before August 18, 2015 will be computed by calendar days/weeks/months.
Awesome to see that they realized the issue and made the change.
To sum it up: a user should only count as D1 retained if they’ve been active 24 to 48 hours after performing the initial event. If you make the calculation using calendar days, you’ll inflate the D1 retention rate.
Hat tip to my coworker Meredith for the always insightful discussions around this as well as Jan Piotrowski and Hrishi Mittal for their feedback on Twitter.