I spend more time than I’d like to admit investigating why metrics that should be identical vary from one tracking source to another. Sometimes the difference comes from bugs, but often it’s simply a consequence of how each tracking system works. The experience has also given me an appreciation for just how difficult it is to get truly precise metrics.
Consider this simple example:
You run a site where people sign up for an account and then create a post. What percentage of accounts create a post?
Sounds pretty simple, right? Here are two possible approaches and where they can lead your metrics astray:
Using your database to figure out the answer
With this approach, we look at how many accounts there are, figure out how many of them have a post, and calculate the percentage that way. If there were 1,000 accounts created in August and 200 of them have a post, 20% have created a post (yay math).
But… what about users who create a post and then delete it? Suppose 20 people created a post and later deleted it. The “true” number of accounts that created a post is then 220, but your database only shows 200 accounts with a post, so you’ll wind up reporting 20% instead of 22%.
What about users who deleted their account? Instead of 1,000 accounts there were actually 1,100 accounts, but 100 of their owners wound up deleting their account. Unless you set up some special tracking for this, you’re now in a spot where you don’t know exactly how many accounts were created or how many of them created posts.
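To make the arithmetic concrete, here is a rough sketch of both scenarios using the numbers above. The function name is made up for illustration, and the deleted-accounts case assumes that deleting an account also removes its posts, so all we can compute is a range rather than a single number.

```python
def post_rate(accounts_with_post: int, total_accounts: int) -> float:
    """Percentage of accounts that have created a post."""
    return 100 * accounts_with_post / total_accounts


# What the database shows today: 200 of 1,000 accounts have a post.
reported = post_rate(200, 1000)

# Deleted posts: 20 more accounts posted, then deleted the post.
with_deleted_posts = post_rate(220, 1000)

# Deleted accounts: 1,100 were created, 100 are gone.
# Assumption: we have no record of whether those 100 ever posted,
# so the true rate lies somewhere between these two bounds.
lowest_possible = post_rate(200, 1100)   # none of the 100 posted
highest_possible = post_rate(300, 1100)  # all of the 100 posted

print(f"reported: {reported:.1f}%")
print(f"counting deleted posts: {with_deleted_posts:.1f}%")
print(f"counting deleted accounts: {lowest_possible:.1f}% to {highest_possible:.1f}%")
```

Under those assumptions, the deleted accounts alone leave the real answer anywhere from roughly 18% to 27%.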
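If your account records have a unique ID that increments by 1 each time a new account is created, you could maybe work out how many accounts were ever created from the highest ID. It’s only a maybe because many databases can skip IDs (after a failed insert, for example), so the highest ID can overstate the count.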
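Here is a minimal sketch of the incrementing-ID idea using Python's built-in sqlite3 module. The accounts table and its columns are hypothetical, and the estimate only holds if IDs start at 1, are never reused, and the most recently created account still exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# AUTOINCREMENT tells SQLite never to reuse an ID, even after a delete.
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1, 11)],
)

# Three people delete their accounts.
conn.executemany("DELETE FROM accounts WHERE id = ?", [(2,), (5,), (7,)])

remaining = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
highest_id = conn.execute("SELECT MAX(id) FROM accounts").fetchone()[0]

# The highest surviving ID stands in for "accounts ever created".
estimated_created = highest_id
estimated_deleted = estimated_created - remaining

print(remaining, estimated_created, estimated_deleted)
```

If the newest account is among the deleted ones, MAX(id) undercounts, which is part of why this approach only gets a "maybe".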
One way around this is to not let people delete their posts or accounts. Maybe you tell them their post was deleted, but don’t actually delete it from your database. Just set a “deleted” flag on the database record and don’t show it to the user. Same thing for accounts. But what happens if they think they deleted their account then try to sign up again with the same email? Will you tell them that their account already exists even though supposedly you deleted it? There are technical solutions around this, but things are getting pretty complicated already.
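The “deleted” flag idea might look something like this with sqlite3. The posts table, its deleted column, and the soft_delete_post helper are all assumptions for illustration, not a recommendation for any particular schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, "
    "account_id INTEGER NOT NULL, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO posts (account_id) VALUES (?)", [(1,), (2,), (3,)])


def soft_delete_post(conn: sqlite3.Connection, post_id: int) -> None:
    """Hide a post from users without removing the row."""
    conn.execute("UPDATE posts SET deleted = 1 WHERE id = ?", (post_id,))


soft_delete_post(conn, 2)

# What the site shows: only posts that are not flagged as deleted.
visible_posts = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE deleted = 0"
).fetchone()[0]

# What the metric needs: every account that ever created a post.
accounts_ever_posted = conn.execute(
    "SELECT COUNT(DISTINCT account_id) FROM posts"
).fetchone()[0]

print(visible_posts, accounts_ever_posted)
```

The catch is that every user-facing query now has to remember the deleted = 0 filter, which is one of the ways things get complicated.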
Using an analytics tool to figure out the answer
Another approach is to use a tool like Mixpanel or KISSmetrics to try to answer the question. Set up a funnel to measure the conversion rate from your Sign Up event to the Publish Post event.
The good news here is that if users delete a post, it won’t impact the conversion rate that the funnel reports.
The bad news is that if they create additional accounts, they’ll only count once in the funnel because funnels try to measure the actions of people, not accounts.
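A toy illustration of the people-versus-accounts difference, in plain Python rather than any real analytics tool's API. The event records, the names in them, and the conversion function are invented for this sketch, and it ignores event ordering, which a real funnel would check.

```python
# Hypothetical event log: Ana signs up twice, Ben signs up once.
events = [
    {"person": "ana", "account": "a1", "event": "Sign Up"},
    {"person": "ana", "account": "a1", "event": "Publish Post"},
    {"person": "ana", "account": "a2", "event": "Sign Up"},
    {"person": "ben", "account": "b1", "event": "Sign Up"},
]


def conversion(events: list[dict], key: str) -> float:
    """Percent of distinct `key` values that signed up and also published."""
    signed_up = {e[key] for e in events if e["event"] == "Sign Up"}
    published = {e[key] for e in events if e["event"] == "Publish Post"}
    return 100 * len(signed_up & published) / len(signed_up)


by_person = conversion(events, "person")    # 1 of 2 people
by_account = conversion(events, "account")  # 1 of 3 accounts

print(f"by person: {by_person:.1f}%, by account: {by_account:.1f}%")
```

Same events, two different answers, and neither one is a bug.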
One possible solution
If I really, really wanted to answer this question precisely, I’d set up a new database table that keeps track of accounts and whether they’ve created a post. These records wouldn’t be impacted if the post or account gets deleted so we can trust that the data is accurate.
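A rough sketch of what that tracking table could look like, again with sqlite3. The table name account_post_tracking and the helper functions are hypothetical. The key idea is that rows in the tracking table are written on sign-up and on publish, and never deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL);
-- One row per account ever created; rows here are never deleted.
CREATE TABLE account_post_tracking (
    account_id INTEGER PRIMARY KEY,
    has_created_post INTEGER NOT NULL DEFAULT 0
);
""")


def sign_up(account_id: int) -> None:
    conn.execute("INSERT INTO accounts (id) VALUES (?)", (account_id,))
    conn.execute(
        "INSERT INTO account_post_tracking (account_id) VALUES (?)", (account_id,)
    )


def publish_post(account_id: int) -> None:
    conn.execute("INSERT INTO posts (account_id) VALUES (?)", (account_id,))
    conn.execute(
        "UPDATE account_post_tracking SET has_created_post = 1 WHERE account_id = ?",
        (account_id,),
    )


def delete_account(account_id: int) -> None:
    # Remove the user's data but leave the tracking row alone.
    conn.execute("DELETE FROM posts WHERE account_id = ?", (account_id,))
    conn.execute("DELETE FROM accounts WHERE id = ?", (account_id,))


def post_rate() -> float:
    total, with_post = conn.execute(
        "SELECT COUNT(*), SUM(has_created_post) FROM account_post_tracking"
    ).fetchone()
    return 100 * with_post / total if total else 0.0


for account_id in range(1, 11):
    sign_up(account_id)
for account_id in (1, 2, 3):
    publish_post(account_id)
delete_account(3)

rate = post_rate()
live_accounts = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(rate, live_accounts)
```

Even though account 3 and its post are gone from the main tables, the tracking table still counts it as one of the 10 accounts and one of the 3 that posted.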
Alternatively, you can simply redefine your metric so that you can be precise: instead of “what % of accounts created a post?” you ask “what % of non-deleted accounts have a non-deleted post?”. Not very elegant, but far easier to answer than the original metric.
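The redefined metric is easy to compute because it only looks at rows that still exist. A minimal sketch with sqlite3, assuming hypothetical accounts and posts tables where deleted rows are simply gone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL);
INSERT INTO accounts (id) VALUES (1), (2), (3), (4), (5);
-- Account 1 has two posts, account 4 has one, the rest have none.
INSERT INTO posts (account_id) VALUES (1), (1), (4);
""")

# LEFT JOIN keeps accounts without posts; COUNT(DISTINCT ...) skips the
# NULLs they produce, so each account is counted at most once per column.
total_accounts, accounts_with_post = conn.execute("""
    SELECT COUNT(DISTINCT a.id), COUNT(DISTINCT p.account_id)
    FROM accounts AS a
    LEFT JOIN posts AS p ON p.account_id = a.id
""").fetchone()

rate = 100 * accounts_with_post / total_accounts
print(f"{accounts_with_post} of {total_accounts} accounts have a post ({rate:.0f}%)")
```

The number will drift as people delete things, but at least everyone who runs the query on the same day gets the same answer.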
Does perfect precision matter?
I’d argue that for most metrics (with the exception of revenue metrics), being perfectly precise is not critical. The metrics are probably fine as long as they’re close to the true value and the way you calculate them is consistent over time.
tl;dr: data analysis is fun :).