If you’re working with Mixpanel data, it’s important to keep in mind that current_url
property represents the the full URL including query parameters and fragements. Here are a few examples, screenshots courtesy of the Chrome Mixpanel Debugger Extension:
https://www.helpscout.net/blog/
https://www.helpscout.net/blog/?utm_source=example
https://www.helpscout.net/blog/?utm_source=example#fragment
You get the idea: The current_url
property is the exact URL as displayed in the browser, query parameters and all.
This is an issue because if we’re analyzing the data to determine the number of unique visitors to a page, the query paramters and fragments shouldn’t have an impact. These should all be treated as the same page:
- https://www.helpscout.net/blog/beacon-preview-ui/
- https://www.helpscout.net/blog/beacon-preview-ui/?utm_content=1234&utm_medium=social&utm_source=twitter
- https://www.helpscout.net/blog/beacon-preview-ui/#example
Without adjusting the query to take into account that these are all the same page, we’d wind up undercounting the number of visitors to it because we’d only wind up counting the version without query paramters or fragements.
Take a look at just a fraction of the URLs our visitors had when viewing a single blog post (this Mixpanel data is in BigQuery thanks to Fivetran’s Mixpanel connector):
Thanks to our marketing efforts, we actually had more visitors to the post with query parameters than there were to the same post without URL parameters.
Fortunately, removing the URL parameters and fragment is relatively straightforward using Standard SQL’s STRPOS
AND SUBSTR
:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT | |
current_url, | |
CASE | |
WHEN STRPOS(current_url, "?") > 0 THEN SUBSTR(current_url, 0, STRPOS(current_url, "?") – 1) | |
WHEN STRPOS(current_url, "#") > 0 THEN SUBSTR(current_url, 0, STRPOS(current_url, "#") – 1) | |
ELSE current_url | |
END AS url | |
FROM ( | |
SELECT 'https://www.helpscout.net/blog/beacon-preview-ui/' AS current_url | |
UNION ALL SELECT 'https://www.helpscout.net/blog/beacon-preview-ui/?utm_content=1234&utm_medium=social&utm_source=twitter' AS current_url | |
UNION ALL SELECT 'https://www.helpscout.net/blog/beacon-preview-ui/#example' AS current_url | |
); | |
+———————————————————————————————————-+—————————————————+ | |
| current_url | url | | |
+———————————————————————————————————-+—————————————————+ | |
| https://www.helpscout.net/blog/beacon–preview–ui/ | https://www.helpscout.net/blog/beacon–preview–ui/ | | |
| https://www.helpscout.net/blog/beacon–preview–ui/?utm_content=1234&utm_medium=social&utm_source=twitter | https://www.helpscout.net/blog/beacon–preview–ui/ | | |
| https://www.helpscout.net/blog/beacon–preview–ui/#example | https://www.helpscout.net/blog/beacon-preview-ui/ | | |
+———————————————————————————————————-+—————————————————+ |
With that adjustment made, you can easily count the number of page views or unique visitors to a given page, regardless of URL parameters or fragements.