Prioritization for Data Analysts



For the past two years I’ve worked as a data scientist, first on the marketing team for and now on the growth team at Help Scout. In both of these roles I’ve been the sole data analyst on my team so I tend to have more work to do than I have time for. As a result, I spend a lot of time thinking about how to prioritize what’s on my plate. My goal with this post is to share some of what I’ve learned to help other data folks who are in similar roles.

The prioritization problem

Here’s a made up scenario:

Your manager asks you for help with a medium priority analysis that will take several days. Shortly after you begin working on it, a coworker pings you for help with an urgent request that will only take a few hours. A little while later, your boss’s boss asks you for help with a low priority one-day project.

You now have three requests from people with differing seniority each with a different duration and urgency – which do you work on next and why?

Ad hoc requests vs projects

At a high level, there are two types of data requests:

Ad hoc requests are questions that can be answered fairly quickly. They may take a few minutes or a few hours and the urgency can range from “drop what you’re doing” to “no rush at all”.

Projects take longer and usually have a higher impact on the business. For me, they range from a few days to (rarely) a few weeks.

While it may be tempting to only want to work on ad hoc requests or only on projects, the reality is that they’re both important and if you’re the only analyst, you’ll need to work them both into your schedule. For example, even if you’re working on a big project, you can’t simply ignore the urgent ad hoc requests that come up during the course of a week. Telling your CEO that you’ll get to his request in three weeks when you’re done with your project is not recommended.

Therefore my advice is to set aside about two hours each day to work on ad hoc data requests and spend the rest of your time working on your highest priority project. This will give you time to work on those longer, high impact initiatives but still give you time to work on the shorter requests as well.

I wouldn’t loop your manager into the prioritization of ad hoc data requests; he or she probably has better things to do than helping you decide “spend 30 minutes on this then spend an hour on that” etc etc. Use your judgment: work on whatever is the most urgent or will have the highest impact on the business. When in doubt, base it on the seniority of whoever is asking.

For projects, I definitely would recommend getting your manager’s help prioritizing what to work on. He or she help you decide where you can be the most impactful and when questions come up about why you’re not working on some other thing, it’s not just you who made that call.

I would only work on one project at a time. I’ve tried doing like 3 hours a day on one project, 3 hours a day on another, but all that’s going to do is make both take twice as long if not longer due to the frequent context switching.

When you meet with your manager, keep him or her up to date about where you’re at with your project. Often it’s hard to predict at the start how long a project will take. Something you thought might take a week will take a day. Something you thought would take three days will take two weeks. That’s just the nature of it. Keeping your lead informed.

Sometimes priorities will change. Your expected three-day project that is now in its second week may get put on the back burner in order to work on something else. It’s easy to get frustrated when this happens, but remember that you’re there to have the biggest impact on the business, not to finish projects just for the sake of finishing them. Sometimes what’s highest priority will change so be flexible.

When priorities change and you have to change projects midway, try to estimate what impact that will have on the completion of the original project. If you estimated you would complete your original project by the end of the week, but by switching projects it will now take an additional week, communicate that to the people waiting on the result of the original analysis.

Get clarity before beginning your analysis

Regardless of whether a request is ad hoc or a project, I’d recommending asking the requestor several questions:

  • How important is this?
  • Is there a deadline?
  • How accurate do you need the answer to be?
  • What type of end result are you looking for?

If you don’t ask about importance, you risk spending a lot of time on an analysis that is not that important to the person asking about it. It’s common for someone to ask a quick hey-I-wonder-about-this question, but they only care if you can find the answer quickly.

Similarly, ask whether it needs to be done by a specific date. If the answer is holding something up, it should get higher priority than something that doesn’t.

Asking about accuracy is something that took me a while to appreciate too. I’ve run into a lot of data requests that I can give a 95% accurate solution to fairly quickly, but that would take much longer to get to 100%. A lot of times, people will be fine with a 95% quick solution. Understanding when 95% is acceptable and when 100% is necessary can save you a lot of time.

Sometimes I also ask what type of end result the person is interested in. It often helps them and you clarify what the analysis is all about.

For longer projects, I’d also recommend periodically updating the person who requested it with your progress. As they see your progress, they might want to refine the request or may be happy with the results as-is, freeing you up to work on something else.

Staying organized

I’ve seen some data teams ask people to fill out a little template (usually in a Trello card) to help people ask good data questions. I prefer to discuss the details with the person, then create a Trello card for myself with the relevant information.

I would recommend keeping a list of all potential projects and ad hoc data requests. That is, don’t just try to keep it all in your head. I do this in Trello with four lists:

  • Multi-week Projects
  • Multi-day Projects
  • Multi-hour Projects
  • Quick Requests

Whenever someone asks a question, I get the details from them then create a new card and add it to the appropriate list.

You could also have a Completed list where you drag cards as you complete them to keep a record of it. I’ve also seen some teams have a list for each month or quarter (January 2018, February 2018, etc) and drag completed cards to the relevant list when it’s complete. I did that in the past, but tend to just archive the cards now to avoid cluttering up the Trello board.

Invest in a proper BI tool like Looker

One other big win has been switching from manual data analysis (using MySQL queries and R) to Looker, the premier Business Intelligence (BI) tool on the market today. I could go on for hours about how amazing Looker is, but at a high level it lets you:

  • Create dashboards that are automatically updated as the underlying data changes.
  • Use LookML, Looker’s modeling language, to teach Looker how our data fits together. With that in place, I rarely have to write queries by hand anymore. I can use Looker’s interface to quickly ask and answer questions.
  • This also means that other people at the company who may not have querying skills can answer their own questions without asking me every time.

I’d highly recommend checking out Looker if you still find yourself wrangling data a slow manual way.

If this post resonates with you, I’d love to connect

If you’re a data analyst, especially if you’re the only one at your company or part of a small team, I’d love to chat to learn more about what you’re working on, how you prioritize, etc. Drop me a note: Cheers!

Bookmarklet to Manually Set document.referrer

At Help Scout today I’m working on a tracking script to help us track where our visitors are coming from. One of the ways it identifies that is using document.referrer which returns the referring URL.

For example, if you go to Google, click on a link, then check out the value of document.referrer in the JavaScript console, it will tell you it was

> document.referrer

For testing the script I’m working on I needed a way to manually set the value of the referrer to quickly test that the script is handling it correctly.

Unfortunately you can’t directly overwrite the value:

> document.referrer = ""
> document.referrer

Good news is that there’s a hacky way you can do it by overwriting what document.referrer returns. After modifying the code slightly (setting configurable: true), I was able to turn it into a bookmarklet that prompts you for what you want to set the referrer to:

Then when you click this bookmarklet, you can set the new referrer value:

Screen Shot 2018-01-25 at 11.55.06 AM.png

And document.referrer will return the new value:

> document.referrer

Help Scout’s New Partnership with Fivetran

Today at Help Scout we’re excited to announce that we’re partnering with Fivetran, a fantastic service that makes it easy to centralize all of your data in a data warehouse. You can read my blog post about the partnership here: Performing Advanced Analytics on Your Help Scout Data with Fivetran.

The short version is that if you’re a Help Scout customer, you can sign up for Fivetran (for free!) and use it to get all of your mailbox data out of Help Scout and into a data warehouse like BigQuery. From there you can use SQL to analyze the data (on its own or combining it with your data from other services) or hook it up to a Business Intelligence tool for easy analysis and charting.

If this sounds interesting to you, check out the blog post and don’t hesitate to reach out if you have any questions about any of it:

Quantity and Quality Metrics

Something I learned from Help Scout’s former Growth Lead, Suneet Bhatt, is that for every quantity metric you should also strive to have an associated quality metric.

For example, one of the metrics we look at is the number of New Customers. We also track Initial Average New Users Per Customer. Together these help us understand not just how many new customers we’re bringing in, but how large they are.

  • Quantity Metric: New Customers
  • Quality Metric: Initial Average New Users Per Customer

New Trials is a quantity metric. A quality metric for it might be the % of trials that convert into customers. I’ve found with my own products that that it’s easy to have either the quantity or quality metric go up, but hard to have both. For example, with Preceden, my timeline maker tool, New Trials tends to plummet during the summer months when students are not in school (a lot of students use Preceden to create timelines for class projects). When this happens, the trial to paid conversion rate goes way up because without a large number of students signing up (who rarely convert into paying customers), the conversion rate for the remaining trials winds up being pretty good. Then in the Fall when school resumes, trials increase as students begin signing up again, but trial to paid conversion rates fall. If I could find a way to increase conversion rates as more students sign up, it would be huge – but way easier said than done.

  • Quantity Metric: New Trials
  • Quality Metric: Trial to Paid Conversion Rate

Another example from Help Scout: we track how many new employees we hire, but also measure what % leave within 1 year. If our hiring process works well, the % that leave within a year should be fairly low; if not, it will be high.

  • Quanity Metric: New Employees
  • Quality Metric: 1-Year New Employee Churn Rate

When you’re thinking about your own business’s metrics, try to come up with quality metrics for most quantity metrics. You’ll gain a deeper understanding of the health of the business which allows you to make better decisions about its future.


Identifying the Primary Keys for Deleted MySQL Records

At Help Scout today I had to perform an analysis that required me to identify the primary keys of deleted MySQL records.

For example, these results contain primary keys for records 1-5 but not 6-10:

The question is how to identify records 2, 3, 5, 7, and 8.

With the help of Stack Overflow, I found a neat solution that involves creating a temporary table of all possible primary keys, then left joining it on the original table to identify the missing ones. Here’s what it looks like, customized for my situation:

This solution does require that the keys be auto-incrementing, but that should be the case most of the time. Also, it assumes the last record is not deleted.

Happy querying!

Using Fivetran’s Google Sheets Integration to Get Monthly Unique Visitors Into BigQuery and Looker


At Help Scout we use Fivetran to get data from various data sources (like our internal MySQL database, HubSpot, and Salesforce) into BigQuery so that we can analyze it all using Looker.

One of the challenges I’ve run into is that even though Fivetran supports Google Analytics, the data is limited to what’s available via the Google Analytics API. One of the main metrics we want – Unique Visitors – is available, but only on a 1-day, 7-day, 14-day, or 30-day basis:

Screen Shot 2017-12-05 at 9.41.27 AM.png

This means that if we want to know how many Unique Visitors we had each month, there’s no way to get that out of the Google Analytics via its API, which means Fivetran can’t grab it and automatically put it into BigQuery, which means we can’t have it in Looker.

Enter Fivetran’s Google Sheets integration

Fortunately, there is a hacky way to do it by taking advantage of Fivetran’s Google Sheets integration. Big picture, we’re going to manually export monthly Unique Visitors from the Google Analytics UI, put that data into a Google Sheet, then configure Fivetran to pull that data out of the Google Sheet and put it into BigQuery so we can analyze it with Looker. Here’s how the setup works:

In our Google Sheet, we select the range of data we want then go to Data > Named ranges… to give it a name:

Screen Shot 2017-12-05 at 9.48.53 AM.png

We give it a name that begins with fivetran_ such as fivetran_monthly_unique_visitors_test:

Screen Shot 2017-12-05 at 9.51.44 AM.png

Next, we grab the Google Sheet ID from the URL:

Screen Shot 2017-12-05 at 9.54.01 AM.png

And paste that into Fivetran:

Screen Shot 2017-12-05 at 9.54.52 AM.png

Authorize Fivetran to access Google Sheets:

Screen Shot 2017-12-05 at 9.55.39 AM.png

Fivetran will then kick off the initial sync:

Screen Shot 2017-12-05 at 9.56.14 AM.png

And then shoot you an email when it’s done:

Screen Shot 2017-12-05 at 10.04.31 AM.png

Now over in BigQuery, we can check to verify the table is synced:

Screen Shot 2017-12-05 at 10.05.29 AM.png

Step 1, complete.

Next, lets create this chart in Looker

We’ll tell Looker we want to create a view from a table:

Screen Shot 2017-12-05 at 10.07.07 AM.png

I had to click the Refresh button to get the Google Sheets table to appear:

Screen Shot 2017-12-05 at 10.07.56 AM.png

Before tinkering with the view, here’s what the Google Sheet model might look like:

And here’s the view:

There is some hacky stuff going on here so let me explain what’s what:

  • In the Google Sheet, our dates are formatted like “1-Jan-16″. Looker has no idea what to do with that out of the box, so we need to use Standard SQL’s PARSE_DATE function to tell it how to parse it. The dimension expects a timestamp so we then need to cast the date to a timestamp. And because we’re only ever going to be using months here, we can get rid of the other date/time options leaving just months.
  • Similarly, the spreadsheet has numbers like “743,353″. This string isn’t a valid number, so we use REPLACE to eliminate the commas before casting it to an integer.
  • You might be wondering about the measure: in order to use this value in a chart, it has to be a measure, not a dimension. A hacky way to do that is just to use the measure type of max (though min or average or others will do too) which will just grab the maximum value for that date range – and since this value is the only value in the date range, it just returns the number of monthly uniques.

Back in Looker we can now create the chart:

Screen Shot 2017-12-05 at 10.24.28 AM.png

The main downside to this is of course that you have to manually update the Google Sheet if you want the chart to update, but for something like Monthly Unique Visitors, updating it once a month isn’t a big deal. Though if you can think of an automated way to do it, I’d be interested to learn how.


Automatically Updating HubSpot Contact Properties Using Data in Looker

At Help Scout we rely heavily on HubSpot to communicate with our customers.

In order to set up workflows and personalize those emails, we need to get data about our customers into HubSpot. In the past this usually required lots of manual CSV wrangling or involving one of our engineers who would update some backend code to push the relevant data over to HubSpot using their API.

Several months ago we started using Looker, a mind-blowingly powerful Business Intelligence (BI) tool, to analyze and report on data.

As part of a project this week, I did some research to figure out whether or not it’s possible to set HubSpot contact properties using the data we have in Looker. It turns out that it is possible, but there are some gotchas that took some time to figure out. This post is about exactly how to do it.

Big picture, we’re going to schedule a Look to send data to Zapier via a webhook, then use Zapier to update the contact properties in HubSpot. This guide assumes some familiarity with all three of these services.

Scheduling a Webhook in Looker

For this guide, lets imagine that we want to set a Plan Name contact property in HubSpot. We’ll need two data points: the contact’s email address and the plan name.

Here’s what the Look might look like:

Screen Shot 2017-12-01 at 2.12.29 PM.png

Note that I’ll just be setting the plan name for a single user in this guide, but this would work fine if the Look contained a list of users/plan names as well.

Looker lets you set up a set up a Schedule for a Look. You can do things like email yourself data on a regular basis, upload data to an FTP, or for our purposes here, post the data to a third-party webhook.

Before we can set this up though, we’re going to need a Zapier webhook URL to send the data to.

Setting up the Trigger in Zapier

Create a new Zap and select Webhooks by Zapier as the trigger app.

The trigger should be Catch Hook (Wait for a new POST, PUT, or GET to a Zapier URL):

Screen Shot 2017-12-01 at 2.17.53 PM.png

For the Set Up Options, it’s going to ask you whether you want to pick off a child key from the payload. Enter data there. This will tell Zapier to ignore the rest of the payload that Looker sends it and just grab the data (the emails and plan names):

Screen Shot 2017-12-01 at 2.19.37 PM.png

Next, copy the webhook URL that Zapier provides you:

Screen Shot 2017-12-01 at 2.19.58 PM.png

Head back to the Looker Schedule. Make these changes to the defaults:

  • Destination: Webhook
  • Address: paste the URL that Zapier provided
  • Format: Change it to JSON – Simple, Inline
  • Uncheck formatted data values
  • You can also change the other options like the schedule based on your requirements.

It should look something like this:

Screen Shot 2017-12-01 at 2.23.47 PM.png

To test it out, click the Send Test button at the top.

Back in Zapier, it should tell you that your test was successful:

Screen Shot 2017-12-01 at 2.24.57 PM.png

You can click on the view your hook link to view the data:

Screen Shot 2017-12-01 at 2.25.28 PM.png

It has the email and plan, just what we expected.

Next, lets tell Zapier what we want it to do with this data.

For the Step 2 action, choose HubSpot as the desination app. For the action, choose Create or Update Contact:

Screen Shot 2017-12-01 at 2.26.57 PM.png

Now here’s the neat part. For the Set Up Template step, we’re going to map the data from Looker over to HubSpot contact properties. For this example, I’ll map the Contact Email HubSpot property to the email address from the Looker data:

Screen Shot 2017-12-01 at 2.28.55 PM.png

Screen Shot 2017-12-01 at 2.29.41 PM.png

And then map the plan name HubSpot property to our data as well:

Screen Shot 2017-12-01 at 2.30.04 PM.png

Click Continue at the bottom of the page to move onto the testing step. Zapier will show you a summary of the mapping:

Screen Shot 2017-12-01 at 2.31.02 PM.png

Before you run the test, I’d recommend loading the contact’s details in HubSpot to confirm the property is not set (that way you can be sure if it’s set after you test it that it was a result of your test and not because it was previously set):


The Plan Name is blank, so now lets run the test.

Zapier will tell you it was sent:

Screen Shot 2017-12-01 at 2.36.19 PM.png

Reload the contact page, cross your fingers, and…


Voila! We successfully updated a HubSpot contact property by taking advantage of Looker’s webhooks and Zapier’s integrations.

From here, you could turn on your Zap to make it permanent and then just make sure to save the schedule for the Look so it happens automatically going forward.

Screen Shot 2017-12-01 at 2.40.37 PM.png

Troubleshooting Tips

If the HubSpot property isn’t set like you expect, there are two things to check out. First is the Task History in Zapier:

Screen Shot 2017-12-01 at 2.41.01 PM.png

Here you’ll be able to see whether Zapier received the data from Looker and whether it ran into any problems:

Screen Shot 2017-12-01 at 2.42.01 PM.png

If you don’t see the request there, it probably means that Looker didn’t send it over to Zapier. To check, head to your Settings page in Looker and go to the Schedule History section. You should see the details:

Screen Shot 2017-12-01 at 2.43.20 PM.png

I ran into an issue while testing this where Zapier wasn’t receiving the data because I had the Schedule option toggled that told Looker not to send the data if nothing had changed. Because I had already tested it, Looker didn’t fire the request. This page will tell you if something like that happened.

That should do it! If you run into any trouble setting this up for your company, don’t hesitate to drop me an email.