Friday Updates / 2020-09-11

What I’m working on

At Help Scout, a customer using Beacon wrote in asking about a warning that Chrome was displaying on their site:

The root cause seemed to be some attribution tracking cookies that were set by a script I worked on, so the support request made its way to me. Digging into it, we can see a lot of Help Scout cookies (including attribution tracking ones like _A_FirstTouchURL) are being set when Beacon is loaded on the customer’s site:

Despite having worked a lot with cookies and tracking scripts in the past, I was fuzzy on how these cookies were being set. Why, for example, were there Mixpanel and Google Analytics cookies when Beacon wasn’t loading them? Turns out that because the customer is loading Beacon from a helpscout.net domain, all of the helpscout.net cookies are made available on their site unless a cookie is explicitly set not to be passed along.

This can be achieved by setting the SameSite cookie attribute to Strict. In this case, there’s no need for most of these helpscout.net cookies to be passed along to third party sites loading Beacon, so we’re going through the steps to mark the cookies as strict when possible. Some cookies on helpscout.net are set by third party scripts and can’t be set to strict, so some amount of cookies getting passed along is inevitable, but at least we can minimize it.
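To make the SameSite idea concrete, here is a minimal sketch using Python's standard http.cookies module (the samesite attribute needs Python 3.8+). It is not Help Scout's actual code: the helper name build_set_cookie_header and the cookie value are made up for illustration, and only the cookie name comes from the example above.

```python
from http.cookies import SimpleCookie


def build_set_cookie_header(name: str, value: str, same_site: str = "Strict") -> str:
    """Return a Set-Cookie header value with the SameSite attribute applied.

    Hypothetical helper for illustration only.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["secure"] = True
    # SameSite=Strict asks the browser not to send this cookie on
    # requests that originate from another site.
    cookie[name]["samesite"] = same_site
    # OutputString() renders the header value without the "Set-Cookie:" prefix.
    return cookie[name].OutputString()


# The cookie value here is a placeholder, not real attribution data.
header = build_set_cookie_header("_A_FirstTouchURL", "landing-page")
print(header)
```

Cookies written by third party scripts would not pass through a helper like this, which is why some of them can't be switched to strict.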

This week I knocked out two DataCamp courses in my machine learning adventures: Hyperparameter Tuning in Python and Introduction to Natural Language Processing in Python. Typically I’m only able to finish one course per week, but a lot of the material in these was covered by other courses, so they were fairly quick to work through.

One new thing was the introduction of TPOT, a tool that uses genetic programming to find an optimal classifier and hyperparameters for a given data set.

Long time readers may remember my Evolution of Color project in 2014. It uses genetic algorithms to evolve a population of colors towards a goal color. Unlike most of the other Emergent Mind projects, where I re-implemented other people’s projects in JavaScript, the Evolution of Color was completely original, making it one of the projects there I’m most proud of.
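For anyone who hasn't seen a genetic algorithm before, the general idea of evolving colors towards a goal can be sketched in a few lines of Python. This is not the original Evolution of Color code (that was JavaScript); the fitness function, mutation step, population size, and other parameters below are all assumptions picked to keep the example short.

```python
import random
from typing import List, Tuple

Color = Tuple[int, int, int]


def fitness(color: Color, goal: Color) -> int:
    """Higher is better: negative squared RGB distance to the goal."""
    return -sum((c - g) ** 2 for c, g in zip(color, goal))


def mutate(color: Color, rate: float, rng: random.Random) -> Color:
    """Nudge each channel by a small random amount with probability `rate`."""
    return tuple(
        min(255, max(0, c + rng.randint(-20, 20))) if rng.random() < rate else c
        for c in color
    )


def crossover(a: Color, b: Color, rng: random.Random) -> Color:
    """Build a child by picking each channel from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def evolve(goal: Color, population_size: int = 50, generations: int = 200,
           mutation_rate: float = 0.3, seed: int = 0) -> Color:
    rng = random.Random(seed)
    population: List[Color] = [
        tuple(rng.randint(0, 255) for _ in range(3)) for _ in range(population_size)
    ]
    for _ in range(generations):
        # Keep the fitter half unchanged, then refill with mutated children.
        population.sort(key=lambda c: fitness(c, goal), reverse=True)
        survivors = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng),
                   mutation_rate, rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=lambda c: fitness(c, goal))


goal_color = (30, 144, 255)
best_color = evolve(goal_color)
print(best_color)
```

Because the fitter half of each generation survives untouched, the best color in the population can never get worse from one generation to the next.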

I hope as I continue down this machine learning path I get more opportunities to work with genetic algorithms.

And on that note, I’ve been mulling over where to focus in the coming months to keep leveling up my machine learning skills. Currently my plan is:

  • Finish the 11 remaining machine learning DataCamp courses by end of the year
  • Spend a few months in early 2021 focusing on Kaggle competitions to gain experience applying what I’ve learned
  • Figure out how to integrate machine learning features into a web application, whether it be for Preceden, a new app, or maybe even Help Scout if there’s an opportunity. Apparently deploying ML applications is quite difficult though. We will see.

What I’m watching

My wife and I just finished watching Umbrella Academy on Netflix:

It’s about a dysfunctional family of superheroes that need to work together to save the world. I enjoyed it and would recommend.

Product Recommendations

I’m in a mastermind group with Tom Davies and Jason Rudolph. Tom has two popular Shopify apps, Best Sellers and Flair, and Jason has BuildPulse, a tool that helps automatically identify flaky tests (automated tests that randomly pass sometimes and fail other times).

If you run a Shopify store or your team is banging their head against the wall dealing with flaky tests, their tools are definitely worth checking out.

What else

I’m on day 26 of a 28-day keto challenge. I’m participating through the Wearable Challenge initiative, which basically means I’ve been wearing a Continuous Glucose Monitor (CGM) on my arm and for every day that I stay below a certain blood glucose limit I get $25 of my own money back. I’ve lost about 5 pounds and feel pretty healthy, but probably won’t continue keto after the challenge ends in a few days. I’ll write more about this whole thing in a separate post.

Hope everyone’s doing well 👋.

Friday Updates / 2020-09-04

I’m going to try a little experiment for a few weeks and post a short update every Friday about what I’m up to.

I enjoy writing and used to blog here a lot, but family, work, and projects tend to take up a lot of my time these days so I haven’t prioritized blogging for a long time. These short updates are a way to get me back into writing and also serve as a way to stay closer with you all. I’m time-boxing myself to about an hour every Friday to write these so we’ll see how they turn out :).

What I’m working on

At Help Scout, I’ve spent a lot of time recently helping with a big Go To Market (GTM) strategy project. A GTM strategy is a comprehensive plan for launching a new business or growing an existing one. Help Scout has been around for more than 9 years so we have a lot of data that we can use to understand what industries our product resonates in, how features are being used, who the buyers are, how our customers grow over time, etc. All of this can be used to help inform our GTM strategy for the coming years.

At Preceden, I addressed an issue this week stemming from a bug (or “feature”) in wkhtmltopdf, a tool that converts web pages to PDF files. It’s what Preceden uses to let users export their timelines as PDFs. The problem with it though is that if a webpage contains an image that points to an invalid (404) image, the PDF conversion completely fails. This happens with Preceden because users can paste an image URL into an event’s notes and Preceden will try to display the image. But if the URL doesn’t point to a valid image, the export fails and I get support emails like this:

The link above refers to a World Civilization Preceden timeline that I want to download, but for some reason each time I try, an error message shows “There was an unknown error while processing the download. Please contact support.” I tried on multiple devices, but the same problem persists.

In the past I’ve solved this by parsing the image URLs from event notes and using the FastImage gem to verify that each one is valid. The code then caches the result (valid or not valid) for each image and doesn’t attempt to render the image if it’s invalid. Problem is, I permanently cached the results of the validation check. For a timeline that’s been around for years, sometimes the previously valid image URLs become invalid. As a result, a timeline that once exported successfully might eventually start failing, leading to frustration and support tickets.

I addressed it this week by introducing some code that revalidates all of the image URLs in a timeline if an export fails.
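Roughly, the fix looks like the sketch below. Preceden is a Ruby app and this is not its actual code: it's a Python illustration where the class and function names are made up, the URL check and PDF renderer are stand-in callables, and retrying the export once after revalidating is my assumption for the example.

```python
from typing import Callable, Dict, Iterable, List


class ImageValidityCache:
    """Caches whether each image URL is valid (hypothetical helper)."""

    def __init__(self, check_url: Callable[[str], bool]) -> None:
        self._check_url = check_url
        self._results: Dict[str, bool] = {}

    def is_valid(self, url: str) -> bool:
        # Only hit the network-style check the first time a URL is seen.
        if url not in self._results:
            self._results[url] = self._check_url(url)
        return self._results[url]

    def revalidate(self, urls: Iterable[str]) -> None:
        # Ignore cached answers and re-check every URL.
        for url in urls:
            self._results[url] = self._check_url(url)


def export_timeline(image_urls: List[str], cache: ImageValidityCache,
                    render: Callable[[List[str]], List[str]]) -> List[str]:
    """Render only images believed valid; on failure, refresh the cache and retry once."""
    try:
        return render([u for u in image_urls if cache.is_valid(u)])
    except RuntimeError:
        cache.revalidate(image_urls)
        return render([u for u in image_urls if cache.is_valid(u)])


# Stand-ins for the real URL check and PDF renderer.
live_images = {"https://example.com/a.png", "https://example.com/b.png"}


def fake_check(url: str) -> bool:
    return url in live_images


def fake_render(urls: List[str]) -> List[str]:
    if any(u not in live_images for u in urls):
        raise RuntimeError("conversion failed on a broken image")
    return list(urls)


timeline_urls = ["https://example.com/a.png", "https://example.com/b.png"]
cache = ImageValidityCache(fake_check)
first_export = export_timeline(timeline_urls, cache, fake_render)

# Years later, one of the images disappears but the cache still says it's valid.
live_images.discard("https://example.com/b.png")
second_export = export_timeline(timeline_urls, cache, fake_render)
print(first_export, second_export)
```

The nice part of revalidating only on failure is that healthy timelines never pay the cost of re-checking every image.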

Sometimes building a SaaS product is sexy and exciting, but often it’s fixing random issues like this.

I’m still continuing to learn machine learning, probably spending 10-15 hours/week on DataCamp, writing documentation for myself, and building small projects. This week I finished a fantastic course on Dimensionality Reduction in Python. The big idea with dimensionality reduction is that you can often reduce or simplify the inputs to a machine learning model to improve its performance and how well it generalizes to new data.
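One of the simplest ways to reduce inputs is to drop features that barely vary, since a column that's nearly constant can't help a model tell rows apart. Here's a minimal sketch using only the standard library. The function name, threshold, and toy data are all made up for illustration, and this is just one technique among many, not a summary of the course.

```python
from statistics import pvariance
from typing import List, Tuple


def drop_low_variance_features(rows: List[List[float]],
                               threshold: float = 0.01) -> Tuple[List[List[float]], List[int]]:
    """Keep only the columns whose population variance exceeds `threshold`.

    Returns the reduced rows plus the indexes of the columns that were kept.
    """
    columns = list(zip(*rows))
    kept = [i for i, column in enumerate(columns) if pvariance(column) > threshold]
    reduced = [[row[i] for i in kept] for row in rows]
    return reduced, kept


# Toy data: the first column is constant, so it carries no information.
rows = [
    [1.0, 5.1, 0.0],
    [1.0, 4.9, 1.0],
    [1.0, 6.2, 0.0],
    [1.0, 5.8, 1.0],
]
reduced_rows, kept_columns = drop_low_variance_features(rows)
print(kept_columns)
print(reduced_rows)
```

In practice features are usually scaled first, since raw variance depends on the units each feature is measured in.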

What I’m reading

Stumbled across Ethereum is a Dark Forest on HackerNews. I own a small amount of Ethereum but honestly have no idea how it all works behind the scenes. This article offers a glimpse of what’s going on under the hood. Eventually I’d love to dive into it more.

In that article the author mentions he enjoyed The Dark Forest, a sequel to Cixin Liu’s popular Three Body Problem novel. I read the latter a few months ago and seeing the sequel praised so highly in this post made me pick it up and start reading. Been enjoying it so far.

What else

  • My son is wrapping up his third week of virtual kindergarten today. I give the school a ton of credit; they’re really doing the best they can with it. Just sad that he’s not getting to experience kindergarten like he would in a world without Covid.
  • Speaking of virtual, I attended Microconf’s one day online conference earlier this week. I attended the live ones maybe 5 times in the past, but haven’t been for a few years. Kudos to Rob, Mike, Xander, and the rest of the team for organizing the virtual one this year.
  • If you’re trying to come up with a bilingual baby name, check out MixedName.com, a new baby name generator from a buddy of mine, Bemmu Sepponen. The service recently got a ton of attention and praise on Reddit and HackerNews.
  • And last but not least, I want to recommend Hey, the new email service from Basecamp. I was skeptical at first, but having used it for a few weeks now I’m a huge fan. The combination of their screen out feature and ability to categorize senders as Paper Trail or Feed has drastically reduced the number of emails I’m exposed to each day, leaving me more time to focus on important things.

I hope this email finds you all well. If anyone wants to catch up sometime, I’d love to jump on a call. Drop me a note at mazur@hey.com.

Cheers!

Learning Data Science: 3 Months In

At the end of April I decided to take a break from Preceden and start using that time to level up my data science skills. I’m about 3 months into that journey now and wanted to share how I’m going about it in case it’s helpful to anyone.

Data Science

Data science is very broad and depending on who you ask it can mean a lot of different things. Some folks would consider analyzing data in SQL or Excel as data science, but to me that’s never felt quite right. I prefer a definition that leans more on writing code that makes use of statistics, machine learning, natural language processing, and similar fields to analyze data.

Python

Going into this I had a lot of programming and data analysis experience, but hadn’t done much with Python and barely knew what regression meant.

I considered continuing to learn R which I already have some experience in, but I’m not a huge fan of R so decided to start fresh and learn Python instead. Having used Ruby extensively for Preceden and other projects has made learning Python pretty easy though.

DataCamp

DataCamp is an online learning platform to help people learn data science. They have hundreds of interactive courses and tracks for learning R, Python, Excel, SQL, etc. If you have the interest and time, the $300/year they charge for access to all of their courses is nothing compared to the value they provide.

I’ve been making my way through their Machine Learning for Everyone career track which starts off with a basic introduction to Python and quickly dives into statistics, supervised learning, natural language processing, and a lot more.

Each course is a combination of video lectures and interactive coding exercises.

The courses are really well done and I feel like they’re giving me exposure to a broad range of machine learning topics. I wouldn’t say the courses go deep on any particular topic, but they provide great introductions which you can build on outside of DataCamp.

So far I’ve completed 10 out of 37 courses in this career track + 2 additional Python courses that were not in the track but were recommended prerequisites for some of the courses that are in the track.

If you pushed through a course it might take 4 hours to complete, but I’m probably spending 10-15 hours on each course (so about 1 course/week). This is because I spend a lot of extra time during and after the course writing documentation for myself and trying to apply the material to real-world data to learn it better.

Documentation

Every time I stumble across a new function or technique I spend some extra time researching it and documenting it in a public Python Cheat Sheet GitHub repository.

At first I was writing notes in Markdown files, but have since gotten a little savvier and am doing them in IPython Notebook files now. Here’s a recent example of documentation I wrote about analyzing time series.

I usually try to come up with some super simple example demonstrating how each function works which helps me learn it better and serves as an easy reference guide when I need to brush up on it when applying it down the road.

Real World Projects

For each course, I also try to apply the material to some real world data that I have access to, whether it be for Help Scout or Preceden.

For example, after DataCamp’s supervised learning course I spent some time trying to use Help Scout trial data to predict which trials would convert into customers.

For any projects involving Help Scout data, I usually share a short writeup afterwards in our metrics Slack channel as a way to help educate people on data science terms and techniques.

Books

I’ve also picked up a few books which I’ve found to be excellent resources for learning material in more depth.

YouTube

You can search YouTube for almost any data science topic and find dozens of videos about it. The quality varies, but I’ve found that watching a few on any topic is usually enough to fill in any major gaps in my understanding.

For example, last week I was working through DataCamp’s course on time series analysis and having trouble with a few concepts. A quick search on YouTube for videos on autoregressive models turned up this video which cleared things up for me:

Kaggle

After DataCamp’s course on supervised learning I spent a lot of time trying to apply it to Kaggle’s Titanic Survival data competition.

Breaking 80% accuracy is super hard 😬

The public notebooks that other people have shared are fantastic learning resources and in the future I want to spend a lot more time trying these competitions and learning from the work others have done.

What’s Next

At the rate I’m going I should be through DataCamp’s machine learning track before the end of the year which will be a nice milestone in this journey. Along the way I’ll continue trying to apply the material to real world problems and hopefully wind up somewhat competent with these techniques when all is said and done. We shall see!

What’s Up

It’s been a while since I’ve written on this blog, so wanted to take a break from my normal routine to say hi to you long-term readers and share a few updates about what’s going on in my world.

Work-wise, I’m very fortunate to not have been heavily impacted by Covid so far. I’m still consulting with Help Scout where I oversee their analytics and business intelligence efforts. I had been consulting with Automattic as well, but left earlier this year to focus more on growing Preceden, my long-running timeline maker tool. After a few months though I had knocked out most of my big todo list items for Preceden, so started looking for something new to work on and decided I wanted to learn machine learning. My goal at the moment is to find some valuable ways to apply machine learning to help grow Preceden and Help Scout.

These days, my mornings are mostly spent getting better at machine learning through a combination of courses on DataCamp, books, and projects. Preceden is on the backburner, though I do spend some time each week working on support and fixing occasional bugs. My afternoons are spent with Help Scout where I spend a lot of time using dbt and Looker to help the team gain insights through data.

Family-wise, we moved from Florida to North Carolina last summer and we’ve been very happy with the move. My kids are 5, 4, and 2 now and keep my wife and me very busy.

Health-wise, I’ve been experimenting with high-intensity interval training (HIIT) workouts on YouTube which I enjoy because they’re short but also get you sweating a lot. Most benefits I get from those are negated by a suboptimal diet though (Chick Fil A and Dunkin Donuts are so good…).

I recently finished Ozark on Netflix and highly recommend it, especially if you enjoyed shows like Breaking Bad or Narcos.

I also usually play one, sometimes two online poker tournaments with friends each week – if you’re interested in joining shoot me an email.

I’ll try not to let a year go between blog posts in the future, but no promises.

Hope everything is going as well with you all.