Matt Mazur

We’re currently in the process of rolling out Beacon, our live chat tool, to existing customers who have expressed interest trying it.

Customers could have expressed interest in two ways: either by filling out an interest form or mentioning to our support team that they want to try it.

For the interest form, there is one table, hubspot.contact, with two relevant fields:

email – The user’s email address
property_beacon_interest – A Unix timestamp in milliseconds representing when they filled out the form or null if they have not expressed interest

+------------------+--------------------------+
|      email       | property_beacon_interest |
+------------------+--------------------------+
| matt@example.com |            1534101377000 |
| eli@example.com  |                          |
+------------------+--------------------------+

When a customer expresses interest in a support conversation, our support team tags the conversation with a beacon-interest tag. There are two relevant tables:

helpscout.conversation with three relevant fields:

id – The id of the conversation
email – The email of the person who reached out to support
created_at – A timestamp with the date/time the conversation was created

+----+-------------------+--------------------------+
| id |       email       |        created_at        |
+----+-------------------+--------------------------+
|  1 | matt@example.com  | 2018-08-14 14:02:10 UTC  |
|  2 | eli@example.com   | 2018-08-14 14:06:30 UTC  |
|  3 | matt@example.com  | 2018-08-14 14:07:33 UTC  |
|  4 | katia@example.com | 2018-08-14 14:11:30 UTC  |
|  5 | jen@example.com   | 2018-08-13 14:11:30 UTC  |
+----+-------------------+--------------------------+

There’s also a helpscout.conversation_tag table with two relevant fields:

conversation_id – The id of the conversation that was tagged. A conversation can have zero or more tags.
tag – The name of the tag

+-----------------+-----------------+
| conversation_id |       tag       |
+-----------------+-----------------+
|               1 | new-trial       |
|               1 | bug-report      |
|               2 | beacon-interest |
|               4 | beacon-interest |
+-----------------+-----------------+

Your challenge:

Write a SQL query (any dialect is fine) that combines data from these two sources that lists everyone who has expressed interest in trying Beacon and when they first expressed that interest.

The end result using the example tables above should be a functioning SQL query that returns the following:

+-------------------+-------------------------+
|      email        |  expressed_interest_at  |
+-------------------+-------------------------+
| matt@example.com  | 2018-08-12 19:16:17 UTC |
| eli@example.com   | 2018-08-14 14:06:30 UTC |
| katia@example.com | 2018-08-14 14:11:30 UTC |
+-------------------+-------------------------+

You should include your query in the response field to this question in the online application.

2022-08-10T08:49:20-04:00

Hi Matt. I’m transitioning from Sr. Analyst to Analytics Engineer and am currently working to develop expertise in SQL to support the move. Your article was VERY helpful to me. My solution was close to yours so that was reassuring. Your commentary throughout the article/analysis helped me clean up a bit of my own understanding and habits/practices so thank you for that 🙌

Take care.
-Jason

Reply

2022-08-14T13:48:35-04:00

Glad it was helpful! You might also like my SQL style guide: https://github.com/mattm/sql-style-guide

Reply

2022-08-18T21:59:24-04:00

Yes! I stumbled onto that a while back and have it as one of my top go to SQL style guides – appreciate you, Matt 😎

Reply

	with hubspot_interest as (

	select
	email,
	timestamp_millis(property_beacon_interest) as expressed_interest_at
	from hubspot.contact
	where property_beacon_interest is not null

	),

	support_interest as (

	select
	email,
	created_at as expressed_interest_at
	from helpscout.conversation
	join helpscout.conversation_tag on conversation.id = conversation_tag.conversation_id
	where tag = "beacon-interest"

	),

	combined_interest as (

	select * from hubspot_interest
	union all
	select * from support_interest

	),

	final as (

	select
	email,
	min(expressed_interest_at) as expressed_interest_at
	from combined_interest
	group by 1

	)

	select * from final

	# Good: All uppercase

	SELECT
	email,
	datetime(property_beacon_interest/1000, 'unixepoch') AS expressed_interest_at
	FROM
	hubspot.contact
	WHERE
	property_beacon_interest IS NOT NULL
	UNION
	SELECT
	c.email,
	c.created_at AS expressed_interest_at
	FROM
	helpscout.conversation c
	INNER JOIN helpscout.conversation_tag ct
	ON c.id = ct.conversation_id AND ct.tag = 'beacon-interest'


	# Good: All lowercase

	select
	email,
	first_interest = min(first_interest)
	from (
	— interest forms
	select
	email,
	first_interest = dateadd(S, property_beacon_interest/1000, '1970-01-01')
	from
	hubspot.contact
	where
	property_beacon_interest is not null

	— support team tags
	union all
	select
	email,
	first_interest = created_at
	from
	helpscout.conversation c join
	helpscout.conversation_tag ct on c.id = ct.conversation_id and ct.tag = 'beacon-interest'
	) combined
	group by
	email

	# Okay: Mixed uppercase and lowercase

	SELECT helpscout.conversation.email, helpscout.conversation.created_at as expressed_interest_at
	FROM helpscout.conversation
	INNER JOIN helpscout.conversation_tag ON helpscout.conversation.id=helpscout.conversation_tag.conversation_id
	WHERE tag="beacon-interest"
	UNION
	select hubspot.contact.email, DATETIME(hubspot.contact.property_beacon_interest/1000, 'unixepoch') \|\| ' UTC' as expressed_interest_at
	FROM hubspot.contact
	WHERE property_beacon_interest != ''
	ORDER BY expressed_interest_at;

	# Bad: Other variations

	Select
	email
	From
	helpscout.conversation
	Where
	created_at
	And
	conversation_id
	Order By
	email

	# GOOD: CTEs
	# I'd put each CTE on a new line though to make it easier to comment and copy/paste

	with form as (

	select
	email,
	to_timestamp(property_beacon_interest) as expressed_interest_at
	from hubspot.contact

	), support as (

	select
	c.email,
	c.created_at as expressed_interest_at
	from hubspot.conversation c
	left join hubspot.conversation_tag t on c.id = t.conversation_id
	where t.tag = 'beacon-interest'

	), unioned as (

	select * from form
	union all
	select * from support

	), final as (

	select
	email,
	min(expressed_interest_at) as expressed_interest_at
	from unioned
	group by 1
	order by 2

	)

	select * from final

	# OKAY: Subqueries

	SELECT email, MIN(created_at) AS expressed_interest_at
	FROM
	(
	SELECT a.email, MIN(created_at) AS created_at
	FROM conversation a
	INNER JOIN conversation_tag b
	ON a.id=b.conversation_id
	WHERE b.tag="beacon-interest"
	GROUP BY id
	UNION
	SELECT email,DATETIME(time_created/1000,'unixepoch')\|\| ' UTC' AS created_at
	FROM hubspot.contact
	WHERE time_created IS NOT NULL
	)
	GROUP BY email ORDER BY expressed_interest_at

	# Good: Descriptive names

	WITH beacon_interest_contact as (
	SELECT
	email,
	FROM_UNIXTIME(property_beacon_interest/1000) as expressed_interest_at
	FROM
	hubspot.contact
	WHERE
	property_beacon_interest IS NOT NULL),
	beacon_interest_conversation as (
	SELECT
	email,
	created_at as expressed_interest_at
	FROM
	helpscout.conversation
	INNER JOIN
	helpscout.conversation_tag
	ON
	conversation.id = conversation_tag.conversation_id
	WHERE
	conversation_tag.tag = 'beacon-interest'),
	beacon_interest_union as (
	SELECT
	*
	FROM
	beacon_interest_contact
	UNION
	SELECT
	*
	FROM
	beacon_interest_conversation)
	SELECT
	email,
	MIN(expressed_interest_at) as expressed_interest_at
	FROM
	beacon_interest_union
	GROUP BY
	email;

	# Bad: Non-descriptive names

	;with x0 as
	(
	Select hcv.email, hcv.created_at expressed_interest_at
	From helpscout.conversation hcv
	Inner join helpscout.conversation_tag hct on hcv.id = hct.conversation_id
	Where hct.tag = ‘beacon-interest’
	Union
	Select email, dateadd(S,property_beacon_interest, '1970-01-01') expressed_interest_at
	From hubspot.contact
	Where property_beacon_interest is not null

	),

	X1 as (
	Select email , expressed_interest_at, row_number() over (partition by email order by expressed_interest_at) as RowNo
	From x0
	Order by expressed_interest_at
	)

	Select email, expressed_interest_at
	From x1
	Where RowNo =1

	# Good: INNER JOIN

	…
	SELECT a.email AS email, a.created_at AS expressed_interest_at
	FROM helpscout.conversation AS a
	INNER JOIN helpscout.conversation AS b ON a.id = b.conversation_id
	WHERE b.conversation_id = 'beacon-interest'
	…

	# Okay: LEFT JOIN

	…
	select
	c.email,
	c.created_at as expressed_interest_at
	from hubspot.conversation c
	left join hubspot.conversation_tag t on c.id = t.conversation_id
	where t.tag = 'beacon-interest'
	…

Matt Mazur

Analyzing 89 Responses to a SQL Screener Question for a Senior Data Analyst Position

Analyzing the Differences

SQL Casing

Common Table Expressions (CTEs) vs Subqueries

Meaningful CTE Names

INNER JOIN vs LEFT JOIN

Implicit vs Explicit INNER

Filtering in the WHERE clause vs JOIN condition

Converting Milliseconds

UNION DISTINCT vs UNION ALL

Accounting for Multiple Interest

GROUP BY Column Name vs Number

Single vs Multiple Lines When Listing Multiple Columns

Comma First vs Comma Last

All Responses

3 thoughts on “Analyzing 89 Responses to a SQL Screener Question for a Senior Data Analyst Position”

Leave a comment Cancel reply

	# Good: Specifying INNER

	…
	select a.email
	, a.created_at as expressed_interest_at
	from helpscout.conversation a
	INNER JOIN
	helpscout.conversation_tag b
	ON a.id = b.conversation_id
	where b.tag = ‘beacon-interest’
	…

	# Good: Omitting INNER

	…
	SELECT HC.`email`, DATE_FORMAT(HC.`created_at`, '%Y-%m-%e %T UTC') AS expressed_interest_at
	FROM `helpscout.conversation` AS HC
	JOIN `helpscout.conversation_tag` AS HCT ON HC.`id`=HCT.`conversation_id` WHERE HCT.`tag`='beacon-interest'
	ORDER BY expressed_interest_at
	…

	# Good: Filtering in the WHERE clause

	…
	SELECT cnv.email,
	cnv.created_at AS expressed_interest_at
	FROM helpscout.conversation AS cnv
	JOIN helpscout.conversation_tag AS ctg
	ON cnv.id = ctg.conversation_id
	WHERE ctg.tag = 'beacon-interest'
	…

	# Good: Including it in the join condition

	…
	select conv.email, conv.created_at as expressed_interest_at
	from helpscout.conversation conv
	join helpscout.conversation_tag tag
	on conv.id = tag.conversation_id and tag.tag = 'beacon-interest'
	…

	# Good: Accounting for the timestamp being in millseconds

	…
	select email, (timestamp '1970-01-01 00:00:00 UTC' +
	numtodsinterval(property_beacon_interest/1000, 'SECOND'))
	at time zone 'utc') as created_at
	from hubspot.contact
	…

	# Bad: Not realizing millseconds need to be handled differently

	…
	SELECT
	EMAIL
	,FROM_UNIXTIME(PROPERTY_BEACON_INTEREST) AS EXPRESSED_INTEREST_AT
	FROM HUBSPOT.CONTACT
	WHERE
	PROPERTY_BEACON_INTEREST IS NOT NULL
	…

	# Good: UNION ALL

	SELECT
	C.email,
	C.created_at AS expressed_interest_at
	FROM helpscout.conversation C
	INNER JOIN helpscout.conversation_tag CT ON CT.conversation_id = C.id
	WHERE CT.tag = 'beacon-interest'
	UNION ALL
	SELECT
	email,
	DATEADD(S, CONVERT(INT,LEFT(property_beacon_interest, 10)), '1970-01-01') AS expressed_interest_at
	FROM hubspot.contact

	# Okay: UNION DISTINCT

	SELECT email, MIN(created_at) AS expressed_interest_at
	FROM
	(SELECT a.email, MIN(created_at) AS created_at
	FROM conversation a
	INNER JOIN conversation_tag b
	ON a.id=b.conversation_id
	WHERE b.tag="beacon-interest"
	GROUP BY id
	UNION
	SELECT email,DATETIME(time_created/1000,'unixepoch')\|\| ' UTC' AS created_at
	FROM hubspot.contact
	WHERE time_created IS NOT NULL)
	GROUP BY email ORDER BY expressed_interest_at

	# Good: Taking the earliest timestamp for each email

	SELECT email, MIN(created_at) AS expressed_interest_at
	FROM
	(SELECT c.email, MIN(created_at) AS created_at
	FROM helpscout.conversation c, helpscout.conversation_tag ct
	WHERE c.id =ct.conversation_id
	AND ct.tag="beacon-interest"
	GROUP BY id
	UNION
	SELECT email,DATETIME(time_created/1000,'unixepoch')\|\| ' UTC' AS created_at
	FROM hubspot.contact
	WHERE time_created IS NOT NULL)
	GROUP BY email ORDER BY expressed_interest_at;

	# Bad: Missing it

	select cont.email, cont.property_beacon_interest as expressed_interest_at
	from hubspot.contact cont
	where cont.property_beacon_interest is not null
	union
	select conv.email, conv.created_at as expressed_interest_at
	from helpscout.conversation conv
	join helpscout.conversation_tag tag
	on conv.id = tag.conversation_id and tag.tag = 'beacon-interest'
	order by expressed_interest_at

	# Good: Grouping by column number

	…
	select
	email,
	min(expressed_interest_at) as expressed_interest_at
	from unioned
	group by 1
	order by 2
	…

	# Good: Grouping by column name

	…
	SELECT a.email, min(b.created_at) AS expressed_interest_at
	FROM hubspot.contact AS a
	JOIN helpscout.conversation AS b ON a.email = b.email
	GROUP BY a.email
	…

	# Good: Multiple columns per line

	…
	SELECT email, created_at AS expressed_interest_at
	FROM helpscout.conversation c
	…

	# Good: One column per line

	…
	SELECT
	email,
	FROM_UNIXTIME(property_beacon_interest/1000) as expressed_interest_at
	FROM
	hubspot.contact
	…

	# Good: Comma last

	…
	SELECT
	email,
	FROM_UNIXTIME(property_beacon_interest/1000) as expressed_interest_at
	FROM
	hubspot.contact
	…

	# Good: Comma first
	…
	select
	email
	,dateadd(s, convert(int,left(property_beacon_interest,10)), '1970-01-01')
	from hubspot.contact
	…

Analyzing the Differences

SQL Casing

Common Table Expressions (CTEs) vs Subqueries

Meaningful CTE Names

INNER JOIN vs LEFT JOIN

Implicit vs Explicit INNER

Filtering in the WHERE clause vs JOIN condition

Converting Milliseconds

UNION DISTINCT vs UNION ALL

Accounting for Multiple Interest

GROUP BY Column Name vs Number

Single vs Multiple Lines When Listing Multiple Columns

Comma First vs Comma Last

All Responses

Share this:

3 thoughts on “Analyzing 89 Responses to a SQL Screener Question for a Senior Data Analyst Position”

Leave a comment Cancel reply