I haven’t made too many changes to how I write SQL recently, but I did adopt a new convention recently that I really like so wanted to share.
In the past, I would have written the following query like so:
select email, sum(amount) as total_revenue from users inner join charges on users.id = charges.user_id
Note that this does not prefix email
or amount
with the table name where they came from.
Claire Caroll of dbt fame recently pinged me to suggest a change: whenever there’s a join involved, you should include the table name to make it clear where the column originated. The query above would look like this:
select users.email, sum(charges.amount) as total_revenue from users inner join charges on users.id = charges.user_id
When there’s no join involved, it’s fine to leave it out because there’s no room for confusion:
select id, name from companies
I’ve been following this convention for a few weeks and really like it because there’s zero ambiguity when seeing this around where each column originated. It’s more verbose obviously, but I think the extra clarity outweighs that downside.
I’ve updated the style guide to reflect this guide.
I’d love to hear your thoughts – do you always include the name name, only when necessary, or follow a convention like this and only include it when joins are involved?