I’ve been experimenting a lot with dbt, a command line tool for transforming data within a data warehouse. I’ll have more to write about dbt later, but want to share a few lessons learned as I work through it for anyone else Googling for solutions in the future.
Here’s a problem I ran into and how to solve it:
In my development environment, I have an exact copy of my production Postgres database. For example, I can run a query like:
select count(*) from users;
In my data warehouse, the tables are created by Stitch which automatically adds the tables to a schema. To query for the number of users in the data warehouse, I have to prepend the schema to the table name:
select count(*) from preceden_heroku.users;
This presents a slight challenge for configuring dbt. It’s a best practice to use variables to set table names; that way, if a table name changes in the future, you can just adjust the variable and all of the models that depend on it will continue to work correctly:
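A minimal sketch of what this pattern looks like (the variable name and model file are illustrative, not from the original post) — you define the variable once in dbt_project.yml:

```yaml
# dbt_project.yml (sketch -- variable name is illustrative)
vars:
  users_table: users
```

and then reference it in any model via var():

```sql
-- models/user_count.sql (sketch)
select count(*)
from {{ var('users_table') }}
```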
However, with my setup, the table names differ between development and production so I can’t use this setup without some adjustment.
Macros to the rescue
We can create a macro that accepts a table name and adjusts it based on whether dbt is being run in development or production:
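A sketch of such a macro, assuming the production target is named "production" in profiles.yml (the macro name and target name are illustrative — dbt exposes the active target via the built-in target variable):

```sql
-- macros/prefix_schema.sql (sketch -- macro name is illustrative)
{% macro prefix_schema(table_name) %}
  {%- if target.name == 'production' -%}
    preceden_heroku.{{ table_name }}
  {%- else -%}
    {{ table_name }}
  {%- endif -%}
{% endmacro %}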
In development, if we pass in an argument like “users”, the macro will simply return “users”. In production, it will prepend the schema name to return “preceden_heroku.users”.
Then we can use this macro in the model by passing in the table name (via the variable), so that dbt uses table names without a schema in development and table names with a schema in production:
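Combining the two pieces, the model might look like this (again a sketch — the macro and variable names are the illustrative ones assumed above):

```sql
-- models/user_count.sql (sketch)
select count(*)
from {{ prefix_schema(var('users_table')) }}
```

In development this renders to `select count(*) from users`, and against the production target it renders to `select count(*) from preceden_heroku.users`.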
Easy peasy :)