Using Macros in dbt to Select Different Table Names in Development and Production

I’ve been experimenting a lot with dbt, a command line tool for transforming data within a data warehouse. I’ll have more to write about dbt later, but want to share a few lessons learned as I work through it for anyone else Googling for solutions in the future.

Here’s a problem I ran into and how to solve it:

In my development environment, I have an exact copy of my production Postgres database. For example, I can run a query like:

select count(*) from users;

In my data warehouse, the tables are created by Stitch which automatically adds the tables to a schema. To query for the number of users in the data warehouse, I have to prepend the schema to the table name:

select count(*) from preceden_heroku.users;

This presents a slight challenge for configuring dbt. It’s a best practice to use variables to set table names that way if the table name changes in the future, you can just adjust the variable and all of the models that depend on it will continue to work correctly:


vars:
"base.users" : "users"

view raw

base-table.yml

hosted with ❤ by GitHub


select
id,
email
from {{ var('base.users') }}

view raw

base-model.sql

hosted with ❤ by GitHub

However, with my setup, the table names differ between development and production so I can’t use this setup without some adjustment.

Macros to the rescue

I asked in the dbt Slack group how to work around this and Drew Banin, one of dbt’s creators, recommend using a macro. Macros are snippets of code that you can use in your dbt models like functions.

We can create a macro that accepts a table name and adjusts it based on whether dbt is being run in development or production:


{% macro source(table_name) %}
{% if target.name == 'prod' %}
{{ return("preceden_heroku." ~ table_name) }}
{% else %}
{{ return(table_name) }}
{% endif %}
{% endmacro %}

view raw

dbt-source.sql

hosted with ❤ by GitHub

In development, if we pass in an argument like “users”, the macro will simply return “users”. If production, it will prepend the table name with the schema name to return “preceden_heroku.users”.

Then we can use the this macro in the model by passing in the table name (via the variable) to have dbt use table names without a schema in development and table names with a schema in production:


select
id,
email
from {{ source(var('base.users')) }}

Easy peasy :)