In my last post, I wrote about how to set up a Bash alias to pull your Heroku data into a local Postgres database. This post takes it a step further by showing how to schedule that command so the local database is refreshed automatically on a regular basis.
We left off the last post with this Bash alias:
alias prdb="dropdb preceden_production_copy; PGUSER=postgres PGPASSWORD=password heroku pg:pull HEROKU_POSTGRESQL_MAGENTA preceden_production_copy --app sushi"
This lets me run prdb, which first deletes my local database copy and then pulls the production data into a new one.
When attempting to schedule this as a cronjob, I ran into several issues.
First, dropdb won't work if the database is being used by any applications. You'll get an error message like:
dropdb: database removal failed: ERROR: database "preceden_production_copy" is being accessed by other users
This is a problem because I usually leave Metabase and SQLPro for Postgres open, which causes dropdb to throw an error, which in turn causes pg:pull to fail because the database still exists. I then have to quit those apps and try again, and even then there's often some lingering connection that requires a complete reboot to clear.
Second, it usually takes about half an hour for pg:pull to load all of the data into a local database. If I'm eager to dive into the latest data, it's frustrating to have to wait that long.
Lastly, you can’t use aliases in a cronjob, meaning you’d have to duplicate the content of the alias in the cronjob.
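The alias limitation is easy to demonstrate: Bash only expands aliases in interactive shells, and cron runs its commands non-interactively. The greet alias below is just a throwaway illustration, not anything from the actual setup:

```shell
# Aliases are only expanded in interactive shells. Cron jobs run
# non-interactively, so even an alias defined in the same command
# string is never expanded:
bash -c 'alias greet="echo hello"; greet'
# bash reports "greet: command not found" and exits with status 127.
```

This is why the cronjob needs the real command (or a script), not the alias name.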
As we’ll see, we can work through each of these issues.
Postgres provides a way to terminate existing connections. By executing this query from the terminal before we attempt to drop the database, we ensure the dropdb command will succeed. Here's what it looks like:
echo "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='preceden_production_copy';" | psql -U postgres
If all goes well, you’ll see something like:
 pg_terminate_backend
----------------------
 t
 t
(2 rows)
Next, to avoid duplicating the same command in both the alias and the cronjob, we'll combine everything into a Bash script that we can execute from both places:
#!/bin/bash

current_date_time="`date +%Y\-%m\-%d\ %H\:%M\:%S`";
echo "Running at:" $current_date_time;

echo "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname='preceden_production_copy';" | psql -U postgres

dropdb preceden_production_copy

PGUSER=postgres PGPASSWORD=password heroku pg:pull HEROKU_POSTGRESQL_MAGENTA preceden_production_copy --app sushi
This outputs the date and time the script runs, terminates any open connections to the database, drops it, and then pulls the production Postgres data into a fresh copy.
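As an aside, the backslash escapes in the date format string aren't strictly necessary because it's already quoted; a plain format string produces the same result. This is just an equivalent sketch, not a change the script needs:

```shell
# Equivalent to the escaped date invocation used in the script:
now=$(date "+%Y-%m-%d %H:%M:%S")
echo "Running at: $now"
```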
Next, we can update the alias to point to this script:
alias prdb="/Users/matt/Projects/Preceden/prdb.sh"
And we can point to it from a cronjob:
0 12 * * * /Users/matt/Projects/Preceden/prdb.sh >> /Users/matt/Projects/Preceden/prdb.log 2>&1
This says “At noon every day, run prdb.sh and append the output to prdb.log”.
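If noon doesn't suit your workflow, the five schedule fields are minute, hour, day of month, month, and day of week. A couple of alternative schedules, assuming the same script and log paths:

```shell
# min hour dom mon dow  command
# Run at 6am every day instead:
0 6 * * * /Users/matt/Projects/Preceden/prdb.sh >> /Users/matt/Projects/Preceden/prdb.log 2>&1
# Run at noon on weekdays only (1-5 = Monday through Friday):
0 12 * * 1-5 /Users/matt/Projects/Preceden/prdb.sh >> /Users/matt/Projects/Preceden/prdb.log 2>&1
```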
And voilà! The data gets refreshed automatically once a day thanks to the cronjob, and you can still run it manually using the alias.