@raven-rock
Created October 12, 2018 15:09
Makefile example (extracting datasets from PostgreSQL/Redshift)
# Example Makefile
#
# Format:
# [target]: [dep1 [dep2 ...]]
# <tab>command(s) that build target
#
# Be sure to use actual tab characters for indentation before commands.
#
# Cheat sheet:
# $@ -> refers to the current target
# $(X) -> refers to variable X. Set one like this: X = 42.
# Some vars are predefined in make, like RM
# $(RM) -> stands in for `rm -f` by default on most systems
#
# This Makefile takes whatever name is passed in the DATASET variable
# and assumes there's a .sql file with that name as its basename
# (e.g. sweet_data.sql). It executes the file through psqltsv (my
# wrapper for TSV output from psql) and writes the results to TSV,
# CSV, and PSV (pipe-delimited) files of the same name with the
# respective file extensions. You wouldn't need all these formats in
# a normal project; they just illustrate how GNU make can be used.
#
# DATASET = sweet_data
#
# Example command line usage:
#
# make DATASET=sweet_data
#
# This will assume there's a sweet_data.sql file in the same
# directory, and will first generate sweet_data.tsv, and then from
# that sweet_data.csv and sweet_data.psv. Note that the database will
# only be hit once to generate the TSV. If the user reruns `make` and
# the TSV file's modified date is *after* the modified date of the SQL
# file (and the CSV/PSV files are newer than the TSV), nothing will
# happen (idempotent). If the SQL file has been modified since, the
# TSV and the CSV/PSV files derived from it are regenerated with the
# new data. The data pipeline might be expressed like this:
#
#   SQL -> TSV -> CSV
#              -> PSV
#
# Although this example has only one branch point (the TSV feeds both
# the CSV and the PSV), you can see how it's a dependency tree: make
# compares each file's modified timestamp with those of its
# prerequisites to decide where in the tree to start rerunning
# commands, then (re)builds every target from that point down.
# `all` and `clean` don't name real files, so mark them phony.
.PHONY: all clean

all: $(DATASET).psv $(DATASET).csv

$(DATASET).psv: $(DATASET).tsv
	< $(DATASET).tsv perl -pe 's/\t/\|/g' > $@

$(DATASET).csv: $(DATASET).tsv
	< $(DATASET).tsv xsv fmt -d '\t' > $@

$(DATASET).tsv: $(DATASET).sql
	< $(DATASET).sql psqltsv > $@

clean:
	$(RM) $(DATASET).tsv
	$(RM) $(DATASET).csv
	$(RM) $(DATASET).psv
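
The timestamp-driven rebuild described in the comments can be made concrete with a toy Python model. This is a simplified sketch, not how GNU make is actually implemented: integer "modification times" in a dict stand in for real file mtimes, `sweet_data` is the example dataset name from the usage notes, and `make`, `run`, `RULES`, and `PHONY` are hypothetical names invented for illustration. It only checks whether a target is missing or older than one of its prerequisites, and ignores details such as timestamp resolution and order-only prerequisites.

```python
"""Toy model of make's timestamp check for the SQL -> TSV -> CSV/PSV graph.

Simplified sketch for illustration; not GNU make's actual algorithm.
"""

# Hypothetical dataset "sweet_data"; each target maps to its prerequisites,
# mirroring the rules in the Makefile.
RULES = {
    "all": ["sweet_data.psv", "sweet_data.csv"],
    "sweet_data.psv": ["sweet_data.tsv"],
    "sweet_data.csv": ["sweet_data.tsv"],
    "sweet_data.tsv": ["sweet_data.sql"],
}
PHONY = {"all"}  # targets that never correspond to a file


def make(target, mtimes, clock, built):
    """Bring `target` up to date, appending each rebuilt file to `built`.

    `mtimes` maps file name -> fake modification time (missing key = no file).
    `clock` is a one-element list used as a monotonically increasing "now".
    """
    prereqs = RULES.get(target)
    if prereqs is None:
        # Leaf source file (the .sql): there is no rule, so it must exist.
        if target not in mtimes:
            raise FileNotFoundError(f"No rule to make target '{target}'")
        return
    for dep in prereqs:
        make(dep, mtimes, clock, built)  # depth-first: prerequisites first
    if target in PHONY:
        return  # nothing to timestamp for a phony target
    stale = target not in mtimes or any(mtimes[d] > mtimes[target] for d in prereqs)
    if stale:
        clock[0] += 1
        mtimes[target] = clock[0]  # "running the recipe" makes this file newest
        built.append(target)


def run(mtimes):
    """Simulate `make` (default goal `all`); return files rebuilt, in order."""
    clock = [max(mtimes.values(), default=0)]
    built = []
    make("all", mtimes, clock, built)
    return built


if __name__ == "__main__":
    files = {"sweet_data.sql": 1}
    print(run(files))  # first run builds the TSV, then the PSV and CSV from it
    print(run(files))  # nothing is stale, so nothing is rebuilt
    files["sweet_data.sql"] = 100  # simulate editing the query
    print(run(files))  # everything downstream of the SQL is rebuilt
    del files["sweet_data.csv"]  # simulate deleting one output
    print(run(files))  # only the CSV branch is rebuilt
```

Because both the CSV and PSV rules share the TSV prerequisite, the model rebuilds the TSV at most once per run, which corresponds to the "database is only hit once" behavior described above; deleting a single output reruns only that branch.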