Makefile example (extracting datasets from PostgreSQL/Redshift)
Gist raven-rock/2f229e1efa4c55739e67b2e903046897, created October 12, 2018 15:09
# Example Makefile
#
# Format:
#   [target]: [dep1 [dep2 ...]]
#   <tab>command(s) that build target
#
# Be sure to use actual tab characters for indentation before commands.
#
# Cheat sheet:
#   $@    -> refers to the current target
#   $(X)  -> refers to variable X. Set one like this: X = 42.
#            Some vars are predefined in make, like RM.
#   $(RM) -> stands in for `rm -f` by default in GNU make
#
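# For instance (GREETING and greeting.txt are hypothetical names, not
# used below), given GREETING = hello, the rule
#
#   greeting.txt:
#   <tab>echo $(GREETING) > $@
#
# runs `echo hello > greeting.txt`, and a recipe line `$(RM) $@` in
# the same rule would run `rm -f greeting.txt`.
#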
# This Makefile takes whatever name is passed in the DATASET variable
# and assumes there's a .sql file with that base name. It runs the
# file through psqltsv (my wrapper for TSV output from psql) and
# writes the result to TSV, CSV, and PSV (pipe-delimited) files of the
# same name with the respective file extensions. All these formats
# aren't necessary in a normal project; they just illustrate how GNU
# make can be used.
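#
# For example, assuming psqltsv emits plain tab-separated fields with
# no quoting, a row whose fields are `1`, `foo`, and `a,b` would come
# out as:
#
#   CSV: 1,foo,"a,b"   (xsv quotes fields that contain a comma)
#   PSV: 1|foo|a,b     (perl only swaps tabs for pipes and does no
#                       quoting, so a field containing "|" would be
#                       read back as two fields)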
#
# DATASET = sweet_data
#
# Example command line usage:
#
#   make DATASET=sweet_data
#
# This assumes there's a sweet_data.sql file in the same directory.
# make first generates sweet_data.tsv and then, from that,
# sweet_data.csv and sweet_data.psv. Note that the database is hit
# only once, to generate the TSV. If you rerun `make` and the TSV is
# newer than the SQL file (and the CSV/PSV are newer than the TSV),
# nothing happens (idempotent); otherwise the TSV and the downstream
# CSV/PSV files are regenerated with the new data. The data pipeline
# can be expressed like this:
#
#   SQL -> TSV
#          -> CSV
#          -> PSV
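#
# One way to watch this without running anything (assuming the
# sweet_data.* files were already built) is `make -n`, which prints
# the commands make would run without executing them. The touch does
# leave those targets out of date for the next real `make`:
#
#   touch sweet_data.sql; make -n DATASET=sweet_data  # TSV, PSV, CSV
#   touch sweet_data.tsv; make -n DATASET=sweet_data  # PSV and CSV only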
#
# Although this example doesn't have much in the way of branches, you
# can see how it's a dependency tree: make compares each target's
# modification time with those of its prerequisites and reruns only
# the commands for targets that are missing or older than a
# prerequisite, plus everything downstream of them.
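#
# Optional guard (a sketch): stop early with a clear message when
# DATASET isn't given, instead of a confusing "No rule to make target"
# error for ".sql". Note that this also applies to `make clean`.
ifndef DATASET
$(error DATASET is not set; usage: make DATASET=sweet_data)
endif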
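.PHONY: all clean
all: $(DATASET).psv $(DATASET).csv
# TSV -> PSV: replace every tab with a pipe.
$(DATASET).psv: $(DATASET).tsv
	< $(DATASET).tsv perl -pe 's/\t/\|/g' > $@
# TSV -> CSV: xsv reads tab-delimited input and writes comma-delimited CSV.
$(DATASET).csv: $(DATASET).tsv
	< $(DATASET).tsv xsv fmt -d '\t' > $@
# SQL -> TSV: the only step that queries the database.
$(DATASET).tsv: $(DATASET).sql
	< $(DATASET).sql psqltsv > $@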
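# Optional safety net (a sketch): each recipe writes through `> $@`,
# so the shell creates the target file before the command runs. If
# psqltsv fails partway (assuming it exits nonzero on failure), a
# partial TSV is left with a fresh timestamp and the next `make` would
# treat it as up to date. When .DELETE_ON_ERROR is declared, GNU make
# deletes a changed target whose recipe exits with a nonzero status:
.DELETE_ON_ERROR: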
clean:
	$(RM) $(DATASET).tsv
	$(RM) $(DATASET).csv
	$(RM) $(DATASET).psv