@godber
Created March 23, 2019 22:34
Makefile that bootstraps my Python and Kafka Talk
.PHONY: all clean data help topics cleandata cleantopics
help:
	@echo "all - do all setup, gets kafka and data and processes data"
	@echo "data - downloads and processes the NOAA data"
	@echo "clean - deletes temp/"
	@echo "cleandata - deletes /tmp/kafka-logs /tmp/zookeeper"
	@echo "cleantopics - removes kafka topics"
	@echo "topics - creates kafka topics"
all: temp/kafka_2.12-2.1.1 data
clean:
	rm -rf temp/
cleandata:
	rm -rf /tmp/kafka-logs /tmp/zookeeper
data: temp/noaa-2016-sorted.json
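temp/kafka_2.12-2.1.1: temp/kafka_2.12-2.1.1.tgz
	tar -C temp -zxf temp/kafka_2.12-2.1.1.tgz
# temp is an order-only prerequisite: the directory must exist, but its
# changing timestamp should not mark the tarball as out of date.
temp/kafka_2.12-2.1.1.tgz: | temp
	wget -P temp/ -nc http://apache.cs.utah.edu/kafka/2.1.1/kafka_2.12-2.1.1.tgz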
# Depends on the unsorted JSON so that `make data` builds the whole chain.
temp/noaa-2016-sorted.json: temp/noaa-2016.json
	@echo "This step requires lots of memory or will take a long time"
	jq -s 'sort_by(.date) | .[]' -c temp/noaa-2016.json > temp/noaa-2016-sorted.json
temp/noaa-2016.json: temp/2016-sorted.csv temp/ghcnd-stations.txt temp/ghcnd-countries.txt temp/ghcnd-states.txt
	@echo "This step requires lots of memory or will take a long time"
	cd temp && python3 ../process.py
temp/2016-sorted.csv: temp/2016.csv
	@echo "This step requires lots of memory or will take a long time"
	sort --field-separator=',' --key=1,2 -o temp/2016-sorted.csv temp/2016.csv
temp/2016.csv: temp/2016.csv.gz
	gunzip -k temp/2016.csv.gz
temp/2016.csv.gz:
	wget -P temp/ -nc ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/2016.csv.gz
temp/ghcnd-stations.txt:
	wget -P temp/ -nc ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt
temp/ghcnd-countries.txt:
	wget -P temp/ -nc ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-countries.txt
temp/ghcnd-states.txt:
	wget -P temp/ -nc ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-states.txt
temp:
	mkdir -p temp
topics:
	temp/kafka_2.12-2.1.1/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic noaa-csv-raw
	temp/kafka_2.12-2.1.1/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic noaa-json
cleantopics:
	temp/kafka_2.12-2.1.1/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic noaa-csv-raw
	temp/kafka_2.12-2.1.1/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic noaa-json
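# The topic targets above talk to ZooKeeper on localhost:2181, so ZooKeeper
# and a Kafka broker must already be running. The two targets below are a
# minimal sketch of how they could be started from the unpacked distribution.
# ASSUMPTION: the target names "zookeeper" and "kafka" are hypothetical and
# not part of the original Makefile; the start scripts and default config
# files are the ones shipped in the Kafka tarball. Each command runs in the
# foreground, so run each target in its own terminal.
.PHONY: zookeeper kafka
zookeeper: temp/kafka_2.12-2.1.1
	temp/kafka_2.12-2.1.1/bin/zookeeper-server-start.sh temp/kafka_2.12-2.1.1/config/zookeeper.properties
kafka: temp/kafka_2.12-2.1.1
	temp/kafka_2.12-2.1.1/bin/kafka-server-start.sh temp/kafka_2.12-2.1.1/config/server.properties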