@hamelin
Last active March 19, 2020 19:18
Makefile for chunking the large files of the Los Alamos dataset so they can be processed in parallel, while preserving compression.
SHELL = /bin/bash

# Compressed source files to chunk: auth.txt.gz, dns.txt.gz, etc.
bigfiles = auth dns proc flows
lines-per-chunk = 250000

.PHONY: bigfiles
bigfiles: $(addsuffix .chunks,$(bigfiles))

# Split each <name>.txt.gz into a <name>.chunks/ directory of numbered,
# recompressed chunks.  Recipe lines must start with a tab, and each
# backslash must be preceded by a space so arguments stay separated
# after the shell joins the continued lines.  The trap removes a
# partially built chunk directory if the recipe is interrupted or fails.
%.chunks: %.txt.gz
	trap "rm -rf $@" INT ERR; \
	mkdir $@; \
	zcat $< | \
	split --verbose \
		-d \
		--suffix-length=4 \
		--lines=$(lines-per-chunk) \
		--additional-suffix=.txt.gz \
		--filter='gzip -c >$$FILE' \
		- \
		'$@/'

.PHONY: clean
clean:
	rm -rf *.chunks
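The heart of the recipe is GNU `split`'s `--filter` option: `split` runs the filter command once per chunk with the environment variable `FILE` set to that chunk's output name, so each chunk is recompressed on the fly rather than written as plain text. (The Makefile writes `$$FILE` only so that make passes a literal `$FILE` through to the shell.) A minimal round-trip sketch on synthetic data, with `sample.txt.gz` and `sample.chunks` as placeholder names, assuming a GNU coreutils `split` recent enough to support `--filter`:

```shell
# Stand-in for one of the big dataset files, e.g. auth.txt.gz.
seq 1 1000 | gzip -c > sample.txt.gz

# Split the decompressed stream into 250-line chunks, recompressing
# each chunk as it is written.
mkdir -p sample.chunks
zcat sample.txt.gz |
  split -d \
        --suffix-length=4 \
        --lines=250 \
        --additional-suffix=.txt.gz \
        --filter='gzip -c >$FILE' \
        - 'sample.chunks/'
# Produces sample.chunks/0000.txt.gz through sample.chunks/0003.txt.gz.

# Decompressing the chunks in order reproduces the original stream.
zcat sample.chunks/*.txt.gz > roundtrip.txt
seq 1 1000 > original.txt
cmp roundtrip.txt original.txt && echo "round trip OK"
```

Because the chunk names sort in creation order (`-d` with a fixed suffix length), concatenating the decompressed chunks always reconstructs the original file.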
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment