@sjackman
Created August 27, 2013 22:11
Parallelize wc by implementing map/reduce in make
#!/usr/bin/make -Rrf
# Parallelize wc by implementing map/reduce in make
# Written by Shaun Jackman.
# Number of chunks
n=4
all: words.wc words.mapreduce.wc
clean:
	rm -f words.wc words.mapreduce.wc words.[0-9][0-9]
test: words.wc words.mapreduce.wc
	diff -sw $^
.PHONY: all clean test
.DELETE_ON_ERROR:
# Input data
words: /usr/share/dict/words
	ln -s $< $@
# Partition
chunks=$(foreach i,$(shell seq -f%02.0f 0 $$(($n-1))),%.$i)
$(chunks): %
	split -dnl/$n $< $*.
# Map
%.wc: %
	wc $< >$@
# Reduce
%.mapreduce.wc: $(foreach i,$(chunks),$i.wc)
	awk '{a+=$$1; b+=$$2; c+=$$3} END {print a, b, c, "$*"}' $^ >$@
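
For readers who want to follow the three stages without make, here is a rough shell sketch of the same partition, map, and reduce steps. It is not part of the gist: `mapreduce_wc` is a hypothetical helper name, and it assumes GNU coreutils `split` (for `-d` and `-n l/N`), just as the Makefile does. Background jobs stand in for the parallelism that make would provide when invoked with `-j`.

```shell
#!/bin/sh
# Sketch only: mapreduce_wc is a hypothetical name, not something the gist defines.
# Assumes GNU coreutils split, which the Makefile's "split -dnl/$n" also requires.
mapreduce_wc() {
    input=$1
    n=${2:-4}    # number of chunks, defaulting to 4 like the Makefile's n=4

    # Partition: n chunks split on line boundaries, named input.00, input.01, ...
    split -d -n "l/$n" "$input" "$input." || return 1

    # Map: run wc on each chunk as a background job
    for chunk in "$input".[0-9][0-9]; do
        wc "$chunk" >"$chunk.wc" &
    done
    wait

    # Reduce: sum the line, word, and byte columns and label the total
    awk -v name="$input" '{a+=$1; b+=$2; c+=$3} END {print a, b, c, name}' \
        "$input".[0-9][0-9].wc

    # Remove the chunk files, much as make deletes intermediate files
    rm -f "$input".[0-9][0-9] "$input".[0-9][0-9].wc
}
```

Calling `mapreduce_wc words 4` should print the same three totals as `wc words`, apart from whitespace. The Makefile runs its map step concurrently only when make is given a jobs flag such as `-j4`.
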
@tseemann

Very neat!

Perhaps you should use "mktemp" to get a directory for the temp files.

Often /tmp points to a much faster disk (an SSD), which could speed things up, but I guess it might also not be large enough.
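
As a rough illustration of that suggestion, the chunk files could live in a scratch directory created with `mktemp -d`. This is a shell sketch rather than a change to the Makefile: `mapreduce_wc_tmp` is a hypothetical name, GNU `split` is assumed as in the gist, and the sketch does not check whether the temporary file system is large enough.

```shell
#!/bin/sh
# Sketch only: mapreduce_wc_tmp is a hypothetical helper, not part of the gist.
# Assumes GNU coreutils split (-d, -n l/N), as the Makefile does.
mapreduce_wc_tmp() {
    input=$1
    n=${2:-4}

    # With no template, GNU mktemp -d uses $TMPDIR if set and /tmp otherwise
    dir=$(mktemp -d) || return 1

    # Partition into the scratch directory instead of next to the input
    split -d -n "l/$n" "$input" "$dir/chunk." || { rm -rf "$dir"; return 1; }

    # Map: count each chunk as a background job
    for chunk in "$dir"/chunk.[0-9][0-9]; do
        wc "$chunk" >"$chunk.wc" &
    done
    wait

    # Reduce: sum the three columns, then discard the scratch directory
    awk -v name="$input" '{a+=$1; b+=$2; c+=$3} END {print a, b, c, name}' \
        "$dir"/chunk.[0-9][0-9].wc
    rm -rf "$dir"
}
```

Setting `TMPDIR` before the call, for example `TMPDIR=/path/to/fast/disk mapreduce_wc_tmp words`, would move the chunks to another location if /tmp turns out to be too small.
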