@malditogeek
Created November 12, 2010 15:00
#!/bin/bash
# bash rather than sh: `time` must be a shell keyword to time a whole pipeline
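# Map command, executed distributed on every chunk (required)
MAP=${MAP:?set MAP to the map command}
# Reduce command, executed locally on every .map output (required)
REDUCE=${REDUCE:?set REDUCE to the reduce command}
# Input file: a huge, line-oriented file that is safe to split on line boundaries
INPUT=${1:?usage: $0 INPUT_FILE}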
# Working directory setup
TIMESTAMP=$(date +%s)
WORKDIR=jadup_${TIMESTAMP}
# Create a temporary directory
mkdir -p ${WORKDIR}
# Split the input into ~128 MB chunks without breaking lines (GNU split -C)
echo '[SPLIT]'
time split -C 128m "${INPUT}" "${WORKDIR}/chunk_"
# One job per CPU core on each machine (+0 = number of cores + 0)
JOBS="--jobs +0"
# Show progress
ETA="--progress"
# Run locally and on the remote servers listed in ~/.parallel/sshloginfile
# (a ':' line in that file stands for the local machine)
SERVERS="--sshloginfile .."
# Transfer each chunk to its server, return {.}.map, then delete the remote
# copies (--trc = --transfer --return --cleanup)
OUTPUT="--trc {.}.map"
# Verbosity
VERBOSITY="--silent"
echo '[MAP]'
# MAP reads one chunk on stdin; its stdout becomes chunk_XX.map
time ls ${WORKDIR}/chunk_* | parallel ${JOBS} ${ETA} ${SERVERS} ${OUTPUT} ${VERBOSITY} "${MAP} < {} > {.}.map"
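echo '[REDUCE]'
# REDUCE runs once per .map file, reading it on stdin; parallel's default
# grouped output keeps the results of concurrent jobs from interleaving
time ls ${WORKDIR}/chunk_*.map | parallel ${JOBS} ${ETA} ${VERBOSITY} "${REDUCE} < {}" > "${WORKDIR}/reduce.out"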
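Before spreading a MAP/REDUCE pair across servers, the SPLIT, MAP and REDUCE steps of the script can be rehearsed on one machine. The sketch below is a sequential stand-in and not part of the original gist: for-loops replace GNU parallel, `split -l 2` replaces `split -C 128m`, and the sample input, the MAP command (count lines containing `foo`) and the REDUCE command (`cat`) are all hypothetical. It assumes both commands read their input on stdin and, like GNU parallel, runs each one through a shell.

```shell
#!/bin/bash
# Sequential, single-machine rehearsal of the SPLIT -> MAP -> REDUCE flow.
# Hypothetical for illustration: the input text, the MAP/REDUCE commands,
# `split -l 2` (instead of `split -C 128m`) and for-loops (instead of GNU parallel).
set -eu

WORKDIR=$(mktemp -d)
trap 'rm -rf "${WORKDIR}"' EXIT

# Five-line sample input; four of the lines contain "foo"
printf '%s\n' "foo bar" "baz foo" "foo" "qux" "foo foo" > "${WORKDIR}/input.txt"

# Hypothetical MAP: count lines containing "foo" in the chunk read on stdin
# (grep -c prints 0 but exits 1 when nothing matches, hence `|| true`)
MAP='grep -c foo || true'
# Hypothetical REDUCE: pass each partial count through unchanged
REDUCE='cat'

# SPLIT: 2 lines per chunk -> chunk_aa, chunk_ab, chunk_ac
split -l 2 "${WORKDIR}/input.txt" "${WORKDIR}/chunk_"

# MAP: run through a shell once per chunk, as GNU parallel would
for chunk in "${WORKDIR}"/chunk_??; do
  sh -c "${MAP}" < "${chunk}" > "${chunk}.map"
done

# REDUCE: once per .map file, outputs concatenated into reduce.out
for mapped in "${WORKDIR}"/chunk_*.map; do
  sh -c "${REDUCE}" < "${mapped}"
done > "${WORKDIR}/reduce.out"

# Final pass: sum the per-chunk partial counts
TOTAL=$(awk '{ s += $1 } END { print s }' "${WORKDIR}/reduce.out")
echo "lines containing foo: ${TOTAL}"
```

Because REDUCE runs once per `.map` file, `reduce.out` holds one partial result per chunk (`2`, `1`, `1` here); a global total needs one more pass, which the final `awk` supplies.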