Skip to content

Instantly share code, notes, and snippets.

@vlasenkov
Last active October 7, 2017 14:38
Show Gist options
  • Save vlasenkov/d7411bc3cf59b7838d025edf76126f3e to your computer and use it in GitHub Desktop.
Save vlasenkov/d7411bc3cf59b7838d025edf76126f3e to your computer and use it in GitHub Desktop.
Bash script for launching distributed TensorFlow training in several processes.
# Simple script for experiments with dsitributed TensorFlow.
# Requires jq utility for JSON parsing. Example JSON file can be found on the gist.
# Args:
# name of script to run, example: train.py
# file with cluster configuration, example: hosts.json
#!/bin/bash
script=$1
hosts=$2
for job in `jq -cr 'keys[]' $hosts`
do
task=0
for hostinfo in `jq -c ".$job[]" $hosts`
do
host=`echo $hostinfo | $jq -r '.host'`
# run script locally
python $script --job $job --task $task --hosts $hosts &
# or some command to run script remotely
# ssh -f user@$host "..."
((task++))
done
done
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment