@taylorpaul
Last active November 20, 2023 10:27
TensorBoard on SLURM

Environment:

TensorFlow: v0.11.0rc2
OS: CentOS 6.8 (no root access)

Description:

  1. Start a TensorBoard server on a SLURM cluster by submitting tensorboardSLURM.sh with the following command:
sbatch --array=0-0 tensorboardSLURM.sh
  2. Check the tb-%J.out file (SLURM fills in %J from the job ID) for the $SERVER_IP and $SERVER_PORT of your TensorBoard server. The script prints them on the ipnip= and ipnport= lines.

  3. From your local machine, ssh to your cluster's login node, forwarding a $LOCAL_PORT of your choice to the server's IP and port:

ssh uname@login.node.edu -L $LOCAL_PORT:$SERVER_IP:$SERVER_PORT
  4. In your browser, go to http://localhost:$LOCAL_PORT
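The lookup-and-tunnel steps above can be scripted. The following is a minimal sketch, not part of the original gist: a hypothetical `tb_tunnel_cmd` shell function that reads the `ipnip=` and `ipnport=` lines the job script echoes into its output file and prints the matching ssh command. The login host default and the local port default of 6006 (TensorBoard's usual default port) are placeholders you should replace.

```shell
# Hypothetical helper (not from the original gist): print the ssh tunnel
# command for a running TensorBoard job.
#   $1  path to the job's tb-<jobid>.out file
#   $2  login host (placeholder default)
#   $3  local port to forward (placeholder default 6006)
tb_tunnel_cmd() {
  local outfile="$1"
  local login_host="${2:-uname@login.node.edu}"
  local local_port="${3:-6006}"
  local server_ip server_port
  # The job script echoes "ipnip=<addr>" and "ipnport=<port>". Keep the last
  # match of each, and only the first address if hostname -i printed several.
  server_ip=$(sed -n 's/^ipnip=//p' "$outfile" | tail -n 1 | awk '{print $1}')
  server_port=$(sed -n 's/^ipnport=//p' "$outfile" | tail -n 1)
  if [ -z "$server_ip" ] || [ -z "$server_port" ]; then
    echo "tb_tunnel_cmd: no ipnip=/ipnport= lines in $outfile" >&2
    return 1
  fi
  echo "ssh $login_host -L $local_port:$server_ip:$server_port"
}
```

For example, running `tb_tunnel_cmd /work/thpaul/tf_tools/tensorflow/im2txt/tb-12345.out` on the cluster (12345 being a hypothetical job ID) prints a command you can paste into a terminal on your own machine.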

Citations:

Thanks to Will McFadden for doing this first with IPython!

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH -t 04:00:00 # max runtime is 4 hours
#SBATCH -J tensorboard_server # name
#SBATCH -o /work/thpaul/tf_tools/tensorflow/im2txt/tb-%J.out #TODO: Where to save your output
# To run as an array job, use the following command:
# sbatch --partition=beards --array=0-0 tensorboardSLURM.sh #TODO: Your partition
# Check the job's status with:
# squeue --user thpaul #TODO: Your username
source /home/thpaul/.bash_profile #TODO: Your profile
MODEL_DIR=/work/thpaul/tf_tools/tensorflow/im2txt/im2txt/model #TODO: Your TF model directory
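# Derive a per-user port number from the UID
ipnport=$(( ($(id -u) - 6025) % 65274 ))
echo ipnport=$ipnport
# IP address of the compute node running this job
ipnip=$(hostname -i)
echo ipnip=$ipnip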
module load cuda/8.0 #TODO: Your Cuda Module if required
tensorboard --logdir="${MODEL_DIR}" --port=$ipnport
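The script's port formula, `($UID-6025)%65274`, assumes large UIDs. Shell arithmetic keeps the sign of the dividend, so a UID below 6025 gives a negative port, and UIDs from 6025 to 7048 give ports 0 to 1023: port 0 makes the OS pick a port that will not match the echoed value, and ports 1 to 1023 normally require root. As a hedged alternative (an assumption, not the author's method), the sketch below maps any non-negative UID into the range 10000 to 59999; `original_port` and `tb_port` are hypothetical names used only for illustration.

```shell
# Hypothetical helpers for comparing port formulas (not from the original gist).

# The job script's formula. Shell arithmetic truncates toward zero, so
# UIDs below 6025 produce negative results.
original_port() {
  echo $(( ($1 - 6025) % 65274 ))
}

# Assumed alternative: map a non-negative UID into 10000-59999.
tb_port() {
  local uid="${1:-$(id -u)}"   # defaults to the current user's UID
  echo $(( 10000 + uid % 50000 ))
}

original_port 1000   # prints -5025
tb_port 1000         # prints 11000
```

To use it in the job script, the port line would become `ipnport=$(( 10000 + $(id -u) % 50000 ))`. Like the original, this is a deterministic mapping, so collisions (UIDs that differ by a multiple of 50000, or another service already on that port) are still possible; since `-o` without `-e` sends both stdout and stderr to tb-%J.out, a bind error should show up there.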