@laic
Last active June 18, 2021 12:19
eddie/pytorch setup
## Log in to eddie (you need to be on the University VPN). The password is your EASE password.
ssh your_uun@eddie.ecdf.ed.ac.uk
## Go to an interactive node with a GPU
## From the login node, it's good to run screen so that if you lose the connection you can rejoin
screen
## To reattach your session if you get bumped off, you'll need to log back in to the same login node that you started from and run screen -r
## e.g. to get to login node 1 you need to run: ssh login01-ext.ecdf.ed.ac.uk
## load anaconda
module load anaconda
## Set up anaconda:
## SLP students should have access to this group space:
## /exports/chss/eddie/ppls/groups/lel_hcrc_cstr_students
## So, you can make a directory there called UUN_Firstname_Lastname and work in that
## You can probably also just put things in your home directory or in your scratch space (/exports/eddie/scratch/<your-uun>)
## PLEASE NOTE THAT FILES IN THE SCRATCH SPACE GET DELETED AFTER 1 MONTH!
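## Because scratch files get deleted, it can help to check which of your files are getting old.
## This is a sketch of a hypothetical helper, stale_files (not an Eddie tool), assuming the cleanup
## is based on when a file was last modified; check the Eddie wiki for the exact policy.

```shell
#!/bin/sh
# Hypothetical helper (not an Eddie tool): list files under a directory that were
# last modified more than N days ago (default 25), i.e. candidates for the scratch cleanup.
# ASSUMPTION: the cleanup is based on modification time; check the Eddie docs for the exact rule.
stale_files() {
    dir="$1"
    days="${2:-25}"
    # find rounds a file's age down to whole days, so -mtime +25 means "26 or more days old"
    find "$dir" -type f -mtime +"$days" -print
}
```

## e.g. stale_files /exports/eddie/scratch/<your-uun> 25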
## See this page on setting up anaconda directories (and change CONDADIR to a directory of your own):
## https://www.wiki.ed.ac.uk/display/ResearchServices/Anaconda
CONDADIR=/exports/chss/eddie/ppls/groups/lel_hcrc_cstr_students/clai_Catherine_Lai/anaconda
mkdir -p $CONDADIR
mkdir -p $CONDADIR/envs
mkdir -p $CONDADIR/pkgs
## Tell conda where to look for environments and download packages
conda config --add envs_dirs $CONDADIR/envs/
conda config --add pkgs_dirs $CONDADIR/pkgs/
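## conda config --add writes these settings to ~/.condarc, and you can print them with
## conda config --show envs_dirs pkgs_dirs. The sketch below is a hypothetical helper (not part of conda)
## that just confirms the envs and pkgs directories exist and are writable before you create an environment.

```shell
#!/bin/sh
# Hypothetical helper (not part of conda): check that the envs/ and pkgs/
# subdirectories of a conda base directory exist and are writable.
check_conda_dirs() {
    base="$1"
    for sub in envs pkgs; do
        if [ ! -d "$base/$sub" ] || [ ! -w "$base/$sub" ]; then
            echo "missing or not writable: $base/$sub" >&2
            return 1
        fi
    done
    echo "ok: $base/envs and $base/pkgs"
}
```

## e.g. check_conda_dirs "$CONDADIR"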
## make a conda environment
conda create -n slptorch python=3.8
source activate slptorch
## install things
## The default version of gcc is 4.8.5 (in /usr/bin/gcc), which is too old for some packages,
## e.g. to get openSMILE to work you'll need a more up-to-date version of gcc and g++
conda install gcc_linux-64 gxx_linux-64
conda install cmake
## make these the default C and C++ compilers:
## You probably only need to do the following if you're compiling opensmile
#alias gcc=x86_64-conda_cos6-linux-gnu-cc
#alias g++=x86_64-conda_cos6-linux-gnu-c++
## check that you're using the version in your conda environment
#which gcc
## After logging into eddie, you can ask for an interactive GPU node to do actual computations.
## You don't need this just to install pytorch, but it's handy to be able to test whether you can
## actually access the GPU.
## The following requests the node for 48 hours (maximum allowed), 1 K80 GPU, 32GB of RAM.
qlogin -l h_rt=48:00:00 -pe gpu 1 -l h_vmem=32G
## start up the environment and install some packages if you like
module load anaconda
source activate slptorch
## If you're on an interactive node you need to run this so that CUDA_VISIBLE_DEVICES is set to the GPU you've been allocated
source /exports/applications/support/set_cuda_visible_devices.sh
echo "Allocated GPU: $CUDA_VISIBLE_DEVICES"
## If you want to see the GPUs on the node you're on, you can use this command from the bash shell (takes a while)
nvidia-smi
## That only gives you a snapshot at the moment you run it, but with -l it keeps printing updated info
nvidia-smi -l
## press Ctrl-C to stop it!
## Then just use the normal conda command to install pytorch etc, as well as the appropriate cudatoolkit
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
## Now check whether you've got a GPU in python
python -c 'import torch; print("cuda available?:", torch.cuda.is_available())'
## You can do some more checks on the GPU in python
import torch
## The following should print True if things are set up right
print(torch.cuda.is_available())
## You can try some tensor manipulations to check that you can put data on the GPU
## https://stackoverflow.com/questions/48152674/how-to-check-if-pytorch-is-using-the-gpu
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
x = torch.rand(1000000, device=device)
print(x)
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,5), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,5), 'GB')
## tested on an eddie interactive node
qlogin -l h_rt=48:00:00 -pe gpu 1 -l h_vmem=32G
## load conda on eddie (or activate the environment in whatever way you normally do if you're using your own computer!)
module load anaconda
source activate slptorch
## The binaries for opensmile 3.0 didn't work on eddie because the default gcc is version 4.8 (too old)
## to compile from source you need to use a later version of gcc and g++
## You can get newer versions of the compilers through conda, but you might not need to if you're
## using a more up to date system.
conda install gcc_linux-64
conda install gxx_linux-64
## make these the default C and C++ compilers
alias gcc=x86_64-conda_cos6-linux-gnu-cc
alias g++=x86_64-conda_cos6-linux-gnu-c++
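## To confirm the conda compiler really is newer than the system gcc 4.8, you can compare version strings.
## This is a sketch using a hypothetical helper, version_ge; the threshold of 5 in the usage line is an arbitrary
## illustration. gcc -dumpversion may print only the major version (e.g. 7) on newer compilers, which this also handles.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on GNU sort's -V (version sort), which is available on Linux.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

## e.g. version_ge "$(x86_64-conda_cos6-linux-gnu-cc -dumpversion)" 5 || echo "compiler still too old"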
## You need to install cmake to compile OpenSmile too
conda install cmake
## get the dependencies
conda install -c conda-forge sox
conda install pandas matplotlib scikit-learn h5py
conda install nltk
## clean up and remove downloaded packages (can save several GB!)
conda clean --all
## clone the repo in your scratch space (replace clai below with your own UUN)
## WARNING: Files in this space get deleted after 1 month so you'll need to either backup to somewhere else regularly or
## find another data staging solution
cd /exports/eddie/scratch/clai
## I'm cloning my fork of Roddy's original repo here. It has some updates for running on eddie, plus the maptask download script.
## You can just fork my version rather than his, or browse the updates on github
git clone https://github.com/laic/lstm_turn_taking_prediction.git
## get the data
cd lstm_turn_taking_prediction/data
## You'll need to copy the following script (attached to this gist) into this directory
## I couldn't get a new version from the maptask website
## It basically downloads all the audio files for you
sh maptaskBuild-93451-Tue-Jun-16-2020.wget.sh
## Get the maptask annotations
wget http://groups.inf.ed.ac.uk/maptask/hcrcmaptask.nxtformatv2-1.zip
unzip hcrcmaptask.nxtformatv2-1.zip
rm hcrcmaptask.nxtformatv2-1.zip
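## Before moving on, it's worth a quick sanity check that the audio download actually produced files.
## A minimal sketch with a hypothetical helper, count_files, assuming the MapTask audio arrives as .wav files
## (check what the wget script actually fetches and adjust the extension if not):

```shell
#!/bin/sh
# Hypothetical helper: count files with a given extension (default: wav) under a directory.
# ASSUMPTION: the MapTask audio is downloaded as .wav files; pass another extension if not.
count_files() {
    dir="$1"
    ext="${2:-wav}"
    find "$dir" -type f -name "*.$ext" | wc -l
}
```

## e.g. from the data directory: count_files .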
cd ..
## install opensmile
cd utils
git clone https://github.com/audeering/opensmile.git
cd opensmile
## Now run the build script
./build.sh
## The actual binary you need to use is: opensmile/build/progsrc/smilextract/SMILExtract
## Let's copy it into a bin directory
mkdir bin
cp build/progsrc/smilextract/SMILExtract bin
## test if it works: this should show some help info
./bin/SMILExtract -h
## So you'll need to check that scripts/extract_gemaps.py points to that location
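## From the top-level directory, grep -n SMILExtract scripts/extract_gemaps.py shows which path the script uses.
## A sketch of a hypothetical helper that checks the binary is actually at that path and executable; the default
## path below just follows the bin directory created above:

```shell
#!/bin/sh
# Hypothetical helper: check that the SMILExtract binary exists and is executable.
# The default path assumes you're in the top-level lstm_turn_taking_prediction directory.
check_smilextract() {
    bin="${1:-utils/opensmile/bin/SMILExtract}"
    if [ ! -x "$bin" ]; then
        echo "SMILExtract not found or not executable at: $bin" >&2
        return 1
    fi
    echo "found $bin"
}
```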
## back in the top level directory lstm_turn_taking_prediction
cd ../..
## make a backup of the opensmile config files
mv utils/opensmile/config utils/opensmile/config_orig
## copy over Roddy's version
cp -r utils/config utils/opensmile
## in case of network disconnections
screen
## Get a GPU interactive session
qlogin -l h_rt=36:00:00 -pe gpu 1 -l h_vmem=32G
## load anaconda
module load anaconda
source activate slptorch
## set CUDA_VISIBLE_DEVICES in interactive mode
source /exports/applications/support/set_cuda_visible_devices.sh
echo "CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES"
cd /exports/eddie/scratch/clai/lstm_turn_taking_prediction
## FEATURE EXTRACTION: you only have to do this the first time
## split the audio channels
sh scripts/split_channels.sh
## Get the features using a bunch of python scripts. This doesn't require a GPU; it could be parallelized, but it isn't at the moment.
## You may want to run just the first few commands in this script to begin with, to check that the feature extraction is actually working.
python prepare_data.py
## Opensmile (called by scripts/extract_gemaps.py) creates a lot of output but as long as these are warnings and not errors, it's ok.
## In the long term, you may want to think about addressing some of the errors though!
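## One way to keep the errors from getting lost among the warnings is to save the output to a log
## (python prepare_data.py 2>&1 | tee prepare_data.log) and count the error lines. A minimal sketch with a
## hypothetical helper, assuming error messages contain the word "error" (inspect your log to confirm):

```shell
#!/bin/sh
# Hypothetical helper: count lines in a log file that mention "error" (case-insensitive).
# ASSUMPTION: openSMILE/python error messages contain the word "error".
count_errors() {
    # grep -c prints 0 (and exits 1) when nothing matches, so ignore that exit status
    grep -ci 'error' "$1" || true
}
```

## e.g. count_errors prepare_data.log, then grep -i error prepare_data.log | head to see the first few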
## train the model, e.g.
python icmi_18_results_no_subnets.py
## NOTE: I changed the number of epochs in icmi_18_results_no_subnets.py to 3 for debugging purposes.
## You'll want to change it back to a larger number to actually run experiments!
## Get a GPU interactive session
qlogin -l h_rt=36:00:00 -pe gpu 1 -l h_vmem=32G
## start up conda
module load anaconda
## Create a new environment for ophelia.
## The main thing to note is that the code is based on python2.7, which is no longer officially supported,
## but you can still use it via conda
conda create -n ophelia python=2.7
source activate ophelia
## set CUDA_VISIBLE_DEVICES in interactive mode
source /exports/applications/support/set_cuda_visible_devices.sh
echo "CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES"
## Install pip from conda so that the pip you run is the one in your conda environment (and packages get installed into that environment)
conda install pip
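## To double-check which pip you're getting, you can compare its location with $CONDA_PREFIX, which is set when an
## environment is activated. A minimal sketch with a hypothetical helper, pip_in_env:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the pip found first on PATH lives inside
# the active conda environment ($CONDA_PREFIX).
pip_in_env() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "no conda environment is active" >&2
        return 1
    fi
    pip_path=$(command -v pip) || return 1
    case "$pip_path" in
        "$CONDA_PREFIX"/*) echo "using $pip_path" ;;
        *) echo "pip is $pip_path, outside $CONDA_PREFIX" >&2; return 1 ;;
    esac
}
```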
## mcd has some issues with its numpy dependency, so install numpy with conda first and then install mcd with pip
conda install numpy
pip install mcd
## download stuff into the shared scratch space for the moment
cd /exports/eddie/scratch/clai
## get the ophelia repo
git clone https://github.com/CSTR-Edinburgh/ophelia.git
cd ophelia
## NOTE: remove numpy, mcd, tensorboard and tensorflow-gpu from requirements.txt in the repo otherwise you might get some clashes:
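## One way to do that removal without hand-editing is to filter the file. A sketch with a hypothetical helper,
## strip_requirements; keep a copy of the original first, e.g.
## cp requirements.txt requirements_orig.txt && strip_requirements requirements_orig.txt > requirements.txt

```shell
#!/bin/sh
# Hypothetical helper: print a requirements file without the numpy, mcd,
# tensorboard and tensorflow-gpu lines (a package name followed by a version
# specifier, whitespace, ';' or end of line), matched case-insensitively.
# Packages that merely start with these names (e.g. tensorboardX) are kept.
strip_requirements() {
    grep -viE '^[[:space:]]*(numpy|mcd|tensorboard|tensorflow[-_]gpu)([<>=~!;[:space:]]|$)' "$1"
}
```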
pip install -r ./requirements.txt
## Install tensorflow using conda. This basically 'just works' but you'll need to use a slightly
## later version of tensorflow and CUDA than are listed in the requirements.txt
conda install tensorflow-gpu=1.3.0 cudatoolkit=8
# check whether tensorflow works in python:
python
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("Please install GPU version of TF")
CODEDIR=/exports/eddie/scratch/clai/ophelia
## Since we are using cudatoolkit we need to remove the changes to LD_LIBRARY_PATH and PATH
## ... and basically just follow the steps on the ophelia github page to setup festival etc.
## You'll need to edit ophelia/utils/submit-tf.sh so that it looks for the CUDA libraries in the right places.
## Here's what I would suggest.
PYTHON=python
## Generic script for submitting any tensorflow job to GPU
# usage: submit.sh [scriptname.py script_arguments ... ]
## Location of this script: assume gpu_lock.py is in same place -
SCRIPTPATH=$( cd $(dirname $0) ; pwd -P )
## DON'T use gpu_lock.py on eddie: grid engine will assign the gpu id via CUDA_VISIBLE_DEVICES
#gpu_id=$(python2 $SCRIPTPATH/gpu_lock.py --id-to-hog)
### DON'T use these paths on eddie! The cuda libraries aren't here. If you've installed cudatoolkit via conda you don't
### have to make any changes to LD_LIBRARY_PATH and you don't have to set CUDA_HOME. All the necessary libraries
### are in your conda environment lib folder
#export LD_LIBRARY_PATH=/opt/cuda-8.0.44/extras/CUPTI/lib64/:/opt/cuda-8.0.44/:/opt/cuda-8.0.44/lib64:/opt/cuDNN-7.0/:/opt/cuDNN-6.0_8.0/:/opt/cuda/:/opt/cuDNN-6.0_8.0/lib64:/opt/cuDNN-6.0/lib6
#export LD_LIBRARY_PATH=/opt/cuda-8.0.44/extras/CUPTI/lib64/:/opt/cuda-8.0.44/:/opt/cuda-8.0.44/lib64:/opt/cuDNN-7.0/:/opt/cuDNN-6.0_8.0/:/opt/cuda/:/opt/cuDNN-6.0_8.0/lib64:/opt/cuDNN-6.0/lib6:/opt/cuda-9.0.176.1/lib64/:/opt/cuda-9.1.85/lib64/:/opt/cuDNN-7.1_9.1/lib64
#export CUDA_HOME=/opt/cuda-8.0.44/
export KERAS_BACKEND=tensorflow
#export CUDA_VISIBLE_DEVICES=$gpu_id
## On eddie set the gpu_id using CUDA_VISIBLE_DEVICES rather than the other way around!
gpu_id=$CUDA_VISIBLE_DEVICES
echo "gpu_id: $CUDA_VISIBLE_DEVICES"
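## CUDA_VISIBLE_DEVICES is empty if no GPU was allocated, and may be a comma-separated list (e.g. 0,1) that -gt can't compare,
## so just test whether it is set
if [ -n "$gpu_id" ]; then
    $PYTHON "$@"
    ## Not using gpu_lock so no need to release!
    # python2 $SCRIPTPATH/gpu_lock.py --free $gpu_id
else
    echo 'Let us wait! No GPU is available!'
fi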

Data Storage issues

https://www.wiki.ed.ac.uk/display/ResearchServices/Data+Staging
https://www.wiki.ed.ac.uk/display/ResearchServices/DataStore

Request group storage based on your personal allocation by contacting the IS helpline with the information requested here:

  • Group space name;
  • Storage owner (Lead PI);
  • Owning School;
  • List of University Usernames and amount of reallocated quota per user for all the researchers that will be reallocating to this space;
  • List of University Usernames who should have access to this space.
# There seems to be an issue with compiling opensmile with libgcc and libstdc++ version 9.3 (which are default on conda now).
# You may need to downgrade to 7.3.0 versions:
conda install gcc_linux-64=7.3.0
conda install libgcc-ng=7.3.0
conda install gxx_linux-64=7.3.0
## It's possible that only libstdcxx-ng is the problem
conda install libstdcxx-ng=7.3.0
  • example grid submission script
## Here's an example of how you might use the cuda modules on eddie.
## It does seem to work, but the current advice is just to use conda to install the appropriate version cudatoolkit.
## Then you won't have to worry about setting many environment variables.
## You'll probably want to do this on an interactive gpu node so that you can test if it works right away
## qlogin -l h_rt=02:00:00 -pe gpu 1 -l h_vmem=32G
## start up the environment and install some packages if you like
module load anaconda
source activate slp
module load cuda/10.2.89
## If you're on an interactive node you need to run this so that the system knows to look for GPUs
source /exports/applications/support/set_cuda_visible_devices.sh
## If the following gives you nothing then something has gone wrong and you haven't been allocated a GPU
echo $CUDA_VISIBLE_DEVICES
## find out where the CUDA code actually is (nvcc is the CUDA compiler which you won't need to use directly):
which nvcc
## This gives me: /exports/applications/apps/SL7/cuda/10.2.89/bin/nvcc
## So now I can tell bash where to find CUDA by setting CUDA_HOME:
export CUDA_HOME=/exports/applications/apps/SL7/cuda/10.2.89/
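## Rather than copying the path by hand, you could derive it from where nvcc lives, since nvcc sits in <root>/bin.
## A minimal sketch with a hypothetical helper, cuda_home_from_nvcc, assuming a cuda module is already loaded:

```shell
#!/bin/sh
# Hypothetical helper: derive the CUDA install root from the location of nvcc,
# which sits in <root>/bin/nvcc (e.g. .../cuda/10.2.89/bin/nvcc -> .../cuda/10.2.89).
cuda_home_from_nvcc() {
    nvcc_path=$(command -v nvcc) || { echo "nvcc not on PATH; load a cuda module first" >&2; return 1; }
    dirname "$(dirname "$nvcc_path")"
}
```

## e.g. export CUDA_HOME="$(cuda_home_from_nvcc)"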
conda install pytorch torchaudio torchvision
## You can also install pytorch using pip
#pip3 install torch torchvision torchaudio
## Now check whether you've got a GPU in python
python -c 'import torch; print("cuda available?: ", torch.cuda.is_available())'