AruniRC/install_env_gypsum.md

## install_env_gypsum.md

      
This walkthrough describes setting up the Detectron (third-party PyTorch implementation) and Graph Convolutional Network (GCN) repositories on Gypsum, the UMass cluster. Most commands are specific to that setting.
Gypsum environment

$ module list
Currently Loaded Modulefiles:
  1) slurm/16.05.8                         3) hdf5/1.6.10                           5) gcc5/5.4.0                            7) cudnn/5.1
  2) openmpi/gcc/64/1.10.1                 4) fftw2/openmpi/open64/64/float/2.1.5   6) cuda80/toolkit/8.0.61                 8) hdf5_18/1.8.17

Make sure that only these modules are loaded. Having multiple versions of CUDA, cuDNN, etc. loaded at once can cause build conflicts later on.
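
One way to check this is to count the CUDA and cuDNN entries in the module listing. The following is a minimal sketch, not part of the original setup: `check_cuda_modules` is a hypothetical helper name, and the patterns assume module names shaped like the `cuda80/toolkit/8.0.61` and `cudnn/5.1` entries in the listing above.

```shell
# Hypothetical helper: reads `module list` output on stdin and reports
# whether exactly one CUDA toolkit module and one cuDNN module are loaded.
# Assumes module names look like cuda80/toolkit/8.0.61 and cudnn/5.1.
check_cuda_modules() {
    listing=$(cat)
    # Put each whitespace-separated token on its own line, then count
    # the tokens that look like CUDA or cuDNN module names.
    cuda_count=$(printf '%s\n' "$listing" | tr -s ' \t' '\n' | grep -cE '^cuda[0-9]*/' || true)
    cudnn_count=$(printf '%s\n' "$listing" | tr -s ' \t' '\n' | grep -cE '^cudnn/' || true)
    if [ "$cuda_count" -eq 1 ] && [ "$cudnn_count" -eq 1 ]; then
        echo "ok: one CUDA module and one cuDNN module loaded"
    else
        echo "conflict: found $cuda_count CUDA and $cudnn_count cuDNN modules"
        return 1
    fi
}

# Usage on the cluster (the 2>&1 is there because `module list`
# commonly writes its listing to stderr):
#   module list 2>&1 | check_cuda_modules
```

If the helper reports a conflict, unloading the extra modules before building is likely the simplest fix.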
Create conda env

conda create -n detectron-context python=3.5
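If you need to install conda on the Gypsum cluster, follow these instructions. Activate the new environment before running the pip commands (source activate detectron-context, or conda activate detectron-context on conda 4.4 and later) so that the packages install into it.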
pip install https://download.pytorch.org/whl/cu80/torch-0.4.0-cp35-cp35m-linux_x86_64.whl
pip install numpy -I
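
The wheel filename encodes what it was built for: `cu80` in the URL corresponds to the CUDA 8.0 module, and `cp35` corresponds to the `python=3.5` environment. As a rough sketch, the Python tag can be pulled out of the URL and compared with the active interpreter. The helper name `wheel_python_tag` is made up for this example, and it assumes a wheel filename without the optional build tag.

```shell
# Hypothetical helper: print the Python tag (e.g. cp35) of a wheel URL or path.
# Assumes the name-version-pythontag-abitag-platform layout with no build tag.
wheel_python_tag() {
    basename "$1" .whl | cut -d- -f3
}

WHEEL_URL=https://download.pytorch.org/whl/cu80/torch-0.4.0-cp35-cp35m-linux_x86_64.whl
wheel_python_tag "$WHEEL_URL"   # prints cp35

# To compare against the interpreter in the active environment:
#   python -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
```

If the two tags differ, pip will refuse the wheel as unsupported on this platform, which usually means the wrong environment is active.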

Test it out

Start Python at the command line and check that torch imports without errors:
$ python
>>> import torch

Install the rest of the packages:
pip install torchvision
pip install matplotlib
pip install scipy
pip install pyyaml
pip install cython
pip install pycocotools
pip install opencv-python
conda install cffi   

Visualization installs

pip install tensorboardX
pip install tensorboard_logger
pip install tensorboard
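
Several of the packages installed so far are imported under a different name than the one given to pip (for example, opencv-python is imported as `cv2`). The sketch below checks that each package can be imported. The helper names `import_name` and `check_imports` and the `PYTHON` override variable are assumptions made for this example.

```shell
# Hypothetical helper: map a pip/conda package name to its import name.
import_name() {
    case "$1" in
        pyyaml)        echo yaml ;;
        cython)        echo Cython ;;
        opencv-python) echo cv2 ;;
        *)             echo "$1" ;;
    esac
}

# Try importing each package. PYTHON can override the interpreter;
# it defaults to whichever `python` is first on PATH.
check_imports() {
    for pkg in "$@"; do
        mod=$(import_name "$pkg")
        if "${PYTHON:-python}" -c "import $mod" >/dev/null 2>&1; then
            echo "ok: $pkg ($mod)"
        else
            echo "missing: $pkg ($mod)"
        fi
    done
}

# Usage, inside the activated environment:
#   check_imports torch torchvision matplotlib scipy pyyaml cython \
#       pycocotools opencv-python cffi tensorboardX tensorboard_logger tensorboard
```

Any line starting with `missing:` points at a package to reinstall before building Detectron.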

Setup Detectron-pytorch

Assuming you are in the root of the detectron project folder
cd lib  # please change to this directory
srun --pty --gres gpu:1 --mem 60000 sh make.sh

Make sure that there are no fatal errors in the output log of the make command above. The usual cause is multiple versions of CUDA or cuDNN among the loaded modules.
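
The build output can be long, so it may help to save it to a file and search it. This is only a sketch: `count_build_errors` and the `make.log` filename are hypothetical, and the `error:` pattern is an assumption about how gcc and nvcc format their messages.

```shell
# Hypothetical helper: count lines in a build log that look like compiler errors.
# The pattern assumes gcc/nvcc-style messages such as "fatal error: ..." or
# "file.cu(12): error: ...", both of which contain "error:".
count_build_errors() {
    grep -ci 'error:' "$1" || true
}

# Usage on the cluster (run from lib/), keeping a copy of the output:
#   srun --pty --gres gpu:1 --mem 60000 sh make.sh 2>&1 | tee make.log
#   count_build_errors make.log
```

A count of 0 suggests the build went through; anything else is worth reading in the log before moving on.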
Put the ImageNet pre-trained models in data/pretrained_model (running python tools/download_imagenet_weights.py downloads them).
Then verify the setup by running inference on COCO 2017:
CFG_PATH=configs/baselines/e2e_faster_rcnn_R-50-C4_1x.yaml
WT_PATH=/mnt/nfs/work1/elm/arunirc/Research/detectron-video/mask-rcnn.pytorch/data/detectron_trained_model/e2e_faster_rcnn_R-50-C4_1x.pkl

srun --pty -p m40-long --gres gpu:4 --mem 100000 python tools/test_net.py \
--set TEST.SCORE_THRESH 0.1 TRAIN.JOINT_TRAINING False TRAIN.GT_SCORES False \
--multi-gpu-testing \
--dataset coco2017 \
--cfg ${CFG_PATH} \
--load_detectron ${WT_PATH} \
--output_dir Outputs
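
Because this job requests four GPUs and about 100 GB of memory, it can be worth confirming that the config and weight files exist before submitting it. A minimal sketch follows; `require_files` is a hypothetical helper, not something the Detectron repo provides.

```shell
# Hypothetical pre-flight check: succeed only if every given path
# is a regular, readable file; report the missing ones on stderr.
require_files() {
    status=0
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "missing: $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage, from the root of the detectron project folder, before the srun
# command above:
#   require_files "$CFG_PATH" "$WT_PATH" || echo "fix the paths before submitting"
```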

Setup pygcn

cd pygcn-master
srun --pty python setup.py install

Check that everything is working:
cd pygcn
srun --pty --mem 60000 python train.py