@ksferguson
Last active November 21, 2020 22:51
Paperspace fast.ai & MLIB Setup Notes

Paperspace fast.ai Setup

  1. Go here to get started: https://github.com/reshamas/fastai_deeplearn_part1/blob/master/tools/paperspace.md

  2. See the last section of the guide in #1 for ssh public key setup

#to generate public/private key pair, once per ssh machine...
#accept defaults, enter password optionally
ssh-keygen
#fill in name of public key and public IP
ssh-copy-id -i ~/.ssh/id_rsa.pub paperspace@ip    
#... to shell in
ssh paperspace@ip
#for jupyter notebook to easily run on your machine using copy-pasted localhost:8888 token url
ssh paperspace@ip -L localhost:8888:localhost:8888

#everyday operations
ssh paperspace@ip -L localhost:8888:localhost:8888
jupyter notebook --no-browser

#troubleshooting: to find and kill the process holding the port binding (nnn is its PID from the ps output)
ps aux | grep localhost
kill nnn
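The tunnel options above can also live in an ssh config entry, so that a short alias opens both the shell and the port forward. The following is a minimal sketch, not part of the original guide: the function name `add_paperspace_host` and the alias are hypothetical, and it assumes the `paperspace` user and Jupyter port 8888 used above.

```shell
#!/bin/sh
# Append a host entry to an ssh config file so the tunnel options need not be retyped.
# "add_paperspace_host" and the alias are hypothetical names; pass your machine's public IP.
# Usage: add_paperspace_host CONFIG_FILE ALIAS IP
add_paperspace_host() {
    config_file=$1
    host_alias=$2
    host_ip=$3
    # Skip if an entry for this alias already exists.
    if [ -f "$config_file" ] && grep -q "^Host $host_alias\$" "$config_file"; then
        echo "entry for $host_alias already present"
        return 0
    fi
    # LocalForward mirrors the -L option: local port 8888 -> localhost:8888 on the remote.
    cat >> "$config_file" <<EOF
Host $host_alias
    HostName $host_ip
    User paperspace
    LocalForward 8888 localhost:8888
EOF
    chmod 600 "$config_file"
}
```

For example, `add_paperspace_host ~/.ssh/config fastai-box ip` (with your public IP filled in) should let `ssh fastai-box` replace the longer everyday command.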
  3. Per #1, update initially and periodically:
source activate fastai  #if needed
cd fastai
git pull
#conda updates take several minutes the first time (>50 packages updated)
conda env update
  4. To stay in sync across multiple workspaces (Paperspace box, local Ubuntu, etc.), fork the fastai/fastai project and add an upstream remote so you can pull changes as the original project is updated.
#on the Paperspace fastai machine (just for example)
cd ~
mkdir github
cd github
git clone https://github.com/ksferguson/fastai.git fastai
#move into the clone before running further git commands
cd fastai
git remote -v
  • Go ahead and add config info (git will remind you the first time you try to commit)
git config --global user.email "33105688+ksferguson@users.noreply.github.com"
git config --global user.name "ksferguson"
  • Add upstream remote to original project
#GitHub no longer serves the unauthenticated git:// protocol, so use https
git remote add upstream https://github.com/fastai/fastai.git
git remote -v
  5. When updates are noticed/required from upstream
  • Update from upstream
git fetch upstream
git checkout master
git merge upstream/master
  • Push changes to your user account (to master or other branch per usual git workflow)
git push origin master
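The fetch/checkout/merge/push sequence above can be wrapped in one function. This is a rough sketch under stated assumptions: `sync_fork`, `run`, and the `DRY_RUN` variable are hypothetical names introduced here, and the remotes are assumed to be named `origin` and `upstream` as set up above. With `DRY_RUN=1` it only prints the git commands, which is a cheap way to check what would happen.

```shell
#!/bin/sh
# Sync a fork's branch with upstream, then push to origin.
# "sync_fork", "run" and DRY_RUN are hypothetical names used for this sketch.
# Usage: sync_fork [BRANCH]   (defaults to master, as in the steps above)
run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

sync_fork() {
    branch=${1:-master}
    # Stop at the first failing step (e.g. a merge conflict) rather than pushing.
    run git fetch upstream &&
    run git checkout "$branch" &&
    run git merge "upstream/$branch" &&
    run git push origin "$branch"
}
```

Run it from inside the cloned repo, e.g. `DRY_RUN=1 sync_fork` to preview, then `sync_fork` to do it.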
  6. Either repeat the setup in #4 for other local copies and update as in #5, or, for convenience, maintain one local repo (say on the Paperspace machine) as the primary one; on the others, just pull after everything has been pushed to github.com
#pull files from my latest branch
git pull origin ksf-dl1
  7. Fix up missing symbolic link(s)
cd ~/github/fastai/courses/dl1
ln -s ~/data data
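Running `ln -s` a second time fails if the link is already there, so a guarded version can be handy when repeating this on several machines. A minimal sketch, with `ensure_link` as a hypothetical helper name:

```shell
#!/bin/sh
# Create a symbolic link only when it is missing, so the step can be rerun safely.
# "ensure_link" is a hypothetical helper name.
# Usage: ensure_link TARGET LINK_PATH
ensure_link() {
    target=$1
    link_path=$2
    if [ -L "$link_path" ]; then
        # Already a symlink: report where it points and leave it alone.
        echo "link exists: $link_path -> $(readlink "$link_path")"
    elif [ -e "$link_path" ]; then
        # A real file or directory is in the way: do not overwrite it.
        echo "not a symlink, leaving untouched: $link_path" >&2
        return 1
    else
        ln -s "$target" "$link_path"
        echo "created: $link_path -> $target"
    fi
}
```

For the layout above this would be `ensure_link ~/data ~/github/fastai/courses/dl1/data`.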
  8. Kaggle CLI - Dataset Downloads
conda create --clone fastai --name kaggle
source activate kaggle
pip install kaggle
kaggle --help

#provision api key either from kaggle or copy from terminal on other machine
#warning - proceed at your own risk copying credentials to the cloud

scp ~/.kaggle/kaggle.json paperspace@ip:~/.kaggle/

#back on fastai machine
#secure key from other users 
chmod 600 /home/paperspace/.kaggle/kaggle.json

#search competitions on kaggle for datasets of interest, go to the Data tab and copy the kaggle cli command

source activate kaggle
cd ~/.kaggle

#original Dogs vs Cats
kaggle competitions download -c dogs-vs-cats

#Dogs vs. Cats Redux: Kernels Edition - at first glance, appears to be almost identical to the original
kaggle competitions download -c dogs-vs-cats-redux-kernels-edition

#KMLC Challenge 1 - new, smaller, dogs vs cats dataset from Google
kaggle competitions download -c kmlc-challenge-1-cats-vs-dogs

#note these files download to ~/.kaggle/competitions (by default at least), so copy to your data folder

#copy (from directory containing "foldername")
#cp -avr foldername newtargetfolder
cp -avr kmlc-challenge-1-cats-vs-dogs ~/data

#untar - as needed
cd ~/data/kmlc-challenge-1-cats-vs-dogs
ls -la
#tar -xvzf /path/to/yourfile.tgz
tar -xvzf test.tgz
tar -xvzf train.tgz
tar -xvzf validation.tgz
#rename folder to match default
mv validation valid
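The untar-and-rename steps above can be collected into one function that handles whatever `.tgz` files a dataset folder contains. This is only a sketch: `unpack_dataset` is a hypothetical name, and it assumes the archives unpack into folders named after themselves (`train`, `validation`, ...), as they did for the KMLC dataset.

```shell
#!/bin/sh
# Extract every .tgz archive in a dataset folder, then rename "validation" to "valid".
# "unpack_dataset" is a hypothetical helper name.
# Usage: unpack_dataset DATASET_DIR
unpack_dataset() {
    dataset_dir=$1
    # Work in a subshell so the caller's working directory is unchanged.
    (
        cd "$dataset_dir" || exit 1
        for archive in *.tgz; do
            # If nothing matches, the pattern stays literal; skip it.
            [ -e "$archive" ] || continue
            tar -xzf "$archive" || exit 1
        done
        # Rename only when the source exists and the destination does not.
        if [ -d validation ] && [ ! -e valid ]; then
            mv validation valid
        fi
    )
}
```

For example: `unpack_dataset ~/data/kmlc-challenge-1-cats-vs-dogs`.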
  9. References

Installed cite2c to insert references from Zotero into Jupyter notebooks (https://github.com/takluyver/cite2c). See also http://blog.juliusschulz.de/blog/ultimate-ipython-notebook

pip install cite2c
python -m cite2c.install

Caveat: This extension only displays the references if the Jupyter host (e.g. my local/Paperspace Jupyter notebook) has the cite2c CSS installed. To 'publish', either export to PDF and post that, or copy the references into a markdown cell before pushing to GitHub or another repo. This freezes the text that is otherwise generated on page load.

  10. fastai error: No such file or directory: '/home/paperspace/github/fastai/courses/dl1/fastai/weights/resnext_101_64x4d.pth'
#specific to my directory structure under 'github'
cd ~/github/fastai/fastai
wget http://files.fast.ai/models/weights.tgz
tar -xvzf weights.tgz
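To avoid downloading the archive again on a machine that already has it, the fix above can be guarded with a check. A minimal sketch: `ensure_weights` is a hypothetical name, and it assumes the archive unpacks into a `weights` folder, which is what the path in the error message suggests.

```shell
#!/bin/sh
# Download and unpack the pretrained weights only if the weights folder is missing.
# "ensure_weights" is a hypothetical helper name; the URL is the one used above.
# Assumption: weights.tgz unpacks into a "weights" directory.
# Usage: ensure_weights FASTAI_LIB_DIR
ensure_weights() {
    lib_dir=$1
    if [ -d "$lib_dir/weights" ]; then
        echo "weights already present in $lib_dir/weights"
        return 0
    fi
    # Work in a subshell so the caller's working directory is unchanged.
    (
        cd "$lib_dir" || exit 1
        wget http://files.fast.ai/models/weights.tgz &&
        tar -xzf weights.tgz
    )
}
```

With my directory structure this would be `ensure_weights ~/github/fastai/fastai`.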