PeterSprague/FastAI_on_local_Jetson_as_host_user.txt

## FastAI_on_local_Jetson_as_host_user.txt
# Install and access FastAI course locally on a Nvidia Jetson natively as user

Steps
--------
1) set up host Jetson with newest L4T v32.6.1 [Ubuntu 18.04] w/ Jetpack 4.6
2) build custom Python3.6 Fastai Jupyter kernel on Jetson host to access full Cuda, etc. header files for compiling
3) test

# References
https://docs.fast.ai/
https://github.com/fastai
https://docs.nvidia.com/deeplearning/frameworks/index.html
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-ml
https://qooba.net/2020/05/10/fastai-with-tensorrt-on-jetson-nano/
https://github.com/streicherlouw/fastai2_jetson_nano
https://forums.fast.ai/t/platform-nvidia-jetson-xavier-nx/72119
https://www.pyimagesearch.com/2019/05/06/getting-started-with-the-nvidia-jetson-nano/
https://jkjung-avt.github.io/setting-up-xavier-nx/
https://pypi.org/project/jetson-stats/

# testing on Jetson TX2 & Nano 4Gb
# new flash with L4T 32.6.1 with Jetpack 4.6
# TX2 - 8Gb ram & 250Gb SSD
# 1st training cell imports fastai & downloads images/model, then errors, needs debugging

# Nano - 32Gb SD card ==> errors, needs debugging
# use >=64Gb SD card for ML, 32Gb runs out of space quickly
# training will be very slow for most models on Nano
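
# optional: check free space before downloading datasets/models
# a minimal Python sketch; the 20Gb threshold is only an assumed example value, not a measured requirement
# free_gb() and has_room() are hypothetical helper names

```python
import shutil


def free_gb(path="/"):
    """Return free space on the filesystem holding `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path="/", min_free_gb=20.0):
    """True if at least `min_free_gb` GB are free (threshold is an assumption)."""
    return free_gb(path) >= min_free_gb


print("free: %.1f GB, enough: %s" % (free_gb("/"), has_room("/")))
```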

# on Jetson [ssh or local console as "user"]
$ sudo apt update && sudo apt upgrade -y
$ sudo apt install -y curl htop vim git python3-pip
$ sudo apt install -y build-essential cmake
$ sudo apt install -y libatlas-base-dev gfortran
$ sudo apt install -y libhdf5-serial-dev hdf5-tools
$ sudo apt install -y libopenblas-base libopenblas-dev libopenmpi-dev

# fix docker error when using Jetpack v4.6
https://github.com/dusty-nv/jetson-containers/issues/108
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt upgrade -y
# $ sudo apt-get install nvidia-docker2=2.8.0-1

# ========================================================
# for Python3.6 installed in user
# doesn't work from a venv, may work with work-around
# https://stackoverflow.com/questions/55600132/installing-local-packages-with-python-virtualenv-system-site-packages#55600285
# installs some out of date packages
# don't just blindly copy & run, versions & options change
# items in [] are notes or options, don't include the brackets in cmd
#-------------------------------------
$ mkdir -p ~/Development/Tools/jupyter_kernels
$ cd ~/Development/Tools/jupyter_kernels
# insert in .bashrc or run from cmd line each time
$ export PATH=$HOME/.local/bin:$PATH

$ pip3 install -U --user pip setuptools wheel
$ pip3 install --user "Cython>=0.25.0,<3.0" --no-cache-dir [--no-binary :all:]

# install matplotlib from source, need <v3.4 for python3.6 [current v3.5.1]
https://matplotlib.org/stable/users/installing/index.html
https://github.com/matplotlib/matplotlib/issues/16027
https://github.com/matplotlib/matplotlib/blob/v3.3.4/INSTALL.rst
# need to build whl
$ sudo apt install libfreetype6-dev
# quote the version specifiers, otherwise the shell treats ">" as a redirect
$ pip3 install --user "numpy>=1.15" setuptools "cycler>=0.10.0" "python-dateutil>=2.1" "kiwisolver>=1.0.0" "pillow>=6.2" "pyparsing>=2.0.3"
$ cd ~/Software-local/Matplotlib
$ git clone https://github.com/matplotlib/matplotlib.git
$ cd matplotlib
$ git checkout v3.3.4
$ pip3 install --user .

$ cd ~/Development/Tools/jupyter_kernels
$ pip3 install --user jupyterlab ipywidgets pudb

# initial test of fastai on TX2
# torchvision 0.11.2 errors on import
https://discuss.pytorch.org/t/failed-to-load-image-python-extension-could-not-find-module/140278
https://github.com/pytorch/vision/blob/main/torchvision/io/image.py
https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048

# install older version of torchvision for Jetsons, needs previous version of torch
https://pytorch.org/get-started/locally/
https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048
https://stackoverflow.com/questions/62407851/pytorch-doesnt-find-a-cuda-device#
# download Jetson torch v1.10.0 whl [only for Python3.6]
$ wget https://nvidia.box.com/shared/static/fjtbno0vpo676a25cgvuqc1wty0fkkg6.whl -O torch-1.10.0-cp36-cp36m-linux_aarch64.whl
$ pip3 install --user torch-1.10.0-cp36-cp36m-linux_aarch64.whl
$ pip3 install --user torchvision==0.11.1
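# the step above forces install of numpy v1.19.5
# if that causes errors, reinstall the previous numpy version instead
# [numpy >=1.20 requires Python >=3.7, so check which version is really installable on Python3.6]
$ pip3 install "numpy==1.21.5" --no-cache-dir [--no-binary :all: errors out here, don't use it]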

# test in python console [use python3, plain "python" is Python2.7 on 18.04]
$ python3
>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
# test gpu driver & cuda availability
>>> torch.cuda.is_available()
==> works
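
# the same console test as a script, so it can be rerun after each package change
# a minimal sketch; cuda_report() is a hypothetical helper name
# it returns {"torch": None} instead of raising when torch is not importable

```python
def cuda_report():
    """Collect torch version and CUDA availability, then try a small GPU op."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        # same 5x3 random tensor as the console test, but created on the GPU
        x = torch.rand(5, 3, device="cuda")
        info["gpu_op_ok"] = bool(torch.isfinite(x.sum()).item())
    return info


print(cuda_report())
```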

# test numpy
https://numpy.org/doc/stable/reference/testing.html
$ python3
>>> import numpy
>>> numpy.test(label='slow')
==> no errors

# install the fastai dependencies one by one, because plain "pip install fastai" errors out
# check each step for errors and build the modules that failed from source
# check on latest release version of thinc for Python3.6 [v7.4.3 for 3.6]

# preinstall blis - build from source for arm64, host Jetson has Cuda development env
https://github.com/explosion/spaCy/issues/3861
$ BLIS_ARCH="generic" pip3 --no-cache-dir install blis --user --no-binary blis [installed v0.7.3]

# generate thinc requirements.txt
https://github.com/explosion/thinc/blob/v7.4.3/requirements.txt
$ wget -c https://raw.githubusercontent.com/explosion/thinc/v7.4.3/requirements.txt -O requirements-thinc-v7.4.3.txt
$ pip3 install --user -r requirements-thinc-v7.4.3.txt --no-cache-dir [--no-binary :all:]
$ pip3 install --user "thinc[cuda102,torch]==7.4.3" --no-cache-dir --no-deps [installed 7.4.3]
# don't use "pip3 install --pre thinc", it fails, possibly because of murmurhash or pip conflicts with CPython versions
# see: https://github.com/pypa/pip/issues/10222

https://github.com/explosion/spaCy/issues/3861
$ pip3 install "spacy[cuda]==2.3.4" --user --no-cache-dir [installed v2.3.4]

# check on latest release version of fastai for Python3.6 [v2.5.3 for 3.6 as of Jan-05-2022]
# generate fastai requirements.txt
https://github.com/fastai/fastai/blob/2.5.3/settings.ini
$ vi requirements-fastai-v2.5.3.txt
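
# instead of copying by hand, the requirement lines can be pulled out of settings.ini
# a minimal sketch, assuming the file has a [DEFAULT] section with space-separated
# "requirements" and "pip_requirements" keys; check the real file, key names may differ
# requirements_from_settings() is a hypothetical helper, SAMPLE is made-up data

```python
import configparser


def requirements_from_settings(text, keys=("requirements", "pip_requirements")):
    """Return the requirement specifiers found under the given keys in [DEFAULT]."""
    # interpolation=None keeps any "%" in other keys from raising errors
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    reqs = []
    for key in keys:
        reqs.extend(cfg["DEFAULT"].get(key, "").split())
    return reqs


# made-up sample, NOT the real fastai settings.ini
SAMPLE = """
[DEFAULT]
lib_name = example
requirements = fastcore>=1.3,<1.4 matplotlib pandas
pip_requirements = torch>=1.7,<1.11
"""

# one specifier per line, the layout "pip3 install -r" expects
print("\n".join(requirements_from_settings(SAMPLE)))
```

# to use it on the real file, read settings.ini with open(...).read() and write the
# joined result to requirements-fastai-v2.5.3.txt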
# need to preinstall scipy & scikit-learn to user to have newer versions
# import fastai was erroring from scikit-learn
$ pip3 install -U scipy --user --no-cache-dir --no-binary scipy [installed v1.5.4]
$ pip3 install -U scikit-learn --user --no-cache-dir --no-binary scikit-learn [installed v0.24.2]
$ pip3 install --user -r requirements-fastai-v2.5.3.txt --no-cache-dir
# error: can't install any version of matplotlib
# built matplotlib from source, moved step to above

$ pip3 install --user fastai --no-cache-dir --no-deps
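
# after installing with --no-deps, check which versions actually ended up importable
# a minimal sketch; installed_versions() is a hypothetical helper and the module list
# is taken from the packages named in these notes [scikit-learn imports as "sklearn"]

```python
import importlib


def installed_versions(names):
    """Map module name -> __version__, or None if the import fails."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # ImportError, or OSError from a broken shared library
            found[name] = None
            continue
        found[name] = getattr(module, "__version__", "unknown")
    return found


MODULES = ["numpy", "scipy", "sklearn", "matplotlib", "torch",
           "torchvision", "thinc", "spacy", "fastai"]

for name, version in installed_versions(MODULES).items():
    print("%-12s %s" % (name, version if version else "not importable"))
```

# compare the output with the [installed vX] notes above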

# install Jupyterlab start script
# include LD_PRELOAD in start script, see:
OSError: /usr/lib/aarch64-linux-gnu/libgomp.so.1: cannot allocate memory in static TLS block
--> https://forums.developer.nvidia.com/t/what-is-cannot-allocate-memory-in-static-tls-block/169225
https://forums.developer.nvidia.com/t/oserror-usr-lib-aarch64-linux-gnu-libgomp-so-1-only-in-jupyter-notebook/174881/8
$ cd ~/Development/Tools
$ cat > ~/Development/Tools/run_jupyter.sh << 'EOF'
#!/usr/bin/env bash
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD
jupyter lab --ip=0.0.0.0 --port=9743 --allow-root --no-browser
EOF
$ chmod 755 ~/Development/Tools/run_jupyter.sh
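
# to confirm from a notebook cell that the kernel really inherited LD_PRELOAD
# a minimal sketch; libgomp_preloaded() is a hypothetical helper name
# it only inspects the environment variable, it does not prove the library was loaded

```python
import os


def libgomp_preloaded(env=None):
    """True if any LD_PRELOAD entry is a libgomp shared object."""
    env = os.environ if env is None else env
    # LD_PRELOAD entries may be separated by colons or spaces
    entries = env.get("LD_PRELOAD", "").replace(" ", ":").split(":")
    return any(os.path.basename(e).startswith("libgomp.so") for e in entries if e)


print("libgomp in LD_PRELOAD:", libgomp_preloaded())
```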

# git clone FastAI book & course material
$ mkdir -p ~/Development/ML/projects/fastai/course
$ cd ~/Development/ML/projects/fastai/course
$ git clone https://github.com/fastai/course20.git
$ git clone https://github.com/fastai/fastbook.git

# create 01_intro_cell-1 test script for debugging
$ cd ~/Development/ML/projects/fastai/course/fastbook
$ vi 01_intro_cell-01_test.py
from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()

dls = ImageDataLoaders.from_name_func(path, get_image_files(path), valid_pct=0.2, seed=42, label_func=is_cat, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
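
# what the is_cat label function does: in the PETS dataset the cat breeds have
# capitalized file names and the dog breeds lowercase ones
# a minimal sketch that needs no fastai; the two file names are made-up examples

```python
def is_cat(x):
    # label is True when the file name starts with an uppercase letter
    return x[0].isupper()


samples = ["Abyssinian_1.jpg", "beagle_32.jpg"]  # hypothetical file names
labels = {name: is_cat(name) for name in samples}
print(labels)
```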


# run debugger in console
$ pudb3 01_intro_cell-01_test.py
step through code
==> works, no "illegal instruction (core dumped)" error

# test in Jupyter
# start Jupyterlab on port 9743
$ cd ~/Development/ML/projects/fastai/course/fastbook
$ ~/Development/Tools/run_jupyter.sh
# access from a browser
http://x.x.x.x:9743 --> fastbook --> 01_intro.ipynb
use Python 3 kernel
run "first training" cell
==> runs, no errors

#=========================================================================