Birch-san/CUDA-12-1-1-pytorch.md

## CUDA-12-1-1-pytorch.md

      
    Raw
  

              CUDA-12-1-1-pytorch.md
            
          
    Installing CUDA 12.1.1 + PyTorch nightly + Python 3.10 on Ubuntu 22.10

Should you keep your NVIDIA driver?

CUDA 12.1.1 toolkit is gonna offer to install Nvidia driver 530 for us. It's from New Feature branch. It's likely to be newer than the default Nvidia driver you would've installed via apt-get (apt would prefer to give you 525, i.e. Production Branch).
If you're confident that you already have a new enough Nvidia driver for CUDA 12.1.1, and you'd like to keep your driver: feel free to skip this "uninstall driver" step.
But if you're not sure, or you know your driver is too old: let's uninstall it. CUDA will install a new driver for us later.
Here's how to uninstall Nvidia drivers if you installed via apt:
sudo apt-get --purge remove "*nvidia*"
You don't need to uninstall CUDA toolkit — you can have multiple versions of CUDA; they're installed to a folder named according to version number.

Nvidia drivers are backwards-compatible, so latest Nvidia driver supports both new and old CUDA.
Reboot. Read the next step first in case you lose graphics.
When I rebooted after uninstalling drivers: all I see is a black screen with a blinking white cursor

Press Alt+F3 to get an interactive shell session.
Decide whether you want to give up (see next step), or continue (see "Install CUDA toolkit").
Something went horribly wrong and I want to go back to stable drivers installed via apt

apt is gonna want to compile kernel modules for you. Make sure gcc 12 is the default gcc for your OS, or select gcc 12 explicitly.
Install drivers:
apt install nvidia-driver-525 nvidia-dkms-525
525 is the current Production Branch driver. If something newer comes out, you'll be able to check version numbers on Nvidia's site:

https://www.nvidia.com/Download/index.aspx

It would be better to check what's available for your OS by searching Ubuntu's apt repository but I don't know how.
Installing the drivers failed!

Tell it to try reinstalling, but this time save the build output to a log.
sudo apt -y reinstall nvidia-driver-525 nvidia-dkms-525 | tee nvidia.log
Read the log file:
less nvidia.log
If compilation failed, it's usually because the wrong version of gcc was selected.

It may refuse to retry because an error log exists from the last run. Find the log file it's talking about and delete it, then try reinstalling again.
Install CUDA Toolkit

Find latest version (e.g. 12.1.1)

Determine latest available CUDA toolkit:

https://developer.nvidia.com/cuda-downloads?target_os=Linux

In this case we see 12.1.1.
Install

CUDA is gonna want to compile kernel modules for you. Make sure gcc 12 is the default gcc for your OS, or select gcc 12 explicitly.
Install CUDA 12.1.1 (use the runfile install):

https://developer.nvidia.com/cuda-12-1-1-download-archive
I picked "Linux > x86_64 > Ubuntu > 22.04 > runfile (local)", which gave me these instructions:
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run
If you really like the Nvidia driver you got from apt-get (and you ignored me when I told you to uninstall it): you can tell the installer to continue, without replacing your current driver.

This is fine, so long as your driver is sufficiently new. It wants 530 (at the time of writing).
Reboot.
Install CuDNN

Maybe this is only required if you're gonna build PyTorch from source? Check next step to see your options for installing from a wheel instead.
Login/register to your Nvidia account:

https://developer.nvidia.com/cudnn
Download CuDNN

Latest version at time of writing was 8.9.0:

https://developer.nvidia.com/rdp/cudnn-download
Install CuDNN:

https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#installlinux-deb

I grabbed the "Local Installer for Ubuntu22.04 x86_64 (Deb)".
If you're not sure about the instructions on that page, here's what I did:
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.9.0.131_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.0.131/cudnn-local-D7522631-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install libcudnn8=8.9.0.131-1+cuda12.1
sudo apt-get install libcudnn8-dev=8.9.0.131-1+cuda12.1
Will I need to compile PyTorch from source?

There is a CUDA 12.1 wheel of pytorch available on conda. I tried installing this:
conda create -n p310-cu121-whl -c pytorch-nightly pytorch=*=py3.10_cuda12.1_cudnn8.8.1_0
…but conda's env-solver told me there was a conflict (with glibc?), so that didn't work.
A wheel is available on download.pytorch.org. Maybe you could install via pip?
After this, you might want torchvision.
I don't see a torchvision distribution on conda or on download.pytorch.org).
Anyway, I wrote instructions from compiling from source, so that's what you'll get.
Install conda

If you have conda already, or if you prefer to manage your Python environment some other way: you can skip this step.
One advantage of Conda is that it gives us a convenient way to get Python 3.11 (there's no deadsnakes distribution for Ubuntu 22.10), but we may prefer 3.10 anyway because of support for torch.compile().
Download:

https://www.anaconda.com/products/distribution
Install:
bash Anaconda-latest-Linux-x86_64.sh
eval "$(/home/birch/anaconda3/bin/conda shell.bash hook)"
conda config --set auto_activate_base false
conda init
Now let's create a conda env. We'll have to pick which version of Python we want.
Yes, you can create both environments (Python 3.10 and 3.11), and activate whichever you prefer for the task you're doing.
We are lucky that there is a magma-cuda121 conda package.

But if you're trying to apply these instructions for some newer CUDA, you may need to build magma from source.
Create conda env (Python 3.10)

Prefer Python 3.10 if you intend to use torch.compile() to speed up training.
We add the pytorch channel to get magma-cuda121, not to get pytorch itself — we'll build PyTorch ourselves from source.
We'll install all the dependencies required to compile PyTorch from source:
conda create -n p310-cu121 -c nvidia/label/cuda-12.1.1 -c pytorch -c defaults magma-cuda121 astunparse numpy ninja pyyaml setuptools cmake typing_extensions six requests dataclasses mkl mkl-include python=3.10
Activate your Python 3.10 conda environment:
conda activate p310-cu121
OR Create conda env (Python 3.11)

Prefer Python 3.11 if you know you will not be using torch.compile(), and you prefer a faster JIT (e.g. for fast start-up of scripts, and better-performing Python scripts).
We'll add the conda-forge channel, because it gives us a way to download Python 3.11.

If we were on Ubuntu 22.04, the standard way would be to install Python via the deadsnakes snap, but that's not available  for Ubuntu 22.10.
conda create -n p311-cu121 -c nvidia/label/cuda-12.1.1 -c conda-forge -c pytorch -c defaults magma-cuda121 astunparse numpy ninja pyyaml setuptools cmake typing_extensions six requests dataclasses mkl mkl-include python=3.11.1
Activate your Python 3.11 conda environment:
conda activate p311-cu121
Build PyTorch from source

Activate your conda environment, if you haven't done so already.
Make sure gcc 12 is the default gcc for your OS, or select gcc 12 explicitly.
Look up your CUDA Architecture, e.g. 4090 has architecture 8.9. Use this to determine the TORCH_CUDA_ARCH_LIST variable below.
If you want to install TensorRT too: You'll need to do your own research on that (I'm not sure I see a CUDA 12.1 distribution available yet). It's for exporting a model and massively optimizing it for inference.
We disable maybe-uninitialized and uninitialized warnings from gcc due to this false-positive warning-as-error from gcc 12.1 on AMD processors:

pytorch/pytorch#77939
git clone --depth 1 --recursive https://github.com/pytorch/pytorch.git
cd pytorch
CUDA_DIR=/usr/local/cuda-12.1
CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull' CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull' USE_ROCM=0 TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH=$CUDA_DIR/lib64 python setup.py develop
# if you need to try again: git clean -dfx
something weird happened. if we try to use the pytorch we just compiled, we get an ImportError:
python -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/birch/git/pytorch/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/birch/anaconda3/envs/p310-cu121/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /home/birch/git/pytorch/torch/lib/libtorch_python.so)

I think the system glibc, or maybe the glibc in our conda environment… is not as new as the glibc that gcc12 linked libtorch_python.so against?
you can override Linux's load-time linker when you launch python, to load the newer glibc that we linked torch against:
LD_PRELOAD=$(gcc -print-file-name=libstdc++.so.6) python -c 'import torch; print(torch.__version__); print(torch.version.cuda)'
2.1.0a0+gitb8d7a28
12.1

this is not my proudest moment.
Build Torchvision from source

Activate your conda environment, if you haven't done so already.
Make sure gcc 12 is the default gcc for your OS, or select gcc 12 explicitly.
Look up your CUDA Architecture, e.g. 4090 has architecture 8.9. Use this to determine the TORCH_CUDA_ARCH_LIST variable below.
You'll, uh, have to override the load-time linker again because this thing is gonna import torch, or rather our cursed libtorch_python.so.
git clone --depth 1 https://github.com/pytorch/vision.git
cd vision
CUDA_DIR=/usr/local/cuda-12.1
LD_PRELOAD=$(gcc -print-file-name=libstdc++.so.6) TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH="$CUDA_DIR/lib64" python setup.py install
Build xformers from source:

You can skip this step if you're happy to use pytorch's built-in flash attention instead.
Activate your conda environment, if you haven't done so already.
Make sure gcc 12 is the default gcc for your OS, or select gcc 12 explicitly.
Look up your CUDA Architecture, e.g. 4090 has architecture 8.9. Use this to determine the TORCH_CUDA_ARCH_LIST variable below.
git clone --depth 1 --recursive https://github.com/facebookresearch/xformers.git
cd xformers
pip install ninja
CUDA_DIR=/usr/local/cuda-12.1
LD_PRELOAD=$(gcc -print-file-name=libstdc++.so.6) TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH="$CUDA_DIR/lib64" python setup.py install
Build NVIDIA TransformerEngine from source

Activate your conda environment, if you haven't done so already.
Make sure gcc 12 is the default gcc for your OS, or select gcc 12 explicitly.
Look up your CUDA Architecture, e.g. 4090 has architecture 8.9. Use this to determine the TORCH_CUDA_ARCH_LIST variable below.
git clone --depth 1 --recursive https://github.com/NVIDIA/TransformerEngine.git
cd TransformerEngine
pip install ninja
CUDA_DIR=/usr/local/cuda-12.1
LD_PRELOAD=$(gcc -print-file-name=libstdc++.so.6) TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH="$CUDA_DIR/lib64" python setup.py install
Did compilation fail? Perhaps with error expected template-name before ‘<’ token?
It's a problem with CUDA 12's nvcc compiling the pybind11 headers in pytorch, wherever it's being included from. the compiler error will tell you the full path to where it's locating cast.h from. in my case it was including it from my git clone'd pytorch directory, /home/birch/git/pytorch/torch/include/pybind11/detail/../cast.h. but depending on how you installed everything, perhaps it'd locate it from somewhere more like /usr/include/pybind11/cast.h. anyway, find that file from the error, then edit it:
torch/include/pybind11/cast.h#L39
// Shortcut for calling a caster's `cast_op_type` cast operator for casting a type_caster to a T
template <typename T>
typename make_caster<T>::template cast_op_type<T> cast_op(make_caster<T> &caster) {
-     return caster.operator typename make_caster<T>::template cast_op_type<T>();
+     return caster;
}
then try compiling again:
LD_PRELOAD=$(gcc -print-file-name=libstdc++.so.6) TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH="$CUDA_DIR/lib64" python setup.py install