Install working tensorflow or pytorch via standard conda environment workflow.
The recommended conda-based install process works smoothly:
$ # Create a fresh environment
$ conda create --name py37_torch python=3.7 --yes
$ # Activate new environment
$ source activate py37_torch
$ # Install tensorflow
$ conda install tensorflow --yes
$ # Install pytorch
$ conda install pytorch-cpu torchvision-cpu -c pytorch --yes
The gotcha is that when we then try to use the package we just installed, we get a GLIBC error like this:
$ python -c "import torch"
ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by .../site-packages/torch/lib/libshm.so)
Badness! Clearly, the current system's GLIBC is not recent enough (the error asks for GLIBC_2.14). And on a cluster computing system you often don't have root access, so you can't easily upgrade GLIBC.
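To confirm the diagnosis before changing anything, a small check like the one below compares the system GLIBC against the version named in the error. It is a sketch that assumes a glibc-based Linux (so `getconf GNU_LIBC_VERSION` works) and GNU `sort -V`; the script name `check_glibc.sh` and the helper `version_ge` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: is the system GLIBC at least the version an ImportError asked for?
# Assumes a glibc-based Linux (getconf GNU_LIBC_VERSION) and GNU sort (-V).

# version_ge A B: succeed if version A >= version B
version_ge() {
    # sort -V compares each dotted component numerically (2.9 < 2.14);
    # A >= B exactly when B sorts first
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Required version: first argument, defaulting to 2.14 from the error above
required="${1:-2.14}"
# getconf prints e.g. "glibc 2.12"; keep only the number
system="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if [ -z "$system" ]; then
    echo "Could not determine the system GLIBC version" >&2
elif version_ge "$system" "$required"; then
    echo "System GLIBC $system is new enough for GLIBC_$required"
else
    echo "System GLIBC $system is older than GLIBC_$required"
fi
```

Run it as `bash check_glibc.sh 2.14`; if it reports an older GLIBC, the userspace workaround below applies.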
Step 1: Download and unpack newer GLIBC and libstdc++ libraries in userspace
Credit: StackOverflow answer by Theo T.
This is for Python 3.7 (works for 3.6 too!). See an older list for Python 2.7 at the bottom of this doc.
$ # Make a folder within the environment to hold useful things
$ mkdir -p /path/to/conda/envs/py37_torch/custom_libs/
$ cd /path/to/conda/envs/py37_torch/custom_libs/
$ # Get libc files (URL verified by MCH on 2019/08/21)
$ wget http://mirrors.kernel.org/ubuntu/pool/main/g/glibc/libc6_2.23-0ubuntu10_amd64.deb
$ wget http://mirrors.kernel.org/ubuntu/pool/main/g/glibc/libc6-dev_2.23-0ubuntu10_amd64.deb
$ # Unpack files into current directory (will create usr/ and lib/ and lib64/ folders)
$ ar p libc6_2.23-0ubuntu10_amd64.deb data.tar.xz | tar xvJ
$ ar p libc6-dev_2.23-0ubuntu10_amd64.deb data.tar.xz | tar xvJ
What have we accomplished? You should now have some new folders in your current directory, labeled usr/, lib/, and lib64/.
We can verify that the system libc is OLD, while the one we just unpacked is shiny and new!
$ strings /lib/libc.so.6 | grep GLIBC_2. | tail -n3
GLIBC_2.10
GLIBC_2.11
GLIBC_2.12
$ strings lib/x86_64-linux-gnu/libc.so.6 | grep GLIBC_2 | tail -n3
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
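The `strings | grep | tail -n3` check depends on the order in which version tags happen to appear in the binary. A slightly sturdier sketch sorts the tags with `sort -V` and prints only the highest one. It swaps `strings` for `grep -a -o` (so binutils is not required) and assumes GNU grep and GNU sort; the function name `max_symver` is hypothetical. Passing `GLIBCXX` as the prefix does the same job for the libstdc++ unpacked next.

```bash
#!/usr/bin/env bash
# Sketch: print the highest symbol-version tag (e.g. GLIBC_2.23) found in a library.
# Uses grep -a -o instead of strings; assumes GNU grep and GNU sort.

# max_symver FILE [PREFIX]: PREFIX defaults to GLIBC; use GLIBCXX for libstdc++
max_symver() {
    local file="$1" prefix="${2:-GLIBC}"
    # -a: treat the binary as text; -o: print each match on its own line.
    # The underscore after the prefix keeps GLIBC from also matching GLIBCXX tags.
    grep -a -o "${prefix}_[0-9][0-9.]*" "$file" | sort -u -V | tail -n1
}

# Example calls (paths from the commands above):
#   max_symver /lib/libc.so.6                    # system libc
#   max_symver lib/x86_64-linux-gnu/libc.so.6    # unpacked libc
```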
$ # Get libstdc++ (URL verified by MCH on 2019/02/18)
$ wget ftp://195.220.108.108/linux/mageia/distrib/4/x86_64/media/core/updates/libstdc++6-4.8.2-3.2.mga4.x86_64.rpm
$ # Alternative URL:
$ # wget http://ftp.riken.jp/Linux/scientific/6.0/x86_64/os/Packages/libstdc++-4.4.4-13.el6.x86_64.rpm
$ # Unpack into current directory (will add content to lib/ and lib64/ folders)
$ rpm2cpio libstdc++6-4.8.2-3.2.mga4.x86_64.rpm | cpio -idmv
Step 2: Use patchelf to make your python install use these userspace libraries instead of the system defaults
Credit: StackOverflow answer by Evalds Urtans
$ # Be sure the correct environment is active
$ source activate py37_torch
(py37_torch) $ # Install patchelf
(py37_torch) $ conda install patchelf -c conda-forge --yes
Step 2b: Use the attached script to alter the conda env's python executable so it uses the custom GLIBC libraries
(py37_torch) $ bash rewrite_python_exe_glibc_with_patchelf.sh
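The attached script is not reproduced here. As a rough idea of what such a rewrite has to do, the sketch below points the env's python at the unpacked dynamic loader with `patchelf --set-interpreter` and puts the unpacked library folders on its search path with `patchelf --force-rpath --set-rpath`. It is NOT the gist's `rewrite_python_exe_glibc_with_patchelf.sh`: the folder layout, the `.orig` backup, the script name `patch_python_sketch.sh`, and the function names are all assumptions based on the unpacking steps above.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL sketch of a patchelf rewrite; NOT the gist's attached
# rewrite_python_exe_glibc_with_patchelf.sh.
# Usage (hypothetical name): bash patch_python_sketch.sh /path/to/conda/envs/py37_torch
set -euo pipefail

# build_rpath PREFIX: colon-separated library search path for the patched python.
# ASSUMED layout: libc6 .deb -> custom_libs/lib/x86_64-linux-gnu,
# libc6-dev .deb -> custom_libs/usr/lib/x86_64-linux-gnu,
# libstdc++ .rpm -> custom_libs/usr/lib64, plus the env's own lib/ (libpython etc.).
build_rpath() {
    local prefix="$1"
    local -a dirs=(
        "$prefix/custom_libs/lib/x86_64-linux-gnu"
        "$prefix/custom_libs/usr/lib/x86_64-linux-gnu"
        "$prefix/custom_libs/usr/lib64"
        "$prefix/lib"
    )
    local IFS=:
    printf '%s\n' "${dirs[*]}"   # "${dirs[*]}" joins with the first char of IFS
}

# find_loader PREFIX: path of the unpacked loader, e.g. .../ld-2.23.so (ASSUMED name pattern)
find_loader() {
    local -a matches=("$1"/custom_libs/lib/x86_64-linux-gnu/ld-*.so)
    # With no match the glob stays literal, so -e fails
    [ -e "${matches[0]}" ] || { echo "no ld-*.so found under $1/custom_libs" >&2; return 1; }
    printf '%s\n' "${matches[0]}"
}

rewrite_python() {
    local prefix="$1" exe loader
    exe="$(readlink -f "$prefix/bin/python")"   # conda's bin/python is usually a symlink
    loader="$(find_loader "$prefix")"
    cp -p "$exe" "$exe.orig"                     # keep a backup so the change can be undone
    patchelf --set-interpreter "$loader" "$exe"
    # --force-rpath writes DT_RPATH instead of DT_RUNPATH; the loader also applies an
    # executable's DT_RPATH to libraries loaded later (unless they carry DT_RUNPATH)
    patchelf --force-rpath --set-rpath "$(build_rpath "$prefix")" "$exe"
    patchelf --print-interpreter "$exe"
    patchelf --print-rpath "$exe"
}

# Only patch when an env prefix is given, so the helpers can be sourced on their own
if [ "$#" -gt 0 ]; then
    rewrite_python "$1"
fi
```

Afterwards, `python -c "import torch"` run from the activated env is the quickest test; if it fails, `python.orig` can be copied back.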
Older list for Python 2.7
$ # Make a folder within the environment to hold useful things
$ mkdir -p /path/to/conda/envs/py27_torch1.0/custom_libs/
$ cd /path/to/conda/envs/py27_torch1.0/custom_libs/
$ # Get libc files (URL verified by MCH on 2019/02/18)
$ wget https://launchpadlibrarian.net/137699828/libc6_2.17-0ubuntu5_amd64.deb
$ wget https://launchpadlibrarian.net/137699829/libc6-dev_2.17-0ubuntu5_amd64.deb
$ # Unpack files into current directory (will create usr/ and lib/ and lib64/ folders)
$ ar p libc6_2.17-0ubuntu5_amd64.deb data.tar.gz | tar zx
$ ar p libc6-dev_2.17-0ubuntu5_amd64.deb data.tar.gz | tar zx
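The Python 3.7 list unpacks `data.tar.xz` with `tar xvJ`, while this older list unpacks `data.tar.gz` with `tar zx`; which member a given .deb contains depends on how it was built. A small helper that asks `ar` for the member name first can handle either. This is a sketch assuming GNU ar (binutils) and GNU tar; the function names `tar_flag_for` and `extract_deb` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: unpack the data member of a .deb regardless of its compression.
# Assumes GNU ar (binutils) and GNU tar; function names are hypothetical.
set -euo pipefail

# tar_flag_for MEMBER: print the GNU tar decompression flag for a data.tar.* name
tar_flag_for() {
    case "$1" in
        *.tar.xz)  printf '%s\n' "-J" ;;
        *.tar.gz)  printf '%s\n' "-z" ;;
        *.tar.bz2) printf '%s\n' "-j" ;;
        *.tar.zst) printf '%s\n' "--zstd" ;;   # needs GNU tar >= 1.31
        *.tar)     printf '\n' ;;              # uncompressed: no flag
        *)         echo "unsupported member: $1" >&2; return 1 ;;
    esac
}

# extract_deb FILE.deb: unpack its data.tar.* into the current directory
extract_deb() {
    local deb="$1" member flag
    # ar t lists the members, e.g. debian-binary, control.tar.gz, data.tar.xz
    member="$(ar t "$deb" | grep '^data\.tar')"
    flag="$(tar_flag_for "$member")"
    # $flag is deliberately unquoted so an empty flag expands to no argument
    ar p "$deb" "$member" | tar -x $flag -f -
}

# Example (file name from the list above):
#   extract_deb libc6_2.17-0ubuntu5_amd64.deb
```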
Hi, I just tried your solution to modify my anaconda environment on the university cluster. Things went well, but after running patchelf, the PyTorch package can no longer recognize the CUDA driver.
import torch
print(torch.cuda.device_count()) # --> 0
print(torch.cuda.is_available()) # --> False
print(torch.version.cuda) # --> 11.3
Do you have any idea what is happening? Thank you.