@michaelchughes
Last active May 30, 2024 01:16
Fixes for GLIBC errors when installing tensorflow or pytorch on older Red Hat or CentOS cluster environments

Goal

Install working tensorflow or pytorch via standard conda environment workflow.

Basic Setup: Install tensorflow and pytorch in a fresh conda environment

The recommended conda-based install process works smoothly:

$ # Create a fresh environment
$ conda create --name py37_torch python=3.7 --yes

$ # Activate new environment
$ source activate py37_torch

$ # Install tensorflow
$ conda install tensorflow --yes

$ # Install pytorch 
$ conda install pytorch-cpu torchvision-cpu -c pytorch --yes

Roadblock

The gotcha is that when we then try to use the package we just installed, we get a GLIBC error like this:

$ python -c "import torch"
ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by .../site-packages/torch/lib/libshm.so)

Badness! The system's GLIBC is older than the version the package was built against (here it needs at least GLIBC 2.14). On a cluster computing system you usually don't have root access, so you can't simply upgrade the system GLIBC.
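Before going further, it can help to confirm the mismatch. The following is a minimal sketch, assuming a glibc-based system where `getconf GNU_LIBC_VERSION` is available and a GNU `sort` that supports `-V`. The helper name `version_at_least` is made up for this example, and the value `2.14` is taken from the error message above.

```shell
#!/usr/bin/env bash
# Return success (0) if version $1 is >= version $2, e.g. "2.23" vs "2.14".
# (Hypothetical helper for illustration.)
version_at_least() {
    local have="$1" need="$2"
    # sort -V orders version strings field by field (GNU coreutils);
    # if the smaller of the two is the needed version, we have enough.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]
}

# Version the import error asked for (GLIBC_2.14 in the example above).
needed="2.14"
# getconf prints something like "glibc 2.12"; keep only the second field.
have="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if [ -z "$have" ]; then
    echo "Could not determine the system glibc version"
elif version_at_least "$have" "$needed"; then
    echo "System glibc $have is new enough (need >= $needed)"
else
    echo "System glibc $have is too old (need >= $needed)"
fi
```

If this reports that the system glibc is too old, the userspace workaround in the steps below is one option.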

Step 1: Install recent copies of glibc and libstdc++ in userspace

Credit: StackOverflow answer by Theo T.

Step 1a: (NEW FOR PYTHON 3.7) Download and unpack some pre-compiled GLIBC shared libraries

This is for Python 3.7 (it works for 3.6 too!). See the older list for Python 2.7 in the DEPRECATED section of this doc.

$ # Make a folder within the environment to hold useful things
$ mkdir -p /path/to/conda/envs/py37_torch/custom_libs/
$ cd /path/to/conda/envs/py37_torch/custom_libs/

$ # Get libc files (URL verified by MCH on 2019/08/21)
$ wget http://mirrors.kernel.org/ubuntu/pool/main/g/glibc/libc6_2.23-0ubuntu10_amd64.deb
$ wget http://mirrors.kernel.org/ubuntu/pool/main/g/glibc/libc6-dev_2.23-0ubuntu10_amd64.deb

$ # Unpack files into current directory (will create usr/ and lib/ and lib64/ folders)
$ ar p libc6_2.23-0ubuntu10_amd64.deb data.tar.xz | tar xvJ
$ ar p libc6-dev_2.23-0ubuntu10_amd64.deb data.tar.xz | tar xvJ

What have we accomplished? You should now have some new folders in your current directory, named usr/, lib/, and lib64/.
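The name of the data archive inside a .deb varies by package (data.tar.xz for the 2.23 packages here, data.tar.gz for the older 2.17 packages in the DEPRECATED section). A rough sketch of a helper that picks the right member and tar flag automatically is shown below. The function names `deb_data_member`, `tar_flag_for`, and `unpack_deb` are invented for this example, and it assumes `ar`, GNU `grep`, and `tar` are on the PATH.

```shell
#!/usr/bin/env bash
# Given the output of `ar t some.deb` on stdin, print the data archive member.
deb_data_member() {
    grep -m1 '^data\.tar'
}

# Map a member name to the matching tar decompression flag.
tar_flag_for() {
    case "$1" in
        *.xz)  echo J ;;
        *.gz)  echo z ;;
        *.bz2) echo j ;;
        *)     return 1 ;;
    esac
}

# Unpack the payload of a .deb into the current directory.
unpack_deb() {
    local deb="$1" member flag
    member="$(ar t "$deb" | deb_data_member)" || return 1
    flag="$(tar_flag_for "$member")" || {
        echo "Unsupported data archive: $member" >&2
        return 1
    }
    # Stream the member out of the .deb and extract it from stdin.
    ar p "$deb" "$member" | tar -x"$flag" -f -
}
```

With these functions loaded, `unpack_deb libc6_2.23-0ubuntu10_amd64.deb` would stand in for the matching `ar p ... | tar xvJ` line above.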

We can verify that before, we had an OLD libc, and now we have a shiny new one!

Check the OLD location of libc.so.6

$ strings /lib/libc.so.6 | grep GLIBC_2. | tail -n3
GLIBC_2.10
GLIBC_2.11
GLIBC_2.12

NEW version of libc.so.6 in working directory

$ strings lib/x86_64-linux-gnu/libc.so.6 | grep GLIBC_2 | tail -n3
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
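The `tail -n3` check above depends on the order in which the version strings happen to appear in the file. A small sketch that reports the highest GLIBC_x.y version regardless of order is shown here. The function name `max_glibc_version` is made up for illustration, and it assumes GNU `grep` and a `sort` that supports `-V`.

```shell
#!/usr/bin/env bash
# Print the highest GLIBC_x.y symbol version found in the text on stdin.
# Non-numeric tags such as GLIBC_PRIVATE are skipped by the pattern.
max_glibc_version() {
    grep -oE 'GLIBC_[0-9]+(\.[0-9]+)+' \
        | sed 's/^GLIBC_//' \
        | sort -uV \
        | tail -n1
}
```

Typical usage would be `strings lib/x86_64-linux-gnu/libc.so.6 | max_glibc_version`, run once for the system libc and once for the unpacked copy.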

Step 1b: Download and unpack some pre-compiled LIBSTDC++ shared libraries

# Get libstdc++ (URL verified by MCH on 2019/02/18)
wget ftp://195.220.108.108/linux/mageia/distrib/4/x86_64/media/core/updates/libstdc++6-4.8.2-3.2.mga4.x86_64.rpm

# Alternative URL:
# wget http://ftp.riken.jp/Linux/scientific/6.0/x86_64/os/Packages/libstdc++-4.4.4-13.el6.x86_64.rpm

# Unpack into current directory (will add content to lib/ and lib64/ folders)
rpm2cpio libstdc++6-4.8.2-3.2.mga4.x86_64.rpm | cpio -idmv
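Where the unpacked libstdc++ lands depends on the layout of the RPM, so it is worth checking which directory actually received it before relying on it. A minimal sketch, with a hypothetical helper name `find_libstdcxx_dirs`:

```shell
#!/usr/bin/env bash
# Print each directory under $1 that contains a libstdc++.so.6* file.
find_libstdcxx_dirs() {
    find "$1" -name 'libstdc++.so.6*' -exec dirname {} \; | sort -u
}

# Run from inside custom_libs/ after unpacking.
find_libstdcxx_dirs .
```

If the directory it prints is not one of the entries on the rpath set in Step 2, the dynamic loader may not pick up this copy; whether that matters depends on your setup.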

Step 2: Use patchelf to make your python install use these userspace libraries instead of the system defaults

Credit: StackOverflow answer by Evalds Urtans

Step 2a: Install patchelf into current conda env

# Be sure correct environment is active
$ source activate py37_torch

# Install patchelf
(py37_torch) $ conda install patchelf -c conda-forge --yes

Step 2b: Use attached script to alter the conda env's python executable to use the custom GLIBC libraries

(py37_torch) $ bash rewrite_python_exe_glibc_with_patchelf.sh
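After the script has run, you can check what it changed. The sketch below uses patchelf's read-only `--print-interpreter` and `--print-rpath` options. The helper `rpath_contains` and the `custom_lib_dir` path are placeholders invented for this example, so adjust the path to wherever you unpacked the libraries.

```shell
#!/usr/bin/env bash
# Return success if the colon-separated list $1 contains directory $2.
# Repeated slashes are squeezed first, so "/a//lib" matches "/a/lib".
rpath_contains() {
    local list dir
    list="$(printf '%s' "$1" | tr -s '/')"
    dir="$(printf '%s' "$2" | tr -s '/')"
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

# ASSUMPTION: placeholder path; use the directory you unpacked libc6 into.
custom_lib_dir="/path/to/conda/envs/py37_torch/custom_libs/lib/x86_64-linux-gnu"

python_exe="$(command -v python)"
if [ -n "$python_exe" ] && command -v patchelf >/dev/null 2>&1; then
    # Both --print-* options only read the binary; nothing is modified.
    echo "interpreter: $(patchelf --print-interpreter "$python_exe")"
    rpath="$(patchelf --print-rpath "$python_exe")"
    echo "rpath:       $rpath"
    if rpath_contains "$rpath" "$custom_lib_dir"; then
        echo "OK: custom glibc directory is on the rpath"
    else
        echo "WARNING: $custom_lib_dir is not on the rpath"
    fi
fi
```

The interpreter line should point at the ld-2.23.so inside custom_libs rather than the system loader.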

-- DEPRECATED --

Step 1a: (OLD FOR PYTHON 2.7) Download and unpack some pre-compiled GLIBC shared libraries

$ # Make a folder within the environment to hold useful things
$ mkdir -p /path/to/conda/envs/py27_torch1.0/custom_libs/
$ cd /path/to/conda/envs/py27_torch1.0/custom_libs/

$ # Get libc files (URL verified by MCH on 2019/02/18)
$ wget https://launchpadlibrarian.net/137699828/libc6_2.17-0ubuntu5_amd64.deb
$ wget https://launchpadlibrarian.net/137699829/libc6-dev_2.17-0ubuntu5_amd64.deb

$ # Unpack files into current directory (will create usr/ and lib/ and lib64/ folders)
$ ar p libc6_2.17-0ubuntu5_amd64.deb data.tar.gz | tar zx
$ ar p libc6-dev_2.17-0ubuntu5_amd64.deb data.tar.gz | tar zx

rewrite_python_exe_glibc_with_patchelf.sh

#!/usr/bin/env bash
# Patch the active conda env's python binary so it loads the userspace glibc.

# TODO edit this line to specify location of new glibc
export GLIBC_PATH=/cluster/tufts/hugheslab/miniconda2/envs/ape/custom_libs/
export GLIBC_LD_PATH=$GLIBC_PATH/lib/x86_64-linux-gnu/ld-2.23.so
if [[ ! -f "$GLIBC_LD_PATH" ]]; then
    echo "ERROR: Provided GLIBC_LD_PATH not valid"
    exit 1
fi

echo "OVERWRITING PYTHON EXECUTABLE:"
python_exe=$(which python)
echo "$python_exe"

# Refuse to touch a python that does not live inside a conda env
IS_CONDA_ENV=$(python -c "print('$python_exe'.count('/envs/') > 0)")
echo "IS_CONDA_ENV: $IS_CONDA_ENV"
# Compare as strings; -ne would treat both sides as integers
if [[ "$IS_CONDA_ENV" != 'True' ]]; then
    echo "ERROR: Current python executable not in conda env. Will not alter to avoid problems."
    exit 1
fi
CONDA_ENV_LIB=$(python -c "print('$python_exe'.replace('/bin/python', '/lib'))")

echo "CREATING BACKUP PYTHON"
# Append the suffix to the full path so only the file name changes
python_tmp_exe="${python_exe}_backup"
cp "$python_exe" "$python_tmp_exe"
echo "$python_tmp_exe"

rpath=$GLIBC_PATH/lib/x86_64-linux-gnu:$CONDA_ENV_LIB:/usr/lib64:/lib64:/lib
echo "CALLING PATCHELF on 'python' binary"
patchelf --set-interpreter "$GLIBC_LD_PATH" --set-rpath "$rpath" "$python_exe"
echo "DONE! patchelf complete"

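
The script above keeps a copy of the original binary as python_backup next to the patched python. If the patched interpreter misbehaves, a sketch like the following could put the original back. The function name `restore_python_backup` is hypothetical, and it assumes the backup sits at the path of the executable plus `_backup`.

```shell
#!/usr/bin/env bash
# Restore <exe> from <exe>_backup, if that backup exists.
restore_python_backup() {
    local exe="$1" backup="${1}_backup"
    if [ ! -f "$backup" ]; then
        echo "No backup found at $backup" >&2
        return 1
    fi
    # Copy to a temporary name, then rename over the target, so the swap
    # also works while the patched binary is currently running.
    cp -p "$backup" "${exe}.restore_tmp" && mv "${exe}.restore_tmp" "$exe"
}
```

For example, `restore_python_backup /path/to/conda/envs/py37_torch/bin/python` (placeholder path) would undo the patch while leaving the backup in place.
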
@hzcheney

@LitMSCTBB I think you should downgrade your python to 3.6 and try again.

@ZhuofanShen

Hi, I just tried your solution to modify my anaconda environment on the university cluster. Things went well, but after running patchelf, PyTorch can no longer recognize the CUDA driver.
import torch
print(torch.cuda.device_count()) # --> 0
print(torch.cuda.is_available()) # --> False
print(torch.version.cuda) # --> 11.3
Do you have any idea what is happening? Thank you.

@KleinWang

Hi, thanks for your method. It almost works, but I get the following error. Do you have any idea how to fix it? Thank you very much.

(SG) klein@fawn:GPU$ python -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 198, in <module>
    _load_global_deps()
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 151, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib64/librt.so.1: symbol __vdso_clock_gettime, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

@KleinWang

If I import torch twice in a Jupyter notebook, the second import succeeds. However, torch.nn.NLLLoss() then fails with another error.

(SG) klein@fawn:GPU$ python
Python 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 198, in <module>
    _load_global_deps()
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 151, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib64/librt.so.1: symbol __vdso_clock_gettime, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
>>> import torch
>>> torch.nn.NLLLoss()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/nn/modules/loss.py", line 207, in __init__
    super(NLLLoss, self).__init__(weight, size_average, reduce, reduction)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/nn/modules/loss.py", line 26, in __init__
    self.register_buffer('weight', weight)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/nn/modules/module.py", line 308, in register_buffer
    elif not isinstance(name, torch._six.string_classes):
AttributeError: module 'torch' has no attribute '_six'

@zouguangxian

Based on this gist, I succeeded in installing Python 3.8 with conda on CentOS 6. GLIBC 2.17 and patchelf were compiled from source. rewrite_python_exe_glibc_with_patchelf.sh was rewritten as a one-line command.

https://gist.github.com/zouguangxian/31856f63fe2ac1bad11f404728dfb305

@zwben

zwben commented Jul 28, 2022

Hi, thanks for your method. It almost works but I get the bug as follows. Do you have any idea to fix it? Thank you very much

(SG) klein@fawn:GPU$ python -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 198, in <module>
    _load_global_deps()
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 151, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib64/librt.so.1: symbol __vdso_clock_gettime, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

I got the same problem. I installed the latest pytorch (1.12.0) which requires glibc > 2.27. I installed glibc 2.31 and libstdc++6-9.3.1, but got this problem when I import torch.
