@michaelchughes
Last active May 30, 2024 01:16
Fixes for GLIBC errors when installing tensorflow or pytorch on older Red Hat or CentOS cluster environments

Goal

Install a working tensorflow or pytorch via the standard conda environment workflow.

Basic Setup: Install tensorflow and pytorch in a fresh conda environment

The recommended conda-based install process works smoothly:

$ # Create a fresh environment
$ conda create --name py37_torch python=3.7 --yes

$ # Activate new environment
$ source activate py37_torch

$ # Install tensorflow
$ conda install tensorflow --yes

$ # Install pytorch 
$ conda install pytorch-cpu torchvision-cpu -c pytorch --yes

Roadblock

The gotcha is that when we then try to use the package we just installed, we get a GLIBC error like this:

$ python -c "import torch"
ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by .../site-packages/torch/lib/libshm.so)

Badness! The system GLIBC is older than the version 2.14 that torch requires. On a cluster computing system you often don't have root access, so you can't easily upgrade the system GLIBC.
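
To confirm the diagnosis before downloading anything, it can help to compare the system's glibc version with the one named in the error. The following is a minimal sketch and not part of the original instructions: `version_ge` and `check_glibc` are hypothetical helper names, and GNU `sort -V` and `getconf` are assumed to be available.

```shell
# Succeeds when version $1 is at least version $2 (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether glibc version $2 satisfies requirement $1.
# When $2 is omitted, ask the running system ("glibc 2.12" -> "2.12").
check_glibc() {
    required=$1
    have=${2:-$(getconf GNU_LIBC_VERSION | awk '{print $2}')}
    if version_ge "$have" "$required"; then
        echo "glibc $have satisfies GLIBC_$required"
    else
        echo "glibc $have is too old for GLIBC_$required"
        return 1
    fi
}

# The situation described above: the system has 2.12, torch needs 2.14.
check_glibc 2.14 2.12 || true
```

Calling `check_glibc 2.14` with no second argument checks the machine you are logged in to.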

Step 1: Install recent copies of glibc and libstdc++ in userspace

Credit: StackOverflow answer by Theo T.

Step 1a: (NEW FOR PYTHON 3.7) Download and unpack some pre-compiled GLIBC shared libraries

These files are for Python 3.7 (they work for 3.6 too). See the older list for Python 2.7 at the bottom of this doc.

$ # Make a folder within the environment to hold useful things
$ mkdir -p /path/to/conda/envs/py37_torch/custom_libs/
$ cd /path/to/conda/envs/py37_torch/custom_libs/

$ # Get libc files (URL verified by MCH on 2019/08/21)
$ wget http://mirrors.kernel.org/ubuntu/pool/main/g/glibc/libc6_2.23-0ubuntu10_amd64.deb
$ wget http://mirrors.kernel.org/ubuntu/pool/main/g/glibc/libc6-dev_2.23-0ubuntu10_amd64.deb

$ # Unpack files into current directory (will create usr/ and lib/ and lib64/ folders)
$ ar p libc6_2.23-0ubuntu10_amd64.deb data.tar.xz | tar xvJ
$ ar p libc6-dev_2.23-0ubuntu10_amd64.deb data.tar.xz | tar xvJ

What have we accomplished? You should have some new folders in your current directory, labeled usr/ and lib/ and lib64/.

We can verify that before, we had an OLD libc, and now we have a shiny new one!

OLD version of libc.so.6 in the system location (the path named in the ImportError)

$ strings /lib64/libc.so.6 | grep GLIBC_2 | tail -n3
GLIBC_2.10
GLIBC_2.11
GLIBC_2.12

NEW version of libc.so.6 in working directory

$ strings lib/x86_64-linux-gnu/libc.so.6 | grep GLIBC_2 | tail -n3
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
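
Note that `tail -n3` relies on the order in which the version strings happen to appear in the file. A slightly more robust variant, sketched here as an assumption rather than as part of the original instructions, extracts every `GLIBC_x.y` tag and version-sorts the result. `max_glibc_symbol` is a hypothetical helper name, and GNU `sort -V` is assumed.

```shell
# Read text on stdin and print the highest GLIBC_<version> tag found in it.
max_glibc_symbol() {
    grep -oE 'GLIBC_[0-9]+(\.[0-9]+)+' | sort -u -V | tail -n 1
}

# Demo on a few tags in scrambled order; 2.9 must not outrank 2.23.
printf 'GLIBC_2.23\nGLIBC_2.2.5\nGLIBC_PRIVATE\nGLIBC_2.9\n' | max_glibc_symbol
```

On a real file the call would be `strings lib/x86_64-linux-gnu/libc.so.6 | max_glibc_symbol`, which should agree with the last line of the listing above.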

Step 1b: Download and unpack some pre-compiled LIBSTDC++ shared libraries

$ # Get libstdc++ (URL verified by MCH on 2019/02/18)
$ wget ftp://195.220.108.108/linux/mageia/distrib/4/x86_64/media/core/updates/libstdc++6-4.8.2-3.2.mga4.x86_64.rpm

$ # Alternative URL:
$ # wget http://ftp.riken.jp/Linux/scientific/6.0/x86_64/os/Packages/libstdc++-4.4.4-13.el6.x86_64.rpm

$ # Unpack into current directory (will add content to lib/ and lib64/ folders)
$ rpm2cpio libstdc++6-4.8.2-3.2.mga4.x86_64.rpm | cpio -idmv
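
Before patching anything, the unpacked loader can be tried directly: glibc's dynamic loader accepts a `--library-path` option followed by the program to run. The sketch below is an illustration under stated assumptions, not part of the original gist. `run_with_glibc` is a hypothetical helper, the loader name `ld-2.23.so` matches the Ubuntu package downloaded above, and the search path is only a guess that may need more directories (for example the conda env's lib/ folder) on your system.

```shell
# Run a program through the userspace loader without modifying any binary.
# $1: the custom_libs folder; remaining args: full path to the program, then its arguments.
run_with_glibc() {
    glibc_root=$1
    shift
    libdir="$glibc_root/lib/x86_64-linux-gnu"
    loader="$libdir/ld-2.23.so"
    if [ ! -x "$loader" ]; then
        echo "ERROR: loader not found at $loader" >&2
        return 1
    fi
    # New glibc first, then the system directories for everything else.
    "$loader" --library-path "$libdir:/usr/lib64:/lib64:/lib" "$@"
}

# Example (paths are placeholders):
#   run_with_glibc /path/to/conda/envs/py37_torch/custom_libs "$(command -v python)" -c "import torch"
```

The loader needs the full path of the program, which is why the example resolves `python` with `command -v` first.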

Step 2: Use patchelf to make your python install use these userspace libraries instead of the system defaults

Credit: StackOverflow answer by Evalds Urtans

Step 2a: Install patchelf into current conda env

$ # Be sure the correct environment is active
$ source activate py37_torch

(py37_torch) $ # Install patchelf
(py37_torch) $ conda install patchelf -c conda-forge --yes

Step 2b: Use attached script to alter the conda env's python executable to use the custom GLIBC libraries

(py37_torch) $ bash rewrite_python_exe_glibc_with_patchelf.sh

-- DEPRECATED --

Step 1a: (OLD FOR PYTHON 2.7) Download and unpack some pre-compiled GLIBC shared libraries

$ # Make a folder within the environment to hold useful things
$ mkdir -p /path/to/conda/envs/py27_torch1.0/custom_libs/
$ cd /path/to/conda/envs/py27_torch1.0/custom_libs/

$ # Get libc files (URL verified by MCH on 2019/02/18)
$ wget https://launchpadlibrarian.net/137699828/libc6_2.17-0ubuntu5_amd64.deb
$ wget https://launchpadlibrarian.net/137699829/libc6-dev_2.17-0ubuntu5_amd64.deb

$ # Unpack files into current directory (will create usr/ and lib/ and lib64/ folders)
$ ar p libc6_2.17-0ubuntu5_amd64.deb data.tar.gz | tar zx
$ ar p libc6-dev_2.17-0ubuntu5_amd64.deb data.tar.gz | tar zx

rewrite_python_exe_glibc_with_patchelf.sh

#!/usr/bin/env bash
# Patch the python binary of the active conda env so that it loads a
# userspace glibc instead of the system one.

# TODO edit this line to specify location of new glibc
export GLIBC_PATH=/cluster/tufts/hugheslab/miniconda2/envs/ape/custom_libs
export GLIBC_LD_PATH=$GLIBC_PATH/lib/x86_64-linux-gnu/ld-2.23.so
if [[ ! -f "$GLIBC_LD_PATH" ]]; then
    echo "ERROR: Provided GLIBC_LD_PATH not valid"
    exit 1
fi

echo "OVERWRITING PYTHON EXECUTABLE:"
python_exe=$(which python)
echo "$python_exe"

# Refuse to touch a python that does not live inside a conda env
IS_CONDA_ENV=$(python -c "print('$python_exe'.count('/envs/') > 0)")
echo "IS_CONDA_ENV: $IS_CONDA_ENV"
# Compare as strings; -ne would be an arithmetic test
if [[ "$IS_CONDA_ENV" != 'True' ]]; then
    echo "ERROR: Current python executable not in conda env. Will not alter to avoid problems."
    exit 1
fi
CONDA_ENV_LIB=$(python -c "print('$python_exe'.replace('/bin/python', '/lib'))")

echo "CREATING BACKUP PYTHON"
# Append the suffix so that only the file name changes, not other path parts
python_tmp_exe=${python_exe}_backup
cp "$python_exe" "$python_tmp_exe"
echo "$python_tmp_exe"

rpath=$GLIBC_PATH/lib/x86_64-linux-gnu:$CONDA_ENV_LIB:/usr/lib64:/lib64:/lib
echo "CALLING PATCHELF on 'python' binary"
if ! patchelf --set-interpreter "$GLIBC_LD_PATH" --set-rpath "$rpath" "$python_exe"; then
    echo "ERROR: patchelf failed"
    exit 1
fi
echo "DONE! patchelf complete"

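
The script can also fail because `patchelf` itself cannot start: the comments below report `GLIBCXX_3.4.20' not found (required by patchelf)`, which means the system libstdc++ is older than the one patchelf was built against. One possible workaround, offered as a sketch only, is to look for a newer `libstdc++.so.6` and put its directory on `LD_LIBRARY_PATH` just for the patchelf call. `find_libstdcxx_with` is a hypothetical helper, and whether `$CONDA_PREFIX/lib` actually ships a new enough libstdc++ on your system is an assumption.

```shell
# Print the first directory in the argument list whose libstdc++.so.6
# contains the symbol-version string $1; return 1 if none does.
find_libstdcxx_with() {
    want=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/libstdc++.so.6" ] && grep -q -F "$want" "$dir/libstdc++.so.6"; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Example (directories are guesses; adjust for your system):
#   dir=$(find_libstdcxx_with GLIBCXX_3.4.21 "$CONDA_PREFIX/lib" /usr/lib64) \
#       && LD_LIBRARY_PATH="$dir" patchelf --version
```

If `patchelf --version` runs cleanly this way, the same `LD_LIBRARY_PATH` prefix can be tried on the script itself.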
@geblanco

geblanco commented Apr 3, 2020

This was extremely useful; it let us get torch up and running on an old CentOS 6 system. However, we found various issues:

  • Your instructions are mixed with the StackOverflow answer by Theo T.: you download libc_2.23 but run the ar p command for libc_2.17. It's just a matter of changing the command to reflect the correct libs (the same happens with libstdc++). The same goes for ld_2.17 in the rewrite script.

When installing 2.23, for some reason, patching the python binary caused pip & network related stuff to break, yielding an environment unable to install more packages, but with the already installed ones running. We found the following solutions:

  • Reinstalling python and pip (conda install -c anaconda pip python==3.7), installing desired packages (pip install ...), and patching elf again, ./rewrite_python... step.
  • Installing libc_2.17 instead of libc_2.23, as in StackOverflow answer by Theo T., that, for some reason, doesn't break network, resulting in a functional, but patched environment.

Anyway, thank you very much for this gist, it has saved my day!

@michaelchughes
Author

michaelchughes commented May 23, 2020

This has been updated to fix the version inconsistencies mentioned above.

Yes I agree this might break some pip stuff as is, so you can revert to the backup python as needed, then repatch.
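
Reverting amounts to copying the backup over the patched binary. A minimal sketch, assuming the backup was written as `python_backup` next to the patched `python`, as the script above does (`restore_python_backup` is a hypothetical helper name):

```shell
# Copy <bindir>/python_backup over the patched binary $1.
restore_python_backup() {
    exe=$1
    backup="$(dirname "$exe")/python_backup"
    if [ ! -f "$backup" ]; then
        echo "ERROR: no backup found at $backup" >&2
        return 1
    fi
    cp "$backup" "$exe"
}

# Example: restore, install packages with pip, then patch again.
#   restore_python_backup "$(command -v python)"
#   bash rewrite_python_exe_glibc_with_patchelf.sh
```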

@machineyearning

Hi, I'm getting this error when running the bash script:

CALLING PATCHELF on 'python' binary
patchelf: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by patchelf)
patchelf: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by patchelf)
patchelf: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by patchelf)
DONE! patchelf complete

The output I get from strings lib/x86_64-linux-gnu/libc.so.6 | grep GLIBC_2 | tail -n3 looks correct:

GLIBC_2.18
GLIBC_2.22
GLIBC_2.23

Any idea how to fix this?

@ManifoldFR

I think the links for libc6 packages need to be updated to http://security.ubuntu.com/ubuntu/pool/main/g/glibc/libc6_2.23-0ubuntu11.2_amd64.deb

@XStargate

Hi, I'm getting this error when running the bash script:

CALLING PATCHELF on 'python' binary
patchelf: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by patchelf)
patchelf: /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by patchelf)
patchelf: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by patchelf)
DONE! patchelf complete

The output I get from strings lib/x86_64-linux-gnu/libc.so.6 | grep GLIBC_2 | tail -n3 looks correct:

GLIBC_2.18
GLIBC_2.22
GLIBC_2.23

Any idea how to fix this?

I also met the same problem. Did you fix it?

@Colhodm

Colhodm commented Jan 14, 2021

Bump on the above, I'm having the same issue. Was anybody able to resolve this?

@hzcheney

hzcheney commented Apr 9, 2021

Thanks a lot ! It really helps me !!!
But the linked file mentioned above no longer exists, so I googled it and found some alternative files that may be useful to you guys!

@zhongzisha

For the problem of /usr/lib64/libstdc++.so.6: version `GLIBCXX_*' not found, one possible solution is as follows:
First, compile gcc-5.4.0 from source:

mkdir build
cd build
../configure --prefix=/home/user/gcc-5.4.0-install --enable-languages=c,c++
make -j8
make install

then,

export GCC_ROOT=/home/user/gcc-5.4.0-install
export PATH=$GCC_ROOT/bin:$PATH
export LD_LIBRARY_PATH=$GCC_ROOT/lib64:$GCC_ROOT/lib:$LD_LIBRARY_PATH

This works fine in the current environment: CentOS 6.5 64bit, Anaconda 5.2, patchelf 0.12, glibc 2.17.
The following files are used to solve the GLIBC problem:

libc6_2.17-0ubuntu5_amd64.deb
libc6-dev_2.17-0ubuntu5_amd64.deb
libstdc++-4.4.4-13.el6.x86_64.rpm

@haanamomo

It seems I successfully ran the script, but then I got this error,

segmentation fault  python

Does anyone have any clue?

@jung-youjin

@fangwei18 Any updates? I'm on different OS but facing similar issues. I presume segmentation fault is due to export part.

@ausstein

@fangwei18 @jung-youjin

I fixed this by using an older version of pytorch then rerunning the script.
conda install pytorch==1.4.0 torchvision==0.5.0 cpuonly -c pytorch
conda install -c anaconda pip python==3.7
bash rewrite_python_exe_glibc_with_patchelf.sh

I am not sure if you need to go this far back, but pytorch 1.4 is sufficient for me.
I am on OpenSuse 11 and I am so happy to finally have this running. I hope this helps

@KleinWang

Thank you, it helps me a lot.

@LitMSCTBB

Thanks for the solution; the process ran smoothly. But I'm having an error with opencv-python. Using conda install -c menpo opencv results in this:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: -
Found conflicts! Looking for incompatible packages.                                                                                                                   failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - opencv -> python[version='2.7.*|>=3.7,<3.8.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0|>=2.7,<2.8.0a0']

Your python: python=3.9

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.12=0
  - feature:|@/linux-64::__glibc==2.12=0
  - opencv -> libgcc-ng[version='>=7.2.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.12

Anyone been able to solve this issue?

@hzcheney

@LitMSCTBB I think you should downgrade your python to 3.6 and try again.

@ZhuofanShen

Hi, I just tried your solution to modify my anaconda environment on the university cluster. Things went well, but after running patchelf, the PyTorch package cannot recognize the CUDA driver anymore.
import torch
print(torch.cuda.device_count()) # --> 0
print(torch.cuda.is_available()) # --> False
print(torch.version.cuda) # --> 11.3
Do you have any idea what is happening? Thank you.

@KleinWang

Hi, thanks for your method. It almost works but I get the bug as follows. Do you have any idea to fix it? Thank you very much

(SG) klein@fawn:GPU$ python -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 198, in <module>
    _load_global_deps()
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 151, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib64/librt.so.1: symbol __vdso_clock_gettime, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

@KleinWang

If I import torch twice in the same session, the second import succeeds. However, torch.nn.NLLLoss() then fails:

(SG) klein@fawn:GPU$ python
Python 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 198, in <module>
    _load_global_deps()
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 151, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib64/librt.so.1: symbol __vdso_clock_gettime, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference
>>> import torch
>>> torch.nn.NLLLoss()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/nn/modules/loss.py", line 207, in __init__
    super(NLLLoss, self).__init__(weight, size_average, reduce, reduction)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/nn/modules/loss.py", line 26, in __init__
    self.register_buffer('weight', weight)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/nn/modules/module.py", line 308, in register_buffer
    elif not isinstance(name, torch._six.string_classes):
AttributeError: module 'torch' has no attribute '_six'

@zouguangxian

Based on this gist, I succeeded in installing python 3.8 with conda on CentOS 6. GLIBC 2.17 and patchelf were compiled from source, and rewrite_python_exe_glibc_with_patchelf.sh was rewritten as a one-line command.

https://gist.github.com/zouguangxian/31856f63fe2ac1bad11f404728dfb305

@zwben

zwben commented Jul 28, 2022

Hi, thanks for your method. It almost works but I get the bug as follows. Do you have any idea to fix it? Thank you very much

(SG) klein@fawn:GPU$ python -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 198, in <module>
    _load_global_deps()
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/site-packages/torch/__init__.py", line 151, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/data/L/Brain/klein/anaconda3/envs/SG/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib64/librt.so.1: symbol __vdso_clock_gettime, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

I got the same problem. I installed the latest pytorch (1.12.0) which requires glibc > 2.27. I installed glibc 2.31 and libstdc++6-9.3.1, but got this problem when I import torch.
