Source: #clusters by Olexa Bilanuik
Important Environment Variables
PATH: Ordered, colon (:)-separated list of directories. List of directories searched by the shell for executables to execute.
LD_LIBRARY_PATH: Ordered, colon (:)-separated list of directories. Contributes to, but is not the only, list of directories searched by the dynamic linker for libraries to load.
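Since both variables are just ordered, colon-separated strings, they are easy to inspect from a script. The following is a minimal sketch; the helper name split_path_variable is made up for illustration and is not part of any library.

```python
import os


def split_path_variable(name, environ=None):
    """Return the directories listed in a colon-separated variable, in search order.

    `environ` defaults to the real environment; passing a dict makes the
    helper easy to try out without touching os.environ.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "")
    # An unset or empty variable means "no extra directories".
    return value.split(":") if value else []


if __name__ == "__main__":
    for variable in ("PATH", "LD_LIBRARY_PATH"):
        print(variable)
        for directory in split_path_variable(variable):
            print("   ", directory)
```

Order matters: the first directory listed is the first one searched.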
Execution of a Program in a Shell
- If you execute a command cmd arg arg arg... in a shell, your shell will:
  - Search the directories on the PATH for the first such directory that contains a file or symlink by the name of cmd,
  - Execute it, passing the given arguments arg arg arg... .
- Typical problems at this stage:
  - bash: /usr/bin/blob: No such file or directory (Your cmd is a path to somewhere that doesn't exist)
  - bash: blob: command not found (Your cmd could not be found in any directory on the PATH)
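The lookup described above can be imitated in a few lines of Python. This is only a sketch of the idea, not what bash actually runs: find_on_path is a hypothetical helper, and it assumes that "executable" means a regular file (or symlink to one) with the execute permission set.

```python
import os


def find_on_path(cmd, path):
    """Return the first executable called `cmd` found on `path`, or None."""
    if "/" in cmd:
        # A command containing a slash is treated as a path; PATH is not consulted.
        ok = os.path.isfile(cmd) and os.access(cmd, os.X_OK)
        return cmd if ok else None
    for directory in path.split(":"):
        # An empty entry conventionally stands for the current directory.
        candidate = os.path.join(directory or ".", cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # first match wins; later directories are never looked at
    return None


if __name__ == "__main__":
    print(find_on_path("python", os.environ.get("PATH", "")))
```

A None result for a bare name corresponds to "command not found"; a None result for a name containing a slash corresponds to "No such file or directory". For everyday use, the standard library's shutil.which(cmd) already performs this kind of search.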
Library Loading
Any time a library is dynamically requested by name, such as
- at program startup, or
- when a Python module with native extensions is import-ed,

a search is begun for that library. If a library by that name is already loaded, that is what is used. Otherwise, a procedure is followed to search the filesystem, and a handle to it is returned. This uses LD_LIBRARY_PATH, but that's not the whole story!!!! This process may be recursive if the loaded object has its own dependencies.
Typical problems at this stage:
- cmd: error while loading shared libraries: libwhatever.so.1: cannot open shared object file: No such file or directory (The cmd or shared library could not find libwhatever.so.1 after the search procedure).
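One way to spot this kind of failure before running anything is to scan the output of ldd for dependencies it reports as "not found". The sketch below assumes the usual glibc ldd output format ("name => path (address)" or "name => not found"); the function names are made up for illustration.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.startswith("not found"):
            missing.append(name)
    return missing


def missing_libraries_of(path):
    """Run ldd on an executable or shared library and list its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)
```

Calling missing_libraries_of("/path/to/cmd") with an empty result means every direct and indirect dependency was located under the current environment; it says nothing yet about whether the libraries found are the right ones.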
Assuming this succeeds, the dynamic linker attempts to link any symbols (functions, global variables, ...) that the loading object requested from the loaded object. Every symbol has a name, and that name may occasionally also carry a version.
Typical problems at this stage:
- Error relocating /usr/lib/libwhatever.so.1: functionname: symbol not found (The library that was loaded doesn't have the symbol functionname in it. Probably because the wrong library was loaded.)
- /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /usr/lib/libstdc++.so.6) (The library you loaded uses symbol versioning (this is almost always libc.so) and its symbols are too old. Probably because the wrong library was loaded, but also possible your libraries really are too old.)
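Both stages (finding the library, then finding a symbol in it) can be poked at from Python with the standard ctypes module. This is a rough sketch that assumes a Linux system; try_load and has_symbol are hypothetical helper names, and the messages you get back come from the dynamic loader, so their exact wording will vary.

```python
import ctypes


def try_load(library_name):
    """Ask the dynamic loader for a library by name.

    Returns (handle, message). The handle is None if the search failed.
    """
    try:
        handle = ctypes.CDLL(library_name)
    except OSError as exc:
        # Stage 1 failure: the search procedure did not turn up a loadable file.
        return None, str(exc)
    return handle, "loaded"


def has_symbol(handle, symbol_name):
    """Stage 2: check whether a loaded library exposes a symbol by that name."""
    # ctypes raises AttributeError for a missing symbol, which hasattr() absorbs.
    return hasattr(handle, symbol_name)


if __name__ == "__main__":
    handle, message = try_load("libwhatever.so.1")
    print("libwhatever.so.1:", message)
    # Passing None asks for a handle on what is already loaded in this process.
    process = ctypes.CDLL(None)
    print("printf present:", has_symbol(process, "printf"))
```

A library that loads but lacks the symbol you expect is a strong hint that the search picked up the wrong copy.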
Python Import
- A failure to find a Python module for import on the sys.path at all is a ModuleNotFoundError.
- A failure to import a Python module that was found, due to (e.g.) a failed shared-library dynamic load/link, is an ImportError.
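The two cases can be told apart in code, as long as the more specific exception is caught first (ModuleNotFoundError is a subclass of ImportError). The helper below is a sketch with a made-up name; note that a ModuleNotFoundError can also come from a module that was found but itself imports something missing, so treat the first branch as a hint rather than proof.

```python
import importlib


def classify_import(module_name):
    """Try to import a module and say which kind of failure, if any, occurred."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Must come first: ModuleNotFoundError is a subclass of ImportError.
        return "not found on sys.path"
    except ImportError as exc:
        # The module was located, but loading it failed (e.g. a native
        # extension whose shared libraries could not be loaded or linked).
        return f"found but failed to import: {exc}"
    return "ok"


if __name__ == "__main__":
    for name in ("json", "surely_not_an_installed_module"):
        print(name, "->", classify_import(name))
```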
Useful Command-Line Tools to Debug These Problems:
- Which file will be selected by BASH for execution can be found by executing which cmd. This matters if you have multiple directories in which python is installed (like the system Python, the lab stack Python, or your own Conda envs).
- Which libraries are requested (and found) by a shared library or executable loaded in isolation is given by ldd /path/to/cmd or ldd path/to/libwhatever.so.
- Which symbols a shared library defines, exposes and requests is given by nm path/to/libwhatever.so (but see also nm's -D flag).
- Detail is given by readelf -d path/to/libwhatever.so (The -a is extreme overkill)

💥 Detailed Library Search Procedure (WARNING: BATSHIT CRAZY) 💥
- The search for libraries uses LD_LIBRARY_PATH, but that's not the only thing that influences the search order.
- The dynamic loader can also use two optional PATHs that are directly embedded within the executable/shared library: RPATH and RUNPATH. That allows a program to "know" ahead-of-time where to find its libraries.
- The difference is that 1) RUNPATH overrides RPATH and 2) RUNPATH is searched after LD_LIBRARY_PATH, not before.
- Anaconda binaries are specially-built to use a RUNPATH that points into the right place relative to all other Anaconda software.
- Compute Canada's compilers are trained to insert a RUNPATH to the correct locations (where libc.so.6 is, for instance) in everything they compile.
- Compute Canada's usual Linux utilities are built with those compilers, so they do find the right libraries and do work.
- Software downloaded off the internet does not have the RUNPATH/RPATH properly set, so it will not find the right libraries and will not work.
- To fix this, Compute Canada has a script called setrpaths.sh that directly patches the binary to use the right RUNPATH.
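To check which embedded paths a given binary actually carries, the output of readelf -d can be scanned for its RPATH and RUNPATH entries. The sketch below assumes the GNU binutils output format (lines such as "(RUNPATH)  Library runpath: [...]"); the function names are hypothetical.

```python
import re
import subprocess

# Matches the dynamic-section lines that readelf -d prints for embedded paths.
_ENTRY = re.compile(r"\((RPATH|RUNPATH)\)\s+Library r(?:un)?path: \[(.*)\]")


def parse_embedded_paths(readelf_output):
    """Extract the RPATH and RUNPATH directory lists from `readelf -d` output."""
    found = {"RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _ENTRY.search(line)
        if match:
            kind, value = match.groups()
            found[kind].extend(d for d in value.split(":") if d)
    return found


def embedded_paths(path):
    """Run readelf -d on an executable or shared library and parse the result."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return parse_embedded_paths(result.stdout)
```

A downloaded binary for which both lists come back empty is relying entirely on LD_LIBRARY_PATH, ld.so.cache and the default directories.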
The Search Procedure:
```
RPATH of the loading object,
    then the RPATH of its loader (unless it has a RUNPATH), ...,
    until the end of the chain, which is either the executable
    or an object loaded by dlopen
Unless executable has RUNPATH:
    RPATH of the executable
LD_LIBRARY_PATH
RUNPATH of the loading object
ld.so.cache
default dirs
```
https://blog.qt.io/blog/2011/10/28/rpath-and-runpath/
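The procedure above can be written down as a small function that produces the ordered list of places consulted. This is a toy model, not the loader's real code: ElfObject and search_order are invented for illustration, ld.so.cache is represented by a placeholder string because it is a lookup table and not a directory, and the default directories are example values. The sketch also assumes that the whole RPATH step is skipped when the loading object itself has a RUNPATH, which is how glibc is understood to behave but is not spelled out in the list above.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ElfObject:
    """Hypothetical stand-in for an executable or shared library."""
    name: str
    rpath: List[str] = field(default_factory=list)
    runpath: List[str] = field(default_factory=list)


def search_order(chain: Sequence[ElfObject],
                 executable: ElfObject,
                 ld_library_path: str = "",
                 default_dirs: Sequence[str] = ("/lib", "/usr/lib")) -> List[str]:
    """Return the places searched, in order, for a library requested by chain[0].

    chain[0] is the loading object, chain[1] is its loader, and so on.
    """
    order: List[str] = []

    def add(dirs):
        # Keep first occurrences only; empty entries are ignored in this sketch.
        for d in dirs:
            if d and d not in order:
                order.append(d)

    loading = chain[0]
    # Assumption: a RUNPATH on the loading object disables the RPATH step.
    if not loading.runpath:
        for obj in chain:
            if not obj.runpath:  # an object's RUNPATH overrides its own RPATH
                add(obj.rpath)
        if not executable.runpath:
            add(executable.rpath)
    add(ld_library_path.split(":"))
    add(loading.runpath)
    add(["<ld.so.cache>"])  # placeholder: a cache lookup, not a directory
    add(default_dirs)
    return order


if __name__ == "__main__":
    app = ElfObject("app", rpath=["/opt/app/lib"])
    foo = ElfObject("libfoo.so", rpath=["/opt/foo/lib"])
    print(search_order([foo, app], app, ld_library_path="/home/me/lib"))
```

Giving the loading object a RUNPATH in this model moves its directories after LD_LIBRARY_PATH and drops every RPATH from the result, which is the practical difference between the two embedded paths.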