As TensorFlow is continuously evolving, it is common to find yourself in a situation where you need multiple versions of TensorFlow to coexist on the same machine. Those versions can be different enough to have different CUDA library dependencies. In this case, you may be tempted to upgrade everything to the latest release, but perhaps some of your solutions are still in production, or there simply are no more holes in your calendar.
In this gist I will cover how to install several CUDA libraries to support different TensorFlow versions. However, there are some red lines you have to respect: the GCC version, which must be the same across installations, and the NVIDIA drivers, which must support every target CUDA version. You can check that information on the TensorFlow website.
The basic idea is to install all the CUDA libraries side by side and rely on the way Linux locates shared libraries at run time, so that each TensorFlow version finds the correct ones. As you may know, LD_LIBRARY_PATH is an environment variable that lists the directories where the dynamic loader looks for shared libraries when running an executable. The order of the directories in this variable matters: if the target library is not found in the first directory, the loader checks the following one, and so on until the library is found or the whole list has been checked.
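To make the first-match rule concrete, here is a small self-contained sketch that emulates how the loader walks the search path. The directory and file names under /tmp are invented for this demonstration only (the real loader also consults its cache, RPATH, etc., but the ordering principle is the same):

```shell
# Two scratch directories, each holding a library with the same name;
# the paths are made up for this demo only.
mkdir -p /tmp/ldpath_demo/cuda-10.0/lib64 /tmp/ldpath_demo/cuda-10.1/lib64
touch /tmp/ldpath_demo/cuda-10.0/lib64/libfoo.so
touch /tmp/ldpath_demo/cuda-10.1/lib64/libfoo.so

search="/tmp/ldpath_demo/cuda-10.0/lib64:/tmp/ldpath_demo/cuda-10.1/lib64"

# Emulate the loader: scan the colon-separated list and stop at the
# first directory that contains the requested library.
IFS=':' read -r -a dirs <<< "$search"
for d in "${dirs[@]}"; do
  if [ -e "$d/libfoo.so" ]; then
    echo "resolved: $d/libfoo.so"
    break
  fi
done
```

Because the cuda-10.0 directory comes first in `search`, its copy of the library wins even though both directories contain one.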
Go to the NVIDIA developer website and download the target CUDA Toolkits. Be careful during the CUDA Toolkit installations, as each one will try to upgrade your NVIDIA drivers. I recommend installing the oldest version first and leaving the most recent one for the end.
By default, CUDA libraries are installed on Linux under /usr/local. Once you have finished installing all the CUDA versions, you will find one directory per version, together with a symbolic link named cuda pointing to the most recently installed CUDA toolkit. In my case, I have installed versions 10.0 and 10.1, so my /usr/local lists the following:
...
cuda -> /usr/local/cuda-10.1
cuda-10.0
cuda-10.1
...
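That cuda symlink is just a convenience, and you can repoint it whenever you want a different default toolkit. The sketch below experiments in a scratch directory so it is safe to run as-is; the /tmp paths are invented for the demo, and on the real system you would operate on /usr/local with sudo:

```shell
# Scratch copy of the /usr/local layout for a safe demonstration.
mkdir -p /tmp/cuda_demo/cuda-10.0 /tmp/cuda_demo/cuda-10.1

# ln -sfn replaces the symlink atomically, without touching the targets.
ln -sfn /tmp/cuda_demo/cuda-10.1 /tmp/cuda_demo/cuda
readlink /tmp/cuda_demo/cuda   # -> /tmp/cuda_demo/cuda-10.1

ln -sfn /tmp/cuda_demo/cuda-10.0 /tmp/cuda_demo/cuda
readlink /tmp/cuda_demo/cuda   # -> /tmp/cuda_demo/cuda-10.0
```

On the real system the equivalent would be `sudo ln -sfn /usr/local/cuda-10.0 /usr/local/cuda`.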
Each CUDA installation must be independent, so you will need to install the corresponding cuDNN library into each one, even if the versions are the same. Proceed to download the matching cuDNN libraries for Linux from the NVIDIA developer website (remember that a login is compulsory). The installation just consists of copying the libraries into the target CUDA subfolders. For instance:
$ tar xvf cudnn-10.0-linux-x64-v6.0.tgz
$ sudo cp -P cuda/lib64/* /usr/local/cuda-10.0/lib64/
$ sudo cp cuda/include/* /usr/local/cuda-10.0/include/
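After copying, it is worth confirming which cuDNN version each CUDA tree actually carries: the version is encoded as preprocessor macros in cudnn.h. The snippet below greps them from a stand-in header written to /tmp so that it can run anywhere; on a real system you would point the grep at /usr/local/cuda-10.0/include/cudnn.h instead (and note that newer cuDNN releases moved these macros to cudnn_version.h):

```shell
# Stand-in header with the version macros, so the snippet is
# self-contained; the version numbers are illustrative only.
cat > /tmp/cudnn_demo.h <<'EOF'
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
EOF

# On a real machine: grep -E ... /usr/local/cuda-10.0/include/cudnn.h
grep -E 'CUDNN_(MAJOR|MINOR|PATCHLEVEL)' /tmp/cudnn_demo.h
```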
Finally, you must configure LD_LIBRARY_PATH to include all the CUDA directories so that the respective libraries can be found. I usually define that environment variable in my shell startup script, "~/.bashrc". So, I recommend modifying it and adding the following lines. For instance:
#nvidia
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/extras/CUPTI/lib64:/usr/local/cuda-10.1/extras/CUPTI/lib64:$LD_LIBRARY_PATH
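A slightly more defensive variant of the exports above (a sketch of my own, not part of the original lines) only prepends directories that actually exist, so a missing toolkit does not leave dead entries in the path. The /tmp layout makes the snippet safe to run as-is; in your real ~/.bashrc you would list the /usr/local/cuda-10.x/lib64 directories instead:

```shell
# Scratch layout: pretend only CUDA 10.0 is installed.
mkdir -p /tmp/cuda_env/cuda-10.0/lib64

LD_LIBRARY_PATH=""
# Iterate in reverse preference order, so the preferred directory
# ends up first in the final colon-separated list.
for d in /tmp/cuda_env/cuda-10.1/lib64 /tmp/cuda_env/cuda-10.0/lib64; do
  if [ -d "$d" ]; then
    LD_LIBRARY_PATH="$d${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  fi
done
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"   # only the existing 10.0 directory appears
```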
Remember to refresh the dynamic linker's cache using the following command:
sudo ldconfig
**Possible errors:**
It is likely that you will hit version-mismatch problems with the NVIDIA driver modules. In that case it is necessary to reload the NVIDIA kernel modules, since the system must refresh the initramfs and reload them. You can simply reboot the machine, but maybe you are working on a server and that option is not on the table. If you cannot restart the machine, you must unload and reload the modules manually. This problem is solved on Stack Overflow; I include the solution here to keep the gist more self-contained.
List the NVIDIA driver modules loaded by the system:
lsmod | grep nvidia
Typical modules you may get:
nvidia_uvm 634880 8
nvidia_drm 53248 0
nvidia_modeset 790528 1 nvidia_drm
nvidia 12312576 86 nvidia_modeset,nvidia_uvm
Now, unload the modules. The goal is to have all of them unloaded, especially nvidia:
sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm
sudo rmmod nvidia
If you have problems unloading some of the modules because they are in use, you must kill the processes using them. The error message you may get is: "rmmod: ERROR: Module nvidia is in use". So, first list the offending processes:
sudo lsof /dev/nvidia*
And kill all the processes found. If you succeed, the output must be empty when typing:
lsmod | grep nvidia
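Extracting the PID column from that lsof listing can be scripted. The sketch below parses a captured sample (all process names and PIDs are fabricated for the demo); on a real machine you would pipe in `sudo lsof /dev/nvidia*` instead, and `sudo lsof -t /dev/nvidia*` prints the bare PIDs directly:

```shell
# Captured sample of lsof output (fabricated values for the demo);
# replace the variable with the real command's output on your machine.
sample='COMMAND  PID  USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
python  4321 alice mem  CHR  195,0          444 /dev/nvidia0
python  4321 alice 11u  CHR  195,0          444 /dev/nvidia0
Xorg    1234 root  14u  CHR  195,0          444 /dev/nvidia0'

# Skip the header line, take the PID column, deduplicate.
echo "$sample" | awk 'NR > 1 { print $2 }' | sort -u   # -> 1234 and 4321
```

Each PID in that list can then be killed (e.g. `sudo kill <pid>`), after which the rmmod commands should go through.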
Now, if you run nvidia-smi, it should work flawlessly.