@atifraza
Last active February 22, 2024 21:45
Basics of the conda package manager

Conda basics and best practices

Best practices

  • Don't install additional packages into the base environment
  • Give meaningful names to environments, e.g., PROJECT_NAME-env
  • Always specify package version numbers
  • Install all required packages in one command to reduce later conflicts
  • Install pip in each environment to avoid using the system default version
  • Always version control the environment.yml file
  • Always export environments with the --no-builds flag
  • Creating a new environment is better than updating one, even with --prune
  • Use the --force option to overwrite an existing environment

Creating environments

Environments are created using the conda create command.

conda create --name basic-ml-env python pip

Package versions can be pinned, e.g., as major.minor version numbers:

conda create --name basic-ml-env python=3.6 pip=20.0

By default, environments are created in conda's default environments directory, but an environment can also be created at a specific path. This is useful when the environment should live in a sub-directory, e.g., env, of the main project directory.

conda create --prefix ./env python pip

Activating and deactivating environments

A named environment is activated using:

conda activate basic-ml-env

If the environment is in the env directory of the project, then use:

conda activate ./env

An environment is deactivated using:

conda deactivate

Searching packages

Packages can be searched for using:

conda search scikit-learn

Installing packages into existing environments

Most packages are available in conda repositories and can be installed as:

conda activate basic-ml-env
conda install numpy=1.18 scikit-learn=0.22

If a package is not available in the conda repositories, it can be installed via pip.

conda activate basic-ml-env
pip install combo==0.1.*
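
To check that the pip found on the PATH really belongs to the active environment (and not to the system Python), its location can be compared against CONDA_PREFIX, which conda sets on activation. The following is a minimal sketch; the function name pip_is_from_env is hypothetical, and it assumes a POSIX shell with Unix-style paths.

```shell
# Hypothetical helper: succeeds only if pip lives inside the active conda environment.
# Optional first argument: a pip path to check (defaults to the pip found on PATH).
pip_is_from_env() {
    # CONDA_PREFIX is set by `conda activate`; if it is empty, no environment is active
    [ -n "${CONDA_PREFIX:-}" ] || return 1
    pip_path=${1:-$(command -v pip)}
    case "$pip_path" in
        "$CONDA_PREFIX"/*) return 0 ;;  # pip is located under the environment directory
        *) return 1 ;;                  # pip comes from somewhere else (or was not found)
    esac
}
```

After conda activate basic-ml-env, running pip_is_from_env && pip install combo==0.1.* would only install when the environment's own pip is the one being used.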

Freeze existing packages in environments

When installing new packages into an environment, the --freeze-installed option of the conda install command keeps the already installed packages at their current versions. To stay compatible with them, conda may then install an older version of the requested package instead of updating the pre-installed ones.
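
As an illustration, the option can be wrapped in a small shell function. This is a minimal sketch for a POSIX shell; the function name install_frozen is hypothetical, while --freeze-installed and --name are regular conda install options.

```shell
# Hypothetical wrapper: install packages into a named environment while
# keeping every already installed package at its current version.
# Usage: install_frozen ENV_NAME PACKAGE [PACKAGE ...]
install_frozen() {
    env_name=$1
    shift  # the remaining arguments are the package specs
    conda install --freeze-installed --name "$env_name" "$@"
}
```

For example, install_frozen basic-ml-env scipy=1.3 runs conda install --freeze-installed --name basic-ml-env scipy=1.3.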

Listing existing environments

conda env list

Listing installed packages in an environment

The list of packages for the current environment is obtained as follows:

conda list

while the list of packages for an arbitrary environment is obtained as:

conda list --name basic-ml-env

or

conda list --prefix /path/to/env

Deleting an environment

conda remove --name name-of-env --all

or

conda remove --prefix /path/to/env --all

Creating an environment file

The scaffolding of a conda environment can be defined as a YAML text file. The default file name is environment.yml.

The command conda env create looks for an environment.yml file in the current directory to create an associated environment. If the environment file is saved with a different name than the default, the following command can be used instead.

conda env create --file alt-environment-file-name.yml

The basic structure of an environment.yml file is as follows:

name: machine-learning-env

dependencies:
  - ipython
  - matplotlib
  - pandas
  - python
  - scikit-learn
  - pip
  - pip:
    - tensorflow==1.13.*

The above environment.yml file would create an environment named machine-learning-env. If the environment is intended to be created in the ./env sub-directory, then the name property should be set to null. The following file snippet shows such an example and also includes version numbers for the packages.

name: null

dependencies:
  - ipython=7.13
  - matplotlib=3.1
  - pandas=1.0
  - python=3.6
  - scikit-learn=0.22
  - pip=20.0
  - pip:
    - tensorflow==1.13.*
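
With name set to null, the target directory has to be passed on the command line. A minimal sketch, assuming the file above is saved as environment.yml in the project root; the wrapper name create_project_env is hypothetical.

```shell
# Hypothetical wrapper: create the environment described by an environment
# file inside a project sub-directory instead of under a name.
# Usage: create_project_env [PREFIX] [ENV_FILE]
create_project_env() {
    prefix=${1:-./env}              # defaults to the ./env sub-directory
    env_file=${2:-environment.yml}  # defaults to the standard file name
    conda env create --prefix "$prefix" --file "$env_file"
}
```

Called without arguments, this runs conda env create --prefix ./env --file environment.yml; the resulting environment is then activated with conda activate ./env.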

Exporting an environment

The conda env export command exports the environment details, e.g., the environment name and the list of packages with version and build information. With the --no-builds flag, the exported file specifies only version numbers and omits the build strings. Since build strings are often platform-specific, this makes the environment easier to reproduce on other machines.

conda env export --name basic-ml-env --no-builds
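
The export is written to standard output, so it is usually redirected into environment.yml. The exported text also ends with a machine-specific prefix: line, which is commonly filtered out before the file is committed. A minimal sketch, assuming a POSIX shell with grep; the function name export_env is hypothetical.

```shell
# Hypothetical helper: print a portable export of a named environment.
# Drops build strings (--no-builds) and the machine-specific "prefix:" line.
# Usage: export_env ENV_NAME > environment.yml
export_env() {
    conda env export --name "$1" --no-builds | grep -v '^prefix: '
}
```

For example, export_env basic-ml-env > environment.yml writes a file that is ready to be put under version control.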

An environment can be updated from an environment.yml file as:

conda env update --name basic-ml-env --file environment.yml --prune

The --prune flag removes packages that are no longer required by the environment file.

Creating an environment when another environment with the same name already exists is useful if a fresh environment is required. The --force option overwrites the existing environment.

conda env create --name basic-ml-env --file environment.yml --force
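
The same result can also be reached without --force by deleting the old environment first and then creating it again from the file. A minimal sketch, assuming the environment already exists; the function name recreate_env is hypothetical.

```shell
# Hypothetical helper: rebuild a named environment from scratch.
# Usage: recreate_env ENV_NAME [ENV_FILE]
recreate_env() {
    env_name=$1
    env_file=${2:-environment.yml}
    # --yes skips the interactive confirmation prompt; the environment is
    # only re-created if the removal succeeded
    conda remove --name "$env_name" --all --yes &&
        conda env create --name "$env_name" --file "$env_file"
}
```

For example, recreate_env basic-ml-env removes basic-ml-env and rebuilds it from environment.yml in the current directory.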

Making Jupyter aware of the conda environments

If JupyterLab (or JupyterHub) is installed in the base environment but notebooks should run in a specific conda environment rather than in base, a kernel spec file can be created for that environment. This may require installing the nb_conda_kernels package in the base environment.

conda install jupyterlab nb_conda_kernels

Before creating the kernel spec file, the conda environment should have the ipykernel package installed. The following environment.yml file can be used as a starting point for this purpose.

name: xgboost-env

dependencies:
  - ipykernel=5.3
  - ipython=7.13
  - matplotlib=3.1
  - pandas=1.0
  - python=3.6
  - scikit-learn=0.22
  - xgboost=1.0
  - pip=20.0

Next, the specific conda environment is created as:

conda env create --file environment.yml --force

Now, the environment is activated and the kernel spec file is created as:

conda activate xgboost-env
python -m ipykernel install --user --name xgboost-env --display-name "XGBoost"

The installed kernel specs can be listed, and individual entries removed, using:

jupyter kernelspec list
jupyter kernelspec uninstall xgboost-env  # or: jupyter kernelspec remove xgboost-env

Conda channels

Channels are used for distributing packages. The Anaconda-managed channels are referred to as defaults and, as the name suggests, packages are searched for in and installed from these channels by default.

conda-forge is a popular, community-managed channel that gets updated frequently, while the packages in the defaults channel get updated after extensive quality control, and/or once a release is deemed stable enough. Sometimes, conda-forge also hosts packages that do not make their way into the defaults channel.

A package is installed from a specific channel as follows:

conda install --channel conda-forge --name basic-ml-env scipy=1.3

Multiple channels can be specified in a single command, where the channel specified first has higher priority than the channel specified later on. For example, in the following command, conda-forge has higher priority than bioconda.

conda install --channel conda-forge --channel bioconda scipy=1.3

A package can be searched for in, and installed from, a specific channel as follows:

conda search conda-forge::kaggle
conda install --name basic-ml-env conda-forge::kaggle

The environment.yml file can also provide a list of channels to be used for searching and installing packages, including the channel priority. The order of packages in an environment.yml file does not imply priority but the order of channels does imply priority.

name: deeplearning-env

channels:
  - intel
  - conda-forge
  - defaults

dependencies:
  - ipykernel=5.3
  - ipython=7.13
  - matplotlib=3.1
  - pandas=1.0
  - python=3.6
  - scikit-learn=0.22
  - xgboost=1.0
  - tensorflow-intel=1.13
  - pip=20.0
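
After creating an environment from a file that lists several channels, it can be worth checking which channel each package was actually taken from. A minimal sketch using the --show-channel-urls flag of conda list; the function name list_with_channels is hypothetical.

```shell
# Hypothetical helper: list the packages of a named environment together
# with the channel each one was installed from.
# Usage: list_with_channels ENV_NAME
list_with_channels() {
    conda list --name "$1" --show-channel-urls
}
```

For example, list_with_channels deeplearning-env shows whether a package came from intel, conda-forge, or defaults.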