@jaceklaskowski
Last active June 22, 2023 08:07
Random Python Notes

Random Notes about Python

Day 8. conda-libmamba-solver

https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community

conda install -c conda-forge --solver=libmamba ...

Day 7. Mamba

From the official documentation:

  • Mamba is a fast, robust, and cross-platform package manager.
  • It runs on Windows, OS X and Linux (ARM64 and PPC64LE included) and is fully compatible with conda packages and supports most of conda’s commands.
  • mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions

Day 6. mack

Day 5. Miniconda

miniconda:

  • Powerful and flexible package manager
  • a free minimal installer for conda
  • a small, bootstrap version of Anaconda with only conda, Python, the packages they depend on, and a small number of other useful packages (e.g., pip, zlib)
  • Use conda install to install additional conda packages from the Anaconda repository
  • Miniconda Docker images

From Installation:

The fastest way to obtain conda is to install Miniconda, a mini version of Anaconda that includes only conda and its dependencies.

$ conda
usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.
...

Anaconda

From docker-miniconda:

Anaconda is the leading open data science platform powered by Python. The open source version of Anaconda is a high performance distribution and includes over 100 of the most popular Python packages for data science. Additionally, it provides access to over 720 Python and R packages that can easily be installed using the conda dependency and environment manager, which is included in Anaconda.

conda update

Use conda update to upgrade conda itself:

$ conda update conda
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

TODO

Dask (miniconda)

Conda

$ cd /Users/jacek/dev/sandbox/dask

It's common to name the environments after the project.

$ conda create --help
usage: conda create [-h] [--clone ENV] (-n ENVIRONMENT | -p PATH) [-c CHANNEL] [--use-local] [--override-channels]
                    [--repodata-fn REPODATA_FNS] [--strict-channel-priority] [--no-channel-priority] [--no-deps | --only-deps] [--no-pin]
                    [--copy] [-C] [-k] [--offline] [-d] [--json] [-q] [-v] [-y] [--download-only] [--show-channel-urls] [--file FILE]
                    [--no-default-packages] [--solver {classic} | --experimental-solver {classic}] [--dev]
                    [package_spec ...]

Create a new conda environment from a list of specified packages. To use the newly-created environment, use 'conda activate envname'. This command requires either the -n NAME or -p PREFIX option.
...
Target Environment Specification:
  -n ENVIRONMENT, --name ENVIRONMENT
                        Name of environment.
...
$ conda create -n dask-sandbox dask
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/Caskroom/miniconda/base/envs/dask-sandbox

  added / updated specs:
    - dask


The following NEW packages will be INSTALLED:
...
Proceed ([y]/n)? y


Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate dask-sandbox
#
# To deactivate an active environment, use
#
#     $ conda deactivate

$ conda activate dask-sandbox

FIXME How to conda activate dask-sandbox automatically whenever we cd into the project directory (the way pyenv local dask-sandbox would)?
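Until then, the active environment can at least be detected from Python: conda activate exports the CONDA_DEFAULT_ENV environment variable. A minimal sketch (the "base" fallback is an assumption for illustration, not something conda guarantees):

```python
import os

def active_conda_env(default="base"):
    """Return the name of the active conda environment.

    conda activate exports CONDA_DEFAULT_ENV; when the variable is
    absent (e.g. conda was never activated), fall back to `default`.
    """
    return os.environ.get("CONDA_DEFAULT_ENV", default)

print(active_conda_env())
```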

Let's install a modern Python interface, Jupyter Notebook, into the environment.

$ conda install --help
usage: conda install [-h] [--revision REVISION] [-n ENVIRONMENT | -p PATH] [-c CHANNEL] [--use-local] [--override-channels]
                     [--repodata-fn REPODATA_FNS] [--strict-channel-priority] [--no-channel-priority] [--no-deps | --only-deps]
                     [--no-pin] [--copy] [-C] [-k] [--offline] [-d] [--json] [-q] [-v] [-y] [--download-only] [--show-channel-urls]
                     [--file FILE] [--solver {classic} | --experimental-solver {classic}] [--force-reinstall]
                     [--freeze-installed | --update-deps | -S | --update-all | --update-specs] [-m] [--clobber] [--dev]
                     [package_spec ...]

Installs a list of packages into a specified conda environment.
...
Install the package 'scipy' into the currently-active environment::

    conda install scipy

Install a list of packages into an environment, myenv::

    conda install -n myenv scipy curl wheel

Install a specific version of 'python' into an environment, myenv::

    conda install -p path/to/myenv python=3.7.13
$ conda install -n dask-sandbox jupyter notebook
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/Caskroom/miniconda/base/envs/dask-sandbox

  added / updated specs:
    - jupyter
    - notebook
...
Proceed ([y]/n)? y


Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
$ jupyter notebook

That opens http://localhost:8888/tree.

Followed 10 Minutes to Dask and got my very first Dask app up and running! Yay!

Python Wheels on Databricks

From Itai's #1DatabrickAWeek - week 29:

  1. You can run Python wheel tasks on the Databricks platform
  2. A Python wheel is a way to package a project's components into a single file that can be installed on a target system (similar to JAR files in the JVM world)
  3. With Databricks Jobs, you can now run Python wheel tasks on clusters (similar to running an Apache Spark JAR or a notebook), providing the package name, entry point, and parameters.
  4. You can define these tasks through the UI (Jobs) or through the REST API (Jobs API 2.1).
  5. Deploy Production Pipelines Even Easier With Python Wheel Tasks
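Point 2 above can be seen directly with the standard library: a wheel is just a zip archive with a standardized file name and a *.dist-info metadata directory inside. A minimal sketch that builds and inspects a toy wheel-shaped archive (the package name, version, and entry point here are made up for illustration):

```python
import zipfile

# Build a toy archive shaped like a wheel: wheels are plain zip files
# whose members are the package code plus a *.dist-info directory.
wheel = "demo_pkg-0.1.0-py3-none-any.whl"
with zipfile.ZipFile(wheel, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "def main():\n    print('hello')\n")
    zf.writestr("demo_pkg-0.1.0.dist-info/METADATA",
                "Metadata-Version: 2.1\nName: demo-pkg\nVersion: 0.1.0\n")
    # entry_points.txt is where the entry point a wheel task invokes
    # would be declared
    zf.writestr("demo_pkg-0.1.0.dist-info/entry_points.txt",
                "[console_scripts]\ndemo = demo_pkg:main\n")

# Any zip tool can list the contents of the "single file".
with zipfile.ZipFile(wheel) as zf:
    for name in zf.namelist():
        print(name)
```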

Review

  1. Databricks CLI eXtensions (dbx)
  2. Great Expectations (GX)
  3. poetry
  4. https://github.com/davidhalter/jedi
  5. https://tox.wiki
  6. flake8
  7. Black

Learning Python by Watching Open Source Projects

  1. https://github.com/databricks/databricks-cli

Tools

  1. pyenv lets you easily switch between multiple versions of Python.
  2. pyenv-virtualenv - a pyenv plugin to manage virtual environments created by virtualenv or Anaconda

Commands

pip install git+https://github.com/ibis-project/ibis.git#egg=ibis-framework[pandas,dask,postgres]
python3 -bb -m pytest tests/fugue
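The -bb flag above makes Python raise BytesWarning as an error when bytes and str are compared, so a pytest run fails loudly on accidental bytes-vs-str mixing instead of silently comparing unequal. A small sketch that runs a child interpreter with and without the flag:

```python
import subprocess
import sys

# Comparing bytes with str is always False in Python 3 and usually a bug;
# -b warns about it, and -bb escalates the warning to an exception.
code = 'print(b"x" == "x")'

plain = subprocess.run([sys.executable, "-c", code],
                       capture_output=True, text=True)
strict = subprocess.run([sys.executable, "-bb", "-c", code],
                        capture_output=True, text=True)

print(plain.stdout.strip())   # the comparison quietly evaluates to False
print(strict.returncode)      # non-zero: -bb turned the warning into an error
```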