@jaceklaskowski
Last active June 22, 2023 08:07
Random Python Notes

Random Notes about Python

Day 8. conda-libmamba-solver

https://www.anaconda.com/blog/a-faster-conda-for-a-growing-community

conda install -c conda-forge --solver=libmamba ...

Day 7. Mamba

From the official documentation:

  • Mamba is a fast, robust, and cross-platform package manager.
  • It runs on Windows, OS X and Linux (ARM64 and PPC64LE included) and is fully compatible with conda packages and supports most of conda’s commands.
  • mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions

Day 6. mack

Day 5. Miniconda

miniconda:

  • Powerful and flexible package manager
  • a free minimal installer for conda
  • a small, bootstrap version of Anaconda with only conda, Python, the packages they depend on, and a small number of other useful packages (e.g., pip, zlib)
  • Use conda install to install additional conda packages from the Anaconda repository
  • Miniconda Docker images

From Installation:

The fastest way to obtain conda is to install Miniconda, a mini version of Anaconda that includes only conda and its dependencies.

$ conda
usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.
...

Anaconda

From docker-miniconda:

Anaconda is the leading open data science platform powered by Python. The open source version of Anaconda is a high performance distribution and includes over 100 of the most popular Python packages for data science. Additionally, it provides access to over 720 Python and R packages that can easily be installed using the conda dependency and environment manager, which is included in Anaconda.

conda update

Use conda update to upgrade conda itself:

$ conda update conda
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

TODO

Dask (miniconda)

Conda

$ cd /Users/jacek/dev/sandbox/dask

It's common to name the environments after the project.

$ conda create --help
usage: conda create [-h] [--clone ENV] (-n ENVIRONMENT | -p PATH) [-c CHANNEL] [--use-local] [--override-channels]
                    [--repodata-fn REPODATA_FNS] [--strict-channel-priority] [--no-channel-priority] [--no-deps | --only-deps] [--no-pin]
                    [--copy] [-C] [-k] [--offline] [-d] [--json] [-q] [-v] [-y] [--download-only] [--show-channel-urls] [--file FILE]
                    [--no-default-packages] [--solver {classic} | --experimental-solver {classic}] [--dev]
                    [package_spec ...]

Create a new conda environment from a list of specified packages. To use the newly-created environment, use 'conda activate envname'. This command requires either the -n NAME or -p PREFIX option.
...
Target Environment Specification:
  -n ENVIRONMENT, --name ENVIRONMENT
                        Name of environment.
...
$ conda create -n dask-sandbox dask
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/Caskroom/miniconda/base/envs/dask-sandbox

  added / updated specs:
    - dask


The following NEW packages will be INSTALLED:
...
Proceed ([y]/n)? y


Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate dask-sandbox
#
# To deactivate an active environment, use
#
#     $ conda deactivate

$ conda activate dask-sandbox

FIXME How to conda activate dask-sandbox automatically whenever we cd into the project directory (the way pyenv local dask-sandbox would)?
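Until then, the active environment can at least be detected from Python: conda activate exports the CONDA_DEFAULT_ENV environment variable. A minimal sketch (the "base" fallback is an assumption for illustration, not something conda guarantees):

```python
import os

def active_conda_env(default="base"):
    """Return the name of the active conda environment.

    conda activate exports CONDA_DEFAULT_ENV; when the variable is
    absent (e.g. conda was never activated), fall back to `default`.
    """
    return os.environ.get("CONDA_DEFAULT_ENV", default)

print(active_conda_env())
```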

Let's install a modern Python interface, Jupyter Notebook, into the environment.

$ conda install --help
usage: conda install [-h] [--revision REVISION] [-n ENVIRONMENT | -p PATH] [-c CHANNEL] [--use-local] [--override-channels]
                     [--repodata-fn REPODATA_FNS] [--strict-channel-priority] [--no-channel-priority] [--no-deps | --only-deps]
                     [--no-pin] [--copy] [-C] [-k] [--offline] [-d] [--json] [-q] [-v] [-y] [--download-only] [--show-channel-urls]
                     [--file FILE] [--solver {classic} | --experimental-solver {classic}] [--force-reinstall]
                     [--freeze-installed | --update-deps | -S | --update-all | --update-specs] [-m] [--clobber] [--dev]
                     [package_spec ...]

Installs a list of packages into a specified conda environment.
...
Install the package 'scipy' into the currently-active environment::

    conda install scipy

Install a list of packages into an environment, myenv::

    conda install -n myenv scipy curl wheel

Install a specific version of 'python' into an environment, myenv::

    conda install -p path/to/myenv python=3.7.13
$ conda install -n dask-sandbox jupyter notebook
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/Caskroom/miniconda/base/envs/dask-sandbox

  added / updated specs:
    - jupyter
    - notebook
...
Proceed ([y]/n)? y


Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
$ jupyter notebook

That opens http://localhost:8888/tree.

Followed 10 Minutes to Dask and got my very first Dask app up and running! Yay!

Python Wheels on Databricks

From Itai's #1DatabrickAWeek - week 29:

  1. You can run Python wheel tasks on the Databricks platform
  2. A Python wheel is a way to package a project's components into a single file that can be installed on a target system (similar to JAR files in the JVM world)
  3. With Databricks Jobs, you can now run Python wheel tasks on clusters (similar to running an Apache Spark JAR or a notebook), providing the package name, entry point, and parameters.
  4. You can define these tasks through the UI (Jobs) or through the REST API (Jobs API 2.1).
  5. Deploy Production Pipelines Even Easier With Python Wheel Tasks
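Point 2 above can be seen directly with the standard library: a wheel is just a zip archive with a standardized file name and a *.dist-info metadata directory inside. A minimal sketch that builds and inspects a toy wheel-shaped archive (the package name, version, and entry point here are made up for illustration):

```python
import zipfile

# Build a toy archive shaped like a wheel: wheels are plain zip files
# whose members are the package code plus a *.dist-info directory.
wheel = "demo_pkg-0.1.0-py3-none-any.whl"
with zipfile.ZipFile(wheel, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "def main():\n    print('hello')\n")
    zf.writestr("demo_pkg-0.1.0.dist-info/METADATA",
                "Metadata-Version: 2.1\nName: demo-pkg\nVersion: 0.1.0\n")
    # entry_points.txt is where the entry point a wheel task invokes
    # would be declared
    zf.writestr("demo_pkg-0.1.0.dist-info/entry_points.txt",
                "[console_scripts]\ndemo = demo_pkg:main\n")

# Any zip tool can list the contents of the "single file".
with zipfile.ZipFile(wheel) as zf:
    for name in zf.namelist():
        print(name)
```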

Review

  1. Databricks CLI eXtensions (dbx)
  2. Great Expectations (GX)
  3. poetry
  4. https://github.com/davidhalter/jedi
  5. https://tox.wiki
  6. flake8
  7. Black

Learning Python by Watching Open Source Projects

  1. https://github.com/databricks/databricks-cli

Tools

  1. pyenv lets you easily switch between multiple versions of Python.
  2. pyenv-virtualenv - a pyenv plugin to manage virtual environments created by virtualenv or Anaconda

Commands

pip install git+https://github.com/ibis-project/ibis.git#egg=ibis-framework[pandas,dask,postgres]
python3 -bb -m pytest tests/fugue
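The -bb flag above makes Python raise BytesWarning as an error when bytes and str are compared, so a pytest run fails loudly on accidental bytes-vs-str mixing instead of silently comparing unequal. A small sketch that runs a child interpreter with and without the flag:

```python
import subprocess
import sys

# Comparing bytes with str is always False in Python 3 and usually a bug;
# -b warns about it, and -bb escalates the warning to an exception.
code = 'print(b"x" == "x")'

plain = subprocess.run([sys.executable, "-c", code],
                       capture_output=True, text=True)
strict = subprocess.run([sys.executable, "-bb", "-c", code],
                        capture_output=True, text=True)

print(plain.stdout.strip())   # the comparison quietly evaluates to False
print(strict.returncode)      # non-zero: -bb turned the warning into an error
```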