@phobson
Last active February 17, 2024 05:41

Starting at the beginning:

Let's just assume you're using an Ubuntu-ish distro of Linux. In some ways that makes this a little more complicated, but on the other hand, it lets me assume you have experience with other package managers. So the big thing here is that conda is its own little scientific apt-get (python packages, GIS tools, R + R packages, gcc, etc.) that goes off and builds sandboxes, each contained in its own little room. Then there's pip. Pip is for python packages only and, in my opinion, should only be used when a conda package isn't available.

Back to conda: conda is a package manager that depends on python, but is not per se an installation of python. So:

conda update conda

will update conda itself but, for all intents and purposes, won't touch python.

Now let's go through a workflow. First thing you do is open up a terminal. I'm on my Mac, but the same concepts apply. The first thing to make sure of is that you added conda to your PATH when you installed it. Sounds like you did.

paul$ which conda
/Users/paul/miniconda3/bin/conda  # <--- yup it's in my path

So now:

paul$ conda update conda
Fetching package metadata: ......
# All requested packages already installed.
# packages in environment at /Users/paul/miniconda3:
#
conda                     3.10.1                   py33_0

OK. Conda is up-to-date. Chances are that if you just start using python or pip, you're actually using your system's versions (from apt-get). This is important. Just like you make a package for each R project you do (maybe? I don't really know), you can now make a sandbox (environment) for each project.

Why is this important?

Let's say my boss needs me to read in a bunch of water level data, resample it to hourly data, and blah blah. I create a sandbox with the current versions of all the libraries I need, do the one-off in a jupyter notebook, and forget about it. Two years later, I can come back to that same environment and notebook, and even though all the packages have moved on and fixed bugs and introduced new bugs, I'm confident that, inside the sandbox, I can fully reproduce the analysis since those packages haven't changed there.
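
One way to future-proof that two-years-later workflow even further is to snapshot the sandbox's package list to a file that conda can replay. From inside the activated environment, `conda list --export > spec.txt` writes something like the sketch below (these versions are hypothetical, just in the style of this walkthrough):

```
# spec.txt -- hypothetical output of `conda list --export`
# rebuild the sandbox later with:
#   conda create --name=demotest-redux --file=spec.txt
numpy=1.9.2=py34_0
pandas=0.14.1=np19py34_0
python=3.4.3=0
```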

So let's make an environment:

paul$ conda create --name=demotest ipython pandas=0.14 python=3.4
Fetching package metadata: ......
Solving package specifications: .
Package plan for installation in environment /Users/paul/miniconda3/envs/demotest:

The following NEW packages will be INSTALLED:

    dateutil:   2.4.1-py34_0
    ipython:    3.1.0-py34_0
    numpy:      1.9.2-py34_0
    openssl:    1.0.1k-1
    pandas:     0.14.1-np19py34_0
    pip:        6.1.1-py34_0
    python:     3.4.3-0
    python.app: 1.2-py34_3
    pytz:       2015.2-py34_0
    readline:   6.2-2
    scipy:      0.15.1-np19py34_0
    setuptools: 15.0-py34_0
    six:        1.9.0-py34_0
    sqlite:     3.8.4.1-1
    tk:         8.5.18-0
    xz:         5.0.5-0
    zlib:       1.2.8-0

Proceed ([y]/n)? y

Linking packages ...
[      COMPLETE      ]|####################################################################| 100%
#
# To activate this environment, use:
# $ source activate demotest
#
# To deactivate this environment, use:
# $ source deactivate

So at this point we can't access anything that we installed until we step into the sandbox. You do that with:

paul$ source activate demotest
discarding /Users/paul/miniconda3/bin from PATH
prepending /Users/paul/miniconda3/envs/demotest/bin to PATH
(demotest)paul$

Conda changes the prompt of the terminal to let us know which sandbox we're in. At this point we can install new packages with conda install <package>. Also notice that pip was installed by default, so we can use that too if a package isn't available through conda. I mean, really, we could use pip for everything, but I think it's best to use it only when you can't get something through conda. That said, in our newly activated environment, let's install something using both tools:

(demotest)paul$ conda install seaborn
Fetching package metadata: ......
Solving package specifications: .
Package plan for installation in environment /Users/paul/miniconda3/envs/demotest:

The following NEW packages will be INSTALLED:

    freetype:        2.5.2-0
    libpng:          1.5.13-1
    matplotlib:      1.4.3-np19py34_1
    pyparsing:       2.0.3-py34_0
    python-dateutil: 2.4.2-py34_0
    seaborn:         0.5.1-np19py34_0

The following packages will be UPDATED:

    pandas:          0.14.1-np19py34_0 --> 0.16.0-np19py34_1

Proceed ([y]/n)? y

Unlinking packages ...
[      COMPLETE      ]|####################################################################| 100%
Linking packages ...
[      COMPLETE      ]|####################################################################| 100%
(demotest)paul$

Just like apt-get, conda updated what it needed to satisfy the dependencies.

Now with pip:

(demotest)paul$ pip install coveralls
Collecting coveralls
  Using cached coveralls-0.5.zip
Collecting PyYAML>=3.10 (from coveralls)
  Using cached PyYAML-3.11.tar.gz
Collecting docopt>=0.6.1 (from coveralls)
  Using cached docopt-0.6.2.tar.gz
Collecting coverage<3.999,>=3.6 (from coveralls)
  Using cached coverage-3.7.1.tar.gz
Collecting requests>=1.0.0 (from coveralls)
  Using cached requests-2.6.0-py2.py3-none-any.whl
Installing collected packages: PyYAML, docopt, coverage, requests, coveralls
  Running setup.py install for PyYAML
  Running setup.py install for docopt
  Running setup.py install for coverage
  Running setup.py install for coveralls
Successfully installed PyYAML-3.11 coverage-3.7.1 coveralls-0.5 docopt-0.6.2 requests-2.6.0

It pretty much did the same thing. But the drawback here is that, within the demotest sandbox, conda update will only be able to update the packages that were installed with conda.

And that might be the answer to your question. You see, you can install conda through pip into an existing python installation. In that case, when you're dealing with your existing python, conda didn't install it (apt-get did), so conda can't update it. The way around that is to just create a new environment (sandbox).

Say you make a second environment:

(demotest)paul$ source deactivate  # <--- now we're ignoring everything in demotest
discarding /Users/paul/miniconda3/envs/demotest/bin from PATH
paul$ conda create --name=demo27 python=2.7
Fetching package metadata: ......
Solving package specifications: .
Package plan for installation in environment /Users/paul/miniconda3/envs/demo27:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pip-6.1.1                  |           py27_0         1.4 MB

The following NEW packages will be INSTALLED:

    openssl:    1.0.1k-1
    pip:        6.1.1-py27_0
    python:     2.7.9-1
    readline:   6.2-2
    setuptools: 15.0-py27_0
    sqlite:     3.8.4.1-1
    tk:         8.5.18-0
    zlib:       1.2.8-0

Proceed ([y]/n)? y

Fetching packages ...
pip-6.1.1-py27 100% |#######################################################| Time: 0:00:01 803.14 kB/s
Extracting packages ...
[      COMPLETE      ]|#####################################################| 100%
Linking packages ...
[      COMPLETE      ]|######################################################| 100%
#
# To activate this environment, use:
# $ source activate demo27
#
# To deactivate this environment, use:
# $ source deactivate
#

Activate it:

paul$ source activate demo27
discarding /Users/paul/miniconda3/bin from PATH
prepending /Users/paul/miniconda3/envs/demo27/bin to PATH

And try to use pandas:

(demo27)paul$ python
Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 15 2014, 10:37:34)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pandas
>>> exit()

It fails because this environment knows nothing about the demotest environment we created, which contains pandas.
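
If you'd rather check programmatically whether a package is visible in whatever environment is currently active (instead of triggering the ImportError above), here's a small sketch that works on both python 2 and 3 (the helper name is my own, not part of any library):

```python
def is_installed(name):
    """Return True if `name` is importable in the currently active environment."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False

print(is_installed("sys"))     # True -- always there, it's the standard library
print(is_installed("pandas"))  # True in demotest, False in demo27
```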

So now you want to install a local package from source.

You have two choices: build with conda, or use pip. This is the one place where I recommend pip. The conda build process is complicated because it's actually geared more towards making a binary package to distribute to others via conda (which is great, but I don't think that's what you're trying to do). From your activated environment, and inside the source directory of the package (the one containing the setup.py file), executing pip install . will run python setup.py install on that source and install it into your active environment. If you downloaded the package from e.g., github and simply need to use it, you're done. If you want to modify and develop the package, you can instead run pip install -e .. The -e is for "editable", so source code changes will be reflected every time you restart your python interpreter or manually reload the module (reload(<package>) on python 2; import imp; imp.reload(<package>) on python 3). Here's what that looks like:

(demo27)paul$ cd sources/wqio
(demo27)Paul-Hobsons-iMac:wqio paul$ pip install .
Processing /Users/paul/sources/wqio
Requirement already satisfied (use --upgrade to upgrade): seaborn in /Users/paul/miniconda3/envs/demo27/lib/python2.7/site-packages (from wqio==0.1)
Installing collected packages: wqio
  Running setup.py install for wqio
Successfully installed wqio-0.1
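
That reload behavior is worth seeing once, and you can demonstrate it without conda or pip at all. Here's a self-contained sketch (python 3, using importlib; the module name demo_pkg and its contents are made up for illustration) that fakes an edit to a module installed from source:

```python
# Demo of why an "editable" install needs a reload (or interpreter
# restart) to pick up source changes. We fake the "edit" by rewriting
# a throwaway module on disk.
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep stale .pyc files out of this demo

workdir = tempfile.mkdtemp()
modpath = os.path.join(workdir, "demo_pkg.py")

with open(modpath, "w") as f:
    f.write("VERSION = 'old'\n")

sys.path.insert(0, workdir)
import demo_pkg
print(demo_pkg.VERSION)  # -> old

# "edit" the source, like you would in a pip install -e checkout
with open(modpath, "w") as f:
    f.write("VERSION = 'new'\n")

# the already-imported module doesn't see the change...
print(demo_pkg.VERSION)  # -> still old

# ...until you reload it
importlib.reload(demo_pkg)
print(demo_pkg.VERSION)  # -> new
```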

In your specific example, you can use the `pip install -r requirements.txt` command. But I would open up that file and try to get everything through conda first.

Questions

Locations of envs and source code for local packages

The part I am still slightly confused about is the relationship between the location of the environments you create (~/anaconda/envs/whatever) to your actual project -- so if I've forked and cloned someone's git repository into ~/projects/whatever, do I create an environment in that directory? Do I need to move the project into one of the environments in ~/anaconda/envs/? Or do I just need to activate the environment I want, but work on the project wherever it's located? I think it's the last one, but words like "environment" and "sandbox" make it sound like everything should be all wrapped up in one neat little box, i.e. the same directory.

I can see how the two concepts can seem to be in conflict, but here's why the last scenario is correct:

  • Environments should be completely self-contained and are always created in ~/anaconda/envs
  • Source code should be where you can easily access (and edit) it, i.e., your user directory
  • Packages installed from source might be used in multiple environments, so there should only be one cloned copy of the source
  • Using pip install . to install a local package actually compiles the source code and any extension modules and copies those over to e.g., ~/anaconda/envs/demotest
  • Using pip install -e . creates a link in e.g., ~/anaconda/envs/demotest to the source directory so that the whole modify/test/fix workflow is possible. Since the business end of that link is in the environment's directory, this doesn't break down any walls in our sandbox.
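
For reference, both pip install . and pip install -e . just need a setup.py at the top of the source directory. A minimal, hypothetical one for a package like wqio might look something like this (the metadata here is made up, not wqio's actual setup.py):

```
# setup.py -- minimal, hypothetical example
from setuptools import setup, find_packages

setup(
    name="wqio",
    version="0.1",
    packages=find_packages(),
    install_requires=["seaborn"],  # pip will pull this in automatically
)
```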

So on my machines, I have ~20-30 environments in ~/miniconda3/envs. Then I have a bunch of directories in ~/Documents/work/sources/ with everything I get from Github, or just things I'm working on.

Looking at a small cross-section of those, I have:

  • ~/Documents/work/sources/wqio
  • ~/Documents/work/sources/pybmp

Both are custom python libraries that I'm writing. And both projects are moving very fast, but they're also used by a handful of my colleagues. wqio stands for water quality inflow/outflow and is geared towards analyzing the efficacy of various water quality treatment methods (really, just comparing the distributions of two different datasets, but whatever). pybmp is a little module that pulls data out of the International Stormwater BMP Database and passes it into the data structures defined in wqio.

The development of the two modules is completely independent, but conceptually they are very tightly coupled.

So I have the following environments:

  • ~/miniconda3/envs/wqio34
  • ~/miniconda3/envs/wqio27

In these, I've used pip install -e . so that as I tweak wqio, I can just do:

$ source activate wqio34
(wqio34) $ python -c "import wqio; wqio.test()"
(wqio34) $ source deactivate
$ source activate wqio27
(wqio27) $ python -c "import wqio; wqio.test()"

And run the test suite directly from source code in both environments before pushing up to github and seeing what Travis CI and coveralls have to say about the whole thing.

And then I have:

  • ~/miniconda3/envs/pybmp
  • ~/miniconda3/envs/pybmpdev

The former has the latest "stable" version of wqio (from pip install . some time ago). The latter has the latest source versions of both wqio and pybmp (pip install -e . in both source directories), so that I can run the latest tests in both modules just to make sure the latest of everything all works.
