Let's just assume you're using an Ubuntu-ish distro of Linux. In some ways that makes
this a little more complicated, but on the other hand, it lets me assume you have experience
with other package managers. So the big thing here is that conda is its own little scientific
apt-get (python packages, GIS tools, R + R packages, gcc, etc.) that goes off and builds sandboxes
contained in individual rooms. Then there's pip. Pip is specifically for python packages only and,
in my opinion, should only be used when the conda package isn't available.
Back to conda: conda is a package manager that depends on python, but is not per se an installation of python. So:
conda update conda
will update conda itself but, for all intents and purposes, won't touch python.
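For contrast, updating an actual package looks the same, you just name the package instead of conda. A quick sketch, with numpy purely as an example:
paul$ conda update numpy    # updates numpy (and its dependencies) in the active environment
paul$ conda update --all    # updates everything conda manages in the active environment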
Now let's go through a workflow. The first thing you do is open up a terminal. I'm on my Mac, but the same concepts apply. The first thing to make sure of is that you added conda to your PATH when you installed it. Sounds like you did.
paul$ which conda
/Users/paul/miniconda3/bin/conda # <--- yup it's in my path
So now:
paul$ conda update conda
Fetching package metadata: ......
# All requested packages already installed.
# packages in environment at /Users/paul/miniconda3:
#
conda 3.10.1 py33_0
OK. Conda is up-to-date. Chances are, if you start using python or pip right now, you're actually using your system's version (from apt-get). This is important. Just like you make a package for each R project you do (maybe? I don't really know), you can now make a sandbox (environment) for each project.
Let's say my boss needs me to read in a bunch of water level data, resample it to hourly data, and blah blah. I create a sandbox with the current versions of all the libraries I need, do the one-off in a jupyter notebook, and forget about it. Two years later, I can come back to that same environment and notebook, and even though all the packages have moved on and fixed bugs and introduced new bugs, I'm confident that inside the sandbox I can fully reproduce the analysis, since those packages haven't changed.
paul$ conda create --name=demotest ipython pandas=0.14 python=3.4
Fetching package metadata: ......
Solving package specifications: .
Package plan for installation in environment /Users/paul/miniconda3/envs/demotest:
The following NEW packages will be INSTALLED:
dateutil: 2.4.1-py34_0
ipython: 3.1.0-py34_0
numpy: 1.9.2-py34_0
openssl: 1.0.1k-1
pandas: 0.14.1-np19py34_0
pip: 6.1.1-py34_0
python: 3.4.3-0
python.app: 1.2-py34_3
pytz: 2015.2-py34_0
readline: 6.2-2
scipy: 0.15.1-np19py34_0
setuptools: 15.0-py34_0
six: 1.9.0-py34_0
sqlite: 3.8.4.1-1
tk: 8.5.18-0
xz: 5.0.5-0
zlib: 1.2.8-0
Proceed ([y]/n)? y
Linking packages ...
[ COMPLETE ]|####################################################################| 100%
#
# To activate this environment, use:
# $ source activate demotest
#
# To deactivate this environment, use:
# $ source deactivate
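As an aside, if you want extra insurance for that come-back-in-two-years scenario, you can snapshot an environment to a file and rebuild it later. This is a sketch; on older setups the conda env subcommands live in the separate conda-env package, and environment.yml is just the conventional filename:
paul$ conda env export --name=demotest > environment.yml
paul$ conda env create --file=environment.yml --name=demotest-again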
So at this point we can't access anything that we installed until we step into the sandbox. You do that with:
paul$ source activate demotest
discarding /Users/paul/miniconda3/bin from PATH
prepending /Users/paul/miniconda3/envs/demotest/bin to PATH
(demotest)paul$
Conda changes the prompt of the terminal to let us know which sandbox we're in. At this point we can
install new packages with conda install <package>. Also notice that pip was installed by default, so
we can use that too if a package isn't available through conda. I mean, really, we could use pip for
everything, but I think it's best to use it only when you can't get something through conda. That said,
in our newly activated environment, let's install something using both tools:
(demotest)paul$ conda install seaborn
Fetching package metadata: ......
Solving package specifications: .
Package plan for installation in environment /Users/paul/miniconda3/envs/demotest:
The following NEW packages will be INSTALLED:
freetype: 2.5.2-0
libpng: 1.5.13-1
matplotlib: 1.4.3-np19py34_1
pyparsing: 2.0.3-py34_0
python-dateutil: 2.4.2-py34_0
seaborn: 0.5.1-np19py34_0
The following packages will be UPDATED:
pandas: 0.14.1-np19py34_0 --> 0.16.0-np19py34_1
Proceed ([y]/n)? y
Unlinking packages ...
[ COMPLETE ]|####################################################################| 100%
Linking packages ...
[ COMPLETE ]|####################################################################| 100%
(demotest)paul$
Just like apt-get, conda updated what it needed to in order to satisfy the dependencies.
(demotest)paul$ pip install coveralls
Collecting coveralls
Using cached coveralls-0.5.zip
Collecting PyYAML>=3.10 (from coveralls)
Using cached PyYAML-3.11.tar.gz
Collecting docopt>=0.6.1 (from coveralls)
Using cached docopt-0.6.2.tar.gz
Collecting coverage<3.999,>=3.6 (from coveralls)
Using cached coverage-3.7.1.tar.gz
Collecting requests>=1.0.0 (from coveralls)
Using cached requests-2.6.0-py2.py3-none-any.whl
Installing collected packages: PyYAML, docopt, coverage, requests, coveralls
Running setup.py install for PyYAML
Running setup.py install for docopt
Running setup.py install for coverage
Running setup.py install for coveralls
Successfully installed PyYAML-3.11 coverage-3.7.1 coveralls-0.5 docopt-0.6.2 requests-2.6.0
It pretty much did the same thing. But the drawback here is that, within the demotest sandbox,
conda update will only be able to update the packages that were installed with conda.
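You can see the split for yourself with conda list, which (at least in the conda versions I've used) flags the pip-installed stuff in the build column. Output here is abbreviated and approximate:
(demotest)paul$ conda list
# packages in environment at /Users/paul/miniconda3/envs/demotest:
#
coveralls                 0.5                    <pip>
pandas                    0.16.0            np19py34_1
seaborn                   0.5.1             np19py34_0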
And that might be the answer to your question. You see, you can install conda through pip into an existing python installation. In that case, when you're dealing with your existing python, conda didn't install it (apt-get did) so conda can't update it. The way around that is to just create a new environment (sandbox).
(demotest)paul$ source deactivate # <--- now we're ignoring everything in demotest
discarding /Users/paul/miniconda3/envs/demotest/bin from PATH
paul$ conda create --name=demo27 python=2.7
Fetching package metadata: ......
Solving package specifications: .
Package plan for installation in environment /Users/paul/miniconda3/envs/demo27:
The following packages will be downloaded:
package | build
---------------------------|-----------------
pip-6.1.1 | py27_0 1.4 MB
The following NEW packages will be INSTALLED:
openssl: 1.0.1k-1
pip: 6.1.1-py27_0
python: 2.7.9-1
readline: 6.2-2
setuptools: 15.0-py27_0
sqlite: 3.8.4.1-1
tk: 8.5.18-0
zlib: 1.2.8-0
Proceed ([y]/n)? y
Fetching packages ...
pip-6.1.1-py27 100% |#######################################################| Time: 0:00:01 803.14 kB/s
Extracting packages ...
[ COMPLETE ]|#####################################################| 100%
Linking packages ...
[ COMPLETE ]|######################################################| 100%
#
# To activate this environment, use:
# $ source activate demo27
#
# To deactivate this environment, use:
# $ source deactivate
#
paul$ source activate demo27
discarding /Users/paul/miniconda3/bin from PATH
prepending /Users/paul/miniconda3/envs/demo27/bin to PATH
(demo27)paul$ python
Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 15 2014, 10:37:34)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import pandas
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named pandas
>>> exit()
It fails because this environment knows nothing about the demotest environment we created earlier,
which contains pandas.
As for installing your own package from source, you have two choices: build it with conda or use pip.
This is the one place where I recommend pip. The conda build process is complicated because it's
actually geared more towards making a binary package to distribute to others via conda (which is
great, but I don't think that's what you're trying to do).
From your activated environment and inside the source directory of the package containing the
setup.py file, executing pip install . will run python setup.py install on that source and install
it into your local environment. If you downloaded the package from e.g. github and simply need to use it,
you're done. If you want to modify and develop the package, you can instead run pip install -e .
(the -e is for "editable"), so that source code changes are reflected every time you restart your python
interpreter or manually reload the module (reload(<package>) on python 2; import importlib;
importlib.reload(<package>) on python 3). Here's what that looks like:
(demo27)paul$ cd sources/wqio
(demo27)Paul-Hobsons-iMac:wqio paul$ pip install .
Processing /Users/paul/sources/wqio
Requirement already satisfied (use --upgrade to upgrade): seaborn in /Users/paul/miniconda3/envs/demo27/lib/python2.7/site-packages (from wqio==0.1)
Installing collected packages: wqio
Running setup.py install for wqio
Successfully installed wqio-0.1
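And the editable part of the workflow looks something like this in the interpreter. A sketch using the python 2 builtin reload, since demo27 is a python 2.7 environment:
(demo27)paul$ pip install -e .
(demo27)paul$ python
>>> import wqio
>>> # ...edit the source in ~/sources/wqio in another window...
>>> reload(wqio)  # picks up the edits without restarting the interpreter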
In your specific example, you can use the pip install -r requirements.txt command. But I would open up that file and try to get everything through conda first.
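Conveniently, conda can consume a requirements-style file directly, which makes that triage easier. A sketch; conda's --file flag is real, but it only understands simple name/version pins, not pip-specific syntax like URLs or -e lines:
(demo27)paul$ conda install --file=requirements.txt   # grab whatever conda knows about
(demo27)paul$ pip install -r requirements.txt         # let pip fill in the stragglers
If conda chokes on a package it can't find, pull that line out of the file and leave it for pip.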
The part I am still slightly confused about is the relationship between the location of the environments you create (~/anaconda/envs/whatever) to your actual project -- so if I've forked and cloned someone's git repository into ~/projects/whatever, do I create an environment in that directory? Do I need to move the project into one of the environments in ~/anaconda/envs/? Or do I just need to activate the environment I want, but work on the project wherever it's located? I think it's the last one, but words like "environment" and "sandbox" make it sound like everything should be all wrapped up in one neat little box, i.e. the same directory.
I can see how the two concepts can seem to be in conflict, but here's why the last scenario is correct:
- Environments should be completely self-contained and are always created in ~/anaconda/envs
- Source code should be where you can easily access (and edit) it, i.e., your user directory
- Packages installed from source might be in multiple environments, so there should only be one git clone command
- Using pip install . to install a local package actually compiles the source code and any extension modules and copies those over to e.g., ~/anaconda/envs/demotest
- Using pip install -e . creates a link in e.g., ~/anaconda/envs/demotest to the source directory so that the whole modify/test/fix workflow is possible. Since the business end of that link is in the environment's directory, this doesn't break down any walls in our sandbox. (There's a peek at what that link actually is just below.)
So on my machines, I have ~20 - 30 environments in ~/miniconda3/envs. Then I have a pile of directories in ~/Documents/work/sources/ with everything I get from Github and/or just things I'm working on.
Looking at a small cross-section of those, I have:
- ~/Documents/work/sources/wqio
- ~/Documents/work/sources/pybmp
Both are custom python libraries that I'm writing. And both projects are moving very fast, but they're
also used by a handful of my colleagues. wqio stands for water quality inflow/outflow and is geared
towards analyzing the efficacy of various water quality treatment methods (really, just comparing the
distributions of two different datasets, but whatever). pybmp is a little module that pulls data out
of the International Stormwater BMP Database and passes it into the data structures defined in wqio.
The development of the two modules is completely independent, but conceptually they are very tightly coupled.
So I have the following environments:
- ~/miniconda3/envs/wqio34
- ~/miniconda3/envs/wqio27
In these, I've used pip install -e . so that, as I tweak wqio, I can just do:
$ source activate wqio34
(wqio34) $ python -c "import wqio; wqio.test()"
(wqio34) $ source deactivate
$ source activate wqio27
(wqio27) $ python -c "import wqio; wqio.test()"
And run the test suite directly from source code in both environments before pushing up to github and seeing what Travis CI and coveralls have to say about the whole thing.
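When that two-environment dance gets tedious, a little shell loop does the same thing; a sketch, assuming the bash-style source activate we've been using:
for env in wqio34 wqio27; do
    source activate $env
    python -c "import wqio; wqio.test()"
    source deactivate
done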
And then I have:
- ~/miniconda3/envs/pybmp
- ~/miniconda3/envs/pybmpdev
The former has the latest "stable" version of wqio (from pip install . some time ago). The latter has
the latest source versions of both wqio and pybmp (pip install -e . in both source dirs) so that I can
run the latest tests in both modules, just to make sure the latest of everything all works.