Python development using Pants
brian wickman - @wickman
Why use Pants for Python development?
Pants makes the manipulation and distribution of hermetically sealed Python environments painless.
But why another system?
There are several solutions for package management in Python. Almost
everyone is familiar with running
sudo easy_install PackageXYZ. This
leaves a lot to be desired. Over time, your Python installation will
collect dozens of packages, become annoyingly slow or even broken, and
reinstalling it will invariably break a number of the applications
that you were using.
A marked improvement over the
sudo easy_install model is
virtualenv to isolate Python environments on a
project by project basis. This is useful for development but does not
directly solve any problems related to deployment, whether it be to a production
environment or to your peers. It is also challenging to explain to a
A different solution altogether, zc.buildout attempts to provide a framework and recipes for many common development environments. It has arguably gone the farthest for automating environment reproducibility amongst the popular tools, but shares the same complexity problems as all the other abovementioned solutions.
Most solutions leave deployment as an afterthought. Why not make the development and deployment environments the same by taking the environment along with you?
Pants and PEX
The lingua franca of Pants is the PEX file (PEX itself does not stand for anything in particular, though in spirit you can think of it as a "Python EXecutable".)
PEX files are single-file lightweight virtual Python environments.
The only difference is no virtualenv setup instructions or
pip install foo bar baz. PEX files are self-bootstrapping Python
environments with no strings attached and no side-effects. Just a simple
mechanism that unifies both your development and your deployment.
First it is necessary to install Pants. See installation instructions.
TL;DR - 'Hello world!' with Pants Python
$ git clone git://github.com/twitter/commons $ cd commons $ mkdir -p src/python/twitter/my_project $ vi src/python/twitter/my_project/BUILD
python_binary( name = 'hello_world', source = 'hello_world.py' )
$ vi src/python/twitter/my_project/hello_world.py
To run directly:
$ ./pants py src/python/twitter/my_project:hello_world Build operating on target: PythonBinary(src/python/twitter/my_project/BUILD:hello_world) Hello world!
$ ./pants src/python/twitter/my_project:hello_world Build operating on targets: OrderedSet([PythonBinary(src/python/twitter/my_project/BUILD:hello_world)]) Building PythonBinary PythonBinary(src/python/twitter/my_project/BUILD:hello_world): Wrote /Users/wickman/clients/science-py-csl/dist/hello_world.pex
and run separately:
$ dist/hello_world.pex Hello world!
NOTE: The first time you run
./pants will likely take a ridiculous amount
of time as Pants bootstraps itself inside your directory. Note, it never
installs anything in a global site-packages.
Describing Python environments in Pants
Build dependencies in Pants are managed with
BUILD files that are
co-located with your source. These files are used to describe the following:
- libraries: bundles of sources and resources, that may or may not also depend on other libraries
- binaries: a single source (the executable) and libraries it depends upon
- requirements: external dependencies as resolved by dependency managers e.g. pypi in Python or ivy on the JVM
The main point of Pants is to take these
BUILD files and do something useful with them.
BUILD file format
These descriptions are stored in files named BUILD and colocated near the binaries/libraries they describe. Let's take for example the src/python/twitter/tutorial subtree in commons:
$ ls -lR src/python/twitter/tutorial/ total 16 -rw-r--r-- 1 wickman wheel 137 Apr 9 22:59 BUILD -rw-r--r-- 1 wickman wheel 118 Apr 9 22:59 hello_world.py
Let's take a look at the BUILD file in
python_binary( name = "hello_world", source = "hello_world.py", dependencies = [ pants("src/python/twitter/common/app"), ] )
This BUILD file names one target:
hello_world, which is a
python_binary target. The
contains one source file,
hello_world.py and depends upon one other
target, the format of which will be described shortly.
It should be noted that sources are relative to the location of the BUILD
file itself, e.g.
hello_world.py inside of
src/python/twitter/tutorial/BUILD actually refers to
from twitter.common import app def main(): print('Hello world!') app.main()
Dependencies, on the other hand, are relative to the source root of the repository which is defined
by the BUILD file that sits next to the
# Define the repository layout source_root('src/antlr', doc, page, python_antlr_library) source_root('src/java', annotation_processor, doc, jvm_binary, java_library, page) source_root('src/protobuf', doc, java_protobuf_library, page) source_root('src/python', doc, page, python_binary, python_library) source_root('src/scala', doc, jvm_binary, page, scala_library) source_root('src/thrift', doc, java_thrift_library, page, python_thrift_library) source_root('tests/java', doc, java_library, java_tests, page) source_root('tests/python', doc, page, python_library, python_tests, python_test_suite) source_root('tests/scala', doc, page, scala_library, scala_tests)
This file can be tailored to map to any source root structure such as Maven
style, Twitter style (as described above) or something more flat such as a
setup.py-based project. This however is an advanced topic that is not
covered in this document.
src/python/twitter/tutorial/BUILD, only one target is defined,
hello_world. This target is addressed by
src/python/twitter/tutorial:hello_world which means the target
src/python/twitter/tutorial/BUILD. In general,
targets take the form
<path>:<target name> with the special cases:
- in the case of
BUILDcomponent may be elided and instead
path/to/directory:targetmay be used
path/to/directoryis short form for
src/python/twitter/common/appis short form for
pants('src/python/twitter/common/app') in its
pants() keyword is akin to a "pointer dereference" for an address. It will point
to whatever target is described at that address, in this case a
python_library( name = "app", sources = globs('*.py'), dependencies = [ pants('src/python/twitter/common/dirutil'), pants('src/python/twitter/common/lang'), pants('src/python/twitter/common/options'), pants('src/python/twitter/common/util'), pants('src/python/twitter/common/app/modules'), ] )
which in turn includes even more dependencies. The job of Pants is to manage the transitive closure of all these dependencies and manipulate collections of these targets for you.
Python target types
BUILD files themselves are just Python. The only thing magical is that the
from twitter.pants import * has been autoinjected. This
provides a number of Python-specific targets such as:
and a whole host of other targets including Java, Scala, Python, Markdown,
pants target and so forth. See
src/python/twitter/pants/__init__.py for a comprehensive list of targets.
python_library target has a name, zero or more source files, zero or
more resource files, and zero or more dependencies. These dependencies may
python_library-like targets (
python_antlr_library and so forth) or
python_binary target is almost identical to a
python_library target except instead of
sources, it takes one
of two possible parameters:
source: The source file that should be executed within the "library" otherwise defined by
entry_point: The entry point that should be executed within the "library" otherwise defined by
python_binary. Entry points take the format of
pkg_resources.EntryPoint, which is something akin to
some.module.name:my.attrwhich means run the function pointed by
my.attrinside the module
some.moduleinside the environment. The
:my.attrcomponent can be omitted and the module is executed directly (presuming it has a
python_requirement target describes an external dependency as understood by easy_install or pip. It takes only
a single non-keyword argument of the
Requirement-style string, e.g.
python_requirement('django-celery') python_requirement('tornado==2.2') python_requirement('kombu>=2.1.1,<3.0')
This will resolve the dependency and its transitive closure, for example
django-celery pulls down the following
Pants takes care of handling these dependencies for you. It will never install anything globally. Instead it will
build the dependency and cache it in
.pants.d and assemble them a la carte into an execution environment.
python_thrift_library target takes the same arguments as
python_library arguments, except that files described
sources must be thrift files. If your library or binary depends upon this target type, Python bindings
will be autogenerated and included within your environment.
Building your first PEX
Now you're ready to build your first PEX file (technically you already have,
by building Pants itself.) By default if you specify
./pants <target>, it
assumes you mean
./pants build <target> and does precisely that:
$ PANTS_VERBOSE=1 ./pants src/python/twitter/tutorial:hello_world Build operating on targets: OrderedSet([PythonBinary(src/python/twitter/tutorial/BUILD:hello_world)]) Resolver: Calling environment super => 0.046ms Building PythonBinary PythonBinary(src/python/twitter/tutorial/BUILD:hello_world): Building PythonBinary PythonBinary(src/python/twitter/tutorial/BUILD:hello_world): Dumping library: PythonLibrary(src/python/twitter/common/app/BUILD:app) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/dirutil/BUILD:dirutil) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/lang/BUILD:lang) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/options/BUILD:options) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/util/BUILD:util) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/app/modules/BUILD:modules) [relative module: ] Resolver: Calling environment super => 0.016ms Dumping binary: twitter/tutorial/hello_world.py Wrote /private/tmp/wickman-commons/dist/hello_world.pex
You will see that despite specifying just one dependency, the transitive
hello_world pulled in all of
and its direct descendants. That's because those library targets depended
upon other library targets, than in turn depending on even more. At the end
of the day, we bundle up the closed set of all dependencies and bundle them
Since it uses the
twitter.common.app framework, we know we can fire it up
and poke around with
$ dist/hello_world.pex --help Options: -h, --help, --short-help show this help message and exit. --long-help show options from all registered modules, not just the __main__ module.
If we specify
--long-help, we can see the help of transitively included
$ dist/hello_world.pex --long-help Options: -h, --help, --short-help show this help message and exit. --long-help show options from all registered modules, not just the __main__ module. From module twitter.common.app: --app_daemonize Daemonize this application. [default: False] --app_profile_output=FILENAME Dump the profiling output to a binary profiling format. [default: None] --app_daemon_stderr=TWITTER_COMMON_APP_DAEMON_STDERR Direct this app's stderr to this file if daemonized. [default: /dev/null] --app_debug Print extra debugging information during application initialization. [default: False] --app_daemon_stdout=TWITTER_COMMON_APP_DAEMON_STDOUT Direct this app's stdout to this file if daemonized . [default: /dev/null] --app_profiling Run profiler on the code while it runs. Note this can cause slowdowns. [default: False] --app_ignore_rc_file Ignore default arguments from the rc file. [default: False] --app_pidfile=TWITTER_COMMON_APP_PIDFILE The pidfile to use if --app_daemonize is specified. [default: None]
Or we can simply execute it as intended:
$ dist/hello_world.pex Hello world!
Environment manipulation with
We've only discussed so far the "pants build" command. There's also a
dedicated "py" command that allows you to manipulate the environments
python_library targets, such as drop into
an interpreter with the environment set up for you.
pants py semantics
The default behavior of
pants py <target> is the following:
python_binarytargets, build the environment and execute the target
- For one or more
python_librarytargets, build the environment that is the transitive closure of all targets and drop into an interpreter.
- For a combination of
python_librarytargets, build the transitive closure of all targets and execute the first binary target.
python_binary( name = "hello_world", source = "hello_world.py", dependencies = [ pants(":hello_world_lib") ] ) python_library( name = "hello_world_lib", dependencies = [ pants("src/python/twitter/common/app"), python_requirement("tornado"), python_requirement("psutil"), ] )
This uses the
python_requirement target which can refer to any string in
pkg_resources.Requirement format as
recognized by tools such as
pip as described above.
Now that we've created a library-only target
src/python/twitter/tutorial:hello_world_lib, let's drop
into it using
pants py with verbosity turned on so that we can see what's
going on in the background:
$ PANTS_VERBOSE=1 ./pants py src/python/twitter/tutorial:hello_world_lib Build operating on target: PythonLibrary(src/python/twitter/tutorial/BUILD:hello_world_lib) Resolver: Calling environment super => 0.019ms Building PythonBinary PythonLibrary(src/python/twitter/tutorial/BUILD:hello_world_lib): Dumping library: PythonLibrary(src/python/twitter/tutorial/BUILD:hello_world_lib) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/app/BUILD:app) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/dirutil/BUILD:dirutil) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/lang/BUILD:lang) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/options/BUILD:options) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/util/BUILD:util) [relative module: ] Dumping library: PythonLibrary(src/python/twitter/common/app/modules/BUILD:modules) [relative module: ] Dumping requirement: tornado Dumping requirement: psutil Resolver: Calling environment super => 0.029ms Resolver: Activating cache /private/tmp/wickman-commons/3rdparty/python => 356.432ms Resolver: Resolved tornado => 357.219ms Resolver: Activating cache /private/tmp/wickman-commons/.pants.d/.python.install.cache => 41.117ms Resolver: Fetching psutil => 10144.264ms Resolver: Building psutil => 1794.474ms Resolver: Distilling psutil => 224.896ms Resolver: Constructing distribution psutil => 2.855ms Resolver: Resolved psutil => 12210.066ms Dumping distribution: .../tornado-2.2-py2.6.egg Dumping distribution: .../psutil-0.4.1-py2.6-macosx-10.4-x86_64.egg Python 2.6.7 (r267:88850, Aug 31 2011, 15:49:05) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>>
In the background,
pants used cached version of
tornado but fetched
psutil from pypi and any necessary transitive dependencies (none in this
case) and built a platform-specific version for us.
You can convince yourself that the environment contains all the dependencies
sys.path and importing libraries as you desire:
>>> import psutil >>> help(psutil) >>> from twitter.common import app >>> help(app)
It should be stressed that dependencies built by Pants are never installed globally. These dependencies only exist for the duration of the Python interpreter forked by Pants.
Running an application using
Let us turn our
hello_world.py into a basic
top application using
from twitter.common import app import psutil import tornado.ioloop import tornado.web class MainHandler(tornado.web.RequestHandler): def get(self): self.write('<pre>Running pids:\n%s</pre>' % '\n'.join(map(str, psutil.get_pid_list()))) def main(): application = tornado.web.Application([ (r"/", MainHandler) ]) application.listen(8888) tornado.ioloop.IOLoop.instance().start() app.main()
We have now split our application into two parts: the
target and the
hello_world_lib library target. If we run
pants py src/python/twitter/tutorial:hello_world_lib, the default behavior is to
drop into an interpreter.
If we run
pants py src/python/twitter/tutorial:hello_world, the default behavior is to run
the binary target pointed to by
$ ./pants py src/python/twitter/tutorial:hello_world
Then point your browser to localhost:8888
pants py --pex
There is also a
--pex option to pants py that allows you to build a PEX
file from a union of python_library targets that does not necessarily have a
python_binary target defined for it. Since there is no entry point
specified, the resulting .pex file just behaves like a Python interpreter,
but with the sys.path bootstrapped for you:
$ ./pants py --pex src/python/twitter/tutorial:hello_world_lib Build operating on target: PythonLibrary(src/python/twitter/tutorial/BUILD:hello_world_lib) Wrote /private/tmp/wickman-commons/dist/hello_world_lib.pex $ ls -la dist/hello_world_lib.pex -rwxr-xr-x 1 wickman wheel 1404174 Apr 10 13:00 dist/hello_world_lib.pex
Now if you use dist/hello_world_lib.pex, since it has no entry point, it will drop you into an interpreter:
$ dist/hello_world_lib.pex Python 2.6.7 (r267:88850, Aug 31 2011, 15:49:05) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>> import tornado
As mentioned before, it's like a single-file lightweight alternative to a
virtualenv. We can even use it to run our
$ dist/hello_world_lib.pex src/python/twitter/tutorial/hello_world.py
This can be an incredibly powerful and lightweight way to manage and deploy
virtual environments without using
PEX file as interpreter
As mentioned above, PEX files without default entry points behave like Python interpreters that
carry their dependencies with them. For example, let's create a target that
provides a Fabric dependency within
python_library( name = 'fabric', dependencies = [ python_requirement('Fabric') ] )
And let's build a fabric PEX file:
$ ./pants py --pex src/python/twitter/tutorial:fabric Build operating on target: PythonLibrary(src/python/twitter/tutorial/BUILD:fabric) Wrote /private/tmp/wickman-commons/dist/fabric.pex
By default it does nothing more than drop us into an interpreter:
$ dist/fabric.pex Python 2.6.7 (r267:88850, Aug 31 2011, 15:49:05) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>>
But suppose we have a local script that depends upon Fabric,
from fabric.api import * def main(): local('echo hello world') if __name__ == '__main__': main()
We can now use
fabric.pex as if it were a Python interpreter but with
fabric available in its environment. Note that fabric has never been
installed globally in any site-packages anywhere. It is just bundled inside
$ dist/fabric.pex fabric_hello_world.py [localhost] local: echo hello world hello world
An advanced feature of
python_binary targets, you may in addition specify
direct entry points into PEX files rather than a source file. For example,
if we wanted to build an a la carte
fab wrapper for fabric:
python_binary(name = "fab", entry_point = "fabric.main:main", dependencies = [ python_requirement("fabric"), ] )
$ ./pants src/python/twitter/tutorial:fab Build operating on targets: OrderedSet([PythonBinary(src/python/twitter/tutorial/BUILD:fab)]) Building PythonBinary PythonBinary(src/python/twitter/tutorial/BUILD:fab): Wrote /private/tmp/wickman-commons/dist/fab.pex
dist/fab.pex behaves like a standalone
$ dist/fab.pex -h Usage: fab [options] <command>[:arg1,arg2=val2,host=foo,hosts='h1;h2',...] ... Options: -h, --help show this help message and exit -d NAME, --display=NAME print detailed info about command NAME -F FORMAT, --list-format=FORMAT formats --list, choices: short, normal, nested -l, --list print list of possible commands and exit --set=KEY=VALUE,... comma separated KEY=VALUE pairs to set Fab env vars --shortlist alias for -F short --list -V, --version show program's version number and exit -a, --no_agent don't use the running SSH agent -A, --forward-agent forward local agent to remote end --abort-on-prompts abort instead of prompting (for password, host, etc) ...
Pants also has excellent support for JVM-based builds and can do similar things like resolving external JARs and packaging them as standalone environments with default entry points.
Manipulating PEX behavior with environment variables
Given a PEX file, it is possible to alter its default behavior during invocation.
If you have a PEX file with a prescribed executable source or
entry_point specified, it may still
occasionally be useful to drop into an interpreter with the environment bootstrapped. If you
PEX_INTERPRETER=1 in your environment, the PEX bootstrapper will skip any execution and instead
launch an interactive interpreter session.
If your environment is failing to bootstrap or simply bootstrapping very slowly, it can be useful to
PEX_VERBOSE=1 in your environment to get debugging output printed to the console. Debugging output
- Fetched dependencies
- Built dependencies
- Activated dependencies
- Packages scrubbed out of
sys.pathused to launch the interpreter
If you have a PEX file without a prescribed entry point, or want to change
entry_point for the duration of a single invocation, you can set
PEX_MODULE=entry_point using the same format as described in the
python_binary Pants target.
This can be a useful tool for bundling up a number of packages together and being able to use a single file to execute scripts from each of them.
Another common pattern is to link
pytest into your PEX file, and run
PEX_MODULE=pytest my_pex.pex tests/*.py to run your test suite in its
There is nascent support for performing code coverage within PEX files by
PEX_COVERAGE=<suffix>. By default the coverage files will be written
into the current working directory with the file pattern
requires that the
coverage Python module has been linked into your PEX.
You can then combine the coverage files by running
PEX_MODULE=coverage my_pex.pex .coverage.suffix* and run a report using
PEX_MODULE=coverage my_pex.pex report. Since PEX files are just zip files,
coverage is able
to understand and extract source and line numbers from them in order to
produce coverage reports.
How PEX files work
the utility of zipimport and
As an aside, in Python, you may not know that you can import code from directories:
$ mkdir -p foo $ touch foo/__init__.py $ echo "print 'spam'" > foo/bar.py $ python -c 'import foo.bar' spam
All that is necessary is the presence of
__init__.py to signal to the importer that we
are dealing with a package. Similarly, a directory can be made "executable":
$ echo "print 'i like flowers'" > foo/__main__.py $ python foo i like flowers
And because the
zipimport module now provides a default import hook for
Pythons >= 2.4, if the Python import framework sees a zip file, with the
inclusion of a proper
__init__.py, it can be treated similarly to a
directory. But since a directory can be executable, if we just drop a
__main__.py into a zip file, it suddenly becomes executable:
$ pushd foo && zip /tmp/flower.zip __main__.py && popd /tmp/foo /tmp adding: __main__.py (stored 0%) /tmp $ python flower.zip i like flowers
And since zip files don't actually start until the zip magic number, you can embed arbitrary strings at the beginning of them and they're still valid zips. Hence simple PEX files are born:
$ echo '#!/usr/bin/env python2.6' > flower.pex && cat flower.zip >> flower.pex $ chmod +x flower.pex $ ./flower.pex i like flowers
$ unzip -l pants.pex | tail -2 warning [pants.pex]: 25 extra bytes at beginning or within zipfile (attempting to process anyway) -------- ------- 7900812 543 files $ head -c 25 pants.pex #!/usr/bin/env python2.6
__main__.py in a real PEX file is somewhat special:
import os import sys __entry_point__ = None if '__file__' in locals() and __file__ is not None: __entry_point__ = os.path.dirname(__file__) elif '__loader__' in locals(): from pkgutil import ImpLoader if hasattr(__loader__, 'archive'): __entry_point__ = __loader__.archive elif isinstance(__loader__, ImpLoader): __entry_point__ = os.path.dirname(__loader__.get_filename()) if __entry_point__ is None: sys.stderr.write('Could not launch python executable!\n') sys.exit(2) sys.path.insert(0, os.path.join(__entry_point__, '.bootstrap')) from twitter.common.python.importer import monkeypatch monkeypatch() del monkeypatch from twitter.common.python.pex import PEX PEX(__entry_point__).execute()
PEX is just a class that manages requirements (often embedded within PEX
files as egg distributions in the
.deps directory) and autoimports them
sys.path, then executes a prescribed entry point.
If you read the code closely, you'll notice that it relies upon monkeypatching
zipimport. Inside the
twitter.common.python library we've provided a recursive
zip importer derived from Google's pure Python zipimport
module that allows for depending upon eggs within eggs or zips (and so forth)
so that PEX files need not extract egg dependencies to disk a priori. This even
extends to C extensions (.so and .dylib files) which are written to disk long
enough to be dlopened before being unlinked.
Strictly speaking this monkeypatching is not necessary and we may consider making that optional.
Advanced Pants/PEX features
TODO: converting python_library targets to eggs
TODO: auto dependency resolution from within PEX files
TODO: dynamically self-updating PEX files
TODO: tailoring your dependency resolution environment with pants.ini, including local cheeseshop mirrors
TODO: multi-interpreter / multi-platform support with pants.multi / pants goal setup