@dstufft
Created August 12, 2014 15:56

Client Side

Peep

Peep is a wrapper around pip that ensures the cryptographic integrity of downloads. This removes the need to trust the index server, but it does require pinning every dependency to an exact version. It is really the only game in town for client side verification; every other solution places implicit trust in the validity of the index server.

This tool puts the "truth" of what we expect to install into a specially formatted requirements.txt file. That file must include every dependency, direct and transitive; essentially the dependency list must be flattened inside the requirements.txt.
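As a rough illustration of what "flattened and pinned" means in practice, the sketch below flags any requirement line that is not an exact `name==version` pin. It is not part of peep; the function name `unpinned_lines` and the simplified pattern are assumptions made for this example, and it ignores extras, URLs, and environment markers.

```python
import re

# Simplified pattern for an exact pin such as "requests==2.3.0".
# This is an illustrative assumption, not the full requirement grammar.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[^=\s]+$")


def unpinned_lines(requirements_text: str) -> list[str]:
    """Return the requirement lines that are not exact '==' pins."""
    problems = []
    for raw in requirements_text.splitlines():
        # Drop comments and surrounding whitespace before checking.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

An empty result means every listed requirement is pinned; it says nothing about whether transitive dependencies are missing from the file.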

Python packages are typically distributed as an sdist file and one or more Wheel files (which may be platform dependent). To support multiple platforms, peep allows registering multiple hashes for a single dependency; as long as the downloaded file matches one of them, it is considered acceptable.
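The "match any one of several hashes" rule can be sketched as follows. This is a minimal illustration of the idea, not peep's code or its on-disk hash format; the hex-encoded SHA-256 digests and the helper names `sha256_of` and `is_acceptable` are assumptions for this example.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a downloaded file's contents."""
    return hashlib.sha256(data).hexdigest()


def is_acceptable(data: bytes, allowed_hashes: set[str]) -> bool:
    """Accept the download if it matches any one of the registered hashes."""
    return sha256_of(data) in allowed_hashes
```

With one hash registered for the sdist and one for each Wheel, whichever file pip ends up downloading on a given platform passes the check, while anything else is rejected.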

Server side

The server side answer is a little more complicated. Essentially pip will trust the server to give it the correct thing, and defining the "correct" thing is the hard part. All of these options require living on a trusted network and/or using a TLS connection to validate that the server is a trusted one.

Plain Web server

Practically any web server that can serve static files and generate an autoindex can be used as a local PyPI server. This is the simplest option: you place the files into a single directory by hand (or using any tool you wish) and then serve that directory with an autoindex and TLS. You can then point pip to it using:

pip install --no-index --find-links https://internal-pypi/ <whatever>

This makes it simple to centralize the management of which packages are acceptable; however, it is more manual than the other methods.
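To show how little is needed for the autoindex part, here is a minimal sketch using Python's standard library `http.server`, which generates a directory listing when no index file is present. The function name `make_index_server` is made up for this example, and the sketch deliberately omits TLS, so it is only suitable for local experimentation; a real deployment would more likely be an ordinary web server with TLS configured.

```python
import functools
import http.server


def make_index_server(directory: str, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Build a server that lists and serves the files in `directory`.

    Binds to loopback only; port 0 lets the OS pick a free port.
    No TLS here, so this is for local testing, not production use.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # Serve the current directory until interrupted.
    server = make_index_server(".", port=8080)
    print("Serving on http://127.0.0.1:%d/" % server.server_address[1])
    server.serve_forever()
```

The generated listing is plain HTML with a link per file, which is the kind of page `--find-links` can scan for distributions.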

Devpi

Devpi is a tool that runs an internal PyPI-compatible index server. Like the plain web server option, it relies on either living on a trusted network or using TLS to secure the connection between itself and pip.

Unlike the plain web server option, it supports replication and simple multi-index support, including the ability to "push" packages from one index to another. This could allow having a separate index for each environment and using a CLI app to push releases from one to the next as they are vetted.

Also unlike the plain web server option, it supports uploading with setup.py, twine, or its own client. This makes releasing new versions easier, since nobody needs SSH access to the server.

Finally, it can also mirror PyPI. It mirrors lazily: it presents all of the packages available on PyPI immediately, but it doesn't actually store a package locally until someone attempts to install it. At that point it fetches the package from PyPI and caches it locally. This can be powerful when combined with multi-index support, which can overlay a set of pre-compiled Wheels on top of a PyPI mirror.
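The lazy mirroring behaviour can be sketched as a fetch-on-first-request cache. This is only an illustration of the idea and not how devpi is implemented; the `LazyMirror` class and the `fetch_upstream` callable are hypothetical names introduced for this example.

```python
from pathlib import Path
from typing import Callable


class LazyMirror:
    """Serve files from a local cache, fetching from upstream on first use."""

    def __init__(self, cache_dir: Path, fetch_upstream: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch_upstream = fetch_upstream

    def get(self, filename: str) -> bytes:
        # Reject anything that is not a bare file name, so a request
        # cannot escape the cache directory.
        if Path(filename).name != filename:
            raise ValueError("expected a bare file name: %r" % filename)
        cached = self.cache_dir / filename
        if cached.exists():
            # Already mirrored: no upstream access needed.
            return cached.read_bytes()
        # First request: fetch from upstream, then keep a local copy.
        data = self.fetch_upstream(filename)
        cached.write_bytes(data)
        return data
```

Only the first request for a given file touches the upstream; later requests are answered from disk, which is why disk usage grows with what is actually installed rather than with the size of PyPI.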

It's important to note that devpi does not require public internet access unless the PyPI mirroring feature is in use.

Mirroring

If we're using peep to ensure that our builds are repeatable and that nobody has replaced a version with something malicious, then the server side becomes nothing more than an untrusted repository. In that case we have the option of simply hosting a mirror, rather than using the server to provide vetted software.

Devpi

As above, devpi supports lazy mirroring of PyPI, multi-index support, and replication. That makes it a good fit for a mirror that doesn't want to pay the disk cost of a full mirror but either has a limited set of dependencies or generally has internet access. Its additional features can make it more compelling depending on the desired workflow.

Bandersnatch

Bandersnatch is the officially recommended tool for creating a full mirror of PyPI. It pushes all of the moving parts off to a cron job, which performs a delta sync of whatever has changed since the last sync. At the time of this writing it takes ~90G of storage to hold all of PyPI. This option only requires internet access to PyPI while the sync task is running. It's possible to run the sync only at preordained times; however, it's likely that this option would want constant access as well in order to keep things up to date.
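The idea of a delta sync can be sketched as follows: remember a marker for the last sync, then refresh only what changed after it. This is not bandersnatch's code; the `(package name, serial)` change records and the function name `packages_to_sync` are a hypothetical shape chosen for this example.

```python
def packages_to_sync(
    changes: list[tuple[str, int]], last_serial: int
) -> tuple[set[str], int]:
    """Pick the packages changed since `last_serial`.

    `changes` is a list of (package name, serial) records, where a
    higher serial means a more recent change. Returns the set of
    packages to refresh and the serial to remember for the next run.
    """
    todo = {name for name, serial in changes if serial > last_serial}
    new_serial = max([last_serial] + [serial for _, serial in changes])
    return todo, new_serial
```

Each run then only downloads the packages in the returned set and stores the new serial, so the cost of a sync is proportional to what changed rather than to the size of the mirror.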

Serving the packages only requires a web server that can serve static files. Autoindex support is not required.

The major downsides to this option are the space required and the fact that there is no mechanism for hosting private or forked packages; for those, it would need to be combined with one of the other options.
