@aschmu
Last active May 2, 2024 12:34
Setting up a PyPI mirror

I) Introduction

Taken from: https://groups.google.com/forum/#!topic/devpi-dev/S-3ioWILTiY

We wanted to do some Python development, but we had a problem: our working network is completely isolated from the internet and only has servers providing services such as a distribution package repository mirror and web servers. The question was how to build a full PyPI mirror so that pip install and pip search would work.

This can be done using devpi, a web server, bandersnatch and an external hard drive. The idea is to dump all PyPI packages with bandersnatch, carry them to the server on the hard drive, serve them for installation from the web server, and finally enable the search feature by mirroring the web server with devpi.

Note : we have confirmed that this works with devpi 4.1.0 and that it does not work with devpi 3.0.0. Versions in between have not been tested.

II) Dumping pypi with bandersnatch

  • Take a computer with internet access (or at least access to pypi.python.org) and install bandersnatch :
      pip install bandersnatch
    You can do it in a virtualenv if you don't want to clutter your system :
      virtualenv env
      source env/bin/activate
      pip install bandersnatch

  • Mount your external hard drive and tell bandersnatch to dump the files onto it by editing /etc/bandersnatch.conf :
      directory = /path/to/external/drive/

  • Start the dump : bandersnatch mirror

This operation may take a few days, depending on the speed of your internet connection. FYI, at the time of writing, PyPI stores about 250GB of data.
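As a minimal sketch of the configuration step, the helper below writes a small bandersnatch config file. The function name write_bandersnatch_conf is hypothetical, and the values other than directory are illustrative assumptions; depending on your bandersnatch version, more keys may be required, so comparing against the default config that bandersnatch generates is a good idea.

```shell
# Hypothetical helper: write a minimal bandersnatch config to a given path.
# The [mirror] section and these key names exist in bandersnatch; the values
# are illustrative and your version may need additional keys.
write_bandersnatch_conf() {
    conf_path=$1      # e.g. /etc/bandersnatch.conf
    mirror_dir=$2     # e.g. /path/to/external/drive/
    cat > "$conf_path" <<EOF
[mirror]
directory = $mirror_dir
master = https://pypi.org
timeout = 10
workers = 3
hash-index = false
stop-on-error = false
EOF
}

# Example use (not executed here):
#   write_bandersnatch_conf /etc/bandersnatch.conf /path/to/external/drive/
#   bandersnatch mirror
```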

You will obtain a web folder with at least two folders in it : packages and simple
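Before carrying the drive to the isolated network, it can be worth checking that the dump has the expected layout. This is only a sketch: check_mirror_layout is a hypothetical helper, and it checks nothing beyond the presence of the two folders named above.

```shell
# Hypothetical helper: verify that the web folder contains the two
# sub-folders the rest of the setup relies on.
check_mirror_layout() {
    web_dir=$1        # e.g. /path/to/external/drive/web
    for sub in packages simple; do
        if [ ! -d "$web_dir/$sub" ]; then
            echo "missing: $web_dir/$sub" >&2
            return 1
        fi
    done
    echo "ok: $web_dir contains packages and simple"
}

# Example use (not executed here):
#   check_mirror_layout /path/to/external/drive/web
```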

III) Make the webserver

  • Copy the web folder onto the server and configure the web server so that the web folder is the root folder of a site (it can also be in a subdirectory).

  • Configure pip on the server, for the user that will run devpi, to point to the web server. In ~/.pip/pip.conf :
      [global]
      index-url = http://localhost/path/to/web/simple

Note : it is important that the index-url points to simple
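The pip configuration can also be written by a small script. The sketch below uses a hypothetical helper, write_pip_conf, which refuses a URL that does not end in simple, since that is the mistake the note above warns about. The URL is the placeholder from this guide, not a real address.

```shell
# Hypothetical helper: write a pip.conf pointing at the local mirror.
# Second argument is the config directory (defaults to ~/.pip).
write_pip_conf() {
    index_url=$1
    conf_dir=${2:-$HOME/.pip}
    case $index_url in
        */simple|*/simple/) ;;
        *) echo "index-url should end in /simple" >&2; return 1 ;;
    esac
    mkdir -p "$conf_dir"
    printf '[global]\nindex-url = %s\n' "$index_url" > "$conf_dir/pip.conf"
}

# Example use (not executed here):
#   write_pip_conf http://localhost/path/to/web/simple
```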

Now you should be able to use pip to install any Python package you have dumped from PyPI. You can check this by making a virtualenv and running a pip install :
  virtualenv test
  source test/bin/activate
  pip install argparse
  rm -rf test

IV) Install devpi

Install devpi by running the following (a virtualenv can still be used for this) :
  pip install devpi-server devpi-web devpi-client

The devpi-client is not necessary on the server; it is just a CLI from which we will configure the devpi server. You can install it on one of the local machines if you want to. I will assume it is installed on the server though.

Run devpi-server : devpi-server --start --no-root-pypi

If you want to stop the server, use --stop. --no-root-pypi is used to tell the server not to create a mirror to the standard pypi server at first launch. It is not necessary to add it again for further launches.

Now you should have devpi-server running on port 3141. You can try to connect to it using a web browser.
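On a server without a browser, the same check can be sketched with curl. The helper name devpi_is_up is hypothetical; it assumes curl is installed and simply requests the /+api endpoint that devpi-server exposes.

```shell
# Hypothetical helper: report whether devpi-server answers HTTP requests.
# Assumes curl is available; the default URL matches the port used above.
devpi_is_up() {
    base_url=${1:-http://localhost:3141}
    if curl -fsS -o /dev/null "$base_url/+api"; then
        echo "devpi-server is answering at $base_url"
    else
        echo "no answer from $base_url" >&2
        return 1
    fi
}

# Example use (not executed here):
#   devpi_is_up http://localhost:3141
```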

V) Configure devpi-server to mirror the web server

Note : once again, it is important that the URL points to simple
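A minimal sketch of this step with devpi-client follows. It assumes the root user still has its default empty password and that the index is named pypi_mirror, which is the name used in the test URL below; create_mirror_index is a hypothetical wrapper, and the exact options may differ between devpi versions.

```shell
# Hypothetical wrapper: create a devpi mirror index that points at the
# local web server. Assumes the default empty root password.
create_mirror_index() {
    devpi_url=$1                  # e.g. http://localhost:3141
    mirror_url=$2                 # must point to .../simple
    index_name=${3:-pypi_mirror}
    devpi use "$devpi_url" &&
    devpi login root --password='' &&
    devpi index -c "$index_name" type=mirror mirror_url="$mirror_url"
}

# Example use (not executed here):
#   create_mirror_index http://localhost:3141 http://localhost/path/to/web/simple
```

Logging in as root before creating the index means the new index ends up under root/, which matches the root/pypi_mirror path used when testing the mirror.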

Now the full PyPI mirror should be usable. Let's try it :
  devpi use http://localhost:3141/root/pypi_mirror/+simple/
  virtualenv test
  source test/bin/activate
  pip install argparse
  pip search flask
  rm -rf test

The pip search should return a lot of results. Note : the devpi server at version 4.1.0 does not provide package descriptions with the search feature.

@rets-mah-ekaj

Hello sir, I had a doubt. May be I didn't get it if it is already mentioned in the post.

I wanted to know that once we create this mirror, is there a way to automate the updates in the hard drive?
Like, if a package, say numpy, gets an update, how can I update that in my hard drive?

@donchkat

donchkat commented Feb 15, 2023

@rets-mah-ekaj
I have the same question, but I see there is no response. It's been a while since you asked, so I wonder whether you got an answer somewhere else? If you did, I would appreciate it if you could share it with me.
Thanks!

@NotAMorningSpartan

I'm not the original poster but I may be able to help.
Running bandersnatch mirror again will update your mirror. To run it continually, I would suggest either setting up a cronjob or a systemd-timer. Then, the copy on the hard drive will be updated. Once complete, you can take the drive, connect it back to the server, and use a program such as rsync to sync the newly updated files to the copy on your web server.
Official docs provide an example of a cronjob you could use.
https://pypi.org/project/bandersnatch/
Obviously, this will need to be adapted depending on your setup, but hopefully this can get you started.
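As a rough sketch of the rsync step described above, assuming rsync is installed and using placeholder paths: sync_mirror is a hypothetical helper, and the crontab line in the comment is only an illustrative schedule, not the one from the official docs.

```shell
# Hypothetical helper: copy the updated dump from the drive to the web
# server's copy. Trailing slashes are normalised so rsync copies the
# contents of src into dst rather than nesting the folder.
sync_mirror() {
    src=$1            # e.g. /path/to/external/drive/web
    dst=$2            # e.g. the web folder served by the web server
    rsync -a "${src%/}/" "${dst%/}/"
}

# On the internet-connected machine, an illustrative crontab entry that
# refreshes the dump every night at 02:00:
#   0 2 * * * bandersnatch mirror
#
# On the server, after plugging the drive back in (not executed here):
#   sync_mirror /path/to/external/drive/web /path/to/web
```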

@kgaainer

kgaainer commented Apr 10, 2024

Hi, one question: say I want to upload a custom package to this local repository, how would you do that?
