Note that any code following a $ (e.g. $ pipenv install) should be run in the Terminal (Mac)/Command Prompt (Windows)/Command Line (Linux) in the appropriate directory.
Motivation: We want our research to be reproducible. Reproducibility requires that the person reproducing your research (1) has access to the same version of the code and (2) uses the exact same versions of the packages that the code needs. To satisfy these requirements, we use GitHub to give the reproducer the exact version of the code, and pipenv to easily install and use the exact same versions of the packages.
Pipenv can be installed with a simple $ pip3 install pipenv
We install packages in pipenv just like in pip, but when we run $ pipenv install example_package, in addition to installing the package, pipenv adds it to the Pipfile and the Pipfile.lock file (if these files do not exist for this project yet, pipenv first creates them).
The Pipfile is the less technical file, whose main purpose is to be human-readable and editable, while the Pipfile.lock is the technical file that pipenv actually uses to install exact versions.
The Pipfile is very similar to a requirements.txt file, but it is nicer because any time we install a package using pipenv, the package is automatically added to the Pipfile. Like a requirements.txt, the Pipfile does not require the user to specify a version of the package.
pipenv also adds the package to the Pipfile.lock file, and it always records which version was installed, even if we as users did not specify an exact version.
Therefore, even if at install time we just wanted the most recent stable version of the package, anyone reproducing our work in the future will know exactly which version that was. Examples of a Pipfile and a Pipfile.lock are below.
As stated above, the Pipfile.lock is the "machine-readable" version of the Pipfile, and it contains the exact versions that the original authors used when they wrote the code. When you add more packages, you again use $ pipenv install example_package and pipenv will install that package, making sure that the new package has no dependency conflict with any package already installed, and finally update the Pipfile and the Pipfile.lock file.
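To make the idea of "exact versions in the lock file" concrete, here is a minimal sketch that reads the pinned versions out of a Pipfile.lock, which is plain JSON. The helper name `pinned_versions` and the shortened `SAMPLE_LOCK` text are made up for illustration; a real lock file also carries a list of hashes for every package.

```python
import json

# Hypothetical, shortened excerpt of a Pipfile.lock (hash lists omitted).
SAMPLE_LOCK = """
{
  "_meta": {"requires": {"python_version": "3.7"}},
  "default": {
    "appnope": {"version": "==0.1.2"},
    "async-generator": {"version": "==1.10"}
  },
  "develop": {}
}
"""


def pinned_versions(lock_text, section="default"):
    """Return {package_name: exact_version} for one section of a lock file."""
    lock = json.loads(lock_text)
    pins = {}
    for name, info in lock.get(section, {}).items():
        # Packages from PyPI store their pin as "==<version>"; entries
        # without a "version" key (e.g. VCS installs) are skipped here.
        version = info.get("version")
        if version is not None:
            pins[name] = version.lstrip("=")
    return pins


if __name__ == "__main__":
    for name, version in sorted(pinned_versions(SAMPLE_LOCK).items()):
        print(f"{name}=={version}")
```

To try it on a real project, pass the contents of your own Pipfile.lock (for example `open("Pipfile.lock").read()`) instead of the sample text.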
- Starting with a new project (i.e. no Pipfile), we can simply run $ pipenv install package1 package2 package3 which will cause pipenv to create the virtual environment, create a Pipfile, install those packages, and then "lock" the versions in Pipfile.lock
- To enter the virtual environment, navigate to the folder with the Pipfile.lock file and run $ pipenv shell. This means that you step into the virtual environment, and all the Python packages that you installed will now be available
- If you need to add a new Python package, simply execute $ pipenv install new_package and it will be added to the virtual environment and to the Pipfile as expected.
- To exit the $ pipenv shell, just run $ exit
- Clone the repo and navigate in the console to where the Pipfile and the Pipfile.lock file are located.
- Run $ pipenv sync and it will create a virtual environment where the exact version of every package in the Pipfile.lock will be installed
- Then just run $ pipenv shell to enter the virtual environment
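After the sync step, one way to double-check that the environment matches the lock file is to compare each pin with what is installed. The sketch below is an illustration, not part of pipenv: the function name `find_mismatches` is made up, it assumes Python 3.8+ (where `importlib.metadata` is in the standard library), it compares version strings literally, and it ignores "markers", so platform-specific packages may be reported as missing.

```python
import json
from importlib import metadata  # standard library since Python 3.8
from pathlib import Path


def find_mismatches(lock, get_installed=metadata.version):
    """Compare the "default" pins of a parsed Pipfile.lock with the environment.

    Returns {package: (pinned, installed_or_None)} for every package whose
    installed version differs from its pin or that is not installed at all.
    """
    mismatches = {}
    for name, info in lock.get("default", {}).items():
        pinned = info.get("version", "").lstrip("=")
        if not pinned:
            continue  # no exact pin recorded for this entry
        try:
            installed = get_installed(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    lock_path = Path("Pipfile.lock")
    if lock_path.exists():
        lock = json.loads(lock_path.read_text())
        for name, (pinned, installed) in sorted(find_mismatches(lock).items()):
            print(f"{name}: locked {pinned}, installed {installed}")
    else:
        print("No Pipfile.lock found in the current directory.")
```

Run it with $ pipenv run python3 followed by the script name, so that the interpreter inspecting the installed packages is the one inside the virtual environment.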
- Instead of going into the shell with $ pipenv shell, you can also always run $ pipenv run <command> (e.g. $ pipenv run python3 example_module.py), which runs the <command> inside the virtual environment without your console entering the shell. The trade-off is that after $ pipenv shell all subsequent commands run in the virtual environment automatically, whereas with this approach you need to add $ pipenv run ... before every command. I never do it this way (except in Docker containers...). In other words,
  foo@bar:~$ pipenv shell
  foo@bar:~$ python3 myscript1.py
  foo@bar:~$ python3 myscript2.py
  foo@bar:~$ python3 myscript3.py
  foo@bar:~$ exit
  foo@bar:~$ python3 myscript1.py # raises ModuleNotFoundError since we are no longer in the virtual environment
  is equivalent to
  foo@bar:~$ pipenv run python3 myscript1.py
  foo@bar:~$ pipenv run python3 myscript2.py
  foo@bar:~$ pipenv run python3 myscript3.py
  foo@bar:~$ python3 myscript1.py # raises ModuleNotFoundError since we are not in the virtual environment
- Sometimes $ pipenv install (which basically re-installs/updates all the packages listed in your Pipfile) will take a long time to lock. Locking every time is not necessary, for example, when you are first starting a project and need to keep adding packages one by one. In this case, you can run $ pipenv install example-package --skip-lock, which will still correctly install the package and add it to your Pipfile but without the long wait. Once you are ready to lock the dependencies, you can run $ pipenv lock
- Pipenv can be annoying sometimes. If you run into a nonsensical problem, this almost always works:
  - Run $ pipenv --rm which deletes the virtual environment
  - Delete Pipfile.lock: $ rm Pipfile.lock
  - Run $ pipenv install
  - Run
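The ModuleNotFoundError mentioned in the $ pipenv shell versus $ pipenv run comparison above usually means that the script was started outside the virtual environment. A small sketch of how a script could check this for itself is shown below; the function name `in_virtual_environment` is made up, and the check relies only on the standard `sys` module rather than on anything pipenv-specific.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a virtual environment."""
    # venv and recent virtualenv releases keep sys.base_prefix pointing at the
    # base Python installation while sys.prefix points at the environment.
    # Older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Running inside a virtual environment: {sys.prefix}")
    else:
        print("Not in a virtual environment; try `pipenv shell` or `pipenv run`.")
```

Running this once with $ python3 and once with $ pipenv run python3 should show the difference between the two situations.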
Any time we have passwords or other sensitive information, we never ever ever want to (1) have it in our code or (2) commit it to GitHub. .env files allow us to keep sensitive information in the "environment" and out of our code.
Let's say you are trying to use a PostgreSQL database from a Python script. The usual way to do this is with the psycopg2 package.
import psycopg2
conn = psycopg2.connect(
host='localhost',
database='mydb',
user='postgres',
password='StopLookingAtMyPassword!!!!!'
)
But this is not okay because now anyone who sees your code can get your login and steal all of your data. Let's use a .env file to fix this.
- Create a .gitignore file if one does not already exist and add .env to it. This should ALWAYS be in the .gitignore from the very beginning of a project and will prevent the .env file from ever being committed to GitHub.
- Create a .env file in your root directory (preferably alongside the Pipfile and Pipfile.lock) and paste the following text inside:
  USERNAME="postgres"
  PASSWORD="StopLookingAtMyPassword!!!!!"
- Now, we are going to load our USERNAME and PASSWORD variables into the environment with pipenv:
  foo@bar:~$ pipenv shell
  Loading .env environment variables...
  Launching subshell in virtual environment...
  . ~/repo-WcdiAtXE/bin/activate
  foo@bar:~$ echo $PASSWORD
  StopLookingAtMyPassword!!!!!
  Note the "Loading .env environment variables..." line below the $ pipenv shell command. Now all Python code that we run from this directory using $ python3 example_script.py (and even within Jupyter notebooks!) will have access to these environment variables (detailed below). If we were to exit the pipenv environment (with $ exit) and run $ echo $PASSWORD again, it would be blank.
- Time to rewrite our code from above!
  import os
  import psycopg2
  conn = psycopg2.connect(
      host='localhost',
      database='mydb',
      user=os.environ['USERNAME'],
      password=os.environ['PASSWORD']
  )
  Note that we imported the os module and use os.environ (which behaves like a Python dictionary keyed on the names of your environment variables) to grab our variables from the pipenv environment. Now, our code does not have any passwords!!!!
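One practical wrinkle with the steps above: if the script is started outside the pipenv environment, os.environ['PASSWORD'] fails with a bare KeyError. The sketch below shows one possible way to give a clearer message. The helper names `require_env` and `database_settings` are made up for illustration, and the host and database values are simply the ones from the example above.

```python
import os


def require_env(name):
    """Return the value of an environment variable or fail with a clear hint."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Is it defined in .env, and was the script started via "
            "`pipenv shell` or `pipenv run`?"
        )
    return value


def database_settings():
    """Collect connection settings; the secrets come only from the environment."""
    return {
        "host": "localhost",
        "database": "mydb",
        "user": require_env("USERNAME"),
        "password": require_env("PASSWORD"),
    }
```

The resulting dictionary can then be passed on as keyword arguments, e.g. psycopg2.connect(**database_settings()), so the connection code itself never contains a secret.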
Example Pipfile: this is the human-readable and editable file, which is extremely similar in function to requirements.txt
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
jupyter = "*"
pandas = "*"
scikit-learn = "==0.23"
matplotlib = "*"
missingno = "*"
[dev-packages]
[requires]
python_version = "3.7"
Example Pipfile.lock: this is the machine-readable, machine-edited version, which is automatically created by the command pipenv install or pipenv lock. Only part of a file is shown below because these files are very long:
{
"_meta": {
"hash": {
"sha256": "8e1e467da58950511f3b0d2ff6103a5d3f07dc1e7c7a5064a1245d73ed9cb646"
},
"pipfile-spec": 6,
"requires": {
"python_version": "3.7"
},
"sources": [
{
"name": "pypi",
"url": "https://pypi.org/simple",
"verify_ssl": true
}
]
},
"default": {
"appnope": {
"hashes": [
"sha256:93aa393e9d6c54c5cd570ccadd8edad61ea0c4b9ea7a01409020c9aa019eb442",
"sha256:dd83cd4b5b460958838f6eb3000c660b1f9caf2a5b1de4264e941512f603258a"
],
"markers": "sys_platform == 'darwin' and platform_system == 'Darwin'",
"version": "==0.1.2"
},
"argon2-cffi": {
"hashes": [
"sha256:05a8ac07c7026542377e38389638a8a1e9b78f1cd8439cd7493b39f08dd75fbf",
"sha256:0bf066bc049332489bb2d75f69216416329d9dc65deee127152caeb16e5ce7d5",
"sha256:18dee20e25e4be86680b178b35ccfc5d495ebd5792cd00781548d50880fee5c5",
"sha256:392c3c2ef91d12da510cfb6f9bae52512a4552573a9e27600bdb800e05905d2b",
"sha256:57358570592c46c420300ec94f2ff3b32cbccd10d38bdc12dc6979c4a8484fbc",
"sha256:6678bb047373f52bcff02db8afab0d2a77d83bde61cfecea7c5c62e2335cb203",
"sha256:6ea92c980586931a816d61e4faf6c192b4abce89aa767ff6581e6ddc985ed003",
"sha256:77e909cc756ef81d6abb60524d259d959bab384832f0c651ed7dcb6e5ccdbb78",
"sha256:7d455c802727710e9dfa69b74ccaab04568386ca17b0ad36350b622cd34606fe",
"sha256:8a84934bd818e14a17943de8099d41160da4a336bcc699bb4c394bbb9b94bd32",
"sha256:9bee3212ba4f560af397b6d7146848c32a800652301843df06b9e8f68f0f7361",
"sha256:9dfd5197852530294ecb5795c97a823839258dfd5eb9420233c7cfedec2058f2",
"sha256:b160416adc0f012fb1f12588a5e6954889510f82f698e23ed4f4fa57f12a0647",
"sha256:ba7209b608945b889457f949cc04c8e762bed4fe3fec88ae9a6b7765ae82e496",
"sha256:cc0e028b209a5483b6846053d5fd7165f460a1f14774d79e632e75e7ae64b82b",
"sha256:d8029b2d3e4b4cea770e9e5a0104dd8fa185c1724a0f01528ae4826a6d25f97d",
"sha256:da7f0445b71db6d3a72462e04f36544b0de871289b0bc8a7cc87c0f5ec7079fa",
"sha256:e2db6e85c057c16d0bd3b4d2b04f270a7467c147381e8fd73cbbe5bc719832be"
],
"version": "==20.1.0"
},
"async-generator": {
"hashes": [
"sha256:01c7bf666359b4967d2cda0000cc2e4af16a0ae098cbffcb8472fb9e8ad6585b",
"sha256:6ebb3d106c12920aaae42ccb6f787ef5eefdcdd166ea3d628fa8476abe712144"
],
"markers": "python_version >= '3.5'",
"version": "==1.10"