
@banditkings
Last active June 7, 2023 22:28
Starting a new DS project with cookiecutter, pyenv, poetry, pytest, git

Project Scaffolding

Let's start a brand new project here with:

  • cookiecutter: project templating
  • pyenv: installing and switching between multiple Python versions
  • poetry: dependency management
  • pytest: testing
  • git: version control
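You can check which of these tools are already installed by asking the shell for each one. This is a minimal sketch. check_tools is a hypothetical helper name, not a command that comes with any of the tools.

```shell
# Hypothetical helper: report whether each named command is on PATH.
# `command -v` is POSIX, so this works in bash, zsh and sh.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok      $tool"
    else
      echo "missing $tool"
    fi
  done
}

check_tools cookiecutter pyenv poetry pytest git
```

At this stage pytest will often show as missing. That's fine: it is added later as a Poetry dev dependency and run through poetry run pytest.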

Cookiecutter

I authenticate to GitHub over HTTPS rather than SSH, so this approach worked for me where a git@github URL did not:

# If you don't have cookiecutter yet:
# pip install cookiecutter

# If you can't run cookiecutter directly against the template's git URL,
# clone the template repo locally first:
git clone <insert cookiecutter repo .git path here>

# Then generate the new project from the template folder you just cloned:
cookiecutter cookiecutter_name

# If the cookiecutter command isn't found on your PATH:
# python -m cookiecutter cookiecutter_name
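Cookiecutter can also take a repository URL directly. With HTTPS auth, that URL needs the https:// form rather than git@github.com:owner/repo.git. The sketch below converts an SSH-style GitHub URL into the HTTPS form. ssh_to_https and owner/repo are hypothetical placeholders.

```shell
# Hypothetical helper: rewrite git@host:owner/repo.git as
# https://host/owner/repo.git; other inputs pass through unchanged.
ssh_to_https() {
  printf '%s\n' "$1" | sed -E 's#^git@([^:]+):(.+)$#https://\1/\2#'
}

ssh_to_https git@github.com:owner/repo.git
# -> https://github.com/owner/repo.git
```

Usage: cookiecutter "$(ssh_to_https git@github.com:owner/repo.git)". Cookiecutter's built-in gh: abbreviation (cookiecutter gh:owner/repo) also expands to an HTTPS GitHub URL.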

Pyenv

Next, pin a local Python version with pyenv. This writes a .python-version file to the directory, so whenever you're inside it pyenv uses Python 3.10.6 (which matches the 13.0 ML cluster I use on Databricks).

# in the root directory of your project folder
pyenv local 3.10.6
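Here is how pyenv picks the version. It checks the PYENV_VERSION environment variable first. If that is unset, it looks for .python-version in the current directory and then in each parent directory, so subfolders of the project are covered too. find_python_version_file is a hypothetical helper that mirrors only the file-search step. It is a sketch, not pyenv's implementation.

```shell
# Hypothetical helper mirroring pyenv's file search (ignores PYENV_VERSION
# and the global version file): walk up from a directory until a
# .python-version file is found.
find_python_version_file() {
  _dir="$(cd "${1:-.}" && pwd)" || return 1
  while :; do
    if [ -f "$_dir/.python-version" ]; then
      printf '%s\n' "$_dir/.python-version"
      return 0
    fi
    [ "$_dir" = "/" ] && return 1
    _dir="$(dirname "$_dir")"
  done
}

# `pyenv local 3.10.6` writes a file whose content is just the version.
if f="$(find_python_version_file)"; then
  echo "pinned by $f: $(cat "$f")"
else
  echo "no .python-version found"
fi
```

In practice, pyenv version prints the active version and the file that set it.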

Installing Python with pyenv on macOS

I noticed that some packages (e.g. PyTorch) can fail to install because the Python build is missing some libraries they need. The fix is to pass a flag when installing the Python version so that CPython is built as a framework:

env PYTHON_CONFIGURE_OPTS="--enable-framework" pyenv install 3.10.11
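To confirm that an interpreter was actually built this way, you can ask it. On framework builds, sysconfig's PYTHONFRAMEWORK configuration variable is set (typically to Python); on other builds it is empty. is_framework_build is a hypothetical helper name.

```shell
# Hypothetical helper: print "yes" if the given interpreter is a
# framework build (PYTHONFRAMEWORK is non-empty), otherwise "no".
is_framework_build() {
  _fw="$("${1:-python3}" -c 'import sysconfig; print(sysconfig.get_config_var("PYTHONFRAMEWORK") or "")' 2>/dev/null)"
  if [ -n "$_fw" ]; then echo yes; else echo no; fi
}

is_framework_build python3
```

To check the pyenv-installed interpreter specifically, pass its path: is_framework_build "$(pyenv which python)".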

Poetry init and add dependencies

# in the root directory of the project folder
poetry init

# add dependencies
poetry add numpy pandas scikit-learn statsmodels fastparquet

# add dev dependencies
poetry add --group=dev matplotlib plotly ipykernel pytest pydantic nbformat
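Poetry records these in pyproject.toml. With Poetry 1.x (1.2 or later), a plain poetry add goes under [tool.poetry.dependencies] and --group=dev goes under [tool.poetry.group.dev.dependencies]. The sketch below lists the package names in one of those tables. list_table_keys is a hypothetical helper, and its line-based awk is not a real TOML parser. poetry show --only dev is the supported way to list a group.

```shell
# Hypothetical helper: print the keys (package names) of one TOML table.
# A line-based sketch for Poetry-generated files, not a real TOML parser.
list_table_keys() {
  awk -v table="[$1]" '
    /^\[/ { in_table = ($0 == table); next }
    in_table && /^[A-Za-z0-9_.-]+[ \t]*=/ { sub(/[ \t]*=.*/, ""); print }
  ' "${2:-pyproject.toml}"
}

if [ -f pyproject.toml ]; then
  list_table_keys tool.poetry.dependencies
  list_table_keys tool.poetry.group.dev.dependencies
fi
```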

Note that Poetry's defaults lean toward keeping the package in the project root rather than in a src folder, so you may need some workarounds if you want a src layout.
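For a src layout, a common approach is to put the code under src/my_package and point Poetry at it with the packages setting in the [tool.poetry] table. my_package and add_src_package are hypothetical names, and editing pyproject.toml by hand works just as well. For brand-new projects, poetry new --src creates this layout directly.

```shell
# Hypothetical helper: create src/<pkg>/__init__.py next to the given
# pyproject.toml and add a `packages` line under its [tool.poetry] header.
# Not idempotent: running it twice adds the line twice.
add_src_package() {
  _pkg="$1"
  _file="${2:-pyproject.toml}"
  _root="$(dirname "$_file")"
  mkdir -p "$_root/src/$_pkg"
  touch "$_root/src/$_pkg/__init__.py"
  awk -v pkg="$_pkg" '
    { print }
    $0 == "[tool.poetry]" {
      printf "packages = [{ include = \"%s\", from = \"src\" }]\n", pkg
    }
  ' "$_file" > "$_file.tmp" && mv "$_file.tmp" "$_file"
}

# my_package is a placeholder; use your project's import name.
if [ -f pyproject.toml ]; then
  add_src_package my_package
fi
```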

Testing with poetry:

poetry run pytest
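By default, pytest discovers files named test_*.py (or *_test.py) and runs the functions in them whose names start with test. As a minimal sketch, here is a hypothetical smoke test that exercises two of the dependencies added earlier:

```shell
# Hypothetical smoke test; file and function names follow pytest's
# default discovery rules (test_*.py files, test* functions).
mkdir -p tests
cat > tests/test_smoke.py <<'EOF'
import numpy as np
import pandas as pd


def test_dataframe_sum():
    df = pd.DataFrame({"x": np.arange(3)})  # x = 0, 1, 2
    assert df["x"].sum() == 3
EOF
```

Then poetry run pytest -q runs it inside the project's virtual environment.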

Initialize

Create a new repo on GitHub using the web interface.

# Within the project root directory:
git init
git add .
git commit -m "initial commit with cookiecutter"

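# Replace <insert_repo_info> with the new repo's URL:
git remote add origin <insert_repo_info>
# This pushes the local "master" branch; if you prefer GitHub's default
# name, run `git branch -M main` first and push main instead.
git push -u origin master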
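git add . stages everything in the folder, so make sure a .gitignore exists before that first commit. Many cookiecutter templates already include one. The sketch below appends a few common Python and notebook entries, but only the ones that are missing. ensure_ignored is a hypothetical helper, and the entry list is a suggested starting point, not taken from any template.

```shell
# Hypothetical helper: append each pattern to ./.gitignore unless an
# identical line is already there.
ensure_ignored() {
  touch .gitignore
  # Make sure the file ends with a newline before appending.
  if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
    echo >> .gitignore
  fi
  for pattern in "$@"; do
    grep -qxF -- "$pattern" .gitignore || printf '%s\n' "$pattern" >> .gitignore
  done
}

# Suggested entries (an assumption; adjust for your project).
ensure_ignored .venv/ __pycache__/ .ipynb_checkpoints/ .pytest_cache/ .DS_Store
```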

Done!
