Eric Ma ericmjl

ericmjl / ds-project-organization.md
Last active April 21, 2024 16:48
How to organize your Python data science project

UPDATE: I have baked the ideas in this file into a Python CLI tool called pyds-cli. Please find it here: https://github.com/ericmjl/pyds-cli

How to organize your Python data science project

Having done a number of data projects over the years, and having seen a number of them up on GitHub, I've come to see that there's a wide range in terms of how "readable" a project is. I'd like to share some practices that I have come to adopt in my projects, which I hope will bring some organization to your projects.

Disclaimer: I'm hoping nobody takes this to be "the definitive guide" to organizing a data project; rather, I hope you, the reader, find useful tips that you can adapt to your own projects.

Disclaimer 2: What I’m writing below is primarily geared towards Python users. Some ideas may transfer to other languages; others may not. Please feel free to remix whatever you see here!

ericmjl / prepopulated_repl.md
Last active May 1, 2022 19:44
pyscript templates and examples (HTML and Markdown)
ericmjl / holoviews_datashader.py
Created August 31, 2018 18:14
Holoviews dynamic map with datashader
import datashader as ds
import holoviews as hv
from holoviews.operation.datashader import datashade

hv.extension('bokeh')


def scatter(dim1, dim2):
    def _scatter(data):
        return hv.Scatter(data, kdims=[dim1], vdims=[dim2], extents=(-10, -10, 10, 10))
    return _scatter
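
The `scatter` function above is a closure factory: it captures `dim1` and `dim2` and returns a function that takes only the data. A minimal pure-Python sketch of the same pattern follows. The `select_columns` helper and the sample `rows` are hypothetical stand-ins (not part of the gist) so that the example runs without HoloViews installed.

```python
def select_columns(dim1, dim2):
    """Return a function that extracts (dim1, dim2) pairs from rows of data.

    Hypothetical stand-in for `scatter`: same closure structure, no plotting.
    """
    def _select(data):
        # dim1 and dim2 are captured from the enclosing call.
        return [(row[dim1], row[dim2]) for row in data]
    return _select


# Hypothetical sample data, for illustration only.
rows = [{"x": 1, "y": 2, "z": 3}, {"x": 4, "y": 5, "z": 6}]

select_xy = select_columns("x", "y")
pairs = select_xy(rows)
print(pairs)
```

Fixing the dimensions up front leaves a callable that needs only the data, which is convenient wherever a single-argument callback is expected.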
ericmjl / merger.py
Created June 5, 2015 16:50
A Python script for merging PDF files together.
"""
Author: Eric J. Ma
Purpose: To merge PDFs together in an automated fashion.
"""
import os
from PyPDF2 import PdfFileReader, PdfFileMerger
ericmjl / september-2020-newsletter.md
Last active September 7, 2020 00:16
Data Science Programming September 2020 Newsletter

Data Science Programming September 2020 Newsletter

Hello fellow datanistas!

Welcome to the September edition of the programming-oriented data science newsletter. I hope you've all been staying safe amid the COVID-19 outbreak.

There's no special theme this month, just a smattering of cool tools and articles that I think will improve your productivity!

ericmjl / install_anaconda.sh
Created July 9, 2020 12:44
A script to install Anaconda on a new system
# Taken from https://github.com/ericmjl/dotfiles/blob/master/install_functions.sh
function install_anaconda {
    bash anaconda.sh -b -p $HOME/anaconda
    rm anaconda.sh
    export PATH=$HOME/anaconda/bin:$PATH
    # Install basic data science stack into default environment
    conda install --yes pandas scipy numpy matplotlib seaborn jupyter ipykernel nodejs
$ docker run --gpus all -i -t jax:latest /bin/bash
(base) [docker@e697aef58065 ~]$ ls
anaconda cuda-repo-rhel8-10-2-local-10.2.89-440.33.01-1.0-1.x86_64.rpm
(base) [docker@e697aef58065 ~]$ which python
~/anaconda/bin/python
(base) [docker@e697aef58065 ~]$ conda activate mouse-hmm
(mouse-hmm) [docker@e697aef58065 ~]$ python
Python 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
ericmjl / test_d_separation.py
Last active May 30, 2020 15:14
Proposed change to d-separation tests based on pytest functions and fixtures.
import networkx as nx
import pytest


@pytest.fixture
def path_graph():
    """Return a path graph of length three."""
    G = nx.path_graph(3, create_using=nx.DiGraph)
    G.graph["name"] = "path"
    # Freeze the graph so tests cannot mutate the shared fixture.
    nx.freeze(G)
    return G


@pytest.fixture