

Eric Ma ericmjl
@ericmjl
ericmjl / install_tmux.sh
Created June 1, 2018 12:33
A script to install tmux on systems where you don't have root access
#!/bin/bash
# Script for installing tmux on systems where you don't have root access.
# tmux will be installed in $HOME/local/bin.
# It's assumed that wget and a C/C++ compiler are installed.
# exit on error
set -e
TMUX_VERSION=2.6
ericmjl / ds-project-organization.md
Last active April 21, 2024 16:48
How to organize your Python data science project

UPDATE: I have baked the ideas in this file into a Python CLI tool called pyds-cli. You can find it here: https://github.com/ericmjl/pyds-cli

How to organize your Python data science project

Having done a number of data projects over the years, and having seen many others on GitHub, I've noticed a wide range in how "readable" a project is. I'd like to share some practices I have come to adopt in my own projects, which I hope will bring some organization to yours.

Disclaimer: I hope nobody takes this as "the definitive guide" to organizing a data project; rather, I hope you, the reader, find useful tips that you can adapt to your own projects.

Disclaimer 2: What I’m writing below is geared primarily towards Python users. Some ideas may transfer to other languages; others may not. Please feel free to remix whatever you see here!

ericmjl / random_scalar_graph.py
Created August 1, 2018 21:55
Generate lots of random graphs with scalar features on each node.
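import networkx as nx
import numpy as np

def generate_graph():
    """Generate one Erdős-Rényi random graph with an integer 'value' on each node."""
    # Number of nodes drawn uniformly from [3, 20).
    num_nodes = np.random.randint(low=3, high=20)
    G = nx.erdos_renyi_graph(n=num_nodes, p=0.3)
    for n in G.nodes():
        # Scalar node feature drawn uniformly from [1, 20).
        value = np.random.randint(low=1, high=20)
        # G.node was removed in networkx 2.4; G.nodes[n] is the supported accessor.
        G.nodes[n]['value'] = value
    return G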
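To actually generate "lots of" graphs, one option is to call the generator in a loop and pull the node features out into arrays. The sketch below is an assumption-laden variant, not the gist's code: it passes an explicit `numpy.random.Generator` so that a whole batch is reproducible from one seed, and `generate_seeded_graph`, `generate_graphs` and `node_values` are hypothetical helper names.

```python
import networkx as nx
import numpy as np

def generate_seeded_graph(rng):
    # Same recipe as generate_graph(), but every random draw comes from `rng`.
    num_nodes = int(rng.integers(low=3, high=20))  # in [3, 20)
    G = nx.erdos_renyi_graph(n=num_nodes, p=0.3, seed=int(rng.integers(0, 2**31 - 1)))
    for n in G.nodes():
        G.nodes[n]["value"] = int(rng.integers(low=1, high=20))  # in [1, 20)
    return G

def generate_graphs(num_graphs, seed=0):
    # Hypothetical helper: a reproducible batch of random graphs.
    rng = np.random.default_rng(seed)
    return [generate_seeded_graph(rng) for _ in range(num_graphs)]

def node_values(G):
    # Collect the scalar node features into a 1-D array ordered by node label.
    return np.array([G.nodes[n]["value"] for n in sorted(G.nodes())])

graphs = generate_graphs(100, seed=42)
features = [node_values(G) for G in graphs]
```

Seeding one generator, rather than the global NumPy state, keeps the batch reproducible even if other code also draws random numbers.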
ericmjl / create_dir.py
Last active August 13, 2018 13:26
Python: create directory if it doesn't exist, using pathlib!
from pathlib import Path

# We will use the example of creating a directory named .dir under home.
home = Path.home()
dirname = home / '.dir'
if not dirname.exists():
    dirname.mkdir()
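pathlib can also fold the existence check into the call itself. The sketch below wraps this in a small helper; `ensure_dir` is a hypothetical name, while `parents` and `exist_ok` are standard `Path.mkdir` parameters. `exist_ok=True` makes the call a no-op when the directory already exists, which also avoids the race between checking `exists()` and creating the directory.

```python
from pathlib import Path
import tempfile

def ensure_dir(path):
    """Create `path` (and any missing parents) if needed; return it as a Path."""
    path = Path(path)
    # parents=True creates intermediate directories;
    # exist_ok=True makes repeated calls safe.
    path.mkdir(parents=True, exist_ok=True)
    return path

# Usage: build a nested directory inside a throwaway temp folder.
with tempfile.TemporaryDirectory() as tmp:
    nested = ensure_dir(Path(tmp) / "a" / "b" / ".dir")
    ensure_dir(nested)  # second call does nothing
    print(nested.is_dir())
```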
ericmjl / variance_explained.py
Last active August 24, 2018 22:22
Difference between my own implementation of explained variance and scikit-learn's
import numpy as np
from sklearn.metrics import explained_variance_score

def var_explained(preds, actual):
    """
    Implementation taken directly from the formula on this page:
    http://scikit-learn.org/stable/modules/model_evaluation.html#explained-variance-score
    """
    return 1 - ((preds - actual).var() / actual.var())

y_pred = np.array([3, -0.5, 2, 7])
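A side-by-side check of the two implementations might look like the sketch below. The arrays are illustrative values, not data from the gist. One detail worth watching: `explained_variance_score` takes `(y_true, y_pred)`, while `var_explained` takes `(preds, actual)`, so swapping arguments silently changes which variance ends up in the denominator.

```python
import numpy as np
from sklearn.metrics import explained_variance_score

def var_explained(preds, actual):
    # 1 - Var(actual - preds) / Var(actual), using NumPy's default
    # population variance (ddof=0).
    preds = np.asarray(preds, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 1 - ((preds - actual).var() / actual.var())

# Illustrative values only.
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

mine = var_explained(y_pred, y_true)                # (preds, actual)
theirs = explained_variance_score(y_true, y_pred)   # (y_true, y_pred)
print(mine, theirs)
```

With unweighted 1-D inputs, both versions compute population variances, so they should agree up to floating-point error.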
ericmjl / holoviews_datashader.py
Created August 31, 2018 18:14
Holoviews dynamic map with datashader
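import datashader as ds
import holoviews as hv
from holoviews.operation.datashader import datashade

hv.extension('bokeh')

def scatter(dim1, dim2):
    # Returns a callback that plots column dim1 (x) against dim2 (y)
    # with fixed extents (left, bottom, right, top) = (-10, -10, 10, 10).
    def _scatter(data):
        return hv.Scatter(data, kdims=[dim1], vdims=[dim2], extents=(-10, -10, 10, 10))
    return _scatter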
ericmjl / gp-test.ipynb
Created December 11, 2018 22:59
Doing GPs in numpy!
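As a rough illustration of GP regression in NumPy (not the notebook's actual code), here is a minimal sketch of the standard textbook recipe. It uses a squared-exponential (RBF) kernel, adds noise variance to the diagonal, and computes the posterior via a Cholesky factorization. The function names, the kernel choice and the hyperparameter defaults are assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_kwargs):
    # Posterior mean and covariance of f(x_test), given noisy observations.
    K = rbf_kernel(x_train, x_train, **kernel_kwargs) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_kwargs)
    K_ss = rbf_kernel(x_test, x_test, **kernel_kwargs)
    L = np.linalg.cholesky(K)                                # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                                     # K_ss - K_s^T K^{-1} K_s
    return mean, cov

x_train = np.linspace(-3, 3, 10)
y_train = np.sin(x_train)
x_test = np.linspace(-3, 3, 50)
mean, cov = gp_posterior(x_train, y_train, x_test)
```

The Cholesky route avoids forming an explicit matrix inverse, which tends to be more numerically stable than calling `np.linalg.inv`.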
ericmjl / environment.yml
Created December 12, 2018 17:13
Environment spec file for an introductory hands-on deep learning (DL) workshop
name: dl-workshop
channels:
- defaults
- conda-forge
- ericmjl
dependencies:
- python=3.7
- jupyter
- jupyterlab
- conda
ericmjl / gp-test.ipynb
Created December 13, 2018 05:35
Extending GPs to 2 dimensions
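Extending a GP to 2 dimensions mostly changes the kernel: distances are computed between input vectors rather than scalars. The sketch below is not the notebook's code. It shows one vectorized way to build an RBF kernel for inputs of shape `(n, d)`, with `rbf_kernel_nd` as a hypothetical name. It evaluates the kernel on a small 2-D grid.

```python
import numpy as np

def rbf_kernel_nd(X1, X2, lengthscale=1.0, variance=1.0):
    # X1: (n, d), X2: (m, d) -> (n, m) kernel matrix.
    # Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b to avoid explicit loops.
    sqdist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2 * X1 @ X2.T
    )
    sqdist = np.maximum(sqdist, 0.0)  # clip tiny negatives caused by round-off
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

# A 5 x 5 grid of 2-D input points, flattened to shape (25, 2).
g = np.linspace(-2, 2, 5)
xx, yy = np.meshgrid(g, g)
X = np.column_stack([xx.ravel(), yy.ravel()])
K = rbf_kernel_nd(X, X)
```

Because the posterior algebra (Cholesky solve, mean, covariance) depends only on kernel matrices, swapping in a d-dimensional kernel like this is typically all that changes when moving from 1-D to 2-D inputs.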