@DGrady
DGrady / random_seed.py
Created October 29, 2019 17:16
Randomly generate a seed for Numpy
import numpy as np
# To reproduce a random sample, we need a fixed seed.
"{:_}".format(np.random.randint(np.iinfo(np.uint32).max))
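A minimal sketch of how a seed generated this way might then be used. The underscore-separated output is a valid Python integer literal, so it can be pasted straight into code. The `SEED` value and the `draw_sample` helper are hypothetical, and `np.random.default_rng` is an assumption about which generator API is wanted (it requires NumPy 1.17 or later).

```python
import numpy as np

# Hypothetical seed value, as if copied from the output of the one-liner above.
SEED = 1_234_567_890


def draw_sample(seed: int, size: int = 5) -> np.ndarray:
    """Draw a reproducible sample from a generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 100, size=size)


first = draw_sample(SEED)
second = draw_sample(SEED)
# The same seed yields the same sample on every call.
print(np.array_equal(first, second))
```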
@DGrady
DGrady / scikit-learn-character-tokenization.ipynb
Created September 18, 2019 17:05
Demonstration of the `char_wb` tokenization strategy in scikit-learn
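The notebook itself is not shown here, so the following is only a rough pure-Python sketch of what `char_wb` tokenization does, not the notebook's content and not scikit-learn's code. The function name `char_wb_ngrams` is hypothetical. The sketch handles a single n-gram size and skips the lowercasing that scikit-learn's vectorizers apply by default.

```python
from typing import List


def char_wb_ngrams(text: str, n: int = 3) -> List[str]:
    """Approximate the `char_wb` analyzer for a single n-gram size.

    Each whitespace-separated word is padded with one space on each side,
    and character n-grams are taken only inside the padded word, so no
    n-gram spans two words.
    """
    ngrams = []
    for word in text.split():
        padded = f" {word} "
        if len(padded) <= n:
            # A padded word no longer than n is emitted once, whole.
            ngrams.append(padded)
            continue
        for start in range(len(padded) - n + 1):
            ngrams.append(padded[start:start + n])
    return ngrams


print(char_wb_ngrams("to be", n=3))
# → [' to', 'to ', ' be', 'be ']
```

Compared with plain `char` n-grams, nothing like `"o b"` appears, because n-grams never cross a word boundary.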
@DGrady
DGrady / template-xgboost.ipynb
Last active June 24, 2019 23:01
A template for XGBoost models
@DGrady
DGrady / README.org
Last active March 11, 2019 19:10
Pretty printing delimited text files at the command line

Pretty printing delimited text files at the command line

Sometimes, you’d like to look at delimited files on the command line:

cat test.csv
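The rest of the README is cut off in this preview, so its actual command-line approach is not shown. As a stand-in, here is a small Python sketch of the same idea: read delimited text and pad each column to a common width. The `format_table` helper and the sample data are hypothetical.

```python
import csv
import io
from typing import List


def format_table(rows: List[List[str]], gap: int = 2) -> str:
    """Left-align each column to the width of its longest cell."""
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    sep = " " * gap
    lines = [
        sep.join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in padded
    ]
    return "\n".join(lines)


# Hypothetical sample data standing in for the contents of test.csv.
sample = "name,city\nAda,London\nGrace,New York\n"
rows = list(csv.reader(io.StringIO(sample)))
print(format_table(rows))
```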

@DGrady
DGrady / frequency_histogram.py
Last active November 26, 2019 19:12
Histogram based on frequency or count data
import numpy as np
import pandas as pd
def frequency_histogram(
data: pd.DataFrame,
n_bins=20,
bins=None,
log_bins=False,
normalize=False,
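The preview stops partway through the signature, so the gist's implementation is not visible. The core idea of a histogram built from count data can be sketched with `np.histogram` and its `weights` argument. The function name `weighted_histogram` and its parameters are hypothetical and do not reproduce the gist's interface.

```python
import numpy as np


def weighted_histogram(values, counts, n_bins=20, normalize=False):
    """Histogram of `values`, where values[i] was observed counts[i] times."""
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # `weights` makes each value contribute its count instead of 1.
    heights, edges = np.histogram(values, bins=n_bins, weights=counts)
    if normalize:
        heights = heights / heights.sum()
    return heights, edges


# Hypothetical data: 1.0 seen 3 times, 2.0 seen once, 3.0 seen 6 times.
heights, edges = weighted_histogram([1.0, 2.0, 3.0], [3, 1, 6], n_bins=2)
print(heights, edges)
```

With two bins over the range 1 to 3, the first bin collects the three observations of 1.0 and the second collects the seven observations of 2.0 and 3.0.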
@DGrady
DGrady / find_project_dir.py
Created January 30, 2018 23:23
Python snippet to find the nearest parent directory containing .git
import cytoolz.curried as tz
from pathlib import Path
def find_project_dir(here: Path = None) -> Path:
"""
Get the path to the project directory
“Project directory” means the nearest parent directory of the
current directory that contains a `.git` directory. If there
is no such directory, returns this directory.
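The body of the function is cut off in this preview. A minimal sketch of the behavior the docstring describes, using only `pathlib`, might look like the following. The name `find_project_dir_sketch` is hypothetical, and checking the starting directory itself before its parents is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_project_dir_sketch(here: Optional[Path] = None) -> Path:
    """Return the nearest directory at or above `here` that contains `.git`.

    Falls back to the starting directory if no such directory exists.
    """
    start = (here or Path.cwd()).resolve()
    # Walk from the starting directory up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / ".git").is_dir():
            return candidate
    return start


print(find_project_dir_sketch())
```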
@DGrady
DGrady / oracle-query.org
Last active March 21, 2024 11:57
Example of querying an Oracle database using Python, SQLAlchemy, and Pandas

Query Oracle databases with Python and SQLAlchemy

N.B. SQLAlchemy now incorporates all of this information in its documentation; I’m leaving this post here, but recommend referring to SQLAlchemy instead of these instructions.

Install requirements

  1. We’ll assume you already have SQLAlchemy and Pandas installed; these are included by default in many Python distributions.
  2. Install the cx_Oracle package in your Python environment, using either pip or conda, for example:
@DGrady
DGrady / flatten_spark_schema.py
Last active October 16, 2019 16:00
Flatten a Spark DataFrame schema
"""
The schemas that Spark produces for DataFrames are typically
nested, and these nested schemas are quite difficult to work with
interactively. In many cases, it's possible to flatten a schema
into a single level of column names.
"""
import typing as T
import cytoolz.curried as tz
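The gist's code is cut off after its imports. To illustrate the idea without a Spark session, the sketch below flattens a schema given in the dictionary form that `StructType.jsonValue()` returns. The function name `flatten_field_names` is hypothetical, and treating arrays and maps as leaves is an assumption.

```python
from typing import Any, Dict, List


def flatten_field_names(schema: Dict[str, Any], prefix: str = "") -> List[str]:
    """List dotted column names for every leaf field of a struct schema.

    `schema` is assumed to look like {"type": "struct", "fields": [...]}.
    """
    names = []
    for field in schema["fields"]:
        name = prefix + field["name"]
        field_type = field["type"]
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            # Recurse into nested structs, extending the dotted prefix.
            names.extend(flatten_field_names(field_type, prefix=name + "."))
        else:
            # Primitive, array, and map types are treated as leaves here.
            names.append(name)
    return names


# Hypothetical schema with one nested struct column.
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True, "metadata": {}},
        {"name": "user", "type": {"type": "struct", "fields": [
            {"name": "name", "type": "string", "nullable": True, "metadata": {}},
            {"name": "email", "type": "string", "nullable": True, "metadata": {}},
        ]}, "nullable": True, "metadata": {}},
    ],
}
print(flatten_field_names(schema))
# → ['id', 'user.name', 'user.email']
```

The dotted names could then be passed to `DataFrame.select`, with an alias for each, to produce a single level of columns.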
@DGrady
DGrady / remove_input_cells.py
Last active September 25, 2017 18:29
A Python script to remove input cells from a Jupyter-notebook-generated HTML file
"""
Remove the input cells from an HTML document generated from a Jupyter notebook
Reads from either STDIN or the named file, and writes to STDOUT
"""
import fileinput
from bs4 import BeautifulSoup
text = "".join(fileinput.input())
@DGrady
DGrady / 2017-09-23-fonts-for-nerds.org
Last active October 10, 2017 00:48
List of coding fonts

Fonts for nerds

One of the things you end up with when you spend too much time reading Hacker News is a folder of very slick monospaced fonts designed for code editors. Are any of these fonts measurably better than whatever’s already installed on your system? Nope! Here’s my list.

Spark by After the Flood

This one is kind of a gimmick, but an incredibly clever one. It translates sequences of characters like 123{30,60,90}456 into sparklines, using some fancy features of the OpenType format. See also their source code repository for the project. I haven’t used this nearly enough to tell if it works well in practice, but I will now be on the constant lookout for use cases.

Consolas by Luc(as) de Groot for Microsoft