@DGrady
DGrady / oracle-query.org
Last active March 21, 2024 11:57
Example of querying an Oracle database using Python, SQLAlchemy, and Pandas

Query Oracle databases with Python and SQLAlchemy

N.B. SQLAlchemy now incorporates all of this information in its documentation; I’m leaving this post here, but recommend referring to SQLAlchemy instead of these instructions.

Install requirements

  1. We’ll assume you already have SQLAlchemy and Pandas installed; these are included by default in many Python distributions.
  2. Install the cx_Oracle package in your Python environment, using either pip or conda, for example:
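As a rough sketch of where these steps lead, the snippet below builds a SQLAlchemy connection URL for the cx_Oracle driver and reads a query result into a Pandas DataFrame. The credentials, host, port, service name, and the helper names `oracle_url` and `query_to_frame` are placeholders assumed for illustration; the SQLAlchemy documentation remains the authoritative reference.

```python
from urllib.parse import quote_plus


def oracle_url(user, password, host, port, service_name):
    """Build an oracle+cx_oracle URL; every argument value is a placeholder."""
    # Quote the credentials so characters such as '@' or '/' survive in the URL.
    return (
        f"oracle+cx_oracle://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?service_name={quote_plus(service_name)}"
    )


def query_to_frame(url, sql):
    """Run a query and return a DataFrame (needs SQLAlchemy, Pandas, cx_Oracle)."""
    # Imported lazily so that building a URL does not require the packages.
    import pandas as pd
    import sqlalchemy

    engine = sqlalchemy.create_engine(url)
    with engine.connect() as connection:
        return pd.read_sql(sqlalchemy.text(sql), connection)


if __name__ == "__main__":
    # Hypothetical connection details; replace them with your own.
    print(oracle_url("scott", "tiger", "db.example.com", 1521, "ORCLPDB1"))
```

Calling `query_to_frame(url, "SELECT * FROM some_table")` would then return the rows as a DataFrame, assuming the database is reachable and the Oracle client libraries are installed.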
@DGrady
DGrady / scikit-learn-character-tokenization.ipynb
Created September 18, 2019 17:05
Demonstration of the `char_wb` tokenization strategy in scikit-learn
@DGrady
DGrady / subprocess_filter.py
Last active December 7, 2022 01:09
Stream data asynchronously through a subprocess in Python
"""
Problem: provide two-way communication with a subprocess in Python.
See also:
- https://kevinmccarthy.org/2016/07/25/streaming-subprocess-stdin-and-stdout-with-asyncio-in-python/
- http://eli.thegreenplace.net/2017/interacting-with-a-long-running-child-process-in-python/
"""
import asyncio
import sys
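A minimal sketch of the two-way communication problem described in the docstring above, using asyncio's subprocess API. This is not the gist's own implementation: the child process (a Python one-liner that upper-cases its input) and the function name `stream_through` are assumptions made for illustration. Writing and reading run as concurrent tasks so that neither pipe can fill up and stall the other side.

```python
import asyncio
import sys

# Hypothetical child process: upper-cases each line it reads from stdin.
CHILD = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()",
]


async def stream_through(lines):
    """Send lines to the child and collect its output lines concurrently."""
    proc = await asyncio.create_subprocess_exec(
        *CHILD,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )

    async def feed():
        for line in lines:
            proc.stdin.write(line.encode() + b"\n")
            await proc.stdin.drain()  # respect back-pressure from the pipe
        proc.stdin.close()  # signals end of input to the child

    async def collect():
        results = []
        async for raw in proc.stdout:  # ends when the child closes stdout
            results.append(raw.decode().rstrip("\r\n"))
        return results

    _, results = await asyncio.gather(feed(), collect())
    await proc.wait()
    return results


if __name__ == "__main__":
    print(asyncio.run(stream_through(["hello", "world"])))
```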
@DGrady
DGrady / frequency_histogram.py
Last active November 26, 2019 19:12
Histogram based on frequency or count data
import numpy as np
import pandas as pd
def frequency_histogram(
    data: pd.DataFrame,
    n_bins=20,
    bins=None,
    log_bins=False,
    normalize=False,
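One way to build a histogram from count data is to pass the counts as weights to `np.histogram`. The sketch below is an illustration under assumptions, not the gist's function: `histogram_from_counts` is a hypothetical name, it takes plain arrays rather than a DataFrame, and the meanings given to `n_bins`, `log_bins`, and `normalize` are guesses based only on the parameter names in the signature above.

```python
import numpy as np


def histogram_from_counts(values, counts, n_bins=20, log_bins=False, normalize=False):
    """Bin `values`, weighting each one by the number of times it was observed."""
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    lo, hi = values.min(), values.max()
    if log_bins:
        # Geometrically spaced edges; this requires strictly positive values.
        edges = np.geomspace(lo, hi, n_bins + 1)
    else:
        edges = np.linspace(lo, hi, n_bins + 1)
    # Each value contributes its count, not 1, to the bin it falls in.
    heights, edges = np.histogram(values, bins=edges, weights=counts)
    if normalize:
        # Assumed meaning: rescale so that the bin heights sum to 1.
        heights = heights / heights.sum()
    return heights, edges


if __name__ == "__main__":
    heights, edges = histogram_from_counts([1, 2, 3], [10, 0, 5], n_bins=2)
    print(heights, edges)
```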
@DGrady
DGrady / random_seed.py
Created October 29, 2019 17:16
Randomly generate a seed for Numpy
import numpy as np
# To reproduce a random sample, we need a fixed seed.
"{:_}".format(np.random.randint(np.iinfo(np.uint32).max))
@DGrady
DGrady / flatten_spark_schema.py
Last active October 16, 2019 16:00
Flatten a Spark DataFrame schema
"""
The schemas that Spark produces for DataFrames are typically
nested, and these nested schemas are quite difficult to work with
interactively. In many cases, it's possible to flatten a schema
into a single level of column names.
"""
import typing as T
import cytoolz.curried as tz
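The flattening idea in the docstring above can be sketched without Spark by walking the dictionary form of a schema, which PySpark exposes through `StructType.jsonValue()`. The function name `flatten_fields` and the example schema are hypothetical, and this sketch only descends into nested structs; arrays and maps are treated as leaves. It does not use cytoolz and is not the gist's implementation.

```python
import typing as T


def flatten_fields(schema: T.Mapping[str, T.Any], prefix: str = "") -> T.Iterator[str]:
    """Yield dotted leaf column names from a struct schema in JSON-dict form."""
    for field in schema["fields"]:
        name = prefix + field["name"]
        field_type = field["type"]
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            # Recurse into nested structs, extending the dotted prefix.
            yield from flatten_fields(field_type, prefix=name + ".")
        else:
            # Atomic types, arrays, and maps are treated as leaves here.
            yield name


# Hypothetical schema in the shape produced by df.schema.jsonValue().
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "user", "type": {"type": "struct", "fields": [
            {"name": "name", "type": "string", "nullable": True, "metadata": {}},
            {"name": "address", "type": {"type": "struct", "fields": [
                {"name": "city", "type": "string", "nullable": True, "metadata": {}},
            ]}, "nullable": True, "metadata": {}},
        ]}, "nullable": True, "metadata": {}},
    ],
}

if __name__ == "__main__":
    print(list(flatten_fields(example_schema)))
    # ['id', 'user.name', 'user.address.city']
```

With a real DataFrame, the resulting dotted names could be passed to `df.select`, each aliased to a single-level column name.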
@DGrady
DGrady / template-xgboost.ipynb
Last active June 24, 2019 23:01
A template for XGBoost models
@DGrady
DGrady / README.org
Last active March 11, 2019 19:10
Pretty printing delimited text files at the command line

Pretty printing delimited text files at the command line

Sometimes, you’d like to look at delimited files on the command line:

cat test.csv
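For comparison, the same kind of aligned output can be sketched in Python with only the standard library. This is an illustration of the idea rather than the approach taken in this README; the function name `pretty_print` and the sample data are hypothetical.

```python
import csv
import io


def pretty_print(text, delimiter=","):
    """Return delimited text with each column padded to a common width."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    n_cols = max((len(row) for row in rows), default=0)
    # Pad short rows so that every row has the same number of cells.
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real file.
    print(pretty_print("name,city\nAda,London\nGrace,Arlington\n"))
```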

@DGrady
DGrady / custom.js
Created February 21, 2016 16:59
Automatically hide the toolbar and header in Jupyter Notebook 4.1.0
// Automatically hide the toolbar and header in Jupyter Notebook 4.1.0
// This should go in ~/.jupyter/custom/custom.js
require(
    ['base/js/namespace', 'base/js/events'],
    function(Jupyter, events) {
        events.on("notebook_loaded.Notebook", function () {
            Jupyter.toolbar.actions.call('jupyter-notebook:toggle-toolbar')
            Jupyter.toolbar.actions.call('jupyter-notebook:toggle-header')
        })
    }
)
@DGrady
DGrady / describe_population.py
Last active August 18, 2018 04:16
Analyze data frames that contain mainly categorical (string) data
import pandas as pd
def describe_population(df: pd.DataFrame) -> pd.DataFrame:
    """
    Report the populated and uniqueness counts for each column of the input.
    The ratio columns are given as percents.
    """
    N = len(df)
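A minimal sketch of the kind of report the docstring describes, assuming that "populated" means non-null and that both ratios are taken relative to the total row count. The function name `describe_population_sketch` and the output column names are hypothetical; the gist's actual output may differ.

```python
import pandas as pd


def describe_population_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column populated and unique counts, with ratios as percentages."""
    n = len(df)
    populated = df.count()   # non-null values per column
    unique = df.nunique()    # distinct non-null values per column
    return pd.DataFrame({
        "populated": populated,
        "populated_pct": 100.0 * populated / n,
        "unique": unique,
        "unique_pct": 100.0 * unique / n,
    })


if __name__ == "__main__":
    # Hypothetical categorical data.
    frame = pd.DataFrame({
        "color": ["red", "red", None, "blue"],
        "size": ["S", None, None, None],
    })
    print(describe_population_sketch(frame))
```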