Lui Pillmann lpillmann

@lpillmann
lpillmann / read_parquet.py
Last active November 16, 2023 05:52
Read partitioned parquet files into pandas DataFrame from Google Cloud Storage using PyArrow
import gcsfs
import pyarrow.parquet  # `import pyarrow` alone does not load the parquet submodule


def read_parquet(gs_directory_path, to_pandas=True):
    """
    Reads multiple (partitioned) parquet files from a GS directory,
    e.g. 'gs://<bucket>/<directory>' (without a trailing /)
    """
    gs = gcsfs.GCSFileSystem()
    arrow_df = pyarrow.parquet.ParquetDataset(gs_directory_path, filesystem=gs)
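The docstring above expects a path of the form 'gs://<bucket>/<directory>' with no trailing slash. A minimal sketch of a guard for that convention could look like the following; `normalize_gs_path` is a hypothetical helper for illustration and is not part of the original gist, gcsfs, or PyArrow.

```python
def normalize_gs_path(path):
    """Return `path` without trailing slashes; reject paths not starting with gs://.

    Hypothetical helper, shown only to illustrate the path convention.
    """
    prefix = "gs://"
    if not path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}, got {path!r}")
    # Strip trailing slashes from the part after the scheme only,
    # so the slashes in "gs://" itself are never touched.
    remainder = path[len(prefix):].rstrip("/")
    if not remainder:
        raise ValueError("path must name a bucket")
    return prefix + remainder


cleaned = normalize_gs_path("gs://my-bucket/my-directory/")
```

The cleaned value could then be passed as `gs_directory_path`, assuming the caller wants trailing slashes tolerated rather than rejected.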
@lpillmann
lpillmann / python_configs.sh
Created June 13, 2020 18:03
Python useful configs and recipes
# Use ipdb as the default debugger for breakpoint() (install it first, e.g. with conda install ipdb)
export PYTHONBREAKPOINT=ipdb.set_trace
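As a rough illustration of how the variable is consumed: on Python 3.7+, the default breakpoint hook reads PYTHONBREAKPOINT each time it runs, imports the dotted name, and calls it. The sketch below uses the standard `sys.__breakpointhook__`; the helper `run_default_hook` and the two example values are assumptions chosen so the demo does not start an interactive debugger.

```python
import os
import sys


def run_default_hook(value):
    """Call the interpreter's default breakpoint hook with PYTHONBREAKPOINT=value.

    Hypothetical helper for illustration; restores the previous setting afterwards.
    """
    previous = os.environ.get("PYTHONBREAKPOINT")
    os.environ["PYTHONBREAKPOINT"] = value
    try:
        # sys.__breakpointhook__ is the original hook that breakpoint() calls
        # unless sys.breakpointhook has been replaced.
        return sys.__breakpointhook__()
    finally:
        if previous is None:
            del os.environ["PYTHONBREAKPOINT"]
        else:
            os.environ["PYTHONBREAKPOINT"] = previous


# "0" turns the hook into a no-op.
disabled = run_default_hook("0")
# Any importable callable can stand in for the debugger; here the hook calls dict().
redirected = run_default_hook("builtins.dict")
```

With `ipdb.set_trace` as the value, the same lookup would import `ipdb` and call its `set_trace`, which is why the package has to be installed first.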
@lpillmann
lpillmann / matplotlib-readable-jupyter.py
Last active July 13, 2020 20:56
Custom Matplotlib setup to make charts readable, to use with pandas
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
# Heuristics to make labels always readable, for dpi = 100 (default)
fig_w, fig_h = 10, 6 # Figure dimensions (in inches)
font_size = fig_w + fig_h # Font size (in pts.)
matplotlib.rcParams.update({
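The heuristic above (font size in points equals figure width plus figure height in inches) can be sketched as a plain function that builds the settings dictionary. The preview is cut off before the dictionary contents, so the two keys used here, `figure.figsize` and `font.size`, are an assumption about what such a setup would set; `readable_rc` is a hypothetical name.

```python
def readable_rc(fig_w=10, fig_h=6):
    """Build rcParams-style settings from the width-plus-height font heuristic.

    Hypothetical helper; the keys are standard Matplotlib rcParams names,
    but which keys the original snippet sets is not visible in the preview.
    """
    font_size = fig_w + fig_h  # font size in points, for the default dpi of 100
    return {
        "figure.figsize": (fig_w, fig_h),  # figure dimensions in inches
        "font.size": font_size,
    }


settings = readable_rc()
# With Matplotlib imported, this could be applied as:
#     matplotlib.rcParams.update(settings)
```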
@lpillmann
lpillmann / altair_theme_readable.py
Last active September 26, 2022 22:07
Custom Altair theme to enlarge font size and make it comfortably readable
import altair as alt

# Custom theme for readability
def readable():
    return {
        "config": {
            "title": {"fontSize": 16},
            "axis": {
                "labelFontSize": 14,
                "titleFontSize": 14,
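Since the preview stops partway through the dictionary, here is a self-contained sketch that contains only the keys visible above, closed into a complete config. The function name `readable_theme` and its parameters are hypothetical, and any further settings in the original theme are not reproduced.

```python
def readable_theme(title_size=16, axis_size=14):
    """Return a Vega-Lite config dict with enlarged title and axis fonts.

    Hypothetical, parameterized variant limited to the keys shown in the preview.
    """
    return {
        "config": {
            "title": {"fontSize": title_size},
            "axis": {
                "labelFontSize": axis_size,
                "titleFontSize": axis_size,
            },
        }
    }


theme = readable_theme()
# In Altair versions that expose the `alt.themes` registry, a function like this
# is typically registered and enabled with:
#     alt.themes.register("readable", readable_theme)
#     alt.themes.enable("readable")
```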
#!/usr/bin/env python
'''Crop an image to just the portions containing text.

Usage:

    ./crop_morphology.py path/to/image.jpg

This will place the cropped image in path/to/image.crop.png.

For details on the methodology, see