Skip to content

Instantly share code, notes, and snippets.

View pkmklong's full-sized avatar

Patrick pkmklong

View GitHub Profile
@joshlk
joshlk / faster_toPandas.py
Last active September 19, 2025 16:11
PySpark faster toPandas using mapPartitions
import pandas as pd
def _map_to_pandas(rdds):
""" Needs to be here due to pickling issues """
return [pd.DataFrame(list(rdds))]
def toPandas(df, n_partitions=None):
"""
Returns the contents of `df` as a local `pandas.DataFrame` in a speedy fashion. The DataFrame is
repartitioned if `n_partitions` is passed.
@CloudCray
CloudCray / colorize_zipcodes.py
Last active August 13, 2018 03:26
Download all US Zip Code Tabulation Area shapefiles (ZCTA) with Python 3
"""
Author: Cloud Cray
Date: 2014-01-04
This is an example script to read Zip Code data from a US shapefile,
then colorize all zip codes, then save the result to a png file.
"""
import shapefile
import matplotlib.patches as patches
@erichurst
erichurst / US Zip Codes from 2013 Government Data
Created December 9, 2013 23:00
All US zip codes with their corresponding latitude and longitude coordinates. Comma delimited for your database goodness. Source: http://www.census.gov/geo/maps-data/data/gazetteer.html
This file has been truncated, but you can view the full file.
ZIP,LAT,LNG
00601,18.180555, -66.749961
00602,18.361945, -67.175597
00603,18.455183, -67.119887
00606,18.158345, -66.932911
00610,18.295366, -67.125135
00612,18.402253, -66.711397
00616,18.420412, -66.671979
00617,18.445147, -66.559696
@hofmannsven
hofmannsven / README.md
Last active October 2, 2025 20:17
Git CLI Cheatsheet
@jboner
jboner / latency.txt
Last active October 29, 2025 05:36
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD