@swayson
swayson / csvkit-eg.md
Last active February 8, 2024 20:06
CSVKit Examples

1. Ditch Excel (for real)

    in2csv file1.xls > file1.csv

2. Conquer fixed-width formats

    in2csv -f fixed -s schema.csv data.fixed > data.csv

3. Find cells matching a regular expression

    csvgrep -c phone_number -r "\d{3}-123-\d{4}" data.csv > matching.csv
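
To make the filter in example 3 concrete, here is a minimal sketch of the same idea in plain Python. It is not how csvkit is implemented; the sample rows and the helper name `grep_column` are made up for illustration, and it assumes the pattern may match anywhere inside the cell.

```python
import csv
import io
import re

# Hypothetical sample data standing in for data.csv.
SAMPLE = """name,phone_number
Ann,555-123-0000
Bob,555-999-1234
Cy,212-123-4567
"""


def grep_column(text, column, pattern):
    """Return the rows whose `column` value contains a match for `pattern`."""
    regex = re.compile(pattern)
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if regex.search(row[column])]


matching = grep_column(SAMPLE, "phone_number", r"\d{3}-123-\d{4}")
for row in matching:
    print(row["name"], row["phone_number"])
```

Only rows whose phone number has `123` as the middle group are kept; a number such as `555-999-1234` is dropped because `123` is not followed by a hyphen there.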

@swayson
swayson / sqlite_random_sample.sql
Created February 17, 2016 19:57
Efficient way to do random sampling in SQLite.
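-- Pick one pseudo-random row by seeking on the rowid instead of sorting the whole table.
-- "table" is a placeholder (TABLE is a reserved word); substitute your own table name.
-- The "+ 1" moves the random offset into 1..max so the last row can be chosen too.
SELECT * FROM "table"
WHERE _ROWID_ >= (abs(random()) % (SELECT max(_ROWID_) FROM "table")) + 1
LIMIT 1;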
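
A quick way to try the query is Python's built-in `sqlite3` module. This is a sketch under assumptions: the `items` table and its contents are hypothetical, and the query adds `+ 1` to the random offset so that it falls in the range 1..max(rowid) and the last row can be picked as well. Note that rows are only uniformly likely when the rowids have no gaps.

```python
import sqlite3

# In-memory database with a hypothetical table of 100 rows (rowids 1..100).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{i}",) for i in range(1, 101)],
)

QUERY = """
SELECT _ROWID_, name FROM items
WHERE _ROWID_ >= (abs(random()) % (SELECT max(_ROWID_) FROM items)) + 1
LIMIT 1
"""


def sample_one(connection):
    """Return one (rowid, name) pair chosen by the random-offset query."""
    return connection.execute(QUERY).fetchone()


samples = [sample_one(conn) for _ in range(5)]
for rowid, name in samples:
    print(rowid, name)
```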
@swayson
swayson / lsa_hack.r
Created February 26, 2014 06:56 — forked from rpietro/lsa_hack.r
Analyze Text Similarity with R: Latent Semantic Analysis and Multidimensional Scaling
# script stolen from http://goo.gl/YbQyAQ
# install.packages("tm")
# install.packages("ggplot2")
# install.packages("lsa")
# install.packages("scatterplot3d")
# install.packages("SnowballC")
# if (!require("SnowballC")) install.packages("SnowballC")
library(tm)
library(ggplot2)
@swayson
swayson / kulback_leibler_divergence.py
Last active September 28, 2022 07:21
Numpy and scipy ways to calculate KL Divergence.
"""
Specifically, the Kullback–Leibler divergence from Q to P, denoted DKL(P‖Q), is
a measure of the information gained when one revises one's beliefs from the
prior probability distribution Q to the posterior probability distribution P. In
other words, it is the amount of information lost when Q is used to approximate
P.
"""
import numpy as np
from scipy.stats import entropy
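
The definition in the docstring can be written out directly as the sum of p_i * log(p_i / q_i) over all outcomes. Below is a minimal sketch using only the standard library, assuming P and Q are discrete distributions over the same outcomes and Q is nonzero wherever P is. The name `kl_divergence` and the example distributions are illustrative and not taken from the gist. `scipy.stats.entropy(p, q)` computes the same quantity using the natural logarithm.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions given as sequences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Terms with p_i == 0 contribute 0 by convention, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # information lost when Q approximates P
print(kl_divergence(q, p))  # differs from the line above: KL is not symmetric
print(kl_divergence(p, p))  # zero when the distributions are identical
```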
@swayson
swayson / minmax_scaler.R
Created October 7, 2015 14:33
Min-max scaler in R
minmax_scaler <- function(x, a, b) {
  # x: data. numeric vector of values to be scaled
  # a: desired minimum after scaling takes place
  # b: desired maximum after scaling takes place
  # e.g. minmax_scaler(c(1,2,3,4), 1, 17)
  # [1] 1.000000 6.333333 11.666667 17.000000
  (((b - a) * (x - min(x))) / (max(x) - min(x))) + a
}
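
For comparison, the same rescaling formula, (b - a) * (x - min) / (max - min) + a, can be sketched in Python. The function name `minmax_scale` is illustrative, and the explicit error for constant input is a choice made for this sketch; the R version above would divide by zero in that case.

```python
def minmax_scale(values, a, b):
    """Linearly rescale `values` so the smallest maps to `a` and the largest to `b`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: the formula would divide by zero.
        raise ValueError("all values are equal; the scale is undefined")
    return [((b - a) * (v - lo)) / (hi - lo) + a for v in values]


scaled = minmax_scale([1, 2, 3, 4], 1, 17)
print(scaled)
```

The endpoints land exactly on `a` and `b`, and the interior points are spaced evenly between them because the input values are evenly spaced.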
@swayson
swayson / minimal_barchart.R
Created September 30, 2017 22:08
An example of a minimalistic barchart using ggplot2
library(tidyverse)
airquality %>%
group_by(Month) %>%
summarise(average_temperature = mean(Temp)) %>%
ggplot(aes(x=Month, y=average_temperature)) +
geom_bar(stat='identity', position = 'dodge') +
geom_text(aes(label=round(average_temperature, 0)), position=position_dodge(width=0.9), vjust=-0.5) +
labs(x='Month', y='Average temperature', title='Average temperature per month') +
theme_minimal() +
theme(plot.background = element_blank(),
@swayson
swayson / netrw quick reference.md
Created April 25, 2017 10:48 — forked from t-mart/netrw quick reference.md
A quick reference for Vim's built-in netrw file selector.
| Map | Action |
| --- | --- |
| `<F1>` | Causes Netrw to issue help |
| `<cr>` | Netrw will enter the directory or read the file |
| `<del>` | Netrw will attempt to remove the file/directory |
| `-` | Makes Netrw go up one directory |
| `a` | Toggles between normal display, hiding (suppress display of files matching g:netrw_list_hide) and showing (display only files which match g:netrw_list_hide) |
| `c` | Make browsing directory the current directory |
| `C` | Setting the editing window |
| `d` | Make a directory |
@swayson
swayson / xlsx_sheetnames.py
Last active April 9, 2017 07:47
Simple utility command line tool to list the sheet names of an Excel workbook.
"""
Simple utility command line tool to list the sheet names of an Excel workbook.
"""
import argparse
from xlrd import open_workbook
def get_args():
    """Parse and return the command line arguments."""
    parser = argparse.ArgumentParser(description='List sheetnames in Excel workbook.')
@swayson
swayson / standard_folders.py
Last active March 12, 2017 10:10
Simple python cli application to construct directories according to specifications.
import os
import click
def read_file(filename):
    """Yield each non-blank line of the file, stripped of surrounding whitespace."""
    with open(filename) as in_file:
        for line in in_file:
            if line.strip() != '':
                yield line.strip()
@swayson
swayson / install.sh
Created March 11, 2017 10:44 — forked from wdullaer/install.sh
Install Latest Docker and Docker-compose on Ubuntu
# Ask for the user password
# Script only works if sudo caches the password for a few minutes
sudo true
# Install kernel extras to enable Docker aufs support
# sudo apt-get -y install linux-image-extra-$(uname -r)
# Add Docker PPA and install latest version
# sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
# sudo sh -c "echo deb https://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"