
@gauden
gauden / gapmind-r.R
Created March 18, 2012 19:15
Use data from Gapminder in R
# ------------------------------------------------------------------------------
# PREPARATION
# First download two files from Gapminder, where an extremely rich set of
# data sources is curated in the form of Excel spreadsheets:
# http://www.gapminder.org/data/
#
# I am here using the WHO alcohol consumption data from:
# http://spreadsheets.google.com/pub?key=0AgogXXPMARyldGJqTDRfNHBWODJMRWlZaVhNclhNZXc&output=xls
# And the World Bank GDP data from:
# http://spreadsheets.google.com/pub?key=0ArfEDsV3bBwCdHh3d1FPOVg1WXM3V2huRWc2cjM3TkE&output=xls
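Once both spreadsheets are downloaded they still have to be lined up by country and year. Below is a minimal Python sketch of that join. It assumes each sheet has been exported to CSV with country names in the first column and one column per year. The helper names and the numbers are hypothetical placeholders, not the gist's R code and not real WHO or World Bank figures.

```python
import csv
import io


def wide_to_long(csv_text):
    """Turn a country-by-year table into {(country, year): value}, skipping blanks."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    years = [int(y) for y in header[1:]]  # assumes year labels in the header row
    values = {}
    for row in reader:
        country, cells = row[0], row[1:]
        for year, cell in zip(years, cells):
            if cell.strip():  # empty cells are treated as missing data
                values[(country, year)] = float(cell)
    return values


def join_indicators(left, right):
    """Keep only (country, year) pairs present in both indicators."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


# Placeholder numbers for illustration only.
alcohol_csv = "country,2000,2005\nA,1.0,2.0\nB,3.0,\n"
gdp_csv = "country,2000,2005\nA,10.0,20.0\nB,30.0,40.0\n"

merged = join_indicators(wide_to_long(alcohol_csv), wide_to_long(gdp_csv))
for (country, year), (alcohol, gdp) in sorted(merged.items()):
    print(country, year, alcohol, gdp)
```

An inner join is used so that every remaining row has both indicators. That is a design choice for plotting one indicator against the other, not something the original gist states.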
gauden / geom_tile_example.R
Created May 3, 2012 22:54
Code snippet to make a tile plot in ggplot2
# snippet to make a tile plot
library("ggplot2")
x <- rnorm(5000, 0, 1.5)
y <- rnorm(5000, 0, 1.5)
df <- data.frame(x, y)
p <- ggplot(df,
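Tile plots of scattered points are commonly used to show density: points are binned into grid cells and each cell is shaded by its count. The snippet above is cut off, so it is an assumption that this is what the gist does. The stdlib Python sketch below only illustrates the binning step. It reuses the same 5000 normal draws with standard deviation 1.5, together with a hypothetical `bin_2d` helper and an arbitrary bin width of 1.

```python
import math
import random
from collections import Counter


def bin_2d(xs, ys, width=1.0):
    """Count points per square cell; keys are (column, row) cell indices."""
    counts = Counter()
    for x, y in zip(xs, ys):
        # floor() so negative coordinates fall into the correct cell
        counts[(math.floor(x / width), math.floor(y / width))] += 1
    return counts


random.seed(1)
xs = [random.gauss(0, 1.5) for _ in range(5000)]
ys = [random.gauss(0, 1.5) for _ in range(5000)]
tiles = bin_2d(xs, ys)
# Each (cell, count) pair is one tile; the count would drive the fill colour.
print(len(tiles), sum(tiles.values()))
```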
#!/usr/bin/env python
#
# Converts any integer into a base [BASE] number. I have chosen 62
# as it is meant to represent the integers using all the alphanumeric
# characters, [no special characters] = {0..9}, {A..Z}, {a..z}
#
# I plan on using this to shorten the representation of possibly long IDs,
# à la URL shorteners
#
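The conversion described in these comments can be sketched as repeated division by 62, reading off remainders as digits in the order {0..9}, {A..Z}, {a..z}. This is a minimal sketch, not the gist's own code. Restricting it to non-negative integers is an assumption, because the comment does not say how negative numbers should be written.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def encode(n):
    """Return the base-62 representation of a non-negative integer."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)  # least significant digit comes out first
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(s):
    """Inverse of encode(): parse a base-62 string back into an integer."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode(61), encode(62), decode(encode(123456789)))  # z 10 123456789
```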

Preparation for the Data Visualization Course

Introduction

This will be, to a great extent, a hands-on course. Participants should download and install the applications listed below in order to ensure a common operating environment.

gauden / author_extract
Created March 7, 2013 09:25
OpenRefine Script to Extract Authors and EntrezUIDs from a PubMed CSV file
[
{
"op": "core/column-removal",
"description": "Remove column URL",
"columnName": "URL"
},
{
"op": "core/column-removal",
"description": "Remove column Details",
"columnName": "Details"
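Each `core/column-removal` entry in this history drops one named column. The Python sketch below replays just those removal operations on a CSV string outside OpenRefine, as a rough illustration only; it is not how OpenRefine itself executes the history. The helper name and the sample row are hypothetical, and the real PubMed export also carries other columns.

```python
import csv
import io
import json

# The first two operations from the OpenRefine history above.
OPERATIONS = json.loads("""
[
  {"op": "core/column-removal", "description": "Remove column URL", "columnName": "URL"},
  {"op": "core/column-removal", "description": "Remove column Details", "columnName": "Details"}
]
""")


def apply_column_removals(csv_text, operations):
    """Drop every column named by a core/column-removal op; ignore other ops."""
    drop = {op["columnName"] for op in operations if op["op"] == "core/column-removal"}
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped keys in each row dict
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample row.
sample = "Title,URL,Details\nSome paper,http://example.org,x\n"
print(apply_column_removals(sample, OPERATIONS), end="")
```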
gauden / ioff() and ion()
Created July 9, 2013 14:10
This is a demonstration of the ioff() and ion() switches in matplotlib
{
"metadata": {
"name": "Demo of ioff() and ion()"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
# https://github.com/matplotlib/matplotlib/issues/881
# Several of the ColorBrewer maps are "qualitative", meaning
# they are just a group of colors that can be used together
# for categories of data. So I remapped Accent to discrete segments
# instead of a continuous gradient:
# Actually, these should be used with ListedColormap, and
# the number of colors should depend on the number of
# categories in the data, with colors removed from the
# list in a certain order?
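The closing suggestion in these comments is to treat a qualitative map as a fixed list of colours, sized to the number of categories. That idea can be sketched without matplotlib, as below. The hex values are ColorBrewer's 8-class Accent palette. Taking colours from the front of the list is an assumption, since the comment leaves the removal order open, and `category_colors` is a hypothetical helper. The resulting list of colours is what one would pass to `matplotlib.colors.ListedColormap`.

```python
# ColorBrewer "Accent" (8-class qualitative palette), in its published order.
ACCENT = ["#7fc97f", "#beaed4", "#fdc086", "#ffff99",
          "#386cb0", "#f0027f", "#bf5b17", "#666666"]


def category_colors(categories, palette=ACCENT):
    """Map each distinct category, in first-seen order, to its own palette colour."""
    unique = list(dict.fromkeys(categories))  # de-duplicate, preserving order
    if len(unique) > len(palette):
        raise ValueError(f"{len(unique)} categories but only {len(palette)} colours")
    # Take only as many colours as there are categories, from the front of the list.
    return dict(zip(unique, palette))


mapping = category_colors(["beer", "wine", "beer", "spirits"])
print(mapping)
```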
# Something along the lines of http://stackoverflow.com/questions/348630/how-can-i-download-all-emails-with-attachments-from-gmail
# Make sure IMAP is enabled in your Gmail settings.
# Right now it won't download the same file name twice, even if the contents differ.
import email
import getpass
import imaplib
import os
import sys
detach_dir = '.'
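The IMAP download needs a live account, but the attachment-extraction step can be tried offline. The minimal sketch below uses only the standard library `email` package. `save_attachments` is a hypothetical helper, not the gist's code. It reproduces the limitation noted above by skipping any file name already present in `detach_dir`.

```python
import os
import tempfile
from email.message import EmailMessage


def save_attachments(msg, detach_dir="."):
    """Write each attachment in msg to detach_dir, skipping names already on disk."""
    saved = []
    for part in msg.walk():
        filename = part.get_filename()
        if not filename:
            continue  # body text or multipart container, not an attachment
        # basename() keeps a crafted filename from escaping detach_dir
        path = os.path.join(detach_dir, os.path.basename(filename))
        if os.path.exists(path):
            continue  # an existing name is never overwritten, even if contents differ
        with open(path, "wb") as fh:
            fh.write(part.get_payload(decode=True))
        saved.append(path)
    return saved


# Build a message locally instead of fetching one over IMAP.
msg = EmailMessage()
msg["Subject"] = "report"
msg.set_content("See attached.")
msg.add_attachment(b"a,b\n1,2\n", maintype="application",
                   subtype="octet-stream", filename="data.csv")

with tempfile.TemporaryDirectory() as tmp:
    first = save_attachments(msg, tmp)
    second = save_attachments(msg, tmp)  # nothing new: data.csv already exists
    print([os.path.basename(p) for p in first], second)
```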
gauden / checker.sh
Last active August 29, 2015 14:13
Check for Unix Commands on the System
#!/usr/bin/env bash
curl "http://datascienceatthecommandline.com/" > source.html
< source.html scrape -b -e '//div[@class="sect3"]/h3' |
xml2json -t xml2json |
jq '.html.body.h3[]["#text"]' |
sed 's/"//g' > list.txt
command -V $(cat list.txt) |
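The same availability check can be done from Python when a shell pipeline is inconvenient. This sketch uses `shutil.which` in place of `command -V`. `shutil.which` only searches PATH, so it will not report shell builtins, aliases or functions. The command names below are placeholders rather than the list scraped from the book's website.

```python
import shutil


def check_commands(names):
    """Return {name: full path or None}; only executables on PATH are found."""
    return {name: shutil.which(name) for name in names}


results = check_commands(["ls", "sh", "surely-not-a-real-command"])
for name, path in results.items():
    print(f"{name}: {path if path else 'NOT FOUND'}")
```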
gauden / pad_axis_matplotlib.py
Created November 15, 2015 11:11
Convenience function to pad the axes in matplotlib
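def pad_axis(ax, pct=2.0):
    """Widen the x and y limits of `ax` by `pct` percent of their span at each end."""
    for dim in ('x', 'y'):
        # pick the matching limit accessors for this axis
        getter = ax.get_xlim if dim == 'x' else ax.get_ylim
        setter = ax.set_xlim if dim == 'x' else ax.set_ylim
        lo, hi = getter()
        # padding is pct percent of the current span
        pad = (hi - lo) / 100.0 * pct
        setter(lo - pad, hi + pad)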
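The padding arithmetic is easiest to check in isolation. The hypothetical `padded_limits` helper below repeats the two lines that `pad_axis` applies to each axis, so the numbers can be verified without matplotlib. With a real plot, one would call `pad_axis(ax, pct=5.0)` after drawing.

```python
def padded_limits(lo, hi, pct=2.0):
    """Per-axis arithmetic of pad_axis: widen (lo, hi) by pct% of its span at each end."""
    pad = (hi - lo) / 100.0 * pct
    return lo - pad, hi + pad


# An axis running from 0 to 10 with the default 2% gains 0.2 at each end.
print(padded_limits(0.0, 10.0))
```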