title: "Introduction to dplyr for Faster Data Manipulation in R"
output: html_document
Note: There is a 40-minute [video tutorial]( on YouTube that walks through this document in detail.
## Why do I use dplyr?
* Great for data exploration and transformation
title: 'Going deeper with dplyr: New features in 0.3 and 0.4'
output: html_document
## Introduction
In August 2014, I created a [40-minute video tutorial]( introducing the key functionality of the dplyr package in R, using dplyr version 0.2. Since then, there have been two significant updates to dplyr (0.3 and 0.4), introducing a ton of new features.
This document (created in March 2015) covers the most useful new features in 0.3 and 0.4, as well as other functionality that I didn't cover last time (though it is not necessarily new). My [new video tutorial]( walks through the code below in detail.
swayson /
Created June 8, 2015 07:16
Basic snippet for searching items
import pandas as pd

def search_item(dataframe, name, query, na=False, case=False, regex=True):
    # Boolean mask aligned to the frame's own index (not a fresh 0..n-1 index)
    idx = pd.Series(False, index=dataframe.index)
    # For each item in the query, look for it in column `name` and collect the rows (documents) it appears in
    for q in query:
        matches = dataframe[name].str.contains(q, na=na, case=case, regex=regex)
        idx |= matches
    return dataframe[idx]
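Instead of looping over the query terms, the terms can be folded into a single alternation regex so pandas scans the column only once. A minimal sketch, assuming the column holds strings and that each term should match literally (hence `re.escape`); the helper name `search_any` and the sample data are hypothetical.

```python
import re

import pandas as pd


def search_any(dataframe, column, query, case=False):
    """Return rows whose `column` contains any of the literal terms in `query`.

    Hypothetical helper: escapes each term and joins them into one
    regex of the form a|b|c, so the column is scanned once.
    """
    pattern = "|".join(re.escape(q) for q in query)
    # Missing values (None/NaN) count as non-matches via na=False
    mask = dataframe[column].str.contains(pattern, case=case, na=False, regex=True)
    return dataframe[mask]


docs = pd.DataFrame({"text": ["Pandas tips", "dplyr in R", None, "Keras VGG19"]})
hits = search_any(docs, "text", ["pandas", "vgg"])
print(hits.index.tolist())  # [0, 3]
```

One caveat of this design: an empty `query` produces the empty pattern, which matches every non-missing row, so callers may want to guard against that case.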
swayson /
Created September 1, 2015 13:04
Simple function to select (or reorder) columns of a pandas DataFrame
def select(dataframe, columns, keep_others=True):
    '''Select or reorder columns.

    If keep_others is True, `columns` are moved to the front and the remaining
    columns follow in their original order; otherwise only `columns` are kept.'''
    columns = list(columns)
    if keep_others:
        # Preserve the original order of the remaining columns (a set would scramble it)
        others = [c for c in dataframe.columns if c not in columns]
        return dataframe[columns + others]
    return dataframe[columns]
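When only reordering is needed, pandas' `Index.drop` is a compact way to compute "the other columns" while keeping their order. A minimal sketch; the helper name `move_to_front` and the sample frame are hypothetical.

```python
import pandas as pd


def move_to_front(dataframe, columns):
    """Put `columns` first; the remaining columns keep their original order.

    Hypothetical helper: Index.drop removes the chosen labels and
    preserves the order of everything else (unlike a set difference).
    """
    columns = list(columns)
    rest = dataframe.columns.drop(columns).tolist()
    return dataframe[columns + rest]


df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
print(move_to_front(df, ["c", "a"]).columns.tolist())  # ['c', 'a', 'b', 'd']
```

`Index.drop` raises a `KeyError` for a label that is not present, so a misspelled column name fails loudly instead of being silently ignored.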
swayson / convert_list_to_df.R
Last active November 4, 2015 09:33
Convert a list of items into a flat dataframe.
mylist <- list(structure(list(Hit = "True", Project = "Blue", Year = "2011",
Rating = "4", Launch = "26 Jan 2012", ID = "19", Dept = "1, 2, 4"), .Names = c("Hit",
"Project", "Year", "Rating", "Launch", "ID", "Dept")), structure(list(
Hit = "False", Error = "Record not found"), .Names = c("Hit",
"Error")), structure(list(Hit = "True", Project = "Green", Year = "2004",
Rating = "8", Launch = "29 Feb 2004", ID = "183", Dept = "6, 8"), .Names = c("Hit",
"Project", "Year", "Rating", "Launch", "ID", "Dept")))
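dfs <- lapply(mylist, data.frame, stringsAsFactors = FALSE)
# Bind the one-row data frames; columns missing from an item are filled with NA
# (assumes dplyr is installed; base rbind() would fail on the differing columns)
flat <- dplyr::bind_rows(dfs)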
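For comparison, the same flattening can be sketched in pandas: building a DataFrame from a list of dicts takes the union of all keys as columns and fills the gaps with NaN, much like binding the per-item data frames in R. This assumes, as in the R example, that every value stays a string; the records are the same items as `mylist`, written as Python dicts.

```python
import pandas as pd

# The items from `mylist`, as dicts; the second item has different keys
records = [
    {"Hit": "True", "Project": "Blue", "Year": "2011", "Rating": "4",
     "Launch": "26 Jan 2012", "ID": "19", "Dept": "1, 2, 4"},
    {"Hit": "False", "Error": "Record not found"},
    {"Hit": "True", "Project": "Green", "Year": "2004", "Rating": "8",
     "Launch": "29 Feb 2004", "ID": "183", "Dept": "6, 8"},
]

# One row per record; columns are the union of all keys, missing entries become NaN
flat = pd.DataFrame(records)
print(flat.shape)  # (3, 8)
```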
swayson / start_end_str.R
Created November 4, 2015 11:51
Utility functions to easily check whether strings start or end with a given (literal) prefix or suffix
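# Third argument name (`ignore.case`) and the end of ends_with() are reconstructed assumptions; the original listing is truncated
starts_with <- function(vars, match, ignore.case = TRUE) {
  if (ignore.case) match <- tolower(match)
  n <- nchar(match)
  if (ignore.case) vars <- tolower(vars)
  substr(vars, 1, n) == match
}
ends_with <- function(vars, match, ignore.case = TRUE) {
  if (ignore.case) match <- tolower(match)
  n <- nchar(match)
  if (ignore.case) vars <- tolower(vars)
  len <- nchar(vars)
  substr(vars, pmax(1, len - n + 1), len) == match
}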
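The same helpers translate directly to Python, where `str.startswith` and `str.endswith` perform the literal comparison that `substr(...) == match` does in R. A sketch; the function names mirror the R ones, and the `ignore_case` flag is an assumption standing in for the R case-insensitivity argument.

```python
def starts_with(names, prefix, ignore_case=True):
    """Element-wise prefix test over a list of strings (mirrors the R helper)."""
    if ignore_case:
        names = [n.lower() for n in names]
        prefix = prefix.lower()
    return [n.startswith(prefix) for n in names]


def ends_with(names, suffix, ignore_case=True):
    """Element-wise suffix test over a list of strings."""
    if ignore_case:
        names = [n.lower() for n in names]
        suffix = suffix.lower()
    return [n.endswith(suffix) for n in names]


cols = ["Sepal_Length", "Sepal_Width", "Petal_Length", "Species"]
print(starts_with(cols, "sepal"))  # [True, True, False, False]
print(ends_with(cols, "LENGTH"))   # [True, False, True, False]
```

For a pandas column, `Series.str.startswith` and `Series.str.endswith` do the same element-wise check.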
swayson /
Last active January 1, 2016 15:52
Grab metadata of starred GitHub repos.
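import pandas as pd
from github import Github  # PyGithub

# GitHub's API no longer accepts account passwords; pass a personal access token instead
g = Github("access_token")
# 'name' value restored as r.name (assumed; it was lost in the original listing)
final = [{'url': r.html_url, 'name': r.name} for r in g.get_user().get_starred()]
pd.DataFrame(final).to_excel('Github Stars 20160101.xlsx')  # needs an Excel writer such as openpyxl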
swayson /
Created January 16, 2016 13:12 — forked from baraldilorenzo/
VGG-19 pre-trained model for Keras

## VGG19 model for Keras

This is the Keras model of the 19-layer network used by the VGG team in the ILSVRC-2014 competition.

It was obtained by directly converting the Caffe model provided by the authors.

Details about the network architecture can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, A. Zisserman
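To make "19-layer" concrete: the network stacks sixteen 3×3 convolutional layers in five blocks (each block followed by 2×2 max-pooling) and then three fully connected layers. The sketch below, assuming a 224×224 RGB input and the 1000-class ImageNet output layer, tallies the weight layers and parameters from that configuration. It is plain Python and does not load the Keras model.

```python
# VGG-19 layout: (filters, number of 3x3 conv layers) per block, then the FC layers
CONV_BLOCKS = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]
FC_SIZES = [4096, 4096, 1000]


def vgg19_summary(input_size=224, in_channels=3):
    """Count weight layers, final feature-map size and parameters (weights + biases)."""
    params, layers, channels, size = 0, 0, in_channels, input_size
    for filters, n_convs in CONV_BLOCKS:
        for _ in range(n_convs):
            params += 3 * 3 * channels * filters + filters  # 3x3 kernels + biases
            channels = filters
            layers += 1
        size //= 2  # 2x2 max-pooling with stride 2 halves the spatial size
    features = size * size * channels  # flattened input to the first FC layer
    for units in FC_SIZES:
        params += features * units + units
        features = units
        layers += 1
    return layers, size, params


print(vgg19_summary())  # (19, 7, 143667240)
```

Most of the parameters sit in the first fully connected layer (7 × 7 × 512 inputs to 4096 units), which is why the convolutional part alone accounts for only about 20 million of them.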

apt-get update
apt-get upgrade -y
apt-get install -y aria2