@swayson
swayson / mount-vb.md
Last active August 29, 2015 14:17
How to mount a VirtualBox Shared Folder

OK, this was a little confusing for me, but I finally realized what was happening, so I decided to give my 2 cents in the hope that it will be clearer for others, and for me if I forget sometime in the future :).

I was not using the name of the share I had created in the VM. I used "share" or "vb_share" when the name of my share was actually "wd", which had me confused for a minute.

First, add your share directory in the VirtualBox settings for the VM.

Whatever you name your share here is the name you will need to use when mounting it in the VM guest OS. For example, I named mine "wd" for my Western Digital Passport drive.

Next, on the guest OS, make a directory to use for your mount, preferably in your home directory.

---
title: "Introduction to dplyr for Faster Data Manipulation in R"
output: html_document
---
Note: There is a 40-minute [video tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) on YouTube that walks through this document in detail.
## Why do I use dplyr?
* Great for data exploration and transformation
---
title: 'Going deeper with dplyr: New features in 0.3 and 0.4'
output: html_document
---
## Introduction
In August 2014, I created a [40-minute video tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) introducing the key functionality of the dplyr package in R, using dplyr version 0.2. Since then, there have been two significant updates to dplyr (0.3 and 0.4), introducing a ton of new features.
This document (created in March 2015) covers the most useful new features in 0.3 and 0.4, as well as other functionality that I didn't cover last time (though it is not necessarily new). My [new video tutorial](https://www.youtube.com/watch?v=2mh1PqfsXVI) walks through the code below in detail.
@swayson
swayson / csvkit-eg.md
Last active February 8, 2024 20:06
CSVKit Examples

1. Ditch Excel (for real)

    in2csv file1.xls > file1.csv

2. Conquer fixed-width formats

    in2csv -f fixed -s schema.csv data.fixed > data.csv

3. Find cells matching a regular expression

    csvgrep -c phone_number -r "\d{3}-123-\d{4}" data.csv > matching.csv
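
To make the pattern in example 3 concrete, here is a minimal Python sketch of the same kind of column filter, using only the standard library. It is not how csvkit is implemented; the helper name `grep_column` and the sample rows are made up for illustration.

```python
import csv
import io
import re

def grep_column(csv_text, column, pattern):
    """Return the rows (as dicts) whose `column` value matches `pattern`."""
    regex = re.compile(pattern)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if regex.search(row[column])]

# Hypothetical sample data, not from the original gist
sample = "name,phone_number\nAnn,555-123-0001\nBob,555-999-0002\n"
matching = grep_column(sample, "phone_number", r"\d{3}-123-\d{4}")
print([row["name"] for row in matching])
```

Only rows whose phone number has `123` as the middle group pass the filter.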

@swayson
swayson / search_pandas.py
Created June 8, 2015 07:16
Basic snippet for searching items
import pandas as pd

def search_item(dataframe, name, query, na=False, case=False, regex=True):
    # Boolean mask aligned to the DataFrame's own index; starts all False
    idx = pd.Series(False, index=dataframe.index)
    # For each item in the query, look for the item and collect the document ids it pertains to
    for q in query:
        matches = dataframe[name].str.contains(q, na=na, case=case, regex=regex)
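
The preview stops inside the loop, so the rest of the function is not shown. One plausible way to finish it, offered as a sketch rather than the author's code, is to OR each query's matches into the mask and return the matching rows. The name `search_rows` and the sample data are hypothetical.

```python
import pandas as pd

def search_rows(dataframe, name, query, na=False, case=False, regex=True):
    """Return rows where column `name` contains any of the strings in `query`."""
    # Start with an all-False mask aligned to the DataFrame's index
    mask = pd.Series(False, index=dataframe.index)
    for q in query:
        # Assumption: a row matches if it contains at least one query term
        mask |= dataframe[name].str.contains(q, na=na, case=case, regex=regex)
    return dataframe[mask]

# Hypothetical sample data
docs = pd.DataFrame({"text": ["Red apple", "green pear", None, "APPLE pie"]})
hits = search_rows(docs, "text", ["apple", "plum"])
print(hits.index.tolist())
```

Building the mask from `dataframe.index` keeps it aligned even when the DataFrame does not have a default 0..n-1 index.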
@swayson
swayson / pandas_select.py
Created September 1, 2015 13:04
Simple function to select (or reorder) columns of a pandas DataFrame
def select(dataframe, columns, keep_others=True):
    '''Re-order or select columns.

    If keep_others is True, the given columns are moved to the front and the
    remaining columns follow in their original order; otherwise only the given
    columns are returned.
    '''
    columns = list(columns)
    if keep_others:
        # Preserve the original order of the remaining columns
        others = [c for c in dataframe.columns if c not in columns]
        return dataframe[columns + others]
    else:
        return dataframe[columns]
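
A quick usage sketch for `select`, with a made-up DataFrame. The function is repeated here so the block runs on its own; this version keeps the remaining columns in their original order, which is an assumption about the intended behavior.

```python
import pandas as pd

def select(dataframe, columns, keep_others=True):
    """Move `columns` to the front, or return only `columns`."""
    columns = list(columns)
    if keep_others:
        others = [c for c in dataframe.columns if c not in columns]
        return dataframe[columns + others]
    return dataframe[columns]

# Hypothetical sample data
df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})
reordered = select(df, ["c", "a"])
subset = select(df, ["c", "a"], keep_others=False)
print(list(reordered.columns))
print(list(subset.columns))
```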
@swayson
swayson / minmax_scaler.R
Created October 7, 2015 14:33
Min-max scaler in R
minmax_scaler <- function(x, a, b) {
  # x: data. numeric vector of values to be scaled
  # a: desired minimum after scaling takes place
  # b: desired maximum after scaling takes place
  # e.g. minmax_scaler(c(1,2,3,4), 1, 17)
  # [1]  1.000000  6.333333 11.666667 17.000000
  (((b - a) * (x - min(x))) / (max(x) - min(x))) + a
}
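
Written out, the last line of the function computes the expression below. The documented example can be checked by hand for x = 2, with min(x) = 1, max(x) = 4, a = 1 and b = 17:

```latex
x' = a + \frac{(b - a)\,\bigl(x - \min(x)\bigr)}{\max(x) - \min(x)},
\qquad
x'\big|_{x=2} = 1 + \frac{(17 - 1)(2 - 1)}{4 - 1} = 1 + \frac{16}{3} \approx 6.333333
```

Note that a constant vector makes the denominator zero, so the result is undefined in that case.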
@swayson
swayson / convert_list_to_df.R
Last active November 4, 2015 09:33
Convert a list of items into a flat dataframe.
mylist <- list(structure(list(Hit = "True", Project = "Blue", Year = "2011",
Rating = "4", Launch = "26 Jan 2012", ID = "19", Dept = "1, 2, 4"), .Names = c("Hit",
"Project", "Year", "Rating", "Launch", "ID", "Dept")), structure(list(
Hit = "False", Error = "Record not found"), .Names = c("Hit",
"Error")), structure(list(Hit = "True", Project = "Green", Year = "2004",
Rating = "8", Launch = "29 Feb 2004", ID = "183", Dept = "6, 8"), .Names = c("Hit",
"Project", "Year", "Rating", "Launch", "ID", "Dept")))
dfs <- lapply(mylist, data.frame, stringsAsFactors = FALSE)
library(dplyr)
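
The listing stops at `library(dplyr)`, so how the author combines `dfs` is not shown. As a rough analogue of the same idea in pandas (a sketch for comparison, not a translation of the missing R code), a list of records with different keys can be passed straight to the `DataFrame` constructor:

```python
import pandas as pd

# Records with different keys, mirroring the shape of `mylist`
records = [
    {"Hit": "True", "Project": "Blue", "Year": "2011", "Rating": "4",
     "Launch": "26 Jan 2012", "ID": "19", "Dept": "1, 2, 4"},
    {"Hit": "False", "Error": "Record not found"},
    {"Hit": "True", "Project": "Green", "Year": "2004", "Rating": "8",
     "Launch": "29 Feb 2004", "ID": "183", "Dept": "6, 8"},
]

# pandas aligns on keys and fills the missing ones with NaN
flat = pd.DataFrame(records)
print(flat.shape)
print(flat["Error"].isna().tolist())
```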
@swayson
swayson / start_end_str.R
Created November 4, 2015 11:51
Utility functions to easily check if a string starts or ends with a given pattern
starts_with <- function(vars, match, ignore.case = TRUE) {
  if (ignore.case) match <- tolower(match)
  n <- nchar(match)
  if (ignore.case) vars <- tolower(vars)
  substr(vars, 1, n) == match
}
ends_with <- function(vars, match, ignore.case = TRUE) {
  if (ignore.case) match <- tolower(match)