---
title: "Introduction to dplyr for Faster Data Manipulation in R"
output: html_document
---
Note: There is a 40-minute [video tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) on YouTube that walks through this document in detail.
## Why do I use dplyr?
* Great for data exploration and transformation
swayson / mount-vb.md
Last active August 29, 2015 14:17
How to mount a VirtualBox Shared Folder

OK, this was a little confusing for me, but I finally realized what was happening, so I decided to give my 2 cents in the hope that it will be clearer for others, and for me if I forget sometime in the future :).

I was not using the name of the share I created in the VM. Instead I used "share" or "vb_share" when the name of my share was actually "wd", which had me confused for a minute.

First, add your share directory in the VM Box.

Whatever you name your share here is the name you will need to use when mounting it in the VM guest OS. For example, I named mine "wd" for my Western Digital Passport drive.

Next, on the guest OS, make a directory to use for your mount, preferably in your home directory.
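
The two steps described so far (create a mount point, then mount the share under its VirtualBox name) can be sketched as follows. This is a minimal sketch, assuming the Guest Additions are installed in the guest so that the `vboxsf` filesystem type is available. The helper name `build_mount_commands` and the `~/share` directory are illustrative and not part of the original note. The function only builds the commands and does not run them.

```python
import os


def build_mount_commands(share_name, mount_point):
    """Return the commands that create a mount point and mount a VirtualBox share.

    share_name must be the name given to the share in the VirtualBox
    settings (for example "wd"), not an arbitrary label.
    """
    mount_point = os.path.expanduser(mount_point)
    return [
        ["mkdir", "-p", mount_point],
        ["sudo", "mount", "-t", "vboxsf", share_name, mount_point],
    ]


if __name__ == "__main__":
    # Print the commands for a share named "wd" (hypothetical mount point).
    for command in build_mount_commands("wd", "~/share"):
        print(" ".join(command))
```

The point of the note carries over directly: the first argument after `vboxsf` has to match the share name exactly, otherwise the mount fails.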

swayson / st3-project-settings.json
Created March 14, 2015 17:05
Sublime Text 3 Project configuration for Anaconda and alike
// (Project -> Edit Project)
{
    "build_systems":
    [
        {
            "name": "Anaconda Python Builder",
            "selector": "source.python",
            "shell_cmd": "python -u \"$file\""
        }
    ],
library(ggplot2)
library(gtable)
# create example data
set.seed(42)
dataset_names <- c("Human", "Mouse", "Fly", "Worm")
datasets <- data.frame(name = factor(dataset_names, levels=dataset_names), parity = factor(c(0, 0, 1, 0)), v50 = runif(4, max=0.5), y=1:4)
data <- data.frame( dataset1 = rep(datasets$name, 4), dataset2 = rep(datasets$name, each = 4), z = runif(16,min = 0, max = 0.5) )
pal <- c("#dddddd", "#aaaaaa")
import os
from multiprocessing import Pool
from PIL import Image

SIZE = (75, 75)
SAVE_DIRECTORY = 'thumbs'

def get_image_paths(folder):
    # Lazily yield the path of every file whose name contains 'jpeg'.
    return (os.path.join(folder, f)
            for f in os.listdir(folder)
            if 'jpeg' in f)
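
The filter above matches any file name that contains the substring `jpeg`, so it would accept a file such as `jpeg_notes.txt` and skip `photo.jpg`. A stricter variant that compares file extensions might look like the sketch below. The function name `get_jpeg_paths` and the set `JPEG_EXTENSIONS` are assumptions made for illustration and are not part of the original snippet.

```python
import os
import tempfile

JPEG_EXTENSIONS = {".jpg", ".jpeg"}  # assumed set of extensions to accept


def get_jpeg_paths(folder):
    """Yield paths of files in folder whose extension marks them as JPEG."""
    for name in sorted(os.listdir(folder)):
        extension = os.path.splitext(name)[1].lower()
        if extension in JPEG_EXTENSIONS:
            yield os.path.join(folder, name)


if __name__ == "__main__":
    # Build a throwaway folder with one misleading file name.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.jpeg", "b.JPG", "jpeg_notes.txt"):
            open(os.path.join(folder, name), "w").close()
        print([os.path.basename(p) for p in get_jpeg_paths(folder)])
```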
swayson / install-guest-additions.txt
Created March 6, 2015 20:30
Installing Guest Additions on Ubuntu
Follow these steps to install the Guest Additions on your Ubuntu virtual machine:
1. Log in as ubuntu;
2. Click on Applications/System/Terminal (or on Applications/Terminal, if you are using the 6.06.1 Dapper Drake release);
3. Update your APT database with sudo apt-get update, typing your password if requested; install the latest security updates with sudo apt-get upgrade;
4. Install required packages with sudo apt-get install build-essential module-assistant;
5. Configure your system for building kernel modules by running sudo m-a prepare;
6. Click on Install Guest Additions… from the Devices menu, then choose to browse the content of the CD when requested.
7. Run sudo sh /media/cdrom/VBoxLinuxAdditions.run, and follow the instructions on screen.
swayson / mongodb-generator-sample
Created March 3, 2015 07:19
MongoDB Find generator
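# pymongo.connection.Connection was removed in PyMongo 3; MongoClient replaces it.
from pymongo import MongoClient
m = MongoClient()
db = m.reddit
votes = db.votes
# Cursors are lazy: nothing is fetched until the cursor is iterated.
cursor = votes.find().skip(0).limit(50000)
print("Setup cursor: %s" % cursor)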
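
One way to turn a skip/limit query like the one above into a generator is to fetch one page at a time and yield its documents. The sketch below is an assumption about how such a "find generator" could be written, not the original author's code. `iter_pages` and `fetch_page` are hypothetical names, and an in-memory list stands in for the collection so the example runs without a database.

```python
def iter_pages(fetch_page, page_size):
    """Yield documents one at a time, fetching them page by page.

    fetch_page(skip, limit) is a hypothetical callable that returns a list
    of at most `limit` documents starting at offset `skip`.
    """
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        if not page:
            return  # an empty page means the results are exhausted
        for document in page:
            yield document
        skip += page_size


if __name__ == "__main__":
    records = [{"vote": i} for i in range(7)]  # in-memory stand-in for a collection

    def fetch_page(skip, limit):
        return records[skip:skip + limit]

    print(sum(1 for _ in iter_pages(fetch_page, 3)))
```

With PyMongo, `fetch_page` could plausibly be `lambda skip, limit: list(votes.find().skip(skip).limit(limit))`, which keeps only one page of documents in memory at a time.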
"""Kernel K-means"""
# Author: Mathieu Blondel <mathieu@mblondel.org>
# License: BSD 3 clause
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin
from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.utils import check_random_state
swayson / regex_url
Created October 27, 2014 17:15
Regular Expression for URL
(\b((https?://|www\.)|[a-z0-9\.-]+?\.(com|co\.uk|org|net|info|ca)(?=[/ \W\b]))[^ \t\r\n<>]*?(?=(([\'\xe2\x80\x9c".?!,:;]|&(amp|lt|gt|quot);)+?)?(\.\.+|[<>]|\s|$)))
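
Applying a URL pattern in code might look like the sketch below. It deliberately uses a much simpler pattern than the expression above (only a scheme or `www.` prefix followed by non-space characters), because the original embeds raw byte escapes that depend on how the input is encoded. `SIMPLE_URL` and `find_urls` are illustrative names, not part of the original gist.

```python
import re

# Simplified pattern: a scheme or "www." prefix followed by non-space characters.
# This is an illustrative assumption, far less thorough than the expression above.
SIMPLE_URL = re.compile(r"\b(?:https?://|www\.)[^\s<>]+")


def find_urls(text):
    """Return every substring of text matched by the simplified URL pattern."""
    return SIMPLE_URL.findall(text)


if __name__ == "__main__":
    print(find_urls("See https://example.com/a and www.test.org for details"))
```

Unlike the full expression, this simplified pattern does not strip trailing punctuation and does not recognize bare domains such as `example.com`.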
swayson / lsa_hack.r
Created February 26, 2014 06:56 — forked from rpietro/lsa_hack.r
Analyze Text Similarity with R: Latent Semantic Analysis and Multidimensional Scaling
# script stolen from http://goo.gl/YbQyAQ
# install.packages("tm")
# install.packages("ggplot2")
# install.packages("lsa")
# install.packages("scatterplot3d")
# install.packages("SnowballC")
# if (!require("SnowballC")) install.packages("SnowballC")
library(tm)
library(ggplot2)