swayson

## mount-vb.md

      
swayson / mount-vb.md (1 file, last active August 29, 2015 14:17)

How to mount a VirtualBox Shared Folder

OK, this was a little confusing for me, but I finally realized what was happening, so I decided to give my 2 cents in the hope that it will be clearer for others, and for me if I forget sometime in the future : ).
I was not using the name of the share I created in the VM. Instead I used share or vb_share when the name of my share was actually wd, which had me confused for a minute.
First, add your share directory in the VirtualBox settings for the VM.
Whatever you name your share here is the name you will need to use when mounting it in the VM guest OS, e.g. I named mine "wd" for my Western Digital Passport drive.
Next, on the guest OS, make a directory to use for your mount, preferably in your home directory.
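
The mount itself then uses the share name, not an arbitrary label. A minimal sketch, assuming the VirtualBox Guest Additions are installed in the guest (they provide the `vboxsf` filesystem type) and using the share name `wd` from above. The mount point `~/wd_share` and the helper name `build_mount_command` are hypothetical, chosen only for illustration:

```python
import os
import shlex


def build_mount_command(share_name, mount_point):
    """Return the argv list for mounting a VirtualBox shared folder.

    share_name must match the folder name configured in the VM settings.
    """
    # Expand "~" so the path is absolute before it is handed to sudo.
    mount_point = os.path.expanduser(mount_point)
    return ["sudo", "mount", "-t", "vboxsf", share_name, mount_point]


# "wd" is the share name from the text; the mount point is an assumed example.
cmd = build_mount_command("wd", "~/wd_share")
print(shlex.join(cmd))
# To actually run it inside the guest: subprocess.run(cmd, check=True)
```

Passing `share` or `vb_share` instead of `wd` here would reproduce the confusion described above, because the first argument has to be the name given to the share in the VM settings.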

  
## dplyr-tut-02.rmd
---
title: "Introduction to dplyr for Faster Data Manipulation in R"
output: html_document
---

Note: There is a 40-minute [video tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) on YouTube that walks through this document in detail.

## Why do I use dplyr?

* Great for data exploration and transformation

## dplyr-tut-02.Rmd
---
title: 'Going deeper with dplyr: New features in 0.3 and 0.4'
output: html_document
---

## Introduction

In August 2014, I created a [40-minute video tutorial](https://www.youtube.com/watch?v=jWjqLW-u3hc) introducing the key functionality of the dplyr package in R, using dplyr version 0.2. Since then, there have been two significant updates to dplyr (0.3 and 0.4), introducing a ton of new features.

This document (created in March 2015) covers the most useful new features in 0.3 and 0.4, as well as other functionality that I didn't cover last time (though it is not necessarily new). My [new video tutorial](https://www.youtube.com/watch?v=2mh1PqfsXVI) walks through the code below in detail.

## csvkit-eg.md

      
swayson / csvkit-eg.md (2 files, last active February 8, 2024 20:06)

CSVKit Examples

1. Ditch Excel (for real)

    in2csv file1.xls > file1.csv

2. Conquer fixed-width formats

    in2csv -f fixed -s schema.csv data.fixed > data.csv

3. Find cells matching a regular expression

    csvgrep -c phone_number -r "\d{3}-123-\d{4}" data.csv > matching.csv
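
To make the third example concrete, here is a rough standard-library approximation of what the `csvgrep` call selects: rows whose `phone_number` column matches the regular expression. This is a sketch of the idea only, not csvkit's implementation. The helper `grep_rows` and the sample data are made up for illustration:

```python
import csv
import io
import re


def grep_rows(csv_text, column, pattern):
    """Return the header plus every row whose `column` matches `pattern`."""
    regex = re.compile(pattern)
    reader = csv.DictReader(io.StringIO(csv_text))
    matches = [row for row in reader if regex.search(row[column])]
    return reader.fieldnames, matches


# Made-up sample data, for illustration only.
sample = (
    "name,phone_number\n"
    "Ann,555-123-4567\n"
    "Bob,555-999-0000\n"
)
header, rows = grep_rows(sample, "phone_number", r"\d{3}-123-\d{4}")
print(header)
print(rows)
```

Only the row for Ann survives the filter, because her number has `123` as its middle group.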

  
## test.ipynb

      
swayson / test.ipynb (1 file, created May 26, 2015 09:23)

test
    
## search_pandas.py


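import pandas as pd


def search_item(dataframe, name, query, na=False, case=False, regex=True):
    # Start with no rows selected; reuse the dataframe's index so the masks align
    idx = pd.Series(False, index=dataframe.index)

    # For each item in the query look for the item and collect the documents ids it pertains to
    for q in query:
        matches = dataframe[name].str.contains(q, na=na, case=case, regex=regex)
        # Assumed completion (the original snippet is cut off here): OR the matches together
        idx = idx | matches

    return dataframe[idx]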

## pandas_select.py
def select(dataframe, columns, keep_others=True):
    ''' Re-order or select columns. If keep_others, the named columns are moved to the front and the rest follow in their original order; otherwise only the named columns are returned.'''
    columns = list(columns)
    if keep_others:
        # A set difference would lose the original column order, so filter instead
        others = [c for c in dataframe.columns if c not in columns]
        return dataframe[columns + others]
    else:
        return dataframe[columns]
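
The ordering rule behind `keep_others=True` can be checked without pandas. A minimal sketch on plain lists of column names, where the helper `reorder` and the sample names are hypothetical and not part of the gist:

```python
def reorder(all_columns, front):
    """Put `front` first, then the remaining names in their original order."""
    front = list(front)
    rest = [c for c in all_columns if c not in front]
    return front + rest


print(reorder(["id", "name", "year", "rating"], ["year", "id"]))
```

Filtering the original sequence keeps the leftover columns in a predictable order, which a set difference does not guarantee.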

## minmax_scaler.R
minmax_scaler <- function(x, a, b) {
    # x: data. numeric vector of values to be scaled
    # a: desired minimum after scaling takes place
    # b: desired maximum after scaling takes place
    #
    # e.g. minmax_scaler(c(1,2,3,4), 1, 17)
    # [1]  1.000000  6.333333 11.666667 17.000000
    (((b - a)*(x - min(x))) / (max(x) - min(x))) + a
}
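
For comparison, the same rescaling, `(b - a) * (x - min) / (max - min) + a`, as a minimal Python sketch. The function name mirrors the R one. Like the R version, it assumes the values are not all equal, since a constant input makes the denominator zero:

```python
def minmax_scaler(x, a, b):
    """Linearly rescale the numbers in x so they span [a, b]."""
    lo, hi = min(x), max(x)
    # Assumes hi != lo; a constant input would divide by zero.
    return [((b - a) * (v - lo)) / (hi - lo) + a for v in x]


print(minmax_scaler([1, 2, 3, 4], 1, 17))
```

The smallest input maps to `a` and the largest to `b`, with everything in between spaced proportionally.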

## convert_list_to_df.R
mylist <- list(structure(list(Hit = "True", Project = "Blue", Year = "2011",
Rating = "4", Launch = "26 Jan 2012", ID = "19", Dept = "1, 2, 4"), .Names = c("Hit",
"Project", "Year", "Rating", "Launch", "ID", "Dept")), structure(list(
Hit = "False", Error = "Record not found"), .Names = c("Hit",
"Error")), structure(list(Hit = "True", Project = "Green", Year = "2004",
Rating = "8", Launch = "29 Feb 2004", ID = "183", Dept = "6, 8"), .Names = c("Hit",
"Project", "Year", "Rating", "Launch", "ID", "Dept")))

dfs <- lapply(mylist, data.frame, stringsAsFactors = FALSE)
library(dplyr)

## start_end_str.R
starts_with <- function(vars, match, ignore.case = TRUE) {
  if (ignore.case) match <- tolower(match)
  n <- nchar(match)

  if (ignore.case) vars <- tolower(vars)
  substr(vars, 1, n) == match
}

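ends_with <- function(vars, match, ignore.case = TRUE) {
  if (ignore.case) match <- tolower(match)
  # The original snippet is cut off after the line above; the rest is an
  # assumed completion by symmetry with starts_with
  n <- nchar(match)

  if (ignore.case) vars <- tolower(vars)
  length <- nchar(vars)

  # Compare the last n characters of each name with the match
  substr(vars, pmax(1, length - n + 1), length) == match
}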