Nico Katzke Nicktz

## Installing from MRO using a specific date
install.packages("rmarkdown", repos = "https://mran.revolutionanalytics.com/snapshot/2016-01-02")

Also see: https://github.com/fairtree/production.src/wiki/R-and-Revolution-R

## mutate_each and summarise_each variation summary
See this post: http://stackoverflow.com/a/27027681/4198868

## Apply lazyeval over multiple columns
To apply the same mutate to several columns, use lapply together with lazyeval:

colfilter <-
  lapply( ColumnNames, function(cols) lazyeval::interp(~Function(a),
                                                       .values = list(a = as.name(cols))) )
df %>% mutate_( .dots = colfilter )
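
mutate_() and the lazyeval approach were later deprecated in dplyr. A minimal sketch of the same idea with across(), assuming dplyr 1.0 or later is installed. The data frame, the column vector and double_it() are hypothetical stand-ins for the placeholders in the snippet above:

```r
library(dplyr)

# Hypothetical example data and column selection
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
ColumnNames <- c("a", "b")

# Hypothetical stand-in for the placeholder function: any function of one column
double_it <- function(x) x * 2

# Apply double_it to every column named in ColumnNames, overwriting in place
out <- df %>% mutate(across(all_of(ColumnNames), double_it))
```

Columns not listed in ColumnNames (here c) are left untouched.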

## Orthogonalizing (orthonormalizing) columns in R
library(far)

data <-
  data.frame(x = rnorm(30, 0, 1.5),
             y = rnorm(30, 0, 1.5),
             z = rnorm(30, 0, 1.5))

y <-
  orthonormalization(data, basis = FALSE, norm = TRUE)
# basis = TRUE would instead return a square matrix (a full basis of R^n).
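
As a cross-check that does not need the far package, base R's QR decomposition also gives orthonormal columns spanning the same space. This is a sketch of a swapped-in technique, not a description of what orthonormalization() does internally, and the signs of the columns may differ from its output. The seed is an arbitrary assumption for reproducibility:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
data <- data.frame(x = rnorm(30, 0, 1.5),
                   y = rnorm(30, 0, 1.5),
                   z = rnorm(30, 0, 1.5))

# Q has orthonormal columns spanning the same space as the columns of data
Q <- qr.Q(qr(as.matrix(data)))

# t(Q) %*% Q should be the 3 x 3 identity, up to floating point error
max_dev <- max(abs(crossprod(Q) - diag(3)))
```

A max_dev close to zero confirms that the columns are orthogonal and have unit length.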

## Select columns in dplyr from a vector of column names
Pass the character vector of names through .dots:

select_(dataframe, .dots = VectorofNames)
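
select_() has since been deprecated in dplyr. A minimal sketch of the equivalent call with all_of(), assuming dplyr 1.0 or later; the data frame and names are made up for illustration:

```r
library(dplyr)

dataframe <- data.frame(a = 1:2, b = 3:4, c = 5:6)  # hypothetical data
VectorofNames <- c("a", "c")

# all_of() errors if a name in the vector is missing from the data frame
selected <- select(dataframe, all_of(VectorofNames))
```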

## Drop Columns that have only zeros or only NA
As I answered here: http://stackoverflow.com/a/37939267/4198868

To remove all columns with only zeros:
    dfzeroremoved <- df %>% .[,colSums(. != 0) > 0]

To remove all columns with only NA:
    dfnaremoved <- df %>% .[,colSums(!is.na(.)) > 0]
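
A small worked example in base R, on a made-up data frame. One caveat it illustrates: colSums(. != 0) returns NA for any column that contains a missing value, so this sketch adds na.rm = TRUE to the zero check (an addition, not part of the linked answer):

```r
df <- data.frame(a = c(0, 0, 0),      # only zeros
                 b = c(1, 0, 2),
                 c = c(NA, NA, NA),   # only NA
                 d = c(NA, 3, 0))

# Keep columns with at least one non-NA value
not_all_na <- colSums(!is.na(df)) > 0

# Keep columns with at least one non-zero value; na.rm = TRUE ignores NAs,
# so an all-NA column also counts as having no non-zero value
not_all_zero <- colSums(df != 0, na.rm = TRUE) > 0

dfnaremoved   <- df[, not_all_na, drop = FALSE]
dfzeroremoved <- df[, not_all_zero, drop = FALSE]
```

drop = FALSE keeps the result a data frame even when only one column survives.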

## Tester.r
test <- function(x) {
  y <- x^2
  y
}

## RdsFilesIdentical.r
RdsFilesIdentical <- function(RdsLocation1, RdsLocation2) {

  library(fairtreeR)
  load.packages()

  Rds1 <- read_rds(RdsLocation1)
  Rds2 <- read_rds(RdsLocation2)

  identical(Rds1, Rds2)
}
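
The function above depends on the internal fairtreeR package, presumably to load readr for read_rds(). A minimal base R sketch of the same check, using readRDS() instead; the function name rds_files_identical and the temporary files are illustrative assumptions:

```r
# Base R variant: readRDS() instead of readr::read_rds()
rds_files_identical <- function(RdsLocation1, RdsLocation2) {
  identical(readRDS(RdsLocation1), readRDS(RdsLocation2))
}

# Usage with two temporary files holding the same object
f1 <- tempfile(fileext = ".rds")
f2 <- tempfile(fileext = ".rds")
saveRDS(mtcars, f1)
saveRDS(mtcars, f2)
same <- rds_files_identical(f1, f2)

# Overwrite the second file with a different object
saveRDS(head(mtcars), f2)
different <- rds_files_identical(f1, f2)
```

Note that identical() compares the stored objects, not the bytes of the files.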

## Avoid Loops with Dplyr and Do
dplyr lets you avoid explicit loops.

Once the data is gathered into a tidy data frame, group_by() replaces the loop: every subsequent operation is applied per group.

E.g.:

data <- data.frame(
  date = rep(c(1,2,3,4), each=25),
  Tickers = rep(c("A", "B", "C", "D")),
  Returns = rnorm(100)
)
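
A minimal sketch of the idea on a self-contained version of the example data, assuming dplyr is installed. The seed and the choice of summary statistics are illustrative assumptions:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only for reproducibility
data <- data.frame(
  date    = rep(c(1, 2, 3, 4), each = 25),
  Tickers = rep(c("A", "B", "C", "D"), times = 25),
  Returns = rnorm(100)
)

# One row per ticker, with no explicit loop over tickers
by_ticker <- data %>%
  group_by(Tickers) %>%
  summarise(MeanReturn = mean(Returns), SdReturn = sd(Returns), N = n())
```

The same pattern works for any grouping, e.g. group_by(date) or group_by(date, Tickers).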

## Using any function in dplyr
With, e.g., summarise_each, we can plug in any function and apply it to every group.

E.g., building a boxplot table by grouping on a factor and calculating the moments. Data is a tbl_df that includes the columns Factors (str) and FactorValues (dbl):

BoxMoments <-  Data %>%
      group_by(date, Universe, Factors) %>%
      summarise_each( funs( Min = min, Max = max, Mean = mean, Median = median, N = n(),
                          LowHinge = boxplot.stats(.)[[1]][2],  # Even other functions
                          UpHinge = boxplot.stats(.)[[1]][4] ) )
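
summarise_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same boxplot table with plain summarise(), on made-up data with only the Factors and FactorValues columns (the date and Universe groupings are left out, and the factor names are hypothetical):

```r
library(dplyr)

set.seed(7)  # arbitrary seed, only for reproducibility
Data <- data.frame(
  Factors      = rep(c("Value", "Momentum"), each = 50),  # hypothetical factors
  FactorValues = rnorm(100)
)

BoxMoments <- Data %>%
  group_by(Factors) %>%
  summarise(
    Min      = min(FactorValues),
    Max      = max(FactorValues),
    Mean     = mean(FactorValues),
    Median   = median(FactorValues),
    N        = n(),
    # boxplot.stats()$stats holds: lower whisker, lower hinge, median,
    # upper hinge, upper whisker
    LowHinge = boxplot.stats(FactorValues)$stats[2],
    UpHinge  = boxplot.stats(FactorValues)$stats[4]
  )
```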