alienzj/Apply.md

## Apply.md

      

R Cheat Sheet : Applying functions

apply(x,index,function)
lapply(x,function)
sapply(x,function)
tapply(x,y,function)
mapply(function,x,y,...)
References


apply(x,index,function)

Apply a function to the rows (index=1) or columns (index=2) of a matrix.
   > mat<-matrix(1:9,3,3)
   > mat
        [,1] [,2] [,3]
   [1,]    1    4    7
   [2,]    2    5    8
   [3,]    3    6    9
   > apply(mat,1,sum)
   [1] 12 15 18
   > apply(mat,2,sum)
   [1]  6 15 24
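
As a small sketch beyond the built-in `sum`, `apply` accepts any function that takes a vector. The anonymous function and the variable names below are illustrative and not part of the original sheet.

```r
mat <- matrix(1:9, 3, 3)

# Range (max - min) of each row; the anonymous function receives one row at a time
row_range <- apply(mat, 1, function(r) max(r) - min(r))
row_range   # 6 6 6

# For plain sums, rowSums()/colSums() give the same values as apply(..., sum)
same_rows <- all(apply(mat, 1, sum) == rowSums(mat))
same_cols <- all(apply(mat, 2, sum) == colSums(mat))
```

For sums and means on large matrices, `rowSums`, `colSums`, `rowMeans` and `colMeans` are usually the faster choice.
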
lapply(x,function)

Apply a function to each element of the list x; the result is always a list.
    > x<-list(1:10)
    > lapply(x,sqrt)
    [[1]]
     [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427 3.000000 3.162278
    > class(lapply(x,sqrt))
    [1] "list"
    > x
    [[1]]
     [1]  1  2  3  4  5  6  7  8  9 10
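
The list above has a single element, so the function is called only once. A minimal sketch with several elements (the list `scores` is made-up data) shows the one-result-per-element behaviour more clearly:

```r
# A list with several elements of different lengths (hypothetical data)
scores <- list(a = 1:3, b = c(10, 20), c = 5)

# lapply returns a list with one result per element and keeps the names
means <- lapply(scores, mean)
str(means)
```
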
sapply(x,function)

Apply a function to each element of the list x and simplify the result to a vector or matrix where possible.
    > x<-list(1:10)
    > sapply(x,sqrt)
              [,1]
     [1,] 1.000000
     [2,] 1.414214
     [3,] 1.732051
     [4,] 2.000000
     [5,] 2.236068
     [6,] 2.449490
     [7,] 2.645751
     [8,] 2.828427
     [9,] 3.000000
    [10,] 3.162278
    > class(sapply(x,sqrt))
    [1] "matrix"
    # R >= 4.0.0 prints: [1] "matrix" "array"
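
When every call returns a single value, the simplification yields a named vector rather than a matrix. The sketch below uses made-up data; `vapply` is an addition not mentioned in the sheet, shown as a stricter alternative that states the expected result type up front.

```r
scores <- list(a = 1:3, b = c(10, 20), c = 5)

# One value per element, so sapply simplifies to a named integer vector
lens <- sapply(scores, length)

# vapply does the same but fails loudly if a result is not a single numeric value
means <- vapply(scores, mean, numeric(1))
```
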
tapply(x,y,function)

Apply a function to subsets of a vector x, where the subsets are defined by a grouping vector y.
    > x<-1:10
    > y <-rep(c(T,F),5)
    > tapply(x, y, sum)
    FALSE  TRUE
       30    25
    > tapply(x, y, list)
    $`FALSE`
    [1]  2  4  6  8 10

    $`TRUE`
    [1] 1 3 5 7 9

    > tapply(x, y, max)
    FALSE  TRUE
       10     9
    > class(tapply(x, y, list))
    [1] "list"
    > class(tapply(x, y, max))
    [1] "array"
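
A typical use is a per-group statistic with a factor as the grouping vector. This sketch uses the built-in `iris` data set; the variable name `avg` is illustrative.

```r
# Mean sepal length per species; iris$Species is the grouping factor
avg <- tapply(iris$Sepal.Length, iris$Species, mean)
avg
#     setosa versicolor  virginica
#      5.006      5.936      6.588
```
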
mapply(function,x,y,...)

Apply a function to multiple objects element by element. mapply("*",x,y) returns c(x1*y1,x2*y2,x3*y3,...). By default the result is simplified.
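
A minimal sketch of the element-wise behaviour, with made-up vectors `x` and `y`. The operator has to be quoted (or wrapped in backticks) to be passed as a function.

```r
x <- 1:3
y <- 4:6

# Element-wise product: x[1]*y[1], x[2]*y[2], x[3]*y[3]
prod_xy <- mapply("*", x, y)       # 4 10 18

# Results of different lengths cannot be simplified, so a list is returned
reps <- mapply(rep, 1:3, 3:1)      # 1 1 1 / 2 2 / 3

# SIMPLIFY = FALSE always returns a list
as_list <- mapply(function(a, b) a + b, x, y, SIMPLIFY = FALSE)
```
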
References


http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega


## Basics.md

      

R Cheat Sheet: Basics

Functions, conditions and loops
Datatypes
Create Data
Is it...?
Strings
Input and output
References


Functions, conditions and loops

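    anExampleFunction <- function(x, ...) {
        aLocalVariable <- x                    # only visible inside the function
        # if/else: return early when x is not NULL
        if (!is.null(x)) return(x) else message("x is null")
        while (is.null(x)) x <- 1              # while loop: runs until x is set
        for (i in 0:3) x <- seq(1, i)          # for loop: x ends as 1 2 3
        ifelse(x %% 2 == 0, TRUE, FALSE)       # vectorised if; the last value is returned
    }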
Other notes:

break and next do not return a value as they transfer control within the loop.
do.call(funname, args) executes a function call from the name of the function and a list of arguments to be passed to it
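
The two notes above can be sketched as follows; the loop bounds and the argument list are arbitrary illustrations.

```r
# next skips to the following iteration, break leaves the loop
evens <- c()
for (i in 1:10) {
  if (i %% 2 != 0) next   # skip odd numbers
  if (i > 6) break        # stop at the first even number above 6
  evens <- c(evens, i)
}
evens                     # 2 4 6

# do.call: function given by name, arguments given as a list
args <- list(1:10, na.rm = TRUE)
total <- do.call("sum", args)   # same as sum(1:10, na.rm = TRUE)
```
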

Datatypes


vectors: x <- 1:10 (numeric), x <- c('aaa','bbbb') (character); all elements share one type
list: Lists have elements, each of which can contain any type of R object

    > mylist<-list(x='a',y=2,z=1:10,n='Hello world')
    > mylist[1]
    $x
    [1] "a"
    > mylist[[1]]
    [1] "a"
    > mylist["z"]
    $z
    [1]  1  2  3  4  5  6  7  8  9 10
    > mylist$n
    [1] "Hello world"

matrix

   > matrix(seq(1,8),2,4)
        [,1] [,2] [,3] [,4]
   [1,]    1    3    5    7
   [2,]    2    4    6    8

dataframe

   > x<-data.frame(x = 1, y = 1:4, fac = LETTERS[1:4])
   > x
     x y fac
   1 1 1   A
   2 1 2   B
   3 1 3   C
   4 1 4   D
   > class(x$fac)
   [1] "factor"
   # R < 4.0.0 only; since R 4.0.0 stringsAsFactors defaults to FALSE and this prints "character"
   > x<-data.frame(x = 1, y = 1:4, fac = LETTERS[1:4],stringsAsFactors = FALSE)
   > class(x$fac)
   [1] "character"
Create Data


seq(from,to) generates a sequence

	> seq(1,10,by=2)
	[1] 1 3 5 7 9
	> seq(1,10,length.out=2)
	[1]  1 10
	> seq(1,10,along.with=1:4)
	[1]  1  4  7 10


rep(x,n) replicate x n times

	> rep(1:3,2)
	[1] 1 2 3 1 2 3
	> rep(1:3,each=2)
	[1] 1 1 2 2 3 3


runif(n) generates n uniformly distributed random numbers, by default between 0 and 1

> runif(5)
[1] 0.4490484 0.5588949 0.2798801 0.8900940 0.7158493
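
Random draws differ on every run. As a small sketch (the seed value 42 is arbitrary, and `set.seed` is an addition not listed in the sheet), fixing the seed makes them reproducible, and `min`/`max` change the interval:

```r
set.seed(42)
a <- runif(3)
set.seed(42)
b <- runif(3)
identical(a, b)                    # TRUE: same seed, same numbers

# Uniform draws between 10 and 20 instead of the default 0-1
d <- runif(5, min = 10, max = 20)
```
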

Is it...?

is.na(x), is.null(x), is.array(x), is.data.frame(x),
is.numeric(x), is.complex(x), is.character(x)
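
A short sketch of how these predicates behave. Note that `is.na` tests each element, while the others test the object as a whole. The vector `x` is made-up data.

```r
x <- c(1, NA, 3)

na_flags <- is.na(x)             # FALSE TRUE FALSE, one result per element
num_flag <- is.numeric(x)        # TRUE, one result for the whole vector
null_flag <- is.null(NULL)       # TRUE
df_flag <- is.data.frame(iris)   # TRUE
chr_flag <- is.character(x)      # FALSE
```
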
Strings


paste(...,sep=" ") concatenates vectors after converting them to character
substr(x,start,stop) extracts the substring of x from position start to position stop

> substr("Hello World", 7,10)
[1] "Worl"


strsplit(x,split) split x according to split

> strsplit("Hello World",split = " ")
[[1]]
[1] "Hello" "World"


grep(pattern,x) searches for matches to pattern within x

> grep("[a-e]", letters)
 [1]  1  2  3  4  5


gsub(pattern,replacement,x) replacement of matches to pattern
sub() same as gsub but only replaces the first occurrence.
tolower(x) convert to lowercase
toupper(x) convert to uppercase
match(x,table) returns a vector of the positions of the first matches for the elements of x in table; x %in% table returns a logical vector indicating whether each element of x has a match
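
The functions without an example above can be sketched on a made-up string `s`:

```r
s <- "Hello World, hello R"

pasted <- paste("a", 1:3, sep = "-")      # "a-1" "a-2" "a-3"
first  <- sub("o", "0", s)                # "Hell0 World, hello R" (first match only)
every  <- gsub("o", "0", s)               # "Hell0 W0rld, hell0 R" (all matches)
upper  <- toupper(s)                      # "HELLO WORLD, HELLO R"
pos    <- match(c("b", "z"), letters)     # 2 NA (positions of first matches)
found  <- c("b", "z") %in% letters        # TRUE FALSE
```
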

Input and output


load() load the datasets written with save
read.table(file) reads a file in table format and creates a data frame from it

default separator sep="" is any whitespace
header=TRUE read the first line as a header of column names
as.is=TRUE prevent character vectors from being converted to factors
skip=n to skip n lines before reading data


read.csv("filename",header=TRUE)
read.fwf(file,widths) reads a table of fixed-width formatted data into a data.frame

widths is an integer vector, giving the widths of the fixed-width fields


save(...,file=) saves the specified objects (...) in the XDR platform-independent binary format
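
A minimal round-trip sketch using temporary files. The data frame `df` is made up, and `write.csv` is an addition not listed in the sheet, used here only to produce a file to read back.

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Write a CSV file to a temporary location and read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# save() stores named objects; load() restores them under the same names
rda_path <- tempfile(fileext = ".RData")
save(df, file = rda_path)
rm(df)
load(rda_path)   # df exists again
```
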

References


https://cran.r-project.org/doc/contrib/Short-refcard.pdf


## DataWrangling.md

      

R Cheat Sheet: Data wrangling

Packages
Subsetting

Variables (columns)
Observations (rows)

Slicing
Filtering
Deduplicate
Sampling


Reshaping Data

Gather & Spread columns into row
Split & unite columns


Grouping, summarise and mutate
Merging
Piping
References


Packages

library(dplyr)
library(tidyr)

tbl_df(myDataframe)
Converts data to the tbl class. tbl's are easier to examine than data frames because only the data that fits onscreen is displayed. In recent dplyr versions tbl_df() is deprecated in favour of as_tibble().
> tbl_df(iris)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
... with 140 more rows

Subsetting

Variables (columns)

select (dplyr) Select columns by name or helper function
> select(iris, Sepal.Width, Petal.Length, Species)
    Sepal.Width Petal.Length    Species
1           3.5          1.4     setosa
2           3.0          1.4     setosa
3           3.2          1.3     setosa
4           3.1          1.5     setosa
5           3.6          1.4     setosa

helper:

select(iris, ends_with("Length"))
select(iris, starts_with("Sepal"))
select(iris, contains(".")) name contains a literal string
select(iris, matches(".t.")) name matches a regular expression
select(iris, num_range("x", 1:5))
select(iris, Sepal.Length:Petal.Width) range between 2 columns
select(iris, -Species) all except specified
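
To see what each helper picks, it can be useful to look only at the resulting column names. A small sketch on the built-in `iris` data (variable names are illustrative):

```r
library(dplyr)

# Column names returned by each helper
len_cols   <- names(select(iris, ends_with("Length")))        # "Sepal.Length" "Petal.Length"
sepal_cols <- names(select(iris, starts_with("Sepal")))       # "Sepal.Length" "Sepal.Width"
no_species <- names(select(iris, -Species))                   # the four measurement columns
range_cols <- names(select(iris, Sepal.Width:Petal.Length))   # "Sepal.Width" "Petal.Length"
```
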

Observations (rows)

Slicing

slice (dplyr) selects rows by position.
> slice(iris,1:5)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

Filtering

filter (dplyr) extracts rows that meet logical criteria on given columns
> filter(iris,Sepal.Length>7.6)
  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1          7.7         3.8          6.7         2.2 virginica
2          7.7         2.6          6.9         2.3 virginica
3          7.7         2.8          6.7         2.0 virginica
4          7.9         3.8          6.4         2.0 virginica
5          7.7         3.0          6.1         2.3 virginica

Deduplicate

distinct (dplyr) remove duplicate rows.
> nrow(iris)
[1] 150
> nrow(distinct(iris))
[1] 149

Sampling


sample_frac(iris, 0.5, replace = TRUE) Randomly select fraction of rows.
sample_n(iris, 10, replace = TRUE) Randomly select n rows.

replace = TRUE samples with replacement, so the same row can be drawn more than once.
Reshaping Data

Gather & Spread columns into row


gather (tidyr) Gather columns into rows.

convert If TRUE will automatically run type.convert on the key column. This is useful if the column names are actually numeric, integer, or logical.
factor_key If FALSE, the default, the key values will be stored as a character vector. If TRUE, will be stored as a factor, which preserves the original ordering of the columns.


spread (tidyr) Spread rows into columns.

	> test <- data.frame(Name=c("A","B","C"),M1=c(2.5,3,6),M2=c(5,6,7))
	> test
	  Name  M1 M2
	1    A 2.5  5
	2    B 3.0  6
	3    C 6.0  7
	> gather(test, Param, val, M1, M2)
	  Name Param val
	1    A    M1 2.5
	2    B    M1 3.0
	3    C    M1 6.0
	4    A    M2 5.0
	5    B    M2 6.0
	6    C    M2 7.0
	> spread(gather(test,Param, val, M1,M2), Param,val)
	  Name  M1 M2
	1    A 2.5  5
	2    B 3.0  6
	3    C 6.0  7
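
In tidyr 1.0.0 and later, `pivot_longer` and `pivot_wider` cover the same ground as `gather` and `spread`. The sketch below assumes such a version is installed and reuses the `test` data frame from above; note that `pivot_longer` orders the rows by original row rather than by column.

```r
library(tidyr)

test <- data.frame(Name = c("A", "B", "C"), M1 = c(2.5, 3, 6), M2 = c(5, 6, 7))

# Columns M1 and M2 become rows: one (Name, Param, val) triple per measurement
long <- pivot_longer(test, cols = c(M1, M2), names_to = "Param", values_to = "val")

# Back to one column per parameter
wide <- pivot_wider(long, names_from = Param, values_from = val)
```
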

Split & unite columns


separate (tidyr) Separate one column into several.
unite (tidyr) concatenate strings of several column with a sep

	> test <- data.frame(
	+ id = sprintf("x%01d.%02d", c(rep(1,2),rep(2,2),rep(3,2)),rep(1:2,3)),
	+ val= runif(6))
	> test
	     id       val
	1 x1.01 0.4516309
	2 x1.02 0.1182174
	3 x2.01 0.2386353
	4 x2.02 0.4705228
	5 x3.01 0.3523231
	6 x3.02 0.3385752
	> sep <- separate(test,id, into = c("sample","replicate"))
	> sep
	  sample replicate       val
	1     x1        01 0.4516309
	2     x1        02 0.1182174
	3     x2        01 0.2386353
	4     x2        02 0.4705228
	5     x3        01 0.3523231
	6     x3        02 0.3385752
	> unite(sep,id,sample,replicate,sep = "-")
	     id       val
	1 x1-01 0.4516309
	2 x1-02 0.1182174
	3 x2-01 0.2386353
	4 x2-02 0.4705228
	5 x3-01 0.3523231
	6 x3-02 0.3385752
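
By default `separate` splits on any run of non-alphanumeric characters. A sketch with an explicit separator and with `remove = FALSE`, using fixed made-up values instead of random ones:

```r
library(tidyr)

test <- data.frame(id = c("x1.01", "x1.02", "x2.01"), val = c(0.1, 0.2, 0.3))

# sep is a regular expression, so the dot must be escaped
sep_df <- separate(test, id, into = c("sample", "replicate"), sep = "\\.")

# remove = FALSE keeps the input columns next to the united one
united <- unite(sep_df, id, sample, replicate, sep = "-", remove = FALSE)
```
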

Grouping, summarise and mutate


mutate create a new column from others
transmute like mutate but drop old columns
summarise summarise a column with a function
summarise_each summarise all columns with a function (note: wrapping the function in funs() is mandatory)
group_by specify by which column the data should be grouped

> mutate(iris, Petal.Surf=Petal.Length*Petal.Width)
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Petal.Surf
1            5.1         3.5          1.4         0.2     setosa       0.28
2            4.9         3.0          1.4         0.2     setosa       0.28
3            4.7         3.2          1.3         0.2     setosa       0.26
4            4.6         3.1          1.5         0.2     setosa       0.30
...

> transmute(iris, Petal.Surf=Petal.Length*Petal.Width)
    Petal.Surf
1         0.28
2         0.28
3         0.26
4         0.30
5         0.28
6         0.68
...

> summarise(iris,avg=mean(Sepal.Length))
       avg
1 5.843333
> summarise_each(iris,funs(mean))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1     5.843333    3.057333        3.758    1.199333      NA
Warning message:
In mean.default(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,  :
  argument is not numeric or logical: returning NA
> iris %>% group_by(Species) %>% summarise(avg=mean(Sepal.Length))
     Species   avg
1     setosa 5.006
2 versicolor 5.936
3  virginica 6.588
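
`summarise_each` and `funs` are deprecated in current dplyr releases. A sketch of the same per-column summary with `across`, assuming dplyr 1.0.0 or later; restricting it to numeric columns also avoids the warning shown above for `Species`.

```r
library(dplyr)

# Mean of every numeric column, computed separately for each species
by_species <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), mean))
```

The result has one row per species and one column per numeric variable.
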

Merging


merge(a,b,all=, by=) merges two data frames by common columns or row names. If all=TRUE, extra rows are added to the output, one for each row in a that has no matching row in b, and vice versa.

	> authors <- data.frame(
	     surname = I(c("Tukey", "Venables", "Tierney")),
	     deceased = c(T, rep(F, 2)))
	> books <- data.frame(
	     name = I(c("Tukey", "Venables", "Tierney",
	                "Ripley",  "R Core")),
	     title = c("Exploratory Data Analysis",
	               "Modern Applied Statistics ...",
	               "LISP-STAT",
	               "Spatial Statistics",
	               "An Introduction to R"))
	> merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)
	   surname deceased                         title
	1   R Core       NA          An Introduction to R
	2   Ripley       NA            Spatial Statistics
	3  Tierney    FALSE                     LISP-STAT
	4    Tukey     TRUE     Exploratory Data Analysis
	5 Venables    FALSE Modern Applied Statistics ...
	> merge(authors, books, by.x = "surname", by.y = "name", all = FALSE)
	   surname deceased                         title
	1  Tierney    FALSE                     LISP-STAT
	2    Tukey     TRUE     Exploratory Data Analysis
	3 Venables    FALSE Modern Applied Statistics ...
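
Between the two extremes shown above, `all.x` and `all.y` keep the unmatched rows of only one side. A sketch with a shortened, made-up version of the two tables:

```r
authors <- data.frame(surname = c("Tukey", "Venables", "Tierney"),
                      deceased = c(TRUE, FALSE, FALSE))
books <- data.frame(name = c("Tukey", "Ripley"),
                    title = c("Exploratory Data Analysis", "Spatial Statistics"))

# Keep every author, with NA titles where no book matches
left <- merge(authors, books, by.x = "surname", by.y = "name", all.x = TRUE)

# Keep every book, with NA in deceased where no author matches
right <- merge(authors, books, by.x = "surname", by.y = "name", all.y = TRUE)
```
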

Piping

> x %>% f(y) # f(x, y)
> y %>% f(x, ., z) # f(x, y, z)
> iris %>%
   group_by(Species) %>%
   summarise(avg = mean(Sepal.Width)) %>%
   arrange(avg)
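
The first two lines above are schematic. A runnable sketch with a hypothetical function `f` shows where the piped value ends up:

```r
library(dplyr)   # makes the %>% pipe available

# Hypothetical function, used only to show where each argument lands
f <- function(a, b, c = 0) a - b + c

r1 <- 10 %>% f(3)         # f(10, 3): the piped value becomes the first argument
r2 <- 3 %>% f(10, ., 1)   # f(10, 3, 1): the dot marks where the piped value goes
```

Here `r1` is 7 and `r2` is 8.
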

References


https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf


## Plot.md

      
            
          
http://docs.ggplot2.org/current/
https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf