NobodyXu/R.md

## dataframe.md

      
            
          
rbindlist() from data.table is a very efficient way to bind a list of data frames (or data.tables, or lists) row-wise into a single data.table.
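A minimal sketch, assuming data.table is installed. The list `pieces` is made up for illustration, e.g. results collected inside a loop. Binding them all with one rbindlist() call avoids growing a data frame row by row.

```r
library("data.table")

# Hypothetical pieces, e.g. one small data frame per loop iteration
pieces = lapply(1:3, function(i) data.frame(id = i, value = i * 10))

# Bind all pieces row-wise in a single call; the result is a data.table
combined = rbindlist(pieces)

# Base R equivalent, usually slower when there are many pieces
combined_base = do.call(rbind, pieces)

print(combined)
```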


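According to here, the most efficient way to remove a column is to use set() from data.table, which modifies the table in place instead of copying it:

```r
library("data.table")

# set() from data.table: value = NULL deletes column "A" of my_df
set(my_df, j = "A", value = NULL)
```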
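A self-contained version of the snippet above, with a made-up table `dt` in place of `my_df`. The second deletion uses data.table's `:=` syntax, which is the other by-reference way to drop a column.

```r
library("data.table")

dt = data.table(A = 1:3, B = 4:6, C = 7:9)

# Remove column "A" by reference: the table is not copied
set(dt, j = "A", value = NULL)

# The same operation with := syntax, removing column "B"
dt[, B := NULL]

print(names(dt))
```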


## R.md

      
            
          
Semantics:


assignment:

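When assigning a variable to another name, e.g. a = b, no new object is created and no data is copied: both names refer to the same object. Due to copy-on-modify semantics, the data is only copied when one of the names is later modified.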
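A small base-R demonstration of copy-on-modify: after the assignment both names share the same vector, and modifying `a` makes R copy it, so `b` keeps its old values.

```r
b = c(1, 2, 3)
a = b          # a second name bound to the same vector; nothing is copied yet
a[1] = 100     # modifying a triggers the copy

print(a)  # a is now 100 2 3
print(b)  # b is still 1 2 3
```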


In order to xor booleans, use xor(a, b).
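xor() is vectorised, so it also works element by element on logical vectors:

```r
single = xor(TRUE, FALSE)                                # TRUE
many   = xor(c(TRUE, TRUE, FALSE), c(TRUE, FALSE, FALSE)) # FALSE TRUE FALSE
```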


remainder and quotient

%% for remainder and %/% for quotient.
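For example, with a positive and a negative dividend. %/% rounds down (towards -Inf), so for a positive divisor %% is never negative, and the two operators always satisfy x == (x %/% y) * y + x %% y.

```r
q = 7 %/% 3        # 2
r = 7 %% 3         # 1

q_neg = (-7) %/% 3  # -3, rounded down rather than towards zero
r_neg = (-7) %% 3   # 2

# The identity holds in both cases
stopifnot(7 == q * 3 + r, -7 == q_neg * 3 + r_neg)
```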


To access an element of a list, such as a list inside a list, [[index]] must be used; [index] returns a sub-list instead.


To get a single column of a data.frame or data.table as a vector, use df[[index_or_name]] (df$name also works). For a data.frame, df[index] returns a one-column data.frame instead.
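The difference between [ ] and [[ ]] on a nested list and on a data.frame; `lst` and `df` are made-up examples:

```r
lst = list(a = 1, b = list(c = "x"))

lst["b"]           # a list of length 1 that contains the inner list
lst[["b"]]         # the inner list itself
lst[["b"]][["c"]]  # "x": chain [[ ]] to reach into nested lists

df = data.frame(x = 1:3, y = c("a", "b", "c"))
df["x"]            # still a data.frame, with one column
df[["x"]]          # the column as a plain integer vector
df$x               # same as df[["x"]]
```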


slicing:

Slicing happens when you index a container (vector, list, etc.) with [] using more than one index, e.g. one generated by seq(), : or c(). The indices can be integers, characters (names) or logicals.
When slicing a list, a shallow copy of the selected subset is created. That is, a new list is created, but its elements are just references to the original elements, with copy-on-modify semantics. See here for more.
Positive integer slicing

When slicing using positive integer(s), only the elements specified by the integers will be in the new subset.


Negative integer slicing

This works the opposite way to positive integer slicing: the elements specified by the integers are excluded from the subset. See here for more.
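Positive, negative and character slicing on a small named vector. Note that positive and negative integers cannot be mixed in one index; v[c(-1, 2)] is an error.

```r
v = c(a = 10, b = 20, c = 30, d = 40)

v[c(1, 3)]      # positive integers: keep elements 1 and 3  (a = 10, c = 30)
v[2:3]          # a range built with ':'                    (b = 20, c = 30)
v[-c(1, 3)]     # negative integers: drop elements 1 and 3  (b = 20, d = 40)
v[c("b", "d")]  # character indices select by name          (b = 20, d = 40)
```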


subset(x, select) function

The subset() function can be used to remove a column easily:

```r
subset(df, select = -column_name_to_remove) # column_name_to_remove is the bare column name, not a quoted string
```
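A runnable version with a made-up data frame, also dropping several columns at once and filtering rows in the same call:

```r
df = data.frame(id = 1:3, score = c(90, 85, 77), note = c("a", "b", "c"))

no_note = subset(df, select = -note)            # drop one column
no_two  = subset(df, select = -c(score, note))  # drop several columns
high    = subset(df, score > 80, select = id)   # filter rows, keep one column
```

help(subset) describes it as a convenience function for interactive use; inside your own functions, plain [ ] indexing is the safer choice.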


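Compare an array/data frame with a single value to get a logical array of the same dimensions

Comparing a vector, array or data frame with a single value compares each element with that value. The result is a logical vector/array of the same dimensions (a logical matrix when a whole data frame is compared), which can be indexed in the same way as the original and can itself be used to index the original. E.g. v == value or dataframe$column_name == value.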
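Element-wise comparison on a vector, a matrix and a data frame column, with the logical result used as an index; the data is made up:

```r
v = c(3, 8, 1, 8)
v == 8            # FALSE TRUE FALSE TRUE, same length as v
v[v == 8]         # use the logical result to index v

m = matrix(1:6, nrow = 2)
m > 3             # a 2 x 3 logical matrix
m[m > 3]          # elements greater than 3, in column-major order

df = data.frame(name = c("x", "y", "z"), n = c(1, 2, 1))
df[df$n == 1, ]   # rows where column n equals 1
```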


Count TRUEs

which(x), where x is a logical vector/array, returns the integer indices of the TRUE elements, so its length equals sum(x), i.e. the number of TRUEs (NAs are skipped).
sum(x) counts the TRUEs directly, because TRUE is coerced to 1 and FALSE to 0; it returns NA if x contains NA, unless na.rm = TRUE is given.
It seems that sum(bools) is faster than length(which(bools)) when bools is very long.
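Both ways of counting, including how an NA is handled, followed by a rough timing. The vector `bools` is random, and the timings depend on the machine, so benchmark on your own data.

```r
x = c(TRUE, FALSE, TRUE, TRUE, NA)

which(x)               # indices of the TRUEs; the NA is skipped
length(which(x))       # 3
sum(x)                 # NA, because x contains an NA
sum(x, na.rm = TRUE)   # 3

# Rough timing on a long vector
bools = runif(1e6) > 0.5
system.time(length(which(bools)))
system.time(sum(bools))
```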


Defining a function:

```r
name_of_function = function(arg1, arg2 = 1) { # arguments can have default values
    result = arg1 + arg2  # the body: any expressions
    # A return statement is not always necessary: the value of the last
    # evaluated expression is returned automatically.
    return(result) # return() with no argument returns NULL; the returned value can even be a function
}
```
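Two small hypothetical functions showing a default argument, the implicit return of the last expression, and an early return() that yields NULL:

```r
add = function(x, y = 1) {   # y has a default value
    x + y                    # the value of the last expression is returned
}

early_exit = function(x) {
    if (x < 0) return()      # return() with no argument returns NULL
    sqrt(x)
}

add(2)          # 3, y takes its default
add(2, 5)       # 7
early_exit(-1)  # NULL
early_exit(9)   # 3
```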


To be precise, I would call this the definition of a lambda (an anonymous function assigned to a name) rather than a normal function.


Here, the function is stored in a variable. Functions can also be defined inside the body of another function.


It is also worth noting that a function can access variables defined in the environment where the function was defined (lexical scoping).
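A sketch of both points, with hypothetical names: a function defined inside another function keeps access to the enclosing environment (a closure), and <<- assigns to a variable in that enclosing environment.

```r
make_counter = function() {
    count = 0                  # lives in make_counter's environment
    function() {               # a function defined inside another function
        count <<- count + 1    # <<- assigns to count in the enclosing environment
        count
    }
}

counter = make_counter()
counter()   # 1
counter()   # 2

rate = 0.5
apply_rate = function(x) x * rate   # reads rate from where apply_rate was defined
apply_rate(10)                      # 5
```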


stop:

stop() is a function that takes a message (its arguments are pasted together into one string) or a condition object. It stops execution of the current expression and executes an error action.
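A hypothetical safe_log() that calls stop() on bad input; tryCatch() catches the resulting error so the rest of the script keeps running:

```r
safe_log = function(x) {
    if (x <= 0) stop("x must be positive, got ", x)  # arguments are pasted into the message
    log(x)
}

msg = tryCatch(safe_log(-1), error = function(e) conditionMessage(e))
print(msg)   # "x must be positive, got -1"
```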


for loop:
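```r
collection = c(1, 2, 3)
for (each in collection) { # collection can be a vector, list, data frame (iterated column by column), matrix (element by element), etc.
    print(each)            # expr: the loop body
}
```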
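Iterating over the values of a vector, over the positions of a list, and over the columns of a data frame; all data here is made up. seq_along() is used instead of 1:length(x) because it yields an empty sequence for an empty input.

```r
total = 0
for (v in c(2, 4, 6)) {          # iterate over the values
    total = total + v
}

lst = list(a = 1, b = "two")
for (i in seq_along(lst)) {      # iterate over the positions
    cat(names(lst)[i], ":", format(lst[[i]]), "\n")
}

df = data.frame(x = 1:3, y = 4:6)
col_sums = numeric(0)
for (col in df) {                # a data frame is iterated column by column
    col_sums = c(col_sums, sum(col))
}
```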


Speeding up your R code - vectorisation tricks for beginners shows that, on large data, loops are expensive compared to the apply family of functions (written in R), and that vectorised functions calling into C are even quicker.


However, this is not always true, so it is better to benchmark your code and understand what happens under the hood in order to use these tools correctly.
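A small benchmark sketch comparing the three approaches on the same task (squaring a vector). The function names are hypothetical, and the timings will differ between machines and R versions, which is exactly why it is worth measuring.

```r
x = runif(1e6)

# 1. explicit loop with a preallocated result
loop_sq = function(x) {
    out = numeric(length(x))
    for (i in seq_along(x)) out[i] = x[i]^2
    out
}

# 2. apply family
apply_sq = function(x) vapply(x, function(v) v^2, numeric(1))

# 3. vectorised arithmetic, implemented in C
vec_sq = function(x) x^2

system.time(r1 <- loop_sq(x))
system.time(r2 <- apply_sq(x))
system.time(r3 <- vec_sq(x))
```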


while, if and else work just like in C.
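A short example (counting Collatz steps, chosen only to exercise the constructs). Two differences from C are worth knowing: at top level, else must be on the same line as the closing brace of the if, and if/else is an expression that produces a value.

```r
n = 10
steps = 0
while (n != 1) {
    if (n %% 2 == 0) {
        n = n / 2
    } else {            # else on the same line as the closing brace
        n = 3 * n + 1
    }
    steps = steps + 1
}

# if/else returns a value, similar to C's ternary operator
parity = if (steps %% 2 == 0) "even" else "odd"
```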


switch:

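switch in R is a function: switch(EXPR, ...). If EXPR is a character string, it is matched against the names of the remaining arguments and the matching value is returned; if EXPR is a number n, the n-th remaining argument is returned.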
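Matching by name, with a fall-through and a default, and selecting by position; pick() is a hypothetical wrapper:

```r
pick = function(x) {
    switch(x,
           a = "first",
           b = ,                 # an empty value falls through to the next one
           c = "second or third",
           "something else")     # an unnamed last argument is the default
}

pick("a")   # "first"
pick("b")   # "second or third"
pick("z")   # "something else"

switch(2, "red", "green", "blue")  # numeric EXPR selects by position: "green"
```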


Builtin data structures:

vector and list

vector

vector is a homogeneous container. Since all elements have the same type, they are stored contiguously. A vector also consumes less memory than a list holding the same elements.
vector(mode = "logical", length = 0) constructs a vector of the given length whose elements are of type mode. For how the elements are initialised, see help(vector).
c(...) can be used to create a vector. It can also combine vectors and new elements of the same type into one flat vector (not a vector of vectors); elements of different types are coerced to a common type.
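Constructing vectors with vector() and c(), including the flattening and type coercion done by c():

```r
v = vector(mode = "numeric", length = 3)  # 0 0 0
l = vector(mode = "logical", length = 2)  # FALSE FALSE

w = c(1, 2, 3)
w = c(w, c(4, 5), 6)                       # one flat vector: 1 2 3 4 5 6

mixed = c(1, "a", TRUE)                    # coerced to character: "1" "a" "TRUE"
```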


list

list is a heterogeneous container, so it stores each element as a pointer to it. It is very useful since you can make a list of lists using list(...).
c(...) can be used to combine lists and elements of any other type into one flat list (not a list of lists).
To make a list of lists, you need to use list(...) to combine the lists.
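The difference between combining with c() and with list():

```r
flat   = c(list(1, "a"), 2)      # one flat list of length 3
nested = list(list(1, "a"), 2)   # the inner list is kept: length 2

length(flat)       # 3
length(nested)     # 2
nested[[1]][[2]]   # "a"
```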


To append to a list or vector, you can use list.append(.data, ...) from package rlist, where .data is the container and ... are the new elements (base R's c() and append() also work).
Insert: use list.insert(.data, index, ...) from rlist.
Push front: use list.prepend(.data, ...) from rlist.
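If you would rather avoid the extra dependency, base R's c() and append(x, values, after) cover the same three operations. This sketch uses only base R; it is an alternative to the rlist functions, not a demonstration of them.

```r
x = list("b", "c")

x = c(x, list("d"))                  # append at the end
x = append(x, list("x"), after = 1)  # insert after position 1
x = c(list("a"), x)                  # push to the front

unlist(x)   # "a" "b" "x" "c" "d"

# append() works on atomic vectors too
v = append(c(1, 2, 3), 99, after = 1)  # 1 99 2 3
```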


vector of logical

To apply &&, || or ! element-wise to logical vectors, use &, | or !. && and || are meant for single (length-one) values.
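Element-wise operators on two logical vectors, and && used for a single condition, where its short-circuiting matters:

```r
a = c(TRUE, TRUE, FALSE)
b = c(TRUE, FALSE, FALSE)

a & b    # TRUE FALSE FALSE (element-wise AND)
a | b    # TRUE TRUE FALSE  (element-wise OR)
!a       # FALSE FALSE TRUE

# && evaluates a[1] only if length(a) > 0 is TRUE
if (length(a) > 0 && a[1]) print("first element is TRUE")
```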


Builtin functions:

help(x), ?x

??x

help(x) and ?x provide the manual page about x; ??x searches the installed documentation for x.


object.size(x)

Get an estimate of the memory used to store an object.
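object.size() can also be used to check the vector-versus-list memory claim above: the same 1000 numbers stored as a list take noticeably more space, because each element becomes its own object.

```r
num_vec  = numeric(1000)
num_list = as.list(num_vec)

print(object.size(num_vec), units = "Kb")
print(object.size(num_list), units = "Kb")
```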


rm(x)

Remove the name x. The memory of the object it referred to can be released once no other name refers to it (due to copy-on-modify semantics).


gc()

Do garbage collection immediately. It can be useful to call it after a large object has been removed, as this may prompt R to return memory to the operating system. GC happens automatically without any user intervention, so normally a call to gc() isn't necessary, and calling it after every removal can hurt performance. For more, see help(gc) and help(gctorture).
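The rm() then gc() pattern for a single large object; big_vec is a made-up name. gc() returns a memory usage table, which is wrapped in invisible() here to keep the output quiet.

```r
big_vec = numeric(1e7)   # about 80 MB of doubles
rm(big_vec)              # remove the name; the vector is now unreferenced
invisible(gc())          # optionally collect right away
exists("big_vec")        # FALSE
```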


help(Memory):

Documents how objects are allocated in R.


Making packages

Write a DESCRIPTION file at the root of the project:

```
Package: Helloworld
Title: What The Package Does (one line, title case required)
Version: 0.1
Authors@R: person("First", "Last", email = "first.last@example.com",
    role = c("aut", "cre"))
Description: What the package does (one paragraph)
Depends: R (>= 3.1.0)
License: What license is it under?
LazyData: true
ByteCompile: true
RoxygenNote: 6.1.1
```


Put code into root_of_pack/R/*.R.


Then run roxygenise() from package roxygen2 with the current working directory at the root of the project, or run roxygenise(root_of_project).
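A minimal example of what a file such as root_of_pack/R/hello.R (a hypothetical name) might contain. roxygenise() reads the #' comments to generate the man/hello.Rd help page, and the @export tag adds hello to the package NAMESPACE.

```r
#' Say hello
#'
#' @param name A character string naming who to greet.
#' @return A greeting as a character string.
#' @export
hello = function(name = "world") {
    paste0("Hello, ", name, "!")
}
```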


The info above is from Creating R packages, the byte compiler and from running vignette("roxygen2", package = "roxygen2") (it does not need library("roxygen2") to work).


Then run R CMD check --check-subdirs=yes root_of_pack and fix any errors.


Then run R CMD build root_of_pack to generate a *.tar.gz.


Run R CMD check --check-subdirs=yes *.tar.gz where *.tar.gz is generated by the previous step.


Run R CMD INSTALL *.tar.gz to install the package.


For more info on packages, check here.