OddExtension5/1. Rnotes.md

## 1. Rnotes.md

      
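Set the working directory

Enter in the console: setwd("directorypath"), replacing "directorypath" with the path of the folder to work in. getwd() shows the current working directory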
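A minimal sketch of changing the working directory and restoring it afterwards. The use of tempdir() as the target is only an assumption so that the example runs anywhere; in practice the path would be a project folder.

```r
# Remember where we are so the change can be undone
old_dir <- getwd()

# Switch to a directory that is guaranteed to exist (assumption: tempdir())
setwd(tempdir())
print(getwd())

# Go back to the original working directory
setwd(old_dir)
```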

Executing an R file

In RStudio:

Run (Ctrl+Enter) executes the current line or the selected lines
Source (Ctrl+Shift+S) executes the whole file
Source with Echo (Ctrl+Shift+Enter) executes the whole file and echoes each command in the console


Add Comments


For a single-line comment, insert '#' at the start of the line
Multiple lines can be commented in two ways:

Select the lines with the cursor, then press Ctrl + Shift + C
Select the lines with the cursor, click "Code" in the menu and select "Comment/Uncomment Lines"


Clear the console

Ctrl + L

Clear the environment


Single variable: enter in the console or R script: rm(variable)
All variables: enter in the console or R script: rm(list=ls())
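The two forms of rm() can be tried safely as follows. The scratch environment is an assumption made for illustration, so that the "remove everything" step does not wipe the reader's own workspace.

```r
# Remove a single variable
a <- 1
b <- 2
rm(a)
exists("a", inherits = FALSE)   # a is gone, b is still there

# Remove everything, demonstrated inside a scratch environment
scratch <- new.env()
assign("x", 10, envir = scratch)
assign("y", 20, envir = scratch)
rm(list = ls(scratch), envir = scratch)
ls(scratch)                     # nothing left
```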

Saving data from workspace

Workspace Data

Workspace information is temporary
It is not retained after the session ends, for example:

If you close the R session
If you restart the computer


Manual Saving


Workspace data can be permanently saved in a file with the save command
It can be reloaded in future sessions with the load command

save(a, file="sess1.Rdata")  # save a single variable 'a'

# save the full workspace, including hidden objects, under a specified file name
save(list=ls(all.names=TRUE), file="sess1.Rdata")

save.image() # shortcut to save the whole workspace (to ".RData" by default)

load(file="sess1.Rdata") # load a saved workspace
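A small round trip makes the save/load behaviour concrete. This is a sketch that assumes a temporary file (via tempfile()) rather than "sess1.Rdata", so nothing is left behind in the working directory.

```r
a <- c(1, 2, 3)

# Save the variable to a temporary file (assumed path, for illustration only)
path <- tempfile(fileext = ".RData")
save(a, file = path)

# Simulate a fresh session by deleting the variable
rm(a)

# load() restores the objects under their original names
# and returns those names as a character vector
restored <- load(path)
print(restored)
print(a)
```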

Variables and datatypes in R

Variables

Rules:

Allowed characters are alphanumeric characters, _ and .
Names start with a letter (or with a . that is not followed by a digit)
No other special characters such as !, @, #, $
Examples: b2 = 7, Manoj_GDPL = "Scientist", Manoj.GDPL = "Scientist"
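One way to check the naming rules is base R's make.names(), which rewrites a string into a syntactically valid name. The helper is_valid_name below is hypothetical, written only for this illustration: a name is treated as valid when make.names() leaves it unchanged.

```r
# Hypothetical helper: TRUE when the string is already a valid R name
is_valid_name <- function(x) x == make.names(x)

is_valid_name("b2")          # valid
is_valid_name("Manoj_GDPL")  # valid
is_valid_name("Manoj.GDPL")  # valid
is_valid_name("2b")          # invalid: starts with a digit
is_valid_name("a@b")         # invalid: contains a special character

# make.names() shows how R would repair an invalid name
make.names("2b")
```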


Predefined Constants

pi, letters, LETTERS, month.name, month.abb

Basic Data Types

Logical, Integer, Numeric, Complex, Character
Find the datatype of an object: typeof(object)
Verify whether an object is of a certain datatype: is.data_type(object)
Coerce or convert an object to another datatype: as.data_type(object)
Note: Not all coercions are possible; an impossible coercion returns NA with a warning

typeof(1) # "double"
typeof("22-01-2001") # "character"

is.character("21-11-2001") # TRUE
is.character(as.Date("21-11-2001", format="%d-%m-%Y")) # FALSE

as.complex(2) # 2+0i
as.numeric("a") # NA, with a warning
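The typeof / is. / as. trio can be exercised together as in the sketch below. The variable names are illustrative only, and suppressWarnings() is used so the failed coercion does not clutter the console.

```r
x <- "42"
typeof(x)                 # "character"

# A coercion that works: character to numeric
y <- as.numeric(x)
typeof(y)                 # "double"
is.numeric(y)             # TRUE

# as.integer() drops the fractional part
z <- as.integer(3.9)

# A coercion that is not possible yields NA (with a warning)
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)                # TRUE
```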

Basic Objects


Vector: Ordered collection of elements of the same data type
List: Ordered collection of objects
Data Frame: Generic tabular object

Vectors


An ordered collection of elements of a basic data type, of a given length
All the elements of a vector must be of the same data type

X = c(1,2,3,4)
print(X)
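Because a vector holds a single data type, c() silently coerces mixed inputs to the most general type. A short sketch:

```r
# Numbers mixed with a string: everything becomes character
mixed <- c(1, "a", TRUE)
typeof(mixed)

# Numbers mixed with logicals: TRUE/FALSE become 1/0
nums <- c(1, TRUE, FALSE)
typeof(nums)

# length() gives the number of elements
length(nums)
```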

Lists


A generic object consisting of an ordered collection of objects
A list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character array, a function and so on

ID = c(1,2,3,4)
emp.name = c("Man", "Rag", "Sha", "Din")
num.emp = 4
emp.list = list(ID,emp.name, num.emp)
print(emp.list)

---------
OUTPUT

[[1]]
[1] 1 2 3 4

[[2]]
[1] "Man" "Rag" "Sha" "Din"

[[3]]
[1] 4

Accessing components (by names)


All the components of a list can be named
These components can be accessed using the given names

emp.list = list("Id" = ID, "Names"= emp.name, "Total staff" = num.emp)
print(emp.list$Names)

------
OUTPUT
[1] "Man" "Rag" "Sha" "Din"

Accessing components (indices)


To access top-level components, use the double bracket operator [[ ]] (returns the component itself) or the single bracket operator [ ] (returns a sub-list). To reach elements inside a component, follow [[ ]] with [ ]

print(emp.list[1])
print(emp.list[2])
print(emp.list[[1]][1])
print(emp.list[[2]][1])

------
OUTPUT

$Id
[1] 1 2 3 4

$Names
[1] "Man" "Rag" "Sha" "Din"

[1] 1

[1] "Man"
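The difference between [ ] and [[ ]] shows up in the class of the result. The sketch below rebuilds a small list (emp is an illustrative name) so that it runs on its own.

```r
emp <- list(Id = c(1, 2, 3, 4), Names = c("Man", "Rag", "Sha", "Din"))

# Single brackets keep the list wrapper
sub_list <- emp[1]
class(sub_list)      # "list"

# Double brackets return the component itself
component <- emp[[1]]
class(component)     # "numeric"

# Double brackets also accept a name, and can be followed by [ ]
second_name <- emp[["Names"]][2]
```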

Manipulating Lists


A list can be modified by accessing components & replacing them

emp.list["Total staff"] = 5
emp.list[[2]][5] = "Nir"
emp.list[[1]][5] = 5
print(emp.list)

--------
OUTPUT

$Id
[1] 1 2 3 4 5

$Names
[1] "Man" "Rag" "Sha" "Din" "Nir"

$`Total staff`
[1] 5

Concatenation of lists


Two lists can be concatenated using the concatenation function, c(list1, list2)

emp.ages = list("ages" = c(23,48,54,30,32))
emp.list = c(emp.list, emp.ages)

print(emp.list)

-------------
OUTPUT
$Id
[1] 1 2 3 4 5

$Names
[1] "Man" "Rag" "Sha" "Din" "Nir"

$`Total staff`
[1] 5

$ages
[1] 23 48 54 30 32

Data Frames


Data frames are generic data objects of R, used to store tabular data

vec1 = c(1,2,3)
vec2 = c("R", "Scilab", "Java")
vec3 = c("For prototyping", "For prototyping", "For Scaleup")

df = data.frame(vec1,vec2,vec3)
print(df)

------------
OUTPUT

  vec1   vec2            vec3
1    1      R For prototyping
2    2 Scilab For prototyping
3    3   Java     For Scaleup

Create a dataframe using data from a file


A dataframe can also be created by reading data from a file using the following command
newDF = read.table(file="path of the file")


In the path, use / instead of \
Example: "C:/Users/hii/Documents/R/R-Workspace/"


A separator can also be specified to distinguish between entries. The default, sep="", treats any run of whitespace as the separator
newDF = read.table(file="path of the file", sep=" ")
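To try read.table() without an existing data file, the sketch below first writes a small comma-separated file to a temporary location. The file contents and the header = TRUE choice are assumptions made for the example.

```r
# Create a small comma-separated file (illustrative contents)
path <- tempfile(fileext = ".csv")
writeLines(c("id,tool", "1,R", "2,Scilab"), path)

# header = TRUE takes the column names from the first line
newDF <- read.table(file = path, header = TRUE, sep = ",")
print(newDF)
```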


Accessing rows and columns


df[val1, val2] refers to row "val1", column "val2". Each can be a number or a string
"val1" or "val2" can also be a vector of values such as 1:2 or c(1,3)
df[val2] (no comma) refers to column "val2" only

# accessing first & second row
print(df[1:2,])
# accessing first & second column:
print(df[,1:2])
# OR
print(df[1:2])
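Rows and columns can also be picked by name or by a logical condition. A short sketch on a reduced version of the data frame above (two columns only, to keep it compact):

```r
df <- data.frame(vec1 = c(1, 2, 3),
                 vec2 = c("R", "Scilab", "Java"),
                 stringsAsFactors = FALSE)

# One cell: row 2, column named "vec2"
cell <- df[2, "vec2"]

# A whole column as a vector, two equivalent ways
col_a <- df[["vec2"]]
col_b <- df$vec2

# Rows chosen by a condition on a column
later_rows <- df[df$vec1 > 1, ]
print(later_rows)
```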

Subset


subset() extracts a subset of the data based on conditions

pd = data.frame("Name"=c("Senthil","Senthil","Sam","Sam"), "Month"=c("Jan","Feb","Jan","Feb"),
   "BS" = c(141.2,139.3,135.2,160.1),
   "BP" = c(90,78,80,81))

pd2 = subset(pd, Name== "Senthil" | BS>150)
print("new subset pd2")
print(pd2)

----------
OUTPUT

[1] "new subset pd2"
     Name Month    BS BP
1 Senthil   Jan 141.2 90
2 Senthil   Feb 139.3 78
4     Sam   Feb 160.1 81
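The same selection can be written with plain logical indexing, which may help in seeing what subset() does. This is a sketch, not a replacement for subset(); the two give the same rows here because the data has no missing values.

```r
pd <- data.frame(Name = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS = c(141.2, 139.3, 135.2, 160.1),
                 BP = c(90, 78, 80, 81))

# subset() evaluates the condition inside the data frame
by_subset <- subset(pd, Name == "Senthil" | BS > 150)

# Logical indexing needs the pd$ prefix on every column
keep <- pd$Name == "Senthil" | pd$BS > 150
by_index <- pd[keep, ]

print(by_index)
```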

Editing dataframe


Dataframes can be edited by direct assignment

df[[2]][2] = "R"

A dataframe can also be edited using the edit() command
Create an instance of a data frame and use the edit command to open a table editor, where changes can be made manually

  myTable = data.frame()
  myTable = edit(myTable)

Adding extra rows and columns


An extra row can be added with the rbind function and an extra column with cbind

df = rbind(df, data.frame(vec1=4, vec2="C", vec3="For Scale Up"))
print("adding extra row")
print(df)

df = cbind(df, vec4=c(10,20,30,40))
print("adding extra col")
print(df)
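
Deleting rows and columns


There are several ways to delete a row or column; some cases are shown

df2 = df[-3,-1]                       # drop the third row and the first column
df3 = df[, !names(df)%in%c("vec3")]   # drop the column named vec3
df4 = df[!df$vec1==3,]                # drop the rows where vec1 equals 3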

Manipulating rows - the factor issue


In R versions before 4.0.0, character columns become factors by default when a data.frame is created (since R 4.0.0 the default is stringsAsFactors = FALSE)
Factor variables are those where the character column is split into categories or factor levels
New entries need to be consistent with the factor levels, which are fixed when the dataframe is first created
Passing stringsAsFactors = FALSE keeps the columns as character, so new values can be assigned freely

vec1 = c(1,2,3)
vec2 = c("R","Scilab","java")
vec3 = c("For prototyping", "For prototyping", "For ScaleUp")
df = data.frame(vec1,vec2, vec3, stringsAsFactors = FALSE)
df[3,3] = "Others"
print(df)
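The factor issue itself can be reproduced by forcing stringsAsFactors = TRUE. In this sketch (column and variable names are illustrative), assigning a value outside the existing levels produces NA, and one possible fix is to add the level first.

```r
df_f <- data.frame(tool = c("R", "Scilab", "Java"), stringsAsFactors = TRUE)
levels(df_f$tool)

# Assigning an unknown level gives NA (R also emits a warning)
df_bad <- df_f
suppressWarnings(df_bad[3, "tool"] <- "Others")
is.na(df_bad$tool[3])

# Extend the levels first, then the assignment succeeds
levels(df_f$tool) <- c(levels(df_f$tool), "Others")
df_f[3, "tool"] <- "Others"
print(df_f)
```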

Recasting and joining of dataframes

Recasting dataframes


Recasting is the process of manipulating a dataframe in terms of its variables, that is, reshaping the data

pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"Month"=c("Jan","Feb","Jan","Feb"),
"BS" = c(141.2,139.3,135.2,160.1),
"BP" = c(90,78,80,81))

Recast in two steps:

Melt
Cast

Each variable is treated as one of two kinds:

Identifier (discrete type variables)
Measurement (numeric variables)
Categorical and date variables cannot be measurements

Step 1: Melt

Load the reshape2 library using the library() command
melt(data, id.vars, measure.vars, variable.name="variable", value.name="value")

install.packages("reshape2")  # needed only once
library(reshape2)

Df = melt(pd, id.vars=c("Name","Month"), measure.vars=c("BS","BP"))
print(Df)
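To see what the melt step produces without installing reshape2, the long format can be built by hand in base R. This is only an illustrative sketch: the identifier columns are repeated once per measure variable and the measurements are stacked into a single value column (melt() itself stores variable as a factor, while this sketch leaves it as character).

```r
pd <- data.frame(Name = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS = c(141.2, 139.3, 135.2, 160.1),
                 BP = c(90, 78, 80, 81))

measure <- c("BS", "BP")

long <- data.frame(
  Name     = rep(pd$Name, times = length(measure)),    # identifiers repeated
  Month    = rep(pd$Month, times = length(measure)),
  variable = rep(measure, each = nrow(pd)),             # which measurement
  value    = unlist(pd[measure], use.names = FALSE)     # stacked measurements
)
print(long)
```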

Step 2: cast

Apply the dcast() function
dcast(data, formula, value.var = "name of the column holding the values")

Df2 = dcast(Df, variable+Month ~ Name, value.var = "value")
print(Df2)

Recasting in single step


The recast() function performs melt and cast in one command
recast(data, formula, ..., id.var, measure.var)

recast(pd, variable+Month~Name, id.var=c("Name","Month"))

Add new variable to dataframe based on existing ones


Load the dplyr library using the library() command
The mutate() command adds extra variable columns based on existing ones

library(dplyr)
pd2 <- mutate(pd, log_BP = log(BP))
print(pd2)
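If dplyr is not available, base R's transform() gives the same kind of derived column. The sketch below is an alternative to mutate(), not a description of how mutate() works internally.

```r
pd <- data.frame(Name = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS = c(141.2, 139.3, 135.2, 160.1),
                 BP = c(90, 78, 80, 81))

# Add a column computed from an existing one
pd2 <- transform(pd, log_BP = log(BP))
print(pd2)
```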

Joining of two frames


Combining two dataframes - dplyr package


The common syntax for the "dplyr" functions used to combine dataframes:

function(dataframe1, dataframe2, by = id.variable)

where:
+ "id.variable" is common to both dataframes
+ This variable provides the identifier for combining the two dataframes
+ The nature of the combination depends on the function used


Combining two dataframes


Load the 'dplyr' library using the library() command


The following commands would be used to combine datasets:
left_join(),right_join(),inner_join(),full_join(),semi_join(),anti_join()


## Creating first dataframe
pd = data.frame("Name" = c("Senthil", "Senthil", "Sam", "Sam"),
"Month" = c("Jan", "Feb", "Jan", "Feb"),
"BS" = c(141.2, 139.3, 135.2, 160.1),
"BP" = c(90,78,80,81))

## creating another dataframe

pd_new = data.frame("Name" = c("Senthil", "Ramesh","Sam"),
"Department"=c("PSE","Data Analytics", "PSE"))
print(pd_new)

## left_join() --> n(A): keeps every row of the first dataframe
pd_left_join1 <- left_join(pd, pd_new, by="Name")

## right_join() --> n(B): keeps every row of the second dataframe
pd_right_join1 <- right_join(pd, pd_new, by="Name")

## inner_join() --> keeps only the rows with a match in both dataframes
pd_inner_join1 <- inner_join(pd_new, pd, by="Name")
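The effect of each join can be checked without dplyr by using base R's merge(), whose all.x / all.y arguments select the join type. This is a sketch of equivalent base R calls (row order may differ from dplyr's); the data frames repeat the ones above so the block runs on its own.

```r
pd <- data.frame(Name = c("Senthil", "Senthil", "Sam", "Sam"),
                 Month = c("Jan", "Feb", "Jan", "Feb"),
                 BS = c(141.2, 139.3, 135.2, 160.1),
                 BP = c(90, 78, 80, 81))
pd_new <- data.frame(Name = c("Senthil", "Ramesh", "Sam"),
                     Department = c("PSE", "Data Analytics", "PSE"))

left  <- merge(pd, pd_new, by = "Name", all.x = TRUE)  # like left_join
right <- merge(pd, pd_new, by = "Name", all.y = TRUE)  # like right_join
inner <- merge(pd, pd_new, by = "Name")                # like inner_join
full  <- merge(pd, pd_new, by = "Name", all = TRUE)    # like full_join

# Ramesh has no measurements, so he appears only in right and full,
# with NA in the Month, BS and BP columns
print(right)
```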

Looping over objects


apply : Apply a function over the margins of an array or matrix
lapply: Apply a function over a list or a vector
tapply: Apply a function over a ragged array
mapply: Multivariate version of lapply
xxply : The corresponding family of functions in the plyr package

apply function


Applies a given function over the margins of a given array
Syntax: apply(array, margins, function, ...)
Here margins refers to the dimension of the array along which the function needs to be applied: 1 for rows, 2 for columns

A <- matrix(1:9, 3,3)
apply(A,1,sum) # along rows
apply(A,2,sum) # along columns
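For the matrix above the results can be checked by hand, since matrix() fills column by column. The sketch also passes an anonymous function, which is an addition for illustration.

```r
A <- matrix(1:9, 3, 3)   # columns are 1:3, 4:6, 7:9

row_totals <- apply(A, 1, sum)   # 12 15 18
col_totals <- apply(A, 2, sum)   # 6 15 24

# Any function of a vector can be used, including an anonymous one
col_ranges <- apply(A, 2, function(col) max(col) - min(col))   # 2 2 2

# For sums, rowSums() and colSums() give the same totals
rowSums(A)
```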

lapply function


lapply is used to apply a function over a list
lapply always returns a list of the same length as the input list
Syntax: lapply(list, function, ...)

A = matrix(1:9, 3,3)
B = matrix(10:18, 3,3)
Mylist = list(A,B)
determinant = lapply(Mylist, det)
determinant
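Since lapply always returns a list, sapply is often used when a plain vector is more convenient. A short sketch comparing the two on a simpler input than the matrices above:

```r
pieces <- list(1:3, 4:6)

# lapply keeps the list structure
as_list <- lapply(pieces, sum)
str(as_list)

# sapply simplifies the result to a vector when it can
as_vector <- sapply(pieces, sum)
print(as_vector)
```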

mapply function


mapply is a multivariate version of lapply
A function can be applied over several lists simultaneously
Syntax: mapply(fun, list1, list2, ...)

source('~/volcylinder.R')
dia = c(1,2,3,4)
len = c(7,4,3,2)
vol = mapply(volcylinder,dia, len)
vol
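The file volcylinder.R is not shown in these notes. The sketch below assumes it defines a function that returns the volume of a cylinder from its diameter and length; this definition is hypothetical and stands in for the sourced file.

```r
# Hypothetical stand-in for the function defined in volcylinder.R
volcylinder <- function(dia, len) {
  pi * dia^2 * len / 4
}

dia <- c(1, 2, 3, 4)
len <- c(7, 4, 3, 2)

# mapply pairs the inputs element by element: (1, 7), (2, 4), (3, 3), (4, 2)
vol <- mapply(volcylinder, dia, len)
print(vol)
```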

tapply function


tapply is used to apply a function over subsets of a vector, with the subsets defined by a combination of factors
Syntax: tapply(vector, factors, function, ...)

Id = c(1,1,1,1,2,2,2,3,3)
Values = c(1,2,3,4,5,6,7,8,9)
tapply(Values, Id, sum)
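In the example above the values are grouped by Id before summing, so the result has one entry per distinct Id. The sketch repeats the data and adds a second summary function for comparison.

```r
Id <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
Values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Group 1: 1+2+3+4, group 2: 5+6+7, group 3: 8+9
totals <- tapply(Values, Id, sum)
print(totals)

# The same grouping with a different function
averages <- tapply(Values, Id, mean)
print(averages)
```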