Enter in the console: setwd("directorypath")
- Press Run/Ctrl+Enter
- Press Source/Ctrl+Shift+S OR press Ctrl+Shift+Enter (Source with Echo)
- Run can be used to execute selected lines
- Source/Source with Echo is for a whole file
- For single line comment, insert '#' at the start of the line
- Multiple line comments can be added in two ways:
- Select multiple lines using cursor, then press Ctrl + Shift + C
- Select multiple lines using cursor, click on "Code" in the menu and select "Comment/Uncomment Lines"
- Clear the console: Ctrl + L
- Single Variable: Enter in console/R Script: rm(variable)
- All variables: Enter in console/R Script: rm(list=ls())
Workspace Data
- Workspace information is temporary
- Is not retained after the session
- If you close the R-session
- If you restart the computer
- Can be permanently saved in a file - save command
- Can be reloaded for future sessions - load command
save(a, file="sess1.Rdata") # to save a single variable 'a'
# to save a full workspace with specified file name
save(list=ls(all.names=TRUE), file="sess1.Rdata")
save.image() # shortcut function to save whole workspace
load(file="sess1.Rdata") # to load a saved workspace
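The save/load round trip above can be sketched as follows. This is a minimal illustration only: the temporary file path from tempfile() and the helper variable a_exists_after_rm are assumptions made for the sketch, not part of the notes.

```r
# Save one variable, remove it from the workspace, then restore it from disk.
a <- 42
tmp_file <- tempfile(fileext = ".RData")  # hypothetical location for this sketch

save(a, file = tmp_file)   # write 'a' to the file
rm(a)                      # 'a' no longer exists in the workspace
a_exists_after_rm <- exists("a", inherits = FALSE)
print(a_exists_after_rm)   # FALSE

load(file = tmp_file)      # restores 'a' under its original name
print(a)                   # 42
```

Note that load() restores objects under the names they were saved with, so it can silently overwrite a variable of the same name.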
Variables
- Rules:
- Allowed characters are Alphanumeric, _, .
- Always start with a letter (a leading dot is also allowed if it is not followed by a digit)
- No special characters like !,@,#,$,.....
- Examples: b2=7, Manoj_GDPL="Scientist", Manoj.GDPL = "Scientist"
Predefined Constants
- pi, letters, LETTERS, month.name, month.abb
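The predefined constants listed above are ordinary R objects, so they can be printed and indexed like any other vector. A short sketch (the chosen index ranges are arbitrary):

```r
# Built-in constants can be indexed like any other vector.
print(pi)               # the numeric constant pi
print(letters[1:3])     # "a" "b" "c"
print(LETTERS[24:26])   # "X" "Y" "Z"
print(month.name[1])    # "January"
print(month.abb[12])    # "Dec"
```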
Basic Data Types
- Logical, Integer, Numeric, Complex, Character
- Find the data type of an object: typeof(object)
- Verify whether an object is of a certain data type: is.data_type(object)
- Coerce (convert) an object to another data type: as.data_type(object)
- Note: Not all coercions are possible; an impossible coercion returns NA (with a warning)
typeof(1) # double
typeof("22-01-2001") # character
is.character("21-11-2001") # TRUE
is.character(as.Date("21-11-2001", format="%d-%m-%Y")) # FALSE
as.complex(2) # 2+0i
as.numeric("a") # NA (with a warning)
- Vector : Ordered collection of same data types
- List: Ordered collection of objects
- Data Frame: Generic tabular object
- An ordered collection of basic data types of a given length
- All the elements of a vector must be of same data type
X = c(1,2,3,4)
print(X)
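Because all elements of a vector must share one data type, c() coerces mixed inputs to the most general type present. A small sketch of that behaviour (the variable names are illustrative only):

```r
# Mixing types in c() coerces every element to the most general type.
num_vec   <- c(1, 2, 3)        # all numeric
mixed_vec <- c(1, "a", TRUE)   # everything becomes character
logi_num  <- c(TRUE, 2)        # TRUE becomes 1

print(typeof(num_vec))    # "double"
print(typeof(mixed_vec))  # "character"
print(mixed_vec)          # "1" "a" "TRUE"
print(logi_num)           # 1 2
```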
- A generic object consisting of an ordered collection of objects
- A list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character array, a function and so on
ID = c(1,2,3,4)
emp.name = c("Man", "Rag", "Sha", "Din")
num.emp = 4
emp.list = list(ID,emp.name, num.emp)
print(emp.list)
---------
OUTPUT
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Man" "Rag" "Sha" "Din"
[[3]]
[1] 4
- All the components of a list can be named
- These components can be accessed using the given names
emp.list = list("Id" = ID, "Names"= emp.name, "Total staff" = num.emp)
print(emp.list$Names)
------
OUTPUT
[1] "Man" "Rag" "Sha" "Din"
- To access top-level components, use the double slicing operator [[]] or the single slicing operator []; for lower/inner-level components, use [] along with [[]]
print(emp.list[1])
print(emp.list[2])
print(emp.list[[1]][1])
print(emp.list[[2]][1])
------
OUTPUT
$Id
[1] 1 2 3 4
$Names
[1] "Man" "Rag" "Sha" "Din"
[1] 1
[1] "Man"
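The two slicing operators return different kinds of objects, which the output above hints at: [] keeps the list wrapper, while [[]] extracts the component itself. A minimal sketch using a small hypothetical list named emp:

```r
# 'emp' is an illustrative list, not the emp.list object from the notes.
emp <- list("Id" = c(1, 2, 3, 4), "Names" = c("Man", "Rag", "Sha", "Din"))

sub_list  <- emp[1]     # single brackets: a list of length 1
component <- emp[[1]]   # double brackets: the numeric vector itself

print(class(sub_list))    # "list"
print(class(component))   # "numeric"
print(emp[["Names"]][2])  # "Rag"
```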
- A list can be modified by accessing components & replacing them
emp.list["Total staff"] = 5
emp.list[[2]][5] = "Nir"
emp.list[[1]][5] = 5
print(emp.list)
--------
OUTPUT
$Id
[1] 1 2 3 4 5
$Names
[1] "Man" "Rag" "Sha" "Din" "Nir"
$`Total staff`
[1] 5
- Two lists can be concatenated using the concatenation function, c(list1, list2)
emp.ages = list("ages" = c(23,48,54,30,32))
emp.list = c(emp.list, emp.ages)
print(emp.list)
-------------
OUTPUT
$Id
[1] 1 2 3 4 5
$Names
[1] "Man" "Rag" "Sha" "Din" "Nir"
$`Total staff`
[1] 5
$ages
[1] 23 48 54 30 32
- Data frames are generic data objects of R, used to store tabular data
vec1 = c(1,2,3)
vec2 = c("R", "Scilab", "Java")
vec3 = c("For prototyping", "For prototyping", "For Scaleup")
df = data.frame(vec1,vec2,vec3)
print(df)
------------
OUTPUT
  vec1   vec2            vec3
1    1      R For prototyping
2    2 Scilab For prototyping
3    3   Java     For Scaleup
- A dataframe can also be created by reading data from a file using the following command
newDF = read.table(file="Path of the file")
- In the path, use / instead of \
- Example: "C:/Users/hii/Documents/R/R-Workspace/"
- A separator can also be specified to distinguish between entries. The default separator is whitespace
newDF = read.table(file="path of the file", sep=" ")
- df[val1,val2] refers to row "val1", column "val2". Each can be a number or a string
- "val1" or "val2" can also be array of values like "1:2" or "c(1,3)"
- df[val2] (no commas) - just refer to column "val2" only
# accessing first & second row
print(df[1:2,])
# accessing first & second column:
print(df[,1:2])
# OR
print(df[1:2])
- subset() extracts a subset of the data based on conditions
pd = data.frame("Name"=c("Senthil","Senthil","Sam","Sam"), "Month"=c("Jan","Feb","Jan","Feb"),
"BS" = c(141.2,139.3,135.2,160.1),
"BP" = c(90,78,80,81))
pd2 = subset(pd, Name== "Senthil" | BS>150)
print("new subset pd2")
print(pd2)
----------
OUTPUT
     Name Month    BS BP
1 Senthil   Jan 141.2 90
2 Senthil   Feb 139.3 78
4     Sam   Feb 160.1 81
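The same rows can also be selected with logical indexing inside [], which ties subset() back to the df[val1,val2] notation. A minimal sketch; the names pd2_subset and pd2_bracket are introduced here for illustration only:

```r
pd <- data.frame("Name"  = c("Senthil", "Senthil", "Sam", "Sam"),
                 "Month" = c("Jan", "Feb", "Jan", "Feb"),
                 "BS"    = c(141.2, 139.3, 135.2, 160.1),
                 "BP"    = c(90, 78, 80, 81))

# subset() evaluates the condition using the column names directly
pd2_subset <- subset(pd, Name == "Senthil" | BS > 150)

# Bracket indexing needs the columns spelled out with pd$...
pd2_bracket <- pd[pd$Name == "Senthil" | pd$BS > 150, ]

print(rownames(pd2_subset))   # "1" "2" "4"
print(rownames(pd2_bracket))  # "1" "2" "4"
```

One practical difference: subset() drops rows where the condition evaluates to NA, whereas bracket indexing returns them as rows filled with NA.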
- Dataframes can be edited by direct assignment
df[[2]][2] = "R"
- A dataframe can also be edited using the edit() command
- Create an instance of a data frame and use the edit command to open a table editor, where changes can be made manually
myTable = data.frame()
myTable = edit(myTable)
- An extra row can be added with the rbind function and an extra column with cbind
df = rbind(df, data.frame(vec1=4, vec2="C", vec3="For Scale Up"))
print("adding extra row")
print(df)
df = cbind(df, vec4=c(10,20,30,40))
print("adding extra col")
print(df)
- There are several ways to delete a row/column; some cases are shown
df2 = df[-3,-1]
df3 = df[, !names(df)%in%c("vec3")]
df4 = df[!df$vec1==3,]
- In R versions before 4.0.0, character columns in a data.frame become factors by default (stringsAsFactors = TRUE); from R 4.0.0 onwards the default is FALSE
- Factor variables are those where the character column is split into categories or factor levels
- New entries need to be consistent with factor levels which are fixed when the dataframe is first created
vec1 = c(1,2,3)
vec2 = c("R","Scilab","java")
vec3 = c("For prototyping", "For prototyping", "For ScaleUp")
df = data.frame(vec1, vec2, vec3, stringsAsFactors = FALSE)
df[3,3] = "Others"
print(df)
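The point about new entries having to match the existing factor levels can be seen on a standalone factor. This is a sketch under the assumption that the column was stored as a factor; the vector name tools is illustrative only:

```r
tools <- factor(c("R", "Scilab", "Java"))
print(levels(tools))   # "Java" "R" "Scilab"

# "Others" is not an existing level, so the entry becomes NA (R emits a warning)
tools[3] <- "Others"
print(is.na(tools[3])) # TRUE

# Extending the levels first makes the same assignment valid
levels(tools) <- c(levels(tools), "Others")
tools[3] <- "Others"
print(as.character(tools))  # "R" "Scilab" "Others"
```

Creating the data frame with stringsAsFactors = FALSE avoids the issue altogether, because the column then stays a plain character vector.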
- Recasting is the process of manipulating a dataframe in terms of its variables
- Reshaping the data
pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"Month"=c("Jan","Feb","Jan","Feb"),
"BS" = c(141.2,139.3,135.2,160.1),
"BP" = c(90,78,80,81))
- Recast in two steps:
- Melt
- Cast
- Identifier variables (discrete type)
- Measurement variables (numeric)
- Categorical and date variables cannot be measurements
Step 1: Melt
- Load the reshape2 library using the library() command
- melt(data, id.vars, measure.vars, variable.name="variable", value.name="value")
install.packages("reshape2")
library(reshape2)
Df = melt(pd, id.vars=c("Name","Month"), measure.vars=c("BS","BP"))
print(Df)
Step 2: cast
- Applying the dcast() function
- dcast(data, formula, value.var = column with values)
Df2 = dcast(Df, variable+Month ~ Name, value.var = "value")
print(Df2)
- Applying the recast() function performs melt and cast in one command
- recast(data, formula, ..., id.var, measure.var)
recast(pd, variable+Month~Name, id.var=c("Name","Month"))
- Load the dplyr library using the library() command
- mutate() command will add extra variable columns based on existing ones
library(dplyr)
pd2 <- mutate(pd, log_BP = log(BP))
print(pd2)
- Combining two dataframes - dplyr package
- The common syntax for "dplyr" functions used to combine dataframes:
function(dataframe1, dataframe2, by = id.variable)
where:
- "id.variable" is common to both dataframes
- This variable provides the identifier for combining the 2 dataframes
- The nature of the combination depends on the function used
- Load the 'dplyr' library using the library() command
- The following commands can be used to combine datasets:
left_join(), right_join(), inner_join(), full_join(), semi_join(), anti_join()
## Creating first dataframe
pd = data.frame("Name" = c("Senthil", "Senthil", "Sam", "Sam"),
"Month" = c("Jan", "Feb", "Jan", "Feb"),
"BS" = c(141.2, 139.3, 135.2, 160.1),
"BP" = c(90,78,80,81))
## creating another dataframe
pd_new = data.frame("Name" = c("Senthil", "Ramesh","Sam"),
"Department"=c("PSE","Data Analytics", "PSE"))
print(pd_new)
## left_join() --> n(A)
pd_left_join1 <- left_join(pd, pd_new, by="Name")
## right_join() --> n(B)
pd_right_join1 <- right_join(pd,pd_new, by="Name")
## inner_join()
pd_inner_join1 <- inner_join(pd_new, pd, by="Name")
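The remaining join functions from the list (full_join, semi_join, anti_join) follow the same calling pattern. A minimal sketch, assuming the dplyr package is installed; the tables staff and dept are hypothetical data made up for this illustration:

```r
library(dplyr)  # assumes the dplyr package is installed

# Illustrative tables (hypothetical data for this sketch)
staff <- data.frame(Name = c("Senthil", "Sam", "Nir"),
                    BP   = c(90, 80, 85),
                    stringsAsFactors = FALSE)
dept  <- data.frame(Name = c("Senthil", "Ramesh", "Sam"),
                    Department = c("PSE", "Data Analytics", "PSE"),
                    stringsAsFactors = FALSE)

full_rows <- full_join(staff, dept, by = "Name")  # every Name from either table
semi_rows <- semi_join(staff, dept, by = "Name")  # staff rows that have a match
anti_rows <- anti_join(staff, dept, by = "Name")  # staff rows without a match

print(nrow(full_rows))  # 4
print(semi_rows$Name)   # "Senthil" "Sam"
print(anti_rows$Name)   # "Nir"
```

semi_join and anti_join only filter the first table; unlike the other joins they never add columns from the second one.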
- apply : Apply a function over the margins of an array or matrix
- lapply: Apply a function over a list or a vector
- tapply: Apply a function over a ragged array
- mapply: Multivariate version of lapply
- xxply : (plyr package)
- Applies a given function over the margins of a given array
- Syntax: apply(array, margins, function,..)
- Here margins refers to the dimension of the array along which the function needs to be applied (1 for rows, 2 for columns).
A <- matrix(1:9, 3,3)
apply(A,1,sum) # along rows
apply(A,2,sum) # along columns
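To make the margin argument concrete, the sketch below prints the results for the same 3x3 matrix and also passes a user-defined function. The anonymous range function is an illustrative addition, not part of the notes:

```r
A <- matrix(1:9, 3, 3)   # filled column by column: rows are (1,4,7), (2,5,8), (3,6,9)

row_sums <- apply(A, 1, sum)   # margin 1: one result per row
col_sums <- apply(A, 2, sum)   # margin 2: one result per column
print(row_sums)   # 12 15 18
print(col_sums)   # 6 15 24

# Any function of a vector can be used, including an anonymous one
row_range <- apply(A, 1, function(r) max(r) - min(r))
print(row_range)  # 6 6 6
```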
- lapply is used to apply a function over a list
- lapply always returns a list of the same length as the input list
- Syntax: lapply(list, function, ...)
A = matrix(1:9, 3,3)
B = matrix(10:18, 3,3)
Mylist = list(A,B)
determinant = lapply(Mylist, det)
determinant
- mapply is a multivariate version of lapply
- A function can be applied over several lists simultaneously
- Syntax: mapply(fun, list1, list2, ..)
source('~/volcylinder.R')
dia = c(1,2,3,4)
len = c(7,4,3,2)
vol = mapply(volcylinder,dia, len)
vol
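The file volcylinder.R is not shown in these notes. The sketch below assumes it defines a function that returns the volume of a cylinder from its diameter and length; that definition is a guess made for illustration, not the original file's contents:

```r
# Hypothetical stand-in for the function defined in volcylinder.R:
# volume of a cylinder = pi * (diameter / 2)^2 * length
volcylinder <- function(dia, len) {
  pi * dia^2 * len / 4
}

dia <- c(1, 2, 3, 4)
len <- c(7, 4, 3, 2)

# mapply calls volcylinder(dia[1], len[1]), volcylinder(dia[2], len[2]), ...
vol <- mapply(volcylinder, dia, len)
print(vol)
```

Since this particular function is built from vectorised arithmetic, volcylinder(dia, len) would give the same result directly; mapply becomes necessary when the function only handles one pair of inputs at a time.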
- tapply is used to apply a function over subset of vectors given by a combination of factors
- Syntax: tapply(vector, factors, function, ..)
Id = c(1,1,1,1,2,2,2,3,3)
Values = c(1,2,3,4,5,6,7,8,9)
tapply(Values, Id, sum)
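The result of tapply is named by the factor levels, so each group's value can be looked up by its label. A short sketch repeating the data above (the variables totals and means are introduced for illustration):

```r
Id     <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
Values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

totals <- tapply(Values, Id, sum)    # one sum per distinct Id
print(totals)          # group 1: 10, group 2: 18, group 3: 17
print(totals[["2"]])   # 18, looked up by the group label

means <- tapply(Values, Id, mean)    # any summary function works the same way
print(means)           # group 1: 2.5, group 2: 6, group 3: 8.5
```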