Google's R Style Guide https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml
# This is a comment
# Get current working directory
> getwd()
# Set working directory
> setwd("/User/Foo/example")
# Directory doesn't exist?
> dir.create("/User/Foo/example")
# List directory
> dir()
# Read a .csv file (operates on your CWD)
> read.csv("example.csv")
# List objects
> ls() # alternatively
> objects()
# Source (run) an .R file
> source("example.R")
Below, the numeric value 1 is assigned (<-) to the variable x. Printing x using the print() function yields the result [1] 1: a vector with 1 element, which is the number 1 (a bare 1 is stored as a numeric; write 1L for an integer).
> x <- 1
> print(x)
[1] 1
There are 5 basic (atomic) object types:
- Character
- Numeric (real numbers) e.g. 1, Inf and NaN
- Integer e.g. 1L
- Complex
- Logical (TRUE, FALSE, T, F)
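One way to check which atomic type a value has is the class() function. The snippet below is only a small sketch; the variable names (chr, num, int, cpl, lgl) are made up for illustration.

```r
# One value of each atomic type (variable names are illustrative)
chr <- "a"     # character
num <- 1       # numeric (a bare number is stored as a double)
int <- 1L      # integer (note the L suffix)
cpl <- 1+2i    # complex
lgl <- TRUE    # logical

types <- c(class(chr), class(num), class(int), class(cpl), class(lgl))
print(types)

# Inf and NaN are numeric values as well
print(class(Inf))
print(is.nan(0 / 0))
```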
The most basic object is a vector. A vector is created using the vector() function e.g. v <- vector("numeric", 10) will create a vector of type numeric with a length of 10, with every element initialised to 0.
The list is a type of vector, with the exception that it may contain elements of different types.
Objects have attributes, for example:
- names, dimnames (dimension names)
- dimensions (arrays, matrices)
- class (use the class() function to determine an object's class)
- length (use the length() function on an object to determine its length)
- other user-defined attributes
Use the attributes() function on an object to determine its attributes.
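The attribute functions listed above can be tried on a small matrix and a named vector. This is a minimal sketch; the object names m and v and the "units" attribute are invented for the example.

```r
# A matrix carries a dim attribute
m <- matrix(1:6, nrow = 2, ncol = 3)
print(class(m))       # includes "matrix"
print(length(m))      # number of elements
print(dim(m))         # rows, then columns
print(attributes(m))  # list holding the dim attribute

# A named vector carries a names attribute
v <- c(a = 1, b = 2)
print(names(v))

# A user-defined attribute (the name "units" is hypothetical)
attr(v, "units") <- "cm"
print(attributes(v))
```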
The function c() can be used to create a vector by concatenating objects e.g.
> x <- c(1,2,3) ## numeric
> x <- c(TRUE, FALSE) ## logical
> x <- c(T, F) ## logical
> x <- c("a", "b", "c") ## character
> x <- c(9:12) ## integer, using the sequence operator ':'
> x <- c(1+0i, 2+4i) ## complex
NB Creating a vector of mixed elements will result in the elements being (implicitly) coerced to a common type (remember: vectors can only be of a single type) e.g.
> c(1.7, "a") ## becomes c("1.7", "a")
> c(TRUE, 2) ## becomes c(1, 2)
> c("a", TRUE) ## becomes c("a", "TRUE")
To explicitly coerce an object, use the as.*() functions e.g. as.logical(1) will yield the value TRUE.
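A few more explicit coercions, as a rough sketch (the variable names are illustrative). A coercion that makes no sense yields NA together with a warning, which is suppressed here to keep the output clean.

```r
x <- 0:3

as_num <- as.numeric(x)    # integers become doubles
as_lgl <- as.logical(x)    # 0 becomes FALSE, any other number TRUE
as_chr <- as.character(x)  # numbers become strings

print(as_num)
print(as_lgl)
print(as_chr)

# A nonsensical coercion produces NA values (and a warning)
bad <- suppressWarnings(as.numeric(c("a", "b")))
print(bad)
```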
Lists are created using the list(...) function e.g. x <- list(2, "f", TRUE, 9L, 1+2i) where [[1]] is 2, [[2]] is "f", [[3]] is TRUE, etc.
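To see that each list element keeps its own type, the class of every element can be inspected. A small sketch using the same list as above (the names first and classes are made up):

```r
x <- list(2, "f", TRUE, 9L, 1+2i)

first <- x[[1]]              # the first element itself
print(first)

classes <- sapply(x, class)  # class of each element, in order
print(classes)
```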
Matrices are vectors with a dimension attribute. The dimension attribute is an integer vector of length 2: (nrow, ncol).
> m <- matrix( nrow = 2, ncol = 3 ) ## create an empty (NA-filled) 2x3 matrix
## Matrices are filled _column-wise_ (populating from [1,1])
> m <- matrix( 1:6, nrow=2, ncol=3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
## Creating a matrix from a vector
> i <- 1:10
> dim(i) <- c(2,5)
> i
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
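For contrast with the default column-wise fill, matrix() also accepts byrow = TRUE. The sketch below compares the two; m_col and m_row are illustrative names.

```r
# Default: filled column by column
m_col <- matrix(1:6, nrow = 2, ncol = 3)
print(m_col)

# byrow = TRUE: filled row by row instead
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m_row)

# Setting dim on a plain vector turns it into a matrix
i <- 1:10
dim(i) <- c(2, 5)
print(dim(i))
```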
The following operators are used to extract subsets of objects:
- [ returns an object of the same type as the original. May be used to return more than one element.
- [[ used to return elements of a list or data frame. It can only be used to extract one element.
- $ used to extract elements of a list or data frame by name. Its semantics are similar to [[ e.g. data$Ozone.
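The difference between the three operators is easiest to see on a small named list. A minimal sketch (the list and its element names foo and bar are invented for illustration):

```r
x <- list(foo = 1:4, bar = 0.6)

a <- x["foo"]              # [ keeps the container: a list of length 1
b <- x[["foo"]]            # [[ extracts the element: an integer vector
d <- x$foo                 # $ extracts by name, like [[
e <- x[c("foo", "bar")]    # [ can return several elements at once

print(class(a))
print(class(b))
print(length(e))
```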
- Use the names() function to list the column names.
- Use the complete.cases() function to find the rows that are free of NA and NaN elements; it returns a logical vector that can be used as a row index.
- Data frame subsets can be extracted by index using [row,col]. Indices can be missing e.g. data[1,] returns the 1st row of the data frame/matrix, and data[,5] returns the 5th column. data[3,"Temp"] returns the Temp column of the 3rd row.
- Partial matching can be applied to the [[ and $ operators e.g. dataframe$a or dataframe[["a", exact = FALSE]] will return the element whose name starts with a (provided the match is unique).
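The points above can be tried on a tiny data frame. This is a sketch only: the data frame and its values are made up, with column names borrowed from the text.

```r
# A made-up data frame with one missing Ozone value
data <- data.frame(Ozone = c(41, NA, 12),
                   Temp  = c(67, 72, 74),
                   Month = c(5, 5, 6))

print(names(data))            # column names

ok <- complete.cases(data)    # TRUE for rows without NA/NaN
clean <- data[ok, ]           # keep only the complete rows
print(clean)

print(data[1, ])              # 1st row
print(data[, 2])              # 2nd column
print(data[3, "Temp"])        # Temp value of the 3rd row

# Partial matching: "Oz" matches the Ozone column
print(data$Oz)
print(data[["Oz", exact = FALSE]])
```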
> df <- read.csv("hw1_data.csv") ## read csv file
> names(df) ## returns the names of the data frame
> df[1:3,] ## subset first 3 rows
> tail(df, 4) ## subset last 4-rows of data frame
> nrow(df) ## get number of rows
> df[47,] ## returns subset row 47
> df[is.na(df$Ozone), ] ## returns subset of rows where 'Ozone' is NA (note the trailing comma)
> mean(df[complete.cases(df$Ozone),]$Ozone) ## returns the mean of Ozone (excluding NA elements)
> mean(df[df$Ozone > 31 & df$Temp > 90,]$Solar.R, na.rm = TRUE) ## returns the mean of Solar.R where Ozone > 31 and Temp > 90
## Max Ozone for May
> may <- df[df$Month == 5, ]
> cleanMay <- complete.cases(may$Ozone) ## flag rows where Ozone is not NA
> may <- may[cleanMay, ]
> max(may$Ozone) ## returns the max Ozone for May
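The same steps can be run without a CSV file by using R's built-in airquality data set, which has columns named Ozone, Solar.R, Temp and Month. Whether hw1_data.csv holds the same data is an assumption; the sketch below only illustrates the row-subsetting pattern, where the comma inside [ ] selects rows.

```r
df <- airquality  # built-in data set, used here as a stand-in

# Rows where Ozone is NA (the comma after the condition selects rows)
na_rows <- df[is.na(df$Ozone), ]
print(nrow(na_rows))

# Mean of Ozone, ignoring NA values
mean_ozone <- mean(df$Ozone, na.rm = TRUE)
print(mean_ozone)

# Max Ozone for May: filter rows first, then drop NA Ozone values
may <- df[df$Month == 5, ]
may <- may[!is.na(may$Ozone), ]
max_may <- max(may$Ozone)
print(max_may)
```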
- load dplyr package -> library(dplyr)
- check package version -> packageVersion("dplyr")
- load data into a data frame table -> cran <- tbl_df(mydf) (newer dplyr releases deprecate tbl_df() in favour of as_tibble())
dplyr fundamental tasks: select(), filter(), arrange(), mutate() and summarize()
select() keeps only the vectors (columns) listed e.g. select(cran, ip_id, package, country).
- display vectors using the range notation: select(cran, r_arch:country)
- exclude vectors using the '-' symbol: select(cran, -time) or, for a range, select(cran, -(r_arch:country))
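The select() calls above can be tried on a small stand-in table. This sketch assumes the dplyr package is installed; the cran data frame here is made up, reusing only the column names mentioned in the text (plus a hypothetical r_os column to give the range something to span).

```r
suppressPackageStartupMessages(library(dplyr))

# Made-up stand-in for the cran table
cran <- data.frame(time    = c("09:00", "09:05"),
                   ip_id   = c(1L, 2L),
                   package = c("dplyr", "rhdf5"),
                   r_arch  = c("x86_64", "i386"),
                   r_os    = c("linux", "mingw32"),
                   country = c("US", "GB"))

by_name    <- select(cran, ip_id, package, country)  # listed columns only
by_range   <- select(cran, r_arch:country)           # a range of columns
no_time    <- select(cran, -time)                    # everything except time
no_range   <- select(cran, -(r_arch:country))        # everything except the range

print(names(by_range))
print(names(no_range))
```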
# Downloading a file from the web
> download.file([fileURL], destfile=[local.path], method="curl")
# Reading local flat files
> read.csv() # or read.csv2()
# install and load package
> install.packages("RMySQL")
> library(RMySQL)
# Reading from MySQL
> hg19 <- dbConnect(MySQL(), user="genome", db="hg19", host="genome-mysql.cse.ucsc.edu")
> allTables <- dbListTables(hg19) # allTables[1:5]
> length(allTables) # number of tables, [1] 10949
> result <- dbGetQuery(hg19, "show databases;") # list databases
> dbListFields(hg19, "affyU133Plus2") # list table fields
> dbGetQuery([db],[sql-query]) # dbGetQuery(hg19, "select count(*) from affyU133Plus2")
> dataframe <- dbReadTable([db],[table])
> [query] <- dbSendQuery([db],[sql-SELECT-query])
> [dataframe] <- fetch([query])
> [sub dataframe] <- fetch([query],n=[numrows])
> dbClearResult([query]) # clear result - mandatory
> dbDisconnect(hg19)
# install rhdf5 package
> source("http://bioconductor.org/biocLite.R")
> biocLite("rhdf5")
> library(rhdf5)
> created = h5createFile("example.h5") # created
# create groups
> created = h5createGroup("example.h5","foo")
> created = h5createGroup("example.h5","bar")
> created = h5createGroup("example.h5","foo/bar")
> h5ls("example.h5")
# write to groups
> A = matrix(1:10,nrow=5,ncol=2) # create a matrix
> h5write([matrix], [file], [group]) # e.g. h5write(A, "example.h5", "foo/A")
Are you sure that "df[is.na(df.$Ozone)]" returns subset where 'Ozone' is NA? Doesn't work for me