@raphaelrk
Last active October 28, 2015 15:11
cs50 R seminar by Connor Harris notes
more info:
https://cran.r-project.org/doc/manuals/R-intro.pdf
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/
Types: numeric (float), character (string), logical (bool); coercion (scanf())
Vectors (1d array), matrices, high-dim arrays of above types
Lists: associative arrays. Vectors of lists behave oddly
No real pure atomic types. Single values are arrays of length one
No mixed-type arrays: mixing types coerces every element to one common type (a string if any element is a string)
Weak typing, no variable declarations
assign with <-
comment with #
%% modular division
%/% integer division
Range with colon (2:5 = [2 3 4 5])
One-indexing
For loops: for(value in vector) { ... }
Function: foo <- function(args) { ... }
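A minimal sketch of the basics listed above. The variable and function names (x, rng, square_plus) are made up for illustration and are not from the seminar:

```r
# Assignment uses <- and comments start with #
x <- 17
r <- x %% 5      # modular division: the remainder
q <- x %/% 5     # integer division: the quotient
rng <- 2:5       # the vector 2 3 4 5

# Even a "single" value is a vector of length one
len_x <- length(x)

# for loops iterate over the values of a vector
total <- 0
for (value in rng) {
  total <- total + value
}

# Functions are ordinary values assigned with <-
square_plus <- function(a, b) {
  a^2 + b
}
result <- square_plus(5, 1)
```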
Vectors
constructed with c(datum_1, ..., datum_n)
args can be vectors, but array is flattened
cannot be mixed type
behave as if padded infinitely with value NA
unary functions map over arrays
binary functions applied entry by entry
access with square brackets containing one-indexed indices
Can pass vector of indices
summary()
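A short sketch of the vector behaviour described above; the values are arbitrary and chosen only for illustration:

```r
a <- c(1, 2, 4)

nested <- c(a, c(8, 16))      # vector args are flattened: 1 2 4 8 16
mixed  <- c(1, "two", TRUE)   # mixed types are coerced to character
beyond <- a[5]                # reading past the end gives NA

roots <- sqrt(a)              # unary function maps over every element
sums  <- a + c(20, 40, 80)    # binary operator works entry by entry

second <- a[2]                # one-indexed access
picked <- nested[c(1, 3, 5)]  # a vector of indices selects several entries
dropped <- nested[-1]         # a negative index drops that entry

stats <- summary(a)           # min, quartiles, mean, max
```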
Matrix
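matrix(data, nrow=rows, ncol=columns): data is a vector; fills column by column (top to bottom, then left to right)
multiply: a %*% b (matrix product; a * b is entry by entry)
spectral decomposition: eigen(a)
initialize a higher-dimensional array: array(data, dim=c(dim_1, ..., dim_n))
access a row with m[rownum, ] and a column with m[, colnum]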
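A small sketch of matrix construction, indexing and products. The matrices here are arbitrary examples, not from the seminar:

```r
# Data fills column by column, so m is
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2, ncol = 3)

row2  <- m[2, ]    # second row
col3  <- m[, 3]    # third column
entry <- m[1, 2]   # single entry at row 1, column 2

# Matrix product of a 2x3 matrix with its 3x2 transpose gives a 2x2 matrix
gram <- m %*% t(m)

# Eigenvalues of a symmetric (here diagonal) matrix, largest first
s  <- matrix(c(2, 0, 0, 3), nrow = 2)
ev <- eigen(s)$values

# A 2 x 3 x 4 array filled with zeros
arr <- array(0, dim = c(2, 3, 4))
```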
list
list(key=val,...,key=val)
access/set vals with foo$key
access individual key-val pairs with foo["key"]
nonexistent keys return NULL
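A brief sketch of list access; the list contents (name, age) are invented for illustration:

```r
person <- list(name = "Ada", age = 36)

person$age <- 37            # set a value with $
nm <- person$name           # $ returns the value itself

pair <- person["name"]      # single brackets return a list holding the key-val pair
absent <- person$height     # a key that does not exist gives NULL
```

The difference between `$` and single brackets matters in practice: `pair` is still a list of length one, while `nm` is the character value.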
Data frame
subclass of list
every value is a vector of the same length
use for representing data table
data.frame([column-name=]col1data, ...)
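A minimal data frame sketch. The column names and values in df_cars are hypothetical; only data.frame() itself comes from the notes:

```r
# Each column is a vector, and all columns have the same length
df_cars <- data.frame(
  model = c("A", "B", "C"),
  mpg   = c(21.0, 22.8, 18.7),
  cyl   = c(6, 4, 8)
)

is_a_list <- is.list(df_cars)   # a data frame is a kind of list

mpg_col    <- df_cars$mpg       # column by name, as with a list
second_row <- df_cars[2, ]      # row by index
cyl_of_b   <- df_cars[2, "cyl"] # single cell: row 2, column "cyl"
shape      <- dim(df_cars)      # rows, columns
```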
Functions
foo <- function(arg[=default],...,arg[=default])
Called as foo([arg1=]val1,...,[argn=]valn)
Don't need return(): the value of the last expression is returned by default
Args in a function call don't need to be in a specific order when passed by name
Data import and export
read.table()
read.xls() (not in base R; provided by the gdata package)
read.csv()
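A round-trip sketch using a temporary file. write.csv() is not mentioned in the notes; it is the base-R counterpart of read.csv() and is assumed here for the export side. The data frame is made up:

```r
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Export to a temporary CSV file, without the row-name column
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Import it again; read.csv() assumes a header row and comma separators
back <- read.csv(csv_path)

# read.table() reads the same file once told about the header and separator
back2 <- read.table(csv_path, header = TRUE, sep = ",")

unlink(csv_path)  # remove the temporary file
```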
Multilinear regression
model <- lm(y ~ x1[+x2[...[+xn]...]][, dataframe])
y dependent, x's independent; each can be a vector or a column header of the data frame in the second arg
model <- lm(y^2 + 1 ~ log(x))
summaries
cor(vec1, vec2[, method=method]) for correlations
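A sketch of a two-predictor regression on synthetic data. The data are constructed so that y = 3 + 2*x1 - x2 exactly, which makes the fitted coefficients easy to check; the names dat, x1, x2 are illustrative:

```r
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- 3 + 2 * x1 - x2          # exact linear relationship, no noise

dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Formula names refer to columns of the data frame given as the second arg
model <- lm(y ~ x1 + x2, dat)
coefs <- coef(model)           # intercept, x1 slope, x2 slope

# Correlations: Pearson by default, other methods by name
r_pearson  <- cor(x1, y)
r_spearman <- cor(x1, y, method = "spearman")
```

With real, noisy data, summary(model) gives the standard errors and R-squared values shown in the console session further down.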
Plotting
plot(x, y, ...)
takes two vecs of the same length
precede with attach(dataframe) to use column headers instead of separate vectors
other args
type="p" for points
type="l" for lines
main: plot title
xlab, ylab: axis labels
col: color
best-fit lines and local regression curves: abline(regression-model), lines(lowess(x, y))
png(filename)
FFmpeg, ImageMagick for animation
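A sketch that puts the plotting calls together and writes the result to a PNG file. It uses the built-in mtcars data; the title, labels and colors are arbitrary choices, and it assumes the R build has PNG support:

```r
# Write the plot to a temporary PNG file instead of the screen
plot_path <- tempfile(fileext = ".png")
png(plot_path)

# Scatter plot of two equal-length vectors (columns of mtcars)
plot(mtcars$wt, mtcars$mpg, type = "p",
     main = "Fuel efficiency versus weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     col = "blue")

# abline() and lines() draw onto the plot that is currently open,
# so they must come after plot() and before the device is closed
model <- lm(mpg ~ wt, data = mtcars)
abline(model)
lines(lowess(mtcars$wt, mtcars$mpg), col = "red")

dev.off()  # close the device so the file is written
```

Calling abline() when no plot is open gives the "plot.new has not been called yet" error.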
/* spent time using R, copied to the bottom of this doc */
Foreign function interface
for R to call C functions
C function must take all args as pointers
For arrays this is a pointer to the first elem
floating-point type is double
void dotprod(double* vec1, double* vec2, int* n, double* out)
*out = dotprod_internal(*vec1..)
R CMD SHLIB foo.c
dyn.load("foo.so")
type coercion: as.type
.C() returns a list (assoc array) of param names and modified vals; name the args in the call to access them by name
result <- .C("dotprod", as.double(vec1), as.double(vec2), as.integer(length(vec1)), out=as.double(0))
product <- result$out
don't use explicit loops. use Map, Reduce, Find, Filter
Reduce: pass a two-param func; it is applied to the first two elems, then to that result and the next elem, and so on
can be used for sum
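A sketch of the functional helpers in base R (note the capitalized names). The input vector is arbitrary:

```r
v <- c(3, 1, 4, 1, 5, 9)

# Reduce folds a two-argument function over the vector: ((3 + 1) + 4) + ...
total <- Reduce(function(acc, x) acc + x, v)

# Map applies a function to each element and returns a list
squares <- unlist(Map(function(x) x^2, v))

# Filter keeps the elements for which the predicate is TRUE
big <- Filter(function(x) x > 3, v)

# Find returns the first element for which the predicate is TRUE
first_big <- Find(function(x) x > 3, v)
```

For simple arithmetic like this, plain vectorized expressions such as sum(v) or v^2 do the same job.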
don't append to vecs:
vec[length(vec)+1] <- newvalue
vec <- c(vec, newvalue)
bad because reallocation is super slow
pre-allocate vecs to necessary size
vec <- vector(length=1000)
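A sketch of pre-allocation versus growing a vector. numeric(n) is used here as an assumption in place of vector(length=n), because it gives a numeric vector of zeros rather than a logical one:

```r
n <- 1000

# Pre-allocate once, then fill in place
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

# Where possible, skip the loop entirely with a vectorized expression
vectorized <- (1:n)^2
```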
Error Handling
easy mistakes: vector vals where single nums are expected, and NULL values; funcs behave strangely, don't throw clean errors
Sanity-check: stopifnot(), like C's assert()
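A sketch of guarding against the two mistakes named above with stopifnot(). The function scale_by is hypothetical:

```r
# Multiply x by a single number; reject vectors and NULL for `factor`
scale_by <- function(x, factor) {
  stopifnot(is.numeric(factor), length(factor) == 1, !is.na(factor))
  x * factor
}

ok <- scale_by(c(1, 2, 3), 2)

# Both bad calls now fail with a clean error instead of a strange result
bad_vector <- tryCatch(scale_by(c(1, 2, 3), c(1, 2)), error = function(e) "rejected")
bad_null   <- tryCatch(scale_by(c(1, 2, 3), NULL),    error = function(e) "rejected")
```

Without the check, the first bad call would recycle the shorter vector and only emit a warning, and the second would quietly return an empty vector.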
/**********/
/* R-time */
/**********/
R version 3.2.2 (2015-08-14) -- "Fire Safety"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> a <- c(1,2,4)
> a
[1] 1 2 4
> 2:200
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
[19] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
[37] 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
[55] 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
[73] 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
[91] 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109
[109] 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
[127] 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145
[145] 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
[163] 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181
[181] 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199
[199] 200
> a
[1] 1 2 4
> a+1
[1] 2 3 5
> b <- c(20,40,80
+ )
> a+b
[1] 21 42 84
> c <- 6*1:10
> c
[1] 6 12 18 24 30 36 42 48 54 60
> c <- 10*c
> a + c
[1] 61 122 184 241 302 364 421 482 544 601
Warning message:
In a + c : longer object length is not a multiple of shorter object length
> c
[1] 60 120 180 240 300 360 420 480 540 600
> summary(c)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     60     195     330     330     465     600
> m <- matrix(log(1:9) nrow=3, ncol=3)
Error: unexpected symbol in "m <- matrix(log(1:9) nrow"
> m <- matrix(log(1:9), nrow=3, ncol=3)
> m
          [,1]     [,2]     [,3]
[1,] 0.0000000 1.386294 1.945910
[2,] 0.6931472 1.609438 2.079442
[3,] 1.0986123 1.791759 2.197225
> m/log(1)
     [,1] [,2] [,3]
[1,]  NaN  Inf  Inf
[2,]  Inf  Inf  Inf
[3,]  Inf  Inf  Inf
> m/log(10)
          [,1]      [,2]      [,3]
[1,] 0.0000000 0.6020600 0.8450980
[2,] 0.3010300 0.6989700 0.9030900
[3,] 0.4771213 0.7781513 0.9542425
> n -< matrix(2:4, nrow=3, ncol=4)
Error: unexpected '<' in "n -<"
> n <- matrix(2:4, nrow=3, ncol=4)
> n
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    3    3    3    3
[3,]    4    4    4    4
> m %*% n
         [,1]     [,2]     [,3]     [,4]
[1,] 11.94252 11.94252 11.94252 11.94252
[2,] 14.53237 14.53237 14.53237 14.53237
[3,] 16.36140 16.36140 16.36140 16.36140
> eigen(m)
$values
[1]  4.533528795 -0.717104893 -0.009761412
$vectors
           [,1]        [,2]       [,3]
[1,] -0.4643796 -0.91571234  0.1622601
[2,] -0.5837295 -0.07927824 -0.8040451
[3,] -0.6660417  0.39393637  0.5719993
> vec <- eigen(m$vectors[,3]
+ )
Error in m$vectors : $ operator is invalid for atomic vectors
> vec <- eigen(m)$vectors[,3]
> vec
[1] 0.1622601 -0.8040451 0.5719993
>
> m %*% vec
             [,1]
[1,] -0.001583887
[2,]  0.007848615
[3,] -0.005583521
> eigen(m)$values[3] * vec
[1] -0.001583887 0.007848615 -0.005583521
> func <- function(a,b)(a^2 + b)
> func(5,1)
[1] 26
> func(b=1,a=5)
[1] 26
> func <- function(a,b=2)(a^2 + b)
> func(4)
[1] 18
> func(4, 1)
[1] 17
> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> rownames(mtcars)
[1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
[4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
[7] "Duster 360" "Merc 240D" "Merc 230"
[10] "Merc 280" "Merc 280C" "Merc 450SE"
[13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
[16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
[19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
[22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
[25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
[28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
[31] "Maserati Bora" "Volvo 142E"
> colnames(mtcars)
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
[11] "carb"
> mtcars[2]
cyl
Mazda RX4 6
Mazda RX4 Wag 6
Datsun 710 4
Hornet 4 Drive 6
Hornet Sportabout 8
Valiant 6
Duster 360 8
Merc 240D 4
Merc 230 4
Merc 280 6
Merc 280C 6
Merc 450SE 8
Merc 450SL 8
Merc 450SLC 8
Cadillac Fleetwood 8
Lincoln Continental 8
Chrysler Imperial 8
Fiat 128 4
Honda Civic 4
Toyota Corolla 4
Toyota Corona 4
Dodge Challenger 8
AMC Javelin 8
Camaro Z28 8
Pontiac Firebird 8
Fiat X1-9 4
Porsche 914-2 4
Lotus Europa 4
Ford Pantera L 8
Ferrari Dino 6
Maserati Bora 8
Volvo 142E 4
> mtcars[,2]
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
> mtcars[2,]
              mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
> c <- 2*1:10
> c
[1] 2 4 6 8 10 12 14 16 18 20
> c[2]
[1] 4
> vec <- c
> vec[c(4, 5, 7)]
[1] 8 10 14
> vec[-4]
[1] 2 4 6 10 12 14 16 18 20
> vec[2:6]
[1] 4 6 8 10 12
> attach(mtcars)
> model <- lm(mpg ~ wt, mtcars)
> summary(model)
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
> model <- lm(mpg ~ log(wt), mtcars)
> model <- lm(mpg ~ wt, mtcars)
> model2 <- lm(mpg ~ log(wt), mtcars)
> summary(model2)
Call:
lm(formula = mpg ~ log(wt), data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-3.7440 -2.0954 -0.3672  1.0709  6.6150
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   39.257      1.758   22.32  < 2e-16 ***
log(wt)      -17.086      1.510  -11.31 2.39e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.669 on 30 degrees of freedom
Multiple R-squared: 0.8101, Adjusted R-squared: 0.8038
F-statistic: 128 on 1 and 30 DF, p-value: 2.391e-12
> model3 <- lm(mpg ~ log(wt) + qsec, mtcars)
> summary(model3)
Call:
lm(formula = mpg ~ log(wt) + qsec, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-4.0729 -1.3876 -0.4368  0.7493  5.4694
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  22.2967     4.4603   4.999 2.54e-05 ***
log(wt)     -16.1783     1.2519 -12.923 1.47e-13 ***
qsec          0.8932     0.2224   4.016 0.000384 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.177 on 29 degrees of freedom
Multiple R-squared: 0.878, Adjusted R-squared: 0.8696
F-statistic: 104.3 on 2 and 29 DF, p-value: 5.661e-14
> plot(wt, mpg)
> help("plot")
> plot(wt, mpg, type="p")
> help("plot")
> plot(wt, mpg, type="p")
> plot(wt, mpg, main="Vehical fuel efficiency versus weight", ylab="miler per gallon", type="p")
> plot(wt, mpg, main="Vehical fuel efficiency versus weight", ylab="miler per gallon", xlab="Weight in tons", type="p")
> model <- lm(mpg ~ wt, mtcars)
> abline(model)
Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) :
plot.new has not been called yet
> abline(model)
Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) :
plot.new has not been called yet
>