- Setting the stage
- Basics of the language
- Demo in RStudio
- Strategies for writing code
What is R?
- Free, open-source programming language for statistics
Why write code?
- Precise and replicable
- Efficient and powerful for large datasets
Comparing tools
- There's a range of statistics tools from full programming languages to spreadsheets
- Python -> R -> Stata -> Excel
- General tradeoff of flexibility vs. ease of use
- Some are free and open source, some are commercial ($$)
- Different specializations, different communities of users
Basics
- Calculations
3 * 4 + 6
- Variables and assignment
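a = 3 * 4 + 6     # assign with =
a <- 3 * 4 + 6    # assign with <- (equivalent, and the more common R style)
a<-3*4+6          # spacing is optional
b = a / 3
b = c / 3         # error: c has no numeric value yet (c is a built-in function)
b = b ^ 2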
Functions
- Syntax, arguments, return values
c = round(5.27)
bigNumber = exp(200)
big.number = exp(200)
bigNumber2 = exp(200)
- Optional arguments
c = round(5.27, digits=1)
c = round(5.27, 1)
- Nested functions
c = round(sqrt(2) + 1)
- Side effects
print("Arrr")
Data types
- Integer, numeric (a.k.a. real number), logical (boolean), character (string)
city = "Berkeley"
address = paste("College Ave,", city)  # paste() already inserts a space between its arguments
print(typeof(city))
- Vector (a.k.a. array)
vec = c(10, 20, 30, 40, 50)
vec[2]
vec[1:3]
- Data frame = table with named columns; each column is a vector, and different columns can hold different types
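A minimal sketch of building and inspecting a data frame. The name `rentals`, its columns, and all values are made up for illustration.

```r
# Hypothetical example data: three cities with made-up rents
rentals = data.frame(
  city = c("Berkeley", "Oakland", "San Francisco"),
  rent = c(1200, 950, 1500),
  furnished = c(TRUE, FALSE, TRUE)
)

str(rentals)                          # one line per column, showing each column's type
rentals$rent                          # a single column is a vector
rentals[2, ]                          # second row, all columns
rentals[rentals$rent > 1000, "city"]  # cities where rent exceeds 1000
```

Rows and columns are selected with `[row, column]`; leaving one side empty means "all".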
Control structures
- If/then statement
rent = 1200   # example value
if (rent > 1000) {
  print("too expensive")
}
- Loop
for (i in 1:10) {
  print(i^2)
}
for (city in c("Berkeley", "Oakland", "San Francisco")) {
  print(paste(city, "is the best city"))
}
Where do functions come from?
- Some are built in and some are part of packages you can install
- You can write your own
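A short sketch of the three sources of functions. The function `yearly_rent` is a hypothetical example, not something from a package.

```r
# Writing your own: a hypothetical helper with one optional argument
yearly_rent = function(monthly, months = 12) {
  monthly * months   # the last expression evaluated is the return value
}

yearly_rent(1000)              # uses the default months = 12
yearly_rent(1000, months = 9)  # overrides the optional argument

# Built in: median() is available as soon as R starts
median(c(10, 20, 30, 40))

# From a package: download once, then load in each session, for example
# install.packages("ggplot2")
# library(ggplot2)
```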
Orientation
- code window, console, data window, documentation window
Demo walkthrough
- downloading packages
- reading in data
- subsetting data frames
- descriptive statistics
- histogram, scatter plot
- generating new variables
- linear regression
- reading regression output
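The walkthrough steps could look roughly like the sketch below. It assumes R's built-in `mtcars` dataset as a stand-in for the demo data, which the outline does not name, and it round-trips that data through a temporary CSV only so that `read.csv` has a file to read.

```r
# Reading in data: write mtcars to a temporary CSV, then read it back
csv_path = tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
cars = read.csv(csv_path)

# Subsetting: keep only the rows for 4-cylinder cars
small = cars[cars$cyl == 4, ]

# Descriptive statistics
summary(cars$mpg)
mean(small$mpg)

# Histogram and scatter plot
hist(cars$mpg)
plot(cars$wt, cars$mpg)

# Generating a new variable: weight in pounds (wt is recorded in 1000s of lbs)
cars$wt_lbs = cars$wt * 1000

# Linear regression of fuel economy on weight
fit = lm(mpg ~ wt, data = cars)
summary(fit)   # coefficients, standard errors, p-values, R-squared
coef(fit)      # just the intercept and slope
```

In the `summary(fit)` output, the "Estimate" column holds the intercept and slope, and "Pr(>|t|)" holds the p-value for each.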
Pitfalls
- avoid special characters and reserved keywords in variable and column names
- missing values
- CSV formatting errors
- incorrect data type inference
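Two of these pitfalls are easy to reproduce. The sketch below uses made-up rent values and an inline CSV string, and assumes "unknown" is how the file marks a missing entry.

```r
# Missing values: NA propagates through most calculations
rents = c(1200, NA, 1500)
mean(rents)                # NA
mean(rents, na.rm = TRUE)  # 1350

# Incorrect type inference: one stray text entry keeps the column from being read as numbers
csv_text = "city,rent
Berkeley,1200
Oakland,unknown
San Francisco,1500"
df = read.csv(text = csv_text)
is.numeric(df$rent)        # FALSE

# Declaring the missing-value marker lets R read the column as numbers
df2 = read.csv(text = csv_text, na.strings = "unknown")
is.numeric(df2$rent)       # TRUE
mean(df2$rent, na.rm = TRUE)
```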
How does this function work??
- Search the reference documentation
- Focus on required arguments, optional arguments, return values
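The reference documentation is also available from inside R. A brief sketch, using `round` as the example function:

```r
?round          # open the help page for round()
help("round")   # the same request, written as a function call
args(round)     # show the argument names and their default values
```

Arguments shown with a default value are optional; the rest are required.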
What other functions exist??
- Look through narrative documentation like the "Short Intro" document
- Search for the concept (like "R percentile") in reference documentation or online
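When the function name is unknown, a search from the console can help. The sketch below looks for a percentile function; the numbers are arbitrary.

```r
# List the loaded functions whose names contain "quantile"
apropos("quantile")

# In an interactive session, help.search("percentile") (shorthand: ??percentile)
# searches the installed help pages by keyword

# The search leads to quantile(); here it computes the 90th percentile
quantile(c(10, 20, 30, 40, 50), probs = 0.9)
```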
How to decipher code that someone else wrote
- Trace through the logic line by line
- Pay special attention to variable creation and to function calls
Diagnosing problems
- Error messages vs. unexpected results
- Most problems are caused by typos or unanticipated data values (wrong type, zero, too large, missing, etc.)
- For unexpected results, check the variable contents after each line of code to isolate the problem
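A sketch of isolating an unexpected result by checking each variable in turn. The data are hypothetical; the zero floor area stands in for an unanticipated data value.

```r
# Hypothetical data: monthly rent and floor area for three apartments
rent = c(1200, 950, 1500)
sqft = c(600, 0, 750)      # the 0 is an unanticipated data value

rent_per_sqft = rent / sqft
avg = mean(rent_per_sqft)
print(avg)                 # Inf, which is not a plausible average

# Work backward, checking each variable the result depends on
print(rent_per_sqft)       # the second element is Inf
print(sqft)                # the second element is 0, the source of the problem

# One possible fix: treat a zero area as missing and skip it
sqft[sqft == 0] = NA
mean(rent / sqft, na.rm = TRUE)
```

No error message appears at any point here, which is why inspecting intermediate values is the main tool for this kind of problem.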
Dealing with error messages
- How to interpret them (see "Short Intro" document)
- Find which line generates the error
- Look for syntax mistakes on that line
- If there are no obvious mistakes, trace backward through the logic that created each of the variables used on that line
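A small sketch of reading an error message caused by a typo. The variable names are hypothetical, and `tryCatch` is used only so the script keeps running after the error.

```r
total_rent = 3600

# Typo: the variable is total_rent, not totl_rent
msg = tryCatch(
  totl_rent / 3,
  error = function(e) conditionMessage(e)
)
print(msg)        # the message names the object R could not find

total_rent / 3    # the corrected line
```

In an interactive session, calling `traceback()` right after an error shows the chain of function calls that led to it.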
Google, Google, Google
- ...but pay attention to which code resource websites are helpful and which are spammy
- Stack Overflow is excellent
- stat.ethz.ch is official R documentation