@smmaurer
Last active August 29, 2015 14:15

Introduction to R for Hedonic Regression

  1. Setting the stage
  2. Basics of the language
  3. Demo in RStudio
  4. Strategies for writing code

1. Setting the stage

What is R?

  • Free, open-source programming language for statistics

Why write code?

  • Precise and replicable
  • Efficient and powerful for large datasets

Comparing tools

  • There's a range of statistics tools from full programming languages to spreadsheets
  • Python -> R -> Stata -> Excel
  • General tradeoff of flexibility vs. ease of use
  • Some are free and open source, some are commercial ($$)
  • Different specializations, different communities of users

2. Basics of the language

Basics

  • Calculations

3 * 4 + 6

  • Variables and assignment

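a = 3 * 4 + 6     # assignment with =
a <- 3 * 4 + 6    # assignment with <- (equivalent)
a<-3*4+6          # spaces are optional
b = a / 3
# b = c / 3       # would fail: c has not been assigned a number yet
b = b ^ 2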

Functions

  • Syntax, arguments, return values

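c = round(5.27)          # argument in parentheses; the return value is stored in c
bigNumber = exp(200)     # variable names can use capital letters,
big.number = exp(200)    # periods,
bigNumber2 = exp(200)    # or digits (but not as the first character)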

  • Optional arguments

c = round(5.27, digits=1)   # optional argument passed by name
c = round(5.27, 1)          # same result, argument matched by position

  • Nested functions

c = round(sqrt(2) + 1)

  • Side effects

print("Arrr")

Data types

  • Integer, numeric (a.k.a. real number), logical (boolean), character (string)

city = "Berkeley"
address = paste("College Ave,", city)   # paste() already puts a space between its arguments
print(typeof(city))

  • Vector (a.k.a. array)

vec = c(10, 20, 30, 40, 50)
vec[2]      # second element (R counts from 1)
vec[1:3]    # first three elements

  • Data frame = table with named columns, where each column can hold a different data type
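
As a rough illustration of the data frame idea, the sketch below builds a small one by hand and pulls out columns and rows. The `listings` name, its column names, and all values are invented for this example.

```r
# Hypothetical listings; names and values are made up for illustration
listings = data.frame(
  city = c("Berkeley", "Oakland", "San Francisco"),
  rent = c(1500, 1200, 2400),
  sqft = c(700, 650, 800)
)

str(listings)                      # column names and types
listings$rent                      # one column, returned as a vector
listings[2, ]                      # second row, all columns
listings[listings$rent > 1300, ]   # only the rows that meet a condition
```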

Control structures

  • If/then statement

rent = 1200   # example value so the condition can be evaluated
if (rent > 1000) {
  print("too expensive")
}

  • Loop

for (i in 1:10) {
  print(i^2)
}
for (city in c("Berkeley", "Oakland", "San Francisco")) {
  print(paste(city, "is the best city"))
}

Where do functions come from?

  • Some are built in and some are part of packages you can install
  • You can write your own
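
A minimal sketch of both sources of functions follows. The package name is only an example and its install line is commented out because it needs an internet connection. `rent_per_sqft` is a hypothetical helper written for this illustration.

```r
# Packages are installed once, then loaded in each session
# install.packages("ggplot2")   # example package; requires an internet connection
# library(ggplot2)

# A hypothetical helper: rent per square foot, rounded to cents by default
rent_per_sqft = function(rent, sqft, digits = 2) {
  round(rent / sqft, digits)
}

rent_per_sqft(1500, 700)               # uses the default digits = 2
rent_per_sqft(1500, 700, digits = 0)   # overrides the optional argument
```

The value of the last expression in the function body is what the function returns.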

3. Demo in RStudio

Orientation

  • code window, console, data window, documentation window

Demo walkthrough

  • downloading packages
  • reading in data
  • subsetting data frames
  • descriptive statistics
  • histogram, scatter plot
  • generating new variables
  • linear regression
  • reading regression output
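
The walkthrough steps might look roughly like the sketch below. It is not the demo script itself. The data are simulated from a made-up formula so the example runs without a file. The `listings` data frame, its columns, and the file name in the comment are all assumptions. No packages are downloaded, because everything used here ships with R.

```r
# Simulated data, NOT real listings: rent follows a made-up formula plus noise
set.seed(42)
n = 200
listings = data.frame(
  sqft = round(runif(n, 400, 1500)),
  bedrooms = sample(1:3, n, replace = TRUE)
)
listings$rent = 500 + 1.2 * listings$sqft + 150 * listings$bedrooms +
  rnorm(n, sd = 100)

# With real data this step would read a file instead, e.g.
# listings = read.csv("listings.csv")   # hypothetical file name

# Subset a data frame
large = listings[listings$sqft > 1000, ]

# Descriptive statistics
summary(listings$rent)
mean(large$rent)

# Histogram and scatter plot
hist(listings$rent, main = "Simulated rents", xlab = "Rent")
plot(listings$sqft, listings$rent, xlab = "Square feet", ylab = "Rent")

# Generate a new variable
listings$log_rent = log(listings$rent)

# Linear (hedonic) regression: rent explained by unit characteristics
model = lm(rent ~ sqft + bedrooms, data = listings)
summary(model)   # coefficients, standard errors, p-values, R-squared
coef(model)      # just the estimated coefficients
```

Because the data were generated with known coefficients, the estimates from `coef(model)` can be compared against the 1.2 and 150 used in the simulation.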

Pitfalls

  • avoid special characters and keywords
  • missing values
  • CSV formatting errors
  • incorrect data type inference
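
Two of these pitfalls, missing values and incorrect type inference, can be sketched with a tiny made-up CSV passed to `read.csv` as text. The column names and values are invented for illustration.

```r
# Made-up CSV with a non-standard missing-value code ("N/A")
# and a number written with a thousands separator ("2,100")
csv_text = 'rent,sqft
1500,700
N/A,850
"2,100",1000'

listings = read.csv(text = csv_text, na.strings = c("NA", "N/A"),
                    stringsAsFactors = FALSE)
class(listings$rent)   # "character": the comma kept R from reading numbers

# Strip the commas, then convert to numeric
listings$rent = as.numeric(gsub(",", "", listings$rent))
class(listings$rent)   # "numeric"

mean(listings$rent)                 # NA, because one value is missing
mean(listings$rent, na.rm = TRUE)   # 1800, ignoring the missing value
```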

4. Strategies for writing code

How does this function work??

  • Search the reference documentation
  • Focus on required arguments, optional arguments, return values
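
One way to look a function up from the console is sketched below, using `round` as the example. The help calls are commented out because they open a separate viewer.

```r
# Open the reference page for a function
# ?round
# help("round")

# Print the argument list without leaving the console:
# arguments shown with a default value (such as digits) are optional
args(round)

# Run the examples listed at the bottom of the reference page
# example(round)
```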

What other functions exist??

  • Look through narrative documentation like the "Short Intro" document
  • Search for the concept (like "R percentile") in reference documentation or online
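
For the percentile example, a search from inside R might go roughly as follows. The `rents` values are made up for illustration.

```r
# List objects whose names contain a pattern
apropos("quantile")

# Search the installed documentation by keyword (can be slow the first time)
# help.search("percentile")   # same as typing ??percentile

# quantile() turns out to be the function that computes percentiles
rents = c(900, 1200, 1500, 1800, 2400)
quantile(rents, probs = 0.5)             # 50th percentile (the median)
quantile(rents, probs = c(0.25, 0.75))   # several percentiles at once
```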

How to decipher code that someone else wrote

  • Trace through the logic line by line
  • Pay special attention to variable creation and to function calls

Diagnosing problems

  • Error messages vs. unexpected results
  • Most problems are caused by typos or unanticipated data values (wrong type, zero, too large, missing, etc.)
  • For unexpected results, check the variable contents after each line of code to isolate the problem
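
A small sketch of checking variable contents line by line, using made-up rents that contain a zero and a missing value:

```r
# Made-up rents that include a zero and a missing value
rents = c(1200, 0, 1850, NA)

log_rents = log(rents)
print(log_rents)    # check after this line: -Inf and NA have appeared

mean_log = mean(log_rents)
print(mean_log)     # NA, which traces back to the values above

# One possible fix: keep only positive, non-missing rents before taking logs
valid = rents[!is.na(rents) & rents > 0]
mean(log(valid))
```

Whether dropping such rows is appropriate depends on the analysis. The point here is only that printing each intermediate variable shows where the unexpected value first appears.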

Dealing with error messages

  • How to interpret them (see "Short Intro" document)
  • Find which line generates the error
  • Look for syntax mistakes on that line
  • If there are no obvious mistakes, trace backward through the logic that created each of the variables used on that line
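
The sketch below walks through those steps for one common error, a number stored as text. `tryCatch` is used only so the script keeps running and the message can be printed. In an interactive session the error would simply appear in the console.

```r
rent = "1500"   # a number stored as text, e.g. after a CSV import

# rent + 100 would stop the script here; tryCatch() captures the message instead
msg = tryCatch(rent + 100, error = function(e) conditionMessage(e))
print(msg)

# No typo on that line, so trace backward: inspect the variable it uses
class(rent)     # "character", which explains the error

# Convert to the intended type and retry
rent = as.numeric(rent)
rent + 100
```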

Google, Google, Google

  • ...but pay attention to which code resource websites are helpful and which are spammy
  • Stack Overflow is excellent
  • stat.ethz.ch is official R documentation