@ryanburge
Last active April 9, 2017 18:21
Classroom Instructions for 3/27/2017 - Correlations and Scatterplots
## RUN ALL THIS SYNTAX BEFORE WE START
## This will install and load all the packages you need for class today.
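ipak <- function(pkg){
  ## Find the requested packages that are not installed yet
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  ## Load every package; returns TRUE or FALSE for each one
  sapply(pkg, require, character.only = TRUE)
}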
packages <- c("ggplot2", "dplyr", "car", "highcharter", "ggcorrplot")
ipak(packages)
## STOP AND WAIT
## Take a look at this dataset
head(mtcars)
### What is the relationship between mpg and wt?
### Here's a link on how to interpret correlation coefficients: http://i.imgur.com/OtwLQH6.png
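## A minimal sketch of one way to answer this, assuming Pearson's r is the
## coefficient intended here (it is the default for cor() in base R).
## The name r_mpg_wt is just an illustrative label.
r_mpg_wt <- cor(mtcars$mpg, mtcars$wt)
r_mpg_wt
## cor.test() reports the same coefficient along with a confidence interval and p-value.
cor.test(mtcars$mpg, mtcars$wt)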
### Now visualize that
### + geom_point() will create a scatterplot of your values
### + geom_smooth() will add a trend line to your values
### + geom_smooth(method = lm) will make that line straight
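## One possible way to put those layers together (a sketch, not the only answer).
## Putting wt on the x axis and mpg on the y axis is an assumption about
## which variable is treated as the predictor; p_mpg_wt is an illustrative name.
library(ggplot2)
p_mpg_wt <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                # one dot per car
  geom_smooth(method = lm)      # straight (linear model) trend line
p_mpg_wt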
### Load our height and weight data.
pop <- read.csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/population.csv")
### What is the relationship between height and weight in the general population?
### Here is some syntax. Tell me what you think it does. Run each line and see what you get.
team <- read.csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/teamstats.csv")
bball <- read.csv(url("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/dplyr.csv"))
sal <- bball %>% group_by(team_id, year) %>% summarise(salary = mean(salary))
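## A toy illustration of the group_by() + summarise() pattern used above.
## The data frame toy and its values are made up for this sketch; they are
## not taken from dplyr.csv.
library(dplyr)
toy <- data.frame(team_id = c("A", "A", "B", "B"),
                  year    = c(2000, 2000, 2000, 2001),
                  salary  = c(100, 300, 50, 70))
toy_sal <- toy %>% group_by(team_id, year) %>% summarise(salary = mean(salary))
## The result has one row per team_id/year combination, holding the mean salary.
toy_sal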
### Now, run this line:
merge <- merge(team, sal, by=c("team_id", "year"))
## Let's answer some questions.
## What is the relationship between how much you pay your players and how much your team wins?
## Can you tell me the difference between AL and NL teams?
## Is there a relationship between salaries paid and total attendance?
## Let's say we wanted to look for a whole bunch of correlations at one time.
## We use the select command to pick out some columns that could be related.
bbcor <- select(merge, w, l, r, ab, h, double, triple, hr, bb, so, sb, cs, attendance, salary)
## Then we create a new dataset that just contains correlation coefficients, rounded to one decimal place
cor <- round(cor(bbcor), 1)
## Then we plot!
ggcorrplot(cor, hc.order = TRUE,
           type = "lower",
           lab = TRUE,
           lab_size = 3,
           method = "circle",
           colors = c("tomato2", "white", "springgreen3"),
           title = "Correlogram of Baseball Stats",
           ggtheme = theme_bw)
## What do you see? What is related?