Last active
April 9, 2017 18:21
Classroom Instructions for 3/27/2017 - Correlations and Scatterplots
## RUN ALL THIS SYNTAX BEFORE WE START
## This will install and load all the packages you need for class today.
ipak <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}
packages <- c("ggplot2", "dplyr", "car", "highcharter", "ggcorrplot")
ipak(packages)
## STOP AND WAIT
## Take a look at this dataset
head(mtcars)
### What is the relationship between mpg and wt?
### Here's a link for how to interpret correlation coefficients: http://i.imgur.com/OtwLQH6.png
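## A minimal sketch of one way to answer this, using base R's cor(), which
## returns Pearson's r by default. The name mpg_wt_r is just for illustration.
mpg_wt_r <- cor(mtcars$mpg, mtcars$wt)
mpg_wt_r  # negative: heavier cars tend to get fewer miles per gallon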
### Now visualize that
### + geom_point() will create a scatterplot of your values
### + geom_smooth() will add a trend line to your values
### + geom_smooth(method = lm) will make that line straight
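## One possible way to put those pieces together (a sketch, assuming wt on
## the x-axis and mpg on the y-axis; mpg_wt_plot is an illustrative name).
library(ggplot2)
mpg_wt_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                # one point per car
  geom_smooth(method = "lm")    # straight (linear) trend line
mpg_wt_plot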
### Load our height and weight data.
pop <- read.csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/population.csv")
### What is the relationship between height and weight in the general population?
### Here is some syntax. Tell me what you think it does. Run each line and see what you get.
team <- read.csv("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/teamstats.csv")
bball <- read.csv(url("https://raw.githubusercontent.com/ryanburge/pls2003_sp17/master/dplyr.csv"))
sal <- bball %>% group_by(team_id, year) %>% summarise(salary = mean(salary))
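## If the last line is hard to read, the same pattern on a built-in dataset
## may help (a sketch, not part of the baseball data; mpg_by_cyl is an
## illustrative name). group_by() splits the rows into groups and
## summarise() collapses each group to a single row.
library(dplyr)
mpg_by_cyl <- mtcars %>% group_by(cyl) %>% summarise(mpg = mean(mpg))
mpg_by_cyl  # one row per value of cyl, holding that group's average mpg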
### Now, run this line:
merge <- merge(team, sal, by=c("team_id", "year"))
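## To see what merge() does with two key columns, here is a toy example.
## The values below are made up for illustration and are not from the class
## data. By default only rows whose team_id AND year appear in both data
## frames are kept.
team_toy <- data.frame(team_id = c("A", "A", "B"), year = c(2015, 2016, 2015), w = c(90, 85, 70))
sal_toy  <- data.frame(team_id = c("A", "B", "B"), year = c(2015, 2015, 2016), salary = c(5, 3, 4))
joined_toy <- merge(team_toy, sal_toy, by = c("team_id", "year"))
joined_toy  # keeps A/2015 and B/2015; A/2016 and B/2016 have no match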
## Let's answer some questions.
## What is the relationship between how much you pay your players and how much your team wins?
## Can you tell me the difference between AL and NL teams?
## Is there a relationship between salaries paid and total attendance?
## Let's say we wanted to look for a whole bunch of correlations at one time.
## We use the select command to pick out some columns that could be related.
bbcor <- select(merge, w, l, r, ab, h, double, triple, hr, bb, so, sb, cs, attendance, salary)
## Then we create a new dataset that just contains correlation coefficients, rounded to one decimal place
cor <- round(cor(bbcor), 1)
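## The same two steps on a built-in dataset, as a sketch (mtcars_cor is an
## illustrative name). cor() on a data frame of numeric columns returns a
## square matrix with one row and one column per variable.
mtcars_cor <- round(cor(mtcars[, c("mpg", "wt", "hp")]), 1)
mtcars_cor  # the diagonal is 1 because each variable correlates perfectly with itself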
## Then we plot!
ggcorrplot(cor, hc.order = TRUE,
           type = "lower",
           lab = TRUE,
           lab_size = 3,
           method = "circle",
           colors = c("tomato2", "white", "springgreen3"),
           title = "Correlogram of Baseball Stats",
           ggtheme = theme_bw)
## What do you see? What is related?