mrb/code_and_data.markdown Secret

## code_and_data.markdown

      
## The Code

You should be able to paste the code below into RStudio to plot this data and play with it yourself.
The first graph, a scatterplot (thanks to Allen Goodman):
library("ggplot2") # install.packages("ggplot2")
library("RCurl")   # install.packages("RCurl")

observations <- getURL("https://gist.githubusercontent.com/mrb/ea9a2aa3f41e36f37035/raw/159380f9658e47569fd048a3c6baee858e65ce8d/gistfile1.txt")

observations <- read.csv(text = observations)

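# GPA may be read in as a factor or string; coerce it to numeric and round to 2 places.
observations$GPA <- round(as.numeric(as.character(observations$GPA)), 2)

# Keep only repos with more than 0 and fewer than 10 authors.
observations <- observations[which(observations$AuthorCount >  0), ]
observations <- observations[which(observations$AuthorCount < 10), ]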

ggplot(observations, aes(x = AuthorCount, y = GPA)) +
  geom_point(shape = 1) +
  geom_smooth(method = lm, se = FALSE) +
  labs(title = "GPA of Repos by Author Count") +
  xlab("Author Count")
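
To put a number on the trend line that `geom_smooth(method = lm)` draws, the same straight-line model can be fit directly with `lm()`. The sketch below is only an illustration: `toy` is a made-up data frame with arbitrary values (not taken from the gist, and saying nothing about the real results) that reuses the `AuthorCount` and `GPA` column names. To try it on the real data, pass the `observations` data frame from the code above instead.

```r
# Toy stand-in for `observations`; these values are invented for illustration only.
toy <- data.frame(
  AuthorCount = c(1, 2, 3, 4, 5, 6),
  GPA         = c(3.6, 3.4, 3.5, 3.1, 3.0, 2.8)
)

# Fit the same kind of straight line that geom_smooth(method = lm) overlays on the scatterplot.
fit <- lm(GPA ~ AuthorCount, data = toy)

# Intercept and slope; the slope is the change in GPA per additional author.
print(coef(fit))
```

Calling `summary(fit)` instead of `coef(fit)` also reports standard errors and R-squared, which helps when judging how much weight the line deserves.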

The line graph (thanks to JD Maturen):
library(ggplot2)
library(RCurl)
 
observations <- getURL("https://gist.githubusercontent.com/mrb/2975281ff4e5306f2955/raw/53e35ef87c6df3e8dd366471a5638b7f7e448f75/binned_data.csv")
observations <- read.csv(text = observations)
 
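# Relabel the raw bucket names ("1+", "2+", ...) as explicit team-size ranges.
observations$bucket <- factor(
  observations$bucket,
  levels = c("1+", "2+", "3+", "5+", "10+"),
  labels = c("1", "2", "3-4", "5-9", "10+")
)

# Note: ggplot2 3.4.0 and later deprecate `size` for lines; use linewidth = 2 there.
ggplot(observations, aes(gpa, color = bucket)) +
  geom_density(size = 2) +
  scale_x_reverse() +
  labs(title = "Density of GPAs per Team Size") +
  xlab("GPA") +
  ylab("Density") +
  guides(color = guide_legend(title = "Team Size"))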
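
The gist does not show how the binned file was produced. Assuming the buckets are plain ranges over the author count (as the "1", "2", "3-4", "5-9", "10+" labels suggest), a similar grouping could be sketched with base R's `cut()`. The `author_count` vector below is hypothetical sample input, not data from the gist.

```r
# Hypothetical author counts; not taken from the gist's data.
author_count <- c(1, 2, 3, 4, 5, 9, 10, 25)

# Bin the counts into the five team-size groups used in the legend.
# cut() intervals are right-closed by default: (0,1], (1,2], (2,4], (4,9], (9,Inf].
team_size <- cut(
  author_count,
  breaks = c(0, 1, 2, 4, 9, Inf),
  labels = c("1", "2", "3-4", "5-9", "10+")
)

# Count how many values landed in each bucket.
print(table(team_size))
```

Because `cut()` returns a factor whose levels follow the order of `labels`, the result can be used directly as the `color` aesthetic without a separate `factor()` call.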

## The Data

The data is available here in raw form:
https://gist.githubusercontent.com/mrb/ea9a2aa3f41e36f37035/raw/159380f9658e47569fd048a3c6baee858e65ce8d/gistfile1.txt
And here in binned form:
https://gist.githubusercontent.com/mrb/2975281ff4e5306f2955/raw/53e35ef87c6df3e8dd366471a5638b7f7e448f75/binned_data.csv
Enjoy! Please let us know if you do anything cool with it!