@mrb
Last active August 29, 2015 14:01
Code and Data for Code Climate Blog Post "Does Team Size Impact Code Quality?"

The Code

You should be able to paste the code below into RStudio to plot this data and play with it yourself.

The first graph, a scatterplot (thanks to Allen Goodman):

library("ggplot2") # install.packages("ggplot2")
library("RCurl")   # install.packages("RCurl")

observations <- getURL("https://gist.githubusercontent.com/mrb/ea9a2aa3f41e36f37035/raw/159380f9658e47569fd048a3c6baee858e65ce8d/gistfile1.txt")

observations <- read.csv(text = observations)

# GPA may be read in as text; coerce it to a number rounded to two decimals
observations$GPA <- round(as.numeric(as.character(observations$GPA)), 2)

# Keep repos with more than 0 and fewer than 10 authors
observations <- observations[which(observations$AuthorCount >  0), ]
observations <- observations[which(observations$AuthorCount < 10), ]

# Scatterplot with a linear fit and no confidence band
ggplot(observations, aes(x = AuthorCount, y = GPA)) +
  geom_point(shape = 1) +
  geom_smooth(method = lm, se = FALSE) +
  labs(title = "GPA of Repos by Author Count") +
  xlab("Author Count")

The line graph (thanks to JD Maturen):

library(ggplot2)
library(RCurl)
 
observations <- getURL("https://gist.githubusercontent.com/mrb/2975281ff4e5306f2955/raw/53e35ef87c6df3e8dd366471a5638b7f7e448f75/binned_data.csv")
observations <- read.csv(text = observations)
 
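# Relabel the buckets and fix their order for the legend
observations$bucket <- factor(observations$bucket,
                              levels = c("1+", "2+", "3+", "5+", "10+"),
                              labels = c("1", "2", "3-4", "5-9", "10+"))

# One density curve per team size; the x axis is reversed so higher GPAs sit on the left
# (on ggplot2 3.4 and later, linewidth = 2 replaces size = 2 for lines)
ggplot(observations, aes(gpa, color = bucket)) +
  geom_density(size = 2) +
  scale_x_reverse() +
  labs(title = "Density of GPAs per Team Size") +
  xlab("GPA") +
  ylab("Density") +
  guides(color = guide_legend(title = "Team Size"))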
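
The binned file already comes with a bucket column, so the script that produced it isn't shown here. If you'd rather bin the raw data yourself, something like the sketch below might work. The helper name `bucket_team_size` is made up for this example, and the bucket boundaries are only inferred from the labels used in the line graph ("1", "2", "3-4", "5-9", "10+"), so treat them as an assumption rather than the original binning.

```r
# Hypothetical helper: bucket raw author counts into the team-size labels
# used in the line graph. The boundaries are inferred from the labels and
# are NOT taken from the original binning script.
bucket_team_size <- function(author_count) {
  cut(
    author_count,
    breaks = c(1, 2, 3, 5, 10, Inf),
    labels = c("1", "2", "3-4", "5-9", "10+"),
    right = FALSE  # intervals are [1,2), [2,3), [3,5), [5,10), [10,Inf)
  )
}

# Made-up author counts, for illustration only
example_buckets <- bucket_team_size(c(1, 2, 4, 7, 12))
print(example_buckets)
```

With the raw data loaded as in the scatterplot script, `observations$bucket <- bucket_team_size(observations$AuthorCount)` would add a comparable column. Counts below 1 come out as `NA`.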

The Data

The data is available here in raw form:

https://gist.githubusercontent.com/mrb/ea9a2aa3f41e36f37035/raw/159380f9658e47569fd048a3c6baee858e65ce8d/gistfile1.txt

And here in the binned form:

https://gist.githubusercontent.com/mrb/2975281ff4e5306f2955/raw/53e35ef87c6df3e8dd366471a5638b7f7e448f75/binned_data.csv

Enjoy! Please let us know if you do anything cool with it!


jvns commented May 22, 2014

Here's some more analysis I did.

Mainly I was interested in the GPA distribution (why is there such a high proportion of GPAs of 4.0?) and in how team size affects GPA. It turns out that team size strongly affects whether your GPA is 4.0, but if your GPA isn't 4.0, team size doesn't matter so much.

http://nbviewer.ipython.org/urls/gist.githubusercontent.com/jvns/f33f96a7a3a6f833a36c/raw/1f4dd8d72d0b9878495d69ad025741168f3b2ab1/gpa_vs_team_size.ipynb
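
For anyone who wants to poke at the same split in R rather than in the notebook, here is a rough sketch. It is not taken from the linked notebook. The helper names `perfect_share` and `mean_gpa_below_perfect` are made up, and the code assumes a data frame with numeric `AuthorCount` and `GPA` columns, as produced by the scatterplot script above.

```r
# Hypothetical helper: share of repos with a perfect 4.0 GPA, per author count.
perfect_share <- function(observations) {
  observations$perfect <- observations$GPA == 4.0
  # The mean of a logical column is the proportion of TRUE values
  aggregate(perfect ~ AuthorCount, data = observations, FUN = mean)
}

# Hypothetical helper: mean GPA per author count among repos below 4.0.
mean_gpa_below_perfect <- function(observations) {
  imperfect <- observations[which(observations$GPA < 4.0), ]
  aggregate(GPA ~ AuthorCount, data = imperfect, FUN = mean)
}

# Made-up rows, for illustration only (not the real data)
toy <- data.frame(
  AuthorCount = c(1, 1, 2, 2),
  GPA         = c(4.0, 4.0, 4.0, 3.2)
)
print(perfect_share(toy))
print(mean_gpa_below_perfect(toy))
```

Running both helpers on the real `observations` data frame gives one table for the "is it a 4.0?" question and another for the "how good is it otherwise?" question, which is the distinction drawn above.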
