
Last active August 29, 2015 14:01
Code and Data for Code Climate Blog Post "Does Team Size Impact Code Quality?"

The Code

You should be able to paste the code below into RStudio to plot this data and play with it yourself.

The first graph, a scatterplot (thanks to Allen Goodman):

library("ggplot2") # install.packages("ggplot2")
library("RCurl")   # install.packages("RCurl")

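# Download the raw observations CSV as a single string.
observations <- getURL("")

# Parse the downloaded text into a data frame.
observations <- read.csv(text = observations)

# Coerce GPA to numeric (it may be read as a factor or string) and round to 2 decimals.
observations$GPA <- round(as.numeric(as.character(observations$GPA)), 2)

# Keep repos with more than 0 and fewer than 10 authors.
observations <- observations[which(observations$AuthorCount >  0), ]
observations <- observations[which(observations$AuthorCount < 10), ]

# Scatterplot of GPA against author count, with a linear trend line.
ggplot(observations, aes(x = AuthorCount, y = GPA)) +
  geom_point(shape = 1) +
  geom_smooth(method = lm, se = FALSE) +
  labs(title = "GPA of Repos by Author Count") +
  xlab("Author Count")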
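The trend line drawn by geom_smooth(method = lm) is an ordinary linear fit, so its slope can be read off by fitting the same model directly. A minimal sketch, assuming a small made-up data frame `toy` in place of the real observations (the numbers are illustrative only and are not taken from the dataset):

```r
# Toy data standing in for the real observations (values are made up).
toy <- data.frame(
  AuthorCount = c(1, 1, 2, 3, 4, 5, 7, 9),
  GPA         = c(4.0, 3.8, 3.6, 3.5, 3.1, 3.2, 2.9, 2.7)
)

# The same kind of model geom_smooth(method = lm) fits: GPA as a linear function of AuthorCount.
fit <- lm(GPA ~ AuthorCount, data = toy)

# Slope: estimated change in GPA per additional author.
slope <- unname(coef(fit)["AuthorCount"])
print(slope)

# Share of the variance in GPA explained by the linear fit.
print(summary(fit)$r.squared)
```

With the real data, replace `toy` with the filtered `observations` data frame.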

The line graph, a density plot of GPA per team size (thanks to JD Maturen):

# Download and parse the binned data (ggplot2 and RCurl are loaded above).
observations <- getURL("")
observations <- read.csv(text = observations)
observations$bucket <- factor(observations$bucket, levels=c("1+", "2+", "3+", "5+", "10+"), labels=c("1", "2", "3-4", "5-9", "10+"))

# Density of GPA per team-size bucket, with the x axis reversed.
ggplot(observations, aes(gpa, color = bucket)) +
  geom_density(size = 2) +  # on ggplot2 >= 3.4.0, prefer linewidth = 2
  scale_x_reverse() +
  labs(title = "Density of GPAs per Team Size", x = "GPA", y = "Density") +
  guides(color = guide_legend(title = "Team Size"))

The Data

The data is available here in raw form:

And here in the binned form:

Enjoy! Please let us know if you do anything cool with it!


jvns commented May 22, 2014

Here's some more analysis I did.

Mainly I was interested in the GPA distribution (why is there such a high proportion of GPAs of 4.0?) and in how team size affects GPA. It turns out that team size strongly affects whether your GPA is 4.0, but if your GPA isn't 4.0, it doesn't matter so much.
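One way to look at that split is to compute, per author count, the share of repos with a perfect 4.0 and, separately, the mean GPA of the remaining repos. This is only a sketch on a made-up data frame `toy` (illustrative values, not the real dataset), and it is not necessarily how the analysis above was done:

```r
# Toy data standing in for the real observations (values are made up).
toy <- data.frame(
  AuthorCount = c(1, 1, 1, 2, 2, 3, 5, 5, 8),
  GPA         = c(4.0, 4.0, 3.2, 4.0, 3.1, 3.3, 3.0, 3.4, 3.2)
)

# Flag repos with a perfect GPA.
toy$perfect <- toy$GPA == 4.0

# Share of repos with a 4.0, per author count (mean of a logical is a proportion).
share_perfect <- aggregate(perfect ~ AuthorCount, data = toy, FUN = mean)

# Mean GPA among the non-4.0 repos, per author count.
mean_rest <- aggregate(GPA ~ AuthorCount, data = toy[!toy$perfect, ], FUN = mean)

print(share_perfect)
print(mean_rest)
```

If the claim holds on the real data, the first table should change sharply with author count while the second stays comparatively flat.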
