Last active Aug 29, 2015

Code and Data for Code Climate Blog Post "Does Team Size Impact Code Quality?"

The Code

You should be able to paste the code below into RStudio to plot this data and play with it yourself.

The first graph, a scatterplot (thanks to Allen Goodman):

library("ggplot2") # install.packages("ggplot2")
library("RCurl")   # install.packages("RCurl")

observations <- getURL("")

observations <- read.csv(text = observations)

# Coerce GPA to numeric (it may be read in as a factor) and round to 2 decimals.
observations$GPA <- round(as.numeric(as.character(observations$GPA)), 2)

# Keep only repos with 1 to 9 authors.
observations <- observations[which(observations$AuthorCount > 0 & observations$AuthorCount < 10), ]

# Scatterplot with a linear-regression trend line and no confidence band.
ggplot(observations, aes(x = AuthorCount, y = GPA)) +
  geom_point(shape = 1) +
  geom_smooth(method = lm, se = FALSE) +
  labs(title = "GPA of Repos by Author Count") +
  xlab("Author Count")
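The trend line drawn by `geom_smooth(method = lm)` is an ordinary least-squares fit, so the same fit can be inspected numerically with base R's `lm()`. The sketch below is only an illustration: it assumes the `AuthorCount` and `GPA` columns used above, and the small data frame is invented toy data standing in for the real download.

```r
# Toy stand-in for the real data (values invented for illustration only).
observations <- data.frame(
  AuthorCount = c(1, 1, 2, 3, 4, 5, 7, 9),
  GPA         = c(4.0, 3.8, 4.0, 3.5, 3.2, 3.4, 2.9, 3.0)
)

# Fit the same straight line that geom_smooth(method = lm) draws.
fit <- lm(GPA ~ AuthorCount, data = observations)

# The slope is the estimated change in GPA per additional author.
print(coef(fit))

# R-squared indicates how much of the GPA variance the line explains.
print(summary(fit)$r.squared)
```

With the real data loaded, replacing the toy data frame with the filtered `observations` should give the slope behind the plotted line.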

The line graph, a density plot (thanks to JD Maturen):

observations <- getURL("")
observations <- read.csv(text = observations)
# Relabel the raw bucket codes as readable team-size ranges.
observations$bucket <- factor(observations$bucket, levels=c("1+", "2+", "3+", "5+", "10+"), labels=c("1", "2", "3-4", "5-9", "10+"))

ggplot(observations, aes(gpa, color = bucket)) +
  geom_density(size = 2) +
  scale_x_reverse() +
  labs(title = "Density of GPAs per Team Size", x = "GPA", y = "Density") +
  guides(color = guide_legend(title = "Team Size"))
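The binned file already ships with a `bucket` column, and how it was produced is not stated here. If you want comparable bins from raw author counts, one possible approach is base R's `cut()`. The break points below are an assumption inferred from the labels "1", "2", "3-4", "5-9", "10+", and `author_counts` is a hypothetical example vector.

```r
# Hypothetical author counts, for illustration only.
author_counts <- c(1, 2, 3, 4, 5, 9, 10, 25)

# cut() uses right-closed intervals by default:
# (0,1], (1,2], (2,4], (4,9], (9,Inf]
team_size <- cut(
  author_counts,
  breaks = c(0, 1, 2, 4, 9, Inf),
  labels = c("1", "2", "3-4", "5-9", "10+")
)

# Show how many counts landed in each bin.
print(table(team_size))
```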

The Data

The data is available here in raw form:

And here in the binned form:

Enjoy! Please let us know if you do anything cool with it!

jvns commented May 22, 2014

Here's some more analysis I did.

Mainly I was interested in the GPA distribution (why is there such a high proportion of GPAs of 4.0?) and how team size affects GPA. It turns out that team size strongly affects whether your GPA is 4.0, but if your GPA isn't 4.0, team size doesn't matter so much.
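One way to look at this split yourself is to separate the two questions: what share of repos at each team size has a 4.0, and what the average GPA is among the rest. This is a rough sketch of that idea, not necessarily the analysis linked above. It assumes the `AuthorCount` and `GPA` columns from the raw data, and the data frame is invented toy data.

```r
# Toy stand-in for the real data (values invented for illustration only).
observations <- data.frame(
  AuthorCount = c(1, 1, 1, 2, 2, 3, 3, 5, 5, 8),
  GPA         = c(4.0, 4.0, 3.1, 4.0, 3.3, 4.0, 3.0, 3.2, 2.9, 3.1)
)

# Flag the repos with a perfect GPA.
observations$perfect <- observations$GPA == 4.0

# Question 1: share of 4.0 repos at each author count
# (the mean of a logical vector is the proportion of TRUE values).
share_perfect <- aggregate(perfect ~ AuthorCount, data = observations, FUN = mean)
print(share_perfect)

# Question 2: mean GPA among the repos that are not 4.0.
imperfect <- observations[!observations$perfect, ]
mean_imperfect <- aggregate(GPA ~ AuthorCount, data = imperfect, FUN = mean)
print(mean_imperfect)
```

Comparing the two tables on the real data shows whether team size mostly moves the share of perfect scores or the scores below 4.0.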

