@wbt
Created May 21, 2012 13:41
Do more active coders leave longer or shorter comments than others?
This query, when scatter-plotted, shows that more active coders
(measured by the number of pushes announced on the GitHub timeline)
do not leave much longer or shorter commit comments than others
(in my run, the correlation was positive and statistically significant: r=.166, df=15997, p<.0005,
meaning that, statistically, more active coders leave slightly longer comments),
but that they do have lower variance in their comment length.
The scatter plot looks sort of like Bullet Bill from the Super Mario Bros. games.
This is not the original query that I ran;
it has been modified into a single working query that performs all the transformations
and data selection that I did.
SELECT actor,
       LOG10(COUNT(actor)) AS LogCount,
       LOG10(AVG(LENGTH(payload_commit_msg))) AS LogCommentSizeMean
FROM [githubarchive:github.timeline]
WHERE repository_private = 'false'
  AND type = 'PushEvent'
GROUP BY actor
LIMIT 16000;
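The statistics quoted above (r=.166, df=15997, p<.0005) can be sanity-checked with a few lines of Python. This is only a sketch and not the method originally used: the function names `pearson_r`, `t_statistic` and `two_sided_p_normal` are hypothetical helpers introduced here, and the p-value uses a normal approximation to the t distribution, which is an assumption that should be reasonable at this many degrees of freedom. `pearson_r` could be applied to the exported LogCount and LogCommentSizeMean columns; the lines at the bottom only re-derive the test statistic from the reported r and df.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, df):
    """t statistic for testing whether a correlation r differs from zero."""
    return r * math.sqrt(df / (1.0 - r * r))


def two_sided_p_normal(t):
    """Two-sided p-value using a normal approximation (assumes large df)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Values reported in the description above.
r, df = 0.166, 15997

t = t_statistic(r, df)
p = two_sided_p_normal(t)
r_squared = r * r  # share of variance in one variable explained by the other

print("t =", round(t, 2))
print("p < .0005:", p < 0.0005)
print("r^2 =", round(r_squared, 4))
```

With these inputs t comes out near 21.3 and r^2 near 0.028. The correlation is therefore far past the p<.0005 threshold while explaining under 3% of the variance, which fits the reading that more active coders leave only slightly longer comments.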