wbt/2762383
Created May 21, 2012 13:41
Do more active coders leave longer or shorter comments than others?
This query, when its output is scatter-plotted (log10 of push count against log10 of mean commit-message length),
shows that more active coders (measured by the number of their pushes announced on the GitHub timeline)
do not necessarily leave much longer or shorter comments (commit messages) than others.
In my run, the correlation was positive and statistically significant (r = .166, df = 15997, p < .0005),
which means that, statistically, more active coders leave slightly longer comments;
they also show lower variance in comment length.
The scatter plot looks a bit like Bullet Bill from Super Mario Bros.
This is not the original query I ran; it has been modified into a single working query
that performs all of the transformations and data selection I did.
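As a rough check on the reported figures, the standard t-statistic for testing whether a Pearson correlation differs from zero is t = r * sqrt(df / (1 - r^2)), with df = n - 2. This is a minimal sketch assuming that usual two-tailed test; the function name is illustrative, not from the original gist.

```python
import math


def correlation_t_stat(r: float, df: int) -> float:
    """t-statistic for H0: rho = 0, given a Pearson r and df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


r, df = 0.166, 15997  # values reported in the description above
t = correlation_t_stat(r, df)
print(f"t = {t:.1f}")         # compare with roughly 3.5, the two-tailed cutoff for p < .0005
print(f"r^2 = {r * r:.3f}")   # share of variance in log mean length explained by log push count
```

The t value comes out around 21, far past the significance cutoff, while r^2 is under 0.03: activity explains less than 3% of the variance in log mean comment length, which fits describing the effect as "slightly longer".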
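-- One row per actor: log10 of the number of public PushEvents,
-- and log10 of the mean commit-message length (in characters) across those pushes.
SELECT actor,
  LOG10(COUNT(actor)) LogCount,
  LOG10(AVG(LENGTH(payload_commit_msg))) LogCommentSizeMean
FROM [githubarchive:github.timeline]  -- BigQuery legacy SQL table reference
WHERE repository_private = 'false'
  AND type = 'PushEvent'
GROUP BY actor
LIMIT 16000;  -- no ORDER BY, so which 16,000 actors are returned is not guaranteed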
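Once the query results are exported, the correlation and the spread comparison can be computed locally. This is a minimal sketch under stated assumptions: the rows are assumed to be `(actor, LogCount, LogCommentSizeMean)` tuples, NULL values arrive as `None`, and the median split used to compare variance is a hypothetical way of checking the "lower variance" observation, not necessarily how it was judged from the plot.

```python
import math
from statistics import mean, pvariance


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize(rows):
    """Return (r, df, low-activity variance, high-activity variance).

    rows: hypothetical (actor, log_count, log_mean_len) tuples from the query;
    rows containing None are skipped.
    """
    pairs = [(c, m) for _, c, m in rows if c is not None and m is not None]
    xs = [c for c, _ in pairs]
    ys = [m for _, m in pairs]
    r = pearson_r(xs, ys)
    df = len(pairs) - 2  # degrees of freedom for testing a Pearson r
    # Hypothetical spread check: split at the median activity level and
    # compare the variance of log mean message length in each half.
    median_x = sorted(xs)[len(xs) // 2]
    low = [y for x, y in pairs if x < median_x]
    high = [y for x, y in pairs if x >= median_x]
    return r, df, pvariance(low), pvariance(high)
```

For example, `summarize(rows)` on the exported result returns the coefficient, its degrees of freedom, and the two variances; a smaller last value than the third would match the narrower right-hand "bullet" shape described above.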