EDIT (November 6th 2015): goContribute has been barely functional, if that, because the GitHub API rate-limits me. That prevents me from collecting the volume of repository data needed to serve more than one query every 10 minutes. That said, since this probably doesn't get many visitors, it might work for you as long as no one else has used it in the previous 10 minutes.
goContribute is my entry to the Third Annual GitHub Data Challenge.
In essence, it takes two primary factors:
- Your language of proficiency/preference
- Your prioritization of:
  - no. of stars
  - frequency of activity
  - no. of issues
  - the size of the contributor base
and gives you the ten best repositories on GitHub for you to contribute to.
Why did I choose the number of stars, frequency of activity, number of issues, and the size of the contributor base as the determining factors?
I chose the number of stars because many people are looking to add credibility to their GitHub profiles. One way to do this is to contribute to very popular repositories on GitHub; having played a part in the development of a library or piece of software used by hundreds, thousands, or even millions is quite the credential to have in the realm of open source.
A large part of contributing to open source is the activity around the software. It may not be worth your while to contribute to a repository where not much has been going on for a while. It can be beneficial to contribute to a repository where changes are constantly being pushed, since that increases the likelihood that the maintainers will see your pull request and merge your changes.
Rather than simply making changes that are helpful but not necessarily a priority, it's generally more productive to address existing issues in the software. In addition, the more open issues a repository has, the more likely it is that some of them will match your skill set and proficiency.
A large number of contributors reflects a repository's willingness to accept outside contributions. Some open source projects are largely closed to outsiders, whereas others accept many contributions from them; it's more beneficial to try to contribute to the latter.
Upon using the tool, you'll find that each repository listed is given a certain composite score. Here's how I calculated them.
- I first took the top 100 repositories of a certain language, via the GitHub Search API.
- Then, I calculated the means and standard deviations of:
  - stars
  - number of issues
  - contributors
  - the difference between the current date and the last push
- After that, I calculated a z-score for the stars, issues, contributors, and activity of each repository. For anyone who may not recall what a z-score is: http://en.wikipedia.org/wiki/Standard_score
- I then collected the percentages of influence the user allotted to stars, contributors, issues, and activity.
- With that data, I then used the following formula to get the final score of a repository:
(
( percentageForStars * starsZScore ) +
( percentageForIssues * issuesZScore ) +
( percentageForActivity * activityZScore ) +
( percentageForContributors * contributorsZScore )
) * 100
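To make the steps above concrete, here is a minimal Python sketch of the scoring. It is not the code goContribute actually runs. The function names, the dictionary keys (`stars`, `issues`, `contributors`, `days_since_push`), and the sample data are all hypothetical. It also makes three assumptions the description above does not settle: the standard deviation is the population one, the user's percentages are expressed as fractions (0.4 for 40%), and the days since the last push are negated so that a more recently pushed repository gets a higher activity z-score.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value relative to the whole list."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Every value is identical, so no repository stands out.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_scores(repos, weights):
    """Map each repository name to its weighted composite score."""
    stars = z_scores([r["stars"] for r in repos])
    issues = z_scores([r["issues"] for r in repos])
    contributors = z_scores([r["contributors"] for r in repos])
    # Assumption: negate the gap so that recent pushes score higher.
    activity = z_scores([-r["days_since_push"] for r in repos])

    scores = {}
    for i, repo in enumerate(repos):
        total = (
            weights["stars"] * stars[i]
            + weights["issues"] * issues[i]
            + weights["activity"] * activity[i]
            + weights["contributors"] * contributors[i]
        )
        scores[repo["name"]] = total * 100
    return scores


# Hypothetical sample data, for illustration only.
sample_repos = [
    {"name": "a", "stars": 100, "issues": 5, "contributors": 3, "days_since_push": 30},
    {"name": "b", "stars": 200, "issues": 20, "contributors": 10, "days_since_push": 2},
    {"name": "c", "stars": 300, "issues": 10, "contributors": 40, "days_since_push": 7},
]
sample_weights = {"stars": 0.4, "issues": 0.2, "activity": 0.2, "contributors": 0.2}

ranked = sorted(
    composite_scores(sample_repos, sample_weights).items(),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(name, round(score, 1))
```

Because each factor is standardized before weighting, a repository with 50,000 stars does not automatically drown out the other three factors; only its distance from the mean of the 100 repositories counts.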
If anyone has any questions, comments, or suggestions, please feel free to let me know :)