@krrishd
Last active November 7, 2015 00:22
How goContribute Works

EDIT (November 6th 2015): The tool has been barely functional, if at all, because the GitHub API rate-limits me. That prevents me from collecting the volume of repository data it needs to serve more than one query every 10 minutes. That said, since the site probably doesn't get many visitors, it may still work for you as long as no one else has used it within the previous 10 minutes.

goContribute is my entry to the Third Annual GitHub Data Challenge.

In essence, it takes two primary inputs:

  • Your language of proficiency/preference
  • Your prioritization of
    • no. of stars
    • frequency of activity
    • no. of issues
    • the size of the contributor base

and gives you the ten best repositories on GitHub for you to contribute to.
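
The two inputs above could be represented roughly as follows. This is only a sketch: the `Preferences` class and its field names are hypothetical, and it assumes the four percentages must add up to 100%, which the write-up implies but does not state.

```python
from dataclasses import dataclass


@dataclass
class Preferences:
    # Hypothetical representation of goContribute's two inputs.
    language: str
    stars: float         # percentage of influence, e.g. 40 for 40%
    activity: float
    issues: float
    contributors: float

    def __post_init__(self):
        weights = (self.stars, self.activity, self.issues, self.contributors)
        if any(w < 0 for w in weights):
            raise ValueError("percentages must be non-negative")
        # Assumption: the four percentages split 100% between them.
        if abs(sum(weights) - 100) > 1e-9:
            raise ValueError("percentages must add up to 100")

    def as_fractions(self):
        """Convert each percentage to a fraction, e.g. 40 -> 0.4."""
        return {
            "stars": self.stars / 100,
            "activity": self.activity / 100,
            "issues": self.issues / 100,
            "contributors": self.contributors / 100,
        }


prefs = Preferences("Python", stars=40, activity=30, issues=20, contributors=10)
print(prefs.as_fractions())
```

Rejecting weights that don't sum to 100 up front keeps a user from, say, giving every factor 50% and silently inflating every score.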

Reasoning behind the four factors of prioritization

Why did I choose the number of stars, frequency of activity, number of issues, and the size of the contributor base as the determining factors?

No. of stars

I included the number of stars because many people are looking to add credibility to their GitHub profiles. One way to do this is to contribute to very popular repositories; having played a part in developing a library or piece of software used by hundreds, thousands, or even millions of people is quite a credential in the realm of open source.

Frequency of activity

A large part of contributing to open source is the activity around the software. It may not be worth your while to contribute to a repository where little has happened for a while. It is more beneficial to contribute to a repository where changes are pushed constantly, since that makes it more likely that the maintainers will see your pull request and merge your changes at all.

Number of issues

Rather than making changes that are helpful but not necessarily a priority, it's generally more productive to address existing issues in the software. In addition, the more issues a repository has, the more likely it is that some of them will match your skill set and proficiency.

Number of contributors

Having a large number of contributors reflects a repository's willingness to accept outside contributions. Some open source projects are largely closed to outsiders, whereas others accept many contributions from them; it's more beneficial to try to contribute to the latter.

The math.

When you use the tool, you'll see that each listed repository is given a composite score. Here's how I calculated it.

  1. I first took the top 100 repositories for the chosen language via the GitHub Search API.
  2. Then, I calculated the means and standard deviations of:
    • stars
    • number of issues
    • contributors
    • the difference between the current date and the last push
  3. After that, I calculated a z-score for the stars, issues, contributors, and activity of each repository. For anyone who may not recall, a z-score is the number of standard deviations a value lies above or below the mean: http://en.wikipedia.org/wiki/Standard_score
  4. I then took the percentage of influence the user allotted to each of stars, contributors, issues, and activity.
  5. With that data, I then used the following formula to get the final score of a repository:
( 
     ( percentageForStars * starsZScore ) + 
     ( percentageForIssues * issuesZScore ) + 
     ( percentageForActivity * activityZScore ) + 
     ( percentageForContributors * contributorsZScore )
) * 100
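
The five steps above can be sketched in Python as follows. This is a minimal illustration, not goContribute's actual code: the `Repo` record and the helpers `search_url`, `repo_from_search_item`, `z_scores`, `composite_scores`, and `top_ten` are hypothetical names. The sketch assumes that "top 100" means the 100 most-starred repositories, that the user's percentages arrive as fractions summing to 1 (e.g. 0.25 for 25%), and that the population standard deviation is used. Because activity is measured as time since the last push, where smaller is better, the sketch negates that z-score so more recently active repositories score higher; the write-up does not say how goContribute handles this. Network calls are left out: `search_url` only builds the request URL, and the contributor count, which Search API results don't include, is passed in by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean, pstdev
from urllib.parse import urlencode


def search_url(language):
    """Step 1: Search API URL for the 100 most-starred repos in a language."""
    params = {"q": f"language:{language}", "sort": "stars",
              "order": "desc", "per_page": 100}  # 100 is the max page size
    return "https://api.github.com/search/repositories?" + urlencode(params)


@dataclass
class Repo:
    # Hypothetical record; field names are illustrative only.
    name: str
    stars: int
    issues: int
    contributors: int
    days_since_push: float


def repo_from_search_item(item, contributors, now):
    """Build a Repo from one entry of the Search API's "items" array.

    The contributor count is not part of that payload, so it is passed in.
    Note that open_issues_count also counts open pull requests.
    """
    pushed = datetime.strptime(item["pushed_at"], "%Y-%m-%dT%H:%M:%SZ")
    pushed = pushed.replace(tzinfo=timezone.utc)
    return Repo(
        name=item["full_name"],
        stars=item["stargazers_count"],
        issues=item["open_issues_count"],
        contributors=contributors,
        days_since_push=(now - pushed).total_seconds() / 86400,
    )


def z_scores(values):
    """Steps 2-3: (x - mean) / standard deviation for every value."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD (assumption)
    if sigma == 0:
        # All values are equal, so this factor cannot separate repos.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_scores(repos, weights):
    """Steps 4-5: weighted sum of z-scores times 100, best first.

    weights maps "stars", "issues", "activity" and "contributors" to
    fractions assumed to sum to 1.
    """
    stars_z = z_scores([r.stars for r in repos])
    issues_z = z_scores([r.issues for r in repos])
    contributors_z = z_scores([r.contributors for r in repos])
    # Assumption: negate so that fewer days since the last push scores higher.
    activity_z = [-z for z in z_scores([r.days_since_push for r in repos])]

    scored = []
    for i, repo in enumerate(repos):
        score = (
            weights["stars"] * stars_z[i]
            + weights["issues"] * issues_z[i]
            + weights["activity"] * activity_z[i]
            + weights["contributors"] * contributors_z[i]
        ) * 100
        scored.append((repo, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def top_ten(repos, weights):
    return composite_scores(repos, weights)[:10]


repos = [
    Repo("a/one", stars=100, issues=10, contributors=5, days_since_push=30),
    Repo("b/two", stars=300, issues=30, contributors=15, days_since_push=1),
    Repo("c/three", stars=200, issues=20, contributors=10, days_since_push=10),
]
weights = {"stars": 0.25, "issues": 0.25, "activity": 0.25, "contributors": 0.25}
for repo, score in top_ten(repos, weights):
    print(repo.name, round(score, 1))
```

Because every z-score is measured against the same set of repositories, the composite scores only rank repositories within one language's result set; a score of 100 for one language is not comparable with a score of 100 for another.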

krrishd commented Jul 24, 2014

If anyone has any questions, comments, or suggestions, please feel free to let me know :)