@briandoll
Created April 30, 2012 17:12

Dataset: Programming Language Correlations

This dataset explores the relationships between programming languages.

Example: How likely is it that a programmer who writes in Objective-C also programs in Java? (31%)

How is the data collected?

GitHub identifies the programming languages used in each repository and determines which one is the primary language. Each active GitHub.com user has a list of programming languages they have used, derived from the language information in their repositories.

How was the data analyzed?

These relationships between programming languages are asymmetric. To determine the relationship from language A to language B, we count the number of users who were seen with both languages and divide by the total number of users of A. To get the relationship from B to A, we divide the same pair count by the total number of users of B.

Example data:

  • Nine people have repositories written in Ruby only
  • Two people have repositories written in Ruby and PHP
  • One person has repositories written in PHP only

Example results:

  • The correlation from PHP to Ruby is 66.7% (2/3 of people who use PHP also use Ruby)
  • The correlation from Ruby to PHP is 18.2% (2/11 of people who use Ruby also use PHP)
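
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code GitHub used: the function name `language_correlations` and the input shape (one set of language names per user) are hypothetical.

```python
from collections import Counter
from itertools import permutations


def language_correlations(users):
    """Return {(from_lang, to_lang): percent} for every ordered language pair.

    `users` is an iterable of sets of language names, one set per user
    (a hypothetical input shape chosen for this sketch).
    """
    totals = Counter()  # number of users per language
    pairs = Counter()   # number of users per ordered (A, B) pair
    for langs in users:
        totals.update(langs)
        # permutations yields both (A, B) and (B, A), so each direction is counted
        pairs.update(permutations(sorted(langs), 2))
    # Relationship from A to B: pair count divided by the total number of A users
    return {
        (a, b): round(100 * count / totals[a], 1)
        for (a, b), count in pairs.items()
    }


# The example data: nine Ruby-only users, two Ruby+PHP users, one PHP-only user
example_users = [{"Ruby"}] * 9 + [{"Ruby", "PHP"}] * 2 + [{"PHP"}]
correlations = language_correlations(example_users)

print("PHP -> Ruby:", correlations[("PHP", "Ruby")])
print("Ruby -> PHP:", correlations[("Ruby", "PHP")])
```

Both directions share the same pair count; only the denominator changes, which is what makes the relationship asymmetric.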

When was the data published?

The data was gathered on March 2nd, 2012 and was published on April 9th, 2012.

What format is the data in?

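The dataset is in JSON format. Each record names a "from" language, a "to" language, and the correlation between them as a percentage stored as a string.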

The correlation from CoffeeScript to Ruby:

{
  "from": "CoffeeScript",
  "correlation": "87.9",
  "to": "Ruby"
}

The correlation from Ruby to CoffeeScript:

{
  "from": "Ruby",
  "correlation": "17.7",
  "to": "CoffeeScript"
}
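
A minimal sketch of reading such records in Python follows. It assumes the records are stored together as a JSON array, which the samples above do not confirm; the helper name `load_correlations` is hypothetical. Because the "correlation" value is a string, it is converted to a float before use.

```python
import json

# Assumption: the records are wrapped in a JSON array. The two records below
# are the samples shown above.
raw = """
[
  {"from": "CoffeeScript", "correlation": "87.9", "to": "Ruby"},
  {"from": "Ruby", "correlation": "17.7", "to": "CoffeeScript"}
]
"""


def load_correlations(text):
    """Build a {(from_lang, to_lang): percent} lookup from the JSON text."""
    records = json.loads(text)
    # "correlation" is stored as a string, so convert it to a number
    return {(r["from"], r["to"]): float(r["correlation"]) for r in records}


lookup = load_correlations(raw)
print(lookup[("CoffeeScript", "Ruby")])
print(lookup[("Ruby", "CoffeeScript")])
```

Keying the lookup on the ordered pair keeps the two directions separate, which matters because the relationship is asymmetric.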