@campeterson
Last active November 1, 2022 23:59
Data sets

Aviation

National Flight Data Center (NFDC)

https://nfdc.faa.gov/xwiki/bin/view/NFDC/WebHome

FAA Data & Research

http://www.faa.gov/data_research/

Flight Delay Information

http://www.fly.faa.gov/flyfaa/usmap.jsp

FAA Aviation Safety Information Analysis and Sharing (ASIAS)

http://www.asias.faa.gov/pls/apex/f?p=100:1:

Aircraft Situation Display to Industry (ASDI)

http://www.fly.faa.gov/ASDI/asdi.html

NTSB Accident Database & Synopses

http://www.ntsb.gov/aviationquery/index.aspx

OpenFlights.org

http://openflights.org/data.html

The Center for Innovation in Engineering and Science Education Real time data sites

http://www.k12science.org/realtimedatasites.html

MIT Airline Data Project

http://web.mit.edu/airlinedata/www/Revenue&Related.html

Space

Real-Time Space Weather Data Sources

http://space.rice.edu/ISTP/#RT

Politics

Data on the U.S. Congress – A Joint Effort from Brookings and the American Enterprise Institute

http://www.brookings.edu/research/reports/2013/07/vital-statistics-congress-mann-ornstein

Sports

Open Sports Data/API

http://www.openligadb.de/

Football (Soccer) Stats

http://www.11v11.com/

Government

Public Government Data Sets

https://explore.data.gov/catalog/raw/

U.S. Department of Homeland Security Data

http://www.dhs.gov/data

Public Data for the State of Utah

http://www.utah.gov/data/

Compilations by others

Finding Data on the Internet - Inside-R

http://www.inside-r.org/howto/finding-data-internet

Nathan Yau's collection of data sets

http://flowingdata.com/2009/10/01/30-resources-to-find-the-data-you-need/

Dr. Jerry A. Smith's Favorite Data sets

http://datascientistinsights.com/2013/02/02/data-monetization-road-paved-on-top-of-data-sets/

Hilary Mason's "Research Quality" Data-sets

https://bitly.com/bundles/hmason/1
A bundle that gathers, in one place, public data sets that may interest researchers in a variety of fields.

Peter Skomoroch's list of data sets on Delicious

https://delicious.com/pskomoroch/dataset

Data Wrangling blog data set list

http://www.datawrangling.com/some-datasets-available-on-the-web

Other

DonorsChoose.org - Hacking Education: A Contest for Developers and Data Crunchers

http://developer.donorschoose.org/the-data

Datasets for "The Elements of Statistical Learning"

http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html

Enron Email Dataset

http://www.cs.cmu.edu/~enron/
Collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains email from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains about 0.5M messages in total.

Yandex

http://imat-relpred.yandex.ru/en/datasets

The Data Page

http://pages.stern.nyu.edu/~adamodar/New_Home_Page/data.html

Public Data Sets on Amazon

http://aws.amazon.com/publicdatasets/

Miami School of Business Statistical Data Sets

http://www.bus.miami.edu/research-library/free-internet-sites/Statistical%20Data%20Sets.html

Public data put to good use

http://datamob.org/datasets

ASU GeoDA Center Data

http://geodacenter.asu.edu/sdata

UC Irvine Machine Learning Repository

http://archive.ics.uci.edu/ml/index.html
http://archive.ics.uci.edu/ml/datasets.html

European Cities 1M Data Sets

http://image.ntua.gr/iva/datasets/ec1m/

University of Edinburgh School of Informatics Data Sets for Data Mining

http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html

Opinion Mining, Sentiment Analysis, and Opinion Spam Detection

http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

Quandl - Intelligent search for numerical data

http://www.quandl.com/

Gephi Graph Visualization Sample Data Sets

https://wiki.gephi.org/index.php?title=Datasets

CitiBike, by NYC Bike Share - Station data

http://citibikenyc.com/stations/json

Quora - Where can I get large datasets open to the public?

http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public

Air Quality Notifications

http://feeds.enviroflash.info/

The GDELT Project - Global Database of Events, Language, and Tone

http://gdeltproject.org/data.html
