


Robert Johnson rgjohnson98

rgjohnson98 / .README.md
Last active October 3, 2021 18:29
Baseball Batting Table

Background: This dataset is a table from a comprehensive database of baseball statistics covering 1871 to 2015. To narrow it down for this project, I subset the data to the seasons from 1901 onward (considered the "modern" era) and removed some columns. This table contains batting statistics.
Variables: Player ID, Year, Stint, Team, League, Games, At Bats, Runs, Hits, Doubles, Triples, Home Runs, Runs Batted In, Stolen Bases, Walks
Tasks: How has baseball changed over time with regard to home runs (or any other statistic tracked here)? We can use these data to compare players across eras and to look for trends among players, teams, and the league overall.
Source: This dataset comes from Kaggle.
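As a rough sketch of the home-run question, the function below totals home runs per season and also divides by at bats, since the number of teams (and therefore plate appearances) has grown over the period. It assumes Lahman-style column names (`yearID`, `HR`, `AB`) and a CSV file; both the headers and the file name in the usage note are assumptions, so adjust them to match the actual download.

```python
import csv
from collections import defaultdict


def load_batting(path):
    """Read the batting table into a list of dicts keyed by column header."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def home_run_trend(rows):
    """Return {year: (total_HR, HR_per_AB)} in ascending year order.

    Assumes each row has 'yearID', 'HR', and 'AB' keys (assumed
    Lahman-style headers). Blank cells are treated as zero.
    """
    hr = defaultdict(int)
    ab = defaultdict(int)
    for row in rows:
        year = int(row["yearID"])
        # A player with several stints in one season has several rows;
        # summing them is correct for league-wide totals.
        hr[year] += int(row["HR"] or 0)
        ab[year] += int(row["AB"] or 0)
    return {
        year: (hr[year], hr[year] / ab[year] if ab[year] else 0.0)
        for year in sorted(hr)
    }
```

Usage would look like `home_run_trend(load_batting("Batting.csv"))`, where `Batting.csv` is a hypothetical file name. The per-at-bat rate is usually the fairer comparison across eras; the raw total is kept alongside it for reference.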

rgjohnson98 / README.md
Last active September 24, 2021 21:40
US Post Office Locations

Background: This large dataset contains information, including geographic coordinates where they are known, for every post office the US has ever had.
Variables: Post Office Name, State, Year Established, Year Discontinued (blank if it is still in operation), Latitude, Longitude
Tasks: Can we identify trends in where people were living based on the locations of post offices? For example, we would expect to see more post offices in California following the Gold Rush.
Source: This dataset comes from the Harvard Dataverse.
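One way to test the California example is to count how many post offices were operating in a state in each year of interest. The sketch below assumes hypothetical column names (`State`, `Established`, `Discontinued`) and two-letter state codes; it also assumes an office should be counted through the year before its discontinuation year, which is a judgment call the data itself does not settle.

```python
from collections import Counter


def active_counts(rows, state, years):
    """Count post offices in `state` operating in each year of `years`.

    Assumes hypothetical keys 'State', 'Established', 'Discontinued'.
    A blank 'Discontinued' means the office is still open. Rows with
    no establishment year are skipped.
    """
    counts = Counter()
    for row in rows:
        if row["State"] != state or not row["Established"]:
            continue
        start = int(row["Established"])
        end = int(row["Discontinued"]) if row["Discontinued"] else None
        for year in years:
            # Open from its establishment year up to (not including)
            # its discontinuation year.
            if start <= year and (end is None or year < end):
                counts[year] += 1
    # Return counts in the same order as the requested years.
    return [counts[year] for year in years]
```

Comparing, say, `active_counts(rows, "CA", range(1840, 1870))` against the same range for another state would show whether California's count jumps after 1848 as expected.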

rgjohnson98 / .README.md
Last active September 26, 2021 17:28
NASCAR Driver Data

Background: This data table contains information on every NASCAR driver's finishes in races between 1975 and 2003. A second data table contains information on the tracks where those races were run. Race Num, Year, and Race Num of Year together act as a composite key that links each result to its track and the information about it in that table.
Variables: Race Number in the set, Year, Race number of year, Finishing position, Starting position, Laps completed, Payout, Number of cars in field, Car make, Driver
Tasks: You could use these data to obtain information on specific drivers, or on all drivers during a specific date range, such as most wins, most top-5 finishes, or most second-place finishes without a win. By combining this table with the tracks table, we can answer questions such as who was the best driver at track "X" of all time or during a specific period.
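The tasks above can be sketched with plain dictionaries. This is a minimal illustration, not the author's method: the column names (`Race Num`, `Year`, `Race Num of Year`, `Finish`, `Driver`, `Track`) are assumptions to be renamed to match the real headers, and "best driver at a track" is approximated here as most wins there, which is only one possible definition.

```python
from collections import defaultdict

# Assumed names of the composite-key columns shared by both tables.
KEY = ("Race Num", "Year", "Race Num of Year")


def race_key(row):
    """Build the composite key that links a result to its track."""
    return tuple(row[col] for col in KEY)


def finish_tallies(results):
    """Per-driver counts of wins, top-5 finishes, and second places."""
    tallies = defaultdict(lambda: {"wins": 0, "top5": 0, "seconds": 0})
    for row in results:
        finish = int(row["Finish"])
        t = tallies[row["Driver"]]
        # Booleans add as 0 or 1.
        t["wins"] += finish == 1
        t["top5"] += finish <= 5
        t["seconds"] += finish == 2
    return dict(tallies)


def most_seconds_without_win(results):
    """Driver(s) with the most second-place finishes and zero wins."""
    winless = {
        driver: t["seconds"]
        for driver, t in finish_tallies(results).items()
        if t["wins"] == 0 and t["seconds"] > 0
    }
    if not winless:
        return []
    best = max(winless.values())
    return sorted(d for d, n in winless.items() if n == best)


def wins_at_track(results, tracks, track_name):
    """Join results to tracks on the composite key; count wins at one track."""
    track_of = {race_key(t): t["Track"] for t in tracks}
    wins = defaultdict(int)
    for row in results:
        if track_of.get(race_key(row)) == track_name and int(row["Finish"]) == 1:
            wins[row["Driver"]] += 1
    return dict(wins)
```

Filtering `results` by `Year` before calling any of these functions restricts the answer to a specific period.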
Source: This dataset come