This dataset is updated daily and contains data on all games, all teams, and all players within the NBA, including:
60,000+ games (every game since the first NBA season in 1946-47), including, for games where statistics were recorded:
Box scores, Game summaries, Officials, Inactive players, Linescores, Last face-off stats, Season series info, Game video availability
30 teams with information including:
General team details (stadium, head coach, general manager, social media links, etc), Franchise history information (name changes, location changes, etc)
4500 players with:
Basic draft data, Prior affiliations, Career statistics, Anatomical data (height & weight)
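A hedged sketch of what reading a game-level slice of such a dataset might look like (the column names here are hypothetical, not taken from the dataset; only the historical score, the Knickerbockers' 68-66 win over the Toronto Huskies in the opening game of 1946-47, is factual):

```python
import csv
import io

# Hypothetical miniature of a game-level file; the real column names in
# the dataset may differ. The row shown is the first NBA (then BAA)
# game: New York Knickerbockers 68 at Toronto Huskies 66 (Nov 1, 1946).
sample = (
    "season,team_home,team_away,pts_home,pts_away\n"
    "1946-47,Toronto Huskies,New York Knickerbockers,66,68\n"
)

row = next(csv.DictReader(io.StringIO(sample)))
margin = int(row["pts_home"]) - int(row["pts_away"])  # negative: away team won
```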
Mixed Integer Linear Programming for Fair Division Problems
The goal of this project is to find optimally fair allocations of divisible and non-divisible goods for a group of people under three different definitions of fairness based on envy-freeness, under certain assumptions. Mixed integer linear programming (MILP) formulations are created in AMPL and solved using CPLEX, generating datasets of the minimal approximate envy value and solver elapsed time for different combinations of the number of people and the number of goods. Interactive 3D visualizations of these datasets are created in Python, and an analysis of the results is conducted. The project has two main outcomes: paper.pdf, a full, compiled research paper, and report_nb.ipynb, which hosts the results datasets and visualizations. Click below to load the project notebook in your browser using the Binder service, or continue reading for more information on the project.
Interact with the project notebook in your web browser using the Binder service
This folder contains input and output data from the project as well as .LP and .MPS files for all problem instances. The input and output directories have subdirectories pertaining to the specific problem of interest.
environment.yml
File
This is an Anaconda virtual environment replication file that ensures consistent versions of software packages.
report_nb.ipynb
File
This is a Jupyter Python Notebook that contains the results of solving the generated examples. This notebook also contains visualizations, both two-dimensional and three-dimensional, that should help to provide a better understanding of the results.
src
Folder
Contains all source code for solving the examples: the commands used to perform actions like normalization, a file to create all of the synthetic data, and a '.mod' and '.run' AMPL file for each subtype of problem. Assuming the necessary data files have been generated, the '.run' file for each sub-question can be run from the AMPL console; all of the examples will be solved, and the corresponding output files will be created within the data folder.
visualizations
Folder
A collection of the different visualizations created in the Jupyter Notebook in the form of .png files. These visualizations are sorted by type and can be found in the sub-folders.
Project Summary
Fair division problems are a significant class of problems with considerable multidisciplinary involvement, ranging from social science to computer science. Many variants of envy-freeness currently exist, applied to a multitude of scenarios and solved through assorted methodologies. To guide the work in this project, three definitions of envy-freeness are analyzed for one particular situation: envy-freeness, envy-freeness up to one item, and envy-freeness with the inclusion of a divisible subsidy in the form of a cash amount. We apply these definitions to the situation where items are indivisible and valuations are both additive and normalized.
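The first two definitions can be made concrete with a small pure-Python check (an illustration only; the names and structure below are made up for this sketch and do not come from the project's AMPL models):

```python
# valuations[i][g] is agent i's value for good g (additive valuations);
# allocation[i] is the list of goods held by agent i.

def bundle_value(valuations, i, bundle):
    """Agent i's additive value for a bundle of goods."""
    return sum(valuations[i][g] for g in bundle)

def is_envy_free(valuations, allocation):
    """EF: no agent values another agent's bundle above their own."""
    n = len(allocation)
    return all(
        bundle_value(valuations, i, allocation[i])
        >= bundle_value(valuations, i, allocation[j])
        for i in range(n) for j in range(n)
    )

def is_ef1(valuations, allocation):
    """EF1: any envy disappears after dropping some single good
    from the envied bundle."""
    n = len(allocation)
    for i in range(n):
        own = bundle_value(valuations, i, allocation[i])
        for j in range(n):
            other = bundle_value(valuations, i, allocation[j])
            if own >= other:
                continue  # no envy toward agent j
            if not any(own >= other - valuations[i][g] for g in allocation[j]):
                return False
    return True
```

For example, with a single indivisible good valued at 1 by both of two agents, giving it to agent 0 is not envy-free, but it is EF1, since removing the lone good eliminates the envy.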
These three definitions were modeled in the AMPL programming language and then solved using the IBM CPLEX solver for two simple examples and a collection of generated data for different combinations of number of people and number of items to be allocated.
The results for the two simple examples serve to validate the accuracy of the formulations, and the results for the collection of generated data allow analysis of the complexity of these problem types. Furthermore, strategies are devised and implemented to reduce the runtime of the envy-freeness instances, including: upper-bounding the objective function, initializing CPLEX with a feasible starting solution, combining both of these strategies, and finally tuning various CPLEX parameters.
Instructions for Usage
environment.yml can be found in the repository's root directory and can be used to install the necessary project dependencies. If you are able to configure your computing environment successfully, launch Jupyter Notebook from your command prompt and navigate to report_nb.ipynb. If not, refer to the sections below to install the necessary system tools and package dependencies. The following sections may be cross-platform compatible in several places, but they are geared towards macOS.
Do you have the Conda system installed?
Open a command prompt (i.e. Terminal) and run: conda info.
This should display related information pertaining to your system's installation of Conda. If this is the case, you should be able to skip to the section regarding virtual environment creation (updating to the latest version of Conda could prove helpful though: conda update conda).
If this resulted in an error, then install Conda with the following section.
Install Conda
There are a few options here. For a general full installation, check out the Anaconda Download Page. However, the author strongly recommends Miniconda, since it retains the necessary functionality while keeping resource use low; see the Comparison with Anaconda and the Miniconda Download Page.
Windows users: please refer to the above links to install some variation of Conda. Once installed, proceed to the instructions for creating and configuring virtual environments [found here](#Configure-Local-Environment).
macOS or Linux users: it is recommended to use the Homebrew system to simplify the Miniconda installation process. Usage of Homebrew is explained next.
Do you have Homebrew Installed?
In your command prompt (i.e. Terminal) use a statement such as: brew help
If this errored, move on to the next section.
If this returned output (e.g. examples of usage), then you have Homebrew installed and can proceed to install Conda, found here.
Install Homebrew
In your command prompt, call: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install Miniconda with Homebrew
In your command prompt, call: brew install --cask miniconda
When in doubt, calling in the brew doctor might help 💊
List all environments (current environment as marked by the *): conda env list
Create a new environment: conda create --name myenv
Activate an environment: conda activate myenv
Deactivate an environment and go back to system base: conda deactivate
List all installed packages for current environment: conda list
Configure Local Environment
Using the command prompt, navigate to the local project repository directory. On macOS, I recommend typing cd in Terminal and then dragging the project folder from Finder into Terminal.
In your command prompt, call: conda env create -f environment.yml. This will create a new Conda virtual environment with the name: explorations-in-envy-free-allocations.
Activate the new environment by using: conda activate explorations-in-envy-free-allocations
Access Project
After having activated your environment, use jupyter notebook to launch a Jupyter session in your browser.
Within the Jupyter Home page, navigate and click on report_nb.ipynb in the list of files. This will launch a local kernel running the project notebook in a new tab.
.
├── README.md This file
├── paper.pdf Project Write-Up
├── environment.yml Conda environment configuration file (used to load project dependencies)
├── nb.ipynb Jupyter Notebook used for data analysis and modelling (hosted at the above Binder link)
├── .gitignore Git file used to ignore non-repo local files
└── src Directory containing custom scripts
├── __init__.py
├── agent.py Agent class definition (agent instantiation and opinion variation)
├── data_functions.py Helpful functions to manipulate data
├── data_operations.py Main data file used to produce data (utilizes Apache Spark)
├── data_processing.py Short script to fix time data writing issue in simulation
├── environment.py Environment class definition (establishes agents, holds data, increments time, conducts group negotiations)
├── main.py Script to run collection of experiments
├── model.py Model class definition (sets environment, generates collection of experiment parameters, conducts experiments)
├── utilities.py Helpful functions used throughout simulation
└── visualization.md Mermaid markdown snippet dump for flowcharts
Instructions for Usage
environment.yml can be found in the repository's root directory and can be used to install the necessary project dependencies. If you are able to configure your computing environment successfully, launch Jupyter Notebook from your command prompt and navigate to nb.ipynb. If not, refer to the sections below to install the necessary system tools and package dependencies. The following sections may be cross-platform compatible in several places, but they are geared towards macOS1.
Do you have the Conda system installed?
Open a command prompt (i.e. Terminal) and run: conda info.
This should display related information pertaining to your system's installation of Conda. If this is the case, you should be able to skip to the section regarding virtual environment creation (updating to the latest version of Conda could prove helpful though: conda update conda).
If this resulted in an error, then install Conda with the following section.
Install Conda
There are a few options here. For a general full installation, check out the Anaconda Download Page. However, the author strongly recommends Miniconda, since it retains the necessary functionality while keeping resource use low; see the Comparison with Anaconda and the Miniconda Download Page.
Windows users: please refer to the above links to install some variation of Conda. Once installed, proceed to the instructions for creating and configuring virtual environments [found here](#Configure-Local-Environment).
macOS or Linux users: it is recommended to use the Homebrew system to simplify the Miniconda installation process. Usage of Homebrew is explained next.
Do you have Homebrew Installed?
In your command prompt (i.e. Terminal) use a statement such as: brew help
If this errored, move on to the next section.
If this returned output (e.g. examples of usage), then you have Homebrew installed and can proceed to install Conda, found here.
Install Homebrew
In your command prompt, call: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install Miniconda with Homebrew
In your command prompt, call: brew install --cask miniconda
When in doubt, calling in the brew doctor might help 💊
List all environments (current environment as marked by the *): conda env list
Create a new environment: conda create --name myenv
Activate an environment: conda activate myenv
Deactivate an environment and go back to system base: conda deactivate
List all installed packages for current environment: conda list
Configure Local Environment
Using the command prompt, navigate to the local project repository directory. On macOS, I recommend typing cd in Terminal and then dragging the project folder from Finder into Terminal.
In your command prompt, call: conda env create -f environment.yml. This will create a new Conda virtual environment with the name: higher-education-simulation.
Activate the new environment by using: conda activate higher-education-simulation
Access Project
After having activated your environment, use jupyter notebook to launch a Jupyter session in your browser.
Within the Jupyter Home page, navigate and click on nb.ipynb in the list of files. This will launch a local kernel running the project notebook in a new tab.
1: This project was created on macOS version 11.0.1 (Big Sur) using Conda version 4.9.2, and Python 3.8 (please reach out to me if you need further system specifications).
Machine Learning for NBA Game Attendance Prediction
This project seeks to provide a tool to accurately predict the attendance of NBA games in order to better inform the business decisions of different stakeholders across the organization. Predicting game attendance is crucial to making optimized managerial decisions such as planning necessary staffing or procuring the proper level of supplies (janitorial, food services, etc). Work is currently underway on the project's second version, version_2. In version 1, an entire machine learning pipeline was established across a host of modules, ranging from web scraping for data collection to neural-network regression modeling for prediction. These efforts resulted in a high-accuracy model, with mean absolute error values for attendance of around 800 people. However, improvements in data sources and modeling paradigms are being sought in a few ways in the upcoming version. Click the link below to view the version 1.0 analysis and modeling notebook, or continue reading for more about the project.
Interact with the project notebook in your web browser using the Binder service
data contains both raw and processed data. There are game, search popularity, and stadium wiki raw datasets. These three datasets are processed and compiled, resulting in the file dataset.csv within the processed directory. Numerous other datasets can also be found here, which are the accumulation of different feature selection and data sampling strategies for use in modeling.
features contains results derived from statistical testing and principal components analysis across the datasets
models contains datasets of the error results across all the models applied as well as tuning parameter values
src is where all the project source code can be found. A host of modules and functions for web scraping, feature selection, visualization, modeling, and Jupyter configuration are here.
version_2 is where all files related to the second iteration of this project can be found. Its structure generally mirrors that of repository root directory with sub-directories for data, source code, etc.
visualizations holds .png images of the visualizations created on the datasets
nb.ipynb is the associated data analysis and modeling notebook (this notebook can also be found and interacted with via the Binder link found above).
r_modeling.ipynb is an R notebook used for further data modeling with more exotic models.
environment.yml and requirements.txt are environment setup files used to properly configure an environment and load the necessary dependencies for the project (a further explanation of how to use these can be found at the bottom)
Version 2.0
Development Roadmap
The goal of this version is to create another implementation of this machine learning pipeline leveraging knowledge gained from the first version to improve overall predictive accuracy and utilize new tools and modeling techniques.
To avoid the potential data cleanliness problems of scraping basketball-reference.com as in the first version, stats.nba.com will be queried through an open source API for sport-related data, enabling more seasons and features to be gathered. Furthermore, a wider range of data sources will be considered, taking into account factors such as regional socioeconomics, weather, etc. New pre-processing scripts will be used to combine and clean the data from these different sources in order to make a dataset apt for modeling. Core modeling assumptions leveraged in the first version, such as the assumed data distribution, will be re-evaluated. Furthermore, a new portfolio of modeling techniques based on more current research will be applied. A few models to be included are linear regression with the Huber loss, a long short-term memory neural network, and ensemble methodologies.
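As a small illustration of one of the planned techniques, the Huber loss is quadratic for small residuals and linear for large ones, which damps the influence of outliers relative to squared error. A minimal sketch (the delta threshold default is illustrative, not a project choice):

```python
def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic within delta of zero, linear beyond it,
    so large outliers contribute linearly rather than quadratically."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```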
In future versions, a full Kubernetes cluster of the pipeline deployed via distributed cloud-computing resources would be a wonderful addition. This would allow for automated model updates, fully parallelized modeling (as every model can be containerized), and prediction delivery.
Progress Updates
Game data, and especially attendance data, was successfully retrieved for all seasons since 1946 using nba_api. This is awesome, as version 1.0 only included seasons since 1999. The package was discovered on GitHub and leveraged to query the numerous stats.nba.com endpoints.
Functions to query different types of datasets, as well as functions to combine and clean the results, have been created for league team data, game overview data for all seasons, and game box score summary data for all games.
Version 1.0
Project Summary
As briefly discussed in the introduction, the project's aim is to create an NBA game attendance prediction tool to improve the business decisions of NBA stadium managers. These managers have to make dynamic decisions reacting to fluctuating demand in a constrained, complex environment. Staff scheduling, food services, and entertainment are just a few of these decision areas. Game attendance predictions can be used as a tool to gain insight into customer demand and help to better inform these managers' decisions. Properly assessing demand reduces overages, which minimizes waste and, in turn, operating expenses; it also helps to ensure there are proper supply levels to meet customer demand, further impacting the stadium's bottom line.
Game attendance prediction can serve to underlie many of the tools and processes found across the different facets of the organization. As an example, vendors can use attendance predictions, along with their own demand metrics and analytics, to better assess how many soft goods to purchase. Similar foundational relationships can be found for most vendors, as well as for janitorial supplies, process timing, and facility operations.
The flowchart below details the five different stages within the pipeline architecture used here.
A: Stadium data (e.g. location) is scraped from Wikipedia using the Pandas library. Game and sport data is scraped from basketball-reference.com using the Beautiful Soup framework and the requests library. The pytrends Google Trends API is used to gather search popularity data.
B:
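Stage A can be sketched without a network call using only the standard library (the project itself uses Beautiful Soup and requests; the markup fragment and class name below are made up for illustration):

```python
from html.parser import HTMLParser

class AttendanceParser(HTMLParser):
    """Pull an attendance figure out of a table cell; the 'attendance'
    class name is hypothetical, not from basketball-reference.com."""

    def __init__(self):
        super().__init__()
        self._grab = False
        self.attendance = None

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "attendance") in attrs:
            self._grab = True

    def handle_data(self, data):
        if self._grab:
            self.attendance = int(data.replace(",", ""))
            self._grab = False

parser = AttendanceParser()
parser.feed('<tr><td class="attendance">19,812</td></tr>')
```

In the real pipeline, the HTML fed to the parser would come from a requests response rather than an inline string.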
Results and Discussion
Installation Instructions
environment.yml can be found in the repository's root directory for your version of interest and can be used to install the necessary project dependencies. If you are able to configure your computing environment successfully, launch Jupyter Notebook from your command prompt and navigate to nb.ipynb. If not, refer to the sections below to install the necessary system tools and package dependencies. The following sections may be cross-platform compatible in several places, but they are geared towards macOS1.
Do you have the Conda system installed?
Open a command prompt (i.e. Terminal) and run: conda info.
This should display related information pertaining to your system's installation of Conda. If this is the case, you should be able to skip to the section regarding virtual environment creation (updating to the latest version of Conda could prove helpful though: conda update conda).
If this resulted in an error, then install Conda with the following section.
Install Conda
There are a few options here. For a general full installation, check out the Anaconda Download Page. However, the author strongly recommends Miniconda, since it retains the necessary functionality while keeping resource use low; see the Comparison with Anaconda and the Miniconda Download Page.
Windows users: please refer to the above links to install some variation of Conda. Once installed, proceed to the instructions for creating and configuring virtual environments [found here](#Configure-Local-Environment).
macOS or Linux users: it is recommended to use the Homebrew system to simplify the Miniconda installation process. Usage of Homebrew is explained next.
Do you have Homebrew Installed?
In your command prompt (i.e. Terminal) use a statement such as: brew help
If this errored, move on to the next section.
If this returned output (e.g. examples of usage), then you have Homebrew installed and can proceed to install Conda, found here.
Install Homebrew
In your command prompt, call: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install Miniconda with Homebrew
In your command prompt, call: brew install --cask miniconda
When in doubt, calling in the brew doctor might help 💊
List all environments (current environment as marked by the *): conda env list
Create a new environment: conda create --name myenv
Activate an environment: conda activate myenv
Deactivate an environment and go back to system base: conda deactivate
List all installed packages for current environment: conda list
Configure Local Environment
Using the command prompt, navigate to the local project repository directory. On macOS, I recommend typing cd in Terminal and then dragging the project folder from Finder into Terminal.
In your command prompt, call: conda env create -f environment.yml. This will create a new Conda virtual environment with the name: NBA-attendance-prediction.
Activate the new environment by using: conda activate NBA-attendance-prediction
Access Project
After having activated your environment, use jupyter notebook to launch a Jupyter session in your browser.
Within the Jupyter Home page, navigate and click on nb.ipynb in the list of files. This will launch a local kernel running the project notebook in a new tab.
1: This project was created on macOS version 11.0.1 (Big Sur) using Conda version 4.9.2, and Python 3.8 (please reach out to me if you need further system specifications).
Regularized Linear Regression Deep Dive: Application to Wine Quality Regression Dataset
This project consists of a deep dive on multiple linear regression (OLS) and its regularized variants (Ridge, the Lasso, and the Elastic Net), along with Python implementations of exploratory data analysis, K-Fold cross-validation, and modeling functions, applied to regression on a wine quality dataset. This examination applies optimization theory to either derive the model estimator (for OLS and Ridge) or derive the update rule for pathwise coordinate descent (the discrete optimization algorithm chosen and implemented to solve the Lasso and the Elastic Net). These derivations have accompanying Python implementations, which are leveraged to predict wine quality ratings within a supervised learning context.
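For the Lasso, the coordinate update in pathwise coordinate descent reduces to a soft-thresholding operation. A minimal sketch in standard notation (here rho stands for the partial-residual correlation for one coordinate and lam for the penalty; neither name is taken from the project code):

```python
def soft_threshold(rho, lam):
    """Shrink the coordinate estimate rho toward zero by lam, setting
    it exactly to zero when |rho| <= lam (the source of Lasso sparsity)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0
```

In the Elastic Net's coordinate update, the soft-thresholded value is additionally scaled down by the quadratic penalty term.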
A three-part series of blog posts on this topic was published in Towards Data Science. Read them here:
Interact with the project notebook in your web browser using the Binder service
The entirety of this project is written in Python (version 3.8) with a majority of functions depending on NumPy and several on pandas. Matplotlib and Seaborn are used for visualization. Furthermore, there are a few other simple dependencies used like the time or math libraries.
Implementations can be found for train-test data splitting, variance inflation factor calculation, K-Fold cross-validation, ordinary least squares (OLS), Ridge, the Lasso, and the Elastic Net as well as several other functions used to produce the notebook.
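As one example of the listed utilities, a minimal K-Fold index splitter might look like the following (a sketch, not the project's actual implementation; any shuffling is assumed to happen before splitting):

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous, near-equal folds.
    The first n_samples % k folds receive one extra index each."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

Each fold serves once as the validation set while the remaining folds form the training set.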
Installation Instructions
environment.yml can be found in the repository's root directory and can be used to install the necessary project dependencies. If you are able to configure your computing environment successfully, launch Jupyter Notebook from your command prompt and navigate to nb.ipynb. If not, refer to the sections below to install the necessary system tools and package dependencies. The following sections may be cross-platform compatible in several places, but they are geared towards macOS1.
Do you have the Conda system installed?
Open a command prompt (i.e. Terminal) and run: conda info.
This should display related information pertaining to your system's installation of Conda. If this is the case, you should be able to skip to the section regarding virtual environment creation (updating to the latest version of Conda could prove helpful though: conda update conda).
If this resulted in an error, then install Conda with the following section.
Install Conda
There are a few options here. For a general full installation, check out the Anaconda Download Page. However, the author strongly recommends Miniconda, since it retains the necessary functionality while keeping resource use low; see the Comparison with Anaconda and the Miniconda Download Page.
Windows users: please refer to the above links to install some variation of Conda. Once installed, proceed to the instructions for creating and configuring virtual environments [found here](#Configure-Local-Environment).
macOS or Linux users: it is recommended to use the Homebrew system to simplify the Miniconda installation process. Usage of Homebrew is explained next.
Do you have Homebrew Installed?
In your command prompt (i.e. Terminal) use a statement such as: brew help
If this errored, move on to the next section.
If this returned output (e.g. examples of usage), then you have Homebrew installed and can proceed to install Conda, found here.
Install Homebrew
In your command prompt, call: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install Miniconda with Homebrew
In your command prompt, call: brew install --cask miniconda
When in doubt, calling in the brew doctor might help 💊
List all environments (current environment as marked by the *): conda env list
Create a new environment: conda create --name myenv
Activate an environment: conda activate myenv
Deactivate an environment and go back to system base: conda deactivate
List all installed packages for current environment: conda list
Configure Local Environment
Using the command prompt, navigate to the local project repository directory. On macOS, I recommend typing cd in Terminal and then dragging the project folder from Finder into Terminal.
In your command prompt, call: conda env create -f environment.yml. This will create a new Conda virtual environment with the name: regularized-regression-from-scratch.
Activate the new environment by using: conda activate regularized-regression-from-scratch
Access Project
After having activated your environment, use jupyter notebook to launch a Jupyter session in your browser.
Within the Jupyter Home page, navigate and click on nb.ipynb in the list of files. This will launch a local kernel running the project notebook in a new tab.
1: This project was created on macOS version 11.0.1 (Big Sur) using Conda version 4.9.2, and Python 3.8 (please reach out to me if you need further system specifications).
See here for the different sources utilized to synthesize this project.