Raul Gamez jrgamez

@jrgamez
jrgamez / README.md
Last active September 23, 2021 03:10
PS4 Games Sales Dataset

This dataset contains sales figures for PS4 games (in millions of dollars) across different regions of the world, genres, publishers, and years.

It was obtained from Kaggle: Video Games Sales Dataset. The original dataset contained 1,031 observations and 9 features. The per-region sales columns were unpivoted into a single sales column plus a column indicating the region of the sales, and the rows with zero sales were removed. This modified version contains 2,710 rows and 6 columns. The details of each column are presented below:

| # | Name | Type | Description |
|---|------|------|-------------|
| 1 | Game | Categorical | Name of the game |
| 2 | Year | Ordinal | Release year |
| 3 | Genre | Categorical | Genre of the game |
| 4 | Publisher | Categorical | Name of the publisher |
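
The unpivoting and zero-sales filtering described for this dataset can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used to build the dataset: the region column names (`North America`, `Europe`, `Japan`), the output column names `Region` and `Sales`, and all values in the toy rows are hypothetical.

```python
# Toy rows in an assumed "wide" layout: one sales column per region.
# Region names and every value here are made up for illustration.
wide_rows = [
    {"Game": "Game A", "Year": 2015, "Genre": "Action", "Publisher": "Pub X",
     "North America": 1.5, "Europe": 2.0, "Japan": 0.0},
    {"Game": "Game B", "Year": 2016, "Genre": "Sports", "Publisher": "Pub Y",
     "North America": 0.0, "Europe": 0.7, "Japan": 0.1},
]

ID_COLUMNS = ["Game", "Year", "Genre", "Publisher"]
REGION_COLUMNS = ["North America", "Europe", "Japan"]  # hypothetical names


def unpivot_sales(rows, id_columns, region_columns):
    """Turn one row per game into one row per (game, region) with non-zero sales."""
    long_rows = []
    for row in rows:
        for region in region_columns:
            sales = row[region]
            if sales == 0:
                continue  # drop zero-sales rows, as in the description
            long_row = {col: row[col] for col in id_columns}
            long_row["Region"] = region  # hypothetical column name
            long_row["Sales"] = sales    # hypothetical column name
            long_rows.append(long_row)
    return long_rows


long_rows = unpivot_sales(wide_rows, ID_COLUMNS, REGION_COLUMNS)
for long_row in long_rows:
    print(long_row)
```

Each wide row can produce several long rows, one per region with sales, which is why the modified dataset can have more rows than the original while having fewer columns.
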
@jrgamez
jrgamez / README.md
Last active July 17, 2022 11:08
Bank Customers Dataset

This dataset contains a wide variety of variables about the customers of a bank and their relationship with it. It first presents demographic features such as age, gender, education level, and marital status; then variables that capture credit card usage patterns, such as transaction amounts, utilization ratio, months on book, collection contacts, and credit limit; and finally each customer's current status with the bank.

It was obtained from Kaggle: Credit Card customers. The original dataset contained 10,128 observations and 23 features, but the rows with "unknown" values in the categorical variables were removed, as were the last 2 columns, which contained irrelevant information. This modified version contains only 7,081 rows and 21 columns. The details of each column are presented below:

| # | Variable | Type | Description |
|---|----------|------|-------------|
| 1 | Clientnum | Categorical | Unique identif |
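
The cleaning described for this dataset (dropping the last 2 columns and removing rows with "unknown" categorical values) can be sketched in plain Python. This is a minimal illustration, not the code used to produce the dataset: every column name other than `Clientnum`, the case-insensitive match on the marker, and all values in the toy rows are assumptions.

```python
# Toy table in an assumed layout; only "Clientnum" comes from the description.
# The other column names and all values are hypothetical.
header = ["Clientnum", "Education_Level", "Marital_Status", "extra_1", "extra_2"]
rows = [
    [1, "Graduate", "Married", 0.1, 0.9],
    [2, "Unknown", "Single", 0.2, 0.8],
    [3, "High School", "unknown", 0.3, 0.7],
    [4, "College", "Single", 0.4, 0.6],
]


def clean_customers(header, rows, n_trailing_to_drop=2, missing_marker="unknown"):
    """Drop trailing columns, then drop rows holding the missing-value marker."""
    keep = len(header) - n_trailing_to_drop
    kept_header = header[:keep]
    cleaned = []
    for row in rows:
        kept = row[:keep]
        # Only string cells can hold the marker; the comparison ignores case.
        has_unknown = any(
            isinstance(value, str) and value.strip().lower() == missing_marker
            for value in kept
        )
        if not has_unknown:
            cleaned.append(kept)
    return kept_header, cleaned


kept_header, cleaned = clean_customers(header, rows)
print(kept_header)
for row in cleaned:
    print(row)
```

Dropping the trailing columns first means a row is only discarded for an "unknown" value in a column that is actually kept.
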
@jrgamez
jrgamez / README.md
Last active September 23, 2021 03:10
Insurance Expenses Dataset

This dataset contains several demographic features of individuals, such as age, sex, number of children, and region; features related to their health status, such as smoking status and BMI; and their existing medical expenses. Its original purpose is to predict the future medical expenses of individuals, to help medical insurers decide on the premium to charge.

It was obtained from Kaggle: Insurance Premium Prediction. It contains 1,338 observations and 7 features. The details of each column are presented below:

| # | Name | Type | Description |
|---|------|------|-------------|
| 1 | age | Quantitative | age in years |
| 2 | sex | Categorical | male or female |
| 3 | bmi | Quantitative | body mass index |
| 4 | children | Quantitative | number of children |