DUE TO VIETTEL MILITARY INDUSTRY AND TELECOMS GROUP REGULATIONS, THE DATA COLLECTED FOR THIS PROJECT MUST BE KEPT STRICTLY PRIVATE AND CONFIDENTIAL. HOWEVER, PARTICIPANTS OF THIS PROJECT, INCLUDING BUT NOT LIMITED TO THE STUDENTS WORKING ON THE PROJECT, MENTORS, AND INSTRUCTORS, ARE ALLOWED TO USE THE DATA FOR THE PURPOSES OF THIS PROJECT ONLY AND MUST REFRAIN FROM SHARING ANY OF THE DATA WITH THIRD PARTIES WITHOUT PRIOR CONSENT FROM THE COMPANY.
This project addresses a topical problem in finance: forecasting a user’s loan demand from their personal information and historical financial data. The dataset is preprocessed with both common and advanced techniques, and then modelled with linear regression, support vector machines, neural networks, random forests, and a random-forest variant.
This project uses historical data on Viettel Money customers to predict loan demand, framed as a binary classification problem. It applies categorical data encoding, missing-data handling, data preprocessing, data exploration, dimensionality reduction, and balanced sampling, followed by several machine learning approaches: linear regression, support vector machines, random forests, neural networks, and a new random-forest-based model designed to learn effectively from imbalanced data.
The dataset has 40,000 samples with 146 features each (excluding the customer identifier "id"). It covers two time frames of equal size: two months prior (named "n_2") and one month prior (named "n_1"). Both time frames contain the same 20,000 distinct customers. Because the task is predictive, all of the data in n_2 is used to predict the labels of the corresponding customers in n_1.
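The pairing of n_2 features with n_1 labels can be illustrated with a minimal sketch. It assumes the data sits in a single pandas DataFrame with a time-frame column and a label column; the column names "period" and "label", the feature name "balance", and the helper `build_supervised_table` are hypothetical and not taken from the actual dataset. Only "id", "n_2", and "n_1" come from the description above.

```python
import pandas as pd


def build_supervised_table(df, period_col="period", id_col="id", label_col="label"):
    """Pair each customer's n_2 features with the same customer's n_1 label.

    The column names are illustrative assumptions, not the real schema.
    """
    # Features: every n_2 column except the time-frame marker and the n_2 label.
    features = df.loc[df[period_col] == "n_2"].drop(columns=[period_col, label_col])
    # Targets: only the customer id and the label observed in n_1.
    targets = df.loc[df[period_col] == "n_1", [id_col, label_col]]
    # validate="one_to_one" raises if a customer appears twice in either time frame.
    paired = features.merge(targets, on=id_col, how="inner", validate="one_to_one")
    X = paired.drop(columns=[id_col, label_col])
    y = paired[label_col]
    return X, y


# Toy data standing in for the confidential dataset: 3 customers, 2 time frames.
toy = pd.DataFrame({
    "id": [1, 2, 3, 1, 2, 3],
    "period": ["n_2"] * 3 + ["n_1"] * 3,
    "balance": [10.0, 20.0, 30.0, 11.0, 21.0, 31.0],
    "label": [0, 1, 0, 1, 0, 0],
})
X, y = build_supervised_table(toy)
```

Joining on "id" rather than relying on row order keeps features and labels aligned even if the two time frames are stored in different orders. On the full dataset, this pairing would yield 20,000 rows, one per customer.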