Machine Learning Scorecard Phase 5 is a project listed as part of the GSoC’22 program. The project aims to significantly improve the existing loan and credit scoring system by using machine learning, pipelines and modern statistical models. It also aims to demonstrate federated learning, which should help the suite stay compliant with future data privacy rules. Once the project is complete, the credit scoring system should become a strong part of the loan application module in the Fineract suite.
Traditional machine learning workflows rely on collecting data from users or edge devices such as mobile phones, sensors and smart devices. The collected data is then stored in a centralized data store for model training and further analysis. The main disadvantages of this approach are:
- High risk of leaking sensitive data.
- High bandwidth and battery usage for sending data and receiving trained models.
- Stricter data privacy laws can make traditional approaches unusable.
Federated learning is a novel approach, still under active research, that promises to address these problems with the traditional ML workflow and is already showing promising results.
“Federated learning allows for edge devices to work collaboratively in order to learn a shared prediction model while keeping all data which belongs to the device within itself.”
Instead of transmitting raw data to a centralized data center and training there, each device trains locally and only the model weights are exchanged between the devices and the central federated server. This keeps the raw data on the device, reduces bandwidth use and scales better.
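To make the weight exchange concrete, here is a minimal sketch of how a server could combine client weights using weighted averaging (the idea behind federated averaging). It is an illustration only, not the code used in this project: the function name `federated_average`, the flat list representation of weights and the two example clients are all assumptions.

```python
from typing import List, Sequence, Tuple

Weights = List[float]


def federated_average(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """Combine client weights, weighting each client by its number of local examples.

    `updates` holds one (weights, num_examples) pair per client. Only these
    weights reach the server; the raw training data stays on the client.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total_examples = sum(num_examples for _, num_examples in updates)
    if total_examples <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, num_examples in updates:
        if len(weights) != dim:
            raise ValueError("all clients must send weights of the same length")
        for i, value in enumerate(weights):
            averaged[i] += value * num_examples / total_examples
    return averaged


# Two hypothetical clients: the second one trained on three times as much data,
# so its weights count three times as much in the global model.
client_updates = [([1.0, 2.0], 100), ([3.0, 6.0], 300)]
global_weights = federated_average(client_updates)
print(global_weights)  # [2.5, 5.0]
```

In a real deployment the weights would be model tensors rather than flat lists, and the server would send `global_weights` back to the clients for the next round.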
As a part of this technology demonstrator, we evaluated several of the available frameworks:
- PySyft
- Flower
- FATE
PySyft is a Python library for secure and private deep learning. It uses federated learning, differential privacy and encrypted computation. PySyft is an accessible and useful framework for implementing federated learning workflows. However, we observed that PySyft is geared more towards private data science and requires a lot of preparatory work, which might not be feasible.
You can find the PySyft simulation here:
https://github.com/Zavier-opt/fineract-federatedLearning-research/tree/main/FL_Simulation
Flower is another federated learning framework, and it is very user friendly. Using Flower, we successfully built a simulated local federated learning workflow with 2 clients and a server.
The simulation works as shown in the picture below:
You can find the code for the Flower simulation here: https://github.com/Zavier-opt/fineract-federatedLearning-research/tree/main/flower
My partner @zavier-opt developed a two-phase REST API for training and prediction. The code can be found here: https://github.com/Zavier-opt/fineract-credit-scorecard/tree/develop/fl_learning
We can also implement custom strategies in Flower to set initial parameters, define the number of training rounds and configure other options such as the minimum number of clients, the maximum number of clients and the aggregation method.
A custom strategy is given below:
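As a framework-agnostic illustration of the options listed above, the following sketch models a strategy as a small Python class. It is not Flower's API and not the strategy used in this project: `StrategySketch`, `mean_aggregator`, `run_rounds`, `make_client` and the toy clients are hypothetical names introduced only to show how initial parameters, the number of rounds, the client limits and the aggregator fit together.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Weights = List[float]
TrainFn = Callable[[Weights], Weights]


def mean_aggregator(client_weights: Sequence[Weights]) -> Weights:
    """Unweighted element-wise mean of the client weight vectors."""
    count = len(client_weights)
    return [sum(column) / count for column in zip(*client_weights)]


@dataclass
class StrategySketch:
    """Hypothetical container for the options a custom strategy controls."""
    initial_parameters: Weights
    num_rounds: int = 3
    min_clients: int = 2
    max_clients: int = 10
    aggregator: Callable[[Sequence[Weights]], Weights] = mean_aggregator

    def sample_clients(self, available: Sequence[str], rng: random.Random) -> List[str]:
        """Pick at most max_clients participants; refuse to run below min_clients."""
        if len(available) < self.min_clients:
            raise RuntimeError("not enough clients available for this round")
        count = min(len(available), self.max_clients)
        return rng.sample(list(available), count)


def run_rounds(strategy: StrategySketch, clients: Dict[str, TrainFn], seed: int = 0) -> Weights:
    """Run the configured number of rounds and return the final global weights."""
    rng = random.Random(seed)
    weights = list(strategy.initial_parameters)
    for _ in range(strategy.num_rounds):
        chosen = strategy.sample_clients(sorted(clients), rng)
        updates = [clients[name](weights) for name in chosen]
        weights = strategy.aggregator(updates)
    return weights


def make_client(target: Weights) -> TrainFn:
    """Stand-in for local training: move halfway towards this client's target."""
    def train(global_weights: Weights) -> Weights:
        return [g + 0.5 * (t - g) for g, t in zip(global_weights, target)]
    return train


clients = {
    "client_a": make_client([0.0, 0.0]),
    "client_b": make_client([4.0, 8.0]),
}
strategy = StrategySketch(initial_parameters=[0.0, 0.0], num_rounds=3,
                          min_clients=2, max_clients=2)
final_weights = run_rounds(strategy, clients)
print(final_weights)
```

Swapping `mean_aggregator` for a different function is how an alternative aggregation method would plug in under this sketch. In Flower itself, the equivalent options are set on a strategy object that is passed to the server.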
FATE is another well-designed framework, aimed at enterprise-level deployment of end-to-end federated learning solutions. FATE comes with built-in advanced algorithms such as SecureBoost, along with advanced encryption techniques. FATE can also handle cross-silo applications, which could be of keen interest to Mifos.
This makes working with FATE easy and also widens the scale and the potential applications of the product.
We successfully used both the command line and the DSL (domain-specific language) to upload data, train our models and retrieve the results.
You can find the DAG for the FATE simulation here
- Finalizing the data schema and the models which can be used for the project.
- Implementing robust training schedules.