| Student | Anshuman Singh |
|---|---|
| Github | @rimijoker |
| Organisation | OpenMined |
| Project | Implement Auto-Scaling of PyGrid servers on Google Cloud |
The audience of PySyft mainly consists of people who would like to train their models on private data that resides on other devices/locations. Right now, one has to manually spin up Google Cloud machines, load a PyGrid instance, queue and run training jobs, deposit the results to another long-running instance (Master), and tear down the created instances (Workers). This project aims to automate the process mentioned above. Another primary feature is running a "hyperparameter sweep" to figure out the best parameters for the model. Basic functionalities include:
- Provision machines on the cloud (using Terraform)
- Install and start PyGrid servers (using Docker)
- Run training scripts (using PySyft, PyTorch)
- Run a hyper-parameter sweep
- Tear down the worker instances once training is done
In addition, setting up a useful cross-validation framework makes the hyper-parameter sweep effective. In other words, one should easily be able to create any number of PyGrid servers, train a model on them, select the best hyperparameters, send the models to the master, take the average, and have the worker PyGrid nodes tear down or keep running (as specified) automatically.
PySyft: Github Repository
Project Issue: Implement Auto-Scaling on Google Cloud
Initial part of Auto-Scaling on Google Cloud:
- Implemented the initial part of auto-scaling on GCP; now one can spin up a compute instance with just a few lines of code.
- Added a README with further instructions.
Added support for notebooks in Auto-scale API:
- Previously autoscale could only be run from a Python script; now it can be run from both scripts (.py) and notebooks (.ipynb).
Added functionality to create a cluster using the autoscale API:
- Added functionality to spin up a managed instance group (MIG) of N workers + 1 master using the autoscale API.
- Added enums for easy auto-complete.
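Under the hood, Terraform describes such a worker group as a GCP managed instance group. A rough sketch of the shape of that resource (the names, zone, and instance template reference here are illustrative, not the project's actual configuration):

```hcl
# Illustrative only: a minimal managed instance group (MIG) of N workers.
resource "google_compute_instance_group_manager" "workers" {
  name               = "pygrid-workers"   # hypothetical name
  base_instance_name = "pygrid-worker"
  zone               = "us-central1-a"
  target_size        = 3                  # N workers

  version {
    # Assumes a matching google_compute_instance_template named "worker".
    instance_template = google_compute_instance_template.worker.id
  }
}
```

Changing `target_size` and re-applying is all it takes to grow or shrink the group, which is what makes the MIG a natural fit for auto-scaling.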
Added eviction policy & updated README.md with instructions to add Budget Alerts:
- Added the feature to set an eviction policy, which will tear down the worker instances after Cluster.sweep() is called.
- Added instructions to set up a budget alert on Google Cloud Platform.
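Budget alerts can be created from the Cloud Console as described in the README, or from the CLI. A hedged sketch (the billing account ID, budget name, and amount are placeholders; find your account ID with `gcloud billing accounts list`):

```shell
# Illustrative: create a $50 budget with an alert at 90% of spend.
gcloud billing budgets create \
  --billing-account=BILLING_ACCOUNT_ID \
  --display-name="pygrid-autoscale-budget" \
  --budget-amount=50USD \
  --threshold-rule=percent=0.9
```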
Added functionality to launch PyGrid network and nodes upon creation of a cluster:
- Added startup scripts and other necessary changes that enable the autoscale API to launch PyGrid servers (network and nodes) automatically. Now the autoscale API can launch a PyGrid cluster (1 network + N nodes) with one function call, or create a PyGrid network and add PyGrid nodes to it one at a time as required.
- Added tests and an example tutorial training script demonstrating the same.
Updated startup scripts & cluster.sweep():
- Updated the startup scripts used in the autoscale API to use the latest PyGridNode and PyGridNetwork Docker production images.
- Added a 30-second sleep at the start of each startup script to avoid Docker Hub connection issues.
- Added functionality to send the final model to the master instance in cluster.sweep() using model_serve().
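The delay-then-pull pattern in those startup scripts looks roughly like the following sketch (the image tag is an example, not necessarily the exact one the project uses):

```shell
#!/bin/bash
# Illustrative startup-script shape for a GCP instance.
# Wait briefly so networking is fully up before contacting Docker Hub,
# which avoids intermittent pull failures right after boot.
sleep 30
docker pull openmined/grid-node:production   # example image tag
```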
Before you start spinning up instances, we encourage you to set a budget alert on GCP to avoid surprise costs.
Setup Budget and Budget Alerts
You can find sample code in test.py and test.ipynb
- Import enums from gcloud_configurations.py:
```python
import syft.grid.autoscale.utils.gcloud_configurations as configs
```
- Initialize using:
```python
instance_name = gcloud.GoogleCloud(
    credentials="GCP Login/terraf.json",
    project_id="terraform",
    region=configs.Region.us_central1,
)
```
- Reserve an IP address using:
```python
instance_name.reserve_ip("grid")
```
- Create instances using:
```python
instance_name.compute_instance(
    name="new-12345",
    machine_type=configs.MachineType.f1_micro,
    zone=configs.Zone.us_central1_a,
    image_family=configs.ImageFamily.ubuntu_2004_lts,
)
```
- Create a PyGrid Network instance using:
```python
instance_name.create_gridnetwork(
    name="new-network",
    machine_type=configs.MachineType.f1_micro,
    zone=configs.Zone.us_central1_a,
)
```
- Create a PyGrid Node instance using:
```python
instance_name.create_gridnode(
    name="new-node",
    machine_type=configs.MachineType.f1_micro,
    zone=configs.Zone.us_central1_a,
    gridnetwork_name="new-network",
)
```
- Create clusters using:
```python
c1 = instance_name.create_cluster(
    name="my-cluster1",
    machine_type=configs.MachineType.f1_micro,
    zone=configs.Zone.us_central1_a,
    reserve_ip_name="grid",
    target_size=3,
    eviction_policy="delete",
)
```
- Run a parameter sweep to figure out the best parameters using:
```python
c1.sweep(
    model,
    hook,
    model_id="new_model",
    mpc=False,
    allow_download=False,
    allow_remote_inference=False,
    apply=True,
)
```
- Destroy all the created resources using:
```python
instance_name.destroy()
```
You can find more details on this here
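Conceptually, the sweep step is a search over hyperparameter combinations: score each candidate and keep the best. A minimal local sketch of that idea (the `evaluate` function and the parameter grid are toy stand-ins for a remote training job, not part of the PyGrid API):

```python
import itertools

def evaluate(lr, batch_size):
    # Toy stand-in for a remote training job; lower is better.
    # A real sweep would train on a worker and return validation loss.
    return (lr - 0.01) ** 2 + (batch_size - 32) ** 2 * 1e-6

grid = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}

# Try every combination and keep the best-scoring one.
best = min(
    (dict(zip(grid, values)) for values in itertools.product(*grid.values())),
    key=lambda params: evaluate(**params),
)
print(best)  # → {'lr': 0.01, 'batch_size': 32}
```

In the autoscale API the same loop is distributed: each worker in the cluster trains one candidate, and the winning model is what gets sent to the master.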
- Test out various libraries and add hyperparameter-search functionality to pick the best model.
- Add support for other cloud platforms and contribute more to PyGrid and PySyft.
- Answered doubts of students interested in doing GSoC next year on my Twitter handle here.
Feel free to reach out to me if you have any doubts about my project or GSoC. You can find me on Twitter: @rimijoker