# Serverless Machine Learning

Serverless computing is a paradigm that removes much of the complexity of using the cloud by abstracting away the provisioning of compute resources.
This fairly new model was pioneered by services such as Google BigQuery, and it has since evolved into Function-as-a-Service (FaaS) platforms such as AWS Lambda.
In these services, a user-defined function and its dependencies are deployed to the cloud, where the provider manages them and executes them on demand, at scale.
Serverless computing thus takes over virtually all the system administration needed to program the cloud.
This shift parallels the transition from assembly language to high-level programming languages.

The CloudButton project [1] aims to leverage serverless computing to bring Big Data processing to the masses.
Its core idea is to let everyday programmers transparently move their single-machine code to the cloud.
If successful, CloudButton will make it possible to port such programs effortlessly while gaining key cloud properties such as elasticity, fault tolerance, and cost efficiency.

In this context, a first effort produced Crucial [2], a middleware for porting single-machine multithreaded Java programs to serverless platforms.
Crucial was successfully used to adapt a handful of machine learning (ML) algorithms, such as k-means, logistic regression, and random forest, to serverless.
Initial results show that on such tasks Crucial can rival, and even outperform, Apache Spark [3] running on a dedicated cluster at comparable cost.

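
To make the setting concrete, here is a minimal sketch of the kind of single-machine, multithreaded Java program that Crucial targets: a k-means (Lloyd) loop whose assignment step is split across a thread pool, with per-thread partial sums merged at the end of each iteration. It uses only `java.util.concurrent`. The class and method names (`ParallelKMeans`, `run`, `nearest`) are hypothetical, and this is not code from Crucial or Smile. Roughly speaking, [2] describes porting such a program with Crucial as running its threads as cloud functions and keeping the state they share in a distributed store. Crucial's actual API is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration (not Crucial's or Smile's API): a single-machine,
// multithreaded k-means built only on java.util.concurrent.
public class ParallelKMeans {

    // Index of the centroid closest to p (squared Euclidean distance).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centroids[c][j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    // Runs `iterations` Lloyd steps, splitting the points across `threads` workers.
    public static double[][] run(double[][] points, double[][] initial,
                                 int iterations, int threads) throws Exception {
        int k = initial.length, dim = initial[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = initial[c].clone(); // do not mutate the caller's array
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int it = 0; it < iterations; it++) {
                final double[][] current = centroids;
                List<Future<double[][]>> partials = new ArrayList<>();
                int chunk = (points.length + threads - 1) / threads;
                for (int t = 0; t < threads; t++) {
                    final int from = t * chunk, to = Math.min(points.length, from + chunk);
                    // Each worker returns per-cluster coordinate sums, with the point count in the last column.
                    partials.add(pool.submit(() -> {
                        double[][] acc = new double[k][dim + 1];
                        for (int i = from; i < to; i++) {
                            int c = nearest(points[i], current);
                            for (int j = 0; j < dim; j++) acc[c][j] += points[i][j];
                            acc[c][dim] += 1;
                        }
                        return acc;
                    }));
                }
                // Merge the partial sums, then recompute each centroid as the mean of its points.
                double[][] total = new double[k][dim + 1];
                for (Future<double[][]> f : partials) {
                    double[][] acc = f.get();
                    for (int c = 0; c < k; c++)
                        for (int j = 0; j <= dim; j++) total[c][j] += acc[c][j];
                }
                double[][] next = new double[k][];
                for (int c = 0; c < k; c++) {
                    if (total[c][dim] == 0) { next[c] = current[c].clone(); continue; } // empty cluster keeps its centroid
                    next[c] = new double[dim];
                    for (int j = 0; j < dim; j++) next[c][j] = total[c][j] / total[c][dim];
                }
                centroids = next;
            }
        } finally {
            pool.shutdown();
        }
        return centroids;
    }

    public static void main(String[] args) throws Exception {
        double[][] points = { {0, 0}, {0, 2}, {10, 10}, {10, 12} };
        double[][] initial = { {0, 0}, {10, 10} };
        System.out.println(Arrays.deepToString(run(points, initial, 5, 2)));
        // prints [[0.0, 1.0], [10.0, 11.0]]
    }
}
```

In this sketch, each worker returns a small array of partial sums instead of updating shared centroids. This keeps synchronization to one merge per iteration. That merged state is also what would have to leave local memory once the workers become remote functions.
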
The purpose of this project is to continue this effort by bringing more Java ML code to serverless.
The project is open to students from the IP Paris Master's programs (PDS, HPDA) or from VAP ASR at Telecom SudParis.

## Work plan

First, the student(s) will study the Crucial middleware and its implementation [2,4].
Next, they will examine the existing serverless port of (part of) the Smile ML library [5].
They will then pick an ML algorithm of their choice and port it to serverless using Crucial.
The port will be evaluated on existing FaaS platforms (e.g., AWS Lambda, Google Cloud Functions) with large datasets.
If possible, the results will also be compared with Apache Spark (MLlib).

## Contact

Prof. Pierre Sutra

## References

[1] CloudButton: Big Data in one click. IMT research blog, 2019. https://blogrecherche.wp.imt.fr/en/2019/11/20/cloudbutton-big-data-in-one-click/

[2] D. Barcelona-Pons, M. Sánchez-Artigas, G. París, P. Sutra, P. García-López. On the FaaS Track: Building Stateful Distributed Applications with Serverless Architectures. In International Middleware Conference (Middleware), 2019.

[3] Apache Spark. https://spark.apache.org/

[4] Crucial project. https://github.com/crucial-project/

[5] Serverless port of the Smile ML library. https://github.com/crucial-project/smile