
Design Pattern: Deploying ML models to production apps

An abstraction to guide ML system design and operation. MLOps can use this pattern as a discipline to deploy ML pipelines to production quickly and effectively.

"The fact is ML applications introduce a new culture; their deployment and operations require new discipline and processes different from the existing DevOps practices." -Xu

Summarized notes from this thesis

Common Problems

  • Lack of an environment that mirrors production for data scientists. Data scientists develop models on local machines whose environment differs completely from production, resulting in the need to re-implement from scratch for production
  • Programming style conflict. Data scientists tend to develop models as monolithic programs that do not follow software engineering best practices. ML pipelines should provide a framework of pre-defined canonical units of operation as components, so that ML code follows ML engineering best practices rather than free-form flexibility
  • System design anti-patterns. Glue code and pipeline jungles, causing integration issues. Interfaces between components—both code and data—should be made explicit and simple enough that implementing them is easy for ML code authors.

ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues; and it is unfortunately common for systems that incorporate machine learning methods to end up with many anti-patterns.

MLOps Responsibilities

  1. Bringing ML applications to production quickly and reliably, and
  2. Ensuring ML applications stay operational 24x7 while meeting all functional and non-functional requirements

Design Pattern: Model Service Client + Retraining (MSC/R)

Principles, abstractions of reusable/repeatable paradigms, and guidelines for separation of concerns and team collaboration

See source for image of MSC/R Design Pattern

Model

  • data collection → data cleaning → feature engineering → model training
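The Model layer's chain of stages can be sketched as a sequence of composed functions. This is a minimal, hypothetical illustration: the stage names follow the notes, but the data and implementations are placeholders, not from the paper.

```python
# Hypothetical sketch of the Model layer as a chain of stage functions.
# Stage names follow the notes; the data and logic are placeholders.
def collect_data():
    # stand-in for pulling raw records from a data source
    return [{"x": 1.0, "y": 2.0}, {"x": None, "y": 4.0}]

def clean_data(rows):
    # drop rows with missing values
    return [r for r in rows if all(v is not None for v in r.values())]

def engineer_features(rows):
    # derive a simple feature from the raw columns
    return [{**r, "x_plus_y": r["x"] + r["y"]} for r in rows]

def train_model(rows):
    # placeholder "model": just the mean of the engineered feature
    feats = [r["x_plus_y"] for r in rows]
    return {"mean_feature": sum(feats) / len(feats)}

def run_pipeline():
    return train_model(engineer_features(clean_data(collect_data())))
```

Keeping each stage as its own function with an explicit input/output shape is what lets the MS Connector later package the stages as separate components.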

Service

  • Serving ML models and meeting all the functional and non-functional system requirements
  • Front Controller, Model-Serving, and Dynamic Infrastructure Platform
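The Front Controller's role — a single entry point that dispatches requests to serving handlers — can be sketched without any web framework. Route paths and handler names below are illustrative assumptions, not from the paper.

```python
# Minimal front-controller sketch: one entry point routes incoming
# requests to the appropriate handler. Paths and handlers are illustrative.
def predict_handler(payload):
    # stand-in for model-serving inference
    return {"prediction": payload["value"] * 2}

def health_handler(payload):
    return {"status": "ok"}

ROUTES = {
    "/predict": predict_handler,
    "/health": health_handler,
}

def front_controller(path, payload=None):
    """Dispatch a request path to its handler; 404 on unknown paths."""
    handler = ROUTES.get(path)
    if handler is None:
        return {"error": "not found"}, 404
    return handler(payload or {}), 200
```

In production this dispatch would sit behind the load balancer's URL, with the dynamic infrastructure platform scaling the handlers behind it.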

Client

  • user interaction logic, a client to display predictions, API endpoint, mobile front end, IoT edge device

Retraining

  • Composite pattern consisting of an Observer and a Trigger
  • Performance metrics of ML models, and the threshold used to trigger retraining for the next generation/version of the model
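The Observer + Trigger composite can be sketched as two small classes: one accumulates a performance metric, the other fires when the metric crosses the retraining threshold. The metric (request fail rate) matches the case study later; the class names are assumptions.

```python
# Sketch of the Retraining composite: an Observer tracks a model
# performance metric, and a Trigger compares it against a threshold.
class MetricObserver:
    """Accumulates request outcomes and exposes the fail rate."""
    def __init__(self):
        self.failures = 0
        self.total = 0

    def record(self, success):
        self.total += 1
        if not success:
            self.failures += 1

    @property
    def fail_rate(self):
        return self.failures / self.total if self.total else 0.0

class RetrainTrigger:
    """Fires retraining of the next model version when the threshold is crossed."""
    def __init__(self, observer, threshold):
        self.observer = observer
        self.threshold = threshold

    def should_retrain(self):
        return self.observer.fail_rate > self.threshold
```

The threshold value and the retraining data source are exactly what the SR Connector (below) is meant to pin down.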

Connector (Interface Pattern)

  • MS Connector: Define the type and format of artifacts passed, development stack, model training pipeline
  • MR Connector: Define the rules of how retrained models are to be tested, versioned and released.
  • SR Connector: Define the metrics for ML model performance monitoring, retraining threshold, and retraining data source and code
  • SC Connector: Define the type, format and protocol of data exchange between Client application and Service entry point.
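One way to make these connector contracts explicit — in the "design by contract" spirit the notes advocate — is abstract base classes, one per connector. The method names and signatures below are illustrative assumptions, not definitions from the thesis.

```python
# Hypothetical sketch: each MSC/R connector expressed as an explicit
# contract via an abstract base class. Method names are illustrative.
from abc import ABC, abstractmethod

class MSConnector(ABC):
    @abstractmethod
    def package_model(self, artifact_path: str) -> str:
        """Return the location of the packaged artifact (e.g. a container image)."""

class MRConnector(ABC):
    @abstractmethod
    def release(self, model_version: str) -> bool:
        """Test and version a retrained model; return True if released."""

class SRConnector(ABC):
    @abstractmethod
    def check_metrics(self) -> bool:
        """Return True if the retraining threshold has been crossed."""

class SCConnector(ABC):
    @abstractmethod
    def serialize(self, prediction: dict) -> str:
        """Encode a prediction in the agreed exchange format (e.g. JSON)."""
```

Each team then implements only the connectors on its side of the boundary, which is what makes the separation of roles below workable.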

Roles (Startups will have overlap)

  • Data scientists are typically responsible for the model layer: data collection & cleaning, feature engineering, and model training
  • Client developers are responsible for developing the front end for users to access the ML model.
  • MLOps engineers are responsible for building and operationalizing the infrastructure for serving models.

See source for image of MSC/R based ML System Architecture

Case study: Deploying YOLOv3 with RESTful API on AWS ECS

A common best practice of serving ML models is to expose them as RESTful APIs for the benefit of platform independence and service evolution.

Tracks

  1. Design infrastructure for the Service layer based on functional, non-functional requirements
  • See source for image of Architecture on AWS
    • Hosting service -- ECS provides a managed container service.
    • Storage -- Amazon S3 bucket is used to store ML artifacts, Elastic Container Registry (ECR) is used to store ML container images
    • Security -- IAM is used to assign permissions to access AWS services, Security Group is used to control inbound and outbound traffic
    • Auto Scaling -- Auto Scaling Group manages the auto scaling
    • CloudWatch -- AWS CloudWatch is used to monitor the ML infrastructure
    • Load Balancing -- Elastic Load Balancer is used to balance network traffic and provides the service URL
    • Front Controller -- RESTful API
  2. Design all the connectors that interface with Service
  • The best practice for integrating components is modularization, well-defined interfaces, separation of concerns, and design by contract. (See paper for code implementation)
    • MS Connector
      • Convert to Microservices to take advantage of container technology
        • Define protocol for microservice-enabled code
      • Containerization
        • Offers the benefits of isolation, portability, agility, scalability, and fast deployment. It also raises new challenges: each service runs in its own process and communicates with other processes using protocols such as HTTP or AMQP
        • Put each microservice in a separate container and use HTTP or AMQP to exchange parameters between the services
    • SC Connector
      • Determine the protocol between Front Controller and client endpoint.
      • Since the ML model is exposed as a RESTful API, the client invokes the service by sending HTTP requests and receives responses in JSON format.
    • MR Connector
      • Define the rules of how retrained models are to be tested, versioned and released
      • When a new version is produced and deposited to GitHub repository, it can trigger CI/CD process before the model is deployed to the cloud
    • SR Connector
      • How to initiate retraining, and what parameters to dynamically set at runtime.
  3. Design retraining pipeline
  • The model needs to be automatically retrained in production using fresh data. See source for image of Request Fail Rate
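The SC Connector's data contract in the case study — HTTP requests in, JSON responses out — can be sketched with only the standard library. The endpoint URL and the `image`/`detections` field names are assumptions for illustration; the paper defines the actual contract.

```python
# Hypothetical sketch of the SC connector's exchange format for the
# YOLOv3 case study: JSON over HTTP. URL and field names are assumptions.
import json
from urllib import request

SERVICE_URL = "http://localhost:8080/predict"  # assumed Service entry point

def build_request(image_b64):
    """Wrap a base64-encoded image in a JSON POST request."""
    body = json.dumps({"image": image_b64}).encode("utf-8")
    return request.Request(
        SERVICE_URL, data=body, headers={"Content-Type": "application/json"}
    )

def parse_response(raw_body):
    """Decode the Service's JSON response into a list of detections."""
    payload = json.loads(raw_body)
    return payload.get("detections", [])

# Usage (requires a running service):
# with request.urlopen(build_request(encoded_image)) as resp:
#     detections = parse_response(resp.read())
```

Pinning the request/response shapes down in one place like this is what keeps the Client team and the MLOps team decoupled across the SC boundary.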

Conclusions, insights, recommendations

Effectiveness

Good abstraction and insight into ML system structure that helps teams quickly identify their tasks, roles, and responsibilities

Solving Common Problems

  • Lack of prototype stack mirroring production environment: With separation of concerns, data scientists can rely on MLOps to build the prototype stack that mirrors the production environment
  • Programming style conflict: Guidelines for teams to follow best practices: create well-defined interfaces, design by contract, and apply modularization
  • System design anti-patterns: Test more often, get rid of experimental code and dead code paths, and deal with technical debt and anti-pattern practice quickly to reduce integration issues.

Success Criteria

  1. Reduced time and difficulty to deploy ML models to production
  2. Capability to scale up/down horizontally and automatically.
  3. Live model monitoring, tracking and retraining

Serverless Comparison

See source for image of Serverless Comparison

Platform Specific Cloud Services

See source for image of Cloud Services

Bottom Line

With ML finding its way into all facets of software development, it is critical to use design patterns to create reliable and scalable systems. The MSC/R design pattern serves this purpose.

Gained value from these notes? Leave a comment sharing what you took away from it!
