
Design Pattern: Deploying ML models to production apps

An abstraction to guide ML system design and operation. MLOps can use this pattern as a discipline to deploy ML pipelines to production quickly and effectively.

"The fact is ML applications introduce a new culture; their deployment and operations require new discipline and processes different from the existing DevOps practices." -Xu

Summarized notes from this thesis

Common Problems

  • Lack of an environment that mirrors production for data scientists. Data scientists develop models on local machines whose environment differs completely from production, resulting in the need to re-implement from scratch for production
  • Programming style conflict. Data scientists tend to develop models as monolithic programs that do not follow software engineering best practices. ML pipelines should provide a framework of pre-defined canonical units of operation as components, so that ML code follows ML engineering best practices rather than free-form flexibility
  • System design anti-patterns. Glue code and pipeline jungles, causing integration issues. Interfaces between components—both code and data—should be made explicit and simple enough that implementing them is easy for ML code authors.

ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues; and it is unfortunately common for systems that incorporate machine learning methods to end up with many anti-patterns.

MLOps Responsibilities

  1. Bringing ML applications to production quickly and reliably, and
  2. Ensuring ML applications stay operational 24x7 while meeting all functional and non-functional requirements

Design Pattern: Model Service Client + Retraining (MSC/R)

Principles, abstractions of reusable/repeatable paradigms, and guidelines for separation of concerns and team collaboration

See source for image of MSC/R Design Pattern

Model

  • data collection → data cleaning → feature engineering → model training
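The Model layer's chain of stages can be sketched as a sequence of composed functions. This is a minimal, hypothetical illustration: the stage names follow the notes, but the data and implementations are placeholders, not from the paper.

```python
# Hypothetical sketch of the Model layer as a chain of stage functions.
# Stage names follow the notes; the data and logic are placeholders.
def collect_data():
    # stand-in for pulling raw records from a data source
    return [{"x": 1.0, "y": 2.0}, {"x": None, "y": 4.0}]

def clean_data(rows):
    # drop rows with missing values
    return [r for r in rows if all(v is not None for v in r.values())]

def engineer_features(rows):
    # derive a simple feature from the raw columns
    return [{**r, "x_plus_y": r["x"] + r["y"]} for r in rows]

def train_model(rows):
    # placeholder "model": just the mean of the engineered feature
    feats = [r["x_plus_y"] for r in rows]
    return {"mean_feature": sum(feats) / len(feats)}

def run_pipeline():
    return train_model(engineer_features(clean_data(collect_data())))
```

Keeping each stage as its own function with an explicit input/output shape is what lets the MS Connector later package the stages as separate components.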

Service

  • Serving ML models and meeting all the functional and non-functional system requirements
  • Front Controller, Model-Serving, and Dynamic Infrastructure Platform
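The Front Controller's role — a single entry point that dispatches requests to serving handlers — can be sketched without any web framework. Route paths and handler names below are illustrative assumptions, not from the paper.

```python
# Minimal front-controller sketch: one entry point routes incoming
# requests to the appropriate handler. Paths and handlers are illustrative.
def predict_handler(payload):
    # stand-in for model-serving inference
    return {"prediction": payload["value"] * 2}

def health_handler(payload):
    return {"status": "ok"}

ROUTES = {
    "/predict": predict_handler,
    "/health": health_handler,
}

def front_controller(path, payload=None):
    """Dispatch a request path to its handler; 404 on unknown paths."""
    handler = ROUTES.get(path)
    if handler is None:
        return {"error": "not found"}, 404
    return handler(payload or {}), 200
```

In production this dispatch would sit behind the load balancer's URL, with the dynamic infrastructure platform scaling the handlers behind it.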

Client

  • user interaction logic, a client to display predictions, API endpoint, mobile front end, IoT edge device

Retraining

  • Composite pattern consisting of an Observer and a Trigger
  • Performance metrics of ML models, and the threshold used to trigger retraining for the next generation/version of the model
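The Observer + Trigger composite can be sketched as two small classes: one accumulates a performance metric, the other fires when the metric crosses the retraining threshold. The metric (request fail rate) matches the case study later; the class names are assumptions.

```python
# Sketch of the Retraining composite: an Observer tracks a model
# performance metric, and a Trigger compares it against a threshold.
class MetricObserver:
    """Accumulates request outcomes and exposes the fail rate."""
    def __init__(self):
        self.failures = 0
        self.total = 0

    def record(self, success):
        self.total += 1
        if not success:
            self.failures += 1

    @property
    def fail_rate(self):
        return self.failures / self.total if self.total else 0.0

class RetrainTrigger:
    """Fires retraining of the next model version when the threshold is crossed."""
    def __init__(self, observer, threshold):
        self.observer = observer
        self.threshold = threshold

    def should_retrain(self):
        return self.observer.fail_rate > self.threshold
```

The threshold value and the retraining data source are exactly what the SR Connector (below) is meant to pin down.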

Connector (Interface Pattern)

  • MS Connector: Define the type and format of artifacts passed, development stack, model training pipeline
  • MR Connector: Define the rules of how retrained models are to be tested, versioned and released.
  • SR Connector: Define the metrics for ML model performance monitoring, retraining threshold, and retraining data source and code
  • SC Connector: Define the type, format and protocol of data exchange between Client application and Service entry point.
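One way to make these connector contracts explicit — in the "design by contract" spirit the notes advocate — is abstract base classes, one per connector. The method names and signatures below are illustrative assumptions, not definitions from the thesis.

```python
# Hypothetical sketch: each MSC/R connector expressed as an explicit
# contract via an abstract base class. Method names are illustrative.
from abc import ABC, abstractmethod

class MSConnector(ABC):
    @abstractmethod
    def package_model(self, artifact_path: str) -> str:
        """Return the location of the packaged artifact (e.g. a container image)."""

class MRConnector(ABC):
    @abstractmethod
    def release(self, model_version: str) -> bool:
        """Test and version a retrained model; return True if released."""

class SRConnector(ABC):
    @abstractmethod
    def check_metrics(self) -> bool:
        """Return True if the retraining threshold has been crossed."""

class SCConnector(ABC):
    @abstractmethod
    def serialize(self, prediction: dict) -> str:
        """Encode a prediction in the agreed exchange format (e.g. JSON)."""
```

Each team then implements only the connectors on its side of the boundary, which is what makes the separation of roles below workable.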

Roles (Startups will have overlap)

  • Data scientists are typically responsible for the model layer: data collection & cleaning, feature engineering, and model training
  • Client developers are responsible for developing the front end for users to access the ML model.
  • MLOps engineers are responsible for building and operationalizing the infrastructure for serving models.

See source for image of MSC/R based ML System Architecture

Case study: Deploying YOLOv3 with RESTful API on AWS ECS

A common best practice of serving ML models is to expose them as RESTful APIs for the benefit of platform independence and service evolution.

Tracks

  1. Design infrastructure for the Service layer based on functional, non-functional requirements
  • See source for image of Architecture on AWS
    • Hosting service -- ECS provides a managed container service.
    • Storage -- Amazon S3 bucket is used to store ML artifacts, Elastic Container Registry (ECR) is used to store ML container images
    • Security -- IAM is used to assign permissions to access AWS services, Security Group is used to control inbound and outbound traffic
    • Auto Scaling -- Auto Scaling Group manages the auto scaling
    • CloudWatch -- AWS CloudWatch is used to monitor the ML infrastructure
    • Load Balancing -- Elastic Load Balancer is used to balance network traffic and provides the service URL
    • Front Controller -- RESTful API
  2. Design all the connectors that interface with Service
  • The best practice for integrating components is modularization, well-defined interfaces, separation of concerns, and design by contract. (See paper for code implementation)
    • MS Connector
      • Convert to Microservices to take advantage of container technology
        • Define protocol for microservice-enabled code
      • Containerization
        • Offers the benefits of isolation, portability, agility, scalability, and fast deployment. It also raises new challenges: each service runs in its own process and communicates with other processes using protocols such as HTTP or AMQP
        • Put each microservice in a separate container and use HTTP or AMQP to exchange parameters between the services
    • SC Connector
      • Determine the protocol between Front Controller and client endpoint.
      • Since the ML model is exposed as a RESTful API, the client invokes the service by sending HTTP requests and receives responses in JSON format.
    • MR Connector
      • Define the rules of how retrained models are to be tested, versioned and released
      • When a new version is produced and deposited to GitHub repository, it can trigger CI/CD process before the model is deployed to the cloud
    • SR Connector
      • How to initiate retraining, and what parameters to dynamically set at runtime.
  3. Design retraining pipeline
  • The model needs to be automatically retrained in production using fresh data. See source for image of Request Fail Rate
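The SC Connector's data contract in the case study — HTTP requests in, JSON responses out — can be sketched with only the standard library. The endpoint URL and the `image`/`detections` field names are assumptions for illustration; the paper defines the actual contract.

```python
# Hypothetical sketch of the SC connector's exchange format for the
# YOLOv3 case study: JSON over HTTP. URL and field names are assumptions.
import json
from urllib import request

SERVICE_URL = "http://localhost:8080/predict"  # assumed Service entry point

def build_request(image_b64):
    """Wrap a base64-encoded image in a JSON POST request."""
    body = json.dumps({"image": image_b64}).encode("utf-8")
    return request.Request(
        SERVICE_URL, data=body, headers={"Content-Type": "application/json"}
    )

def parse_response(raw_body):
    """Decode the Service's JSON response into a list of detections."""
    payload = json.loads(raw_body)
    return payload.get("detections", [])

# Usage (requires a running service):
# with request.urlopen(build_request(encoded_image)) as resp:
#     detections = parse_response(resp.read())
```

Pinning the request/response shapes down in one place like this is what keeps the Client team and the MLOps team decoupled across the SC boundary.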

Conclusions, insights, recommendations

Effectiveness

Good abstraction and insight into ML system structure that helps teams quickly identify their tasks, roles, and responsibilities

Solving Common Problems

  • Lack of prototype stack mirroring production environment: With separation of concerns, data scientists can rely on MLOps to build the prototype stack that mirrors the production environment
  • Programming style conflict: Guidelines for teams to follow best practices: create well-defined interfaces, design by contract, and apply modularization
  • System design anti-patterns: Test more often, get rid of experimental code and dead code paths, and deal with technical debt and anti-pattern practice quickly to reduce integration issues.

Success Criteria

  1. Reduced time and difficulty to deploy ML models to production
  2. Capability to scale up/down horizontally and automatically.
  3. Live model monitoring, tracking and retraining

Serverless Comparison

See source for image of Serverless Comparison

Platform Specific Cloud Services

See source for image of Cloud Services

Bottom Line

With ML finding its way into all facets of software development, it is critical to use design patterns to create reliable and scalable systems. The MSC/R design pattern serves this purpose.

Gained value from these notes? Leave a comment sharing what you took away from it!
