@HusseinLezzaik
Created August 2, 2023 02:57
mlops tools

List of Tools

Resources

Cloud

Data

Sources

  • Amazon S3 Simple Storage Service
  • SingleStore Modern relational database for multi-cloud, hybrid and on-premises — bringing you speed, scale and immediate insights
  • Active Loop AI Database for AI
  • Appen World-Class Training Data Provider, data sourcing/annotation, and model evaluation

Data Lake/Warehouse

  • DataBricks
  • Snowflake useful for datascience
  • AirByte data ingestion, ELT platform that helps replicate your data in your warehouses, lakes, and databases
  • Big Query BigQuery is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data

Databases

Processing

  • Dataiku enterprise AI apps platform
  • Zilliz OSS for processing unstructured datasets
  • Cloud Factory Human Powered Data Processing for AI and Automation
  • DataRobot enterprise AI apps platform
  • Relyance AI manage privacy, data governance, and compliance operations seamlessly, all on a single, intuitive platform.
  • Orchest Build data pipelines. No frameworks. No YAML. Just write your data processing code directly in Python, R, Julia, or Bash
  • Apache Spark
  • Dagster
  • GraphQL query language for APIs and a runtime for fulfilling those queries with your existing data
  • DataFlow is a managed service for executing a wide variety of data processing patterns
  • Kumo AI Unleash the predictive power of enterprise data.

Streaming

  • Beam Apache unified model for defining both batch and streaming data-parallel processing pipelines
  • Redis is an open-source in-memory data structure store used as a database, cache, message broker, and streaming engine
  • Tableau data visualization for business intelligence
  • Kafka it stores data as it streams, streaming small packets of data from A to B at high frequency and volume
  • Materialize stream processing
  • Apache Flink stream processing engine for filtering and transforming streaming data in flight
  • Oryx for working with real-time data
  • Refuel AI Data Engine

Visualization

  • Pair Visualizations for ML Datasets
  • Altair declarative statistical visualization library in Python
  • Matplotlib

Exploration

Versioning

  • Data Version Control
  • Pachyderm
  • Git-lfs Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
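The "text pointers" in the Git LFS entry above are small text files that record a spec version, a SHA-256 object ID, and the file size. A minimal sketch of what such a pointer contains, using only the Python standard library (the helper name `make_lfs_pointer` and the sample content are hypothetical; in practice the `git lfs` client writes these files for you):

```python
import hashlib

# Spec URL that appears on the first line of a Git LFS pointer file.
LFS_SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(content: bytes) -> str:
    """Build pointer text for the given file content (illustrative helper)."""
    # The object ID is the SHA-256 hash of the real file contents.
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_VERSION}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    fake_dataset = b"\x00" * 1024  # stand-in for a large binary file
    print(make_lfs_pointer(fake_dataset))
```

Git tracks only this short pointer, while the content identified by the `oid` is stored on the LFS server.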

Labelling

Augmentation & Datasets

Training and Evaluation

Frameworks

Distributed Training

Resource Management

  • Continual AI
  • Slurm highly scalable cluster management and job scheduling system for large and small Linux clusters
  • Determined AI distributed training, hyperparameter tuning, experiment tracking, resource management
  • Docker containerize software so it can run anywhere, in the cloud or on people's computers
  • MetaFlow Metaflow provides a unified API to the infrastructure stack that is required to execute data science projects, from prototype to production
  • Prefect workflow orchestration

Experiment Management

Hyperparameter Tuning

Software Engineering

  • PyEnv is a virtual env manager that allows you to easily install and manage different Python versions in different environments
  • Python Threading
  • Python Poetry Python Packaging and Dependency Management Made Easy, Environment Manager
  • Numba for CUDA GPUs
  • DeepNote
  • River ML Python Package for Online/Streaming Machine Learning
  • Spock Framework for parameter configuration management in Python
  • DeepChecks build test suites for ML Models and Data with DeepChecks
  • Skip a programming language to skip the things you have already computed
  • Conda + Pip Tools Python Env using Conda and Pip-tools
  • Python Infery is a Python runtime engine that lets you quickly run inference locally with only 3 simple commands
  • SHAP model interpretability
  • Cython
  • PyTest
  • Python UnitTest unit testing framework
  • Python UnSilence to remove silent parts from videos
  • NbDev
  • GStreamer a framework for streaming media, usable from Python through bindings
  • Graphviz open source graph visualization software.
  • Undo debugger like gdb that also records program execution
  • Valgrind framework for building dynamic analysis tools, memory management
  • Gitpod web-based IDE for enterprise software
  • Replit

DataScience Libraries

Compute

AutoML

  • Neural Network Intelligence open-source toolkit for hyperparameter optimization, neural architecture search, model compression, and feature engineering
  • Auto-Sklearn is an automated ml toolkit and a drop-in replacement for scikit-learn estimator
  • Tpot Python AutoML tool that optimizes ml pipelines using genetic programming
  • PocketFlow An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications
  • Masterful AI The Training Platform for Automated Computer Vision
  • Rasa

Deployment

Feature Store

  • Tecton AI Real-time or online machine learning models that continuously adapt and learn from data and generate predictions on the fly
  • Feast

Model Optimization

  • Neural Magic helps developers in accelerating deep learning performance using automated model sparsification technologies and a CPU inference engine.
  • Mosaicml Making ML Training Efficient

Monitoring

  • Gantry continuous ML monitoring solution
  • Fiddler AI
  • Datadog Cloud Monitoring as a Service
  • Evidently AI
  • Prometheus open-source monitoring and alerting toolkit with a time series database
  • Honeycomb observability for Distributed Services
  • LabML.ai monitor deep learning model training and hardware usage from your mobile phone
  • Aporia tool for data-logging, monitoring, integration, and deployment
  • Anodot enterprise data monitoring
  • Arthur AI AI monitoring platform to manage ML
  • Superwise AI SaaS platform to streamline your model observability
  • WhyLogs
  • Aporia Monitor, Explain, & Improve your ML Models

Edge

App

Web

  • Seldon deploy ml models at scale, faster and with more accuracy
  • Gradio is the fastest way to demo your ml with a friendly web interface

CI/Testing

Model Serving

  • Torch Serve
  • TensorFlow Extended TFX
  • BentoML is a flexible, high-performance framework for serving, managing, and deploying ml models
  • OctoML
  • Verta AI open source ML model versioning, meta-data, and experiment management
  • OpenVINO open-source toolkit for optimizing and deploying AI inference, with high-level Runtime C++ and Python APIs
  • Triton Inference Server provides an optimized cloud and edge inferencing solution
  • Cortex for serving predictions coming from your models
  • Replicate run ML models in the cloud from your own code

Deployment API

  • Streamlit turns data scripts into shareable web apps in minutes, all in-python
  • Dash from Plotly converts python scripts to production-grade apps for your business
  • Flask Tool to serve your model on the web, to use weights and run inference
  • FastAPI is a modern, fast (high-performance), web-framework for building APIs with Python 3.6+ based on standard Python type hints
  • REST API
  • GraphQL POST/graphql
  • CerebiumAI deploy in 1 line of code
  • Banana ship ML to prod instantly
  • PipelineAI serverless GPUs inference for ML models
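Several entries above come down to the same pattern: load weights, expose an HTTP endpoint, and run inference per request. A minimal sketch of that pattern using only Python's standard library `http.server` (it does not use Flask or FastAPI; the toy linear model, the `/predict` route, and the `{"features": [...]}` payload shape are all assumptions made for illustration):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "weights" for a tiny linear model: y = w . x + b
WEIGHTS = [0.5, -1.0]
BIAS = 2.0


def predict(features):
    """Run inference with the toy linear model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def handle_predict(body: bytes):
    """Turn a JSON request body into (status_code, response_dict)."""
    try:
        features = json.loads(body)["features"]
        return 200, {"prediction": predict(features)}
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing key, or wrong types all map to 400.
        return 400, {"error": str(exc)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` and sending a POST with a JSON body to `/predict` should return a JSON object with a `prediction` field. The web frameworks listed above provide routing, validation, and production servers on top of this basic request/response loop.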

All-in-one Solutions

Web

FrontEnd

BackEnd

Package Manager

Frameworks
