- List of Tools for MLOps - Chip Huyen
- LF AI & Data Foundation Interactive Landscape
- The AI and Data Landscape
- Data50: The World's Top Data Startups
- My MLOps Stack
- Production Level Deep Learning
- ApplyingML
- Awesome MLOps
- MLOps Basics
- MLOps ZoomCamp
- Machine Learning Ops
- Data-Centric AI Resource Hub
- AWS
- DynamoDB fast, flexible NoSQL database service for single-digit ms performance at any scale
- EC2
- Batch Fully managed batch processing at any scale
- S3 Simple Storage Service
- Lambda Serverless Computing
- SageMaker
- AWS ECR Docker Containers
- AWS RDS Relational Database Services
- AWS Training
- AWS Kinesis
- AWS Architecture Center
- AWS Datalakes and Analytics
- AWS Opensearch Service
- Amazon Mechanical Turk crowdsourcing marketplace
- Google Cloud
- Azure
- Cloudera
- Elastic Cluster aims to provide a user-friendly CLI tool to create, manage, and set up computing clusters hosted on cloud infrastructures like Amazon EC2 and Google Compute Engine
- Amazon S3 Simple Storage Service
- SingleStore Modern relational database for multi-cloud, hybrid and on-premises — bringing you speed, scale and immediate insights
- Active Loop AI Database for AI
- Appen World-Class Training Data Provider, data sourcing/annotation, and model evaluation
- DataBricks
- Snowflake useful for data science
- AirByte data ingestion, ELT platform that helps replicate your data in your warehouses, lakes, and databases
- Big Query BigQuery is a fully-managed, serverless data warehouse that enables scalable analysis over petabytes of data
- SQL is a standard language for storing, manipulating, and retrieving data in databases like MySQL, SQL Server, MS Access, Oracle, Sybase, Informix, Postgres, and other database systems
- QuestDB fast SQL for time series, fastest open source time series database
- MySQL
- PostgreSQL open source relational database
- Oracle
- Microsoft SQL Server
- SQLite
- Sybase
- Drizzle
- Firebird
- JSON JavaScript Object Notation is a lightweight data-interchange format
- Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
- PostgreSQL the world's most advanced open source relational database
- R for DataScience data manipulation
- MongoDB used for storing product info and details by finance and ecommerce companies
- Starburst Data distributed SQL query engine
- Google FireStore
- Dataiku enterprise AI apps platform
- Zilliz OSS for processing unstructured datasets
- Cloud Factory Human Powered Data Processing for AI and Automation
- DataRobot enterprise AI apps platform
- Relyance AI manage privacy, data governance, and compliance operations seamlessly, all on a single, intuitive platform.
- Orchest Build data pipelines. No frameworks. No YAML. Just write your data processing code directly in Python, R, Julia, or Bash
- Apache Spark
- Dagster
- GraphQL query language for APIs and a runtime for fulfilling those queries with your existing data
- DataFlow is a managed service for executing a wide variety of data processing patterns
- Kumo AI Unleash the predictive power of enterprise data.
- Beam Apache unified model for defining both batch and streaming data-parallel processing pipelines
- Redis is an open source in-memory data structure store used as a database, cache, message broker, and streaming engine
- Tableau data visualization for business intelligence
- Kafka it stores data as it streams, streaming small packets of data from A to B at high frequency and volume
- Materialize stream processing
- Apache Flink stream processing tool for filtering and transforming streaming data in flight
- Oryx for working with real-time data
- Refuel AI Data Engine
- Pair Visualizations for ML Datasets
- Altair declarative statistical visualization library in Python
- Matplotlib
- dbt
- Pandas
- Dtale visualizer for pandas data structure
- RAPIDS
- Elastic Search
- Pandera A data validation library for scientists, engineers, and analysts seeking correctness
- Data Version Control
- Pachyderm
- Git-lfs Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.
- Label Studio
- Label Box
- Snorkel AI data pipelines and labelling with Snorkel Flow
- PlainSight AI
- Prodigy annotation tool powered by active learning
- Tasq.ai data annotation
- Tonic AI Fake Data
- Data Augmentation for NLP
- Gretel AI the developer stack for Synthetic Data
- Defined AI training data generation
- DataHub Metadata platform for the modern data stack
- Decile Data Efficient Learning with Less Data
- Synthesis AI Synthetic Data Generation
- Parallel Domain Accelerating autonomy with synthetic data, the new data pipeline for computer vision
- Hugging Face
- Fast ai
- Keras
- Scikit-Learn
- MXNet
- Theano
- XGBoost
- Catboost
- LightGBM
- JAX
- TensorFlow
- PyTorch
- PyTorch Lightning
- Hydra provides an amazing framework to intuitively structure your configurations and gives you the freedom to build arbitrary experiments with minimal redundancy
- Cohere NLP Tools
- Explosion.ai developer tools for NLP and AI
- RAY for Distributed Compute
- AnyScale
- Grid AI
- Flyte is a Kubernetes-native workflow automation platform to unify data and ML processes
- Union AI
- Kubernetes open source container orchestration engine for automating deployment, scaling, and management of containerized applications
- AutoScaler: autoscaling components for Kubernetes
- KServe is a standard Model Inference Platform on Kubernetes, built for highly scalable use cases
- KubeFlow is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable
- Dask flexible library for parallel computing in Python, provides advanced parallelism for analytics, enabling performance at scale
- OpenGL the industry standard for High Performance Graphics
- Hadoop
- Joblib.Parallel
- rntop runtop, a top-like tool for monitoring GPUs across a cluster
- Continual AI
- Slurm highly scalable cluster management and job scheduling system for large and small Linux clusters
- Determined AI distributed training, hyperparameter tuning, experiment tracking, resource management
- Docker containerize software so it can run anywhere, in the cloud or on people's own computers
- MetaFlow Metaflow provides a unified API to the infrastructure stack that is required to execute data science projects, from prototype to production
- Prefect workflow orchestration
- Weights & Biases
- TensorBoard
- Neptune AI
- MlFlow open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry
- Airflow is a platform to programmatically author, schedule, and monitor workflows
- Comet
- XManager platform for managing ml experiments
- AimStack ML Experiment Tracker
- ZenML is an extensible open source MLOps framework to create reproducible pipelines
- PyEnv is a virtual env manager that allows you to easily install and manage different Python versions in different environments
- Python Threading
- Python Poetry Python Packaging and Dependency Management Made Easy, Environment Manager
- Numba for CUDA GPUs
- DeepNote
- River ML Python Package for Online/Streaming Machine Learning
- Spock Framework for parameter configuration management in Python
- DeepChecks build test suites for ML Models and Data with DeepChecks
- Skip a programming language to skip the things you have already computed
- Conda + Pip Tools Python Env using Conda and Pip-tools
- Python Infery is a Python runtime engine that lets you quickly run inference locally with only 3 simple commands
- SHAP model interpretability
- Cython
- PyTest
- Python UnitTest unit testing framework
- Python UnSilence to remove silent parts from videos
- NbDev
- GStreamer a multimedia framework for streaming media, usable from Python through bindings
- Graphviz open source graph visualization software.
- Undo debugger like gdb that also records execution
- Valgrind framework for building dynamic analysis tools, memory management
- Gitpod web-based IDE for enterprise software
- Replit
- SymPy Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS).
- SciPy
- NumPy
- Pillow
- Seaborn statistical data visualization
- Plotly Dash Enterprise
- OpenCV
- Scikit-Image
- Gym
- PyCaret open source low-code ML library in Python that automates ML workflows
- Scrapy framework for extracting the data you need from websites
- Beautiful Soup is a Python library for pulling data out of HTML and XML files
- SNAP for Python is a general purpose, high performance system for analysis and manipulation of large networks
- Jupyter
- Python Regular Expressions
- Lineapy LineaPy creates a frictionless path for taking your data science artifact from development to production.
- NLTK
- Cerebras
- GraphCore AI chips for ml
- UntetherAI inference chip
- Groq
- Tenstorrent
- Habana Labs
- DeepLite AI
- H2O.ai solves complex business problems and accelerates the discovery of new ideas with results you can understand and trust
- Cortex Cloud Infrastructure for machine learning at Scale
- Lambda Labs gpu clustering
- Deci AI
- Plaid ML open source tensor compiler from Intel
- CoreWeave is a modern cloud infra empowering creators with the GPU resources they need to work more efficiently
- Nvidia
- Hailo chips for machine learning, top performing AI processor for edge devices
- Neural Network Intelligence open source toolkit for hyperparameter optimization, neural architecture search, model compression, and feature engineering
- Auto-Sklearn is an automated ml toolkit and a drop-in replacement for scikit-learn estimator
- Tpot Python AutoML tool that optimizes ml pipelines using genetic programming
- PocketFlow An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications
- Masterful AI The Training Platform for Automated Computer Vision
- Rasa
- Tecton AI Real-time or online machine learning models that continuously adapt and learn from data and generate predictions on the fly
- Feast
- Neural Magic helps developers in accelerating deep learning performance using automated model sparsification technologies and a CPU inference engine.
- Mosaicml Making ML Training Efficient
- Gantry continuous ML monitoring solution
- Fiddler AI
- Data Dog Cloud Monitoring as a Service
- Evidently AI
- Prometheus open source monitoring and alerting toolkit with a time series database
- Honey Comb observability for Distributed Services
- LabML.ai monitor deep learning model training and hardware usage from your mobile phone
- Aporia tool for data-logging, monitoring, integration, and deployment
- Anodot enterprise data monitoring
- Arthur AI AI monitoring platform to manage ML
- Superwise AI streamline your model observability SaaS platform
- WhyLogs
- Aporia Monitor, Explain, & Improve your ML Models
- ONNX ONNX Runtime Web, a package that lets us run ML models using JavaScript
- TensorRT is a C++ library for high performance inference on Nvidia GPUs and deep learning accelerators
- TensorFlow Lite
- ML Kit for Mobile Developers - Google
- Seldon deploy ml models at scale with more accuracy and faster
- Gradio is the fastest way to demo your ml with a friendly web interface
- Lucid Lucid is the Visual Collaboration Suite for Enterprise and Hybrid Teams of all Sizes
- Jenkins
- Github Actions
- CircleCI
- Ansible Red Hat
- Travis CI
- TerraForm Provision, change, and version resources on any environment
- Dagger.io A Portable DevKit for CI/CD Pipelines
- Argo WorkFlows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes
- Torch Serve
- TensorFlow Extended TFX
- BentoML is a flexible, high-performance framework for serving, managing, and deploying ml models
- OctoML
- Verta AI open source ML model versioning, metadata, and experiment management
- OpenVINO open source toolkit for optimizing and deploying AI inference, with high-level Runtime C++ and Python APIs
- Triton Inference Server provides an optimized cloud and edge inferencing solution
- Cortex for serving predictions coming from your models
- Replicate run ML models in the cloud from your own code
- Streamlit turns data scripts into shareable web apps in minutes, all in-python
- Dash from Plotly converts Python scripts to production-grade apps for your business
- Flask Tool to serve your model on the web, to use weights and run inference
- FastAPI is a modern, fast (high-performance), web-framework for building APIs with Python 3.6+ based on standard Python type hints
- REST API
- GraphQL POST/graphql
- CerebiumAI deploy in 1 line of code
- Banana ship ML to prod instantly
- PipelineAI serverless GPUs inference for ML models
- Cloudflare web security
- Bing Search API web search api
- Python Django high-level Python web framework that enables rapid development of secure and maintainable websites
- Ruby on Rails RAILS for Ruby
- ExpressJS for nodeJS
- Java Spring for Java