Evaluation of ML systems by closing the feedback loop
In this tutorial, we will practice selected techniques for evaluating machine learning systems, and then monitoring them in production.
The lifecycle of a model may look something like this:
Training: Initially, a model is trained on some training data.
Testing (offline): If training completes successfully, the model progresses to a testing - offline evaluation - stage. In this stage, it is evaluated using a held-out evaluation set not used in training, and potentially other special evaluation sets (as we’ll see in this tutorial; a minimal sketch of this step appears just after this list of stages).
Staging: Given satisfactory performance on the offline evaluation, the model may be packaged as part of a service, and then this package promoted to a staging environment that mimics the “production” service but without live users. In this staging environment, we can perform integration tests against the service and also load tests to evaluate the inference performance of the system.
Canary (or blue/green, or other “preliminary” deployment): the new model serves a small share of live traffic, and its behavior is monitored before a full rollout.
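To make the offline testing stage concrete, here is a minimal sketch of a promotion gate run against a held-out evaluation set. It assumes a scikit-learn-style model with a `predict` method; the `X_eval` and `y_eval` arrays and the 0.85 accuracy threshold are illustrative placeholders, not values from this lab.

```python
# Minimal sketch of an offline evaluation gate (assumed names and threshold,
# not the lab's actual code). The model is scored on a held-out set that was
# not used in training, and is only promoted to staging if it clears the bar.
from sklearn.metrics import accuracy_score, f1_score

def offline_eval(model, X_eval, y_eval, accuracy_threshold=0.85):
    """Return True if held-out performance justifies promoting the model."""
    y_pred = model.predict(X_eval)                 # scikit-learn-style API
    acc = accuracy_score(y_eval, y_pred)
    f1 = f1_score(y_eval, y_pred, average="macro")
    print(f"held-out accuracy={acc:.3f}  macro-F1={f1:.3f}")
    return acc >= accuracy_threshold
```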
#LAB 5
Model optimizations for serving
In this tutorial, we explore some model-level optimizations for model serving:
graph optimizations
quantization
and hardware-specific execution providers, which switch out generic implementations of operations in the graph for hardware-specific optimized implementations
and we will see how these affect the throughput and inference time of a model.
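The tutorial text above does not fix a particular runtime; as one common (assumed) choice, ONNX Runtime exposes all three of these knobs directly, and a crude latency measurement can sit alongside them. The `model.onnx` path, the input shape, and the provider list below are illustrative placeholders.

```python
# Rough sketch using ONNX Runtime (assumed; not the lab's reference code).
# Shows dynamic quantization, the graph-optimization level, execution-provider
# selection, and a crude single-stream latency/throughput measurement.
import time
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantization: write a copy of the model with INT8 weights.
quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

# Graph optimizations: let the runtime fuse and fold operators when the
# inference session is created.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# Execution providers: a hardware-specific provider (e.g. "CUDAExecutionProvider"
# or "OpenVINOExecutionProvider") could be listed ahead of the generic CPU one.
session = ort.InferenceSession(
    "model_int8.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
)

# Measure median latency and derived throughput on a dummy input.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input shape
input_name = session.get_inputs()[0].name
latencies = []
for _ in range(100):
    start = time.perf_counter()
    session.run(None, {input_name: x})
    latencies.append(time.perf_counter() - start)
median = float(np.median(latencies))
print(f"median latency: {median * 1000:.2f} ms, throughput: {1 / median:.1f} req/s")
```

Re-running this measurement before and after each change (original vs. quantized model, different optimization levels, different providers) is the kind of comparison the rest of the tutorial works through.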
To run this experiment, you should have already created an account on Chameleon, and become part of a project. You must also have added your SSH key to the CHI@UC and CHI@TACC sites.
GO THROUGH THE THREE FILES BELOW: ##A, ##B, AND ##C.
##A
these are the lab parts -
#LAB 1
Hello, Chameleon
In this tutorial, you will learn how to use Chameleon to run experiments in computer networks or cloud computing. It should take you about 60-90 minutes of active time to work through this tutorial.
Note: This process has a “human in the loop” approval stage - you’ll need to wait for your instructor or research advisor to approve your request to join their project. Be prepared to start the tutorial, wait for this approval, and then continue.