
@dev-aryaman
dev-aryaman / fb_eval.txt
Created April 28, 2025 00:09
Evaluation and Monitoring
Evaluation of ML systems by closing the feedback loop
In this tutorial, we will practice selected techniques for evaluating machine learning systems, and then monitoring them in production.
The lifecycle of a model may look something like this:
Training: Initially, a model is trained on some training data
Testing (offline): If training completes successfully, the model progresses to a testing - offline evaluation - stage. In this stage, it is evaluated using a held-out evaluation set not used in training, and potentially other special evaluation sets (as we’ll see in this tutorial).
Staging: Given satisfactory performance on the offline evaluation, the model may be packaged as part of a service, and this package promoted to a staging environment that mimics the "production" service but without live users. In this staging environment, we can perform integration tests against the service, and also load tests to evaluate the inference performance of the system.
Canary (or blue/green, or other “preliminary
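The offline testing stage above can be sketched as a simple held-out evaluation that gates promotion to staging. This is a minimal illustration in plain Python; the `accuracy` helper, the stand-in model, and the 0.9 threshold are hypothetical, not part of the tutorial:

```python
# Sketch of the offline evaluation stage: score a trained model on a
# held-out set and gate promotion to staging on a (hypothetical) threshold.
# The "model" here is a toy stand-in for a real trained predictor.

def accuracy(model, examples):
    """Fraction of held-out (input, label) pairs the model gets right."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)

def ready_for_staging(model, held_out, threshold=0.9):
    """Promote only if offline accuracy clears the threshold."""
    return accuracy(model, held_out) >= threshold

# Toy stand-in: a "model" that predicts the sign of its input.
model = lambda x: 1 if x >= 0 else -1
held_out = [(2, 1), (-3, -1), (0.5, 1), (-1, -1), (4, 1)]

print(accuracy(model, held_out))          # 1.0 on this toy set
print(ready_for_staging(model, held_out))
```

In a real pipeline the held-out set would be fixed before training, and additional special-purpose evaluation sets (slices, perturbations) would be scored the same way.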
@dev-aryaman
dev-aryaman / Model_Optimization.txt
Last active April 28, 2025 00:23
Model serving
Model optimizations for serving
In this tutorial, we explore some model-level optimizations for model serving:
- graph optimizations
- quantization
- hardware-specific execution providers, which switch out generic implementations of operations in the graph for hardware-specific optimized implementations
and we will see how these affect the throughput and inference time of a model.
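As a rough illustration of what quantization does, independent of any serving framework, here is a hand-rolled symmetric int8 quantization of a float weight vector. Real toolchains (e.g. ONNX Runtime or PyTorch) do this per-tensor or per-channel with calibrated ranges; this sketch and its names are illustrative only:

```python
# Hand-rolled sketch of symmetric int8 quantization: map float weights
# into [-127, 127] with a single scale, then dequantize to see the error.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0  # one scale for the tensor
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each quantized value fits in one byte (4x smaller than float32),
# at the cost of a small round-trip error bounded by about scale/2.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

The storage and bandwidth savings are what make quantized models faster to serve, provided the hardware has efficient int8 kernels.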
To run this experiment, you should have already created an account on Chameleon, and become part of a project. You must also have added your SSH key to the CHI@UC and CHI@TACC sites.
@dev-aryaman
dev-aryaman / mlops_labs.txt
Last active April 27, 2025 23:13
project requirements
These are the lab parts required:
#LAB 5
Model optimizations for serving
In this tutorial, we explore some model-level optimizations for model serving:
- graph optimizations
- quantization
- hardware-specific execution providers, which switch out generic implementations of operations in the graph for hardware-specific optimized implementations
and we will see how these affect the throughput and inference time of a model.
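Throughput and inference time can be measured with a simple timing loop around the inference call. A minimal sketch using only the standard library; the `model` function, batch size, and iteration counts here are placeholders for whatever you actually serve:

```python
import time

# Minimal latency/throughput benchmark sketch. The "model" is a placeholder
# for a real inference call; swap in your serving client or session here.

def model(batch):
    # Stand-in for inference: a trivial computation over the batch.
    return [x * 2 for x in batch]

def benchmark(fn, batch, warmup=10, iters=100):
    for _ in range(warmup):      # warm-up runs are excluded from timing
        fn(batch)
    start = time.perf_counter()
    for _ in range(iters):
        fn(batch)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000         # mean per-call latency
    throughput = iters * len(batch) / elapsed   # samples per second
    return latency_ms, throughput

latency_ms, throughput = benchmark(model, list(range(32)))
print(f"latency: {latency_ms:.3f} ms, throughput: {throughput:.0f} samples/s")
```

Running the same loop before and after applying graph optimizations or quantization is one way to quantify their effect.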
GO THROUGH THE THREE FILES BELOW: ##A, ##B, AND ##C.
##A-
These are the lab parts:
#LAB 1
Hello, Chameleon
In this tutorial, you will learn how to use Chameleon to run experiments in computer networks or cloud computing. It should take you about 60-90 minutes of active time to work through this tutorial.
Note: This process has a "human in the loop" approval stage - you'll need to wait for your instructor or research advisor to approve your request to join their project. Be prepared to start the tutorial, wait for this approval, and then continue.