@balvisio
Last active January 29, 2024 17:30
Instructions to create a MLflow server in AWS

Types of MLflow Server Setups

There are multiple variants of the tracking-server setup, which differ in how artifacts are handled (reference: https://mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded):

a. In this scenario, for artifact logging, the MLflow client interacts with the remote Tracking Server and artifact storage host:

  • The MLflow client uses RestStore to send a REST request to fetch the artifact store URI location from the Tracking Server
  • The Tracking Server responds with an artifact store URI location (an S3 storage URI in this case)
  • The MLflow client creates an instance of an S3ArtifactRepository, connects to the remote AWS host using the boto client libraries, and uploads the artifacts to the S3 bucket URI location

Direct Access to Artifact Store

# Starting the MLflow server, don't forget to change the fields in caps
# If you are unfamiliar with nohup, read up on it here: https://man.openbsd.org/nohup.1
# I ran into issues when the password had special characters
nohup mlflow server --backend-store-uri postgresql://postgres:YOURPASSWORD@YOUR-DATABASE-ENDPOINT:5432/mlflow --default-artifact-root s3://YOURORGANISATION.MLFLOW.BUCKET --host 0.0.0.0 &
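
To make the difference concrete, here is a minimal client-side sketch for this direct-access setup; the server address and file name are placeholders, and the key point is that the client machine itself needs AWS credentials (for example via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY or an instance profile) because it uploads artifacts straight to the S3 bucket:

# Client-side sketch for the direct-access setup; tracking URI and file name
# are placeholders. Run metadata goes to the Tracking Server, while the
# artifact is uploaded to S3 by this client via boto3.
import mlflow

mlflow.set_tracking_uri("http://YOUR-TRACKING-SERVER:5000")

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_artifact("model.pkl")  # written directly to the S3 bucket by the client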

b. In this scenario (the one chosen here), the MLflow Tracking Server also acts as a proxy for artifacts. Once configured with the appropriate access requirements, an administrator can start the tracking server so that it performs assumed-role operations for saving, loading, or listing model artifacts, images, documents, and files. This eliminates the need to give end users direct path access to a remote object store (e.g., S3, ADLS, GCS, HDFS) for artifact handling, and the need for end users to provide access credentials to interact with the underlying object store.

  • Artifact logging requests are made by the client using the HttpArtifactRepository, which writes files to the MLflow Tracking Server
  • The Tracking Server then writes these files to the configured object store location with assumed-role authentication
  • Retrieving artifacts from the configured backend store for a user request is done with the same authorized authentication that was configured at server start
  • Artifacts are passed back to the end user by the Tracking Server via the HttpArtifactRepository interface

Proxy Access to Artifact Store
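
As a rough client-side sketch for this proxied setup (server address and file name are placeholders), the client only needs the tracking URI and, in the deployment below, the nginx basic-auth credentials; no AWS credentials are required on the client side:

# Client-side sketch for the proxied-artifact setup; the server address is a
# placeholder. The nginx basic-auth credentials from the steps below can be
# supplied via the MLFLOW_TRACKING_USERNAME / MLFLOW_TRACKING_PASSWORD
# environment variables; no AWS keys are needed here.
import mlflow

mlflow.set_tracking_uri("http://<mlflow-server-ip-or-dns-name>")

with mlflow.start_run():
    mlflow.log_artifact("model.pkl")  # sent over HTTP; the server writes it to S3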

Instructions

  1. Create a new EC2 Service Role and add the relevant S3 permissions to that role. This step only needs to be done once per AWS account. (https://knowledgeacademy.io/how-to-access-aws-s3-buckets-from-ec2-instances/)

    • In the IAM Console, go to "Roles" and click "Create Role"
    • Choose "AWS Service" and "EC2":

    • Add the "AmazonS3FullAccess" permission:

    • Give it a descriptive name and click "Create Role".
  2. Create a new EC2 instance (Amazon Linux 2 AMI, free-tier t2.micro):

    • Have or generate an SSH key pair to access the instance.
    • In the security group settings, allow inbound access on SSH (port 22) and HTTP (port 80).
    • Under the "Advanced details" tab, set "IAM Instance Profile" to the IAM role you created in step 1.
  3. Create a new free-tier PostgreSQL RDS. Call it ‘mlflow-rds’ or a similar name, and type in your desired password.

  4. Add a security group rule to RDS that allows inbound connections from a security group that the EC2 instance belongs to:

    • Go to your EC2 instance, click on the "Security" tab and take a note of the security group the instance belongs to under the "Security groups" section.

    • Go to your RDS instance and under "Connectivity & Security" click on the "VPC Security group".

    • Then click "Edit inbound rules" and "Add rule". For the new rule, set "Type: All traffic" and "Source: Custom", and choose the security group of the EC2 instance noted in the previous step.

  5. Create a new S3 bucket; this will be the artifact storage bucket for our MLflow server. You can pick a name such as <yourorganisation>.mlflow.data, <project>.<yourorganisation>.mlflow.data or similar. You can test that the instance can reach the S3 bucket by running aws s3 ls s3://<bucket.name> from the instance; an equivalent boto3 check is sketched below.
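
A rough equivalent of that check using boto3 (run on the instance; the bucket name is a placeholder, and the credentials come from the instance profile attached in step 2):

# Sketch of an S3 access check from the EC2 instance; bucket name is a
# placeholder. No keys are configured here: boto3 picks up the instance
# profile credentials automatically.
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="<yourorganisation>.mlflow.data")
for obj in response.get("Contents", []):  # "Contents" is absent for an empty bucket
    print(obj["Key"])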

  6. Once all your resources have been created, SSH into the EC2 and run the following commands to install the required packages:

# Updating all packages
sudo yum update

sudo yum install httpd-tools postgresql

sudo amazon-linux-extras install nginx1

# Installing MLflow and the AWS Python SDK
sudo pip3 install mlflow[extras] psycopg2-binary boto3 --use-feature=2020-resolver
  7. Create a user and password for authenticating:
sudo htpasswd -c /etc/nginx/.htpasswd <username>
  8. Configure nginx as a reverse proxy to port 5000:
sudo vi /etc/nginx/nginx.conf

Add the following location block inside the existing server block of the config file:

location / {
    proxy_pass http://localhost:5000/;
    auth_basic "Restricted Content";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
  9. Enable and start nginx (a quick sanity check is sketched after the commands below):
sudo systemctl enable nginx
sudo systemctl start nginx
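
A rough way to confirm from your laptop that nginx is up and enforcing basic auth (the hostname is a placeholder; at this point an unauthenticated request should be rejected with 401):

# Sketch of a quick nginx check; hostname is a placeholder.
import urllib.error
import urllib.request

try:
    urllib.request.urlopen("http://<ec2-public-dns>/")
except urllib.error.HTTPError as err:
    print(err.code)  # expect 401 Unauthorized while no credentials are supplied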
  10. Create a database named 'mlflow' for the tracking server (a psycopg2-based connectivity check is sketched after the notes below):
psql --username=<username> --host mlflow-rds.deadc0de.us-east-1.rds.amazonaws.com --dbname=postgres

CREATE DATABASE mlflow;
\q

Other useful psql commands:

  • List databases: \l
  • Quit: \q

Note:

  • By default, the Postgres server has 4 databases defined: template0, template1, rdsadmin and postgres
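
Optionally, you can verify connectivity to the new database from the instance with psycopg2 (installed in step 6); the endpoint, username and password below are placeholders:

# Sketch of a connectivity check against the RDS instance; all connection
# details are placeholders taken from the earlier steps.
import psycopg2

conn = psycopg2.connect(
    host="<database-endpoint>",
    port=5432,
    dbname="mlflow",
    user="<username>",
    password="<password>",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()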
  11. Create a systemd service for the MLflow server:
  • Create a directory for the logs such as:
sudo mkdir -p /var/log/mlflow
  • Create a file /etc/systemd/system/mlflow-tracking.service:
[Unit]
Description=MLflow Tracking Server
After=network.target

[Service]
Restart=on-failure
RestartSec=30
StandardOutput=file:/<path/to/your/logging/folder>/stdout.log
StandardError=file:/<path/to/your/logging/folder>/stderr.log
User=root
ExecStart=/usr/local/bin/mlflow server --backend-store-uri postgresql://<username>:<password>@<database-endpoint>:5432/mlflow --artifacts-destination s3://<yourorganization.mlflow.bucket> --serve-artifacts --host 0.0.0.0

[Install]
WantedBy=multi-user.target
  • Enable and start the MLflow server service:
sudo systemctl daemon-reload
sudo systemctl enable mlflow-tracking
sudo systemctl start mlflow-tracking
  • Check that everything worked as expected with the following command:
sudo systemctl status mlflow-tracking
  12. Connect to your newly created MLflow UI by opening the EC2 public IP or DNS name in your browser (over http). A scripted version of this check is sketched below.
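
A minimal scripted version of that check (hostname, username and password are placeholders; with valid credentials the proxy should return the MLflow UI with status 200):

# Sketch of an authenticated check against the nginx proxy; all values are
# placeholders for the EC2 host and the htpasswd credentials from step 7.
import base64
import urllib.request

creds = base64.b64encode(b"<username>:<password>").decode()
req = urllib.request.Request(
    "http://<ec2-public-dns>/",
    headers={"Authorization": f"Basic {creds}"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # expect 200 once the mlflow-tracking service is running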

  13. For easier CLI usage, set the local environment variable MLFLOW_TRACKING_URI to http://EC2-ENDPOINT-URL.

  14. To test the connection from a conda environment (the environment should have mlflow and boto3 installed, e.g. pip install mlflow boto3):

export MLFLOW_TRACKING_USERNAME=<user> 
export MLFLOW_TRACKING_PASSWORD=<password>
export AWS_ACCESS_KEY_ID=<KEY_ID> # Only necessary if connecting directly to S3 store
export AWS_SECRET_ACCESS_KEY=<ACCESS_KEY> # Only necessary if connecting directly to S3 store

python
import mlflow
remote_server_uri = "http://<mlflow_server_ip_or_dns_name>:<port>" # set to your server URI, including the http:// scheme
mlflow.set_tracking_uri(remote_server_uri)
mlflow.set_experiment("/my-experiment")
with mlflow.start_run():
    mlflow.log_param("a", 1)
    mlflow.log_metric("b", 2)
    mlflow.log_artifact("<PATH/TO/FILE>")
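
As an optional follow-up in the same Python session, a rough sketch of reading the run back and listing its artifacts through the tracking server's proxied artifact API (mlflow.last_active_run assumes a reasonably recent MLflow version):

# Follow-up sketch: fetch the run just logged and list its artifacts; they are
# served back through the Tracking Server's HTTP artifact proxy, not read from
# S3 directly by this client.
from mlflow.tracking import MlflowClient

run = mlflow.last_active_run()
print(run.data.params, run.data.metrics)

client = MlflowClient()
for artifact in client.list_artifacts(run.info.run_id):
    print(artifact.path, artifact.file_size)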
