@balvisio
Last active January 29, 2024 17:30
Instructions to create a MLflow server in AWS

Types of MLflow Server Setups

There are multiple variants of the tracking-server setup, which differ in how artifacts are handled (reference: https://mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded):

a. In this scenario, for artifact logging, the MLflow client interacts with the remote Tracking Server and artifact storage host:

  • The MLflow client uses RestStore to send a REST request to fetch the artifact store URI location from the Tracking Server
  • The Tracking Server responds with an artifact store URI location (an S3 storage URI in this case)
  • The MLflow client creates an instance of an S3ArtifactRepository, connects to the remote AWS host using the boto client libraries, and uploads the artifacts to the S3 bucket URI location

Direct Access to Artifact Store

# Starting the MLflow server, don't forget to change the fields in caps
# If you are unfamiliar with nohup, read up on it here: https://man.openbsd.org/nohup.1
# I ran into issues when the password had special characters
nohup mlflow server --backend-store-uri postgresql://postgres:YOURPASSWORD@YOUR-DATABASE-ENDPOINT:5432/mlflow --default-artifact-root s3://YOURORGANISATION.MLFLOW.BUCKET --host 0.0.0.0 &
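
To make the difference concrete, here is a minimal client-side sketch for this direct-access setup; the server address and file name are placeholders, and the key point is that the client machine itself needs AWS credentials (for example via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY or an instance profile) because it uploads artifacts straight to the S3 bucket:

# Client-side sketch for the direct-access setup; tracking URI and file name
# are placeholders. Run metadata goes to the Tracking Server, while the
# artifact is uploaded to S3 by this client via boto3.
import mlflow

mlflow.set_tracking_uri("http://YOUR-TRACKING-SERVER:5000")

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_artifact("model.pkl")  # written directly to the S3 bucket by the client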

b. In this scenario (the one chosen here), the MLflow Tracking Server also acts as a proxy for artifacts. Once configured with the appropriate access requirements, an administrator can start the tracking server so that it performs assumed-role operations for saving, loading, or listing model artifacts, images, documents, and files. This eliminates the need to give end users direct path access to a remote object store (e.g., S3, ADLS, GCS, HDFS) for artifact handling, and the need for end users to provide access credentials to interact with the underlying object store.

  • Artifact logging requests are made by the client using the HttpArtifactRepository, which writes files to the MLflow Tracking Server
  • The Tracking Server then writes these files to the configured object store location with assumed-role authentication
  • Retrieving artifacts from the configured backend store for a user request is done with the same authorized authentication that was configured at server start
  • Artifacts are passed back to the end user by the Tracking Server via the HttpArtifactRepository interface

Proxy Access to Artifact Store
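
As a rough client-side sketch for this proxied setup (server address and file name are placeholders), the client only needs the tracking URI and, in the deployment below, the nginx basic-auth credentials; no AWS credentials are required on the client side:

# Client-side sketch for the proxied-artifact setup; the server address is a
# placeholder. The nginx basic-auth credentials from the steps below can be
# supplied via the MLFLOW_TRACKING_USERNAME / MLFLOW_TRACKING_PASSWORD
# environment variables; no AWS keys are needed here.
import mlflow

mlflow.set_tracking_uri("http://<mlflow-server-ip-or-dns-name>")

with mlflow.start_run():
    mlflow.log_artifact("model.pkl")  # sent over HTTP; the server writes it to S3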

Instructions

  1. Create a new EC2 Service Role and add the relevant S3 permissions to that role. This step only needs to be done once per AWS account. (https://knowledgeacademy.io/how-to-access-aws-s3-buckets-from-ec2-instances/)

    • In the IAM Console, go to "Roles" and click "Create Role"
    • Choose "AWS Service" and "EC2":

    • Add the "AmazonS3FullAccess" permission:

    • Give it a descriptive name and click "Create Role".
  2. Create a new EC2 instance (Amazon Linux 2 AMI, free-tier t2.micro):

    • Have or generate an SSH key pair to access the instance.
    • In the security group settings, allow inbound access on SSH (port 22) and HTTP (port 80).
    • Under the "Advanced details" tab, set "IAM Instance Profile" to the IAM role you created in step 1.
  3. Create a new free-tier PostgreSQL RDS. Call it ‘mlflow-rds’ or a similar name, and type in your desired password.

  4. Add a security group rule to RDS that allows inbound connections from a security group that the EC2 instance belongs to:

    • Go to your EC2 instance, click on the "Security" tab and take a note of the security group the instance belongs to under the "Security groups" section.

    • Go to your RDS instance and under "Connectivity & Security" click on the "VPC Security group".

    • Then click "Edit inbound rules" and "Add rule". For the new rule, set "Type: All traffic" and "Source: Custom", and choose the security group of the EC2 instance noted in the previous step.

  5. Create a new S3 bucket; this will be the artifact storage bucket for our MLflow server. You can pick a name such as <yourorganisation>.mlflow.data, <project>.<yourorganisation>.mlflow.data or similar. You can test that the instance can reach the S3 bucket by running aws s3 ls s3://<bucket.name> from the instance; an equivalent boto3 check is sketched below.
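
A rough equivalent of that check using boto3 (run on the instance; the bucket name is a placeholder, and the credentials come from the instance profile attached in step 2):

# Sketch of an S3 access check from the EC2 instance; bucket name is a
# placeholder. No keys are configured here: boto3 picks up the instance
# profile credentials automatically.
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="<yourorganisation>.mlflow.data")
for obj in response.get("Contents", []):  # "Contents" is absent for an empty bucket
    print(obj["Key"])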

  6. Once all your resources have been created, SSH into the EC2 and run the following commands to install the required packages:

# Updating all packages
sudo yum update

sudo yum install httpd-tools postgresql

sudo amazon-linux-extras install nginx1

# Installing MLflow and the AWS Python SDK
sudo pip3 install mlflow[extras] psycopg2-binary boto3 --use-feature=2020-resolver
  7. Create a user and password for authenticating:
sudo htpasswd -c /etc/nginx/.htpasswd <username>
  8. Configure nginx as a reverse proxy to port 5000:
sudo vi /etc/nginx/nginx.conf

Add the following location block inside the existing server block of the config file:

location / {
    proxy_pass http://localhost:5000/;
    auth_basic "Restricted Content";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
  9. Enable and start nginx (a quick sanity check is sketched after the commands below):
sudo systemctl enable nginx
sudo systemctl start nginx
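
A rough way to confirm from your laptop that nginx is up and enforcing basic auth (the hostname is a placeholder; at this point an unauthenticated request should be rejected with 401):

# Sketch of a quick nginx check; hostname is a placeholder.
import urllib.error
import urllib.request

try:
    urllib.request.urlopen("http://<ec2-public-dns>/")
except urllib.error.HTTPError as err:
    print(err.code)  # expect 401 Unauthorized while no credentials are supplied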
  10. Create a database named 'mlflow' for the tracking server (a psycopg2-based connectivity check is sketched after the notes below):
psql --username=<username> --host mlflow-rds.deadc0de.us-east-1.rds.amazonaws.com --dbname=postgres

CREATE DATABASE mlflow;
\q

Other useful psql commands:

  • List databases: \l
  • Quit: \q

Note:

  • By default, the Postgres server has 4 databases defined: template0, template1, rdsadmin and postgres
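
Optionally, you can verify connectivity to the new database from the instance with psycopg2 (installed in step 6); the endpoint, username and password below are placeholders:

# Sketch of a connectivity check against the RDS instance; all connection
# details are placeholders taken from the earlier steps.
import psycopg2

conn = psycopg2.connect(
    host="<database-endpoint>",
    port=5432,
    dbname="mlflow",
    user="<username>",
    password="<password>",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()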
  11. Create a systemd service for the MLflow server:
  • Create a directory for the logs such as:
sudo mkdir -p /var/log/mlflow
  • Create a file /etc/systemd/system/mlflow-tracking.service:
[Unit]
Description=MLflow Tracking Server
After=network.target

[Service]
Restart=on-failure
RestartSec=30
StandardOutput=file:/<path/to/your/logging/folder>/stdout.log
StandardError=file:/<path/to/your/logging/folder>/stderr.log
User=root
ExecStart=/usr/local/bin/mlflow server --backend-store-uri postgresql://<username>:<password>@<database-endpoint>:5432/mlflow --artifacts-destination s3://<yourorganization.mlflow.bucket> --serve-artifacts --host 0.0.0.0

[Install]
WantedBy=multi-user.target
  • Enable and start the MLflow server service:
sudo systemctl daemon-reload
sudo systemctl enable mlflow-tracking
sudo systemctl start mlflow-tracking
  • Check that everything worked as expected with the following command:
sudo systemctl status mlflow-tracking
  12. Connect to your newly created MLflow UI by opening the EC2 public IP or DNS name in your browser (over http). A scripted version of this check is sketched below.
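
A minimal scripted version of that check (hostname, username and password are placeholders; with valid credentials the proxy should return the MLflow UI with status 200):

# Sketch of an authenticated check against the nginx proxy; all values are
# placeholders for the EC2 host and the htpasswd credentials from step 7.
import base64
import urllib.request

creds = base64.b64encode(b"<username>:<password>").decode()
req = urllib.request.Request(
    "http://<ec2-public-dns>/",
    headers={"Authorization": f"Basic {creds}"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # expect 200 once the mlflow-tracking service is running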

  13. For easier CLI usage, set the local environment variable MLFLOW_TRACKING_URI to http://EC2-ENDPOINT-URL.

  14. To test the connection from a conda environment (the environment should have mlflow and boto3 installed, e.g. pip install mlflow boto3):

export MLFLOW_TRACKING_USERNAME=<user> 
export MLFLOW_TRACKING_PASSWORD=<password>
export AWS_ACCESS_KEY_ID=<KEY_ID> # Only necessary if connecting directly to S3 store
export AWS_SECRET_ACCESS_KEY=<ACCESS_KEY> # Only necessary if connecting directly to S3 store

python
import mlflow
remote_server_uri = "http://<mlflow_server_ip_or_dns_name>:<port>" # set to your server URI, including the http:// scheme
mlflow.set_tracking_uri(remote_server_uri)
mlflow.set_experiment("/my-experiment")
with mlflow.start_run():
    mlflow.log_param("a", 1)
    mlflow.log_metric("b", 2)
    mlflow.log_artifact("<PATH/TO/FILE>")
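
As an optional follow-up in the same Python session, a rough sketch of reading the run back and listing its artifacts through the tracking server's proxied artifact API (mlflow.last_active_run assumes a reasonably recent MLflow version):

# Follow-up sketch: fetch the run just logged and list its artifacts; they are
# served back through the Tracking Server's HTTP artifact proxy, not read from
# S3 directly by this client.
from mlflow.tracking import MlflowClient

run = mlflow.last_active_run()
print(run.data.params, run.data.metrics)

client = MlflowClient()
for artifact in client.list_artifacts(run.info.run_id):
    print(artifact.path, artifact.file_size)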
