There are multiple variants of how the artifacts can be handled (reference: https://mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded):
a. In this scenario, for artifact logging, the MLflow client interacts with the remote Tracking Server and artifact storage host:
- The MLflow client uses RestStore to send a REST request to fetch the artifact store URI location from the Tracking Server
- The Tracking Server responds with an artifact store URI location (an S3 storage URI in this case)
- The MLflow client creates an instance of an S3ArtifactRepository, connects to the remote AWS host using the boto client libraries, and uploads the artifacts to the S3 bucket URI location
# Starting the MLflow server; don't forget to change the fields in caps
# If you are unfamiliar with nohup, read up on it here: https://man.openbsd.org/nohup.1
# I ran into issues when the password had special characters
nohup mlflow server --backend-store-uri postgresql://postgres:YOURPASSWORD@YOUR-DATABASE-ENDPOINT:5432/mlflow --default-artifact-root s3://YOURORGANISATION.MLFLOW.BUCKET --host 0.0.0.0 &
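As a side note, a minimal client-side sketch of this variant might look as follows (the host name and file name are placeholders; the client needs its own AWS credentials, e.g. via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY):
import mlflow

# The client fetches the S3 artifact URI from the Tracking Server and then
# uploads the artifact itself, so AWS credentials must be available locally.
mlflow.set_tracking_uri("http://<tracking-server-host>")

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)
    mlflow.log_artifact("model.pkl")  # placeholder file, uploaded directly to S3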
b. In this scenario (chosen), the MLflow server acts as a proxy for the artifacts. MLflow’s Tracking Server can use the host as a proxy server for operations involving artifacts. Once configured with the appropriate access requirements, an administrator can start the tracking server to enable assumed-role operations involving the saving, loading, or listing of model artifacts, images, documents, and files. This eliminates the need to give end users direct path access to a remote object store (e.g., S3, ADLS, GCS, HDFS) for artifact handling, and removes the need for an end user to provide access credentials to interact with the underlying object store.
- Artifact logging requests are made by the client, which uses the HttpArtifactRepository to write files to the MLflow Tracking Server
- The Tracking Server then writes these files to the configured object store location with assumed-role authentication
- Artifacts requested by a user are retrieved from the configured object store with the same credentials that were configured at server start
- Artifacts are passed back to the end user by the Tracking Server through the same HttpArtifactRepository interface
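A minimal client-side sketch of the proxied variant (host name and file path are placeholders; note that no AWS credentials are needed on the client):
import mlflow

# All traffic, including artifact bytes, goes over HTTP to the Tracking
# Server, which writes to S3 with its own assumed-role credentials.
mlflow.set_tracking_uri("http://<tracking-server-host>")

with mlflow.start_run():
    mlflow.log_artifact("<path/to/file>")  # placeholder path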
- Create a new EC2 service role and add the relevant S3 permissions to that role. This step only needs to be done once per AWS account. (Reference: https://knowledgeacademy.io/how-to-access-aws-s3-buckets-from-ec2-instances/)
- In the IAM Console, go to "Roles" and click "Create Role"
- Choose "AWS Service" and "EC2":
- Add the "AmazonS3FullAccess" permission:
- Give it a descriptive name and click "Create Role".
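If you prefer scripting over the console, a rough boto3 equivalent of this step might look as follows (the role and profile names are examples; run it with credentials that have IAM permissions):
import json
import boto3

iam = boto3.client("iam")

# Trust policy that lets EC2 instances assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(RoleName="mlflow-ec2-role",
                AssumeRolePolicyDocument=json.dumps(trust_policy))
iam.attach_role_policy(RoleName="mlflow-ec2-role",
                       PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess")

# EC2 attaches roles through an instance profile
iam.create_instance_profile(InstanceProfileName="mlflow-ec2-role")
iam.add_role_to_instance_profile(InstanceProfileName="mlflow-ec2-role",
                                 RoleName="mlflow-ec2-role")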
- Create a new EC2 instance (Amazon Linux 2 AMI, free-tier t2.micro):
- Have or generate an SSH key pair to access the instance.
- In the security group settings, allow access via SSH (port 22) and HTTP (port 80).
- Click on the "Advanced details" tab and add in "IAM Instance Profile" the IAM Role you created in step 1.
- Create a new free-tier PostgreSQL RDS instance. Call it ‘mlflow-rds’ or a similar name, and type in your desired password.
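A rough boto3 sketch of the same step (instance class and storage size are example values):
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="mlflow-rds",
    Engine="postgres",
    DBInstanceClass="db.t3.micro",   # example free-tier class
    AllocatedStorage=20,             # GiB
    MasterUsername="postgres",
    MasterUserPassword="<your-password>",
)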
- Add a security group rule to the RDS instance that allows inbound connections from the security group the EC2 instance belongs to:
- Go to your EC2 instance, click on the "Security" tab, and note the security group the instance belongs to under the "Security groups" section.
- Go to your RDS instance and, under "Connectivity & security", click on the VPC security group.
- Click on "Edit inbound rules" and then "Add rule". For the new rule, set "Type: All traffic" and "Source: Custom", then choose the security group of the instance noted in the previous step.
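Scripted with boto3, this rule would look roughly like this (both security group IDs are placeholders; "IpProtocol": "-1" means all traffic):
import boto3

ec2 = boto3.client("ec2")

# Allow all traffic from the EC2 instance's security group into the RDS one
ec2.authorize_security_group_ingress(
    GroupId="<rds-security-group-id>",
    IpPermissions=[{
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": "<ec2-security-group-id>"}],
    }],
)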
- Create a new S3 bucket; this will be the storage bucket for our MLflow server. You can pick a name such as <yourorganisation>.mlflow.data, <project>.<yourorganisation>.mlflow.data, or similar. You can test that the instance can access the S3 bucket by running, from the instance:
aws s3 ls s3://<bucket.name>
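The same check from Python with boto3, once it is installed on the instance (the bucket name is a placeholder):
import boto3

s3 = boto3.client("s3")

# Succeeds only if the instance profile grants S3 access; prints up to
# 1,000 keys from the bucket.
response = s3.list_objects_v2(Bucket="<bucket.name>")
for obj in response.get("Contents", []):
    print(obj["Key"])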
- Once all your resources have been created, SSH into the EC2 instance and run the following commands to install the required packages:
# Updating all packages
sudo yum update
sudo yum install httpd-tools postgresql
sudo amazon-linux-extras install nginx1
# Installing MLflow and the AWS Python SDK
sudo pip3 install mlflow[extras] psycopg2-binary boto3 --use-feature=2020-resolver
- Create a user and password for authentication:
sudo htpasswd -c /etc/nginx/.htpasswd <username>
- Configure nginx to reverse proxy to port 5000:
sudo vi /etc/nginx/nginx.conf
Add the following location block inside the server block of the config file:
location / {
    proxy_pass http://localhost:5000/;
    auth_basic "Restricted Content";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
- Enable and start nginx:
sudo systemctl enable nginx
sudo systemctl start nginx
- Create a database for mlflow named 'mlflow':
psql --username=<username> --host mlflow-rds.deadc0de.us-east-1.rds.amazonaws.com
CREATE DATABASE mlflow;
\q
Other useful psql commands:
- List databases: \l
- Quit: \q
Note:
- By default, the Postgres server has 4 databases defined: template0, template1, rdsadmin and postgres
- Create a systemd service for the MLflow server:
- Create a directory for the logs such as:
sudo mkdir -p /var/log/mlflow
- Create a file /etc/systemd/system/mlflow-tracking.service:
[Unit]
Description=MLflow Tracking Server
After=network.target
[Service]
Restart=on-failure
RestartSec=30
StandardOutput=file:/<path/to/your/logging/folder>/stdout.log
StandardError=file:/<path/to/your/logging/folder>/stderr.log
User=root
ExecStart=/usr/local/bin/mlflow server --backend-store-uri postgresql://<username>:<password>@<database-endpoint>:5432/mlflow --artifacts-destination s3://<yourorganization.mlflow.bucket> --serve-artifacts --host 0.0.0.0
[Install]
WantedBy=multi-user.target
- Enable and start the MLflow server service:
sudo systemctl daemon-reload
sudo systemctl enable mlflow-tracking
sudo systemctl start mlflow-tracking
- Check that everything worked as expected with the following command:
sudo systemctl status mlflow-tracking
- Connect to your newly created MLflow UI by accessing the EC2 public IP or DNS name in your browser (HTTP).
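You can also check the endpoint from Python (assuming the requests package is available; the URL and credentials are placeholders for your own values):
import requests

# nginx should answer 401 without credentials and 200 with them
print(requests.get("http://<EC2-ENDPOINT-URL>").status_code)
print(requests.get("http://<EC2-ENDPOINT-URL>",
                   auth=("<username>", "<password>")).status_code)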
- For easier CLI usage, set the local environment variable MLFLOW_TRACKING_URI to http://EC2-ENDPOINT-URL.
- To test the connection inside a conda environment (the environment should have mlflow and boto3 installed, e.g. pip install mlflow boto3):
export MLFLOW_TRACKING_USERNAME=<user>
export MLFLOW_TRACKING_PASSWORD=<password>
export AWS_ACCESS_KEY_ID=<KEY_ID> # Only necessary if connecting directly to S3 store
export AWS_SECRET_ACCESS_KEY=<ACCESS_KEY> # Only necessary if connecting directly to S3 store
python
import mlflow

remote_server_uri = "http://<mlflow_server_ip_or_dns_name>:<port>"  # set to your server URI
mlflow.set_tracking_uri(remote_server_uri)
mlflow.set_experiment("/my-experiment")
with mlflow.start_run():
    mlflow.log_param("a", 1)
    mlflow.log_metric("b", 2)
    mlflow.log_artifact("<PATH/TO/FILE>")