tsaoyu/slurm_mlflow.md

## slurm_mlflow.md

      
In your job `.sh` file:

```bash
#!/bin/bash

set -e

python sample_experiment.py

# optional: remove the .out file to keep the folder clean
# rm slurm-$SLURM_JOB_ID.out
```


And in your `sample_experiment.py` file:

```python
import os
from mlflow import log_metric, log_param, log_artifact

if __name__ == "__main__":
    # Log a parameter (key-value pair)
    log_param("param1", 5)

    # Log a metric; metrics can be updated throughout the run
    log_metric("foo", 1)
    log_metric("foo", 2)
    log_metric("foo", 3)

    # Log an artifact (output file)
    log_artifact("slurm-" + os.environ['SLURM_JOB_ID'] + ".out")
```

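Then check the MLflow UI (`mlflow ui`): the Slurm output file is attached to the run as an artifact, and with the optional `rm` line enabled the working folder stays clean. Because `log_artifact` copies the file when it is called, only output written up to that point is captured.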
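
The script above raises a `KeyError` when run outside Slurm, since `SLURM_JOB_ID` is not set there. A minimal sketch of a more forgiving variant follows. The helper names `slurm_output_path` and `log_slurm_output` are hypothetical, and the code assumes the default `slurm-<jobid>.out` naming in the job's working directory; a custom `--output` pattern passed to `sbatch` would need a different path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def slurm_output_path(env: Mapping[str, str], directory: str = ".") -> Optional[Path]:
    """Return the expected Slurm output file for this job, or None outside Slurm.

    Assumes the default file name slurm-<jobid>.out (hypothetical helper).
    """
    job_id = env.get("SLURM_JOB_ID")
    if job_id is None:
        return None  # not running under Slurm
    return Path(directory) / f"slurm-{job_id}.out"


def log_slurm_output() -> bool:
    """Log the Slurm output file as an MLflow artifact if it exists.

    Returns True when the file was logged, False when it was skipped.
    """
    path = slurm_output_path(os.environ)
    if path is None or not path.is_file():
        return False
    # Imported here so the path helper can be used without MLflow installed.
    from mlflow import log_artifact

    log_artifact(str(path))
    return True


if __name__ == "__main__":
    log_slurm_output()
```

Calling `log_slurm_output()` at the end of the experiment script, after the last metric is logged, keeps as much of the job output as possible in the artifact, and the same script can still be run locally for debugging.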