@yaythomas
Last active December 29, 2020 16:53
docker run pipeline with dynamic parameters for ML

In the 2nd step, purely for the sake of example and easy testing, I put "echo" in front of each command.

To run for real, delete the echo at the start of each line on ln 27-30.

This pypyr pipeline pretty much does the same thing as https://github.com/huggingface/transformers/blob/master/valohai.yaml

I'm not exactly sure where the /valohai directory comes from - I don't see it in the huggingface repo root or in the pytorch container, so maybe there is a script/generator somewhere that creates this directory. As it is, the first command (download_glue_data.py under /valohai/repository) will fail once you remove the echo, because that directory isn't available.

The parameters input in the first step (ln. 9) is pretty much a straight copy-paste from valohai.yaml. Since this is just an example, I didn't tweak it much beyond showing you the idea. Strictly speaking you don't need the name key: the pipeline only uses the pass-as and default keys to map each argument name to a value. I picked 3 arbitrary parameters, but you can of course include the entire parameters list from valohai.yaml. (Just note that pass-as is only --argname; the ={v} part is unnecessary for pypyr.)
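
To make the pass-as note concrete, here is a minimal Python sketch of trimming a valohai-style entry down to what the pipeline needs. It is not part of the pipeline: the helper name to_pypyr_parameter is hypothetical, and it assumes the valohai.yaml entry uses the --argname={v} form mentioned above.

```python
def to_pypyr_parameter(valohai_param):
    """Reduce a valohai-style parameter dict to the keys the pipeline reads.

    Hypothetical helper, for illustration only: keeps pass-as and default,
    and drops the '={v}' suffix from pass-as.
    """
    # '--model_type={v}' -> '--model_type'
    pass_as = valohai_param['pass-as'].split('=', 1)[0]
    return {'pass-as': pass_as, 'default': valohai_param['default']}


# Assumed shape of one entry as it might appear in valohai.yaml.
valohai_style = {
    'name': 'model_type',
    'pass-as': '--model_type={v}',
    'default': 'bert',
}

pypyr_style = to_pypyr_parameter(valohai_style)
print(pypyr_style)
```

The name key is dropped here only to show it is optional; leaving it in does no harm, since the pipeline never reads it.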

pypyr will

  1. set some default values for you like docker_image and task_name - you can override these from the cmdline
  2. parse the parameters list collection into a flat string list of args
  3. run the docker container, mount the current host directory into the container workspace, and pass the command and flattened list of args to it
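
Steps 2 and 3 can be sketched in plain Python, outside pypyr. The join expression is the same one the pipeline evaluates in its !py string; everything else is a stand-in: the parameters list is written out with model_name already substituted, /path/to/repo is a placeholder for the current directory, and only the last of the four commands is shown.

```python
# The parameters list after defaults are applied (model_name substituted).
parameters = [
    {'name': 'model_type', 'pass-as': '--model_type', 'default': 'bert'},
    {'name': 'model_name_or_path', 'pass-as': '--model_name_or_path',
     'default': 'bert-base-uncased'},
    {'name': 'save_steps', 'pass-as': '--save_steps', 'default': -1},
]

# Step 2: flatten the list into one string of "--arg value" pairs.
parameters_string = ' '.join(
    f"{p['pass-as']} {p['default']}" for p in parameters)

# Step 3: compose the docker run command line.
current_dir = '/path/to/repo'  # placeholder; the pipeline reads $PWD
docker_image = 'pytorch/pytorch:nightly-devel-cuda10.0-cudnn7'
task_name = 'MRPC'
command = (
    'echo python examples/text-classification/run_glue.py --do_train '
    f'--data_dir=/glue_data/{task_name} {parameters_string};'
)
docker_cmd = (
    f"docker run -v {current_dir}:/workspace {docker_image} "
    f"sh -c '{command}'"
)

print(parameters_string)
print(docker_cmd)
```

Because the command is wrapped in single quotes for sh -c, this simple approach assumes no parameter value contains a single quote itself.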

To run this, put the attached docker-run-example.yaml pipeline into your repo working directory and run it from that directory:

# run with defaults
$ pypyr docker-run-example

# override default task_name
$ pypyr docker-run-example task_name=arb-task-name

# override default task_name and model_name (model_name feeds --model_name_or_path)
$ pypyr docker-run-example task_name=arb-task-name model_name=arb-name
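
Roughly, the key=value arguments above become context values, and the defaults step only fills in keys that are still missing. The sketch below is a simplified stand-in for that behavior, not pypyr's actual implementation; parse_key_value_pairs and apply_defaults are hypothetical names.

```python
def parse_key_value_pairs(args):
    """Turn ['k1=v1', 'k2=v2'] into {'k1': 'v1', 'k2': 'v2'}."""
    return dict(arg.split('=', 1) for arg in args)


def apply_defaults(context, defaults):
    """Set each default only when the key is not already in context."""
    for key, value in defaults.items():
        context.setdefault(key, value)
    return context


defaults = {
    'docker_image': 'pytorch/pytorch:nightly-devel-cuda10.0-cudnn7',
    'task_name': 'MRPC',
    'model_name': 'bert-base-uncased',
}

# Equivalent of: pypyr docker-run-example task_name=arb-task-name
context = parse_key_value_pairs(['task_name=arb-task-name'])
context = apply_defaults(context, defaults)

print(context['task_name'])   # value from the command line
print(context['model_name'])  # value from the defaults
```

Anything you don't pass on the command line keeps its default, which is why running with no arguments at all still works.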

docker-run-example.yaml

context_parser: pypyr.parser.keyvaluepairs
steps:
  - name: pypyr.steps.default
    in:
      defaults:
        docker_image: pytorch/pytorch:nightly-devel-cuda10.0-cudnn7
        task_name: MRPC
        model_name: bert-base-uncased
        parameters:
          - name: model_type
            pass-as: --model_type
            default: bert
          - name: model_name_or_path
            pass-as: --model_name_or_path
            default: '{model_name}'
          - name: save_steps
            pass-as: --save_steps
            description: Save checkpoint every X updates steps.
            default: -1
  - name: pypyr.steps.contextsetf
    comment: prepare the command to pass to docker
    in:
      contextSetf:
        parameters_string: !py |-
          ' '.join(f"{p['pass-as']} {p['default']}" for p in parameters)
        command: >-
          echo python /valohai/repository/utils/download_glue_data.py --data_dir=/glue_data;
          echo pip install -e .;
          echo pip install -r examples/requirements.txt;
          echo python examples/text-classification/run_glue.py --do_train --data_dir=/glue_data/{task_name} {parameters_string};
  - name: pypyr.steps.env
    comment: want to mount current working dir into docker workspace
    in:
      env:
        get:
          current_dir: PWD
  - name: pypyr.steps.cmd
    comment: run the docker image, passing the prepared command to it
    in:
      cmd: docker run -v {current_dir}:/workspace {docker_image} sh -c '{command}'
  - name: pypyr.steps.echo
    in:
      echoMe: done!