@cdepillabout · Created April 22, 2020 02:02
shell.nix for Training in GCP AI Platform

shell.nix to enable training ML models remotely with the GCP AI Platform

This gist contains a shell.nix file that can be used to create a Python environment for running training jobs in the GCP AI Platform.

This is specifically for the following tutorial:

https://cloud.google.com/ai-platform/docs/getting-started-keras

This uses code from https://github.com/GoogleCloudPlatform/cloudml-samples in the census/tf-keras directory.

However, this shell.nix file should be adaptable to almost any training job.

How To Use

First run nix-shell to get into the shell.
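
$ nix-shell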

Then, install all required Python packages:

$ pip install -r requirements.txt

If you add additional Python packages to the buildInputs list in the shell.nix file, you should be able to use those system-level packages instead of having pip download them. See the example below.
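
For example, to pull numpy from Nixpkgs (numpy is just an illustration here; any package available in python37Packages works the same way), extend the list like this:

buildInputs = with python37Packages; [
  # System requirements.
  ipython
  numpy # added example package, provided by Nixpkgs instead of pip
  pandas
  readline
  virtualenvwrapper
  # Needed for building certain packages.
  google-cloud-sdk
  libffi
  openssl
  stdenv.cc.cc
];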

Now you should be able to actually run training:

$ python3 -m trainer.task --job-dir local-training-output

The tutorial linked above recommends using the following command; however, it fails when using gcloud from Nixpkgs :-\

$ gcloud ai-platform local train --package-path trainer --module-name trainer.task --job-dir local-training-output

As long as training directly with Python works, you can launch a training job on the GCP AI Platform.

First, you need to login with gcloud using OAuth:

$ gcloud auth login
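
You can verify that the login succeeded by listing the credentialed accounts:

$ gcloud auth list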

Set your default project ID so you don't have to specify it in each command below (inner-melody-274800 is this gist's example project; substitute your own):

$ gcloud config set project inner-melody-274800
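
To double-check which project is active:

$ gcloud config get-value project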

Next, you need to create a bucket to store the trained models:

$ BUCKET_NAME="my-training-example-task-3"
$ REGION="us-central1"
$ gsutil mb -l $REGION gs://$BUCKET_NAME
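
Note that bucket names are globally unique across all of Cloud Storage, so you may need to choose a different BUCKET_NAME. You can confirm the bucket was created with:

$ gsutil ls gs://$BUCKET_NAME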

Finally, actually launch the training job:

$ gcloud ai-platform jobs submit training "my_first_keras_job" --package-path trainer/ --module-name trainer.task --region $REGION --python-version 3.7 --runtime-version 1.15 --job-dir "gs://$BUCKET_NAME/keras-job-dir" --stream-logs
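
The --stream-logs flag follows the job's output until it finishes. If you omit it, you can inspect the job later with the standard describe and stream-logs subcommands:

$ gcloud ai-platform jobs describe my_first_keras_job
$ gcloud ai-platform jobs stream-logs my_first_keras_job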

shell.nix

let
  nixpkgs-src = builtins.fetchTarball {
    # Recent version of nixpkgs master as of 2020-03-30.
    url = "https://github.com/NixOS/nixpkgs/archive/570e3edc8519c666b74a5ca469e1dd286902691d.tar.gz";
    sha256 = "sha256:0aw6rw4r13jij8hn27z2pbilvwzcpvaicc59agqznmr2bd2742xl";
  };

  nixpkgs = import nixpkgs-src { config = { allowUnfree = true; }; };
in

with nixpkgs;

let
  # Native libraries that pip-installed wheels may need to link against.
  lib-path = lib.makeLibraryPath [
    libffi
    openssl
    stdenv.cc.cc
  ];
in

mkShell {
  name = "pip-env";

  buildInputs = with python37Packages; [
    # System requirements.
    ipython
    pandas
    readline
    virtualenvwrapper
    # Needed for building certain packages.
    google-cloud-sdk
    libffi
    openssl
    stdenv.cc.cc
  ];

  src = null;

  shellHook = ''
    # Allow the use of wheels.
    SOURCE_DATE_EPOCH=$(date +%s)

    # Augment the dynamic linker path.
    export "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${lib-path}"

    # Create a virtualenv on first entry, then activate it.
    VENV=venv
    if test ! -d $VENV; then
      virtualenv $VENV
    fi
    source ./$VENV/bin/activate

    # Make the virtualenv's site-packages visible to Python.
    export PYTHONPATH=`pwd`/$VENV/${python.sitePackages}/:$PYTHONPATH
  '';
}