Wide & Deep Models - combine a linear model's memorization of key feature interactions with a DNN's generalization: tf.contrib.learn.DNNLinearCombinedClassifier https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html
Problem: Predicting income with the Census Income Dataset - given census data about a person such as age, gender, education, and occupation (the features), we will try to predict whether or not the person earns more than $50,000 a year (the target label) https://archive.ics.uci.edu/ml/datasets/Census+Income
~48,800 records (32,561 training + 16,281 test)
Our GCP ML-Engine Console: https://console.cloud.google.com/mlengine/jobs?project=$GCP_PROJECT
basis: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/census
- Construct and train a Wide & Deep TensorFlow deep learning model using the high-level tf.contrib.learn.Estimator API.
- Specify a pipeline for staged evaluation: from single-worker training to distributed training without any code changes.
- Leverage Google Cloud Machine Learning Engine to run training jobs & export model binaries for prediction.
Census Income Data Set from the UC Irvine Machine Learning Repository, hosted on Google Cloud Storage:
- Training file: adult.data.csv
- Evaluation file: adult.test.csv
Set up the environment:
export CENSUS_DATA=census_data
export TRAIN_FILE=adult.data.csv
export EVAL_FILE=adult.test.csv
mkdir $CENSUS_DATA
export TRAIN_GCS_FILE=gs://cloudml-public/census/data/$TRAIN_FILE
export EVAL_GCS_FILE=gs://cloudml-public/census/data/$EVAL_FILE
gsutil cp $TRAIN_GCS_FILE $CENSUS_DATA
gsutil cp $EVAL_GCS_FILE $CENSUS_DATA
Miniconda allows running without changing the global Python packages on your system:
- Install Miniconda
- Create a conda environment: conda create --name single-tf python=2.7
- Activate the environment: source activate single-tf
- Install TensorFlow
- Install the Google Cloud SDK (gcloud)
learn_runner creates an Experiment, which executes the model code (Estimator and input functions). Call chain: task.main → learn_runner.run → generate_experiment_fn returns experiment_fn, which returns an Experiment built from:
- model.build_estimator returns DNNLinearCombinedClassifier(model_dir, wide_columns, deep_columns, hidden_units)
- model.generate_input_fn (called twice, for train and eval) returns input_fn -> (features: dict of Tensors, indices: Tensor of label indices)
- model.serving_input_fn returns InputFnOps(features, None, feature_placeholders)
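A minimal sketch of how these pieces wire together, assuming the census sample's layout (a trainer/model.py providing build_estimator, generate_input_fn, and serving_input_fn); names and hyperparameter values are illustrative, not the sample's exact code:

import tensorflow as tf
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils
import model  # the sample's trainer/model.py (assumed)

def generate_experiment_fn(train_files, eval_files, train_steps):
    def experiment_fn(output_dir):
        return tf.contrib.learn.Experiment(
            # build_estimator constructs the DNNLinearCombinedClassifier
            model.build_estimator(output_dir, embedding_size=8,
                                  hidden_units=[100, 70, 49, 34]),
            # generate_input_fn is called twice: once for train, once for eval
            train_input_fn=model.generate_input_fn(train_files),
            eval_input_fn=model.generate_input_fn(eval_files),
            train_steps=train_steps,
            # serving_input_fn controls the exported prediction graph
            export_strategies=[saved_model_export_utils.make_export_strategy(
                model.serving_input_fn)])
    return experiment_fn

# task.main hands the function to learn_runner:
learn_runner.run(generate_experiment_fn(['adult.data.csv'],
                                        ['adult.test.csv'], 1000),
                 output_dir='census_output')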
- Parse arguments with argparse.ArgumentParser: add them with add_argument and extract them as a dict with parse_args (a sketch follows the argument list below).
Arguments:
- train-files - GCS or local path to training data
- num-epochs
- train-batch-size - default=40
- eval-batch-size - default=40
- train-steps - this or num-epochs required
- eval-files - GCS or local path to test data
- embedding-size - number of embedding dimensions for categorical columns. default=8
- first-layer-size - number of nodes in 1st layer of DNN. default=100
- num-layers - default=4
- scale-factor - how quickly layer size should decay. default=0.7
- job-dir - GCS location to write checkpoints and export models
- verbose-logging
- eval-delay-secs - Experiment arg: how long to wait before running first evaluation. default=1
- min-eval-frequency - Experiment arg: minimum number of training steps between evaluations. default=10
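A hedged sketch of the argument parsing and one plausible way hidden_units can be derived from first-layer-size / num-layers / scale-factor (the exact formula in the sample may differ):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--train-files', nargs='+', required=True)
parser.add_argument('--eval-files', nargs='+', required=True)
parser.add_argument('--job-dir', required=True)
parser.add_argument('--train-steps', type=int)
parser.add_argument('--first-layer-size', type=int, default=100)
parser.add_argument('--num-layers', type=int, default=4)
parser.add_argument('--scale-factor', type=float, default=0.7)
args = parser.parse_args()
arg_dict = args.__dict__  # extract a dict

# Each successive layer shrinks by scale-factor; with the defaults this
# yields [100, 70, 49, 34] (floored, never smaller than 2 units).
hidden_units = [
    max(2, int(args.first_layer_size * args.scale_factor ** i))
    for i in range(args.num_layers)
]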
tensorflow.contrib.learn.python.learn.learn_runner runs the Experiment:
- run(experiment_fn, output_dir, schedule)
- uses tf.contrib.learn.RunConfig to parse the TF_CONFIG environment variable set for distributed TensorFlow
generate_experiment_fn:
- Creates an experiment function given hyperparameters. Returns a function (output_dir) -> Experiment, used by learn_runner to create an Experiment. The Experiment executes:
  - model.generate_input_fn functions to gather train & eval inputs
  - an Estimator, built by model.build_estimator, which constructs the model topology
  - model.serving_input_fn, which specifies export strategies to control the prediction graph structure
- args:
  - model_dir - used by the Classifier for checkpoints, summaries, and exports
  - embedding_size - number of dimensions used to represent categorical features when input to the DNN
  - hidden_units - DNN topology
- leverages tensorflow.contrib.layers to ingest input data:
  - layers.sparse_column_with_keys - for categorical columns with known values, specify keys: a list of values
  - layers.sparse_column_with_hash_bucket - for categorical columns with many values, specify hash_bucket_size
  - layers.real_valued_column - continuous base columns (DEEP columns)
  - layers.bucketized_column - continuous columns converted to categorical via bucketization, with a boundaries list
  - layers.crossed_column - WIDE columns - interactions between different categorical features
  - layers.embedding_column - DEEP columns - specify dimension=embedding_size
- returns a DNNLinearCombinedClassifier - ctor params: model_dir, linear_feature_columns=wide_columns, dnn_feature_columns=deep_columns, dnn_hidden_units
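A minimal sketch of build_estimator over a few census columns, assuming the TF 1.x tf.contrib APIs named above; the real sample covers the full schema:

import tensorflow as tf
from tensorflow.contrib import layers

def build_estimator(model_dir, embedding_size=8, hidden_units=None):
    # Categorical column with known values
    gender = layers.sparse_column_with_keys('gender', keys=['female', 'male'])
    # Categorical column with many values
    occupation = layers.sparse_column_with_hash_bucket('occupation',
                                                       hash_bucket_size=100)
    # Continuous base column, plus a bucketized version for the wide part
    age = layers.real_valued_column('age')
    age_buckets = layers.bucketized_column(age,
                                           boundaries=[18, 25, 35, 45, 55, 65])
    wide_columns = [
        gender, occupation, age_buckets,
        layers.crossed_column([age_buckets, occupation],
                              hash_bucket_size=int(1e4)),
    ]
    deep_columns = [
        age,
        layers.embedding_column(occupation, dimension=embedding_size),
    ]
    return tf.contrib.learn.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=hidden_units or [100, 70, 49, 34])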
serving_input_fn builds the input subgraph for prediction. Returns a tf.contrib.learn.input_fn_utils.InputFnOps, a named tuple consisting of:
- features - dict of features to be passed to the Estimator
- labels - None for predictions
- inputs - dict of tf.placeholder for model input fields
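A hedged sketch of such a serving_input_fn; the placeholder names and dtypes are illustrative, not the sample's full schema:

import tensorflow as tf
from tensorflow.contrib.learn.python.learn.utils import input_fn_utils

def serving_input_fn():
    # One placeholder per model input field; prediction requests feed these.
    feature_placeholders = {
        'age': tf.placeholder(tf.float32, [None]),
        'occupation': tf.placeholder(tf.string, [None]),
    }
    # The Estimator expects rank-2 feature tensors, so add a dimension.
    features = {k: tf.expand_dims(v, -1)
                for k, v in feature_placeholders.items()}
    return input_fn_utils.InputFnOps(features, None, feature_placeholders)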
generate_input_fn generates an input function for training or evaluation:
- constructs a filename queue using tf.train.string_input_producer
- uses tf.TextLineReader's read_up_to to read input rows by batch_size
- tf.train.shuffle_batch maintains a buffer for shuffling inputs between batches
- Returns a function () -> (features, indices):
  - features - a dict of Tensors
  - indices - a Tensor of label indices
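A simplified sketch of this queue-based pipeline on a truncated census schema; column names, defaults, and the label encoding are illustrative:

import tensorflow as tf

CSV_COLUMNS = ['age', 'workclass', 'income_bracket']  # truncated schema
CSV_DEFAULTS = [[0.0], [''], ['']]

def generate_input_fn(filenames, batch_size=40, num_epochs=None):
    def input_fn():
        # Queue of input files, recycled num_epochs times.
        filename_queue = tf.train.string_input_producer(
            filenames, num_epochs=num_epochs)
        reader = tf.TextLineReader()
        _, rows = reader.read_up_to(filename_queue, num_records=batch_size)
        # Buffer and shuffle rows between batches.
        rows = tf.train.shuffle_batch(
            [rows], batch_size, capacity=batch_size * 10,
            min_after_dequeue=batch_size * 2, enqueue_many=True)[0]
        columns = tf.decode_csv(rows, record_defaults=CSV_DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))
        label = features.pop('income_bracket')
        # Map label strings to 0/1 indices (raw data uses ' >50K').
        indices = tf.to_int32(tf.equal(label, ' >50K'))
        return features, indices
    return input_fn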
Run the same code locally and on Cloud ML Engine. Train locally:
export TRAIN_STEPS=1000
export OUTPUT_DIR=census_output
rm -rf $OUTPUT_DIR
python trainer/task.py --train-files $CENSUS_DATA/$TRAIN_FILE \
--eval-files $CENSUS_DATA/$EVAL_FILE \
--job-dir $OUTPUT_DIR \
--train-steps $TRAIN_STEPS
Mock running it on the cloud:
export TRAIN_STEPS=1000
export OUTPUT_DIR=census_output
rm -rf $OUTPUT_DIR
gcloud ml-engine local train --package-path trainer \
--module-name trainer.task \
-- \
--train-files $CENSUS_DATA/$TRAIN_FILE \
--eval-files $CENSUS_DATA/$EVAL_FILE \
--job-dir $OUTPUT_DIR \
--train-steps $TRAIN_STEPS
Set up a GCS bucket and grant the Cloud ML service account write access:
export ML_BUCKET=gs://josh-machine-learning
gsutil mb $ML_BUCKET
gcloud ml-engine init-project
export SVCACCT=cloud-ml-service@${GCP_PROJECT}-XXXXX.iam.gserviceaccount.com
gsutil acl ch -u $SVCACCT:WRITE $ML_BUCKET
Note: --job-dir comes before the -- separator when training on the cloud; Cloud ML Engine appends a trial suffix to it to separate output from different trial runs during hyperparameter tuning.
export GCS_JOB_DIR=gs://<my-bucket>/path/to/my/jobs/job3
export JOB_NAME=census
export TRAIN_STEPS=1000
gcloud ml-engine jobs submit training $JOB_NAME \
--runtime-version 1.0 \
--job-dir $GCS_JOB_DIR \
--module-name trainer.task \
--package-path trainer/ \
--region us-central1 \
-- \
--train-files $TRAIN_GCS_FILE \
--eval-files $EVAL_GCS_FILE \
--train-steps $TRAIN_STEPS
Inspect the details of the graph:
tensorboard --logdir=$GCS_JOB_DIR
Accuracy and output: accuracy should be approximately 80%.
Distributed training uses the Distributed TensorFlow TF_CONFIG environment variable, generated using gcloud and parsed to create a ClusterSpec. Specify --scale-tier to pick a predefined cluster tier.
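For illustration, this is roughly what TF_CONFIG looks like on a distributed run; the JSON below is an assumed example, not captured output:

import json
import os
import tensorflow as tf

tf_config = json.loads(os.environ.get('TF_CONFIG', '{}'))
# e.g. {"cluster": {"master": ["host0:2222"],
#                   "ps":     ["host1:2222", "host2:2222"],
#                   "worker": ["host3:2222", "host4:2222", "host5:2222"]},
#       "task": {"type": "worker", "index": 0},
#       "environment": "cloud"}
cluster_spec = tf.train.ClusterSpec(tf_config.get('cluster', {}))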
Run the distributed training code locally
export PS_SERVER_COUNT=2
export WORKER_COUNT=3
export TRAIN_STEPS=1000
export OUTPUT_DIR=census_output
rm -rf $OUTPUT_DIR
gcloud ml-engine local train --package-path trainer \
--module-name trainer.task \
--parameter-server-count $PS_SERVER_COUNT \
--worker-count $WORKER_COUNT \
--distributed \
-- \
--train-files $CENSUS_DATA/$TRAIN_FILE \
--eval-files $CENSUS_DATA/$EVAL_FILE \
--train-steps $TRAIN_STEPS \
--job-dir $OUTPUT_DIR
Run the distributed training job
export SCALE_TIER=STANDARD_1
export GCS_JOB_DIR=gs://<my-bucket>/path/to/my/models/run3
export JOB_NAME=census
export TRAIN_STEPS=1000
gcloud ml-engine jobs submit training $JOB_NAME \
--scale-tier $SCALE_TIER \
--runtime-version 1.0 \
--job-dir $GCS_JOB_DIR \
--module-name trainer.task \
--package-path trainer/ \
--region us-central1 \
-- \
--train-files $TRAIN_GCS_FILE \
--eval-files $EVAL_GCS_FILE \
--train-steps $TRAIN_STEPS
Hyperparameter tuning finds the best hyperparameters (https://cloud.google.com/ml/docs/concepts/hyperparameter-tuning-overview).
Specify a hyperparameter tuning YAML file:
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: accuracy
    maxTrials: 4
    maxParallelTrials: 2
    params:
      - parameterName: first-layer-size
        type: INTEGER
        minValue: 50
        maxValue: 500
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: num-layers
        type: INTEGER
        minValue: 1
        maxValue: 15
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: scale-factor
        type: DOUBLE
        minValue: 0.1
        maxValue: 1.0
        scaleType: UNIT_REVERSE_LOG_SCALE
Add the --config argument:
export HPTUNING_CONFIG=hptuning_config.yaml
export JOB_NAME=census
export TRAIN_STEPS=1000
gcloud ml-engine jobs submit training $JOB_NAME \
--scale-tier $SCALE_TIER \
--runtime-version 1.0 \
--config $HPTUNING_CONFIG \
--job-dir $GCS_JOB_DIR \
--module-name trainer.task \
--package-path trainer/ \
--region us-central1 \
-- \
--train-files $TRAIN_GCS_FILE \
--eval-files $EVAL_GCS_FILE \
--train-steps $TRAIN_STEPS
Run TensorBoard to see the results of the different runs and compare accuracy / AUROC numbers:
tensorboard --logdir=$GCS_JOB_DIR
Once the training job has finished, use the exported model to create a prediction server. First create a model:
gcloud ml-engine models create census --regions us-central1
Then create a version from the GCS path of the exported trained model binaries:
gsutil ls -r $GCS_JOB_DIR/export
Look for a directory named $GCS_JOB_DIR/export/Servo/<timestamp>.
export MODEL_BINARIES=$GCS_JOB_DIR/export/Servo/<timestamp>
gcloud ml-engine versions create v1 --model census --origin $MODEL_BINARIES --runtime-version 1.0
You can now send prediction requests to the API:
gcloud ml-engine predict --model census --version v1 --json-instances ../test.json
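../test.json holds one JSON object per line, keyed by the model's input fields. A hypothetical instance (field names assumed from the census columns; values must match the training data's formatting, including its leading spaces):

{"age": 25, "workclass": " Private", "education": " 11th", "marital_status": " Never-married", "occupation": " Machine-op-inspct", "gender": " Male", "capital_gain": 0, "capital_loss": 0, "hours_per_week": 40, "native_country": " United-States"}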
You should see a response with the predicted labels of the examples. How to interpret the results?
{"probabilities": [0.9962924122810364, 0.003707568161189556], "logits": [-5.593664646148682], "classes": 0, "logistic": [0.003707568161189556]}
https://stackoverflow.com/questions/42827797/how-to-interpret-google-cloud-ml-prediction-results
- probabilities: the probabilities of <= $50K vs > $50K
- classes: the predicted class (0, i.e. <= $50K)
- logits: ln(p/(1-p)) = ln(0.00371/(1-0.00371)) = -5.593
- logistic: 1/(1+exp(-logit)) = 1/(1+exp(5.593)) = 0.0037
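A quick sanity check of the arithmetic above:

import math

logit = -5.593664646148682
logistic = 1.0 / (1.0 + math.exp(-logit))   # ~0.0037076 = P(> $50K)
probabilities = [1.0 - logistic, logistic]  # matches the response above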
For large amounts of data with no latency requirements on receiving prediction results, submit a batch prediction job to the API. This requires the data to be stored in GCS.
export JOB_NAME=census_prediction
gcloud ml-engine jobs submit prediction $JOB_NAME \
--model census \
--version v1 \
--data-format TEXT \
--region us-central1 \
--input-paths gs://cloudml-public/testdata/prediction/census.json \
--output-path $GCS_JOB_DIR/predictions
Check the status of the prediction job:
gcloud ml-engine jobs describe $JOB_NAME
After the job state is SUCCEEDED, check the results in --output-path.