@yptheangel
Created August 9, 2019 07:05
apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "inception-train-job"
spec:
  replicaSpecs:
    - replicas: 4
      tfReplicaType: WORKER
      template:
        spec:
          containers:
            - image: banzaicloud/tensorflow-inception-example:v0.1
              name: tensorflow
              command: ["bazel-bin/inception/imagenet_distributed_train"]
              args: ["--batch_size=32", "--num_gpus=0", "--data_dir=/my-pv/image-data", "--train_dir=/my-pv/train"]
              volumeMounts:
                - name: pvc-efs
                  mountPath: "/my-pv"
          volumes:
            - name: pvc-efs
              persistentVolumeClaim:
                claimName: pvc-1
          restartPolicy: OnFailure
    - replicas: 2
      tfReplicaType: PS
  tensorboard:
    logDir: /my-pv/train
    serviceType: LoadBalancer
    volumes:
      - name: pvc-efs
        persistentVolumeClaim:
          claimName: pvc-1
    volumeMounts:
      - name: pvc-efs
        mountPath: "/my-pv"
  terminationPolicy:
    chief:
      replicaName: WORKER
      replicaIndex: 0