TensorFlow Distributed Training on Kubeflow with TFJob
Name:         mnist-tensorflow-job
Namespace:    default
Labels:       <none>
Annotations:
API Version:  kubeflow.org/v1
Kind:         TFJob
Metadata:
  Creation Timestamp:  2020-07-18T18:54:31Z
  Generation:          1
  Resource Version:    43041
  Self Link:           /apis/kubeflow.org/v1/namespaces/default/tfjobs/mnist-tensorflow-job
  UID:                 0b5b088f-0690-4089-8b18-1b4188eb345a
Spec:
  Tf Replica Specs:
    PS:
      Replicas:        1
      Restart Policy:  Never
      Template:
        Metadata:
          Annotations:
            sidecar.istio.io/inject:  false
        Spec:
          Containers:
            Image:  docker.io/<DOCKER_HUB_USERNAME>/tf-dist-mnist-test:1.0
            Name:   tensorflow
    Worker:
      Replicas:        2
      Restart Policy:  Never
      Template:
        Metadata:
          Annotations:
            sidecar.istio.io/inject:  false
        Spec:
          Containers:
            Image:  docker.io/<DOCKER_HUB_USERNAME>/tf-dist-mnist-test:1.0
            Name:   tensorflow
Status:
  Completion Time:  2020-07-18T18:56:16Z
  Conditions:
    Last Transition Time:  2020-07-18T18:54:31Z
    Last Update Time:      2020-07-18T18:54:31Z
    Message:               TFJob mnist-tensorflow-job is created.
    Reason:                TFJobCreated
    Status:                True
    Type:                  Created
    Last Transition Time:  2020-07-18T18:54:36Z
    Last Update Time:      2020-07-18T18:54:36Z
    Message:               TFJob mnist-tensorflow-job is running.
    Reason:                TFJobRunning
    Status:                False
    Type:                  Running
    Last Transition Time:  2020-07-18T18:56:16Z
    Last Update Time:      2020-07-18T18:56:16Z
    Message:               TFJob mnist-tensorflow-job successfully completed.
    Reason:                TFJobSucceeded
    Status:                True
    Type:                  Succeeded
  Replica Statuses:
    PS:
      Succeeded:  1
    Worker:
      Succeeded:  2
  Start Time:     2020-07-18T18:54:31Z
Events:
  Type    Reason                   Age    From         Message
  ----    ------                   ----   ----         -------
  Normal  SuccessfulCreatePod      5m6s   tf-operator  Created pod: mnist-tensorflow-job-worker-0
  Normal  SuccessfulCreatePod      5m6s   tf-operator  Created pod: mnist-tensorflow-job-worker-1
  Normal  SuccessfulCreateService  5m6s   tf-operator  Created service: mnist-tensorflow-job-worker-0
  Normal  SuccessfulCreateService  5m6s   tf-operator  Created service: mnist-tensorflow-job-worker-1
  Normal  SuccessfulCreatePod      5m5s   tf-operator  Created pod: mnist-tensorflow-job-ps-0
  Normal  SuccessfulCreateService  5m5s   tf-operator  Created service: mnist-tensorflow-job-ps-0
  Normal  ExitedWithCode           3m21s  tf-operator  Pod: default.mnist-tensorflow-job-worker-0 exited with code 0
  Normal  TFJobSucceeded           3m21s  tf-operator  TFJob mnist-tensorflow-job successfully completed.
  Normal  SuccessfulDeletePod      3m21s  tf-operator  Deleted pod: mnist-tensorflow-job-worker-1
  Normal  SuccessfulDeleteService  3m21s  tf-operator  Deleted service: mnist-tensorflow-job-worker-1
  Normal  SuccessfulDeletePod      3m20s  tf-operator  Deleted pod: mnist-tensorflow-job-ps-0
  Normal  SuccessfulDeleteService  3m20s  tf-operator  Deleted service: mnist-tensorflow-job-ps-0
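
The Conditions list in the output above is how the job's state is reported: each entry has a type and a status, and here Running has gone back to False while Succeeded is True. A minimal Python sketch of reading that list is shown below. It assumes the conditions have been loaded as dictionaries with the JSON field names `type`, `status`, and `lastTransitionTime` (for example from `kubectl get tfjob mnist-tensorflow-job -o json`); the helper name `current_condition` is hypothetical and not part of any Kubeflow API.

```python
from typing import Dict, List, Optional


def current_condition(conditions: List[Dict[str, str]]) -> Optional[str]:
    """Return the type of the most recent condition whose status is "True"."""
    active = [c for c in conditions if c.get("status") == "True"]
    if not active:
        return None
    # Timestamps share one ISO-8601 UTC format, so string order is time order.
    return max(active, key=lambda c: c["lastTransitionTime"])["type"]


# Values copied from the describe output above; the dict layout is an assumption.
conditions = [
    {"type": "Created", "status": "True", "lastTransitionTime": "2020-07-18T18:54:31Z"},
    {"type": "Running", "status": "False", "lastTransitionTime": "2020-07-18T18:54:36Z"},
    {"type": "Succeeded", "status": "True", "lastTransitionTime": "2020-07-18T18:56:16Z"},
]

print(current_condition(conditions))  # prints "Succeeded"
```

Filtering on status first matters because the operator keeps the Running entry in the list after the job finishes and only flips its status to False.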
apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "mnist-tensorflow-job"
spec:
  tfReplicaSpecs:
    PS:
      replicas: 1
      restartPolicy: Never
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: tensorflow
              image: docker.io/<DOCKER_HUB_USERNAME>/tf-dist-mnist-test:1.0
    Worker:
      replicas: 2
      restartPolicy: Never
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: tensorflow
              image: docker.io/<DOCKER_HUB_USERNAME>/tf-dist-mnist-test:1.0
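
For a manifest like the one above, the TFJob operator passes the cluster layout to each replica through the `TF_CONFIG` environment variable, which the training code in the image can read to find its own role. The following is a rough sketch of that read, not the code inside `tf-dist-mnist-test`. The function name `read_tf_config` is hypothetical, and the host addresses in the sample value are illustrative assumptions modeled on the service names in the events log; the real addresses and port depend on the operator version and cluster.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional, Tuple


def read_tf_config(
    env: Mapping[str, str] = os.environ,
) -> Tuple[Dict[str, Any], Optional[str], Optional[int]]:
    """Parse TF_CONFIG and return (cluster, task type, task index)."""
    cfg = json.loads(env.get("TF_CONFIG", "{}"))
    task = cfg.get("task", {})
    return cfg.get("cluster", {}), task.get("type"), task.get("index")


# Hypothetical TF_CONFIG for worker 0 of a 1 PS / 2 worker job (addresses assumed).
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "ps": ["mnist-tensorflow-job-ps-0.default.svc:2222"],
            "worker": [
                "mnist-tensorflow-job-worker-0.default.svc:2222",
                "mnist-tensorflow-job-worker-1.default.svc:2222",
            ],
        },
        "task": {"type": "worker", "index": 0},
    })
}

cluster, task_type, task_index = read_tf_config(example_env)
print(task_type, task_index, len(cluster["worker"]))  # prints "worker 0 2"
```

When `TF_CONFIG` is unset, as in a local run outside the cluster, the function returns an empty cluster and `None` for the task fields, so the caller can fall back to single-process training.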