apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  annotations:
  name: nccl-allreduce-job0
spec:
  minAvailable: 0
  plugins:
    ssh: []
    svc: []
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCv++INMpsLoaSg7IMdgPLandO3sEJl1FBRUOH3ziXlGPuNXevnd0qz9SLknNQ/MKSGn6mBjrh/nfdqz53y4QYUpA57/cxBlgqk1EW9OsRM4daQnNi1aFL/oXb5ZwKuUiuBlC37QgDTO+RBHphkyKJneQdtWpD5WlqgEDSbXuW1ScHCCBz09eOkWGR2b2CmM9b9IVIxLpV6FnCROK3Pn39OL2U0kA8UHu1q6gJhxdP+gBVMXMYsKyFL3t8yPaQ0khLOAP8i3CIFB3hivP9n5IZ24s6BV46kOq/fvTAG3rC87L8SYFjWz/rLX4NzfbGwDn/ylRdwf4xxPgv0ettrQLRiREETrmOZQQqp6siIzP9kovo0KqXyOHsl8XqUGPpo1YLzxvJLeO1rDxdf3KyuvdDEAG9QKXkxhhwnaEsNC0jWQRLge4hjrdFyRf5MvpGRt5bs0uh2HqvuEneZlvRXwUUN/gnpLhT6B7tdMbF3Y75JfLCQlFrYmQ3XlYe5Ztzk+SWGZ2uDVDODLFArevb6xGg8V9AvcwPpF2bnqlfQQ9L1St0dBvhMqPjNAr3ac0y0sRjyFEAvXCt2OZtUJ9u65Uvr0Or2cfpQOY9DacLLQAMAtOnBr8FKoFejhbbbXga9mok9vrjRACSoLUwVOlBPjjnxQ7FkgcKcZKqqgz3lG9Q8bw== test@test.com
This file has been truncated, but you can view the full file.
Warning: Permanently added 'nccl-alltoall-job-oguz-mpiworker-0.nccl-alltoall-job-oguz' (ED25519) to the list of known hosts.
Warning: Permanently added 'nccl-alltoall-job-oguz-mpiworker-1.nccl-alltoall-job-oguz' (ED25519) to the list of known hosts.
Warning: Permanently added 'nccl-alltoall-job-oguz-mpiworker-2.nccl-alltoall-job-oguz' (ED25519) to the list of known hosts.
Warning: Permanently added 'nccl-alltoall-job-oguz-mpiworker-4.nccl-alltoall-job-oguz' (ED25519) to the list of known hosts.
Warning: Permanently added 'nccl-alltoall-job-oguz-mpiworker-3.nccl-alltoall-job-oguz' (ED25519) to the list of known hosts.
Warning: Permanently added 'nccl-alltoall-job-oguz-mpiworker-5.nccl-alltoall-job-oguz' (ED25519) to the list of known hosts.
Warning: Permanently added 'nccl-alltoall-job-oguz-mpiworker-6.nccl-alltoall-job-oguz' (ED25519) to the list of known hosts.
Warning: Permanently added 'nccl-alltoall-job-oguz-mpiworker-12.nccl-alltoall-job-oguz' (ED25519) to the list of known hosts.
Warning: Permanently ad
<system version="1">
  <cpu numaid="0" affinity="00000000,000000ff,ffffffff,ffff0000,00000000,00ffffff,ffffffff" arch="x86_64" vendor="GenuineIntel" familyid="6" modelid="143">
    <pci busid="0000:08:00.0" class="0x060400" vendor="0x1000" device="0xc030" subsystem_vendor="0x1000" subsystem_device="0x0072" link_speed="5.0 GT/s PCIe" link_width="16">
      <pci busid="0000:0d:00.0" class="0x060400" vendor="0x1000" device="0xc030" subsystem_vendor="0x1000" subsystem_device="0x1003" link_speed="32.0 GT/s PCIe" link_width="16">
        <pci busid="0000:0f:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="16">
          <gpu dev="0" sm="90" rank="0" gdr="1">
            <nvlink target="0000:1c:00.0" count="5" tclass="0x068000"/>
            <nvlink target="0000:1b:00.0" count="5" tclass="0x068000"/>
            <nvlink target="0000:1a:00.0" count="4" tclass="0x068000"/>
            <nvlink target="0000:1d:00.0" count="
#!/bin/bash
set -ex
wget -O /home/ubuntu/oracle-cloud-agent_1.38.0-4_amd64.snap https://objectstorage.us-phoenix-1.oraclecloud.com/p/-EYKOzTNCQWpvJzwhH6KHGewyHYL47IuDnx3PHqwkmdoThKQEzlx_SJRjhpjTUpz/n/imagegen/b/agent_test/o/1.38.0/3/oracle-cloud-agent_1.38.0-4_amd64.snap
sudo snap stop oracle-cloud-agent
sudo snap install --classic --dangerous /home/ubuntu/oracle-cloud-agent_1.38.0-4_amd64.snap
sudo snap start oracle-cloud-agent
NODE_POOL_NAME=
NODE_POOL_SIZE=
NODE_POOL_BOOT_VOLUME_SIZE_IN_GB=
NODE_IMAGE_ID=
CLUSTER_ID=
COMPARTMENT_ID=
NODE_SHAPE=
oci ce node-pool create \
  --cluster-id $CLUSTER_ID \
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: nccl-test-a100
spec:
  slotsPerWorker: 8
  runPolicy:
    cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: nccl-test-a100
spec:
  slotsPerWorker: 8
  runPolicy:
    cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-version-check
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-version-check
    image: nvidia/cuda:11.7.1-base-ubuntu20.04
    command: ["nvidia-smi"]
#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------
# Use always fully qualified pathnames, please
# Path to a log file. If the file already exists, the log output
# will be appended.
# If empty, a log file will be created in <SGE_ROOT>/<SGE_CELL>/common
# The file needs to be writable by the admin user