# AWS P2 instance type has K80 GPUs, compute capability 3.7
# AWS P3 instance type has V100 GPUs, compute capability 7.0
build:cuda --action_env=TF_CUDA_COMPUTE_CAPABILITIES=3.7,7.0
build:cuda --action_env=TF_CUDA_VERSION=10
build:cuda --action_env=TF_CUDNN_VERSION=7
# manually installed CUDA and cuDNN
build:cuda --action_env=TF_CUDA_PATHS=/usr/local/cuda-10.2.89,/usr/local/cudnn-10.2-7.6.5.32
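With a `build:cuda` section like the above in your `.bazelrc`, a CUDA-enabled build can then be kicked off roughly like this (a sketch; the exact Bazel target depends on the TensorFlow version you're building):

```shell
# Pick up the build:cuda action_env settings defined above.
bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package

# Package the result into a wheel under /tmp/tensorflow_pkg.
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
```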
(I also reported this to AWS: awsdocs/amazon-ec2-user-guide#123)
The "Running Amazon Linux 2 as a virtual machine on-premises" page describes a fairly cumbersome way of running Amazon Linux 2 in local virtual machines: using various tools to provision ISO 9660 `seed.iso` files just to serve the VM instance two small data files.

It would be great if the documentation also pointed out that, since the VM provisioning is done with `cloud-init`, and the image is configured with a fairly extensive `datasource_list: [ NoCloud, AltCloud, ConfigDrive, OVF, None ]` which starts with `NoCloud`, the `NoCloud` datasource also allows you to serve these files over HTTP.
There are two easy ways of using network configuration instead of `seed.iso`: either you tell GRUB to add a parameter to the kernel boot configuration, or you tell KVM/VMware/VirtualBox to set the virtual machine's SMBIOS serial number to something which `cloud-init`'s `NoCloud` datasource understands.
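As a sketch of the two variants (the host address, port, and directory layout are placeholders; `user-data` and `meta-data` are the two files `cloud-init` expects to fetch):

```shell
# Serve user-data and meta-data over HTTP from the directory containing them:
python3 -m http.server 8000

# Variant 1: kernel command line, added to the GRUB boot entry
# (the trailing slash on the URL matters):
#   ds=nocloud-net;s=http://192.168.122.1:8000/

# Variant 2: SMBIOS serial number, e.g. with QEMU/KVM
# (the semicolon must be escaped from the shell):
qemu-system-x86_64 \
  -smbios "type=1,serial=ds=nocloud-net;s=http://192.168.122.1:8000/" \
  -drive file=amzn2.qcow2,if=virtio
```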
The documentation page https://cloudinit.readthedocs.
date | title | tags
---|---|---
2020-02-29 | Proper CUDA and cuDNN installation |
You're here, so you're probably already hurting because of CUDA and cuDNN compatibility issues, and I won't have to motivate you or explain why you'd want standalone CUDA and cuDNN installations if you're going to develop with TensorFlow in the long term.
#! /bin/sh
# Workaround for broken CUDA/NVENC/NVDEC after suspend.
# This file now lives in /lib/systemd/system-sleep/nvidia-resume .
# Remember to chmod 755 it.
PATH=/sbin:/usr/sbin:/bin:/usr/bin
case $1 in
    post)
        # Reload the UVM module so CUDA contexts can be re-created after resume.
        rmmod nvidia_uvm && modprobe nvidia_uvm
        ;;
esac
import logging
import os
import pathlib

import tensorflow as tf

## --------------------------------------------------------------------------------
# Configure
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "No GPUs found"
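If the machine has several GPUs, one way to control what `list_physical_devices('GPU')` reports is the standard `CUDA_VISIBLE_DEVICES` environment variable. A minimal sketch, with an illustrative device index; it must be set before TensorFlow (or any other CUDA library) initializes:

```python
import os

# Expose only the first GPU to CUDA applications; set this before
# importing or initializing TensorFlow.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# An empty string hides all GPUs and forces CPU-only execution:
# os.environ["CUDA_VISIBLE_DEVICES"] = ""
```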
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=781288;msg=37

$ apt-get remove light-locker
docker run --gpus all -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorflow:19.10-py3 nvidia-smi
Environment: AWS p2.xlarge instance with a K80 GPU and CUDA compute capability 3.7.
Error: "Ignoring visible gpu device [...] The minimum required Cuda capability is 6.0."
Expected: The minimum CUDA compute capability to be 3.5, as the documentation states and as the Python packages require.
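The rejection itself is just a version comparison between the GPU's compute capability and the minimum the binary was built for. A hypothetical sketch of that check (the names here are illustrative, not TensorFlow internals):

```python
# Hypothetical sketch of the capability check a prebuilt binary performs.
MIN_CAPABILITY = (6, 0)  # what this particular build requires
K80 = (3, 7)             # AWS P2 / Tesla K80
V100 = (7, 0)            # AWS P3 / Tesla V100

def gpu_usable(capability, minimum=MIN_CAPABILITY):
    # Tuples compare element-wise, so (3, 7) < (6, 0) holds.
    return capability >= minimum

print(gpu_usable(K80))   # → False: the K80 is ignored by this build
print(gpu_usable(V100))  # → True
```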
[100.079s][info][gc,stats ] === Garbage Collection Statistics =======================================================================================================================
[100.079s][info][gc,stats ]                                                       Last 10s        Last 10m        Last 10h        Total
[100.079s][info][gc,stats ]                                                       Avg / Max       Avg / Max       Avg / Max       Avg / Max
[100.079s][info][gc,stats ]   Collector: Garbage Collection Cycle             0.000 / 0.000   55.908 / 77.566  55.908 / 77.566  55.908 / 77.566   ms
[100.079s][info][gc,stats ]  Contention: Mark Segment Reset Contention            0 / 0           0 / 0           0 / 0           0 / 0        ops/s
[100.079s][info][gc,stats ]  Contention: Mark SeqNum Reset Contention             0 / 0           0 / 0           0 / 0