@jacobtomlinson
Created March 25, 2021 16:43
Monitoring RAPIDS with Prometheus and Grafana (configs)
version: "3.9"
services:
  rapids:
    image: rapidsai/rapidsai:0.18-cuda11.0-runtime-ubuntu16.04-py3.8
    ports:
      - "8888:8888" # Jupyter
      - "8786:8786" # Dask communication
      - "8787:8787" # Dask dashboard
    environment:
      JUPYTER_FG: "true"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus/:/etc/prometheus/
    ports:
      - "9090:9090"
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    volumes:
      - '/:/host:ro,rslave'
  gpu_exporter:
    image: nvcr.io/nvidia/k8s/dcgm-exporter:2.0.13-2.1.2-ubuntu18.04
    ports:
      - "9400:9400"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
  grafana:
    image: grafana/grafana:latest
    volumes:
      - ./grafana:/var/lib/grafana
    ports:
      - "3000:3000"
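
# Prometheus configuration. The Compose file mounts ./prometheus/ at
# /etc/prometheus/, and the prom/prometheus image reads
# /etc/prometheus/prometheus.yml by default, so save this as
# ./prometheus/prometheus.yml.
# Replace <IP> with the address of the Docker host.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: rapids
    static_configs:
      - targets: ['<IP>:8787'] # Dask dashboard port
      - targets: ['<IP>:9100'] # node_exporter (default port, host network)
      - targets: ['<IP>:9400'] # gpu_exporter (DCGM exporter)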
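
The configuration above scrapes all three endpoints under the single job name rapids, so every series carries the same job label. As a possible variation (a sketch, not part of the original gist), each endpoint could be given its own job so that Dask, host, and GPU metrics can be told apart by the job label in Prometheus and Grafana. The job names dask, node, and gpu are assumptions chosen for illustration, and <IP> is still a placeholder for the Docker host's address.

```yaml
global:
  scrape_interval: 15s
scrape_configs:
  # Job names below are illustrative and do not come from the original config.
  - job_name: dask   # Dask dashboard port exposed by the rapids service
    static_configs:
      - targets: ['<IP>:8787']
  - job_name: node   # node_exporter, running on the host network
    static_configs:
      - targets: ['<IP>:9100']
  - job_name: gpu    # DCGM exporter from the gpu_exporter service
    static_configs:
      - targets: ['<IP>:9400']
```

With this layout, a query such as up{job="gpu"} would show whether the GPU exporter alone is reachable. The trade-off is a slightly longer file compared with the single-job version.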