Setting up a GPU server with scheduling and containers

By Mark (Markus92), Health-RI, Netherlands. Gist serversetup.md, last active June 18, 2024 08:51.

Our group recently acquired a new server for deep learning: a SuperMicro 4029GP-TRT2 stuffed with eight NVIDIA RTX 2080 Ti cards. It may seem a bit overpowered, but with upcoming networks such as BigGAN and fully 3D networks, and with new students joining our group, this machine will see heavy use in the future.

One challenge is how to manage these GPUs. There are many approaches, but because most PhD candidates aren't sysadmins, they tend to range from a 'free-for-all', where one person can hog all GPUs for weeks because of a bug in their code, to Excel sheets that no one understands or adheres to, because changing GPU IDs in code is hard. This leads to a lot of frustration, low productivity and under-utilisation of these expensive servers. Another issue is conflicting software versions. TensorFlow and Keras, for example, introduce breaking API changes every now and then. As t