Deep Learning InfraOps without Tears
This is an introduction to managing infrastructure (Infra) and system administration operations (Ops), particularly for deep learning applications.
Why do I have to learn this?
Q: I have my Jupyter notebooks and virtual machines that come with all batteries included. Why do I have to handle bare-metal machines?
A: You don't (if you are not interested). In most cases, the standard all-inclusive virtual machines, e.g. AWS SageMaker or Azure Machine Learning Studio, are enough. But these are technically "managed" virtual machines, and when something goes wrong after a VM has been running for a couple of months, fixing it often requires the same infra-ops knowledge.