In every machine learning project we have to continuously tweak and experiment with our models. This is necessary not only to further improve performance, but also to explore underlying model characteristics. These constant experiments require rigorous logging and performance tracking. Hence, various providers have developed solutions to facilitate this tracking, such as TensorBoard, Comet, and W&B, among others. Here at Apoidea we use W&B.
In this blog post we would like to give a practical overview of how we run machine learning experiments and track their performance. Specifically, we cover how we quickly set up clusters in the cloud and train our models. We hope this helps others, and that engaging in a discussion with the wider machine learning community will in turn improve our own practices.
Within this post we will outline: