{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
},
"colab": {
"name": "Using Aquamentus to Train Neural Networks",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/kylekyle/244e9686c0c35242e5d8f14c3e83e5f3/using-aquamentus-to-train-neural-networks.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5sSoH52pc3qK",
"colab_type": "text"
},
"source": [
"# Using Aquamentus to Train Neural Networks\n",
"\n",
"<small><i>Kyle King - April 9, 2020. Thanks to Dan Hawthorne for building and supporting the cluster!</i></small><br>\n",
"\n",
"Aquamentus a joint endeavor between the Cyber and Robotics research centers at West Point. It is a cluster of four [NVIDIA TITAN RTX](https://www.nvidia.com/en-us/deep-learning-ai/products/titan-rtx/) graphic cards that can quickly train large models, like Xception.\n",
"\n",
"This notebook will show you how to train a neural network on Aquamentus. We'll use `keras` with a `tensorflow` backend to train a simple text generation model based [on the Cyberspace Solarium Commission's final report](https://www.solarium.gov/). \n",
"\n",
"At the time this notebook was written, each epoch of training took 3-4 minutes on a personal laptop compared to 15 seconds on a single Aquamentus GPU. This notebook will show you how to use all four GPUs. However, there is a quite a bit of overhead in using multiple GPUs so I recommend sticking to a single GPU unless your model is really large. \n",
"\n",
"**IMPORTANT:** Make sure that you install `tensorflow-gpu`, not `tensorflow`, for GPU support. \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "z7Plq9K5U_7s",
"colab_type": "text"
},
"source": [
"## Download\n",
"\n",
"Let's start by downloading the corpus and using Keras's text preprocessing library to normalize and create an array of words:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "52eYSkArU_7t",
"colab_type": "code",
"colab": {}
},
"source": [
"import requests\n",
"import tensorflow\n",
"\n",
"from tensorflow import keras\n",
"from collections import Counter\n",
"from tensorflow.keras.preprocessing.text import Tokenizer, text_to_word_sequence\n",
"\n",
"url = 'https://drive.google.com/uc?export=download&id=1Ke0h_WWOsndoQLBFutNBfZg0Qkksi-DD'\n",
"response = requests.get(url)\n",
"\n",
"# tokenize while preserving periods \n",
"filters = Tokenizer().filters.replace('.','')\n",
"words = text_to_word_sequence(response.text, filters=filters)\n",
"\n",
"# filter out words that only appear once\n",
"counts = Counter(words)\n",
"words = [word for word in words if counts[word] > 1]\n",
"\n",
"print('The final report is', len(words), \"words\")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "6zBR8Pf6U_7x",
"colab_type": "text"
},
"source": [
"## Vectorize\n",
"\n",
"Next, we will generate our inputs and labels. Inputs are sequences of words from the corpus and labels are the word that should follow the sequence. For example, if we were using sequences of 4 words, then one input would be: \n",
"\n",
"```python\n",
"sample_input = ['you', 'spend', 'your', 'whole']\n",
"sample_label = 'career'\n",
"```\n",
"\n",
"Of course, we will use numbers instead of words and our labels will be probability distributions over all possible words. For the example above, the label will be an array of zeros except for the entry that corresponds to the word `career`, which would be a `1`. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "rSUfyQXHHBi2",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"\n",
"# create a tokenizer for our corpus\n",
"tokenizer = Tokenizer(filters=filters)\n",
"tokenizer.fit_on_texts(words)\n",
"\n",
"# compute the average sequence length\n",
"sequence_length = int(len(words) / counts['.'])\n",
"\n",
"# add one for index zero, which is a reserved index\n",
"num_unique_words = len(tokenizer.word_index) + 1\n",
"\n",
"# convert words to numbers \n",
"nums = tokenizer.texts_to_sequences([' '.join(words)])[0]\n",
"\n",
"# grab all sequences of `sequence_length`\n",
"# drop the last few so we don't have to worry about padding. \n",
"x = np.array([nums[i:i+sequence_length] for i in range(len(words)-sequence_length)])\n",
"\n",
"# we actually need y to be a prob dist over all possible words ...\n",
"y = np.zeros((len(words)-sequence_length, num_unique_words), dtype=np.bool)\n",
"\n",
"# set the correct index in y to 1\n",
"for i in range(0,len(nums)-sequence_length):\n",
" y[i][nums[i+sequence_length]] = 1"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "FtXhOcP9U_71",
"colab_type": "text"
},
"source": [
"## Model\n",
"\n",
"To utilize multiple GPUs, you need a [distribution strategy](https://www.tensorflow.org/tutorials/distribute/keras). We'll use `MirroredStrategy` which divides each batch by the number of GPUs, trains, then combines and syncs results across all GPUs. \n",
"\n",
"It is important that you declare and fit your model in the distributed scope. Note that you **cannot save your model** in the distributed scope.\n",
"\n",
"The [TensorFlow team recommends](https://www.tensorflow.org/tutorials/distribute/keras#setup_input_pipeline) using the largest batch size that will fit in GPU memory since moving data between system memory and GPU memory is expensive. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "18fh_FMuU_72",
"colab_type": "code",
"colab": {}
},
"source": [
"from tensorflow.keras import Model, Sequential\n",
"from tensorflow.keras.models import load_model\n",
"from tensorflow.keras.layers import *\n",
"\n",
"# a big batch is better for GPUs, but you should probably \n",
"# add a learning rate schedule to compensate ...\n",
"batch_size = 256\n",
"embedding_dims = 64\n",
"\n",
"strategy = tensorflow.distribute.MirroredStrategy()\n",
"num_gpus = strategy.num_replicas_in_sync\n",
"\n",
"print('Number of GPUs: {}'.format(num_gpus))\n",
"\n",
"with strategy.scope():\n",
" input = Input(shape=(sequence_length,))\n",
" embed = Embedding(num_unique_words, embedding_dims, input_length=sequence_length)(input)\n",
" \n",
" # I recommend using LSTM if you need recurrent layers and plan to export to tfjs:\n",
" # https://github.com/tensorflow/tfjs/issues/2442\n",
" recurrent = Bidirectional(LSTM(128, return_sequences=True, dropout=0.1, recurrent_dropout=0.5))(embed)\n",
" recurrent = Bidirectional(LSTM(128, dropout=0.1, recurrent_dropout=0.5))(recurrent)\n",
"\n",
" output = Dense(num_unique_words, activation='softmax')(recurrent)\n",
"\n",
" model = Model(inputs=input, outputs=output)\n",
" model.compile(loss='categorical_crossentropy', optimizer='RMSProp', metrics=['accuracy']) \n",
"\n",
"model.summary()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "8qz9PMe_U_7_",
"colab_type": "text"
},
"source": [
"# Sampling\n",
"\n",
"Since we don't want to generate the same text sequences every time, we will introduce some randomness into the sampling process. How much randomness? We'll use a softmax temperature algorithm in which higher temperatures produce more randomness in the sample. \n",
"\n",
"A very low temperature will always produce the same next word - the word that has the highest probability. If that's what you're after, then I recommend using [beam search](https://github.com/dabasajay/Image-Caption-Generator/blob/master/utils/model.py) instead. Beam search will give you the *sequence* with the highest probability, not just the next *word* the highest probability. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "WP-DHqHrU_8A",
"colab_type": "code",
"colab": {}
},
"source": [
"def sample(preds, temperature=1.0):\n",
" preds = np.asarray(preds).astype('float64')\n",
" preds = np.log(preds) / temperature\n",
" exp_preds = np.exp(preds)\n",
" preds = exp_preds / np.sum(exp_preds)\n",
" probas = np.random.multinomial(1, preds, 1)\n",
" return np.argmax(probas)\n",
"\n",
"# given a phrase, complete it using the given model and tokenizer\n",
"def complete_phrase(model, tokenizer, phrase, temperature=0.5, max_length=25):\n",
" print(\"[\", phrase, \"]\", end=\" \")\n",
" \n",
" vector = tokenizer.texts_to_sequences([phrase])[0]\n",
" sequence = np.zeros(sequence_length)\n",
" sequence[-len(vector):] = vector\n",
"\n",
" for i in range(max_length):\n",
" preds = model(np.expand_dims(sequence, axis=0), training=False)[0]\n",
" \n",
" index = sample(preds, temperature)\n",
" word = tokenizer.index_word[index]\n",
" \n",
" sequence = np.roll(sequence, -1)\n",
" sequence[-1] = index \n",
" \n",
" print(word, end=\" \")\n",
" if word == '.':\n",
" break"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "V0E5_YKdU_8H",
"colab_type": "text"
},
"source": [
"# Train\n",
"\n",
"We're going to pause training every 10 epochs so we can see results. This is terrible for performance, but it's really hard to say how well the model is doing without generating some text. We'll train forever, saving a model every 10 epochs. \n",
"\n",
"There appears to be [some issue](https://github.com/keras-team/keras/issues/13861) calling `predict` on models that were generated using a distributed strategy. You *should* be able to use the predict method to get results:\n",
"\n",
"```\n",
"results = model.predict(inputs)\n",
"```\n",
"\n",
"But that produces an shape error. Instead, treat the model as a function: \n",
"\n",
"```\n",
"results = model(inputs, training=False)\n",
"```\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "3vumOWsRU_8I",
"colab_type": "code",
"colab": {}
},
"source": [
"import os.path\n",
"from random import choice\n",
"from tensorflow.keras.callbacks import Callback\n",
"\n",
"checkpoint_path = 'csc-weights.h5'\n",
"\n",
"# create a callback that generates some text every 10 epochs\n",
"class CheckpointAndEvaluate(Callback):\n",
" def on_epoch_end(self, epoch, logs):\n",
" if epoch % 10 == 0: \n",
" print(F\"Saving {checkpoint_path} ...\")\n",
" model.save_weights(checkpoint_path)\n",
"\n",
" print(\"Evaluating model ...\")\n",
" for temperature in [0.2, 0.5, 1.2]:\n",
" print('temperature:', temperature, endl=\" \")\n",
" phrase = choice(x)[0:5]\n",
" complete_phrase(\n",
" model,\n",
" tokenizer, \n",
" tokenizer.sequences_to_texts([phrase])[0],\n",
" temperature\n",
" )\n",
" print()\n",
"\n",
"if os.path.isfile(checkpoint_path):\n",
" print(F\"Loading saved weights from {checkpoint_path} ...\")\n",
" model.load_weights(checkpoint_path) \n",
"\n",
"with strategy.scope():\n",
" model.fit(x, y, epochs=1000, batch_size=batch_size, callbacks=[CheckpointAndEvaluate()])"
],
"execution_count": 0,
"outputs": []
}
]
}