mdfarragher/blog.md

## Assignment: Detect movie review sentiment using an LSTM network

In this assignment you're going to revisit the IMDB movie sentiment database. But this time you will build an app that uses an LSTM neural network to detect the sentiment of each movie review.
How will the LSTM do? Will it perform better than the 1-dimensional convolutional network?
Let's find out!
Download the IMDB Movie Dataset and save the ZIP file in the project folder that you're going to create in a few minutes.
The dataset contains 25,000 positive movie reviews and 25,000 negative movie reviews. The reviews look like this:

You'll notice that the data file is not a text file but a binary file. This is because the movie reviews have already been preprocessed: each word in the reviews has been converted to an index number in a dictionary, and the words have been sorted in reverse order and padded with zeroes so that each review is exactly 500 numbers long.
You will build an LSTM network that reads in these 500-word sequences and then predicts for each review whether it is positive or negative.
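To make the preprocessing concrete, here is a minimal Python sketch of how a review could be turned into a fixed-length sequence of dictionary indices. The vocabulary, the `encode_and_pad` helper and the choice to pad at the end are all hypothetical. The sketch does not reproduce the reverse ordering used in the real dataset, and it is not the code that produced imdb_data.zip.

```python
def encode_and_pad(words, vocab, length=500):
    """Map words to dictionary indices, then truncate or zero-pad to `length`.

    Index 0 is reserved for padding. Words missing from the (hypothetical)
    vocabulary are simply skipped in this sketch.
    """
    indices = [vocab[w] for w in words if w in vocab]
    indices = indices[:length]                      # truncate long reviews
    return indices + [0] * (length - len(indices))  # pad short reviews with zeroes


# Hypothetical four-word vocabulary, for illustration only.
vocab = {"this": 1, "movie": 2, "was": 3, "great": 4}
encoded = encode_and_pad("this movie was great".split(), vocab)
print(len(encoded), encoded[:6])  # 500 [1, 2, 3, 4, 0, 0]
```

Every review ends up with exactly the same length, which is what lets the network process the data in uniform batches.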
Let’s get started. You need to build a new application from scratch by opening a terminal and creating a new .NET Core console project:
https://gist.github.com/8d72eb1365110368c38e9662559c9182
Also make sure to copy the dataset file imdb_data.zip into this folder because the code you're going to type next will expect it here.
Now install the following packages:
https://gist.github.com/2646903984e047e1e9d8da2a0aa988de
The CNTK.GPU library is Microsoft's Cognitive Toolkit, which can train and run deep neural networks. And XPlot.Plotly is an awesome plotting library based on Plotly. The library is designed for F#, so we also need to pull in the FSharp.Core library.
The CNTK.GPU package will train and run deep neural networks using your GPU. You'll need an NVIDIA GPU and CUDA graphics drivers for this to work.
If you don't have an NVIDIA GPU or suitable drivers, the library will fall back and use the CPU instead. This will work, but training neural networks will take significantly longer.
CNTK is a low-level tensor library for building, training, and running deep neural networks. The code to build deep neural networks can get a bit verbose, so I've developed a little wrapper called CNTKUtil that will help you write code faster.
Please download the CNTKUtil files into a new CNTKUtil folder at the same level as your project folder.
Then make sure you're in the console project folder and create a project reference like this:
https://gist.github.com/689e29ec5ed3a3a651117af4ebec48e2
Now you are ready to start writing code. Edit the Program.cs file with Visual Studio Code and add the following code:
https://gist.github.com/9db58c96917eafaf57111d452bfcf700
The code uses File.Exists and ZipFile.ExtractToDirectory to extract the dataset files from the zip file if that hasn't been done yet. Then we call DataUtil.LoadBinary to load the training and testing data into memory. Note the sequenceLength variable, which indicates that we're working with movie reviews that have been padded to a length of 500 words.
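If you're curious what reading fixed-length binary records involves, here is a small Python sketch. It assumes a hypothetical layout of back-to-back little-endian float32 values with no header, `sequence_length` values per record. The actual format that DataUtil.LoadBinary reads may differ, so treat `load_records` as an illustration only.

```python
import struct


def load_records(data, sequence_length=500):
    """Split a flat buffer of little-endian float32 values into fixed-length records.

    Hypothetical layout: records are stored back to back with no header,
    each exactly `sequence_length` floats (4 bytes each).
    """
    record_size = 4 * sequence_length
    if len(data) % record_size != 0:
        raise ValueError("buffer does not contain a whole number of records")
    fmt = f"<{sequence_length}f"  # '<' = little-endian, 'f' = float32
    return [list(struct.unpack_from(fmt, data, offset))
            for offset in range(0, len(data), record_size)]


# Two records of length 3, written to a buffer and read back.
buffer = struct.pack("<6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
print(load_records(buffer, sequence_length=3))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```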
We now have 25,000 movie reviews ready for training and 25,000 movie reviews ready for testing. Each review has been encoded with each word converted into a numerical dictionary index, and the reviews have been padded with zeroes so that they're all 500 floats long.
Now we need to tell CNTK what shape the input data has that we'll train the neural network on, and what shape the output data of the neural network will have:
https://gist.github.com/9266e4a09629e809ef6ebb0a4cfa62fd
You might be surprised to see that the first Var method call specifies a tensor size of one. But remember that the LSTM network is a recurrent neural network that reads a sequence of data. At each time step we provide only a single sequence element to the network, and this is just one number: the dictionary index of a single word.
The second Var method call tells CNTK that we want our neural network to output a single float value. But because this is a recurrent neural network, we have to specify that we want to use the default batch axis for the labels. That way CNTK expects one label per review, not one label per word in the sequence.
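To see what "one element per time step" means, here is a minimal pure-Python sketch of a single LSTM cell that reads a sequence one number at a time and returns its final hidden state. It uses the standard LSTM gate equations (input, forget and output gates plus a candidate cell value). The parameter layout, the helper names and the random initialization are assumptions for illustration, and CNTK's own LSTM implementation will differ in detail. For simplicity the sketch feeds raw numbers straight into the cell, whereas the network in this assignment one-hot encodes and embeds each word first.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, p):
    """One LSTM time step for a scalar input x and hidden size H = len(h).

    For each gate g in "ifog" (input, forget, output, candidate), p holds
    input weights p["W"+g] (length H), recurrent weights p["U"+g] (H x H)
    and biases p["b"+g] (length H).
    """
    H = len(h)

    def pre(g, k):
        # Pre-activation of unit k in gate g: W*x + U*h + b
        return (p["W" + g][k] * x
                + sum(p["U" + g][k][j] * h[j] for j in range(H))
                + p["b" + g][k])

    new_h, new_c = [], []
    for k in range(H):
        i = sigmoid(pre("i", k))    # input gate: how much new information to write
        f = sigmoid(pre("f", k))    # forget gate: how much of the old cell state to keep
        o = sigmoid(pre("o", k))    # output gate: how much of the cell state to expose
        g = math.tanh(pre("g", k))  # candidate cell value
        ck = f * c[k] + i * g
        new_c.append(ck)
        new_h.append(o * math.tanh(ck))
    return new_h, new_c


def run_lstm(sequence, p, hidden_size):
    """Feed the sequence to the cell one element per time step."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    return h  # the final hidden state summarizes the whole sequence


def random_params(hidden_size, seed=0):
    rng = random.Random(seed)
    p = {}
    for g in "ifog":
        p["W" + g] = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        p["U" + g] = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                      for _ in range(hidden_size)]
        p["b" + g] = [0.0] * hidden_size
    return p


review = [0.12, 0.07, 0.0, 0.0]  # hypothetical, already-scaled inputs
final_h = run_lstm(review, random_params(4), hidden_size=4)
print(final_h)
```

The cell state `c` is what lets an LSTM carry information across many time steps, which matters when a review is 500 elements long.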
Our next step is to design the neural network. We're going to build the following network:

This network uses a single LSTM layer to process the movie reviews, and a single dense layer to classify the results into a positive or negative prediction.
Here's how to build this neural network:
https://gist.github.com/52311a3109e05f6e52310c46b66ded8d
Note how we're first calling OneHotOp to convert each word into a one-hot encoded vector with 10,000 elements. We then call Embedding to embed these values in a 32-dimensional space. The call to LSTM adds an LSTM layer with 32 compute elements, and the final Dense call sets up the final classifier layer with a single node that uses Sigmoid activation.
Then we use the ToSummary method to output a description of the architecture of the neural network to the console.
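Why combine OneHotOp and Embedding? Multiplying a one-hot vector by an embedding matrix simply selects one row of that matrix, so each word index ends up as a learned 32-element vector. The Python sketch below checks this equivalence with a randomly filled 10,000 x 32 matrix. It illustrates the math only and makes no assumption about how CNTKUtil implements Embedding internally.

```python
import random

VOCAB_SIZE = 10_000  # length of the one-hot vectors in the text
EMBED_DIM = 32       # size of the embedding space in the text


def one_hot(index, size=VOCAB_SIZE):
    """Return a vector of zeroes with a single 1.0 at position `index`."""
    v = [0.0] * size
    v[index] = 1.0
    return v


def embed(vector, matrix):
    """Row vector (length VOCAB_SIZE) times a VOCAB_SIZE x EMBED_DIM matrix."""
    return [sum(vector[r] * matrix[r][col] for r in range(len(matrix)))
            for col in range(len(matrix[0]))]


rng = random.Random(42)
embedding_matrix = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
                    for _ in range(VOCAB_SIZE)]

word_index = 1234
via_matmul = embed(one_hot(word_index), embedding_matrix)
via_lookup = embedding_matrix[word_index]
print(via_matmul == via_lookup)  # True: the product just selects row 1234
```

In practice libraries perform the row lookup directly instead of the full multiplication, which saves a lot of work with a 10,000-word dictionary.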
Now we need to decide which loss function to use to train the neural network, and how we are going to track the prediction error of the network during each training epoch.
For this assignment we'll use BinaryCrossEntropy as the loss function because it's the standard loss function for binary classification.
We'll track the error with the BinaryClassificationError metric. This is the fraction of predictions that are wrong. An error of 0 means the predictions are correct all the time, and an error of 1 means the predictions are wrong all the time.
https://gist.github.com/1717901f3d9a1d8223f71ac1976d2726
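For reference, binary cross-entropy for a prediction p and a label y (0 or 1) is -[y * log(p) + (1 - y) * log(1 - p)]. Here is a minimal Python sketch that averages it over a batch. The clamping constant `eps` is an assumption added to avoid log(0), and CNTK may reduce the per-sample values differently.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over a batch.

    Predictions are clamped to [eps, 1-eps] so log(0) can never occur.
    """
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predictions)


# Confident and correct predictions give a small loss...
print(binary_cross_entropy([0.9, 0.2], [1, 0]))
# ...while confident but wrong predictions are penalized heavily.
print(binary_cross_entropy([0.1, 0.8], [1, 0]))
```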
Next we need to decide which algorithm to use to train the neural network. There are many possible algorithms derived from Gradient Descent that we can use here.
For this assignment we're going to use the AdamLearner. You can learn more about the Adam algorithm here: https://machinelearningmastery.com/adam...
https://gist.github.com/f9214781c92f535d49ac3fc4070712af
These configuration values are a good starting point for many machine learning scenarios, but you can tweak them if you like to try and improve the quality of your predictions.
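Here is a minimal Python sketch of the Adam update rule as described in the original Adam paper, minimizing the toy function f(theta) = (theta - 3)^2. The hyperparameter values (learning rate 0.1, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are common defaults chosen for this example, not necessarily the values used in the gist above.

```python
import math


def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with Adam, given its gradient function `grad`."""
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction: m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# f(theta) = (theta - 3)^2 has gradient 2 * (theta - 3) and its minimum at theta = 3.
theta = adam_minimize(lambda th: 2 * (th - 3), theta=0.0)
print(theta)
```

Dividing by the square root of the running squared gradient gives every parameter its own step size, which is one reason Adam is a popular default.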
We're almost ready to train. Our final step is to set up a trainer and an evaluator for calculating the loss and the error during each training epoch:
https://gist.github.com/06ae2fbc6b4ce3b90026957b81016fe5
The GetTrainer method sets up a trainer which will track the loss and the error for the training partition. And GetEvaluator will set up an evaluator that tracks the error in the test partition.
Now we're finally ready to start training the neural network!
Add the following code:
https://gist.github.com/05513fbd95e67408731a89d670510fb8
We're training the network for 10 epochs using a batch size of 128. During training we'll store the training loss, the training error and the testing error for each epoch in the loss, trainingError and testingError arrays.
Once training is done, we show the final testing error on the console. This is the fraction of test reviews for which the network predicts the wrong sentiment.
Note that the error and the accuracy are related: accuracy = 1 - error. So we also report the final accuracy of the neural network.
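The error metric and the accuracy can be sketched in a few lines of Python. This assumes that a prediction counts as positive when the sigmoid output is at least 0.5. That threshold is an assumption about how BinaryClassificationError decides, so check the CNTK documentation if the exact rule matters to you.

```python
def classification_error(predictions, labels, threshold=0.5):
    """Fraction of predictions on the wrong side of the threshold.

    0.0 means every prediction is correct, 1.0 means every prediction is wrong.
    """
    wrong = sum(1 for p, y in zip(predictions, labels)
                if (p >= threshold) != (y == 1))
    return wrong / len(predictions)


error = classification_error([0.9, 0.4, 0.3, 0.1], [1, 1, 0, 0])
accuracy = 1 - error
print(error, accuracy)  # 0.25 0.75
```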
Here's the code to train the neural network. Put this inside the for loop:
https://gist.github.com/ec5c533fd5aa1e426d51d6d39af32815
The Batch() call splits the data into a collection of 128-record batches. The second argument to Batch() is a function that will be called for every batch.
Inside the batch function we first call GetSequenceBatch to get a feature batch containing 500-word sequences, and then we call GetBatch to get a corresponding label batch. Then we call TrainBatch to train the neural network on these two batches of training data.
The TrainBatch method returns the loss and error, but only for training on the 128-record batch. So we simply add up all these values and divide them by the number of batches in the dataset. That gives us the average loss and error for the predictions on the training partition during the current epoch, and we report this to the console.
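The batching and averaging logic described above can be sketched in Python like this. `batches` and `run_epoch` are hypothetical stand-ins for Batch() and the training loop, and the `train_batch` callback plays the role of TrainBatch. With 25,000 records and a batch size of 128 this sketch produces 196 batches, the last one holding only 40 records. Whether CNTKUtil keeps or drops such a partial batch is not stated here.

```python
def batches(data, batch_size):
    """Yield consecutive slices of at most batch_size records (the last may be shorter)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def run_epoch(features, labels, batch_size, train_batch):
    """Average the per-batch loss and error over one epoch.

    train_batch is a hypothetical callback standing in for TrainBatch: it takes
    a feature batch and a label batch and returns (loss, error) for that batch.
    """
    total_loss = total_error = 0.0
    num_batches = 0
    for x, y in zip(batches(features, batch_size), batches(labels, batch_size)):
        loss, error = train_batch(x, y)
        total_loss += loss
        total_error += error
        num_batches += 1
    return total_loss / num_batches, total_error / num_batches


features = list(range(25_000))
print(sum(1 for _ in batches(features, 128)))  # 196
```

Because the per-batch values are averaged without weighting, the shorter final batch counts as much as a full one. With 195 full batches the effect on the epoch average is small.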
So now we know the training loss and error for one single training epoch. The next step is to test the network by making predictions about the data in the testing partition and calculating the testing error.
Put this code inside the epoch loop and right below the training code:
https://gist.github.com/ca924864387d39c1cbda32c5d30023f8
We again call Batch to get a batch of testing records, and GetSequenceBatch and GetBatch to get the feature and label batches. But note that we're now providing the testing_data and testing_labels arrays.
We call TestBatch to test the neural network on the 128-record test batch. The method returns the error for the batch, and we again add up the errors for each batch and divide by the number of batches.
That gives us the average error in the neural network predictions on the test partition for this epoch.
After training completes, the training and testing errors for each epoch will be available in the trainingError and testingError arrays. Let's use XPlot to create a nice plot of the two error curves so we can check for overfitting:
https://gist.github.com/628985a3e98c70c8cd36988f7474b504
This code creates a Plot with two Scatter graphs. The first one plots 1 - trainingError which is the training accuracy, and the second one plots 1 - testingError which is the testing accuracy.
Finally we use File.WriteAllText to write the plot to disk as an HTML file.
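Reading the plot comes down to comparing the two curves. The Python sketch below turns the per-epoch error arrays into accuracy curves (accuracy = 1 - error) and reports the epoch with the best testing accuracy plus the final gap between training and testing accuracy. The error values in the usage example are made up for illustration. The rule of thumb in the comments (a testing peak followed by a growing gap suggests overfitting) is a common heuristic rather than a strict test.

```python
def accuracy_curves(training_error, testing_error):
    """Convert per-epoch error arrays to accuracy arrays (accuracy = 1 - error)."""
    return [1 - e for e in training_error], [1 - e for e in testing_error]


def overfitting_report(training_error, testing_error):
    """Return (best_epoch, final_gap) for a pair of per-epoch error curves.

    best_epoch is the zero-based epoch with the highest testing accuracy.
    final_gap is training accuracy minus testing accuracy after the last epoch.
    A testing peak well before the last epoch combined with a growing gap is a
    common sign of overfitting (a rule of thumb, not a strict test).
    """
    train_acc, test_acc = accuracy_curves(training_error, testing_error)
    best_epoch = max(range(len(test_acc)), key=lambda i: test_acc[i])
    final_gap = train_acc[-1] - test_acc[-1]
    return best_epoch, final_gap


# Made-up error values for illustration only (not results from this app).
train_err = [0.30, 0.20, 0.10, 0.05]
test_err = [0.30, 0.22, 0.20, 0.25]
best, gap = overfitting_report(train_err, test_err)
print(best, round(gap, 2))  # 2 0.2
```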
We're now ready to build the app, so this is a good moment to save your work ;)
Go to the CNTKUtil folder and type the following:
https://gist.github.com/87cfbf31a3b7e371bff1f354d9163f7c
This will build the CNTKUtil project. Note how we're specifying the x64 platform because the CNTK library requires a 64-bit build.
Now go to the LstmDemo folder and type:
https://gist.github.com/c3afd26f2f194d9e40a9d4eb7d43bb17
This will build your app. Note how we're again specifying the x64 platform.
Now run the app:
https://gist.github.com/aa472abcf7211ba47364f95d2e546b04
The app will create the neural network, load the dataset, train the network on the data, and create a plot of the training and testing errors for each epoch.
The plot is written to disk in a new file called chart.html. Open the file now and take a look at the training and testing curves.
What is your final testing accuracy? And what do the curves look like? Is the neural network overfitting?
Do you think this model is good at predicting text sentiment?
Try to improve the neural network by changing the network architecture. You can add more LSTM layers, or increase the number of compute elements in the layer, or increase the batch size or train for more epochs.
Did the changes help? What is the best accuracy you can achieve?
Post your results in our support group.