@mdfarragher
Created December 12, 2019 12:45

Assignment: Create AI-generated art

In this assignment you're going to use a process called Artistic Style Transfer: recomposing one image in the style of another by using a convolutional neural network to transfer the artistic style between the two pictures.

You can use any image you like for the input image, but I would recommend you use a selfie.

Here's the image I am going to use:

The content image

This image will be the input, and the convolutional neural network will apply an artistic style to it. In other words, the network is going to repaint the image in a specific style.

And here is the style image we are going to use:

The style image

This is a famous cubist image by the painter Lyubov Popova.

So our challenge is to build an app that takes the cubist style and uses it to completely repaint the input image. If everything works, I'll end up fully rendered in cubist style.

Let’s get started. You need to build a new application from scratch by opening a terminal and creating a new .NET Core console project:

https://gist.github.com/f4cbc8d35efa551aadefe34a1f833284
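If you can't open the gist, the commands will look something like this (the project name is just an example):

```bash
# create a new .NET Core console project and move into its folder
dotnet new console -o StyleTransfer
cd StyleTransfer
```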

Also make sure to copy the input image and the style image into this folder, because the code you're going to type next expects them there.

Now install the following packages:

https://gist.github.com/a11fe3c681c92c4dd5987b8510008b09
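At a minimum that means the CNTK package described below; a hedged sketch, with versions omitted (the gist may list additional packages):

```bash
# install Microsoft's Cognitive Toolkit with GPU support
dotnet add package CNTK.GPU
```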

The CNTK.GPU package contains Microsoft's Cognitive Toolkit, a library for training and running deep neural networks on your GPU. You'll need an NVIDIA GPU and the CUDA graphics drivers for this to work.

If you don't have an NVIDIA GPU or suitable drivers, the library will fall back to the CPU instead. This will still work, but training neural networks will take significantly longer.

CNTK is a low-level tensor library for building, training, and running deep neural networks. The code to build a deep neural network can get a bit verbose, so I've developed a little wrapper called CNTKUtil that will help you write code faster.

Please download the CNTKUtil files into a new CNTKUtil folder at the same level as your project folder.

Then make sure you're in the console project folder and create a project reference like this:

https://gist.github.com/2c64c35cd502d7f700da71c125c1171c
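Assuming the project file inside that folder is named CNTKUtil.csproj, the command looks like this:

```bash
# reference the CNTKUtil project from the console project
dotnet add reference ../CNTKUtil/CNTKUtil.csproj
```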

Now you are ready to start writing code. Edit the Program.cs file with Visual Studio Code and add the following code:

https://gist.github.com/5b1cc995bbf2fc5a316334bf58a74892

The code calls NetUtil.CurrentDevice to display the compute device that will be used to train the neural network.

Then we use the StyleTransfer helper class and call LoadImage twice to load the content image and the style image.
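The gist has the full listing; in outline it looks something like this. The CNTKUtil method signatures are assumptions based on the description, and the file names and image dimensions are placeholders:

```csharp
using System;
using CNTKUtil; // the helper library referenced above

class Program
{
    // the content, style, and mixed images all share these dimensions (placeholder values)
    const int imageWidth = 400;
    const int imageHeight = 300;

    static void Main(string[] args)
    {
        // report the compute device CNTK will use for training (GPU or CPU)
        Console.WriteLine($"Training on: {NetUtil.CurrentDevice.AsString()}");

        // load the content and style images
        var contentImage = StyleTransfer.LoadImage("content.png", imageWidth, imageHeight);
        var styleImage = StyleTransfer.LoadImage("style.png", imageWidth, imageHeight);
    }
}
```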

Now we need to tell CNTK the shape of the input data we'll train the neural network on:

https://gist.github.com/88795d1336404e66353fa8b86488fd6e

We are training the neural network with a dreaming layer which has the exact same width and height as the content and style images. So our input tensor is imageWidth times imageHeight times 3 color channels in size, and each pixel channel is a float that can be individually trained.
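Continuing inside Main, and with using CNTK; at the top of the file, declaring that input in plain CNTK looks roughly like this (the gist may route it through a CNTKUtil helper instead):

```csharp
// one trainable float per pixel channel: width x height x 3 color channels
var features = CNTKLib.InputVariable(
    NDShape.CreateNDShape(new[] { imageWidth, imageHeight, 3 }),
    DataType.Float,
    "features");
```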

Our next step is to design the neural network. We're going to use the VGG19 network but only keep the convolutional layers for detecting content and style loss:

https://gist.github.com/795158689d7fd631735582752d54e326

Note how we're first calling VGG19 to load the complete VGG19 network and freeze all layers. We then call StyleTransferBase which will remove the classifier and only keep the convolutional base for style transfer.
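As a sketch, with VGG19 and StyleTransferBase assumed to be CNTKUtil extension methods (the freeze argument is hypothetical):

```csharp
// load the pretrained VGG19 network with frozen weights, then strip the
// classifier so only the convolutional base remains
var model = features
    .VGG19(freeze: true)       // hypothetical parameter name
    .StyleTransferBase();
```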

Next we need to set up the labels to train the neural network on. These labels are the feature activations and Gram matrix values in the content and style layers of the neural network when we show it the content image and the style image, respectively:

https://gist.github.com/80ac50d92ee1d783418b71b8974ccaae

Calculating the labels from the model and the content and style images is a complex operation, but fortunately there's a handy method called CalculateLabels that does it all automatically. The result is a float[][] array that contains the desired activation levels in the content and style layers that will let the neural network know that style transfer has been achieved.
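A one-line sketch; the exact signature of CalculateLabels is an assumption:

```csharp
// compute the target activations and Gram matrices for the content and style layers
float[][] labels = StyleTransfer.CalculateLabels(model, contentImage, styleImage);
```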

The neural network is almost done. All we need to add is a dreaming layer to generate the mixed image:

https://gist.github.com/d162753e11e600722af61f41e7efac19

The dreaming layer is an input layer for the neural network that represents an image where every pixel is an individually trainable parameter. During the training process, the pixel colors in the dreaming layer will change in order to produce the mixed image.
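Conceptually, the dreaming layer is nothing more than a trainable parameter tensor shaped like the image. In raw CNTK you could express that idea like this; the gist's CNTKUtil helper will wire it into the model differently:

```csharp
// every pixel channel is a learnable weight, initialized to random noise
var dream = new Parameter(
    NDShape.CreateNDShape(new[] { imageWidth, imageHeight, 3 }),
    DataType.Float,
    CNTKLib.UniformInitializer(1.0),
    NetUtil.CurrentDevice,
    "dream");
```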

Next we need to tell CNTK what shape the output tensor of the neural network will have. This shape is a bit complex because we're looking at feature activation and Gram matrix values in the content and style layers of the neural network. But we can programmatically calculate the shape like this:

https://gist.github.com/540a8058f2d6f928059f1b6e0fc00204

This code calls GetContentAndStyleLayers to access the content and style layers in the VGG19 network, loops over all labels in the labels array, and constructs an array of CNTK variables with the correct Shape value.
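In outline the loop looks something like this; the helper names come from the text, while the shapes and argument order are assumptions:

```csharp
// grab the content and style layers from the VGG19 model
var contentAndStyleLayers = model.GetContentAndStyleLayers();

// create one CNTK label variable per layer, shaped to match the labels array
var labelVariable = new Variable[labels.Length];
for (int i = 0; i < labels.Length; i++)
{
    labelVariable[i] = CNTKLib.InputVariable(
        NDShape.CreateNDShape(new[] { labels[i].Length }),
        DataType.Float,
        $"label{i}");
}
```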

Now we need to set up the loss function for training the neural network. It needs to measure the feature activation and Gram matrix values in the content and style layers of the network, and compare them to the reference values recorded when the network looked at the content and style images:

https://gist.github.com/38595479fdcab9c14de86a3f36ed3b41

The loss function for style transfer is quite complex, but fortunately we can set it up with a single call to CreateLossFunction, providing the model, the content and style layers, and the CNTK label variable.
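As a sketch, with the argument order assumed from the description above:

```csharp
// build the combined content + style loss in a single call
var lossFunction = StyleTransfer.CreateLossFunction(model, contentAndStyleLayers, labelVariable);
```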

Next we need to decide which algorithm to use to train the neural network. There are many possible algorithms derived from Gradient Descent that we can use here.

For this assignment we're going to use the AdamLearner. You can learn more about the Adam algorithm here: https://machinelearningmastery.com/adam...

https://gist.github.com/148d751d5eb3ad593d9034a406d0365d

These configuration values are a good starting point for many machine learning scenarios, but you can tweak them if you like to try and improve the quality of your predictions.
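In CNTK's C# API the learner is created like this; the learning rate and momentum values below are illustrative starting points, not necessarily the gist's numbers:

```csharp
// Adam optimizer over all trainable parameters (i.e. the dreaming layer)
var learner = CNTKLib.AdamLearner(
    new ParameterVector((System.Collections.ICollection)model.Parameters()),
    new TrainingParameterScheduleDouble(10.0),   // learning rate (illustrative)
    new TrainingParameterScheduleDouble(0.95),   // momentum (illustrative)
    true);                                       // unit gain
```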

We're almost ready to train. Our final step is to set up a trainer for calculating the loss during each training epoch:

https://gist.github.com/5e1d328a38397f958334d05c88e92033

The GetTrainer method sets up a trainer which will track the loss during the style transfer process.
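Under the hood this presumably wraps CNTK's standard trainer factory; a rough equivalent, with using System.Collections.Generic; at the top of the file and the loss used as both the training and the reporting criterion (an assumption):

```csharp
var trainer = Trainer.CreateTrainer(
    model,                          // the network to train
    lossFunction,                   // the loss to minimize
    lossFunction,                   // the metric to report
    new List<Learner> { learner }); // the Adam learner from the previous step
```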

Now we're finally ready to start training the neural network!

Add the following code:

https://gist.github.com/5487bbc974dca26d9503681dd4488f98

We're training the network for 300 epochs using a training batch set up by the CreateBatch method. The TrainMiniBatch method trains the neural network for a single epoch, and every 50 epochs we display the loss by calling the PreviousMinibatchLossAverage method.
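Putting that together, the loop looks something like this; the CreateBatch and TrainMiniBatch signatures are assumptions, while the epoch count and reporting interval come from the text:

```csharp
// assemble a training batch from the content and style labels
var batch = StyleTransfer.CreateBatch(lossFunction, labels);  // signature assumed

for (int epoch = 0; epoch < 300; epoch++)
{
    // run one training step, adjusting the pixels in the dreaming layer
    trainer.TrainMiniBatch(batch, NetUtil.CurrentDevice);

    // report the loss every 50 epochs
    if (epoch % 50 == 0)
        Console.WriteLine($"epoch {epoch}, loss = {trainer.PreviousMinibatchLossAverage()}");
}
```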

The neural network is now fully trained and the style and content loss is minimal. We now need to extract the image from the neural network:

https://gist.github.com/c58c8338515308a3bee8347bcedb8569

This code sets up an evaluation batch with CreateBatch. Normally we would evaluate the neural network on this batch and create predictions for the labels. But since the image we're interested in is actually stored in the dreaming layer, we can extract it directly from the batch with a call to InferImage.

We now have the value for each pixel in a float[] array, so we call the Mat constructor to project these values to an 8-bit 3-channel color image and call the ImShow method to render the image on screen.

Note that Mat and ImShow are OpenCV features. OpenCV is a flexible image library used by CNTKUtil to implement style transfer.

Finally we call WaitKey so the image remains on screen when the app completes, and we have time to admire the style transfer results.
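In outline, with using OpenCvSharp; and using System.Linq; at the top of the file (the CreateBatch and InferImage signatures are assumptions; Mat, Cv2.ImShow, and Cv2.WaitKey are the OpenCvSharp calls the text refers to):

```csharp
// set up an evaluation batch and pull the mixed image out of the dreaming layer
var evalBatch = StyleTransfer.CreateBatch(lossFunction, labels);  // signature assumed
float[] image = model.InferImage(evalBatch);                      // signature assumed

// clamp the float values to 0-255 bytes and build an 8-bit, 3-channel color image
var pixels = image.Select(v => (byte)Math.Max(0, Math.Min(255, v))).ToArray();
var mat = new Mat(imageHeight, imageWidth, MatType.CV_8UC3, pixels);

// render the result and wait for a keypress before the app exits
Cv2.ImShow("Style transfer result", mat);
Cv2.WaitKey();
```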

We're ready to build the app, so this is a good moment to save your work ;)

Go to the CNTKUtil folder and type the following:

https://gist.github.com/32e8b7bf01ffcd4fdd35019d7a993480

This will build the CNTKUtil project. Note how we're specifying the x64 platform because the CNTK library requires a 64-bit build.
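The build command will look something like this (the Release configuration is an assumption):

```bash
# build CNTKUtil for the 64-bit platform that CNTK requires
dotnet build -c Release -p:Platform=x64
```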

Now go to the project folder and type:

https://gist.github.com/59087cc97100ce71408e87db4f75e9ac

This will build your app. Note how we're again specifying the x64 platform.

Now run the app:

https://gist.github.com/38996ee5313882e7483e8e4f2a4aa3cc
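For example (add the same configuration and platform flags you used for the build if your setup requires them):

```bash
# run the style transfer app
dotnet run
```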

The app will create the neural network, load the content and style images, train the network on the data, and create a mixed image with the artistic style from the style image applied to the content image.

What does your image look like? Are you happy with the result?

Try out style transfer with other style and content images. What's the best result you can achieve?

Post your results in our support group.
