In a previous article (Generative Adversarial Networks, what are they and how do they work?), I talked about an app called Prisma and how it might generate pictures in different styles. During my coursework in the PyTorch Scholarship Challenge 2018-2019, I learned of another method called “Style Transfer” that uses Convolutional Neural Networks (CNNs for short). In this article, I will describe how Style Transfer works.
*Disclaimer: These articles are aimed at beginners who are new to AI. I hope I can explain the intuition behind these topics at a high level that’s easy to understand. For more in-depth information on the latest in AI, I recommend the resources below rather than re-inventing the wheel:
What are CNNs?
I won’t go too deeply into how CNNs work, but when you pass an image through this type of network, it travels through convolutional layers that extract the major features that help identify a particular object. Here is a video of how these layers, commonly called filters or feature extractors, work in real time.
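To make the idea of a filter concrete, here is a minimal sketch in plain Python (no libraries needed). It slides one small hand-written filter over a tiny grayscale image and records how strongly each patch matches. The names `convolve2d` and `vertical_edge_filter` and the 5x5 image are made up for illustration; a real CNN learns its filter values during training and applies many filters per layer.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like deep learning libraries, this does not flip the kernel first.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the patch by the kernel element-wise and add it all up
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Responds where brightness increases from left to right
vertical_edge_filter = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge_filter)
for row in feature_map:
    print(row)  # each row prints [3, 3, 0]
```

The large values sit where the dark-to-bright edge is, and the flat bright region on the right gives 0. That is the sense in which a filter "extracts" a feature.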
How do I copy your style?
There are many pre-trained CNNs out there that can identify a wide range of objects. ImageNet (http://www.image-net.org) hosts a large database of images and often issues challenges to see who can come up with the best algorithms to classify the data. Rather than building a new model from the ground up, we can use the work already done by these teams. Here’s a link to some benchmarks for some popular image detection CNNs:
In this example, we will be using the VGG19 CNN to extract the Content and Style of a picture. This work is based on the research paper below:
Content vs. Style Representation
Our goal is to take a picture and preserve its content, meaning the objects in it, while adopting the style of another picture. With VGG19, we know which convolutional layers in the CNN capture the content and which capture the style. We feed in our content image and our style image and record the outputs of the content and style layers respectively. Then we start from a copy of the content image and, using gradient-based optimization, keep adjusting its pixels until its style-layer outputs get very close to those of the style image while its content-layer outputs stay close to the original. In practice, it takes thousands of rounds of computation to get there.
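The adjust-and-repeat idea can be sketched on toy numbers in plain Python. This is only an illustration under several assumptions, not the code behind the results in this article: style is compared through a Gram matrix (channel-by-channel dot products, the usual choice for this technique), the tiny lists stand in for real VGG19 layer outputs, the loop adjusts those lists directly instead of image pixels, and the gradient is estimated by nudging each value rather than by PyTorch's autograd. All function names, weights, and values are hypothetical.

```python
def gram_matrix(features):
    """Dot product between every pair of channels.

    `features` is a list of channels, each a flat list of activations.
    Position is summed away, so this describes texture rather than layout.
    """
    return [[sum(a * b for a, b in zip(ch_i, ch_j)) for ch_j in features]
            for ch_i in features]


def mean_squared_error(a, b):
    """Mean squared difference between two equally shaped 2D lists."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def content_loss(target, content):
    return mean_squared_error(target, content)


def style_loss(target, style):
    return mean_squared_error(gram_matrix(target), gram_matrix(style))


def total_loss(target, content, style, content_weight=1.0, style_weight=1.0):
    return (content_weight * content_loss(target, content)
            + style_weight * style_loss(target, style))


def gradient_step(target, content, style, learning_rate=0.01, eps=1e-4):
    """One round of adjustment using a finite-difference gradient estimate."""
    grads = []
    for r in range(len(target)):
        grad_row = []
        for c in range(len(target[0])):
            original = target[r][c]
            target[r][c] = original + eps
            loss_up = total_loss(target, content, style)
            target[r][c] = original - eps
            loss_down = total_loss(target, content, style)
            target[r][c] = original  # restore before moving on
            grad_row.append((loss_up - loss_down) / (2 * eps))
        grads.append(grad_row)
    # Move every value a small step in the direction that lowers the loss
    return [[value - learning_rate * g for value, g in zip(row, grad_row)]
            for row, grad_row in zip(target, grads)]


# Toy "layer outputs": 2 channels with 4 positions each (hypothetical values)
content_features = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
style_features = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]

# Start from a copy of the content and adjust it round by round
target = [row[:] for row in content_features]
initial_style_loss = style_loss(target, style_features)
for _ in range(200):
    target = gradient_step(target, content_features, style_features)
final_style_loss = style_loss(target, style_features)

print("style loss before:", initial_style_loss)
print("style loss after: ", final_style_loss)
```

The style loss starts at 2.0 and shrinks as the rounds go by, while the content loss keeps the target from drifting too far from where it started. The two weights control that trade-off.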
Here you can see the original style image of the samurai (left) and the original content image of the car (middle). After 30,000 rounds of computation, you can see the brushstroke style applied to the car (right).
That’s it, thanks for reading! Links to my code are below if you want to try it out yourself. I highly recommend using an NVIDIA GPU for the training; otherwise it might take a while.
If you need help installing PyTorch on Windows, refer to my previous article here: