What is a generative model

A Brief Introduction to Generative Adversarial Networks

On October 25, 2018, the portrait of Edmond de Belamy was sold for $432,500 at the world-famous Christie's auction house in New York. What is special about this picture? The artist behind Edmond's portrait is not a real person but an artificial intelligence from the Parisian collective Obvious, and the picture was created by a so-called Generative Adversarial Network. It was the first time that a work created with machine learning came under the hammer at such a major auction house.


Painting "Edmond de Belamy". Source / Copyright: Obvious

We would like to take this event as an opportunity to explain how Edmond de Belamy and his family came about. To this end, we first describe the general idea of generative networks, then present a simple implementation with Keras and TensorFlow, and finally give an outlook on what these models can be used for.

What is a Generative Network?

The algorithm used by Obvious to generate the pictures of the de Belamy family is, as already mentioned, a Generative Adversarial Network (GAN). A GAN is a special neural network architecture in which two networks act in opposition to one another and thereby learn from one another. GANs were first introduced in 2014 by Ian Goodfellow and his colleagues and have since enjoyed great popularity in the research community.

To explain the principle of generative models, let us first use an analogy. There are three actors in our story:

  • Gabi is an art forger. Her greatest goal is to have one of her counterfeit paintings auctioned off at Miller’s auction house.
  • Diana is an intern at Miller’s. Her job is to sort out fake paintings.
  • Olivia is the chief buyer at Miller’s. She is infallible and knows exactly which images are authentic and which are not. But Olivia is about to retire and therefore wants to train Diana.

Gabi and Diana are still very inexperienced at the beginning of our story, but they develop their respective skills over time. The learning process runs in cycles. At the beginning of each cycle, Gabi paints some forgeries and submits them to Miller’s. There they reach Diana, together with real paintings, and she has to decide which pictures are real and which are not. Diana writes down her assessment of each picture and sends it to Gabi and Olivia. Olivia uses her experience to correct Diana's assessments, which allows Diana to draw conclusions and to identify forgeries better in the future. Gabi, in turn, sees how her fakes were received at the auction house and tries to paint better fakes based on this. The next cycle begins.

With this brief illustration, the idea of GANs becomes clear. There are two neural networks that play against each other and learn from the results of the other network. The first network, the generator, receives a random signal as input and creates an image from it. Together with instances of the training data set, i.e. real images, this image forms the input of the second network, the discriminator. The task of the discriminator is to decide which images come from the training data set and which from the generator. The discriminator learns from its predictions, because an oracle then tells it the ground truth, i.e. the true origin of the images. How the learning process takes place in the networks themselves was explained here, for example.

What does the signature in Edmond's portrait mean?

As is customary with paintings, the portrait of Edmond de Belamy bears the artist's signature. But how should an algorithm sign? Obvious decided on a formula:


Signature of "Edmond de Belamy". Source / Copyright: Obvious

This formula describes a GAN from a mathematical and game-theoretic perspective. It is a typical minimax game in which two opponents each pursue their optimal strategy and thus end up in a Nash equilibrium. If we take a closer look at the equation, we recognize our actors again: the discriminator D and the generator G. The discriminator tries to maximize its own performance, expressed by the cross entropy. The generator, on the other hand, pursues a minimization strategy: it wants to minimize the performance of the discriminator on the generated images (again expressed by the cross entropy).
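For reference, the objective from the original GAN paper by Goodfellow and colleagues can be written out in LaTeX as follows. The signature on the painting is an abbreviated form of it. The symbols p_data (distribution of the real images) and p_z (distribution of the random input signal) follow the paper's notation and are not introduced elsewhere in this article:

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The first term rewards the discriminator for assigning a high probability D(x) to real images x. The second term rewards it for assigning a low probability to generated images G(z), which is exactly the term the generator tries to make small.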

A simple GAN with Keras and TensorFlow

In the following, we show how you can implement a simple generative model with Keras and TensorFlow that learns to reproduce handwritten digits. The implementation is based on the original version by Ian Goodfellow and does not take into account the numerous improvement strategies that have appeared since then. The complete code is also available as an IPython (Jupyter) notebook.

In the first step we load our data set into our notebook. We use the MNIST data set, which is very often used to demonstrate GANs. We determine the resolution of the images and normalize them so that the pixel values lie in the interval [0, 1]. This improves the stability of the learning process.
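As a rough illustration of this preprocessing step, here is a minimal NumPy sketch. It assumes that the images arrive as 8-bit arrays of shape (n, 28, 28) and that they are flattened into vectors before training. Both the flattening and the function name `preprocess_images` are assumptions made for this example. Random stand-in data is used so that the snippet runs without downloading anything.

```python
import numpy as np

def preprocess_images(images: np.ndarray) -> np.ndarray:
    """Flatten each image to a vector and scale pixel values to [0, 1]."""
    n_samples = images.shape[0]
    # Resolution of a single image, e.g. 28 * 28 = 784 pixels for MNIST.
    n_pixels = int(np.prod(images.shape[1:]))
    flat = images.reshape(n_samples, n_pixels).astype("float32")
    return flat / 255.0  # 8-bit pixel values 0..255 become 0.0..1.0

# Stand-in data with the MNIST layout (uint8, 28 x 28 pixels).
# With TensorFlow installed, the real images could instead be loaded
# with tf.keras.datasets.mnist.load_data().
rng = np.random.default_rng(0)
dummy_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

x_train = preprocess_images(dummy_images)
print(x_train.shape, x_train.dtype)
```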


Three sample images from the MNIST data set.

Next we define our discriminator. It receives an image as input, which we reduce in size over three hidden layers. As the activation function between the layers we use the Leaky Rectified Linear Unit (Leaky ReLU), which gives us a weak activation for negative values and a linear activation for positive values. The output layer contains only a single node, which indicates whether the image comes from the training data set or from the generator.
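To make the structure concrete, the following framework-independent NumPy sketch shows the forward pass of such a discriminator. The layer sizes, the Leaky ReLU slope of 0.2, the weight initialization and the sigmoid on the output node are assumptions chosen for illustration, not the values of the original implementation.

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    """Linear for positive inputs, weak (scaled-down) response for negative ones."""
    return np.where(x > 0, x, slope * x)

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def init_layers(sizes, rng):
    """Create one (weights, bias) pair per pair of consecutive layer sizes."""
    return [
        (rng.normal(0.0, 0.02, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def discriminator_forward(images: np.ndarray, layers) -> np.ndarray:
    """Map flattened images to one probability each (1 = real, 0 = generated)."""
    h = images
    for weights, bias in layers[:-1]:
        h = leaky_relu(h @ weights + bias)  # hidden layers shrink the representation
    weights, bias = layers[-1]
    return sigmoid(h @ weights + bias)      # single output node

rng = np.random.default_rng(0)
# Assumed sizes: 784 input pixels, three shrinking hidden layers, one output node.
d_layers = init_layers([784, 512, 256, 128, 1], rng)
scores = discriminator_forward(rng.random((4, 784)), d_layers)
print(scores.shape)
```

In Keras, the same structure would typically be assembled from `Dense` and `LeakyReLU` layers.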

We now define our generator. It receives a random noise signal as input. We increase the resolution of the signal over three hidden layers until we get the desired resolution of the MNIST images in the last layer.
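A minimal NumPy sketch of the generator's forward pass might look as follows. The noise dimension of 100, the layer sizes, the weight initialization and the sigmoid output (chosen so that the pixel values match images normalized to [0, 1]) are assumptions for this example.

```python
import numpy as np

def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)

def generator_forward(noise: np.ndarray, layers) -> np.ndarray:
    """Map random noise vectors to flattened images with pixel values in (0, 1)."""
    h = noise
    for weights, bias in layers[:-1]:
        h = leaky_relu(h @ weights + bias)  # hidden layers grow the representation
    weights, bias = layers[-1]
    # The sigmoid keeps the output in the same range as the normalized images.
    return 1.0 / (1.0 + np.exp(-(h @ weights + bias)))

rng = np.random.default_rng(0)
# Assumed sizes: 100 noise values, three growing hidden layers, 28 * 28 = 784 pixels.
sizes = [100, 256, 512, 1024, 784]
g_layers = [
    (rng.normal(0.0, 0.02, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

noise = rng.normal(size=(4, 100))            # random input signal
fake_images = generator_forward(noise, g_layers)
image_grid = fake_images.reshape(-1, 28, 28)  # back to the MNIST image layout
print(fake_images.shape, image_grid.shape)
```

With untrained weights the output is of course only noise. The point of the sketch is the shape of the mapping from a short random vector to a full image.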

For our training process, it makes sense to train only one network at a time. Therefore, in each epoch, we train the discriminator first and then the generator. For the generator, we compile a combined model in which the discriminator's weights are frozen. As the optimizer we use Adam, a variant of stochastic gradient descent, and as the loss function the binary cross entropy.
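The two alternating training steps differ mainly in the labels that are fed into the binary cross entropy. The following NumPy sketch illustrates this with made-up discriminator outputs. It assumes the common convention that, in the generator step, generated images are labeled as real (1) while the discriminator weights stay fixed. The numbers are purely illustrative.

```python
import numpy as np

def binary_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    losses = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(-np.mean(losses))

# Made-up discriminator outputs (probability "real") for one small batch.
d_on_real = np.array([0.9, 0.8, 0.7])  # outputs for real images
d_on_fake = np.array([0.2, 0.4, 0.1])  # outputs for generated images

# Discriminator step: real images are labeled 1, generated images 0.
d_loss = binary_cross_entropy(
    np.concatenate([np.ones(3), np.zeros(3)]),
    np.concatenate([d_on_real, d_on_fake]),
)

# Generator step (discriminator held fixed): generated images are labeled 1,
# so the loss shrinks the more the discriminator is fooled.
g_loss = binary_cross_entropy(np.ones(3), d_on_fake)

print(round(d_loss, 4), round(g_loss, 4))
```

In this example the discriminator is doing well, so its loss is small while the generator's loss is large. During training, each step adjusts only the network whose loss is being evaluated.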

We train in minibatches, i.e. always with a subset of the training data set. Half of our batches consist of real images, the other half of generated images.
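How such a minibatch for the discriminator can be put together is sketched below in NumPy. The batch size of 128, the noise dimension of 100 and the helper names `make_discriminator_batch` and `dummy_generator` are assumptions for this example. The placeholder generator only returns random pixels so that the snippet is self-contained.

```python
import numpy as np

def make_discriminator_batch(real_images, generate_fn, batch_size, noise_dim, rng):
    """Build one minibatch: half real images (label 1), half generated ones (label 0)."""
    half = batch_size // 2
    # Random subset of the training data set.
    idx = rng.integers(0, real_images.shape[0], size=half)
    real = real_images[idx]
    # Generated images from fresh random noise.
    noise = rng.normal(size=(half, noise_dim))
    fake = generate_fn(noise)
    images = np.concatenate([real, fake], axis=0)
    labels = np.concatenate([np.ones(half), np.zeros(half)])
    return images, labels

rng = np.random.default_rng(0)
x_train = rng.random((1000, 784))  # stand-in for the normalized MNIST images

def dummy_generator(noise: np.ndarray) -> np.ndarray:
    """Placeholder for a real generator network: returns random 'images'."""
    return rng.random((noise.shape[0], 784))

images, labels = make_discriminator_batch(x_train, dummy_generator, 128, 100, rng)
print(images.shape, int(labels.sum()))
```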

Here are some images that our trained generator produced:

Even if some pictures may not be quite convincing, you can recognize the potential of generative models with this simple implementation.

How can I improve my model?

Numerous research groups are dealing with this question. Since the original publication by Ian Goodfellow, over 5000 papers have been published that suggest approaches for better results from GANs or test new applications. One popular improvement is the use of batch normalization. Because the generator often got stuck in a local minimum and produced only similar images, minibatch discrimination was introduced, in which the discriminator is presented with several images at the same time.

Another approach suggests initializing the generator and discriminator with a low image resolution and increasing it gradually over the course of training. Finally, increasing the number of network parameters and the size of the minibatches has also led to very good results.

What are possible uses of GANs?

At first glance, GANs seem to be just an (intelligent) gimmick to create more or less realistic images. But GANs can also have very practical applications. Here are three examples:

In many crime series, the investigators take the noisy image from a surveillance camera and increase its resolution in a seemingly magical way, so that they can recognize important information such as a license plate. What was previously fiction could soon become reality, because Christian Ledig and colleagues presented SRGAN, which can significantly increase the resolution of images.


Source: Tensorlayer / SRGAN

Countless (video) camera recordings from the early 20th century only exist in black and white. Subsequent coloring is rarely done because it has to be done manually, which is time-consuming and expensive. The GAN from Kamyar Nazeri and colleagues learns to colorize grayscale images.


From left to right: grayscale, original, GAN result. Source: ImagingLab

The fashion industry could also benefit from the use of generative networks. There have already been successful attempts to develop new clothing styles or to clothe models in order to avoid expensive photo shoots.

Conclusion

In this article we looked at the basic idea behind Generative Adversarial Networks. They are a special form of neural network in which two sub-networks try to outsmart each other by playing a minimax game. This enables one of the two networks to generate new images. Getting started with GANs is not complicated, as we saw with our simple implementation. Even though the full potential of these networks has not yet been exhausted, they are already highly relevant for a wide variety of applications.

If you would like to learn more about artificial intelligence, you can find more information on our website.