Generative Adversarial Networks

Neural networks are used when the relationship between the input and the output is too uncertain to specify by hand. Learning approaches are categorized as supervised, unsupervised, and reinforcement learning. Generative Adversarial Networks (GANs) are a class of unsupervised learning models used for generative modelling. Through an iterative process, the generative model learns from real data and randomly generated noise to produce new data that is similar to the existing data. In each iteration, the loss is calculated and backpropagated to optimize the model. Figure 1 illustrates how a CycleGAN translates brush art into a photograph.

The name describes the approach. The model is generative, it is adversarial because a generator model is set against a discriminator model, and both models are deep neural networks that are trained together. A GAN therefore consists of a generator model and a discriminator model.
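The contest between the two models is commonly written as a two-player minimax game. The formula below is the standard formulation, not something stated in this article; the symbols G (generator), D (discriminator), x (a real sample), z (a noise vector), p_data, and p_z are introduced here for illustration.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the probability the discriminator assigns to x being real, and G(z) is the sample the generator builds from noise z. The discriminator tries to make V large, while the generator tries to make it small.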

Generator Model

The generator model produces synthetic data. It transforms random noise into complex data samples that closely mimic the original data. Its goal is to produce samples, such as images, that deceive the discriminator, which it achieves by reducing its loss function.
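As a rough illustration of the noise-to-sample mapping, the sketch below uses a single affine layer followed by tanh in plain Python. The name make_generator, the layer sizes, and the random weights are assumptions made for this example; a real generator would be a deep network whose weights are learned during training.

```python
import math
import random


def make_generator(noise_dim, data_dim, seed=0):
    """Return a toy generator: one affine layer followed by tanh.

    Hypothetical helper for illustration only; the weights are random,
    not trained.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.5) for _ in range(noise_dim)]
               for _ in range(data_dim)]
    biases = [0.0] * data_dim

    def generate(z):
        # Each output value is tanh(w . z + b), so it lies in [-1, 1].
        return [math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
                for row, b in zip(weights, biases)]

    return generate


rng = random.Random(1)
generator = make_generator(noise_dim=4, data_dim=8)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # random noise vector
sample = generator(z)                        # synthetic data sample
print(len(sample), sample[:3])
```

Different noise vectors give different samples, which is what lets a trained generator produce varied output.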

Discriminator Model

The discriminator model’s function is to distinguish between the actual data and the data produced by the generator. It evaluates each input sample and assigns it a probability of being real, so it acts as a binary classifier.
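A minimal sketch of a discriminator as a binary classifier follows, assuming a single logistic unit and binary cross-entropy as the loss. The weights, bias, and input values are hypothetical numbers chosen for the example; a real discriminator would be a deep network.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def discriminator(x, weights, bias):
    """Return the estimated probability that sample x is real."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def binary_cross_entropy(p, label):
    """Loss for predicting probability p when the true label is 1 (real) or 0 (fake)."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps))


weights, bias = [0.8, -0.4], 0.1  # hypothetical parameters
p_real = discriminator([1.0, 0.5], weights, bias)
print(p_real, binary_cross_entropy(p_real, 1))
```

The loss is small when the predicted probability agrees with the label and grows as the prediction moves toward the wrong class.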

Types of GANs

Vanilla GAN: The simplest type of GAN. The generator and the discriminator are multilayer perceptrons, and the objective function is optimized using stochastic gradient descent.

Conditional GAN: A GAN in which a conditional parameter Y, such as a class label, is supplied to the generator so that it produces data of the requested kind. The discriminator also receives the labels, which help it differentiate between real and fake data.

Deep Convolutional GAN: A GAN built from convolutional networks (ConvNets) that use strided convolutions instead of max pooling and that have no fully connected layers.

Laplacian Pyramid GAN: A GAN that uses multiple generators and multiple discriminators, working at the different levels of a Laplacian pyramid, which is a linear, invertible image representation.

Super Resolution GAN: A GAN whose adversarial network upscales a low-resolution image to high resolution while minimizing the error.
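Of the variants above, the Laplacian pyramid is the easiest to show in a few lines. The sketch below builds one for a 1D signal and reconstructs the signal exactly, which is what "linear and invertible" means in practice. It is a simplification made for illustration: it averages neighbouring pairs instead of applying the Gaussian blur used for images, and the function names are hypothetical.

```python
def downsample(signal):
    """Halve the length by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the length by repeating every value."""
    return [v for v in signal for _ in range(2)]


def build_pyramid(signal, levels):
    """Return the detail (residual) at each level plus the coarsest signal."""
    residuals = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        # The residual is what is lost by going to the coarser level.
        residuals.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    return residuals, current


def reconstruct(residuals, coarsest):
    """Invert build_pyramid: upsample and add the detail back, coarse to fine."""
    current = list(coarsest)
    for residual in reversed(residuals):
        current = [a + b for a, b in zip(upsample(current), residual)]
    return current


signal = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
residuals, coarsest = build_pyramid(signal, levels=2)
restored = reconstruct(residuals, coarsest)
print(coarsest, restored == signal)
```

In a Laplacian Pyramid GAN, the detail at each level would come from that level's generator rather than from a real image, so the image is built up from coarse to fine.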

Working

A GAN works in two stages: initialization and the learning process.

Initialization: The generator takes a noise vector and uses its internal layers and learned patterns to turn it into new data that resembles the original. Both the generated data and the original data are then passed to the discriminator. The discriminator classifies each sample as real or fake, outputting 1 for real data and 0 for fake data.

Learning Process: The two models are updated according to how well the discriminator performs. If the discriminator correctly identifies the fake data, the generator receives no reward and the discriminator is strengthened by the update. If the discriminator fails to identify the fake data, the generator is rewarded with a significant update while the discriminator is penalized. The learning process continues until the generator fools the discriminator about half of the time, at which point the discriminator can do no better than guessing. The generator is then well trained and can be used to generate new, realistic samples.
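The alternating updates described above can be sketched with a deliberately tiny GAN in plain Python. Everything specific here is an assumption made for the example: the real data is drawn from a normal distribution with mean 4, the generator is G(z) = a*z + b, the discriminator is D(x) = sigmoid(w*x + c), the gradients are derived by hand, and the generator uses the common non-saturating loss -log D(G(z)).

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def d_loss(w, c, real, fake):
    """Discriminator loss: mean of -log D(x) - log(1 - D(G(z)))."""
    total = sum(softplus(-(w * x + c)) + softplus(w * g + c)
                for x, g in zip(real, fake))
    return total / len(real)


def g_loss(w, c, fake):
    """Generator loss: mean of -log D(G(z))."""
    return sum(softplus(-(w * g + c)) for g in fake) / len(fake)


def d_step(w, c, real, fake, lr):
    """One gradient-descent step on the discriminator parameters."""
    gw = gc = 0.0
    for x, g in zip(real, fake):
        p_real, p_fake = sigmoid(w * x + c), sigmoid(w * g + c)
        gw += (p_real - 1.0) * x + p_fake * g
        gc += (p_real - 1.0) + p_fake
    n = len(real)
    return w - lr * gw / n, c - lr * gc / n


def g_step(a, b, w, c, noise, lr):
    """One gradient-descent step on the generator parameters."""
    ga = gb = 0.0
    for z in noise:
        p_fake = sigmoid(w * (a * z + b) + c)
        ga += (p_fake - 1.0) * w * z
        gb += (p_fake - 1.0) * w
    n = len(noise)
    return a - lr * ga / n, b - lr * gb / n


rng = random.Random(0)
a, b = 1.0, 0.0   # generator parameters
w, c = 0.0, 0.0   # discriminator parameters
lr, batch = 0.05, 32

for step in range(500):
    real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
    fake = [a * z + b for z in noise]
    w, c = d_step(w, c, real, fake, lr)    # discriminator learns to separate
    a, b = g_step(a, b, w, c, noise, lr)   # generator learns to fool it

# Fraction of fresh fake samples that the discriminator labels as real.
fooled = sum(sigmoid(w * (a * rng.gauss(0.0, 1.0) + b) + c) > 0.5
             for _ in range(1000)) / 1000
print(a, b, fooled)
```

A toy like this may oscillate rather than settle, so the value of fooled is only a rough indicator and depends on when training stops.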

Advantages

GANs can produce synthetic data for data augmentation and generative art.
GANs can produce high-resolution output from low-resolution input such as images, video, and music.

Disadvantages

They are difficult to train: training is prone to instability and may fail to converge.
They require high computational resources and are therefore slow to train.
They can overfit the training data and reflect its biases, which leads to unfairness in applications.

Applications

They are used to synthesize realistic pictures and generate artwork.
They are used in image-to-image translation, which is the process of transforming an input image from one domain into another domain while retaining its key features. Transforming a daytime scene into a night-time scene is one such example.
They are used in text-to-image synthesis to generate pictures from a text description.
They are used to augment the data used to train machine-learning models, which improves robustness and generalizability.
They are used to increase the resolution of low-quality images. This is made possible by training on pairs of low-resolution and high-resolution images. It is useful in medical imaging, video enhancement, and similar areas.
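For the last item, a low-resolution and high-resolution training pair can be sketched as follows. This assumes the pair is made by shrinking a high-resolution image, which is one common approach and not something this article specifies. The nearest-neighbour upscaling stands in for a generator's output, and the mean squared error is only one possible measure of the error to be minimized; all function names are hypothetical.

```python
def downscale(image, factor):
    """Average-pool a 2D image (a list of rows) by an integer factor."""
    height, width = len(image), len(image[0])
    return [[sum(image[r + dr][c + dc]
                 for dr in range(factor) for dc in range(factor)) / factor ** 2
             for c in range(0, width, factor)]
            for r in range(0, height, factor)]


def upscale_nearest(image, factor):
    """Naive upscaling by repeating pixels; a stand-in for a generator."""
    return [[v for v in row for _ in range(factor)]
            for row in image for _ in range(factor)]


def mse(a, b):
    """Mean squared error between two images of the same size."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


high_res = [[0.0, 2.0, 4.0, 6.0],
            [2.0, 4.0, 6.0, 8.0],
            [1.0, 1.0, 3.0, 3.0],
            [1.0, 1.0, 3.0, 3.0]]
low_res = downscale(high_res, 2)   # network input
training_pair = (low_res, high_res)  # (input, target)

baseline_error = mse(upscale_nearest(low_res, 2), high_res)
print(low_res, baseline_error)
```

A super-resolution model trained on many such pairs aims for a lower error than this naive baseline.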
