(NIPS 2015) Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
Paper: http://arxiv.org/abs/1506.05751
Code: https://github.com/facebook/eyescream
In this paper we introduce a generative parametric model capable of producing high quality samples of natural images.
Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion.
Building a good generative model of natural images has been a fundamental problem within computer vision.
However, images are complex and high dimensional, making them hard to model well, despite extensive efforts.
We exploit the multiscale structure of natural images, building a series of generative models, each of which captures image structure at a particular scale of a Laplacian pyramid [1].
At each scale we train a convolutional network-based generative model using the Generative Adversarial Networks (GAN) approach of Goodfellow et al. [11]. Samples are drawn in a coarse-to-fine fashion, commencing with a low-frequency residual image.
The second stage samples the band-pass structure at the next level, conditioned on the sampled residual.
The Laplacian pyramid [1] is a linear invertible image representation consisting of a set of band-pass images, spaced an octave apart, plus a low-frequency residual.
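As a concrete illustration, here is a minimal NumPy/SciPy sketch of Laplacian pyramid analysis and synthesis. The helper names (downsample, upsample, build_pyramid, reconstruct) and filter choices are our assumptions, not the eyescream code:

```python
# Minimal Laplacian pyramid sketch; assumes a 2-D grayscale array with
# power-of-two side lengths. Filter choices here are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def downsample(img):
    """Blur then decimate by 2 (the d(.) operator)."""
    return gaussian_filter(img, sigma=1.0)[::2, ::2]

def upsample(img):
    """Upsample by 2 with interpolation and smoothing (the u(.) operator)."""
    return gaussian_filter(zoom(img, 2, order=1), sigma=1.0)

def build_pyramid(img, levels):
    """Return band-pass coefficients [h_0, ..., h_{K-1}] plus the residual I_K."""
    coeffs = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        coeffs.append(current - upsample(smaller))  # h_k = I_k - u(I_{k+1})
        current = smaller
    return coeffs, current  # current is the low-frequency residual

def reconstruct(coeffs, residual):
    """Invert the pyramid exactly: I_k = u(I_{k+1}) + h_k."""
    current = residual
    for h in reversed(coeffs):
        current = upsample(current) + h
    return current
```

Note that reconstruction is exact by construction, whatever blur filter is used, since each h_k is defined as the difference against the upsampled coarser level.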
Our proposed approach combines the conditional GAN model with a Laplacian pyramid representation.
The generative models {G0,⋯,GK} are trained using this conditional GAN (CGAN) approach at each level of the pyramid.
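For reference, the per-level objective has the form of the conditional GAN minimax game (a reconstruction consistent with the Eqn. 2 referenced later; h is the band-pass image, l the low-pass conditioning image, z the noise vector):

```latex
\min_{G}\max_{D}\;
\mathbb{E}_{h,l \sim p(h,l)}\!\left[\log D(h,l)\right]
+ \mathbb{E}_{z \sim p_z(z),\, l \sim p_l(l)}\!\left[\log\left(1 - D(G(z,l),\, l)\right)\right]
```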
Specifically, we construct a Laplacian pyramid from each training image I. At each level we make a stochastic choice (with equal probability) to either
(i) construct the coefficients hk using the standard procedure from Eqn. 3 (reconstructed after this list),
or
(ii) generate them using Gk .
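The standard procedure from Eqn. 3 referenced in (i) is the usual Laplacian pyramid coefficient definition, reconstructed here with u(·) for upsampling and d(·) for blur-and-downsample; the final residual is simply h_K = I_K:

```latex
h_k = \mathcal{L}_k(I) = I_k - u(I_{k+1}), \qquad I_{k+1} = d(I_k)
```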
Figure 1: The sampling procedure for our LAPGAN model.
We start with a noise sample z3 (right side) and use a generative model G3 to generate I~3 .
This is upsampled (green arrow) and then used as the conditioning variable l2 (orange arrow) for the generative model at the next level, G2 .
Together with another noise sample z2 , G2 generates a difference image h~2 which is added to l2 to create I~2 .
This process repeats across two subsequent levels to yield a final full resolution sample I~0 .
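A minimal Python sketch of this coarse-to-fine sampling pass. The models list, its G(z, l) call signature, and noise_dims are hypothetical stand-ins for trained generators; upsample is the helper sketched above:

```python
import numpy as np

def sample_lapgan(models, noise_dims, rng):
    """Coarse-to-fine sampling: models = [G_0, ..., G_K], finest first.

    The coarsest model G_K takes noise alone; every finer G_k takes
    (noise, low-pass conditioning image).
    """
    K = len(models) - 1
    # Coarsest level: generate the low-frequency residual from noise alone.
    z = rng.standard_normal(noise_dims[K])
    image = models[K](z)                      # I~_K
    # Finer levels: upsample, condition, add generated band-pass detail.
    for k in range(K - 1, -1, -1):
        l = upsample(image)                   # conditioning low-pass l_k
        z = rng.standard_normal(noise_dims[k])
        h = models[k](z, l)                   # h~_k = G_k(z_k, l_k)
        image = l + h                         # I~_k = l_k + h~_k
    return image                              # full-resolution sample I~_0
```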
Figure 2: The training procedure for our LAPGAN model.
Starting with a 64×64 input image I from our training set (top left):
(i) we take I0=I and blur and downsample it by a factor of two (red arrow) to produce I1 ;
(ii) we upsample I1 by a factor of two (green arrow), giving a low-pass version l0 of I0 ;
(iii) with equal probability we use l0 to create either a real or a generated example for the discriminative model D0 .
In the real case (blue arrows), we compute the high-pass h0=I0−l0 , which is input to D0 ; D0 computes the probability of it being real vs. generated.
In the generated case (magenta arrows), the generative network G0 receives as input a random noise vector z0 and l0 . It outputs a generated high-pass image h~0=G0(z0,l0) , which is input to D0 .
In both the real/generated cases, D0 also receives l0 (orange arrow).
Optimizing Eqn. 2, G0 thus learns to generate realistic high-frequency structure h~0 consistent with the low-pass image l0 .
The same procedure is repeated at scales 1 and 2, using I1 and I2 .
Note that the models at each level are trained independently.
At level 3, I3 is an 8×8 image, simple enough to be modeled directly with a standard GAN, G3 & D3 .
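A sketch of how one training example at a single level is assembled, following the Figure 2 arrows. G_k, D_k, and the noise_dim attribute are hypothetical, and the gradient updates from Eqn. 2 are omitted; downsample and upsample are the helpers sketched earlier:

```python
import numpy as np

def make_training_example(G_k, D_k, I_k, rng):
    """Build one real-or-generated example for the level-k discriminator."""
    I_next = downsample(I_k)          # I_{k+1} = d(I_k)   (red arrow)
    l = upsample(I_next)              # low-pass l_k       (green arrow)
    if rng.random() < 0.5:
        # Real branch (blue arrows): true band-pass coefficients.
        h = I_k - l                   # h_k = I_k - l_k
        label = 1.0
    else:
        # Generated branch (magenta arrows): G_k proposes the detail.
        z = rng.standard_normal(G_k.noise_dim)
        h = G_k(z, l)                 # h~_k = G_k(z_k, l_k)
        label = 0.0
    # D_k sees both the band-pass image and l (orange arrow); the actual
    # parameter updates for D_k and G_k are left out of this sketch.
    p_real = D_k(h, l)
    return p_real, label
```

Because each level only ever sees (h, l) pairs built from its own scale, the K+1 models can be trained independently, exactly as the notes above state.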