Basic review of GAN

Posted on 2017-10-15 22:16:29
Last edited by 酥炸物 on 2017-10-15 22:18

A computer can store a few petabytes of photos, but on its own it has no idea what makes a bunch of pixels meaningful as, say, someone's face. Deep learning enables computers to recognize faces, and over the years this kind of problem has been attacked with various generative models. They rely on different assumptions, often too strong to be practical, to model the underlying distribution of the data.
For most of the tasks we care about today, the results are far from optimal: text generated with hidden Markov models is boring and predictable, and images from variational autoencoders are blurry and, despite the name, lack variety. All of these shortcomings called for an entirely new approach, and one was recently invented.
In this article, our goal is to give a comprehensive overview of the concepts behind Generative Adversarial Networks (GANs), show the main architectures that make a good starting point, and equip you with a range of tricks that can significantly improve the results of your experiments.
The basic idea of a generative model is to take a collection of training examples and learn a representation of their probability distribution; the usual approach is to infer the probability density function directly. When I first learned about generative models, I couldn't help wondering: why bother, when we already have so many real-life training examples? The answer is quite compelling. Here are just a few applications that call for a good generative model:
1. Simulating the possible outcomes of an experiment, cutting costs and speeding up research.
2. Planning actions using predicted future states; imagine a GAN that "knows" the road situation one moment ahead.
3. Generating missing data and labels; we often lack clean data in the right format, which causes overfitting.
4. High-quality speech generation.
5. Automated quality improvement of photos (image super-resolution).
In 2014, Ian Goodfellow and his colleagues at the University of Montreal introduced Generative Adversarial Networks (GANs): a novel way to learn the underlying distribution of data that can generate artificial objects looking strikingly similar to real ones. The idea behind a GAN is very simple. Two networks, a generator and a discriminator, play a game against each other. The generator's goal is to produce an object, say a picture of a person, that looks like a real one. The discriminator's goal is to tell generated images apart from real ones.
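To make the two-player setup concrete, here is a minimal PyTorch sketch of the two networks. The 28×28 (784-pixel) image size, the 100-dimensional noise vector, and the layer widths are illustrative assumptions, not details from this post:
```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a random noise vector z to a fake image."""
    def __init__(self, z_dim=100, img_dim=784):   # 784 = 28x28, an assumed size
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),    # pixel values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Outputs the probability that an image is real rather than generated."""
    def __init__(self, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

z = torch.randn(16, 100)           # a batch of 16 random noise vectors
fake = Generator()(z)              # generator turns noise into fake images
p_real = Discriminator()(fake)     # discriminator scores them as real/fake
```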
This figure gives an overview of a generative adversarial network. The most important thing to understand at this point is that a GAN is a way to make two networks work against each other, and both the generator and the discriminator have their own architecture. To better understand where this idea comes from, we need to recall some basic algebra and ask ourselves: how can we fool a neural network that classifies images better than most humans do?
Before describing GANs in detail, let's take a detour through a related topic: given a well-trained classifier, can we generate a sample that fools the network? And if we can, what would it look like?
It turns out that we can. Even more: for almost any given image classifier, it is possible to morph an image into another one that is misclassified with high confidence while being visually indistinguishable from the original. The procedure is called an adversarial attack, and the simplicity of the generating method explains quite a lot about GANs. An adversarial example is a carefully computed input whose sole purpose is to be misclassified. Here is an illustration of the process: the panda on the left is indistinguishable from the panda on the right, yet the right one is classified as a gibbon.
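The panda illustration comes from the fast gradient sign method of Goodfellow et al. Below is a minimal PyTorch sketch of that one-step attack; `model` stands for any differentiable classifier, and the epsilon value and the [0, 1] pixel range are assumptions for illustration:
```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, true_label, eps=0.007):
    """Nudge every pixel by +/- eps in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # The perturbation is tiny per pixel, hence visually imperceptible,
    # yet it is aimed precisely at pushing the input across the boundary.
    adversarial = image + eps * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```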
Image classifiers are essentially complex decision boundaries in a high-dimensional space. Of course, we cannot draw these boundaries when classifying images, but we can safely assume that when training is over, the network has not generalized to all possible images, only to the kinds of images present in the training set. Such a generalization may not be a good approximation of real life. In other words, it overfits to our data, and we are going to exploit that.
Let's start by adding some random noise to the image while keeping it very close to zero; we can achieve that by controlling the L2 norm of the noise. Mathematical notation shouldn't worry you here: for all practical purposes, think of the L2 norm as the length of a vector. The trick is that the more pixels an image has, the larger its average L2 norm. So, if the norm of the noise is low enough, you can expect it to be visually imperceptible, and the corrupted image will stay close to the original in vector space, yet it may still cross a nearby decision boundary.
Why is that? Well, if an H×W image is a vector, then the H×W noise we add to it is also a vector. The original image holds a fairly dense variety of colors, which increases its L2 norm; the noise, by contrast, is a visually chaotic set of rather pale pixels, a vector with a small norm. At the end we add them together and obtain a new vector for the corrupted image, which is relatively close to the original and yet misclassified!
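A small NumPy illustration of this point: we rescale random noise to a fixed, small L2 norm (the 224×224 size and the norm of 3.0 are arbitrary choices for the demo) and observe that every pixel barely changes while the vector-space distance to the original equals that small norm:
```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 224, 224
image = rng.random((H, W))                # stand-in for a real image in [0, 1]

noise = rng.standard_normal((H, W))
noise *= 3.0 / np.linalg.norm(noise)      # force the noise L2 norm to be 3.0

corrupted = image + noise
print(np.linalg.norm(image))              # ~129: many pixels add up to a large norm
print(np.abs(noise).max())                # ~0.06: per-pixel change is imperceptible
print(np.linalg.norm(corrupted - image))  # 3.0: distance equals the noise norm
```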
Now, if the decision boundary of the original class (say, Dog) is not that far away in terms of the L2 norm, the additive noise places the new image outside of it. You don't need to be a world-class topologist to reason about manifolds or decision boundaries of particular classes: since each image is just a vector in a high-dimensional space, a classifier trained on them effectively defines "all monkeys" as "all image vectors in this high-dimensional blob described by the hidden parameters". We refer to that blob as the decision boundary of the class.
Now, what if instead of the classifier from the previous section we had a network designed to distinguish only two classes, "real" and "fake"? Following the notation of the original paper by Goodfellow et al., we will call it the discriminator.
Let's add another network that learns to generate fake images which the discriminator misclassifies as "real". The procedure is exactly the same as the one we used in the adversarial-examples section. This network is called the generator, and the adversarial training process is what gives it its fascinating properties. At each step of training, we also feed the discriminator a batch of images from the training set together with a batch of fakes, so that it gets better at telling them apart. As you may remember from statistical learning theory, this essentially amounts to learning the underlying distribution of the data.
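Here is a minimal sketch of that alternating training step in PyTorch, reusing the Generator and Discriminator sketched earlier; the Adam learning rate, batch shapes, and binary cross-entropy loss follow common practice and are assumptions, not prescriptions from this post:
```python
import torch
import torch.nn as nn

G, D = Generator(), Discriminator()      # classes from the earlier sketch
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):             # real_images: (batch, 784) in [-1, 1]
    batch = real_images.size(0)
    fake_images = G(torch.randn(batch, 100))

    # 1) Discriminator step: label real images 1 and fakes 0,
    #    detaching the fakes so this step does not update G.
    d_loss = (bce(D(real_images), torch.ones(batch, 1)) +
              bce(D(fake_images.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Generator step: reward G when D classifies its fakes as "real".
    g_loss = bce(D(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```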
There is an old and beautiful mathematical result, the minimax theorem, which started what we now know as game theory. It states that for two players in a zero-sum game, the minimax solution is the same as the Nash equilibrium. In plain words: when two players (D and G) compete against each other (a zero-sum game) and both assume their opponent plays optimally (minimax strategies), the outcome of the game is predetermined and neither player can change it (the Nash equilibrium). For our networks this means that, if we train them long enough, the generator will learn how to sample from the real distribution, that is, it will start generating life-like images, and the discriminator will be unable to tell real from fake.
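In the notation of the original paper (Goodfellow et al., 2014), the zero-sum game the two networks play is the minimax optimization of a single value function:
```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```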

Posted on 2017-10-25 22:03:37
Pics attached, so it must be true.
Posted on 2017-10-15 22:52:45
Smiles and pats the head of the poster above.
Posted on 2017-10-16 09:55:27
I bought a watch last year.
Posted on 2017-10-21 00:16:03
Seeing the OP gives me a sense of intellectual superiority.
Posted on 2017-10-23 01:42:32
Smiling without saying a word.
Posted on 2017-10-20 17:11:12
I'm not a casual person, and when I do get casual, I'm no person at all.
Posted on 2017-10-25 01:23:19
Keep it up, OP!
Posted on 2017-10-17 09:37:48
Just here to watch.
Posted on 2017-10-18 03:37:11
Delighted to see this.