Yann LeCun's Latest Insights on AI | In-Depth Answers to Open Questions

Joint compilation: Chen Zhen, Gao Fei, Zhang Min

What are the limitations of deep learning?

The "typical" forms of deep learning include various combinations of feedforward modules (often convolutional networks) and recurrent networks (sometimes augmented with memory units such as LSTM or MemNN).

These models are limited in their ability to "reason," that is, to carry out long chains of inference or run an optimization procedure to arrive at an answer. The number of computation steps is bounded by the number of layers in a feedforward network, and by the length of time a recurrent network can remember things.

To give deep learning architectures reasoning capabilities, we need to modify them so that they do not compute just a single output (say, the interpretation of an image or the translation of a sentence) but can instead reason over a whole set of alternative outputs (for example, the many possible translations of a sentence). This is what energy-based models are designed for: they assign a score to every possible configuration of the variables to be inferred. Factor graphs (non-probabilistic graphical models) are a special case of energy-based models. Combining factor graphs with learning systems is known in machine learning as "structured prediction." There were many proposals for combining neural networks with structured prediction back in the early 1990s. In fact, the check-reading system my colleagues and I built at Bell Labs used structured prediction on top of a convolutional network; we called it a "Graph Transformer Network." Many recent research results have appeared on attaching graphical models to ConvNets and training the whole system end to end.
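
To make the idea concrete, here is a minimal toy sketch of energy-based inference (my own illustration, not the Bell Labs system): an energy function E(x, y) scores every candidate output y for an input x, and inference means picking the candidate with the lowest energy rather than computing a single feedforward answer.

```python
import numpy as np

# Toy energy-based inference: score every candidate output and pick the one
# with the lowest energy. The linear form of E and the candidate set are
# illustrative assumptions, not any actual model from the article.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))          # learned compatibility parameters

def energy(x, y):
    """Low energy = input x and candidate output y are compatible."""
    return -x @ W @ y

x = rng.standard_normal(4)                      # the observed input (e.g., image features)
candidates = [np.eye(3)[i] for i in range(3)]   # alternative outputs (one-hot labels here)

# Inference = argmin over the candidate outputs, not a single feedforward pass.
scores = [energy(x, y) for y in candidates]
best = int(np.argmin(scores))
print("energies:", np.round(scores, 3), "-> chosen output:", best)
```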

For more on energy-based models and structured prediction with neural networks, please refer to this article: Details

Deep learning, in its current form, certainly has limitations, because nearly all its successful applications rely on supervised learning with manually annotated data. We need to find ways to train large neural networks from "raw," unannotated data, so that these networks can learn how the real world works. As I said in a previous interview, my money is on adversarial training.

When will we see a theoretical background and mathematical foundation for deep learning?

This is a very active research topic. I am very happy to see well-known mathematicians and theoretical physicists paying more and more attention to the theory behind deep learning.

One theoretical puzzle is why training deep neural networks seems to work reliably even though it requires non-convex optimization. The intuition is that optimizing a non-convex function is hard because we can get trapped in local minima and slowed down by plateaus and saddle points. Plateaus and saddle points may indeed be a problem, but local minima turn out not to be. Our intuition is wrong because we picture energy landscapes in low dimensions, whereas the objective function of a deep neural network typically lives in a space of 100 million dimensions or more. It is very hard to "box in" a point in 100 million dimensions, because an enormous number of constraints would have to hold simultaneously, so genuine local minima are rare. This is some of the theoretical work being done in my lab at New York University; Yoshua Bengio's lab is also working in this direction, using mathematical tools from random matrix theory and statistical mechanics.
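
A tiny illustration of why saddle points and plateaus, rather than local minima, are the worry (my own toy example, not the NYU or Montreal analysis): near the saddle point of f(x, y) = x^2 - y^2 the gradient is almost zero, so gradient descent stalls for many steps even though the point is not a minimum.

```python
import numpy as np

# Gradient descent started very close to the saddle point (0, 0) of
# f(x, y) = x**2 - y**2: the gradient (2x, -2y) is tiny there, so progress
# stalls for a long time before the iterate finally escapes along the y axis.
f = lambda x, y: x**2 - y**2
grad = lambda x, y: np.array([2 * x, -2 * y])

p = np.array([1e-6, 1e-6])   # start near the saddle, slightly off-axis
lr = 0.01
for step in range(1, 1001):
    p = p - lr * grad(*p)
    if step % 250 == 0:
        print(f"step {step:4d}  point {p}  f = {f(*p):.6f}")
# The prints show many steps with almost no change, then a slow slide toward
# ever more negative f: a plateau/saddle effect, not a bad local minimum.
```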

Another interesting theoretical question is why depth helps. Any Boolean function of a finite number of bits can be implemented with just two layers (for example, as a sum of products or a product of sums), but most Boolean functions require an exponential number of terms, which means an exponential number of hidden units in a two-layer network. As every programmer knows, many functions become much simpler if they can be computed in multiple sequential steps, which is exactly what multi-layer computation provides. That is the hand-waving argument for depth; what is missing is a more formal explanation in the context of realistic neural network architectures.
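
The classic concrete example is parity. The sketch below (my own illustration) computes n-bit parity in n trivial sequential steps, while a flat, two-layer-style implementation amounts to enumerating 2^(n-1) input patterns, mirroring the exponential number of hidden units a shallow network would need.

```python
from itertools import product

def parity_deep(bits):
    """'Deep' / sequential computation: n simple steps, one bit of state."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def parity_shallow_table(n):
    """'Shallow', two-layer style: enumerate every odd-parity input pattern.
    The table has 2**(n-1) entries, exponential in n, just like the number of
    hidden units a two-layer network would need."""
    return {bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1}

bits = (1, 0, 1, 1, 0, 1)
print(parity_deep(bits))                      # 0 (an even number of ones)
print(len(parity_shallow_table(len(bits))))   # 32 = 2**(6-1) table entries
```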

A third interesting question is why convolutional networks work so well. Mark Tygert and colleagues wrote a nice paper on why convolutional architectures are the right thing to use for certain kinds of signal analysis (I am not a co-author, so all the credit goes to Mark, who is a research scientist at FAIR).

That work builds on earlier results by Stéphane Mallat and his doctoral student Joan Bruna on the "scattering transform." The scattering transform is a convolutional-network-like architecture with fixed filters whose mathematical properties can be studied formally: Google Scholar. (Joan was a postdoc in my lab at New York University and then joined FAIR; he has since left to become an assistant professor in the statistics department at Berkeley.)

I think deep learning offers many interesting problems for theorists, for example around stochastic optimization.

Is there anything that deep learning can never learn?

Obviously, deep learning in its current form has limitations. But whenever people figure out how to build approaches to human-level artificial intelligence, concepts such as deep learning will be part of the solution, in particular for the optimization challenges that come with deep architectures.

The ideas behind the concept of deep learning are the following:

(1) Learning is an indispensable part of AI (artificial intelligence). In the 1980s and 1990s this view was not widely accepted, but I have always believed it, and more and more people have gradually come to accept it.

(2) Deep learning holds that an AI system should be able to learn abstract, hierarchical representations of the world. However the system ends up learning those representations, this idea will be part of the solution to the problems encountered in building AI.

(3) An open question is whether human-level AI can be built within the central paradigm of machine learning, namely minimizing an objective function, with the minimization carried out by gradient-based methods (such as stochastic gradient descent with gradients computed by backprop), as in deep learning; a minimal sketch of that paradigm follows this list. If human-level AI cannot be built within this central paradigm, we will need to discover new paradigms and build new representation-learning algorithms on top of them.
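
Here is a minimal sketch of that central paradigm (a toy example of my own, making no claim about human-level AI): an objective function is minimized by stochastic gradient descent, with gradients obtained by backprop through a small one-hidden-layer network. All sizes, data, and learning rates are arbitrary illustrative choices.

```python
import numpy as np

# Minimize a mean-squared-error objective with stochastic gradient descent;
# gradients are computed by backprop through a one-hidden-layer network.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 3))
y = np.sin(X.sum(axis=1, keepdims=True))          # toy regression target

W1, b1 = rng.standard_normal((3, 16)) * 0.5, np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)) * 0.5, np.zeros(1)
lr = 0.05

for step in range(2000):
    i = rng.integers(0, len(X), size=32)          # random mini-batch: the "stochastic" part
    x, t = X[i], y[i]
    h = np.tanh(x @ W1 + b1)                      # forward pass
    pred = h @ W2 + b2
    err = pred - t                                # d(objective)/d(pred), up to a constant
    # Backprop: the chain rule applied layer by layer.
    gW2 = h.T @ err / len(i);  gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h**2)
    gW1 = x.T @ dh / len(i);   gb1 = dh.mean(axis=0)
    # Gradient step on every parameter.
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

print("final mean squared error:",
      float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)))
```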

In addition, there is a philosophical and theoretical question about AI that remains open: which tasks can be learned, and which cannot be learned no matter how many resources are available. Learning theory needs to devote more effort to these questions. One interesting result in this direction is the "no free lunch" theorem: any particular learning machine can efficiently learn only a tiny fraction of all possible tasks. No learning machine can learn every possible task efficiently; every machine has to specialize in certain kinds of tasks. Our own brain is not a general-purpose learning machine either. That may sound humbling, but it is a fact: although our brains appear remarkably adaptable, they are in fact highly specialized.

Any computing device has inherent limitations. That is why, even if we build machines with superhuman intelligence, those machines will not be omnipotent in the real world. They may beat us at chess and Go, but they cannot predict heads or tails in a coin flip any better than we can.

What recent advances in deep learning are potential breakthroughs?

There have been many interesting recent advances in deep learning, too many to describe here. But a few ideas have caught my attention, and I am involved in some of the related research.

In my opinion, the most important one is adversarial training (also known as GANs, generative adversarial networks). The idea was proposed by Ian Goodfellow when he was a student of Yoshua Bengio at the University of Montreal (he later worked at Google Brain and is now at OpenAI).

This is the most interesting idea put forward in ML in the past 10 years.

The central idea is to train two neural networks simultaneously. The first is the discriminator, call it D(Y): it takes an input (for example, an image) and outputs a scalar indicating whether the image Y looks "natural" or not. In one formulation, D(Y) can be viewed as a kind of energy function that takes a low value when Y is a real sample (say, an image from the dataset) and a higher, positive value when it is not (for example, a strange-looking image). The second network is the generator, G(Z), where Z is a vector sampled at random from a simple distribution (for example, a Gaussian). The role of the generator is to produce images and thereby train D(Y) to take the right shape (low values for real images, higher values for everything else). During training, D is shown a real image and adjusts its parameters to make its output lower; then D is shown an image produced by G and adjusts its parameters to make its output D(G(Z)) larger (following the gradient of the objective). But G(Z) trains itself to produce images that fool D into rating them as real, using the gradient of D with respect to its input Y for each sample it produces. In other words, G tries to minimize the output of D while D tries to maximize it. Hence the name adversarial training.

The original formulation uses a somewhat more elaborate probabilistic framework, but this is the main idea.
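
Here is a minimal sketch of that training loop (my own PyTorch illustration on a 1-D toy distribution; it uses the standard probabilistic formulation in which D outputs a "looks real" logit rather than the energy view above, and all architectures and hyperparameters are arbitrary choices).

```python
import torch
import torch.nn as nn

real_data = lambda n: torch.randn(n, 1) * 0.5 + 3.0   # "real" samples: N(3, 0.5)
noise     = lambda n: torch.randn(n, 8)                # latent vectors Z

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator G(Z)
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator D(Y), outputs a logit

opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(5000):
    # Train D: push its output toward "real" on data and "fake" on G's samples.
    y_real, y_fake = real_data(64), G(noise(64)).detach()
    loss_D = bce(D(y_real), torch.ones(64, 1)) + bce(D(y_fake), torch.zeros(64, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Train G: follow D's gradient so that D(G(Z)) looks "real".
    loss_G = bce(D(G(noise(64))), torch.ones(64, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

print(G(noise(1000)).mean().item())   # should drift toward 3.0 as G matches the data
```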

Why is this so interesting? Because it lets us train the discriminator as an unsupervised "density estimator," that is, a contrast function that gives low output for data points and higher output for everything else. To do this well, the discriminator has to develop a good internal representation of the data, so it can then be used as a feature extractor. Even more interesting is the generator: it can be seen as parameterizing the complicated surface (manifold) of real data. Give it a vector Z and it maps it to a point on the data manifold. There are papers in which people do astonishing things with this, such as generating images of bedrooms and doing arithmetic in the Z vector space.
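
As a hedged illustration of "arithmetic in the Z vector space" (the kind of experiment reported in the DCGAN work): assuming some already-trained image generator G and latent codes averaged over images sharing an attribute, the names below are hypothetical placeholders, not a real model.

```python
import torch

# Hypothetical trained generator G mapping latent vectors Z to images; the
# latent codes below are placeholders standing in for averages of Z vectors
# whose generated images share an attribute.
latent_dim = 100
z_smiling_woman = torch.randn(latent_dim)
z_neutral_woman = torch.randn(latent_dim)
z_neutral_man   = torch.randn(latent_dim)

# Vector arithmetic on the latent manifold: with a real trained G this kind of
# combination has been reported to decode to a smiling man.
z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
# image = G(z_new.unsqueeze(0))

# Interpolating between latent vectors walks smoothly along the data manifold:
zs = [(1 - a) * z_neutral_woman + a * z_neutral_man for a in torch.linspace(0, 1, 8)]
# frames = [G(z.unsqueeze(0)) for z in zs]
```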

Here are a few interesting papers from FAIR on this topic:

· Denton et al. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks” (NIPS 2015):

· Radford et al. “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks” (ICLR 2016):

· Mathieu et al. “Deep Multi-Scale Video Prediction Beyond Mean Square Error”: Original

The last one applies adversarial training to video prediction. It addresses an important problem: when you train a neural network (or any other model) to predict the future, and several futures are possible, a network trained in the conventional way (for example, with least squares) will predict the average of all the possible futures. For video, that average is a blurry mess. Adversarial training lets the system produce whatever it wants, as long as the output lies within the set of plausible futures. This solves the "blurriness" problem that arises when predicting under uncertainty.
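
A tiny numerical illustration of the blurriness problem (my own example, not from the paper): when the future is equally likely to be one of two sharp outcomes, the single prediction that minimizes squared error is their average, a value that never actually occurs; for video frames that average shows up as blur.

```python
import numpy as np

# If the "future" is equally likely to be +1 or -1, the prediction that
# minimizes mean squared error is the average, 0, which never occurs.
# An adversarial loss instead rewards committing to one plausible future.
rng = np.random.default_rng(0)
futures = rng.choice([-1.0, 1.0], size=100_000)      # two sharp, equally likely outcomes

candidates = np.linspace(-1.5, 1.5, 301)
mse = [(np.mean((futures - c) ** 2), c) for c in candidates]
best_mse, best_c = min(mse)
print(f"MSE-optimal prediction: {best_c:.2f} (MSE {best_mse:.2f})")      # ~0.0, the blurry average
print(f"MSE of predicting +1 : {np.mean((futures - 1.0) ** 2):.2f}")     # higher, but 'sharp'
```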

This may sound like a technical detail, but I really think it opens the door to a whole world of possibilities.

In what directions is AI (artificial intelligence) likely to develop in the next five to ten years?

People are currently working on many problems, and good progress is being made on several fronts:

1. Combining deep learning with reasoning and planning;

2. Model-based deep reinforcement learning (involving unsupervised predictive learning);

3. Augmenting recurrent neural networks with differentiable memory modules (e.g., memory networks; see the sketch after this list):

a. Memory Network (FAIR): Details

b. Stack-augmented RNN (FAIR): Details

c. DeepMind: Details

d. End-to-End MemNN (FAIR/NYU): Details

4. Generative and predictive models obtained through adversarial training;

5. "Differentiable programming": the core idea is to view a program (or circuit) as a differentiable module that can be trained by backprop. This suggests that deep learning can learn not just to recognize patterns (as feedforward networks do) but also to produce algorithms (with loops, recursion, subroutine calls, and so on). There are some relevant papers from DeepMind, FAIR, and others, but these are only early-stage results;

6. Hierarchical planning and hierarchical reinforcement learning: the problem of learning to decompose a complex task into simpler sub-tasks, something every intelligent system needs to do;

7. Unsupervised learning of predictive models of the world (e.g., video prediction).
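
As a concrete illustration of the differentiable memory modules in item 3 above (a minimal numpy sketch of my own, not any particular paper's architecture): reading from memory is a softmax-weighted average over the slots, so gradients flow through the read operation and the whole system can be trained end to end with backprop.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
M_keys   = rng.standard_normal((128, 64))   # 128 memory slots, 64-d keys
M_values = rng.standard_normal((128, 64))   # content stored in each slot
query    = rng.standard_normal(64)          # produced by the controller network

scores  = M_keys @ query                    # similarity of the query to every slot
weights = softmax(scores)                   # soft addressing: a differentiable "which slot?"
read    = weights @ M_values                # blended read vector fed back to the network

# Because `read` is a smooth function of the query, keys, and values, backprop
# can adjust all of them, unlike a hard, argmax-style memory lookup.
print("read vector shape:", read.shape, "max attention weight:", float(weights.max()))
```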

If significant progress is made in these directions over the coming years, we will see much smarter AI agents for dialogue systems, question answering, and adaptive robot control and planning.

A key goal is the design of unsupervised/predictive learning methods that would let large neural networks "learn how the real world works" by watching videos and reading books, without relying on directly annotated data.

That would eventually give rise to machines with a good understanding of how the world works, machines we might even credit with a form of human "common sense." Reaching that point may take 5, 10, or 20 years, or even longer; we cannot set a specific deadline.

If AI becomes a threat to humans, what would be an effective solution (if any)?

I don't think AI will really become a threat to humanity. Not that it is impossible, but we would have to be very foolish to let it happen. Others have claimed that we would have to be very smart to prevent it from happening, but I don't think that is true. If we are smart enough to build machines with superhuman intelligence, chances are we will not be stupid enough to give them unlimited power to destroy humanity.

In addition, there is a complete fallacy that comes from the fact that our only experience of intelligence is through other humans. There is no reason at all to believe that intelligent machines would want to dominate the world or threaten humanity. The will to dominate is a very human trait (and even then, only in certain people).

Even in humans, intelligence is not correlated with a desire for power. In fact, current events show us that the thirst for power can be strongest (and somewhat successful) in people of quite limited intelligence.

As the head of an industrial research lab, I am the boss of many people who are (in some ways) smarter than I am. In fact, hiring people smarter than me is more or less the point of my job.

Many of the bad things humans do to each other are very specific to human nature. Behaviors like becoming violent when we feel threatened, jealousy, wanting exclusive access to resources, and preferring our kin over strangers were built into us by evolution for the survival of the species. Intelligent machines will not have these basic drives unless we explicitly build them in. And why would we?

In addition, if someone deliberately builds a dangerous, generally intelligent AI, others will be able to build a second, narrower AI whose only purpose is to destroy the first one. If both AIs have access to the same amount of computing resources, the second one will win, just as a tiger, a shark, or a virus can kill a human of superior intelligence.

Who is the leader in AI research, Google, Facebook, Apple or Microsoft?

I admit I am biased, but here is what I can say:

· Apple is not a player in AI research because of its very secretive culture. You cannot do cutting-edge research in secret. If it is not published, it is not research; at best it is technology development.


· Microsoft is doing some good work, but it has lost a lot of people to Facebook and Google. They did excellent work on deep learning for speech recognition (and on handwriting recognition before the current boom), but compared with the recent efforts at FAIR and DeepMind they do not seem to have the same level of ambition in deep learning.


· Google (Google Brain and other teams) is probably the leader in bringing deep learning into products and services, because they started earlier than anyone else and are a very large company. They do a lot of infrastructure work (such as TensorFlow and the tensor processing unit hardware). But most of the effort is focused on applications and product development rather than long-term AI research, and a number of the more research-oriented people have left Google Brain for DeepMind, OpenAI, or FAIR.


· DeepMind is doing very good work on learning-based AI. Their long-term research goals are quite similar to FAIR's, and so are many of their focus areas: unsupervised/generative models, planning, RL, games, memory-augmented networks and differentiable programs, etc. Their challenge is that they are geographically and organizationally separated from their biggest internal customers within Alphabet (including Google), which makes it harder for them to "pay their way" by generating revenue for their parent company. But they seem to be doing very well.


· Facebook started FAIR 2.5 years ago and has managed to become one of the leaders in AI research in a short time. I have been surprised by how many world-class researchers we have been able to attract (FAIR now has about 60 researchers and engineers in New York, Menlo Park, Paris, and Seattle), and I am impressed by the quality and impact of our research over those 2.5 years. We are ambitious, we are in it for the long run, and we have impact on the company, which makes it easy to justify our existence. Most importantly, we are very open: our researchers publish many papers every year. Nothing is more sobering than watching a promising young researcher join a not-so-open company or startup and disappear from the research circuit.

How do the goals of Facebook's AI research differ from those of other companies?

Let me describe our goals, our organization, and the way we operate.

First, the goals. We have essentially one long-term goal: to understand intelligence and build intelligent machines. This is not just a technological challenge; it is a scientific question. What is intelligence, and how can we reproduce it in machines? Like "what is the universe made of" and "what is life," "what is intelligence" may be one of the great scientific questions of our time. Answering it will not only help us build intelligent machines; it will also help us understand the human brain and how it works.

That said, along the way to building truly intelligent machines we will discover new theories, new principles, new methods, and new algorithms with short- and medium-term applications. Many of these quickly find their way into Facebook's products and services: image understanding, natural language understanding, content filtering and ranking, and so on.

When Mark Zuckerberg hired me at Facebook, he and CTO Mike Schroepfer (my boss) gave me a lot of freedom to build FAIR in the way I thought best.

I had previously worked in several industrial research labs (Bell Labs, AT&T Labs, the NEC Research Institute, and even Xerox PARC as an intern in the 1980s), and I have friends at Microsoft Research, IBM Research, Google, DeepMind, and many other labs (some of which are now defunct). So I know what an industrial research environment needs and what it does not, and why research labs succeed or die. Those experiences taught me how to build FAIR and how to run it.

First, only companies with a long-term outlook can afford an advanced research lab with ambitious goals. That means the companies with "real" research labs tend to be relatively large and in a comfortable position in their markets (they do not have to worry about short-term survival). Historically those companies have been IBM, AT&T, Xerox, General Electric, and Microsoft, and now Google and Facebook.

Second, the research must be open, and researchers must be encouraged, indeed required, to publish their work. This is very important: research done in secret is almost always of lower quality than published research (a bit like open-source software, which is often better than closed-source software; incidentally, at FAIR we release our source code). Results that are published and survive peer review are more reliable and solid. Also, a researcher's livelihood and career depend on his or her intellectual impact; you cannot attract the best research scientists unless you encourage them to publish their work. Finally, publishing is very good for a company's reputation: many engineers and scientists want to work for companies seen as scientific and technological leaders and innovators. This open-research philosophy also makes it easy for us to collaborate with universities and with public or non-profit research labs. No company has a monopoly on good ideas. Many good ideas come from academia (in fact, most of them do), but some need the infrastructure and engineering support of a company like Facebook to reach their full potential.

Third, scientific progress is a "bottom-up" process. We hire researchers in part for their good "nose" for choosing promising projects and topics to explore. In the early stage, much of the research is exploratory: you have an idea and you try it out. You need flexible tools that let you implement things quickly and see how they work. When something starts to work, you can assemble a team of scientists and engineers focused on making the idea successful and applying it to real problems. If that goes well, it becomes an engineering project. At each stage of the process the team grows, and the proportion of engineers relative to scientists increases. At FAIR we work very closely with a group called AML (Applied Machine Learning). They are more engineering-oriented than FAIR (although they also have quite a few incredibly cool research projects in machine learning/AI, computational photography, virtual and augmented reality, and so on). FAIR is roughly 70% research and 30% applied projects, while AML is the other way around. I had a similar experience at Bell Labs, sitting in the same corridor as a research lab while working closely with an engineering team; that model works well. The following link gives a good description of the relationship between FAIR and AML: Facebook's Race To Dominate AI

Via: Session with Yann LeCun

PS: This article was compiled by Lei Feng Network (search for the "Lei Feng Network" official account); reproduction without permission is prohibited.
