Selected Topics in Continuous Representations for Natural Language Processing Tasks | Microsoft IJCAI 2016 Presentation Slides

Microsoft Research described its applications of deep learning and deep neural networks to different scenarios in its tutorial at IJCAI 2016. The second part covered applications of deep learning to statistical machine translation and conversation. This article is Part 3: selected topics in continuous representations for natural language processing tasks.

Joint compilation: Blake, Zhang Min, Chen Chun


Selected topics in continuous representations for natural language processing tasks

- Deep Semantic Similarity Model (DSSM) for information retrieval and ranking

- Reinforcement learning for natural language processing tasks in a continuous semantic space

- Multimodal semantic learning and reasoning for image captioning and visual question answering


Learning continuous semantic representations of natural language, e.g., mapping a raw sentence to an abstract semantic vector

Sent2Vec is very important in many natural language processing tasks. It can be applied to web search, ad selection, document ranking, online recommendation, machine translation, knowledge bases, question answering, personalized recommendation, image search, image captioning, and more.

The supervision issue:

Although:

- The text semantics to be learned are hidden

- There is no explicit learning target

- It is also unclear how to back-propagate

Fortunately:

- We generally know whether two texts are "similar"

- This similarity is the signal for learning semantic representations

Deep Structured Semantic Model

The Deep Structured Semantic Model / Deep Semantic Similarity Model maps an entire sentence into a continuous semantic space, i.e., sentence → vector.

DSSM is built at the character level (rather than the word level) for scalability and generality

DSSM is trained by optimizing a similarity-driven objective

Encoding at the character level is done via word hashing.

A DSSM built at the character level decomposes any word into a sequence of letter n-grams, so it scales well to large natural language tasks.
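As a rough illustration of this character-level encoding, here is a minimal Python sketch of letter-trigram word hashing in the spirit of DSSM; the boundary marker '#' and the helper names are assumptions for illustration, not taken from the slides.

```python
# A minimal sketch of DSSM-style word hashing with letter trigrams.
from collections import Counter

def letter_trigrams(word):
    """Decompose a word into letter trigrams, e.g. 'good' -> #go, goo, ood, od#."""
    padded = "#" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def hash_sentence(sentence):
    """Represent a sentence as a bag of letter trigrams (a sparse count vector)."""
    counts = Counter()
    for word in sentence.split():
        counts.update(letter_trigrams(word))
    return counts

print(hash_sentence("deep semantic similarity model"))
```

Because any word, including an unseen one, decomposes into a small fixed inventory of letter n-grams, the input layer stays compact no matter how large the vocabulary grows.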

DSSM: a similarity-driven Sent2Vec model

Initialization: neural networks are initialized with random weights

Runtime: compute the cosine similarity between the semantic vectors

Training objective: a loss based on cosine similarity

Take web search as an example:

A query q and a set of candidate documents D

Goal: model the probability that a document is clicked given the query
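This objective can be made concrete with a small sketch. Assuming the query and document vectors already come out of the network, the clicked-document probability is a softmax over cosine similarities (gamma is a smoothing factor); this is a minimal illustration, not the exact implementation from the slides.

```python
# A minimal sketch of a DSSM-style training objective for web search.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def click_probability(query_vec, doc_vecs, gamma=10.0):
    """P(D | Q): softmax over cosine similarities of the candidate documents."""
    sims = np.array([cosine(query_vec, d) for d in doc_vecs])
    exp = np.exp(gamma * sims)
    return exp / exp.sum()

def loss(query_vec, doc_vecs, clicked_index, gamma=10.0):
    """Negative log-likelihood of the clicked document (the similarity-driven objective)."""
    probs = click_probability(query_vec, doc_vecs, gamma)
    return -np.log(probs[clicked_index])
```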

Using convolutional neural networks in DSSM

Modeling local context at the convolution layer

Modeling global context at the max-pooling layer

At the convolution layer, the model captures the meaning of each word within its local context and learns an embedding vector for each local context window.

CDSSM: what happens at the max-pooling layer?

Local topics are aggregated to form a global topic

The max-pooling layer identifies the salient words/phrases

The words that most strongly activate the neurons at the max-pooling layer can be read off
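A minimal numpy sketch of these two layers, under illustrative dimensions and random weights: the convolution layer embeds each local word window, and max-pooling keeps the strongest activation per dimension, so the winning window indices reveal which words/phrases dominate the sentence vector.

```python
# A toy CDSSM-style convolution + max-pooling pass (weights are illustrative).
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim, window = 8, 16, 3

def conv_layer(word_vectors, W):
    """Slide a window over the sentence; each window becomes one local-context vector."""
    outputs = []
    for i in range(len(word_vectors) - window + 1):
        local = np.concatenate(word_vectors[i:i + window])
        outputs.append(np.tanh(W @ local))
    return np.stack(outputs)              # shape: (num_windows, hidden_dim)

def max_pool(conv_out):
    """Keep the max over windows per dimension; also return which window won."""
    return conv_out.max(axis=0), conv_out.argmax(axis=0)

words = ["microsoft", "office", "excel", "could", "allow", "remote", "code", "execution"]
word_vectors = [rng.normal(size=embed_dim) for _ in words]
W = rng.normal(size=(hidden_dim, window * embed_dim))

pooled, winners = max_pool(conv_layer(word_vectors, W))
# The winning window indices show which local phrases dominate the sentence vector.
print(pooled.shape, winners[:5])
```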

To learn a DSSM for search, the training data consists of semantically related text pairs mined from search logs.


Experiment settings

The convolutional deep semantic model achieves the best results.

Semantic matching example

Recurrent DSSM:

The recurrent hidden layer encodes the words one after another

The hidden state at the last word semantically encodes the entire sentence

The model is trained with the cosine-similarity-driven objective

Results using long short-term memory (LSTM):

The LSTM learns much faster than a standard RNN

The LSTM can effectively encode a sentence's semantic information in a vector
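A small PyTorch sketch of this idea, assuming the hidden state after the last word is taken as the sentence vector (vocabulary size and dimensions are illustrative):

```python
# A minimal LSTM sentence encoder in the spirit of the LSTM-DSSM.
import torch
import torch.nn as nn

class LstmSentenceEncoder(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # token_ids: (batch, sentence_length)
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return h_n[-1]            # (batch, hidden_dim): hidden state at the last word

encoder = LstmSentenceEncoder()
sentence = torch.randint(0, 10000, (1, 6))   # one toy sentence of 6 token ids
vector = encoder(sentence)
print(vector.shape)                           # torch.Size([1, 128])
```

Two such encoders (one for the query, one for the document) can then be trained with the same cosine-similarity-driven objective sketched above.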

DSSM vs. Seq2Seq

Given a text span in a document that represents an entity a particular user is interested in, find supplementary information about that entity.

Learning a DSSM for contextual entity search

Labeled pairs are extracted from web browsing logs: a hyperlink pointing to a Wikipedia page P, its anchor text H, and the surrounding context words in the source text.

Contextual entity search - experiment setup:

1. Training/validation data: 18M user clicks on Wikipedia pages

2. Evaluation data: first, 10k web documents are sampled as source documents; second, the named entities in each document are used as queries; third, the top 100 returned documents are kept as target documents; finally, each target document is manually labeled as to whether it describes the entity well. In total there are 870k pairs.

Contextual entity search results

Some related work:

Deep CNN over text input (mainly for classification tasks)

Sequence-to-sequence learning; Paragraph Vector (learning paragraph embeddings)

Recursive NN (composition along a tree structure)

Tensor product representations (representing trees)

Tree-structured LSTM networks (Tree-LSTM)

A reinforcement learning model includes: a set of environment states S; a set of actions A; rules for transitions between states; rules for the immediate reward of a state transition; and rules for what the agent observes.

Q-learning is a technique for learning an RL policy (the rule an agent follows when choosing an action in a given state). Its purpose is to find the optimal policy of an MDP by learning an action-value function, a.k.a. the Q-function, which gives the expected utility of taking an action in a state and acting optimally thereafter.
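For concreteness, here is a minimal tabular Q-learning sketch; the `env.step(state, action)` interface, the epsilon-greedy exploration, and the hyperparameters are assumptions for illustration, not part of the original presentation.

```python
# A minimal tabular Q-learning loop (env is a placeholder with a step() method).
import random
from collections import defaultdict

def q_learning(env, states, actions, episodes=1000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)                      # Q[(state, action)] -> expected utility
    for _ in range(episodes):
        state = random.choice(states)
        for _ in range(100):                    # bounded episode length
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward = env.step(state, action)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```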

Recent achievements: the Deep Q-Network achieved human-level performance on Atari games. A convolutional neural network is used to compute Q(s, a), and the action space is small and discrete.

Recent achievements (continued): AlphaGo defeated the world Go champion. The setting is similar to that of the Deep Q-Network, again with a known, finite action space; its model is built on two CNNs, a policy network and a value network.

Reinforcement learning for language understanding: states and actions are described by text strings, and the agent acts by choosing a text string (aiming to maximize the long-term reward). The environment then transitions to a new state, and the agent receives an immediate reward.

The action space and the state space are very large, and an action is characterized by an unbounded natural language description.

In NLP tasks, the action space is characterized by natural language, so it is discrete and practically unbounded. A deep reinforcement relevance network (DRRN) is proposed to map both states and actions into a continuous space, where the Q-function is the relevance (e.g., the inner product) of the state vector and the action vector.
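A minimal numpy sketch of this idea: separate networks embed the state text and each candidate action text, and Q(s, a) is taken as the inner product of the two embeddings. The bag-of-words featurizer and single-layer networks below are simplifying assumptions, not the exact DRRN architecture.

```python
# Toy DRRN-style Q-function: Q(s, a) = <state embedding, action embedding>.
import numpy as np

rng = np.random.default_rng(0)
vocab_words = "you move forward people look scared and flee the street up ignore warnings".split()
vocab = {w: i for i, w in enumerate(vocab_words)}
dim, hidden = len(vocab), 16
W_state = rng.normal(scale=0.1, size=(hidden, dim))
W_action = rng.normal(scale=0.1, size=(hidden, dim))

def bow(text):
    """Bag-of-words featurizer over the toy vocabulary."""
    v = np.zeros(dim)
    for w in text.lower().split():
        if w in vocab:
            v[vocab[w]] += 1.0
    return v

def q_value(state_text, action_text):
    s = np.tanh(W_state @ bow(state_text))
    a = np.tanh(W_action @ bow(action_text))
    return float(s @ a)

state = "you move forward people look scared and flee the street"
actions = ["look up", "ignore warnings and move forward"]
# Pick the action with the highest Q-value in the continuous space.
print(max(actions, key=lambda a: q_value(state, a)))
```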

To visualize the learned continuous space, Figure 2 shows PCA projections of the embedded state vector and the associated action vectors after 200, 400, and 600 training episodes. State text: as you move forward, the people around you have horrified expressions on their faces and flee the street. Action 1 (good choice): look up. Action 2 (bad choice): ignore the warnings of others and keep moving forward.

DRRN and DQN test results on two text games (learning curve)

The table shows example values of the learned Q-function; the DRRN generalizes well to unseen actions.

Like humans, the system learns to process text, images, and their joint information.

Image-side convolutional network implementation process

Language-side convolutional network implementation process

The figure shows the captioning pipeline: a deep learning model detects the key concepts (words) in the image, and a caption is then generated from the detected words; after detection, the words are composed and re-ranked to form the final sentence.

CaptionBot sample

Illustration of the process of answering questions about an image



Stacked attention network inference pipeline: a question model, an image model, multiple layers of attention, and an answer predictor.
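For intuition, here is a minimal numpy sketch of one attention hop in this style: the question vector attends over a grid of image region features, and the attended visual context refines the query for the next layer (dimensions and weights are illustrative assumptions, not the exact architecture from the slides).

```python
# Toy stacked-attention hop for visual question answering.
import numpy as np

rng = np.random.default_rng(0)
num_regions, img_dim, q_dim, att_dim = 49, 512, 512, 256   # e.g. a 7x7 feature grid

W_I = rng.normal(scale=0.01, size=(att_dim, img_dim))   # projects image regions
W_Q = rng.normal(scale=0.01, size=(att_dim, q_dim))     # projects the question
w_p = rng.normal(scale=0.01, size=att_dim)               # scores each region

def attention_layer(image_regions, query):
    """One hop: weight image regions by the question, return the refined query."""
    h = np.tanh(image_regions @ W_I.T + query @ W_Q.T)   # (num_regions, att_dim)
    scores = h @ w_p
    p = np.exp(scores - scores.max())
    p /= p.sum()                                          # attention over regions
    context = p @ image_regions                           # weighted sum of region features
    return context + query, p                             # refined query for the next hop

image_regions = rng.normal(size=(num_regions, img_dim))   # CNN feature map, flattened
question_vec = rng.normal(size=q_dim)                     # from the question model

u1, p1 = attention_layer(image_regions, question_vec)     # first attention hop
u2, p2 = attention_layer(image_regions, u1)               # second (stacked) hop
# u2 would be fed to the answer predictor (a softmax classifier over answers).
print(p2.argmax())  # index of the most-attended image region
```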

The image model in SAN

The question model in SAN

Answer example

To sum up:

This section mainly covered learning Sent2Vec with DSSM, reinforcement learning for NLP tasks in a continuous space, and joint visual-language representation learning. Specifically:

Learning Sent2Vec with DSSM:

DSSM maps an entire sentence into a continuous space

DSSM is built at the character level

DSSM directly optimizes a semantic-similarity-driven objective function

Reinforcement learning for NLP tasks in a continuous space:

Deep neural networks map states and actions (characterized by unbounded natural language) into a continuous semantic space

The Q-function is computed in the continuous semantic space

Joint visual-language representation learning:

Image captioning - the CaptionBot example

Visual question answering - the key is reasoning

PS: This article was compiled by Lei Feng Network (search for the "Lei Feng Network" official account); reproduction without permission is prohibited.

Via Microsoft IJCAI2016
