Author

Yelong Shen

Other affiliations: Tencent, Duke University, Johns Hopkins University
Bio: Yelong Shen is an academic researcher from Microsoft. The author has contributed to research on topics including computer science and question answering, has an h-index of 26, and has co-authored 85 publications receiving 3,829 citations. Previous affiliations of Yelong Shen include Tencent and Duke University.


Papers
Proceedings ArticleDOI
03 Nov 2014
TL;DR: A new latent semantic model is proposed that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional semantic vector representations for search queries and Web documents.
Abstract: In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. In order to capture the rich contextual structures in a query or a document, we start with each word within a temporal context window in a word sequence to directly capture contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are discovered by the model and are then aggregated to form a sentence-level feature vector. Finally, a non-linear transformation is applied to extract high-level semantic information to generate a continuous vector representation for the full text string. The proposed convolutional latent semantic model (CLSM) is trained on clickthrough data and is evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that the proposed model effectively captures salient semantic information in queries and documents for the task while significantly outperforming previous state-of-the-art semantic models.

723 citations
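
The pipeline described in this abstract (word-n-gram convolution, max pooling over positions, then a non-linear projection) maps directly onto a few standard layers. Below is a minimal PyTorch sketch of a CLSM-style encoder; all sizes (vocabulary, embedding, convolution, and semantic-vector dimensions) are illustrative assumptions rather than the paper's hyperparameters, and the sketch uses a plain word embedding instead of the paper's letter-trigram word hashing.

```python
import torch
import torch.nn as nn

class CLSMEncoder(nn.Module):
    """Sketch of a convolutional latent semantic model (CLSM)-style encoder.

    Per the abstract: word n-gram convolution over a context window,
    max pooling to keep the salient n-gram features, then a non-linear
    projection to a semantic vector. Sizes are illustrative only.
    """

    def __init__(self, vocab_size=30000, embed_dim=128, conv_dim=300,
                 sem_dim=128, window=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolution over word n-grams (a `window`-word context).
        self.conv = nn.Conv1d(embed_dim, conv_dim, kernel_size=window, padding=1)
        # Non-linear transformation to the final semantic vector.
        self.proj = nn.Linear(conv_dim, sem_dim)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        x = self.embed(token_ids)            # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                # (batch, embed_dim, seq_len)
        x = torch.tanh(self.conv(x))         # n-gram features at each position
        x, _ = x.max(dim=2)                  # max pooling -> salient features
        return torch.tanh(self.proj(x))      # sentence-level semantic vector

# Usage: embed a query and a document, score by cosine similarity.
enc = CLSMEncoder()
query = torch.randint(0, 30000, (1, 8))
doc = torch.randint(0, 30000, (1, 12))
score = nn.functional.cosine_similarity(enc(query), enc(doc))
```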

Proceedings ArticleDOI
07 Apr 2014
TL;DR: This paper presents a series of new latent semantic models, based on a convolutional neural network, that learn low-dimensional semantic vectors for search queries and Web documents and significantly outperform other semantic models in retrieval performance.
Abstract: This paper presents a series of new latent semantic models based on a convolutional neural network (CNN) to learn low-dimensional semantic vectors for search queries and Web documents. By using the convolution-max pooling operation, local contextual information at the word n-gram level is modeled first. Then, salient local features in a word sequence are combined to form a global feature vector. Finally, the high-level semantic information of the word sequence is extracted to form a global vector representation. The proposed models are trained on clickthrough data by maximizing the conditional likelihood of clicked documents given a query, using stochastic gradient ascent. The new models are evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that our model significantly outperforms other semantic models, which were state-of-the-art in retrieval performance prior to this work.

706 citations
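
Both this paper and the CLSM entry above share the training objective the abstract states: maximize the conditional likelihood of the clicked document given the query. Below is a hedged sketch of that loss, assuming cosine similarities sharpened by a smoothing factor gamma before a softmax over one clicked and several unclicked documents; gamma and all shapes are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def clickthrough_loss(query_vec, doc_vecs, gamma=10.0):
    """Negative log-likelihood of the clicked document given the query.

    query_vec: (batch, dim) query semantic vectors.
    doc_vecs:  (batch, 1 + num_negatives, dim) with the clicked document
               first, followed by sampled unclicked documents.
    gamma:     assumed smoothing factor applied to cosine similarities
               before the softmax; illustrative, not from the paper.
    """
    sims = F.cosine_similarity(query_vec.unsqueeze(1), doc_vecs, dim=2)
    log_probs = F.log_softmax(gamma * sims, dim=1)
    return -log_probs[:, 0].mean()   # clicked document sits at index 0

# Usage with random stand-ins for encoder outputs:
q = torch.randn(4, 128, requires_grad=True)
d = torch.randn(4, 5, 128)           # 1 clicked + 4 negatives per query
clickthrough_loss(q, d).backward()   # gradients flow back into q
```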

Journal ArticleDOI
TL;DR: In this article, the LSTM-RNN model was used for sentence embedding in a web search engine, and the results showed that the proposed method significantly outperformed the Paragraph Vector method for the web document retrieval task.
Abstract: This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks (RNN) with Long Short-Term Memory (LSTM) cells. The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long-term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate the unimportant words and detect the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enabled by the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state-of-the-art methods. We emphasize that the proposed model generates sentence embedding vectors that are especially useful for web document retrieval tasks. A comparison with a well-known general sentence embedding method, the Paragraph Vector, is performed. The results show that the proposed method significantly outperforms the Paragraph Vector method for the web document retrieval task.

659 citations
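
A minimal sketch of the embedding scheme the abstract describes: an LSTM consumes the words one at a time, the hidden state after the last word serves as the sentence vector, and query-document similarity is measured in that embedding space. Sizes are illustrative, and the sketch omits the paper's weakly supervised click-through training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMSentenceEmbedder(nn.Module):
    """Sketch of LSTM-RNN sentence embedding: the final hidden state of an
    LSTM run over the word sequence is taken as the sentence embedding.
    Vocabulary and layer sizes are illustrative assumptions.
    """

    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):               # (batch, seq_len)
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)               # h_n: (1, batch, hidden_dim)
        return h_n.squeeze(0)                    # sentence embedding

# Retrieval as in the paper: rank documents by similarity between the
# query embedding and each document embedding.
model = LSTMSentenceEmbedder()
query = torch.randint(0, 30000, (1, 6))
docs = torch.randint(0, 30000, (3, 20))
scores = F.cosine_similarity(model(query), model(docs))  # (3,)
best = scores.argmax().item()
```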

Proceedings ArticleDOI
13 Aug 2017
TL;DR: In this paper, a neural network architecture called the Reasoning Network (ReasoNet) is proposed for machine comprehension tasks; it makes use of multiple turns to effectively exploit and then reason over the relation among queries, documents, and answers.
Abstract: Teaching a computer to read and answer general questions pertaining to a document is a challenging yet unsolved problem. In this paper, we describe a novel neural network architecture called the Reasoning Network (ReasoNet) for machine comprehension tasks. ReasoNets make use of multiple turns to effectively exploit and then reason over the relation among queries, documents, and answers. Different from previous approaches using a fixed number of turns during inference, ReasoNets introduce a termination state to relax this constraint on the reasoning depth. With the use of reinforcement learning, ReasoNets can dynamically determine whether to continue the comprehension process after digesting intermediate results, or to terminate reading when it concludes that existing information is adequate to produce an answer. ReasoNets achieve superior performance in machine comprehension datasets, including unstructured CNN and Daily Mail datasets, the Stanford SQuAD dataset, and a structured Graph Reachability dataset.

254 citations
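
The distinctive piece of ReasoNet is the control loop: a reasoning state is refined over multiple turns of attention over the document, and a termination gate decides when to stop. The sketch below shows only that inference-time loop under assumed module shapes; the paper trains the non-differentiable termination decision with reinforcement learning, which is not reproduced here.

```python
import torch
import torch.nn as nn

class ReasoNetSketch(nn.Module):
    """Minimal sketch of a ReasoNet-style multi-turn reasoning loop.

    Each turn: attend over the document memory with the current state,
    update the state, and consult a termination gate. All module choices
    (multi-head attention, GRU cell, linear gate) are assumptions in the
    spirit of the abstract, not the paper's exact configuration.
    """

    def __init__(self, dim=128, max_turns=5):
        super().__init__()
        self.max_turns = max_turns
        self.attend = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.update = nn.GRUCell(dim, dim)
        self.terminate = nn.Linear(dim, 1)   # termination gate
        self.answer = nn.Linear(dim, dim)    # placeholder answer head

    def forward(self, state, memory):        # state: (batch, dim); memory: (batch, len, dim)
        for _ in range(self.max_turns):
            # Attend over the document memory with the reasoning state.
            ctx, _ = self.attend(state.unsqueeze(1), memory, memory)
            state = self.update(ctx.squeeze(1), state)
            # Stop early once the gate judges the evidence sufficient
            # (with untrained weights this decision is arbitrary).
            if torch.sigmoid(self.terminate(state)).mean() > 0.5:
                break
        return self.answer(state)

model = ReasoNetSketch()
answer = model(torch.randn(2, 128), torch.randn(2, 30, 128))
```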

Proceedings Article
01 Jan 2016
TL;DR: In this paper, a neural network architecture called the Reasoning Network (ReasoNet) is proposed for machine comprehension tasks; it makes use of multiple turns to effectively exploit and then reason over the relation among queries, documents, and answers.
Abstract: Teaching a computer to read and answer general questions pertaining to a document is a challenging yet unsolved problem. In this paper, we describe a novel neural network architecture called the Reasoning Network (ReasoNet) for machine comprehension tasks. ReasoNets make use of multiple turns to effectively exploit and then reason over the relation among queries, documents, and answers. Different from previous approaches using a fixed number of turns during inference, ReasoNets introduce a termination state to relax this constraint on the reasoning depth. With the use of reinforcement learning, ReasoNets can dynamically determine whether to continue the comprehension process after digesting intermediate results, or to terminate reading when it concludes that existing information is adequate to produce an answer. ReasoNets achieve superior performance in machine comprehension datasets, including unstructured CNN and Daily Mail datasets, the Stanford SQuAD dataset, and a structured Graph Reachability dataset.

207 citations


Cited by
Proceedings ArticleDOI
Yoon Kim
25 Aug 2014
TL;DR: A simple CNN with little hyperparameter tuning improves upon the state of the art on 4 out of 7 tasks, including sentiment analysis and question classification, and a simple architectural modification allows the use of both task-specific and static word vectors.
Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

9,776 citations
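
A hedged sketch of the setup this abstract describes: parallel convolutions of several widths over word vectors, max pooling, and a classifier, with one frozen ("static") embedding channel and one fine-tuned channel. For simplicity the two channels are concatenated along the embedding dimension rather than convolved as true parallel channels as in the paper, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Sketch of a CNN for sentence classification over word vectors.

    Convolutions of widths 3/4/5 extract n-gram features, max pooling
    keeps the strongest activation per filter, and a linear layer maps
    the pooled features to class logits. Sizes are illustrative.
    """

    def __init__(self, vocab_size=20000, embed_dim=300, num_classes=2,
                 widths=(3, 4, 5), filters=100):
        super().__init__()
        self.static = nn.Embedding(vocab_size, embed_dim)
        self.static.weight.requires_grad = False          # frozen channel
        self.tuned = nn.Embedding(vocab_size, embed_dim)  # fine-tuned channel
        self.convs = nn.ModuleList(
            nn.Conv1d(2 * embed_dim, filters, w) for w in widths)
        self.fc = nn.Linear(filters * len(widths), num_classes)

    def forward(self, token_ids):                         # (batch, seq_len)
        x = torch.cat([self.static(token_ids), self.tuned(token_ids)], dim=-1)
        x = x.transpose(1, 2)                             # (batch, 2*embed, seq)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))          # class logits

logits = TextCNN()(torch.randint(0, 20000, (8, 40)))
```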

Posted Content
Yoon Kim
TL;DR: In this article, CNNs are trained on top of pre-trained word vectors for sentence-level classification tasks, and a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks.
Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

7,826 citations

Proceedings ArticleDOI
15 Feb 2018
TL;DR: This paper introduces a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics) and how these uses vary across linguistic contexts (i.e., to model polysemy).
Abstract: We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.

7,412 citations
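
The key mechanism in the abstract, word vectors as "learned functions of the internal states" of a pretrained biLM, amounts to a learned weighted sum over the biLM's layers. Below is a minimal sketch of that layer mixing, with the biLM itself stubbed out by random tensors and the layer count and hidden size assumed for illustration.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Sketch of deep-contextualized-embedding layer mixing: each token's
    representation is a softmax-normalized weighted sum of the biLM's
    internal layers, scaled by one task-specific scalar. The weights and
    the scalar are learned by the downstream task."""

    def __init__(self, num_layers=3):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_states):          # list of (batch, seq, dim) tensors
        w = torch.softmax(self.weights, dim=0)
        mixed = sum(wi * h for wi, h in zip(w, layer_states))
        return self.gamma * mixed

# Stand-in for biLM layer outputs (e.g. a token layer plus two LSTM layers):
layers = [torch.randn(2, 10, 256) for _ in range(3)]
contextual = ScalarMix()(layers)              # (2, 10, 256)
```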

Proceedings ArticleDOI
13 Jun 2016
TL;DR: Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin.
Abstract: We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics: (i) it has a hierarchical structure that mirrors the hierarchical structure of documents; (ii) it has two levels of attention mechanisms applied at the word- and sentence-level, enabling it to attend differentially to more and less important content when constructing the document representation. Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.

4,282 citations
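
The two-level structure in the abstract translates into two encoder-plus-attention stages: word vectors are pooled into sentence vectors, and sentence vectors into a document vector. A minimal PyTorch sketch follows, with GRU sizes and the additive-attention form as assumptions in the spirit of the paper rather than its exact configuration.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Additive attention pooling used at both levels of the hierarchy."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Parameter(torch.randn(dim))  # learned context vector

    def forward(self, h):                        # h: (batch, steps, dim)
        u = torch.tanh(self.proj(h))
        alpha = torch.softmax(u @ self.context, dim=1)   # (batch, steps)
        return (alpha.unsqueeze(-1) * h).sum(dim=1)      # weighted sum

class HANSketch(nn.Module):
    """Sketch of a hierarchical attention network: a word-level GRU with
    attention builds sentence vectors, and a sentence-level GRU with
    attention builds the document vector. Sizes are illustrative."""

    def __init__(self, vocab_size=20000, embed_dim=100, hidden=50, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.word_gru = nn.GRU(embed_dim, hidden, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(2 * hidden)
        self.sent_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.sent_attn = AttentionPool(2 * hidden)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, docs):                     # (batch, sentences, words)
        b, s, w = docs.shape
        x = self.embed(docs.view(b * s, w))
        h, _ = self.word_gru(x)
        sents = self.word_attn(h).view(b, s, -1)     # sentence vectors
        h, _ = self.sent_gru(sents)
        return self.fc(self.sent_attn(h))            # document-level logits

logits = HANSketch()(torch.randint(0, 20000, (2, 4, 12)))
```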

Proceedings Article
07 Dec 2015
TL;DR: In this paper, the use of character-level convolutional networks (ConvNets) for text classification is explored and compared with traditional models such as bag-of-words, n-grams, and their TF-IDF variants.
Abstract: This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.

3,052 citations
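
Treating text "at the character level" means one-hot encoding each character over a fixed alphabet and running temporal convolutions over the resulting matrix. A minimal sketch follows; the alphabet, sequence cap, and network depth are illustrative assumptions, far shallower than the networks in the paper.

```python
import torch
import torch.nn as nn

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}"

def quantize(text, max_len=256):
    """One-hot encode a string at the character level over a fixed alphabet.
    Characters outside the alphabet become all-zero columns; `max_len` is
    an illustrative cap, not the paper's value."""
    x = torch.zeros(len(ALPHABET), max_len)
    for i, ch in enumerate(text.lower()[:max_len]):
        j = ALPHABET.find(ch)
        if j >= 0:
            x[j, i] = 1.0
    return x

class CharCNNSketch(nn.Module):
    """Minimal character-level ConvNet classifier: stacked temporal
    convolutions with max pooling over the one-hot character matrix,
    then a linear classifier. Depth and widths are assumptions."""

    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(len(ALPHABET), 128, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(128, 128, kernel_size=7), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(128, 128, kernel_size=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):                  # x: (batch, alphabet, max_len)
        return self.fc(self.features(x).squeeze(-1))

batch = torch.stack([quantize("character-level convnets read raw text")])
logits = CharCNNSketch()(batch)
```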