Showing papers in "arXiv: Computation and Language in 2015"

PDF

Open Access

Posted Content•

Bidirectional LSTM-CRF Models for Sequence Tagging

[...]

09 Aug 2015-arXiv: Computation and Language

TL;DR: This work is the first to apply a bidirectional LSTM CRF model to NLP benchmark sequence tagging data sets and it is shown that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a biddirectional L STM component.

...read moreread less

Abstract: In this paper, we propose a variety of Long Short-Term Memory (LSTM) based models for sequence tagging. These models include LSTM networks, bidirectional LSTM (BI-LSTM) networks, LSTM with a Conditional Random Field (CRF) layer (LSTM-CRF) and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). Our work is the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets. We show that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a bidirectional LSTM component. It can also use sentence level tag information thanks to a CRF layer. The BI-LSTM-CRF model can produce state of the art (or close to) accuracy on POS, chunking and NER data sets. In addition, it is robust and has less dependence on word embedding as compared to previous observations.

...read moreread less

3,158 citations

Posted Content•

VQA: Visual Question Answering

[...]

Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh - Show less +3 more

03 May 2015-arXiv: Computation and Language

TL;DR: The task of free-form and open-ended Visual Question Answering (VQA) is proposed, given an image and a natural language question about the image, the task is to provide an accurate natural language answer.

...read moreread less

Abstract: We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (this http URL), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (this http URL).

...read moreread less

2,365 citations

Posted Content•

A Neural Conversational Model

[...]

Oriol Vinyals¹, Quoc V. Le•Institutions (1)

Google¹

19 Jun 2015-arXiv: Computation and Language

TL;DR: A simple approach to conversational modeling which uses the recently proposed sequence to sequence framework, and is able to extract knowledge from both a domain specific dataset, and from a large, noisy, and general domain dataset of movie subtitles.

...read moreread less

Abstract: Conversational modeling is an important task in natural language understanding and machine intelligence Although previous approaches exist, they are often restricted to specific domains (eg, booking an airline ticket) and require hand-crafted rules In this paper, we present a simple approach for this task which uses the recently proposed sequence to sequence framework Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation The strength of our model is that it can be trained end-to-end and thus requires much fewer hand-crafted rules We find that this straightforward model can generate simple conversations given a large conversational training dataset Our preliminary results suggest that, despite optimizing the wrong objective function, the model is able to converse well It is able extract knowledge from both a domain specific dataset, and from a large, noisy, and general domain dataset of movie subtitles On a domain-specific IT helpdesk dataset, the model can find a solution to a technical problem via conversations On a noisy open-domain movie transcript dataset, the model can perform simple forms of common sense reasoning As expected, we also find that the lack of consistency is a common failure mode of our model

...read moreread less

1,774 citations

Posted Content•

A large annotated corpus for learning natural language inference

[...]

Samuel R. Bowman¹, Gabor Angeli¹, Christopher Potts¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

21 Aug 2015-arXiv: Computation and Language

TL;DR: The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

...read moreread less

Abstract: Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

...read moreread less

1,717 citations

Posted Content•

Attention-Based Models for Speech Recognition

[...]

Jan Chorowski¹, Dzmitry Bahdanau², Dmitriy Serdyuk³, Kyunghyun Cho³, Yoshua Bengio³ - Show less +1 more•Institutions (3)

University of Wrocław¹, Jacobs University Bremen², Université de Montréal³

24 Jun 2015-arXiv: Computation and Language

TL;DR: The attention-mechanism is extended with features needed for speech recognition and a novel and generic method of adding location-awareness to the attention mechanism is proposed to alleviate the issue of high phoneme error rate.

...read moreread less

Abstract: Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks in- cluding machine translation, handwriting synthesis and image caption gen- eration. We extend the attention-mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation in reaches a competitive 18.7% phoneme error rate (PER) on the TIMIT phoneme recognition task, it can only be applied to utterances which are roughly as long as the ones it was trained on. We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue. The new method yields a model that is robust to long inputs and achieves 18% PER in single utterances and 20% in 10-times longer (repeated) utterances. Finally, we propose a change to the at- tention mechanism that prevents it from concentrating too much on single frames, which further reduces PER to 17.6% level.

...read moreread less

1,447 citations

Posted Content•

Skip-Thought Vectors

[...]

Ryan Kiros¹, Yukun Zhu¹, Ruslan Salakhutdinov², Richard S. Zemel², Antonio Torralba³, Raquel Urtasun¹, Sanja Fidler¹ - Show less +3 more•Institutions (3)

University of Toronto¹, Canadian Institute for Advanced Research², Massachusetts Institute of Technology³

22 Jun 2015-arXiv: Computation and Language

TL;DR: The approach for unsupervised learning of a generic, distributed sentence encoder is described, using the continuity of text from books to train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage.

...read moreread less

Abstract: We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets. The end result is an off-the-shelf encoder that can produce highly generic sentence representations that are robust and perform well in practice. We will make our encoder publicly available.

...read moreread less

1,115 citations

Posted Content•

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

[...]

Iulian Vlad Serban¹, Alessandro Sordoni¹, Yoshua Bengio¹, Aaron Courville¹, Joelle Pineau² - Show less +1 more•Institutions (2)

Université de Montréal¹, McGill University²

17 Jul 2015-arXiv: Computation and Language

TL;DR: The authors extend the hierarchical recurrent encoder-decoder neural network to the dialogue domain, and demonstrate that this model is competitive with state-of-the-art neural language models and back-off n-gram models.

...read moreread less

Abstract: We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural network to the dialogue domain, and demonstrate that this model is competitive with state-of-the-art neural language models and back-off n-gram models. We investigate the limitations of this and similar approaches, and show how its performance can be improved by bootstrapping the learning from a larger question-answer pair corpus and from pretrained word embeddings.

...read moreread less

1,090 citations

Posted Content•

Convolutional Neural Network Architectures for Matching Natural Language Sentences

[...]

Baotian Hu¹, Zhengdong Lu², Hang Li², Qingcai Chen¹•Institutions (2)

Harbin Institute of Technology Shenzhen Graduate School¹, Huawei²

11 Mar 2015-arXiv: Computation and Language

TL;DR: This paper proposed convolutional neural network models for matching two sentences, which can be applied to matching tasks of different nature and in different languages and demonstrate the efficacy of the proposed model on a variety of matching tasks and its superiority to competitor models.

...read moreread less

Abstract: Semantic matching is of central importance to many natural language tasks \cite{bordes2014semantic,RetrievalQA}. A successful matching algorithm needs to adequately model the internal structures of language objects and the interaction between them. As a step toward this goal, we propose convolutional neural network models for matching two sentences, by adapting the convolutional strategy in vision and speech. The proposed models not only nicely represent the hierarchical structures of sentences with their layer-by-layer composition and pooling, but also capture the rich matching patterns at different levels. Our models are rather generic, requiring no prior knowledge on language, and can hence be applied to matching tasks of different nature and in different languages. The empirical study on a variety of matching tasks demonstrates the efficacy of the proposed model on a variety of matching tasks and its superiority to competitor models.

...read moreread less

872 citations

Posted Content•

A Neural Attention Model for Abstractive Sentence Summarization

[...]

Alexander M. Rush¹, Sumit Chopra², Jason Weston³•Institutions (3)

Imperial College London¹, AT&T², Facebook³

02 Sep 2015-arXiv: Computation and Language

TL;DR: This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.

...read moreread less

Abstract: Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence. While the model is structurally simple, it can easily be trained end-to-end and scales to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared with several strong baselines.

...read moreread less

837 citations

Posted Content•

Effective Approaches to Attention-based Neural Machine Translation

[...]

Minh-Thang Luong¹, Hieu Pham¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

17 Aug 2015-arXiv: Computation and Language

TL;DR: This article proposed two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time.

...read moreread less

Abstract: An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches over the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems which already incorporate known techniques such as dropout. Our ensemble model using different attention architectures has established a new state-of-the-art result in the WMT'15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.

...read moreread less

824 citations

Posted Content•

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

[...]

Kai Sheng Tai¹, Richard Socher², Christopher D. Manning¹•Institutions (2)

Stanford University¹, University of Colorado Boulder²

28 Feb 2015-arXiv: Computation and Language

TL;DR: The Tree-LSTM is introduced, a generalization of LSTMs to tree-structured network topologies that outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences and sentiment classification.

...read moreread less

Abstract: Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).

...read moreread less

Posted Content•

Teaching Machines to Read and Comprehend

[...]

Karl Moritz Hermann¹, Tomáš Kočiský², Edward Grefenstette¹, Lasse Espeholt¹, Will Kay¹, Mustafa Suleyman¹, Phil Blunsom² - Show less +3 more•Institutions (2)

Google¹, University of Oxford²

10 Jun 2015-arXiv: Computation and Language

TL;DR: This article developed a class of attention-based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure, but this method requires large-scale reading comprehension data.

...read moreread less

Abstract: Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.

...read moreread less

Posted Content•

A C-LSTM Neural Network for Text Classification

[...]

Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C. M. Lau

27 Nov 2015-arXiv: Computation and Language

TL;DR: C-LSTM is a novel and unified model for sentence representation and text classification that outperforms both CNN and LSTM and can achieve excellent performance on these tasks.

...read moreread less

Abstract: Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural network (CNN) and recurrent neural network (RNN) are two mainstream architectures for such modeling tasks, which adopt totally different ways of understanding natural languages. In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification. C-LSTM utilizes CNN to extract a sequence of higher-level phrase representations, and are fed into a long short-term memory recurrent neural network (LSTM) to obtain the sentence representation. C-LSTM is able to capture both local features of phrases as well as global and temporal sentence semantics. We evaluate the proposed architecture on sentiment classification and question classification tasks. The experimental results show that the C-LSTM outperforms both CNN and LSTM and can achieve excellent performance on these tasks.

...read moreread less

Posted Content•

Reasoning about Entailment with Neural Attention

[...]

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom - Show less +1 more

22 Sep 2015-arXiv: Computation and Language

TL;DR: This paper proposes a neural model that reads two sentences to determine entailment using long short-term memory units and extends this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases, and presents a qualitative analysis of attention weights produced by this model.

...read moreread less

Abstract: While most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entailment failed to outperform such a simple similarity classifier. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.

...read moreread less

Posted Content•

Transition-Based Dependency Parsing with Stack Long Short-Term Memory

[...]

Chris Dyer¹, Miguel Ballesteros², Wang Ling¹, Austin Matthews¹, Noah A. Smith¹ - Show less +1 more•Institutions (2)

Carnegie Mellon University¹, Pompeu Fabra University²

29 May 2015-arXiv: Computation and Language

TL;DR: The authors propose a stack LSTM to learn representations of parser states in transition-based dependency parsers, which can be used to learn a parser's state, including the buffer of incoming words, the history of actions taken by the parser, and the complete contents of the stack of partially built tree fragments.

...read moreread less

Abstract: We propose a technique for learning representations of parser states in transition-based dependency parsers. Our primary innovation is a new control structure for sequence-to-sequence neural networks---the stack LSTM. Like the conventional stack data structures used in transition-based parsing, elements can be pushed to or popped from the top of the stack in constant time, but, in addition, an LSTM maintains a continuous space embedding of the stack contents. This lets us formulate an efficient parsing model that captures three facets of a parser's state: (i) unbounded look-ahead into the buffer of incoming words, (ii) the complete history of actions taken by the parser, and (iii) the complete contents of the stack of partially built tree fragments, including their internal structures. Standard backpropagation techniques are used for training and yield state-of-the-art parsing performance.

...read moreread less

Posted Content•

Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems

[...]

Tsung-Hsien Wen¹, Milica Gasic¹, Nikola Mrkšić¹, Pei-Hao Su¹, David Vandyke¹, Steve Young¹ - Show less +2 more•Institutions (1)

University of Cambridge¹

07 Aug 2015-arXiv: Computation and Language

TL;DR: This article presented a statistical language generator based on a semantically controlled Long Short-Term Memory (LSTM) structure, which can learn from unaligned data by jointly optimising sentence planning and surface realisation using a simple cross entropy training criterion, and language variation can be easily achieved by sampling from output candidates.

...read moreread less

Abstract: Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact both on usability and perceived quality. Most NLG systems in common use employ rules and heuristics and tend to generate rigid and stylised responses without the natural variation of human language. They are also not easily scaled to systems covering multiple domains and languages. This paper presents a statistical language generator based on a semantically controlled Long Short-term Memory (LSTM) structure. The LSTM generator can learn from unaligned data by jointly optimising sentence planning and surface realisation using a simple cross entropy training criterion, and language variation can be easily achieved by sampling from output candidates. With fewer heuristics, an objective evaluation in two differing test domains showed the proposed method improved performance compared to previous methods. Human judges scored the LSTM system higher on informativeness and naturalness and overall preferred it to the other systems.

...read moreread less

Journal Article•

On using monolingual corpora in neural machine translation

[...]

Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loïc Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio - Show less +5 more

11 Mar 2015-arXiv: Computation and Language

TL;DR: This work investigates how to leverage abundant monolingual corpora for neural machine translation to improve results for En-Fr and En-De translation and extends to high resource languages such as Cs-En and De-En.

...read moreread less

Abstract: Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation. Arguably, one of the major factors behind this success has been the availability of high quality parallel corpora. In this work, we investigate how to leverage abundant monolingual corpora for neural machine translation. Compared to a phrase-based and hierarchical baseline, we obtain up to $1.96$ BLEU improvement on the low-resource language pair Turkish-English, and $1.59$ BLEU on the focused domain task of Chinese-English chat messages. While our method was initially targeted toward such tasks with less parallel data, we show that it also extends to high resource languages such as Cs-En and De-En where we obtain an improvement of $0.39$ and $0.47$ BLEU scores over the neural machine translation baselines, respectively.

...read moreread less

Posted Content•

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

[...]

Yajie Miao¹, Mohammad Gowayyed¹, Florian Metze¹•Institutions (1)

Carnegie Mellon University¹

29 Jul 2015-arXiv: Computation and Language

TL;DR: Eesen as discussed by the authors proposes a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding.

...read moreread less

Abstract: The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically simplifies the existing pipeline to build state-of-the-art ASR systems. Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, we adopt the connectionist temporal classification (CTC) objective function to infer the alignments between speech and label sequences. A distinctive feature of Eesen is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that compared with the standard hybrid DNN systems, Eesen achieves comparable word error rates (WERs), while at the same time speeding up decoding significantly.

...read moreread less

Posted Content•

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

[...]

Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso - Show less +4 more

09 Aug 2015-arXiv: Computation and Language

Abstract: We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).

...read moreread less

Posted Content•

A Hierarchical Neural Autoencoder for Paragraphs and Documents

[...]

Jiwei Li, Minh-Thang Luong, Dan Jurafsky

02 Jun 2015-arXiv: Computation and Language

TL;DR: This paper proposed a hierarchical LSTM auto-encoder to preserve and reconstruct multi-sentence paragraphs, and evaluated the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models can encode texts in a way that preserve syntactic, semantic, and discourse coherence.

...read moreread less

Abstract: Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent networks models. In this paper, we explore an important step toward this generation task: training an LSTM (Long-short term memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserve syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization\footnote{Code for the three models described in this paper can be found at this http URL .

...read moreread less

Posted Content•

Classifying Relations by Ranking with Convolutional Neural Networks

[...]

Cicero Nogueira dos Santos¹, Bing Xiang¹, Bowen Zhou¹•Institutions (1)

IBM¹

24 Apr 2015-arXiv: Computation and Language

TL;DR: This work proposes a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes and shows that it is more effective than CNN followed by a softmax classifier and using only word embeddings as input features is enough to achieve state-of-the-art results.

...read moreread less

Abstract: Relation classification is an important semantic processing task for which state-ofthe-art systems still rely on costly handcrafted features. In this work we tackle the relation classification task using a convolutional neural network that performs classification by ranking (CR-CNN). We propose a new pairwise ranking loss function that makes it easy to reduce the impact of artificial classes. We perform experiments using the the SemEval-2010 Task 8 dataset, which is designed for the task of classifying the relationship between two nominals marked in a sentence. Using CRCNN, we outperform the state-of-the-art for this dataset and achieve a F1 of 84.1 without using any costly handcrafted features. Additionally, our experimental results show that: (1) our approach is more effective than CNN followed by a softmax classifier; (2) omitting the representation of the artificial class Other improves both precision and recall; and (3) using only word embeddings as input features is enough to achieve state-of-the-art results if we consider only the text between the two target nominals.

...read moreread less

Posted Content•

LSTM-based Deep Learning Models for Non-factoid Answer Selection

[...]

Ming Tan, Cicero Nogueira dos Santos, Bing Xiang, Bowen Zhou

12 Nov 2015-arXiv: Computation and Language

TL;DR: A general deep learning framework is applied for the answer selection task, which does not depend on manually defined features or linguistic tools, and is extended in two directions to define a more composite representation for questions and answers.

...read moreread less

Abstract: In this paper, we apply a general deep learning (DL) framework for the answer selection task, which does not depend on manually defined features or linguistic tools. The basic framework is to build the embeddings of questions and answers based on bidirectional long short-term memory (biLSTM) models, and measure their closeness by cosine similarity. We further extend this basic model in two directions. One direction is to define a more composite representation for questions and answers by combining convolutional neural network with the basic framework. The other direction is to utilize a simple but efficient attention mechanism in order to generate the answer representation according to the question context. Several variations of models are provided. The models are examined by two datasets, including TREC-QA and InsuranceQA. Experimental results demonstrate that the proposed models substantially outperform several strong baselines.

...read moreread less

Posted Content•

Listen, Attend and Spell

[...]

William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

05 Aug 2015-arXiv: Computation and Language

TL;DR: A neural network that learns to transcribe speech utterances to characters without making any independence assumptions between the characters, which is the key improvement of LAS over previous end-to-end CTC models.

...read moreread less

Abstract: We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On a subset of the Google voice search task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a language model, and 10.3% with language model rescoring over the top 32 beams. By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.

...read moreread less

Posted Content•

The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations

[...]

Felix Hill¹, Antoine Bordes¹, Sumit Chopra¹, Jason Weston¹•Institutions (1)

Facebook¹

07 Nov 2015-arXiv: Computation and Language

TL;DR: This article showed that the amount of text encoded in a single memory representation is highly influential to the performance: there is a sweet-spot, not too big and not too small, between single words and full sentences that allows the most meaningful information in a text to be effectively retained and recalled.

...read moreread less

Abstract: We introduce a new test of how well language models capture meaning in children's books. Unlike standard language modelling benchmarks, it distinguishes the task of predicting syntactic function words from that of predicting lower-frequency words, which carry greater semantic content. We compare a range of state-of-the-art models, each with a different way of encoding what has been previously read. We show that models which store explicit representations of long-term contexts outperform state-of-the-art neural language models at predicting semantic content words, although this advantage is not observed for syntactic function words. Interestingly, we find that the amount of text encoded in a single memory representation is highly influential to the performance: there is a sweet-spot, not too big and not too small, between single words and full sentences that allows the most meaningful information in a text to be effectively retained and recalled. Further, the attention over such window-based memories can be trained effectively through self-supervision. We then assess the generality of this principle by applying it to the CNN QA benchmark, which involves identifying named entities in paraphrased summaries of news articles, and achieve state-of-the-art performance.

...read moreread less

Posted Content•

Named Entity Recognition with Bidirectional LSTM-CNNs

[...]

Jason P.C. Chiu¹, Eric Nichols²•Institutions (2)

University of British Columbia¹, Honda²

26 Nov 2015-arXiv: Computation and Language

TL;DR: A novel neural network architecture is presented that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering.

...read moreread less

Abstract: Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance. In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering. We also propose a novel method of encoding partial lexicon matches in neural networks and compare it to existing approaches. Extensive evaluation shows that, given only tokenized text and publicly available word embeddings, our system is competitive on the CoNLL-2003 dataset and surpasses the previously reported state of the art performance on the OntoNotes 5.0 dataset by 2.13 F1 points. By using two lexicons constructed from publicly-available sources, we establish new state of the art performance with an F1 score of 91.62 on CoNLL-2003 and 86.28 on OntoNotes, surpassing systems that employ heavy feature engineering, proprietary lexicons, and rich entity linking information.

...read moreread less

Posted Content•

Character-Aware Neural Language Models

[...]

Yoon Kim¹, Yacine Jernite², David Sontag², Alexander M. Rush¹•Institutions (2)

Harvard University¹, Courant Institute of Mathematical Sciences²

26 Aug 2015-arXiv: Computation and Language

TL;DR: This article used a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM).

...read moreread less

Abstract: We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word-level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.

...read moreread less

Posted Content•

Neural Variational Inference for Text Processing

[...]

Yishu Miao¹, Lei Yu¹, Phil Blunsom¹•Institutions (1)

University of Oxford¹

19 Nov 2015-arXiv: Computation and Language

TL;DR: This paper introduces a generic variational inference framework for generative and conditional models of text, and constructs an inference network conditioned on the discrete text input to provide the variational distribution.

...read moreread less

Abstract: Recent advances in neural variational inference have spawned a renaissance in deep latent variable models. In this paper we introduce a generic variational inference framework for generative and conditional models of text. While traditional variational methods derive an analytic approximation for the intractable distributions over latent variables, here we construct an inference network conditioned on the discrete text input to provide the variational distribution. We validate this framework on two very different text modelling applications, generative document modelling and supervised question answering. Our neural variational document model combines a continuous stochastic document representation with a bag-of-words generative model and achieves the lowest reported perplexities on two standard test corpora. The neural answer selection model employs a stochastic representation layer within an attention mechanism to extract the semantics between a question and answer pair. On two question answering benchmarks this model exceeds all previous published benchmarks.

...read moreread less

Posted Content•

Document embedding with paragraph vectors

[...]

Andrew M. Dai, Chris Olah, Quoc V. Le

29 Jul 2015-arXiv: Computation and Language

TL;DR: This work observes that the Paragraph Vector method performs significantly better than other methods, and proposes a simple improvement to enhance embedding quality, and shows that much like word embeddings, vector operations on Paragraph Vectors can perform useful semantic results.

...read moreread less

Abstract: Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be leveraged for sentiment analysis. That proof of concept, while encouraging, was rather narrow. Here we consider tasks other than sentiment analysis, provide a more thorough comparison of Paragraph Vectors to other document modelling algorithms such as Latent Dirichlet Allocation, and evaluate performance of the method as we vary the dimensionality of the learned representation. We benchmarked the models on two document similarity data sets, one from Wikipedia, one from arXiv. We observe that the Paragraph Vector method performs significantly better than other methods, and propose a simple improvement to enhance embedding quality. Somewhat surprisingly, we also show that much like word embeddings, vector operations on Paragraph Vectors can perform useful semantic results.

...read moreread less

Posted Content•

Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition

[...]

Hasim Sak¹, Andrew W. Senior¹, Kanishka Rao¹, Francoise Beaufays¹•Institutions (1)

Google¹

24 Jul 2015-arXiv: Computation and Language

TL;DR: In this paper, the performance of LSTM RNN acoustic models for large vocabulary speech recognition was further improved by frame stacking and reduced frame rate, leading to more accurate models and faster decoding.

...read moreread less

Abstract: We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models initialized with connectionist temporal classification (CTC). In this paper, we present techniques that further improve performance of LSTM RNN acoustic models for large vocabulary speech recognition. We show that frame stacking and reduced frame rate lead to more accurate models and faster decoding. CD phone modeling leads to further improvements. We also present initial results for LSTM RNN models outputting words directly.

...read moreread less

Posted Content•

Neural Responding Machine for Short-Text Conversation

[...]

Lifeng Shang¹, Zhengdong Lu¹, Hang Li¹•Institutions (1)

Huawei¹

09 Mar 2015-arXiv: Computation and Language

TL;DR: Empirical study shows that NRM can generate grammatically correct and content-wise appropriate responses to over 75% of the input text, outperforming state-of-the-arts in the same setting, including retrieval-based and SMT-based models.

...read moreread less

Abstract: We propose Neural Responding Machine (NRM), a neural network-based response generator for Short-Text Conversation. NRM takes the general encoder-decoder framework: it formalizes the generation of response as a decoding process based on the latent representation of the input text, while both encoding and decoding are realized with recurrent neural networks (RNN). The NRM is trained with a large amount of one-round conversation data collected from a microblogging service. Empirical study shows that NRM can generate grammatically correct and content-wise appropriate responses to over 75% of the input text, outperforming state-of-the-arts in the same setting, including retrieval-based and SMT-based models.

...read moreread less

Collapse