Proceedings ArticleDOI

Recurrent neural networks for language understanding.

25 Aug 2013 - pp. 2524-2528
TL;DR: This paper modifies the RNN-LM architecture to perform language understanding and advances the state of the art on the widely used ATIS dataset.
Abstract: Recurrent Neural Network Language Models (RNN-LMs) have recently shown exceptional performance across a variety of applications. In this paper, we modify the architecture to perform Language Understanding, and advance the state-of-the-art for the widely used ATIS dataset. The core of our approach is to take words as input as in a standard RNN-LM, and then to predict slot labels rather than words on the output side. We present several variations that differ in the amount of word context that is used on the input side, and in the use of non-lexical features. Remarkably, our simplest model produces state-of-the-art results, and we advance state-of-the-art through the use of bag-of-words, word embedding, named-entity, syntactic, and word-class features. Analysis indicates that the superior performance is attributable to the task-specific word representations learned by the RNN.
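
The core modification lends itself to a compact sketch. Below is a minimal, hypothetical forward pass of an Elman-style RNN tagger in the spirit described above: words are consumed one at a time as in an RNN-LM, but the softmax at each step is taken over slot labels rather than over the vocabulary. All sizes, names, and the random initialization are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): vocabulary, embedding,
# hidden state, and number of slot labels.
V, E, H, L = 1000, 50, 100, 20

emb = rng.normal(0.0, 0.1, (V, E))    # word embeddings
W_xh = rng.normal(0.0, 0.1, (E, H))   # input -> hidden
W_hh = rng.normal(0.0, 0.1, (H, H))   # hidden -> hidden (recurrence)
W_hy = rng.normal(0.0, 0.1, (H, L))   # hidden -> slot-label scores

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tag(word_ids):
    """Return one slot-label distribution per input word."""
    h = np.zeros(H)
    outputs = []
    for w in word_ids:
        x = emb[w]                         # look up the word, as in an RNN-LM
        h = np.tanh(x @ W_xh + h @ W_hh)   # Elman recurrence over the word sequence
        outputs.append(softmax(h @ W_hy))  # predict a slot label, not the next word
    return outputs

# Example: a five-word utterance encoded as word ids.
dists = tag([12, 7, 256, 3, 981])
print([int(d.argmax()) for d in dists])    # predicted slot-label ids
```

Training such a tagger then amounts to backpropagation through time against the reference slot labels; the paper's variations add wider input context and non-lexical features on top of this skeleton.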

Citations
Book
Li Deng, Dong Yu
12 Jun 2014
TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

2,817 citations


Cites background from "Recurrent neural networks for language understanding"

  • ...[403] reported the success of RNNs in spoken language understanding....

  • ...to emotion recognition from speech in [207, 222], to spoken language understanding in [242, 366, 403], to speaker recognition in [351, 372], to language-type recognition in [112], to dialogue state tracking for spoken dialogue systems in [94, 152], to automatic voice activity detection in [442], to speech enhancement in [396], to voice conversion in [266], and to single-channel source separation in [132, 387]....

Journal ArticleDOI
TL;DR: Several important RNN architectures, including Elman, Jordan, and hybrid variants, are implemented with the publicly available Theano neural network toolkit and compared in experiments on the well-known airline travel information system (ATIS) benchmark.
Abstract: Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU). In this paper, we propose to use recurrent neural networks (RNNs) for this task, and present several novel architectures designed to efficiently model past and future temporal dependencies. Specifically, we implemented and compared several important RNN architectures, including Elman, Jordan, and hybrid variants. To facilitate reproducibility, we implemented these networks with the publicly available Theano neural network toolkit and completed experiments on the well-known airline travel information system (ATIS) benchmark. In addition, we compared the approaches on two custom SLU data sets from the entertainment and movies domains. Our results show that the RNN-based models outperform the conditional random field (CRF) baseline by 2% in absolute error reduction on the ATIS benchmark. We improve the state-of-the-art by 0.5% in the Entertainment domain, and 6.7% for the movies domain.
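
For reference, the Elman and Jordan recurrences compared in that line of work are conventionally written as follows (the notation here is ours, not the paper's: \sigma is a sigmoid or tanh nonlinearity, x_t the word input, h_t the hidden state, and y_t the slot-label distribution):

\text{Elman:} \quad h_t = \sigma(W_x x_t + W_h h_{t-1}), \qquad y_t = \operatorname{softmax}(W_y h_t)
\text{Jordan:} \quad h_t = \sigma(W_x x_t + W_r y_{t-1}), \qquad y_t = \operatorname{softmax}(W_y h_t)

The two differ only in what is fed back: the Elman network recycles its own hidden state, while the Jordan network feeds back the previous output (label) distribution.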

562 citations


Cites background, methods, or results from "Recurrent neural networks for language understanding"

  • ...We used a context window of 3 for the bag-of-words feature [24]....

  • ...some preliminary SLU experiments [15], [24], [30], [56], in this...

  • ...Such word embeddings actually present interesting properties [23] and tend to cluster [20] when their semantics are similar. While [15], [24] suggest initializing the embedding vectors with unsupervised learned features and then fine-tuning them on the task of interest, we found that directly learning the embedding vectors initialized from random values led to the same performance on the ATIS dataset, when using the SENNA word embeddings (http://ml.... (see the initialization sketch after this list)

  • ...In our experiments, we preprocessed the data as in [24]....

  • ...6% reported in [24] with the 95% significance level based on the binomial test....

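A minimal sketch of the two embedding-initialization strategies contrasted in the excerpt above: random initialization versus loading pretrained vectors before task-specific fine-tuning. The file name, dimensions, and array layout are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim = 1000, 50

# Option 1: random initialization, learned entirely on the task.
emb = rng.uniform(-0.05, 0.05, (vocab_size, emb_dim))

# Option 2: start from pretrained vectors (e.g. SENNA-style embeddings saved
# beforehand as a (vocab_size, emb_dim) array) and fine-tune on the task.
# emb = np.load("pretrained_embeddings.npy")   # hypothetical file name

# Either matrix is subsequently updated by backpropagation together with the
# rest of the tagger's parameters.
print(emb.shape)
```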

Proceedings ArticleDOI
25 Aug 2013
TL;DR: The results show that on this task, both types of recurrent networks outperform the CRF baseline substantially, and a bi-directional Jordan-type network that takes into account both past and future dependencies among slots works best, outperforming a CRF-based baseline by 14% in relative error reduction.
Abstract: One of the key problems in spoken language understanding (SLU) is the task of slot filling. In light of the recent success of applying deep neural network technologies in domain detection and intent identification, we carried out an in-depth investigation on the use of recurrent neural networks for the more difficult task of slot filling involving sequence discrimination. In this work, we implemented and compared several important recurrent-neural-network architectures, including the Elman-type and Jordan-type recurrent networks and their variants. To make the results easy to reproduce and compare, we implemented these networks on the common Theano neural network toolkit, and evaluated them on the ATIS benchmark. We also compared our results to a conditional random fields (CRF) baseline. Our results show that on this task, both types of recurrent networks outperform the CRF baseline substantially, and a bi-directional Jordan-type network that takes into account both past and future dependencies among slots works best, outperforming a CRF-based baseline by 14% in relative error reduction.
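
The bi-directional variant referred to above can be sketched, in our notation rather than the paper's exact formulation, as one Jordan-style recurrence run forward and one run backward over the utterance, combined at each position:

h_t^{\rightarrow} = \sigma(W_x x_t + W_r\, y_{t-1}^{\rightarrow}), \qquad h_t^{\leftarrow} = \sigma(W'_x x_t + W'_r\, y_{t+1}^{\leftarrow})
y_t = \operatorname{softmax}\big(W_y\, [\, h_t^{\rightarrow};\; h_t^{\leftarrow} \,]\big)

so each slot decision can condition on information from both the past and the future of the sentence.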

503 citations


Cites methods from "Recurrent neural networks for language understanding"

  • ...Future work will explore more efficient training of RNNs and the choice of more comprehensive features [28] and using a different RNN training toolkit [14] incorporating more advanced features....

Journal ArticleDOI
TL;DR: The usefulness and effectiveness of GAN for classification of hyperspectral images (HSIs) are explored for the first time and the proposed models provide competitive results compared to the state-of-the-art methods.
Abstract: A generative adversarial network (GAN) usually contains a generative network and a discriminative network in competition with each other. The GAN has shown its capability in a variety of applications. In this paper, the usefulness and effectiveness of GAN for classification of hyperspectral images (HSIs) are explored for the first time. In the proposed GAN, a convolutional neural network (CNN) is designed to discriminate the inputs and another CNN is used to generate so-called fake inputs. The aforementioned CNNs are trained together: the generative CNN tries to generate fake inputs that are as real as possible, and the discriminative CNN tries to classify the real and fake inputs. This kind of adversarial training improves the generalization capability of the discriminative CNN, which is really important when the training samples are limited. Specifically, we propose two schemes: 1) a well-designed 1D-GAN as a spectral classifier and 2) a robust 3D-GAN as a spectral–spatial classifier. Furthermore, the generated adversarial samples are used with real training samples to fine-tune the discriminative CNN, which improves the final classification performance. The proposed classifiers are carried out on three widely used hyperspectral data sets: Salinas, Indian Pines, and Kennedy Space Center. The obtained results reveal that the proposed models provide competitive results compared to the state-of-the-art methods. In addition, the proposed GANs open new opportunities in the remote sensing community for the challenging task of HSI classification and also reveal the huge potential of GAN-based methods for the analysis of such complex and inherently nonlinear data.
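
The adversarial training described above follows the standard GAN recipe. The sketch below is a generic, simplified 1-D version, with fully connected layers instead of the paper's CNNs and synthetic vectors in place of hyperspectral pixels, just to make the alternating update concrete; layer sizes and hyperparameters are arbitrary assumptions.

```python
import torch
import torch.nn as nn

spectral_dim, noise_dim, batch = 200, 32, 64   # illustrative sizes only

# Generator: noise -> fake "spectrum"; discriminator: spectrum -> real/fake score.
G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, spectral_dim))
D = nn.Sequential(nn.Linear(spectral_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(batch, spectral_dim)        # stand-in for real training spectra

for step in range(100):
    # 1) Discriminator step: separate real samples from generated ones.
    fake = G(torch.randn(batch, noise_dim)).detach()
    loss_d = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator step: make the discriminator score generated samples as real.
    fake = G(torch.randn(batch, noise_dim))
    loss_g = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

In the paper's setting the discriminative CNN doubles as the HSI classifier and is additionally fine-tuned on generated samples together with the real training samples.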

501 citations

Proceedings ArticleDOI
08 Sep 2016
TL;DR: Experimental results show the advantage of a holistic multi-domain, multi-task modeling approach, which estimates complete semantic frames for all user utterances addressed to a conversational system, over alternative methods based on single-domain/task deep learning.
Abstract: Sequence-to-sequence deep learning has recently emerged as a new paradigm in supervised learning for spoken language understanding. However, most of the previous studies explored this framework for building single domain models for each task, such as slot filling or domain classification, comparing deep learning based approaches with conventional ones like conditional random fields. This paper proposes a holistic multi-domain, multi-task (i.e. slot filling, domain and intent detection) modeling approach to estimate complete semantic frames for all user utterances addressed to a conversational system, demonstrating the distinctive power of deep learning methods, namely bi-directional recurrent neural network (RNN) with long short-term memory (LSTM) cells (RNN-LSTM) to handle such complexity. The contributions of the presented work are three-fold: (i) we propose an RNN-LSTM architecture for joint modeling of slot filling, intent determination, and domain classification; (ii) we build a joint multi-domain model enabling multi-task deep learning where the data from each domain reinforces each other; (iii) we investigate alternative architectures for modeling lexical context in spoken language understanding. In addition to the simplicity of the single model framework, experimental results show the power of such an approach on Microsoft Cortana real user data over alternative methods based on single domain/task deep learning.
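
A hypothetical sketch of such a joint architecture, not the authors' exact model: one shared bi-directional LSTM encoder feeds a per-token head for slot filling and an utterance-level head for the combined domain/intent decision. All sizes are illustrative.

```python
import torch
import torch.nn as nn

class JointSLU(nn.Module):
    """Shared bi-directional LSTM encoder with a slot-tagging head (per token)
    and a domain/intent head (per utterance). Sizes are illustrative."""
    def __init__(self, vocab=1000, emb=100, hid=128, n_slots=50, n_frames=30):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.enc = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hid, n_slots)    # slot label per word
        self.frame_head = nn.Linear(2 * hid, n_frames)  # joint domain/intent per utterance

    def forward(self, word_ids):
        states, _ = self.enc(self.emb(word_ids))        # (batch, time, 2 * hid)
        slot_scores = self.slot_head(states)            # sequence-tagging output
        frame_scores = self.frame_head(states[:, -1])   # utterance-level output
        return slot_scores, frame_scores

model = JointSLU()
slots, frame = model(torch.randint(0, 1000, (2, 7)))   # two utterances of seven word ids
print(slots.shape, frame.shape)                        # torch.Size([2, 7, 50]) torch.Size([2, 30])
```

The multi-task effect comes from training both heads on the shared encoder, so data from every domain and task updates the same recurrent parameters.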

464 citations


Additional excerpts

  • ...[23] and Mesnil et al....

References
Journal Article
TL;DR: A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.
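
As a usage illustration, independent of this paper's experiments, projecting a set of high-dimensional vectors with a standard t-SNE implementation looks like the following; the data is random placeholder data and the parameter values are common defaults.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(200, 50)   # 200 points in 50 dimensions (placeholder data)

# Map to 2-D; perplexity roughly controls the effective neighborhood size.
X_2d = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)
print(X_2d.shape)   # (200, 2), ready to scatter-plot
```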

30,124 citations


"Recurrent neural networks for langu..." refers methods in this paper

  • ...To understand it better, we use t-SNE [39] to plot frequent words that are in the ten most common slots in two-dimensional space, as shown in color in Figure 5....

Proceedings Article
28 Jun 2001
TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Abstract: We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
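
For context, a linear-chain conditional random field of the kind introduced here defines the conditional probability of a label sequence y given an observation sequence x as (standard textbook notation, not a quotation from the paper):

p(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t) \right)

where the f_k are feature functions over adjacent labels and the observations, the \lambda_k are their learned weights, and Z(x) normalizes by summing the exponential over all possible label sequences.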

13,190 citations


"Recurrent neural networks for langu..." refers methods in this paper

  • ...Perhaps the most obvious approach to this task is the use of Conditional Random Fields (CRFs) [26], in which an exponential model is used to compute the probability of a label sequence given the input word sequence....

Journal ArticleDOI
TL;DR: A proposal along these lines, first described by Jordan (1986), which involves the use of recurrent links in order to provide networks with a dynamic memory, is developed, and a method for representing lexical categories and the type/token distinction is suggested.

10,264 citations


"Recurrent neural networks for langu..." refers methods in this paper

  • ...To adapt the RNN-LM to the LU task, we use the classic Elman RNN architecture [31] adopted by Mikolov [1]....

Journal ArticleDOI
TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that the proposed approach allows the model to take advantage of longer contexts.
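
In its commonly cited form (our notation, not a quotation), the model scores the next word from the concatenated embeddings of the n-1 preceding words:

x = [\, C(w_{t-n+1});\, \ldots;\, C(w_{t-1}) \,], \qquad P(w_t \mid w_{t-n+1}, \ldots, w_{t-1}) = \operatorname{softmax}\!\left( b + W x + U \tanh(d + H x) \right)

where C is the shared embedding matrix; the embeddings and the parameters (W, U, H, b, d) are learned jointly by maximizing the likelihood of the training text.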

6,832 citations