Posted Content

Conditional Random Field Autoencoders for Unsupervised Structured Prediction

TL;DR: Competitive results are shown with instantiations of a framework for unsupervised learning of structured predictors with overlapping, global features, and training the proposed model is shown to be substantially more efficient than comparable feature-rich baselines.
Abstract: We introduce a framework for unsupervised learning of structured predictors with overlapping, global features. Each input's latent representation is predicted conditional on the observable data using a feature-rich conditional random field. Then a reconstruction of the input is (re)generated, conditional on the latent structure, using models for which maximum likelihood estimation has a closed-form. Our autoencoder formulation enables efficient learning without making unrealistic independence assumptions or restricting the kinds of features that can be used. We illustrate insightful connections to traditional autoencoders, posterior regularization and multi-view learning. We show competitive results with instantiations of the model for two canonical NLP tasks: part-of-speech induction and bitext word alignment, and show that training our model can be substantially more efficient than comparable feature-rich baselines.
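
As a concrete illustration of the factorization the abstract describes, here is a minimal sketch (in PyTorch, my own simplification rather than the authors' released code) of the part-of-speech-induction instantiation: a linear-chain CRF encoder over latent tags, a per-tag categorical reconstruction model whose maximum-likelihood update is a closed-form count table, and the marginal reconstruction likelihood computed with the forward algorithm. The tag set size, vocabulary size, and feature set are illustrative assumptions.

```python
# Minimal CRF-autoencoder sketch for POS induction (assumed simplification).
import math
import torch

V, K = 1000, 12                                   # vocabulary size, latent tags

# Encoder parameters: per-word unary scores and tag transitions. A real CRF
# encoder would use rich, overlapping feature templates instead of a lookup.
emit = torch.zeros(V, K, requires_grad=True)
trans = torch.zeros(K, K, requires_grad=True)
# Reconstruction model log p(word | tag); its MLE given posterior counts is a
# normalized count table (the closed-form estimation the abstract refers to).
recon = torch.full((K, V), -math.log(V))

def log_forward(unary, trans):
    """Forward algorithm: log-sum-exp of path scores over all tag sequences."""
    alpha = unary[0]
    for u in unary[1:]:
        alpha = torch.logsumexp(alpha.unsqueeze(1) + trans, dim=0) + u
    return torch.logsumexp(alpha, dim=0)

def autoencoder_logprob(x):
    """log sum_y p(y | x) * prod_i p(x_i | y_i) for one sentence of word ids."""
    crf_unary = emit[x]                            # (n, K) encoder scores
    joint_unary = crf_unary + recon[:, x].T        # add reconstruction terms
    return log_forward(joint_unary, trans) - log_forward(crf_unary, trans)

# One gradient step on the encoder for a toy "sentence" of word ids.
x = torch.tensor([3, 17, 3, 42])
loss = -autoencoder_logprob(x)
loss.backward()
```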
Citations
Book ChapterDOI
08 Oct 2016
TL;DR: A novel approach that learns grounding by reconstructing a given phrase with an attention mechanism, which can be either latent or optimized directly; its effectiveness is demonstrated on the Flickr30k Entities and ReferItGame datasets.
Abstract: Grounding (i.e. localizing) arbitrary, free-form textual phrases in visual content is a challenging problem with many applications for human-computer interaction and image-text reference resolution. Few datasets provide the ground truth spatial localization of phrases, thus it is desirable to learn from data with no or little grounding supervision. We propose a novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly. During training our approach encodes the phrase using a recurrent network language model and then learns to attend to the relevant image region in order to reconstruct the input phrase. At test time, the correct attention, i.e., the grounding, is evaluated. If grounding supervision is available it can be directly applied via a loss over the attention mechanism. We demonstrate the effectiveness of our approach on the Flickr30k Entities and ReferItGame datasets with different levels of supervision, ranging from no supervision over partial supervision to full supervision. Our supervised variant improves by a large margin over the state-of-the-art on both datasets.
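
A hedged sketch of the reconstruction-based grounding idea described above (dimensions, module names, and the attention form are my assumptions, not the authors' released code): encode the phrase with a recurrent network, attend over candidate image regions, and reconstruct the phrase from the attended visual feature; the attention weights are the predicted grounding.

```python
import torch
import torch.nn as nn

class GroundingByReconstruction(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, region_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.att_phrase = nn.Linear(hid_dim, hid_dim)
        self.att_region = nn.Linear(region_dim, hid_dim)
        self.att_score = nn.Linear(hid_dim, 1)
        self.decoder = nn.LSTM(emb_dim + region_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, phrase, regions):
        # phrase: (B, T) word ids, regions: (B, R, region_dim) proposal features
        _, (h, _) = self.encoder(self.embed(phrase))          # encode the phrase
        q = self.att_phrase(h[-1]).unsqueeze(1)               # (B, 1, H)
        k = self.att_region(regions)                          # (B, R, H)
        att = self.att_score(torch.tanh(q + k)).squeeze(-1)   # (B, R)
        alpha = att.softmax(dim=-1)                           # grounding weights
        visual = (alpha.unsqueeze(-1) * regions).sum(dim=1)   # attended feature
        # Reconstruct the phrase conditioned on the attended visual feature.
        dec_in = torch.cat(
            [self.embed(phrase),
             visual.unsqueeze(1).expand(-1, phrase.size(1), -1)], dim=-1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out), alpha                       # (B, T, V), (B, R)

# Training minimizes reconstruction cross-entropy; at test time argmax(alpha)
# picks the grounded region, and with box supervision an extra loss on alpha
# (e.g. cross-entropy against the gold region) can be added.
```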

441 citations

Proceedings ArticleDOI
Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu
15 Jun 2016
TL;DR: This work proposes a semi-supervised approach for training NMT models on the concatenation of labeled (parallel) and unlabeled (monolingual) data, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively.
Abstract: While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems only rely on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data. The central idea is to reconstruct the monolingual corpora using an autoencoder, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively. Our approach can not only exploit the monolingual corpora of the target language, but also of the source language. Experiments on the Chinese-English dataset show that our approach achieves significant improvements over state-of-the-art SMT and NMT systems.
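
A hedged sketch of how the autoencoder objective described above could be combined with the supervised likelihood: parallel sentence pairs train both translation directions directly, while a monolingual target sentence is round-tripped (target → sampled source → target) and scored by its reconstruction likelihood. The tiny Seq2Seq stub and greedy sampling are illustrative stand-ins, not the authors' NMT system.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder; a stand-in for a real NMT model."""
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def log_prob(self, src, tgt):
        """Sum over t of log p(tgt_t | tgt_<t, src), with a BOS-shifted decoder."""
        _, h = self.enc(self.src_emb(src))
        bos = torch.zeros(tgt.size(0), 1, dtype=torch.long)   # assume id 0 = BOS
        dec_in = torch.cat([bos, tgt[:, :-1]], dim=1)
        dec_out, _ = self.dec(self.tgt_emb(dec_in), h)
        logp = self.out(dec_out).log_softmax(-1)
        return logp.gather(-1, tgt.unsqueeze(-1)).sum()

    @torch.no_grad()
    def sample(self, src, max_len=10):
        """Greedy decode used as the latent 'translation' (a crude stand-in)."""
        _, h = self.enc(self.src_emb(src))
        tok, outs = torch.zeros(src.size(0), 1, dtype=torch.long), []
        for _ in range(max_len):
            dec_out, h = self.dec(self.tgt_emb(tok), h)
            tok = self.out(dec_out).argmax(-1)
            outs.append(tok)
        return torch.cat(outs, dim=1)

src2tgt = Seq2Seq(1000, 1200)       # illustrative vocabulary sizes
tgt2src = Seq2Seq(1200, 1000)

def semi_supervised_loss(src_par, tgt_par, tgt_mono, lam=0.5):
    # Supervised term: likelihood of the parallel pair in both directions.
    sup = -(src2tgt.log_prob(src_par, tgt_par) + tgt2src.log_prob(tgt_par, src_par))
    # Autoencoder term: encode the monolingual target sentence into a source
    # sentence with tgt2src, then score its reconstruction with src2tgt.
    # (A symmetric term handles source-language monolingual data.)
    latent_src = tgt2src.sample(tgt_mono)
    recon = -src2tgt.log_prob(latent_src, tgt_mono)
    return sup + lam * recon
```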

171 citations


Cites background from "Conditional Random Field Autoencode..."

  • ...Likewise, given a monolingual corpus of source language $S = \{x^{(s)}\}_{s=1}^{S}$, it is natural to introduce a source autoencoder that aims at reconstructing… (Footnote: Our definition of autoencoders is inspired by Ammar et al. (2014).)

    [...]

  • ...Autoencoders and their variants have been widely used in unsupervised deep learning (Vincent et al., 2010; Socher et al., 2011; Ammar et al., 2014), just to name a few....

    [...]

Proceedings ArticleDOI
01 Nov 2016
TL;DR: A general framework is developed that enables learning knowledge and its confidence jointly with the DNNs, so that a vast amount of fuzzy knowledge can be incorporated and automatically optimized with little manual effort.
Abstract: Regulating deep neural networks (DNNs) with human structured knowledge has been shown to be of great benefit for improved accuracy and interpretability. We develop a general framework that enables learning knowledge and its confidence jointly with the DNNs, so that the vast amount of fuzzy knowledge can be incorporated and automatically optimized with little manual effort. We apply the framework to sentence sentiment analysis, augmenting a DNN with massive linguistic constraints on discourse and polarity structures. Our model substantially enhances the performance using less training data, and shows improved interpretability. The principled framework can also be applied to posterior regularization for regulating other statistical models.
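
One plausible, simplified reading of "learning knowledge and its confidence jointly" is sketched below; the rule interface, the softplus parameterization, and the confidence prior are assumptions of mine, not the authors' exact objective. Each soft rule contributes a violation penalty on the network's predictions, and a learnable per-rule confidence controls how strongly that rule regularizes the network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RuleRegularizedClassifier(nn.Module):
    def __init__(self, in_dim, n_classes, n_rules):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))
        self.rule_conf = nn.Parameter(torch.zeros(n_rules))   # learned jointly

    def forward(self, x):
        return self.net(x)

def loss_fn(model, x, y, rules):
    """rules: callables mapping predicted class probabilities to a per-example
    violation score in [0, 1] (an illustrative interface for fuzzy knowledge)."""
    logits = model(x)
    probs = logits.softmax(-1)
    task = F.cross_entropy(logits, y)
    conf = F.softplus(model.rule_conf)                           # confidences >= 0
    violations = torch.stack([r(probs) for r in rules], dim=-1)  # (batch, n_rules)
    knowledge = (violations * conf).mean()
    # Without some prior, confidences would collapse to zero; a simple pull
    # toward 1 stands in here for the framework's confidence regularization.
    prior = ((conf - 1.0) ** 2).mean()
    return task + knowledge + prior

# Toy usage: one "rule" expressing that class 0 should be likely when it applies.
model = RuleRegularizedClassifier(in_dim=16, n_classes=3, n_rules=1)
x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
loss = loss_fn(model, x, y, rules=[lambda p: 1.0 - p[:, 0]])
loss.backward()
```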

84 citations


Cites background from "Conditional Random Field Autoencode..."

  • ...Though apparently similar to recent deep structured models such as neural-CRFs (Durrett and Klein, 2015; Ammar et al., 2014; Do et al., 2010), ours is different since we parsimoniously extract features that are necessary for precise and efficient knowledge expression, as opposed to neural-CRFs that learn as rich representations as possible for final prediction....

    [...]


Proceedings ArticleDOI
02 May 2018
TL;DR: This paper exploits the dimensionality-reduction and feature-extraction properties of the autoencoder framework to efficiently carry out the reconstruction process, and uses LSTM networks to handle the sequential nature of computer network data.
Abstract: In this paper, we introduce a sequential autoencoder framework using long short term memory (LSTM) neural network for computer network intrusion detection. We exploit the dimensionality reduction and feature extraction property of the autoencoder framework to efficiently carry out the reconstruction process. Furthermore, we use the LSTM networks to handle the sequential nature of the computer network data. We assign a threshold value based on cross-validation in order to classify whether the incoming network data sequence is anomalous or not. Moreover, the proposed framework can work on both fixed and variable length data sequence and works efficiently for unforeseen and unpredictable network attacks. We then also use the unsupervised version of the LSTM, GRU, Bi-LSTM and Neural Networks. Through a comprehensive set of experiments, we demonstrate that our proposed sequential intrusion detection framework performs well and is dynamic, robust and scalable.
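
A minimal sketch of a sequence-to-sequence LSTM autoencoder for this style of anomaly detection (the architecture details and thresholding rule are assumptions in the spirit of the abstract, not the paper's exact model): normal traffic sequences are reconstructed well, and a sequence whose reconstruction error exceeds a threshold chosen on validation data is flagged as anomalous.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                       # x: (batch, time, n_features)
        _, (h, c) = self.encoder(x)             # compress the whole sequence
        T = x.size(1)
        # Feed the compressed state to the decoder at every time step.
        dec_in = h[-1].unsqueeze(1).expand(-1, T, -1)
        dec_out, _ = self.decoder(dec_in, (h, c))
        return self.out(dec_out)                # reconstructed sequence

def reconstruction_error(model, x):
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2))   # one score per sequence

# Usage sketch: train with MSE on benign traffic, choose `threshold` as e.g. a
# high percentile of validation-set errors, then flag test sequences above it.
model = LSTMAutoencoder(n_features=20)
batch = torch.randn(4, 30, 20)                  # 4 sequences, 30 time steps each
scores = reconstruction_error(model, batch)
# is_anomaly = scores > threshold
```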

78 citations


Cites background from "Conditional Random Field Autoencode..."

  • ...similar in that they are both sequential variants of the standard autoencoder [23]....

    [...]

References
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.
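
The method described here is the BLEU metric. As a usage sketch, NLTK's off-the-shelf implementation (an external library, not the paper's own code) can score a hypothesis against one or more references; the tokens below are made up for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]   # list of reference token lists
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

# Smoothing avoids zero scores when a short sentence has no matching 4-grams.
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.3f}")
```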

21,126 citations

Proceedings Article
28 Jun 2001
TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Abstract: We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
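
A minimal illustration of a linear-chain CRF log-likelihood (my own sketch, not the paper's implementation): per-position unary scores plus tag-transition scores are globally normalized with the forward algorithm, which is what lets CRFs relax local independence assumptions and avoid the per-state normalization behind label bias in MEMMs.

```python
import torch

K = 5                                    # number of tags (illustrative)
transitions = torch.randn(K, K, requires_grad=True)

def crf_log_likelihood(unary, tags, transitions):
    """unary: (T, K) feature scores, tags: (T,) gold tag ids."""
    # Score of the gold path: sum of unary and transition scores.
    gold = unary[torch.arange(len(tags)), tags].sum()
    gold = gold + transitions[tags[:-1], tags[1:]].sum()
    # Log-partition via the forward algorithm over all tag sequences.
    alpha = unary[0]
    for u in unary[1:]:
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + u
    log_z = torch.logsumexp(alpha, dim=0)
    return gold - log_z                  # log p(tags | observations)

# Training maximizes this over labeled sequences; the unary scores would come
# from arbitrary, overlapping features of the whole input (or a neural encoder).
unary = torch.randn(7, K, requires_grad=True)    # toy scores for a length-7 input
tags = torch.randint(0, K, (7,))
loss = -crf_log_likelihood(unary, tags, transitions)
loss.backward()
```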

13,190 citations


"Conditional Random Field Autoencode..." refers methods in this paper

  • ...Conditional random fields [24] are used to model structure in numerous problem domains, including natural language processing (NLP), computational biology, and computer vision....

    [...]

Proceedings Article
01 Jan 2010
TL;DR: Adaptive subgradient methods, as discussed by the authors, dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, allowing very predictive but rarely seen features to be found like needles in haystacks.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
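
A minimal sketch of the per-coordinate adaptive update commonly known as AdaGrad, which this abstract describes: each coordinate's step size shrinks with its accumulated squared gradients, so rarely updated ("rarely seen") features keep comparatively large steps. The toy quadratic objective below is illustrative only.

```python
import numpy as np

def adagrad_update(w, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad step; `accum` holds the running sum of squared gradients."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
accum = np.zeros_like(w)
for _ in range(100):
    w, accum = adagrad_update(w, w.copy(), accum)
print(w)   # coordinates shrink toward the minimum at 0
```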

7,244 citations