Posted Content

Conditional Random Field Autoencoders for Unsupervised Structured Prediction

TL;DR: Competitive results are shown with instantiations of a framework for unsupervised learning of structured predictors with overlapping, global features, and training the proposed model is shown to be substantially more efficient than comparable feature-rich baselines.
Abstract: We introduce a framework for unsupervised learning of structured predictors with overlapping, global features. Each input's latent representation is predicted conditional on the observable data using a feature-rich conditional random field. Then a reconstruction of the input is (re)generated, conditional on the latent structure, using models for which maximum likelihood estimation has a closed-form. Our autoencoder formulation enables efficient learning without making unrealistic independence assumptions or restricting the kinds of features that can be used. We illustrate insightful connections to traditional autoencoders, posterior regularization and multi-view learning. We show competitive results with instantiations of the model for two canonical NLP tasks: part-of-speech induction and bitext word alignment, and show that training our model can be substantially more efficient than comparable feature-rich baselines.
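
As a concrete illustration of the factorization the abstract describes, here is a minimal sketch (in PyTorch, my own simplification rather than the authors' released code) of the part-of-speech-induction instantiation: a linear-chain CRF encoder over latent tags, a per-tag categorical reconstruction model whose maximum-likelihood update is a closed-form count table, and the marginal reconstruction likelihood computed with the forward algorithm. The tag set size, vocabulary size, and feature set are illustrative assumptions.

```python
# Minimal CRF-autoencoder sketch for POS induction (assumed simplification).
import math
import torch

V, K = 1000, 12                                   # vocabulary size, latent tags

# Encoder parameters: per-word unary scores and tag transitions. A real CRF
# encoder would use rich, overlapping feature templates instead of a lookup.
emit = torch.zeros(V, K, requires_grad=True)
trans = torch.zeros(K, K, requires_grad=True)
# Reconstruction model log p(word | tag); its MLE given posterior counts is a
# normalized count table (the closed-form estimation the abstract refers to).
recon = torch.full((K, V), -math.log(V))

def log_forward(unary, trans):
    """Forward algorithm: log-sum-exp of path scores over all tag sequences."""
    alpha = unary[0]
    for u in unary[1:]:
        alpha = torch.logsumexp(alpha.unsqueeze(1) + trans, dim=0) + u
    return torch.logsumexp(alpha, dim=0)

def autoencoder_logprob(x):
    """log sum_y p(y | x) * prod_i p(x_i | y_i) for one sentence of word ids."""
    crf_unary = emit[x]                            # (n, K) encoder scores
    joint_unary = crf_unary + recon[:, x].T        # add reconstruction terms
    return log_forward(joint_unary, trans) - log_forward(crf_unary, trans)

# One gradient step on the encoder for a toy "sentence" of word ids.
x = torch.tensor([3, 17, 3, 42])
loss = -autoencoder_logprob(x)
loss.backward()
```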
Citations
Book ChapterDOI
08 Oct 2016
TL;DR: A novel approach that learns grounding by reconstructing a given phrase with an attention mechanism, which can be either latent or optimized directly; its effectiveness is demonstrated on the Flickr30k Entities and ReferItGame datasets.
Abstract: Grounding (i.e. localizing) arbitrary, free-form textual phrases in visual content is a challenging problem with many applications for human-computer interaction and image-text reference resolution. Few datasets provide the ground truth spatial localization of phrases, thus it is desirable to learn from data with no or little grounding supervision. We propose a novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly. During training our approach encodes the phrase using a recurrent network language model and then learns to attend to the relevant image region in order to reconstruct the input phrase. At test time, the correct attention, i.e., the grounding, is evaluated. If grounding supervision is available it can be directly applied via a loss over the attention mechanism. We demonstrate the effectiveness of our approach on the Flickr30k Entities and ReferItGame datasets with different levels of supervision, ranging from no supervision over partial supervision to full supervision. Our supervised variant improves by a large margin over the state-of-the-art on both datasets.
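
A hedged sketch of the reconstruction-based grounding idea described above (dimensions, module names, and the attention form are my assumptions, not the authors' released code): encode the phrase with a recurrent network, attend over candidate image regions, and reconstruct the phrase from the attended visual feature; the attention weights are the predicted grounding.

```python
import torch
import torch.nn as nn

class GroundingByReconstruction(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, region_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.att_phrase = nn.Linear(hid_dim, hid_dim)
        self.att_region = nn.Linear(region_dim, hid_dim)
        self.att_score = nn.Linear(hid_dim, 1)
        self.decoder = nn.LSTM(emb_dim + region_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, phrase, regions):
        # phrase: (B, T) word ids, regions: (B, R, region_dim) proposal features
        _, (h, _) = self.encoder(self.embed(phrase))          # encode the phrase
        q = self.att_phrase(h[-1]).unsqueeze(1)               # (B, 1, H)
        k = self.att_region(regions)                          # (B, R, H)
        att = self.att_score(torch.tanh(q + k)).squeeze(-1)   # (B, R)
        alpha = att.softmax(dim=-1)                           # grounding weights
        visual = (alpha.unsqueeze(-1) * regions).sum(dim=1)   # attended feature
        # Reconstruct the phrase conditioned on the attended visual feature.
        dec_in = torch.cat(
            [self.embed(phrase),
             visual.unsqueeze(1).expand(-1, phrase.size(1), -1)], dim=-1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out), alpha                       # (B, T, V), (B, R)

# Training minimizes reconstruction cross-entropy; at test time argmax(alpha)
# picks the grounded region, and with box supervision an extra loss on alpha
# (e.g. cross-entropy against the gold region) can be added.
```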

441 citations

Proceedings ArticleDOI
Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu
15 Jun 2016
TL;DR: This work proposes a semi-supervised approach for training NMT models on the concatenation of labeled (parallel) and unlabeled (monolingual) data, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively.
Abstract: While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems only rely on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data. The central idea is to reconstruct the monolingual corpora using an autoencoder, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively. Our approach can not only exploit the monolingual corpora of the target language, but also of the source language. Experiments on the Chinese-English dataset show that our approach achieves significant improvements over state-of-the-art SMT and NMT systems.
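
A hedged sketch of how the autoencoder objective described above could be combined with the supervised likelihood: parallel sentence pairs train both translation directions directly, while a monolingual target sentence is round-tripped (target → sampled source → target) and scored by its reconstruction likelihood. The tiny Seq2Seq stub and greedy sampling are illustrative stand-ins, not the authors' NMT system.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder; a stand-in for a real NMT model."""
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def log_prob(self, src, tgt):
        """Sum over t of log p(tgt_t | tgt_<t, src), with a BOS-shifted decoder."""
        _, h = self.enc(self.src_emb(src))
        bos = torch.zeros(tgt.size(0), 1, dtype=torch.long)   # assume id 0 = BOS
        dec_in = torch.cat([bos, tgt[:, :-1]], dim=1)
        dec_out, _ = self.dec(self.tgt_emb(dec_in), h)
        logp = self.out(dec_out).log_softmax(-1)
        return logp.gather(-1, tgt.unsqueeze(-1)).sum()

    @torch.no_grad()
    def sample(self, src, max_len=10):
        """Greedy decode used as the latent 'translation' (a crude stand-in)."""
        _, h = self.enc(self.src_emb(src))
        tok, outs = torch.zeros(src.size(0), 1, dtype=torch.long), []
        for _ in range(max_len):
            dec_out, h = self.dec(self.tgt_emb(tok), h)
            tok = self.out(dec_out).argmax(-1)
            outs.append(tok)
        return torch.cat(outs, dim=1)

src2tgt = Seq2Seq(1000, 1200)       # illustrative vocabulary sizes
tgt2src = Seq2Seq(1200, 1000)

def semi_supervised_loss(src_par, tgt_par, tgt_mono, lam=0.5):
    # Supervised term: likelihood of the parallel pair in both directions.
    sup = -(src2tgt.log_prob(src_par, tgt_par) + tgt2src.log_prob(tgt_par, src_par))
    # Autoencoder term: encode the monolingual target sentence into a source
    # sentence with tgt2src, then score its reconstruction with src2tgt.
    # (A symmetric term handles source-language monolingual data.)
    latent_src = tgt2src.sample(tgt_mono)
    recon = -src2tgt.log_prob(latent_src, tgt_mono)
    return sup + lam * recon
```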

171 citations


Cites background from "Conditional Random Field Autoencode..."

  • ...Likewise, given a monolingual corpus of source language $S = \{x^{(s)}\}_{s=1}^{S}$, it is natural to introduce a source autoencoder that aims at reconstructing… (Footnote: Our definition of autoencoders is inspired by Ammar et al. (2014).)

    [...]

  • ...Autoencoders and their variants have been widely used in unsupervised deep learning (Vincent et al., 2010; Socher et al., 2011; Ammar et al., 2014), just to name a few....

    [...]

Proceedings ArticleDOI
01 Nov 2016
TL;DR: A general framework is developed that enables learning knowledge and its confidence jointly with the DNNs, so that a vast amount of fuzzy knowledge can be incorporated and automatically optimized with little manual effort.
Abstract: Regulating deep neural networks (DNNs) with human structured knowledge has been shown to be of great benefit for improved accuracy and interpretability. We develop a general framework that enables learning knowledge and its confidence jointly with the DNNs, so that the vast amount of fuzzy knowledge can be incorporated and automatically optimized with little manual effort. We apply the framework to sentence sentiment analysis, augmenting a DNN with massive linguistic constraints on discourse and polarity structures. Our model substantially enhances the performance using less training data, and shows improved interpretability. The principled framework can also be applied to posterior regularization for regulating other statistical models.
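
One plausible, simplified reading of "learning knowledge and its confidence jointly" is sketched below; the rule interface, the softplus parameterization, and the confidence prior are assumptions of mine, not the authors' exact objective. Each soft rule contributes a violation penalty on the network's predictions, and a learnable per-rule confidence controls how strongly that rule regularizes the network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RuleRegularizedClassifier(nn.Module):
    def __init__(self, in_dim, n_classes, n_rules):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))
        self.rule_conf = nn.Parameter(torch.zeros(n_rules))   # learned jointly

    def forward(self, x):
        return self.net(x)

def loss_fn(model, x, y, rules):
    """rules: callables mapping predicted class probabilities to a per-example
    violation score in [0, 1] (an illustrative interface for fuzzy knowledge)."""
    logits = model(x)
    probs = logits.softmax(-1)
    task = F.cross_entropy(logits, y)
    conf = F.softplus(model.rule_conf)                           # confidences >= 0
    violations = torch.stack([r(probs) for r in rules], dim=-1)  # (batch, n_rules)
    knowledge = (violations * conf).mean()
    # Without some prior, confidences would collapse to zero; a simple pull
    # toward 1 stands in here for the framework's confidence regularization.
    prior = ((conf - 1.0) ** 2).mean()
    return task + knowledge + prior

# Toy usage: one "rule" expressing that class 0 should be likely when it applies.
model = RuleRegularizedClassifier(in_dim=16, n_classes=3, n_rules=1)
x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
loss = loss_fn(model, x, y, rules=[lambda p: 1.0 - p[:, 0]])
loss.backward()
```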

84 citations


Cites background from "Conditional Random Field Autoencode..."

  • ...Though apparently similar to recent deep structured models such as neural-CRFs (Durrett and Klein, 2015; Ammar et al., 2014; Do et al., 2010), ours is different since we parsimoniously extract features that are necessary for precise and efficient knowledge expression, as opposed to neural-CRFs that learn as rich representations as possible for final prediction....

    [...]


Proceedings ArticleDOI
02 May 2018
TL;DR: This paper exploits the dimensionality-reduction and feature-extraction properties of the autoencoder framework to efficiently carry out the reconstruction process, and uses LSTM networks to handle the sequential nature of computer network data.
Abstract: In this paper, we introduce a sequential autoencoder framework using long short term memory (LSTM) neural network for computer network intrusion detection. We exploit the dimensionality reduction and feature extraction property of the autoencoder framework to efficiently carry out the reconstruction process. Furthermore, we use the LSTM networks to handle the sequential nature of the computer network data. We assign a threshold value based on cross-validation in order to classify whether the incoming network data sequence is anomalous or not. Moreover, the proposed framework can work on both fixed and variable length data sequence and works efficiently for unforeseen and unpredictable network attacks. We then also use the unsupervised version of the LSTM, GRU, Bi-LSTM and Neural Networks. Through a comprehensive set of experiments, we demonstrate that our proposed sequential intrusion detection framework performs well and is dynamic, robust and scalable.
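
A minimal sketch of a sequence-to-sequence LSTM autoencoder for this style of anomaly detection (the architecture details and thresholding rule are assumptions in the spirit of the abstract, not the paper's exact model): normal traffic sequences are reconstructed well, and a sequence whose reconstruction error exceeds a threshold chosen on validation data is flagged as anomalous.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                       # x: (batch, time, n_features)
        _, (h, c) = self.encoder(x)             # compress the whole sequence
        T = x.size(1)
        # Feed the compressed state to the decoder at every time step.
        dec_in = h[-1].unsqueeze(1).expand(-1, T, -1)
        dec_out, _ = self.decoder(dec_in, (h, c))
        return self.out(dec_out)                # reconstructed sequence

def reconstruction_error(model, x):
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2))   # one score per sequence

# Usage sketch: train with MSE on benign traffic, choose `threshold` as e.g. a
# high percentile of validation-set errors, then flag test sequences above it.
model = LSTMAutoencoder(n_features=20)
batch = torch.randn(4, 30, 20)                  # 4 sequences, 30 time steps each
scores = reconstruction_error(model, batch)
# is_anomaly = scores > threshold
```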

78 citations


Cites background from "Conditional Random Field Autoencode..."

  • ...similar in that they are both sequential variants of the standard autoencoder [23]....

    [...]

References
Proceedings ArticleDOI
06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Abstract: Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.
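
The method described here is the BLEU metric. As a usage sketch, NLTK's off-the-shelf implementation (an external library, not the paper's own code) can score a hypothesis against one or more references; the tokens below are made up for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]   # list of reference token lists
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

# Smoothing avoids zero scores when a short sentence has no matching 4-grams.
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.3f}")
```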

21,126 citations

Proceedings Article
28 Jun 2001
TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Abstract: We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
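
A minimal illustration of a linear-chain CRF log-likelihood (my own sketch, not the paper's implementation): per-position unary scores plus tag-transition scores are globally normalized with the forward algorithm, which is what lets CRFs relax local independence assumptions and avoid the per-state normalization behind label bias in MEMMs.

```python
import torch

K = 5                                    # number of tags (illustrative)
transitions = torch.randn(K, K, requires_grad=True)

def crf_log_likelihood(unary, tags, transitions):
    """unary: (T, K) feature scores, tags: (T,) gold tag ids."""
    # Score of the gold path: sum of unary and transition scores.
    gold = unary[torch.arange(len(tags)), tags].sum()
    gold = gold + transitions[tags[:-1], tags[1:]].sum()
    # Log-partition via the forward algorithm over all tag sequences.
    alpha = unary[0]
    for u in unary[1:]:
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + u
    log_z = torch.logsumexp(alpha, dim=0)
    return gold - log_z                  # log p(tags | observations)

# Training maximizes this over labeled sequences; the unary scores would come
# from arbitrary, overlapping features of the whole input (or a neural encoder).
unary = torch.randn(7, K, requires_grad=True)    # toy scores for a length-7 input
tags = torch.randint(0, K, (7,))
loss = -crf_log_likelihood(unary, tags, transitions)
loss.backward()
```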

13,190 citations


"Conditional Random Field Autoencode..." refers methods in this paper

  • ...Conditional random fields [24] are used to model structure in numerous problem domains, including natural language processing (NLP), computational biology, and computer vision....

    [...]

Proceedings Article
01 Jan 2010
TL;DR: Adaptive subgradient methods, as discussed by the authors, dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning, allowing very predictive but rarely seen features to be found like needles in haystacks.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
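
A minimal sketch of the per-coordinate adaptive update commonly known as AdaGrad, which this abstract describes: each coordinate's step size shrinks with its accumulated squared gradients, so rarely updated ("rarely seen") features keep comparatively large steps. The toy quadratic objective below is illustrative only.

```python
import numpy as np

def adagrad_update(w, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad step; `accum` holds the running sum of squared gradients."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
accum = np.zeros_like(w)
for _ in range(100):
    w, accum = adagrad_update(w, w.copy(), accum)
print(w)   # coordinates shrink toward the minimum at 0
```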

7,244 citations